From Pengfei.Li at arm.com  Mon Sep  3 05:49:46 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 3 Sep 2018 05:49:46 +0000
Subject: RFR(S): 8210152: Optimize integer divisible by power-of-2 check
In-Reply-To: <97407ba1-7aec-0893-b540-7e1472ce9529@oracle.com>
References: <DB7PR08MB3115A30536986552B26081CE96090@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <97407ba1-7aec-0893-b540-7e1472ce9529@oracle.com>
Message-ID: <DB7PR08MB31152CB3D365E14F849A284B960C0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Vladimir, Dean,

Thanks for your review.

> I don't see where negation is coming from for 'X % 2 == 0' expression.
> It should be only 2 instructions: 'cmp (X and 1), 0'
'cmp (X and 1), 0' is exactly what we expect. But there is a redundant conditional negation, which comes from the handling of a possibly negative X in "X % 2".
For instance, for X = -5, "X % 2" must be -1, while "(X and 1)" yields 1. So the "(X and 1)" operation alone is not enough; we have to negate the result.
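A minimal Java sketch of the point above, assuming a power-of-two divisor m = 2^k: Java's x % m can be rebuilt from a mask plus a conditional negation for negative x. This is one way to write it and not necessarily the exact node shape C2 produces; remByMask and divisibleByMask are hypothetical names. The zero test only needs the mask.

```java
final class PowerOfTwoRemainder {

    // Rebuilds x % m for m = 2^k (0 <= k <= 30) from a mask and a conditional
    // negation. Java's remainder takes the sign of the dividend, so for
    // negative x the masked magnitude has to be negated.
    // -Integer.MIN_VALUE overflows back to MIN_VALUE, whose low bits are all
    // zero, so the formula still yields 0 for that input.
    static int remByMask(int x, int m) {
        int mask = m - 1;
        return x < 0 ? -((-x) & mask) : (x & mask);
    }

    // The zero test needs no negation: -r == 0 exactly when r == 0.
    static boolean divisibleByMask(int x, int m) {
        return (x & (m - 1)) == 0;
    }

    public static void main(String[] args) {
        int[] samples = { 0, 17, 1553, -90, -35789, -5, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int m : new int[] { 2, 4, 8 }) {
            for (int x : samples) {
                if (remByMask(x, m) != x % m) throw new AssertionError("rem " + x + " % " + m);
                if (divisibleByMask(x, m) != (x % m == 0)) throw new AssertionError("zero " + x + " % " + m);
            }
        }
        System.out.println(-5 % 2); // -1: the sign follows the dividend
        System.out.println(-5 & 1); // 1: the mask alone only gives the magnitude
    }
}
```

The negation only matters when the remainder value itself is used (as in testModulo below); when the result feeds a comparison against zero (as in testDivisible), the masked value is enough.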

> I will look on it next week. But it would be nice if you can provide small test to show this issue.
I've already provided a case of "if (a%2 == 0) { ... }" in the JBS description, together with the code currently generated and what can be optimized.
See https://bugs.openjdk.java.net/browse/JDK-8210152 for details. The test case for this optimization is also attached below.

> It looks like your matching may allow more patterns than expected. I was expecting it to look for < 0 or >= 0 for the conditional negation, but I don't see it.  
Yes. I didn't limit the if condition to < 0 or >= 0, so it matches more patterns. But nothing goes wrong if this Ideal transformation applies to more cases.
In pseudo code, if someone writes:
if ( some_condition ) { x = -x; }
if ( x == 0 ) { do_something(); }
The negation in the first if-clause can always be eliminated, whatever the condition is, since x == 0 holds exactly when -x == 0.
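The reason the condition does not matter can be checked directly: in two's-complement int arithmetic, -x == 0 holds exactly when x == 0, including for Integer.MIN_VALUE, whose negation overflows back to itself. A small Java sketch (the class and method names are hypothetical) comparing the zero test with and without the conditional negation:

```java
import java.util.function.IntPredicate;

final class CondNegZeroCheck {

    // Original shape: negate under an arbitrary condition, then test for zero.
    static boolean zeroAfterCondNeg(int x, IntPredicate cond) {
        if (cond.test(x)) {
            x = -x;
        }
        return x == 0;
    }

    // Transformed shape: the negation is dropped because it cannot change
    // whether x is zero.
    static boolean zeroWithoutNeg(int x) {
        return x == 0;
    }

    public static void main(String[] args) {
        IntPredicate[] conds = { v -> v < 0, v -> v >= 0, v -> v <= 0, v -> (v & 1) == 1, v -> true };
        int[] samples = { 0, 1, -1, 17, -35789, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (IntPredicate c : conds) {
            for (int x : samples) {
                if (zeroAfterCondNeg(x, c) != zeroWithoutNeg(x)) {
                    throw new AssertionError("mismatch for x = " + x);
                }
            }
        }
        System.out.println("conditional negation never changed the zero test");
    }
}
```

Only the negation goes away; the condition itself still has to be evaluated when it has side effects, as printAndIfNeg in the test case below does.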

--
Thanks,
Pengfei


-- my test case attached below --
public class Foo {

    public static void main(String[] args) {
        int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
        for (int i = 0; i < dividends.length; i++) {
            int x = dividends[i];
            System.out.println(testDivisible(x));
            System.out.println(testModulo(x));
            testCondNeg(x);
        }
    }

    public static int testDivisible(int x) {
        // The modulo result is used only for the zero check
        if (x % 4 == 0) {
            return 444;
        }
        return 555;
    }

    public static int testModulo(int x) {
        int y = x % 4;
        if (y == 0) {
            return 222;
        }
        // Modulo result is used elsewhere
        System.out.println(y);
        return 333;
    }

    public static void testCondNeg(int x) {
        // Pure conditional negation
        if (printAndIfNeg(x)) {
            x = -x;
        }
        if (x == 0) {
            System.out.println("zero!");
        }
    }

    static boolean printAndIfNeg(int x) {
        System.out.println(x);
        return x <= 0;
    }
}

From rwestrel at redhat.com  Mon Sep  3 07:21:16 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 03 Sep 2018 09:21:16 +0200
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
Message-ID: <dk67ek311xv.fsf@rwestrel.remote.csb>


Hi Vladimir,

Thanks for the review. Here is a new webrev that should address your
comment.

http://cr.openjdk.java.net/~roland/8209544/webrev.01/

Roland.

From erik.osterlund at oracle.com  Mon Sep  3 08:37:04 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 3 Sep 2018 10:37:04 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1
 and C2
In-Reply-To: <dk6a7p21tmb.fsf@rwestrel.remote.csb>
References: <a812e13d-016c-11fb-8acb-6372a9cf196c@redhat.com>
 <e4b8c3a6-2456-0cc3-297f-953c38cd2255@redhat.com>
 <dk61sah45fd.fsf@rwestrel.remote.csb>
 <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com>
 <5B86B7CE.3030507@oracle.com>
 <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com>
 <5B86BBB6.7000401@oracle.com>
 <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com>
 <A01BE002-96A0-472C-A8F4-B57FC390E543@oracle.com>
 <dk6d0ty1uj8.fsf@rwestrel.remote.csb>
 <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com>
 <dk6a7p21tmb.fsf@rwestrel.remote.csb>
Message-ID: <5B8CF2B0.2060509@oracle.com>

Hi Roland,

First of all, I apologize for getting your name wrong in the last email.

On 2018-08-31 16:46, Roland Westrelin wrote:
>> Well... C1 uses CAS in the heap only for the Unsafe CAS intrinsic,
>> which is indeed inserted at parse time. And all other GCs alter the
>> CFG for the GC barriers in their CAS barriers, using LIR. Except
>> Epsilon I suppose.
> Are you talking about for instance G1BarrierSetC1::pre_barrier()? That
> method adds control flow within a basic block. It doesn't hack the CFG
> (it doesn't add new basic blocks). How can the register allocator
> compute liveness without a correct CFG? Either
> G1BarrierSetC1::pre_barrier() is a simple enough case that register
> allocation is correct or there are some nasty bugs in there. In any
> case, building control flow within a block like
> G1BarrierSetC1::pre_barrier() does is an ugly hack. Doing anything more
> complicated that way is asking for trouble.

The C1 basic blocks are built and optimized as part of the HIR and are 
not to be changed after that. Once the HIR is generated, the LIR inserts 
operations required for lowering this optimized HIR to machine code. 
After IR::compute_code() of the HIR, those basic blocks are set in
stone. That means that any control flow alterations needed by the
LIRGenerator, which comes into play after that, are going to use branches
within the HIR basic block instead (as we promised not to change the HIR
basic blocks after the HIR is built and optimized). I can see how that
might feel like a hack, but that is kind of the way things are
currently done in C1. It is used this way for all barrier sets today
(UseCondCardMark for card marking GCs, G1, ZGC), and it is also used
for T_BOOLEAN normalization, switch statements, checking for referents in
unsafe intrinsics, etc. I suppose the stubs inserted at the LIR level
similarly break the basic block abstraction of the HIR level. These
things could of course be changed to a stricter basic block
model even at the LIR level. But I don't know how much that would help,
given that this is just the pass before lowering to machine code. But
that is a whole different discussion.

I do not propose to move the GC barriers into the HIR; it is too early.
I propose to insert them at the LIR level, the same way all the other
GCs do, using the same mechanisms they use.

@Roman: If you feel more comfortable using your own LIR_Op with your own
lowering or stubs instead, because you want this written in assembly for
whatever reason, then I am fine with that too, as long as it is contained
in the Shenandoah folders. What I do have reservations about is changing
the API that everybody else uses so that the LIRGenerator's raw CAS gets
lowered into a non-raw Access call to the macro assembler, passing
temporary registers used by Shenandoah from above into the raw CAS
used by the non-raw macro assembler Access CAS.

For example, in ZGC we have an LIR_Op subclass, LIR_OpZLoadBarrierTest,
defined in zBarrierSetC1.cpp. It allows us to do custom machine-dependent
lowering of the test itself, and it can be inserted into the LIR list.

I hope we are on the same page here!

Thanks,
/Erik

> Roland.


From rwestrel at redhat.com  Mon Sep  3 08:41:21 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 03 Sep 2018 10:41:21 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1
 and C2
In-Reply-To: <5B8CF2B0.2060509@oracle.com>
References: <a812e13d-016c-11fb-8acb-6372a9cf196c@redhat.com>
 <e4b8c3a6-2456-0cc3-297f-953c38cd2255@redhat.com>
 <dk61sah45fd.fsf@rwestrel.remote.csb>
 <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com>
 <5B86B7CE.3030507@oracle.com>
 <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com>
 <5B86BBB6.7000401@oracle.com>
 <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com>
 <A01BE002-96A0-472C-A8F4-B57FC390E543@oracle.com>
 <dk6d0ty1uj8.fsf@rwestrel.remote.csb>
 <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com>
 <dk6a7p21tmb.fsf@rwestrel.remote.csb> <5B8CF2B0.2060509@oracle.com>
Message-ID: <dk6va7nynv2.fsf@rwestrel.remote.csb>


Hi Erik,

> The C1 basic blocks are built and optimized as part of the HIR and are 
> not to be changed after that. Once the HIR is generated, the LIR inserts 
> operations required for lowering this optimized HIR to machine code. 
> After IR::compute_code() of the HIR, those basic blocks are set in 
> stone. That means that any control flow alterations needed by the 
> LIRGenerator, which comes into play after that, is going to use branches 
> within the HIR basic block instead (as we promised not to change the HIR 
> basic blocks after the HIR is built and optimized). I can see how that 
> might feel like a hack, but that is kind of the way that things are 
> currently done in C1. It is used this way for all barrier sets today 
> (UseCondCardMark for card marking GCs, for G1, ZGC), and it's also used 
> by T_BOOLEAN normalization, switch statements, checking for referents in 
> unsafe intrinsics etc. I suppose the stubs inserted at the LIR level 
> also similarly break the basic block abstraction of the HIR level. These 
> are things that can of course be changed into a more strict basic block 
> model even at the LIR level. But I don't know how much that would help 
> given that this is just the pass before lowering to machine code. But 
> that is a whole different discussion.

Adding a loop within a basic block is simply not possible. The register
allocator won't know it's a loop and has no way to know operands are
live across iterations. So it's not like we even have a choice.

Roland.

From erik.osterlund at oracle.com  Mon Sep  3 08:54:34 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 3 Sep 2018 10:54:34 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1
 and C2
In-Reply-To: <dk6va7nynv2.fsf@rwestrel.remote.csb>
References: <a812e13d-016c-11fb-8acb-6372a9cf196c@redhat.com>
 <e4b8c3a6-2456-0cc3-297f-953c38cd2255@redhat.com>
 <dk61sah45fd.fsf@rwestrel.remote.csb>
 <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com>
 <5B86B7CE.3030507@oracle.com>
 <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com>
 <5B86BBB6.7000401@oracle.com>
 <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com>
 <A01BE002-96A0-472C-A8F4-B57FC390E543@oracle.com>
 <dk6d0ty1uj8.fsf@rwestrel.remote.csb>
 <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com>
 <dk6a7p21tmb.fsf@rwestrel.remote.csb> <5B8CF2B0.2060509@oracle.com>
 <dk6va7nynv2.fsf@rwestrel.remote.csb>
Message-ID: <5B8CF6CA.1010600@oracle.com>

Hi Roman,

Who would clobber those registers between iterations though in your 
tight loop?

/Erik



From rwestrel at redhat.com  Mon Sep  3 08:58:00 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 03 Sep 2018 10:58:00 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1
 and C2
In-Reply-To: <5B8CF6CA.1010600@oracle.com>
References: <a812e13d-016c-11fb-8acb-6372a9cf196c@redhat.com>
 <e4b8c3a6-2456-0cc3-297f-953c38cd2255@redhat.com>
 <dk61sah45fd.fsf@rwestrel.remote.csb>
 <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com>
 <5B86B7CE.3030507@oracle.com>
 <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com>
 <5B86BBB6.7000401@oracle.com>
 <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com>
 <A01BE002-96A0-472C-A8F4-B57FC390E543@oracle.com>
 <dk6d0ty1uj8.fsf@rwestrel.remote.csb>
 <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com>
 <dk6a7p21tmb.fsf@rwestrel.remote.csb> <5B8CF2B0.2060509@oracle.com>
 <dk6va7nynv2.fsf@rwestrel.remote.csb> <5B8CF6CA.1010600@oracle.com>
Message-ID: <dk6pnxvyn3b.fsf@rwestrel.remote.csb>


> Who would clobber those registers between iterations though in your 
> tight loop?

Ignoring CAS, here is a simple example:

input = 0;

loop_entry:
input++;
array[i] = input;
// some other code
goto loop_entry;

input is live across iterations, but because the loop is hidden inside a
basic block, the register allocator assumes it is live only from its
initialization to the store into the array. So it is free to assign a
register to input and reuse that register for whatever code makes up the
rest of the loop body, clobbering input before the next iteration.
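Roland's argument can be reproduced with a toy backward liveness analysis. The sketch below is a simplified Java model, not C1's LinearScan: each instruction lists the variables it defines and uses, and the analysis runs once with the back edge from the goto visible and once with it hidden, as happens when the loop lives inside a single basic block. The class name and the tmp variable standing in for "some other code" are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HiddenLoopLiveness {

    // One instruction in a toy IR: the variables it writes and reads.
    static final class Insn {
        final String text;
        final Set<String> defs;
        final Set<String> uses;
        Insn(String text, Set<String> defs, Set<String> uses) {
            this.text = text; this.defs = defs; this.uses = uses;
        }
    }

    // Roland's example as a straight-line instruction list (indices 0..4).
    static final List<Insn> CODE = List.of(
        new Insn("input = 0",        Set.of("input"), Set.of()),
        new Insn("input++",          Set.of("input"), Set.of("input")), // loop_entry
        new Insn("array[i] = input", Set.of(),        Set.of("array", "i", "input")),
        new Insn("tmp = other code", Set.of("tmp"),   Set.of()),
        new Insn("goto loop_entry",  Set.of(),        Set.of()));

    // Backward dataflow: out[k] = union of in[s] over successors s,
    // in[k] = uses[k] + (out[k] - defs[k]); iterate to a fixed point.
    static List<Set<String>> liveOut(Map<Integer, List<Integer>> succ) {
        int n = CODE.size();
        List<Set<String>> in = new ArrayList<>();
        List<Set<String>> out = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            in.add(new HashSet<>());
            out.add(new HashSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int k = n - 1; k >= 0; k--) {
                Set<String> newOut = new HashSet<>();
                for (int s : succ.getOrDefault(k, List.of())) {
                    newOut.addAll(in.get(s));
                }
                Set<String> newIn = new HashSet<>(newOut);
                newIn.removeAll(CODE.get(k).defs);
                newIn.addAll(CODE.get(k).uses);
                if (!newOut.equals(out.get(k)) || !newIn.equals(in.get(k))) {
                    out.set(k, newOut);
                    in.set(k, newIn);
                    changed = true;
                }
            }
        }
        return out;
    }

    // The allocator sees the goto's edge back to loop_entry.
    static List<Set<String>> withBackEdge() {
        return liveOut(Map.of(0, List.of(1), 1, List.of(2), 2, List.of(3), 3, List.of(4), 4, List.of(1)));
    }

    // The goto is invisible: the block simply ends after instruction 4.
    static List<Set<String>> backEdgeHidden() {
        return liveOut(Map.of(0, List.of(1), 1, List.of(2), 2, List.of(3), 3, List.of(4)));
    }

    public static void main(String[] args) {
        int other = 3; // "tmp = other code"
        System.out.println("back edge visible: live after '" + CODE.get(other).text + "' = " + withBackEdge().get(other));
        System.out.println("back edge hidden:  live after '" + CODE.get(other).text + "' = " + backEdgeHidden().get(other));
    }
}
```

Only with the back edge is input live across the "other code" instruction; with the edge hidden, an allocator working from this liveness may hand input's register to tmp there, which is exactly the clobbering described above.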

Roland.

From rkennke at redhat.com  Mon Sep  3 08:59:13 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 03 Sep 2018 10:59:13 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1
 and C2
In-Reply-To: <5B8CF2B0.2060509@oracle.com>
References: <a812e13d-016c-11fb-8acb-6372a9cf196c@redhat.com>
 <e4b8c3a6-2456-0cc3-297f-953c38cd2255@redhat.com>
 <dk61sah45fd.fsf@rwestrel.remote.csb>
 <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com>
 <5B86B7CE.3030507@oracle.com>
 <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com>
 <5B86BBB6.7000401@oracle.com>
 <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com>
 <A01BE002-96A0-472C-A8F4-B57FC390E543@oracle.com>
 <dk6d0ty1uj8.fsf@rwestrel.remote.csb>
 <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com>
 <dk6a7p21tmb.fsf@rwestrel.remote.csb> <5B8CF2B0.2060509@oracle.com>
Message-ID: <AF0C3779-85EB-470F-B1BE-F2EB4E26776F@redhat.com>

I wasn't sure that the BarrierSetC1 interface allows defining custom ops. This sounds like a good, natural solution. Ditto for C2. Let's see if we can make that work.

Roman


From erik.osterlund at oracle.com  Mon Sep  3 09:25:12 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 3 Sep 2018 11:25:12 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1
 and C2
In-Reply-To: <AF0C3779-85EB-470F-B1BE-F2EB4E26776F@redhat.com>
References: <a812e13d-016c-11fb-8acb-6372a9cf196c@redhat.com>
 <e4b8c3a6-2456-0cc3-297f-953c38cd2255@redhat.com>
 <dk61sah45fd.fsf@rwestrel.remote.csb>
 <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com>
 <5B86B7CE.3030507@oracle.com>
 <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com>
 <5B86BBB6.7000401@oracle.com>
 <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com>
 <A01BE002-96A0-472C-A8F4-B57FC390E543@oracle.com>
 <dk6d0ty1uj8.fsf@rwestrel.remote.csb>
 <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com>
 <dk6a7p21tmb.fsf@rwestrel.remote.csb> <5B8CF2B0.2060509@oracle.com>
 <AF0C3779-85EB-470F-B1BE-F2EB4E26776F@redhat.com>
Message-ID: <5B8CFDF8.9040700@oracle.com>

Hi Roman,

It used not to be possible, as it needed its own enum cases in switches all
over the place. But as part of my C1 barrier set interface work, I
wanted to make it possible to define your own LIR_Ops in the barrier set
without cluttering those switches, so I inserted the appropriate virtual
calls to the LIR_Ops. Now, basically, if your LIR_Op's id is lir_none
(which the default constructor sets it to), the switch statements will
make virtual calls into your LIR_Op.
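The dispatch described above can be modeled in a few lines. This is a hypothetical Java sketch, not HotSpot's C++ LIR API: built-in ops carry an enum code handled by a shared switch, and an op whose code is NONE (standing in for lir_none) is handed to its own virtual method, so a barrier set can add an op such as a GC-specific CAS without touching the shared switch. All class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class CustomOpDispatch {

    enum Code { MOVE, BRANCH, NONE } // NONE plays the role of lir_none

    // Base op: the default constructor leaves the code at NONE, and the
    // virtual hook is only consulted for such ops.
    static class Op {
        final Code code;
        Op() { this(Code.NONE); }
        Op(Code code) { this.code = code; }
        void emitCode(List<String> asm) {
            throw new UnsupportedOperationException("op " + code + " has no custom lowering");
        }
    }

    // A barrier-set-specific op (hypothetical), kept out of the shared switch.
    static final class GcCasOp extends Op {
        @Override
        void emitCode(List<String> asm) {
            asm.add("gc-specific cas sequence");
        }
    }

    // The shared switch: built-in codes are handled inline, and everything
    // with code NONE is delegated through a virtual call.
    static List<String> emit(List<Op> ops) {
        List<String> asm = new ArrayList<>();
        for (Op op : ops) {
            switch (op.code) {
                case MOVE:   asm.add("mov");   break;
                case BRANCH: asm.add("jmp");   break;
                case NONE:   op.emitCode(asm); break;
            }
        }
        return asm;
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(new Op(Code.MOVE), new GcCasOp(), new Op(Code.BRANCH));
        System.out.println(emit(ops)); // prints [mov, gc-specific cas sequence, jmp]
    }
}
```

The design point is that the shared switch needs exactly one NONE case, however many GC-specific ops exist.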

I see how inserting LIR loops into an HIR basic block can, in the general
case, go horribly wrong, as Roland showed in his example. So if you feel
that defining your own LIR_Op and lowering it in your barrier set is the
more natural solution for Shenandoah, you can of course use that mechanism.

It sounds like we have reached an agreement?

Thanks,
/Erik



From rkennke at redhat.com  Mon Sep  3 09:57:51 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 3 Sep 2018 11:57:51 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1
 and C2
In-Reply-To: <5B8CFDF8.9040700@oracle.com>
References: <a812e13d-016c-11fb-8acb-6372a9cf196c@redhat.com>
 <e4b8c3a6-2456-0cc3-297f-953c38cd2255@redhat.com>
 <dk61sah45fd.fsf@rwestrel.remote.csb>
 <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com>
 <5B86B7CE.3030507@oracle.com>
 <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com>
 <5B86BBB6.7000401@oracle.com>
 <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com>
 <A01BE002-96A0-472C-A8F4-B57FC390E543@oracle.com>
 <dk6d0ty1uj8.fsf@rwestrel.remote.csb>
 <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com>
 <dk6a7p21tmb.fsf@rwestrel.remote.csb> <5B8CF2B0.2060509@oracle.com>
 <AF0C3779-85EB-470F-B1BE-F2EB4E26776F@redhat.com>
 <5B8CFDF8.9040700@oracle.com>
Message-ID: <40ef7c2e-faad-ce6e-3fda-3e1c66aaf517@redhat.com>

Hi Erik,

> It did not use to be possible as it needed its own enum switches all
> over the place. But as part of my C1 barrier set interface work, I
> wanted to make it possible to make your own LIR_Ops in the barrier set
> as well without cluttering the switches and inserted appropriate virtual
> calls to the LIR_Ops allowing you to do that. Now, basically, if your
> LIR_Op id is lir_none (which the default constructor sets it to), then
> it will use virtual calls into your LIR_Op in the switch statements.
> 
> I see how inserting LIR loops in the HIR basic block in the general case
> can go horribly wrong as Roland showed in his example. So if you feel
> like defining your own LIR_Op and lower it in your barrier set is the
> more natural solution for Shenandoah, you can use that mechanism of course.
> 
> It sounds like we have reached an agreement?

I think so, at least for now. We'll try to turn our cmpxchg-oop problem
into an LIR_Op and a C2 node and see how that goes. I withdraw this RFR.

Thanks a lot,
Roman





From goetz.lindenmaier at sap.com  Mon Sep  3 12:27:56 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 3 Sep 2018 12:27:56 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <346da54af45243c4bdaf475f118a450d@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
Message-ID: <9553d65d98f74f37a35b49a1e39f015e@sap.com>

Hi Michihiro, 

I had a look at your change. 
First, this should have been reviewed on hotspot-compiler-dev. It is clearly 
a compiler change. 
http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
while hotspot-compiler-dev is for
"Technical discussion about the development of the HotSpot bytecode compilers"

Also, I cannot find all of the mail traffic in
http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
Is this a problem of the pipermail server?

For some reason this webrev lacks the links to browse the diffs.
Maybe you need to use a more recent version of the webrev tool? You can obtain it with
hg clone http://hg.openjdk.java.net/code-tools/webrev/ .

Why do you rename vnoreg to vnoregi?

Besides that, the change is fine. Thanks for implementing this!

Best regards,
  Goetz.


> -----Original Message-----
> From: Doerr, Martin
> Sent: Dienstag, 28. August 2018 19:35
> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie
> <HORIE at jp.ibm.com>
> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
> <volker.simonis at sap.com>
> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Michihiro,
> 
> thank you for implementing it. I have just taken a first look at your
> webrev.01.
> 
> It looks basically good. Only the Power version check seems to be incorrect.
> VM_Version::has_popcntb() checks for Power5.
> I believe most instructions are available with Power7.
> Some of them (vsubudm, ..., vmuluwm, vpopcntw) were introduced with
> Power8?
> We should check this carefully.
> 
> Also, indentation in register_ppc.hpp could get improved.
> 
> Thanks and best regards,
> Martin
> 
> 
> -----Original Message-----
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> Sent: Donnerstag, 26. Juli 2018 16:02
> To: Michihiro Horie <HORIE at jp.ibm.com>
> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-
> port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Michi,
> 
> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
> > I updated webrev:
> > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
> 
> Thanks for providing an updated webrev and for fixing indentation and
> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
> 
> 
> Best Regards,
> Gustavo
> 
> >
> > Best regards,
> > --
> > Michihiro,
> > IBM Research - Tokyo
> >
> >
> > From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> > To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> > Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
> > Date: 2018/07/25 23:05
> > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >
> >
> >
> >
> > Hi Michi,
> >
> > On 07/25/2018 02:43 AM, Michihiro Horie wrote:
> >  > Dear all,
> >  >
> >  > Would you review the following change?
> >  > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
> >  > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
> >  >
> >  > This change adds support for vectorized arithmetic calculation with SLP.
> >  >
> >  > The to_vr function is added to convert VSR to VR. Currently, vecX is
> >  > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
> >  > which exactly overlap the VRs. Instruction APIs receiving VRs use to_vr
> >  > via vecX. Another thing is the change in sqrtF_reg to enable matching
> >  > with SqrtVF. I think the change in sqrtF_reg would be fine due to
> >  > ConvD2FNode::Value in convertnode.cpp.
> >
> > Looks good. Just a few comments:
> >
> > - In vmul4F_reg(), would it be reasonable to use xvmulsp instead of
> >    vmaddfp in order to avoid the splat?
> >
> > - All instructions added by your change were introduced in ISA 2.06,
> >    so POWER7 and above are OK. However, as I see probes for
> >    PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), I'm
> >    wondering if there is any control point to guarantee that these
> >    instructions won't be emitted on a CPU that does not support them.
> >
> > - I think that, in general, strings in format %{} are in upper case. For
> >    instance, this is the current optoassembly output for vmul4F:
> >
> > 2941835 5b4     ADDI    R24, R24, #64
> > 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
> > 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
> >
> >    I think it would be better in upper case. I also think that if the
> >    node match emits more than one instruction, all instructions must be
> >    listed in format %{}, since it's meant for detailed debugging. Finally,
> >    I think it would be better to replace \t! by \t// in that string (unless
> >    I'm missing some special meaning of that char). So for vmul4F it would
> >    be something like:
> >
> > 2941835 5b4     ADDI      R24, R24, #64
> >                  VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
> > 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
> > 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
> >
> >
> > But feel free to change anything just after you get additional reviews :)
> >
> >
> >  > I confirmed this change with JTREG. In addition, I used the attached
> >  > micro benchmarks.
> >  > /(See attached file: slp_microbench.zip)/
> >
> > Thanks for sharing it.
> > Btw, another option would be to host it on the CR
> > server, in http://cr.openjdk.java.net/~mhorie/8208171
> >
> >
> > Best regards,
> > Gustavo
> >
> >  >
> >  > Best regards,
> >  > --
> >  > Michihiro,
> >  > IBM Research - Tokyo
> >  >
> >
> >
> >

From gromero at linux.vnet.ibm.com  Mon Sep  3 12:56:44 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 3 Sep 2018 09:56:44 -0300
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <9553d65d98f74f37a35b49a1e39f015e@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
Message-ID: <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>

Hi Goetz,

On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
> Also, I can not find all of the mail traffic in
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
> Is this a problem of the pipermail server?
> 
> For some reason this webrev lacks the links to browse the diffs.
> Do you need to use a more recent webrev?  You can obtain it with
> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .

Yes, it was probably a problem with pipermail or with some relay.
I noticed the same thing, i.e. at least one of Michi's replies reached
me but was missing from a mailing list.

The initial discussion is here:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html

I understand Martin reviewed the last webrev in that thread, which is
http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)

Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html

and Michi's reply to Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).

and your last review:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html


HTH.

Best regards,
Gustavo
  
> Why do you rename vnoreg to vnoregi?
> 
> Besides that the change is fine, thanks for implementing this!
> 
> Best regards,
>    Goetz.
> 


From martin.doerr at sap.com  Mon Sep  3 17:18:18 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 3 Sep 2018 17:18:18 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
Message-ID: <fc4310e4a8544c69b3d14fd593a065fc@sap.com>

Hi Gustavo and Michihiro,

we noticed jtreg test failures when using this change:
compiler/runtime/safepoints/TestRegisterRestoring.java
compiler/runtime/Test7196199.java

TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
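For readers unfamiliar with this kind of failure, here is a hypothetical sketch of the kind of check involved. It is not the jtreg source of TestRegisterRestoring, and the class and method names are made up: a simple float loop, of the sort C2's SuperWord pass may vectorize, whose exact result is known in advance. If vector register contents were clobbered while the thread is stopped (e.g. at a safepoint or in a signal handler), the final values would drift away from 10000.

```java
// Hypothetical sketch, not the actual jtreg test. Names are illustrative.
final class RegisterRestoreSketch {

    // Adds 1.0f to every element `rounds` times. The inner loop is the kind of
    // simple float loop that C2's SuperWord pass may turn into vector code.
    static void increment(float[] array, int rounds) {
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < array.length; i++) {
                array[i] += 1.0f;
            }
        }
    }

    // Returns the index of the first element not equal to `expected`, or -1.
    static int firstMismatch(float[] array, float expected) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] != expected) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] array = new float[100];
        // 10_000 increments of 1.0f are exact in float (well below 2^24).
        increment(array, 10_000);
        int bad = firstMismatch(array, 10_000f);
        if (bad >= 0) {
            throw new RuntimeException("array[" + bad + "] = " + array[bad]);
        }
        System.out.println("ok");
    }
}
```

On a correctly working VM this prints "ok"; a register save/restore bug would show up as the RuntimeException with an unexpected value.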

We didn't see it on all machines, so it might be an issue with saving and restoring VR registers in the signal handler.
The machine I used runs "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.

That's what I found out so far. Maybe you have an idea?

I also noticed that "-XX:-SuperwordUseVSX" crashes with "bad AD file" when your patch is applied. It looks like matching of the vector nodes needs to be prevented in that case.

Best regards,
Martin


-----Original Message-----
From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
Sent: Montag, 3. September 2018 14:57
To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support



From gromero at linux.vnet.ibm.com  Mon Sep  3 22:15:23 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 3 Sep 2018 19:15:23 -0300
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com>
 <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
Message-ID: <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>

Hi Vladimir,

Thanks a lot for reviewing it and for your comments.

On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
> Hi Gustavo,
> 
> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with TieredStopAtLevel < 4 flag

Yes. Although, as far as I can see, all tests currently either enable C2
explicitly (for instance, by passing -Xcomp and -XX:-TieredCompilation) or
enable it implicitly by warming up before testing, I agree that nothing
prevents one from switching off C2 with TieredStopAtLevel = level < 4 for a
test case. It also looks better to explicitly list which compilers support
RTM instead of the ones that don't support it.

I've updated the webrev accordingly:

http://cr.openjdk.java.net/~gromero/8209972/v2/

The diff there looks odd, so I generated another one with --patience for a
better (IMO) diff format:

http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff


> Also, can the platform checks be replaced with one vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?

For example, on Linux the following cases are possible regarding CPU / OS
RTM support:

POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false

PPC64 will return 'true' for vmRTMCPU() _only_ if both the CPU and the OS
support RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os"
the "vm.rtm.os" term is redundant, because "vm.rtm.cpu" being 'true' implies
that "vm.rtm.os" is 'true' as well; otherwise the JVM would never advertise
the "rtm" feature (which is what "vm.rtm.cpu" is based on). That seems to
hold for both Linux and AIX.
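The redundancy argument can be modeled as a tiny truth table. In this sketch (an assumption based on the description above, not the VMProps implementation; all names are hypothetical), vm.rtm.cpu is true only when both the CPU and the OS support RTM, which makes the extra os term a no-op.

```java
// Simplified model, not jtreg/VMProps code. Names are hypothetical.
final class RtmCpuModel {

    // vm.rtm.cpu as described: the JVM advertises "rtm" only if CPU and OS agree.
    static boolean vmRtmCpu(boolean cpuHasRtm, boolean osSupportsRtm) {
        return cpuHasRtm && osSupportsRtm;
    }

    // Checks that "vm.rtm.cpu & vm.rtm.os" equals "vm.rtm.cpu" for every input.
    static boolean osTermIsRedundant() {
        boolean[] values = {false, true};
        for (boolean cpu : values) {
            for (boolean os : values) {
                boolean rtmCpu = vmRtmCpu(cpu, os);
                if ((rtmCpu && os) != rtmCpu) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // {cpu, os} rows in the spirit of the Linux table above
        String[] names = {"POWER7", "POWER8, OS without RTM", "POWER8, OS with RTM", "POWER9 NV"};
        boolean[][] rows = {{false, false}, {true, false}, {true, true}, {true, false}};
        for (int i = 0; i < rows.length; i++) {
            System.out.println(names[i] + ": vm.rtm.cpu = " + vmRtmCpu(rows[i][0], rows[i][1]));
        }
        System.out.println("os term redundant: " + osTermIsRedundant());
    }
}
```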

That said, I don't think the platform checks can be replaced with one
vmRTMCPU(), because in some cases a test must run for cpu = false and
compiler = true, i.e. a test must run on an unsupported CPU for a given
platform _only if_ the compiler in use supports RTM (like C2). So if, for
instance, we do:

'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires,
we "tie" CPU+OS RTM support to compiler RTM support, and the evaluation
returns 'false' for cpu = false and compiler = true, skipping the test
(vm.rtm.compiler = false). On the other hand, to run the test in that case
one could match for '!vm.rtm.compiler', but if compiler = false (Graal),
'!vm.rtm.compiler' also evaluates to 'true' and the test will run even
though the Graal compiler is selected, which is wrong.

Hence, IMO, the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and
must contain its own list of compilers with RTM support for each platform.
Basically, we can't ask the JVM about the compiler's support for RTM, since
the JVM can only tell us whether the CPU and OS it is running on support
RTM.
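A small model of the two candidate definitions makes the problem concrete. This is a hypothetical sketch: the boolean inputs stand in for whatever the test library actually reports (e.g. whether C2 is enabled, whether the platform is one where C2 implements RTM), and the method names are illustrative only.

```java
// Hypothetical model of the vm.rtm.compiler discussion; not VMProps code.
final class RtmCompilerModel {

    // Rejected idea: fold CPU+OS support into the compiler property.
    static boolean compilerTiedToCpu(boolean c2Enabled, boolean platformOk, boolean vmRtmCpu) {
        return c2Enabled && platformOk && vmRtmCpu;
    }

    // Preferred idea: the property only describes the selected compiler.
    static boolean compilerOnly(boolean c2Enabled, boolean platformOk) {
        return c2Enabled && platformOk;
    }

    // An "unsupported CPU" test: @requires !vm.rtm.cpu & vm.rtm.compiler
    static boolean runsUnsupportedCpuTest(boolean vmRtmCpu, boolean rtmCompiler) {
        return !vmRtmCpu && rtmCompiler;
    }

    public static void main(String[] args) {
        boolean cpu = false;   // e.g. POWER7: no CPU+OS RTM support
        boolean c2 = true;     // C2 selected
        // Tied definition: the test is wrongly skipped.
        System.out.println(runsUnsupportedCpuTest(cpu, compilerTiedToCpu(c2, true, cpu))); // false
        // Independent definition: the test runs, as intended.
        System.out.println(runsUnsupportedCpuTest(cpu, compilerOnly(c2, true)));            // true
        // Graal selected (c2 = false): the independent definition skips it.
        System.out.println(runsUnsupportedCpuTest(cpu, compilerOnly(false, true)));         // false
    }
}
```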


> And since you check cpu in here, why not replace all vm.rtm.* with one vm.rtm.supported? In such case you would need only one @requires check in tests instead of:
> 
> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler

I don't think that's possible either. Currently there are 5 match cases in
the RTM tests (the 4 distinct @requires lines below, plus tests with no @requires):

gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
* @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
* @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
* @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
* @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)

which can be summarized as these 5 cases:

1:          !(flavor == "server" & !emulatedClient  & cpu & os)
2:            flavor == "server" & !emulatedClient  & cpu & os
3: (!cpu) &  (flavor == "server" & !emulatedClient)
4:   cpu  & !(flavor == "server" & !emulatedClient)
5: no @requires

I understand that cases 1 and 2 can be simplified (since cpu implies os) as:

1:          !(flavor == "server" & !emulatedClient  & cpu)
2:            flavor == "server" & !emulatedClient  & cpu
3: (!cpu) &  (flavor == "server" & !emulatedClient)
4:   cpu  & !(flavor == "server" & !emulatedClient)
5: no @requires
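The step from the first list to the simplified one can be checked mechanically. The sketch below (illustrative names, assuming cpu implies os as argued above) enumerates every flag combination and confirms that dropping os does not change case 2, and therefore not case 1 either, since case 1 is its negation. It also shows that the implication is what makes the simplification safe.

```java
// Brute-force check of the case 1/2 simplification; names are illustrative.
final class RequiresSimplification {

    // Case 2 before simplification: server & !emulatedClient & cpu & os
    static boolean case2Original(boolean server, boolean emulated, boolean cpu, boolean os) {
        return server && !emulated && cpu && os;
    }

    // Case 2 after simplification: server & !emulatedClient & cpu
    static boolean case2Simplified(boolean server, boolean emulated, boolean cpu) {
        return server && !emulated && cpu;
    }

    // True if both forms agree on all inputs that respect "cpu implies os".
    // Case 1 is the negation of case 2, so it agrees as well.
    static boolean simplificationHolds() {
        boolean[] values = {false, true};
        for (boolean server : values) {
            for (boolean emulated : values) {
                for (boolean cpu : values) {
                    for (boolean os : values) {
                        if (cpu && !os) {
                            continue; // impossible if cpu implies os
                        }
                        if (case2Original(server, emulated, cpu, os)
                                != case2Simplified(server, emulated, cpu)) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("simplification holds: " + simplificationHolds()); // true
        // Without the implication, cpu = true and os = false would differ:
        System.out.println(case2Original(true, false, true, false)
                + " vs " + case2Simplified(true, false, true));              // false vs true
    }
}
```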

and since cases 1 and 2 are just opposites of each other, we have 4 cases:

1:          !(flavor == "server" & !emulatedClient  & cpu)
3: (!cpu) &  (flavor == "server" & !emulatedClient)
4:   cpu  & !(flavor == "server" & !emulatedClient)
5: no @requires

We could simplify further by letting P = (flavor == "server" & !emulatedClient),
which gives:

1:          !(P & cpu)
3: (!cpu) &  (P)
4:   cpu  & !(P)
5: no @requires

So if we add a compiler = C2 && (x64 | PPC) property to each of them, so
that the tests run only if the selected compiler on a given platform has
RTM support (skipping Graal, for instance), we get:

1:          !(P & cpu) & compiler
3: (!cpu) &  (P)       & compiler
4:   cpu  & !(P)       & compiler
5: no @requires        & compiler
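This also illustrates why a single vm.rtm.supported property would not be enough. Modeling it (as an assumption) as P & cpu & compiler, two configurations get the same value, yet case 3 must run on one and be skipped on the other. A hypothetical sketch with illustrative names:

```java
// Hypothetical model of the final @requires cases; names are illustrative.
final class RtmRequiresCases {

    // p stands for: vm.flavor == "server" & !vm.emulatedClient
    static boolean case1(boolean p, boolean cpu, boolean compiler) {
        return !(p && cpu) && compiler;
    }

    static boolean case3(boolean p, boolean cpu, boolean compiler) {
        return !cpu && p && compiler;
    }

    static boolean case4(boolean p, boolean cpu, boolean compiler) {
        return cpu && !p && compiler;
    }

    static boolean case5(boolean compiler) {
        return compiler;
    }

    // Single-property alternative, assumed to mean P & cpu & compiler.
    static boolean supported(boolean p, boolean cpu, boolean compiler) {
        return p && cpu && compiler;
    }

    public static void main(String[] args) {
        // A: server VM, no RTM CPU, C2 selected    -> case 3 should run
        // B: server VM, no RTM CPU, Graal selected -> case 3 should be skipped
        System.out.println("supported: " + supported(true, false, true)
                + " / " + supported(true, false, false));  // false / false
        System.out.println("case 3:    " + case3(true, false, true)
                + " / " + case3(true, false, false));      // true / false
        // With Graal selected (compiler = false) no case runs at all.
        boolean anyWithGraal = case1(true, true, false) || case3(true, false, false)
                || case4(false, true, false) || case5(false);
        System.out.println("any case with Graal: " + anyWithGraal); // false
    }
}
```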

So it looks like we would need at least 3 properties, but IMO it's not
worth adding another property P = (flavor == "server" & !emulatedClient)
just to simplify the @requires line further.

In summary, I think it's only possible to replace 'cpu & os' with 'cpu',
so I updated the webrev to remove the vm.rtm.os property, keeping only
vm.rtm.cpu and vm.rtm.compiler (plus the flavor and emulatedClient checks).

I've tested the following scenarios and observed no regression [1]:

1. X86_64 w/ RTM
2. X86_64 w/ RTM + Graal enabled
3. POWER7: no CPU+OS support for RTM
4. POWER8: CPU+OS support for RTM

But I think we need a confirmation from SAP about AIX.


Best regards,
Gustavo

[1]

** X86_64 w/ RTM **
Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
Passed: compiler/rtm/locking/TestRTMAbortRatio.java
Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
Passed: compiler/rtm/locking/TestRTMRetryCount.java
Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
Passed: compiler/rtm/locking/TestUseRTMDeopt.java
Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
Test results: passed: 30


** X86_64 w/ RTM + Graal enabled **
Test results: no tests selected (all RTM tests skipped)


** POWER7: no CPU+OS support for RTM **
Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
Test results: passed: 10


** POWER8: CPU+OS support for RTM **
Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
Passed: compiler/rtm/locking/TestRTMAbortRatio.java
Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
Passed: compiler/rtm/locking/TestRTMRetryCount.java
Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
Passed: compiler/rtm/locking/TestUseRTMDeopt.java
Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
Test results: passed: 30


> Thanks,
> Vladimir
> 
> On 8/31/18 8:38 AM, Gustavo Romero wrote:
>> Hi,
>>
>> Could the following small change be reviewed please?
>>
>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972
>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/
>>
>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal)
>> is selected on platforms that can have CPU/OS with RTM support.
>>
>> It also disables all RTM tests on any other platform that does not have a
>> single compiler supporting RTM.
>>
>> RTM support was first added to the C2 compiler, and once the RTM checkers
>> (notably vm.rtm.cpu) find the "rtm" feature advertised by the JVM, they
>> assume that a compiler supporting RTM is certainly available ("rtm" is
>> advertised only if RTM is supported by both the CPU and the OS). Later the
>> JVM began to allow the selection of a compiler other than C2, like Graal,
>> and it became possible to select a compiler without RTM support even though
>> both the CPU and the OS support RTM. Thus, for platforms supporting Graal or
>> any other specific compiler, the compiler availability for the RTM tests
>> must be adjusted, and if the selected compiler does not support RTM then
>> all RTM tests must be skipped, including the ones meant for platforms
>> without CPU or OS RTM support (e.g. *Unsupported*.java). This is because
>> in some cases, like TestUseRTMLockingOptionOnUnsupportedCPU.java, the test
>> expects JVM initialization errors that will never occur: the problem is not
>> that RTM support in the CPU or OS is missing, but rather that the selected
>> compiler does not support RTM.
>>
>> This change adds a new VM property 'vm.rtm.compiler', which can be used to
>> filter out compilers without RTM support for specific platforms and adapts
>> the current RTM tests to use that new property.
>>
>> Nothing changes regarding the number of passing/selected tests for the
>> various cpu/os/compiler combinations on platforms that currently might
>> support RTM [1], except when Graal is in use.
>>
>> Thank you.
>>
>> Best regards,
>> Gustavo
>>
>>
>> [1]
>>
>> ** X64 w/ CPU and OS supporting RTM **
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> Test results: passed: 30
>>
>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>> Test results: no tests selected (all RTM tests skipped)
>>
>> ** POWER8 w/ CPU and OS supporting RTM **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> Test results: passed: 30
>>
>> ** POWER7 wo/ CPU and OS supporting RTM **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Test results: passed: 10
>>
> 


From HORIE at jp.ibm.com  Tue Sep  4 05:32:01 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 4 Sep 2018 14:32:01 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
Message-ID: <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>


Hi Goetz, Martin, and Gustavo,


>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>a compiler change.
>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>"Technical discussion about the development of the HotSpot virtual machine
>that's not specific to any particular component"
>while hotspot-compiler-dev is for
>"Technical discussion about the development of the HotSpot bytecode
>compilers"
Understood. I will use hotspot-compiler-dev for future RFRs, thanks.


> Why do you rename vnoreg to vnoregi?
I followed the naming used for vsnoreg and vsnoregi, but the renaming
does not look necessary, so I will revert this part. Should I also rename
vsnoregi to vsnoreg?


>we noticed jtreg test failures when using this change:
>compiler/runtime/safepoints/TestRegisterRestoring.java
>compiler/runtime/Test7196199.java
>
>TestRegisterRestoring is a simple test which returns arbitrary results
>instead of 10000.
>
>We didn't see it on all machines, so it might be an issue with
>saving&restoring VR registers in the signal handler.
>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3"
>with kernel 4.4.126-94.22-default.
Thank you for letting me know about the issue. I will try to reproduce it
on a SUSE machine.


>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when
>your patch is applied. Looks like matching the vector nodes needs to be
>prevented.
Thank you for pointing out another issue. Currently I do not hit this
problem, but preventing the vector nodes from being matched makes sense to
avoid the crash. I did not prepare match rules for non-vector nodes, so in
any case it might be better to prepare them similarly to the Replicate* rules.


Gustavo, thanks for the wrap-up!


Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier,
            Goetz" <goetz.lindenmaier at sap.com>, Michihiro Horie
            <HORIE at jp.ibm.com>
Cc:	hotspot compiler <hotspot-compiler-dev at openjdk.java.net>,
            "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
Date:	2018/09/04 02:18
Subject:	RE: RFR: 8208171: PPC64: Enrich SLP support


Hi Gustavo and Michihiro,

we noticed jtreg test failures when using this change:
compiler/runtime/safepoints/TestRegisterRestoring.java
compiler/runtime/Test7196199.java

TestRegisterRestoring is a simple test which returns arbitrary results
instead of 10000.

We didn't see it on all machines, so it might be an issue with
saving&restoring VR registers in the signal handler.
The machine which I have used has "SUSE Linux Enterprise Server 12 SP3"
with kernel 4.4.126-94.22-default.

That's what I found out so far. Maybe you have an idea?

I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when
your patch is applied. Looks like matching the vector nodes needs to be
prevented.

Best regards,
Martin


-----Original Message-----
From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of
Gustavo Romero
Sent: Monday, 3 September 2018 14:57
To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie
<HORIE at jp.ibm.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>;
hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support

Hi Goetz,

On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
> Also, I can not find all of the mail traffic in
>
http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html
.
> Is this a problem of the pipermail server?
>
> For some reason this webrev lacks the links to browse the diffs.
> Do you need to use a more recent webrev?  You can obtain it with
> hg clone
http://hg.openjdk.java.net/code-tools/webrev/
 .

Yes, it was probably a problem with pipermail or some relay.
I noticed the same thing, i.e. at least one of Michi's replies reached
me but was missing from the ML.

The initial discussion is here:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html


I understand Martin reviewed the last webrev in that thread, which is
http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)

Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html

and Michi's reply to Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
taken from
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).

and your last review:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html


HTH.

Best regards,
Gustavo

> Why do you rename vnoreg to vnoregi?
>
> Besides that the change is fine, thanks for implementing this!
>
> Best regards,
>    Goetz.
>
>
>> -----Original Message-----
>> From: Doerr, Martin
>> Sent: Tuesday, 28 August 2018 19:35
>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie
>> <HORIE at jp.ibm.com>
>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
>> hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net;
>> Simonis, Volker <volker.simonis at sap.com>
>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Michihiro,
>>
>> thank you for implementing it. I have just taken a first look at your
>> webrev.01.
>>
>> It looks basically good. Only the Power version check seems to be
>> incorrect.
>> VM_Version::has_popcntb() checks for Power5.
>> I believe most instructions are available with Power7.
>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>> Power8?
>> We should check this carefully.
>>
>> Also, indentation in register_ppc.hpp could get improved.
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>> Sent: Thursday, 26 July 2018 16:02
>> To: Michihiro Horie <HORIE at jp.ibm.com>
>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
>> hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>;
>> ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Michi,
>>
>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>> I updated webrev:
>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>
>> Thanks for providing an updated webrev and for fixing indentation and
>> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>
>>
>> Best Regards,
>> Gustavo
>>
>>>
>>> Best regards,
>>> --
>>> Michihiro,
>>> IBM Research - Tokyo
>>>
>>>
>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>> To: Michihiro Horie/Japan/IBM at IBMJP,
>>>     ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com,
>>>     "Doerr, Martin" <martin.doerr at sap.com>
>>> Date: 2018/07/25 23:05
>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>>
>>>
>>>
>>>
>>> Hi Michi,
>>>
>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>   > Dear all,
>>>   >
>>>   > Would you review the following change?
>>>   > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>>   > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>>   >
>>>   > This change adds support for vectorized arithmetic calculation with
>>>   > SLP.
>>>   >
>>>   > The to_vr function is added to convert VSR to VR. Currently, vecX is
>>>   > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>>>   > which are exactly overlapped with VRs. Instruction APIs receiving VRs
>>>   > use the to_vr via vecX. Another thing is the change in sqrtF_reg to
>>>   > enable the matching with SqrtVF. I think the change in sqrtF_reg would
>>>   > be fine due to the ConvD2FNode::Value in convertnode.cpp.
>>>
>>> Looks good. Just a few comments:
>>>
>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of
>>>     vmaddfp in order to avoid the splat?
>>>
>>> - All instructions added by your change were introduced in ISA 2.06,
>>>     so POWER7 and above are OK. However, since I see probes for
>>>     PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), I'm
>>>     wondering if there is any control point to guarantee that these
>>>     instructions won't be emitted on a CPU that does not support them.
>>>
>>> - I think that in general the strings in format %{} are in upper case.
>>>     For instance, this is the current optoassembly output for vmul4F:
>>>
>>> 5b4     ADDI    R24, R24, #64
>>> 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>>> 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>>>
>>>     I think it would be better in upper case instead. I also think that
>>>     if the node match emits more than one instruction, all instructions
>>>     must be listed in format %{}, since it's meant for detailed debugging.
>>>     Finally, I think it would be better to replace \t! by \t// in that
>>>     string (unless I'm missing any special meaning for that char). So for
>>>     vmul4F it would be something like:
>>>
>>> 5b4     ADDI      R24, R24, #64
>>>         VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>>> 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>>> 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>>>
>>>
>>> But feel free to change anything just after you get additional
>>> reviews :)
>>>
>>>
>>>   > I confirmed this change with JTREG. In addition, I used attached
>>>   > micro benchmarks.
>>>   > /(See attached file: slp_microbench.zip)/
>>>
>>> Thanks for sharing it.
>>> Btw, another option to host it would be in the CR
>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>   >
>>>   > Best regards,
>>>   > --
>>>   > Michihiro,
>>>   > IBM Research - Tokyo
>>>   >
>>>
>>>
>>>
>



From goetz.lindenmaier at sap.com  Tue Sep  4 06:12:19 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 4 Sep 2018 06:12:19 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
Message-ID: <24ecdf108c9c4d77a3a673ecaaca3ab3@sap.com>

> > Why do you rename vnoreg to vnoregi?
> I followed the way of coding for vsnoreg and vsnoregi, but the renaming
> does not look necessary. I would get this part back. Should I also rename
> vsnoregi to vsnoreg?
I think it would be more consistent, but it's not that important :)

Best regards,
  Goetz.


> 
> 
> >we noticed jtreg test failures when using this change:
> >compiler/runtime/safepoints/TestRegisterRestoring.java
> >compiler/runtime/Test7196199.java
> >
> >TestRegisterRestoring is a simple test which returns arbitrary results
> >instead of 10000.
> >
> >We didn't see it on all machines, so it might be an issue with
> >saving&restoring VR registers in the signal handler.
> >The machine which I have used has "SUSE Linux Enterprise Server 12 SP3"
> >with kernel 4.4.126-94.22-default.
> Thank you for letting me know the issue, I will try to reproduce this on a SUSE
> machine.
> 
> 
> >I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when
> >your patch is applied. Looks like matching the vector nodes needs to be
> >prevented.
> Thank you for pointing out another issue. Currently I do not hit this problem,
> but preventing to match the vector nodes makes sense to avoid the crash. I
> did not prepare match rules for non-vector nodes, so it might be better to
> prepare them similarly like the Replicate* rules, in any case.
> 
> 
> Gustavo, thanks for the wrap-up!
> 
> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
> 
> 
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz"
> <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>,
> "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
> Date: 2018/09/04 02:18
> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
> 
> 
> Hi Gustavo and Michihiro,
> 
> we noticed jtreg test failures when using this change:
> compiler/runtime/safepoints/TestRegisterRestoring.java
> compiler/runtime/Test7196199.java
> 
> TestRegisterRestoring is a simple test which returns arbitrary results instead
> of 10000.
> 
> We didn't see it on all machines, so it might be an issue with saving&restoring
> VR registers in the signal handler.
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3"
> with kernel 4.4.126-94.22-default.
> 
> That's what I found out so far. Maybe you have an idea?
> 
> I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when
> your patch is applied. Looks like matching the vector nodes needs to be
> prevented.
> 
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of
> Gustavo Romero
> Sent: Monday, 3 September 2018 14:57
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie
> <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>;
> hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Goetz,
> 
> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
> > Also, I can not find all of the mail traffic in
> > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
> > Is this a problem of the pipermail server?
> >
> > For some reason this webrev lacks the links to browse the diffs.
> > Do you need to use a more recent webrev?  You can obtain it with
> > hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
> 
> Yes, probably it was a problem of the pipermail or in some relay.
> I noted the same thing, i.e. at least one Michi reply arrived
> to me but missed a ML.
> 
> The initial discussion is here:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
> 
> I understand Martin reviewed the last webrev in that thread, which is
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
> 
> Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
> 
> and Michi's reply to Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
> 
> and your last review:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
> 
> 
> HTH.
> 
> Best regards,
> Gustavo
> 
> > Why do you rename vnoreg to vnoregi?
> >
> > Besides that the change is fine, thanks for implementing this!
> >
> > Best regards,
> >    Goetz.
> >
> >
> >> -----Original Message-----
> >> From: Doerr, Martin
> >> Sent: Tuesday, 28 August 2018 19:35
> >> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie
> >> <HORIE at jp.ibm.com>
> >> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >> hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net;
> >> Simonis, Volker <volker.simonis at sap.com>
> >> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
> >>
> >> Hi Michihiro,
> >>
> >> thank you for implementing it. I have just taken a first look at your
> >> webrev.01.
> >>
> >> It looks basically good. Only the Power version check seems to be
> >> incorrect.
> >> VM_Version::has_popcntb() checks for Power5.
> >> I believe most instructions are available with Power7.
> >> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
> >> Power8?
> >> We should check this carefully.
> >>
> >> Also, indentation in register_ppc.hpp could get improved.
> >>
> >> Thanks and best regards,
> >> Martin
> >>
> >>
> >> -----Original Message-----
> >> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> >> Sent: Thursday, 26 July 2018 16:02
> >> To: Michihiro Horie <HORIE at jp.ibm.com>
> >> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >> hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>;
> >> ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
> >> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >>
> >> Hi Michi,
> >>
> >> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
> >>> I updated webrev:
> >>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
> >>
> >> Thanks for providing an updated webrev and for fixing indentation and
> >> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
> >>
> >>
> >> Best Regards,
> >> Gustavo
> >>
> >>>
> >>> Best regards,
> >>> --
> >>> Michihiro,
> >>> IBM Research - Tokyo
> >>>
> >>>
> >>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> >>> To: Michihiro Horie/Japan/IBM at IBMJP,
> >>>     ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> >>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com,
> >>>     "Doerr, Martin" <martin.doerr at sap.com>
> >>> Date: 2018/07/25 23:05
> >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >>>
> >>>
> >>>
> >>>
> >>> Hi Michi,
> >>>
> >>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
> >>>   > Dear all,
> >>>   >
> >>>   > Would you review the following change?
> >>>   > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
> >>>   > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
> >>>   >
> >>>   > This change adds support for vectorized arithmetic calculation with SLP.
> >>>   >
> >>>   > The to_vr function is added to convert VSR to VR. Currently, vecX is
> >>>   > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
> >>>   > which exactly overlap the VRs. Instruction APIs receiving VRs use
> >>>   > to_vr via vecX. Another change is in sqrtF_reg, to enable matching
> >>>   > with SqrtVF. I think the change in sqrtF_reg is fine because of
> >>>   > ConvD2FNode::Value in convertnode.cpp.
> >>>
> >>> Looks good. Just a few comments:
> >>>
> >>> - In vmul4F_reg(), would it be reasonable to use xvmulsp instead of
> >>>     vmaddfp in order to avoid the splat?
> >>>
> >>> - Although all instructions added by your change were introduced in ISA
> >>>     2.06, so POWER7 and above are OK, I see probes for
> >>>     PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), so I'm
> >>>     wondering if there is any control point to guarantee that these
> >>>     instructions won't be emitted on a CPU that does not support them.
> >>>
> >>> - I think that strings in format %{} are generally in upper case. For
> >>>     instance, this is the current OptoAssembly output for vmul4F:
> >>>
> >>> 2941835 5b4     ADDI    R24, R24, #64
> >>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
> >>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
> >>>
> >>>     I think upper case would be better here. I also think that if the
> >>>     node match emits more than one instruction, all instructions must be
> >>>     listed in format %{}, since it's meant for detailed debugging. Finally,
> >>>     I think it would be better to replace \t! by \t// in that string
> >>>     (unless I'm missing some special meaning of that char). So for vmul4F
> >>>     it would be something like:
> >>>
> >>> 2941835 5b4     ADDI      R24, R24, #64
> >>>                   VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
> >>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
> >>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
> >>>
> >>>
> >>> But feel free to change anything just after you get additional reviews :)
> >>>
> >>>
> >>>   > I confirmed this change with JTREG. In addition, I used the attached
> >>>   > micro benchmarks.
> >>>   > /(See attached file: slp_microbench.zip)/
> >>>
> >>> Thanks for sharing it.
> >>> Btw, another option to host it would be in the CR
> >>> server, in http://cr.openjdk.java.net/~mhorie/8208171
> >>>
> >>>
> >>> Best regards,
> >>> Gustavo
> >>>
> >>>   >
> >>>   > Best regards,
> >>>   > --
> >>>   > Michihiro,
> >>>   > IBM Research - Tokyo
> >>>   >
> >>>
> >>>
> >>>
> >
> 
> 
> 
> 
> 

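The to_vr conversion discussed in this thread relies on how the VSX register file introduced with Power ISA 2.06 (POWER7) overlays the older registers: VSR0-31 contain the FPRs, and VSR32-63 are the same physical registers as VR0-31. That is why a vecX class limited to VSR32-51 can be handed to instruction APIs that take VRs. The following is a minimal, hypothetical C++ sketch of that numbering arithmetic only; `vsr_to_vr` and the constants are illustrative names, not HotSpot's `to_vr()` implementation.

```cpp
#include <cassert>

// Hypothetical sketch (not the webrev's code): VSR32-63 alias VR0-31,
// so mapping a VSR number in that upper half to its VR number is a
// subtraction of 32. VSR0-31 alias the FPRs and have no VR equivalent.
constexpr int kFirstVsrAliasingVr = 32;
constexpr int kNumVsrs = 64;

inline int vsr_to_vr(int vsr) {
  assert(vsr >= kFirstVsrAliasingVr && vsr < kNumVsrs &&
         "only VSR32-63 alias a VR");
  return vsr - kFirstVsrAliasingVr;
}
```

With this mapping, the VSR32-51 range used by vs_reg corresponds to VR0-19.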

From HORIE at jp.ibm.com  Tue Sep  4 07:36:06 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 4 Sep 2018 16:36:06 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <24ecdf108c9c4d77a3a673ecaaca3ab3@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
 <24ecdf108c9c4d77a3a673ecaaca3ab3@sap.com>
Message-ID: <OF7536515C.74760CB6-ON002582FE.0024AE12-492582FE.0029C214@notes.na.collabserv.com>


Hi Goetz,

>I think it would be more consistent, but it's not that important :)
Thank you for your comments. Then I will first try to resolve the crash issues.


Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	"Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
To:	Michihiro Horie <HORIE at jp.ibm.com>, "Doerr, Martin"
            <martin.doerr at sap.com>
Cc:	Gustavo Romero <gromero at linux.vnet.ibm.com>, hotspot compiler
            <hotspot-compiler-dev at openjdk.java.net>,
            "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
Date:	2018/09/04 15:12
Subject:	RE: RFR: 8208171: PPC64: Enrich SLP support


> > Why do you rename vnoreg to vnoregi?
> I followed the naming used for vsnoreg and vsnoregi, but the renaming
> does not look necessary. I will revert this part. Should I also rename
> vsnoregi to vsnoreg?
I think it would be more consistent, but it's not that important :)

Best regards,
  Goetz.


>
>
> >we noticed jtreg test failures when using this change:
> >compiler/runtime/safepoints/TestRegisterRestoring.java
> >compiler/runtime/Test7196199.java
> >
> >TestRegisterRestoring is a simple test which returns arbitrary results
> >instead of 10000.
> >
> >We didn't see it on all machines, so it might be an issue with
> >saving&restoring VR registers in the signal handler.
> >The machine which I have used has "SUSE Linux Enterprise Server 12 SP3"
> >with kernel 4.4.126-94.22-default.
> Thank you for letting me know about the issue. I will try to reproduce it
> on a SUSE machine.
>
>
> >I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when
> >your patch is applied. Looks like matching the vector nodes needs to be
> >prevented.
> Thank you for pointing out another issue. Currently I do not hit this
> problem, but preventing the vector nodes from being matched makes sense to
> avoid the crash. I did not prepare match rules for non-vector nodes, so it
> might be better to prepare them in the same way as the Replicate* rules in
> any case.
>
>
> Gustavo, thanks for the wrap-up!
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
>
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz"
> <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>,
> "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
> Date: 2018/09/04 02:18
> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>
> ________________________________
>
>
>
>
> Hi Gustavo and Michihiro,
>
> we noticed jtreg test failures when using this change:
> compiler/runtime/safepoints/TestRegisterRestoring.java
> compiler/runtime/Test7196199.java
>
> TestRegisterRestoring is a simple test which returns arbitrary results
> instead of 10000.
>
> We didn't see it on all machines, so it might be an issue with
> saving&restoring VR registers in the signal handler.
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3"
> with kernel 4.4.126-94.22-default.
>
> That's what I found out so far. Maybe you have an idea?
>
> I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when
> your patch is applied. Looks like matching the vector nodes needs to be
> prevented.
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of
> Gustavo Romero
> Sent: Montag, 3. September 2018 14:57
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie
> <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>;
> hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
> Hi Goetz,
>
> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
> > Also, I can not find all of the mail traffic in
> > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
> > Is this a problem of the pipermail server?
> >
> > For some reason this webrev lacks the links to browse the diffs.
> > Do you need to use a more recent webrev?  You can obtain it with
> > hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
>
> Yes, it was probably a problem with pipermail or some relay.
> I noticed the same thing: at least one of Michi's replies reached
> me but did not reach the mailing list.
>
> The initial discussion is here:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
>
> I understand Martin reviewed the last webrev in that thread, which is
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
>
> Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
>
> and Michi's reply to Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html
> (with webrev.02, taken from
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
>
> and your last review:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
>
>
> HTH.
>
> Best regards,
> Gustavo
>
> > Why do you rename vnoreg to vnoregi?
> >
> > Besides that the change is fine, thanks for implementing this!
> >
> > Best regards,
> >    Goetz.
> >
> >
> >> -----Original Message-----
> >> From: Doerr, Martin
> >> Sent: Dienstag, 28. August 2018 19:35
> >> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie
> >> <HORIE at jp.ibm.com>
> >> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >> hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net;
> >> Simonis, Volker <volker.simonis at sap.com>
> >> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
> >>
> >> Hi Michihiro,
> >>
> >> thank you for implementing it. I have just taken a first look at your
> >> webrev.01.
> >>
> >> It looks basically good. Only the Power version check seems to be
> >> incorrect.
> >> VM_Version::has_popcntb() checks for Power5.
> >> I believe most instructions are available with Power7.
> >> Some (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
> >> Power8?
> >> We should check this carefully.
> >>
> >> Also, indentation in register_ppc.hpp could get improved.
> >>
> >> Thanks and best regards,
> >> Martin
> >>
> >>
> >> -----Original Message-----
> >> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> >> Sent: Donnerstag, 26. Juli 2018 16:02
> >> To: Michihiro Horie <HORIE at jp.ibm.com>
> >> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >> hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>;
> >> ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
> >> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >>
> >> Hi Michi,
> >>
> >> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
> >>> I updated webrev:
> >>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
> >>
> >> Thanks for providing an updated webrev and for fixing indentation and
> >> function order in assembler_ppc.inline.hpp as well. I have no further
> >> comments :)
> >>
> >>
> >> Best Regards,
> >> Gustavo
> >>
> >>>
> >>> Best regards,
> >>> --
> >>> Michihiro,
> >>> IBM Research - Tokyo
> >>>
> >>>
> >>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> >>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net,
> >>> hotspot-dev at openjdk.java.net
> >>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
> >>> <martin.doerr at sap.com>
> >>> Date: 2018/07/25 23:05
> >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Hi Michi,
> >>>
> >>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
> >>>   > Dear all,
> >>>   >
> >>>   > Would you review the following change?
> >>>   > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
> >>>   > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
> >>>   >
> >>>   > This change adds support for vectorized arithmetic calculation with SLP.
> >>>   >
> >>>   > The to_vr function is added to convert VSR to VR. Currently, vecX is
> >>>   > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
> >>>   > which exactly overlap the VRs. Instruction APIs receiving VRs use
> >>>   > to_vr via vecX. Another change is in sqrtF_reg, to enable matching
> >>>   > with SqrtVF. I think the change in sqrtF_reg is fine because of
> >>>   > ConvD2FNode::Value in convertnode.cpp.
> >>>
> >>> Looks good. Just a few comments:
> >>>
> >>> - In vmul4F_reg(), would it be reasonable to use xvmulsp instead of
> >>>     vmaddfp in order to avoid the splat?
> >>>
> >>> - Although all instructions added by your change were introduced in ISA
> >>>     2.06, so POWER7 and above are OK, I see probes for
> >>>     PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), so I'm
> >>>     wondering if there is any control point to guarantee that these
> >>>     instructions won't be emitted on a CPU that does not support them.
> >>>
> >>> - I think that strings in format %{} are generally in upper case. For
> >>>     instance, this is the current OptoAssembly output for vmul4F:
> >>>
> >>> 2941835 5b4     ADDI    R24, R24, #64
> >>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
> >>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
> >>>
> >>>     I think upper case would be better here. I also think that if the
> >>>     node match emits more than one instruction, all instructions must be
> >>>     listed in format %{}, since it's meant for detailed debugging. Finally,
> >>>     I think it would be better to replace \t! by \t// in that string
> >>>     (unless I'm missing some special meaning of that char). So for vmul4F
> >>>     it would be something like:
> >>>
> >>> 2941835 5b4     ADDI      R24, R24, #64
> >>>                   VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
> >>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
> >>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
> >>>
> >>>
> >>> But feel free to change anything just after you get additional reviews :)
> >>>
> >>>
> >>>   > I confirmed this change with JTREG. In addition, I used the attached
> >>>   > micro benchmarks.
> >>>   > /(See attached file: slp_microbench.zip)/
> >>>
> >>> Thanks for sharing it.
> >>> Btw, another option to host it would be in the CR
> >>> server, in http://cr.openjdk.java.net/~mhorie/8208171
> >>>
> >>>
> >>> Best regards,
> >>> Gustavo
> >>>
> >>>   >
> >>>   > Best regards,
> >>>   > --
> >>>   > Michihiro,
> >>>   > IBM Research - Tokyo
> >>>   >
> >>>
> >>>
> >>>
> >
>
>
>
>
>


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180904/45ce8c2e/attachment-0001.html>

From lutz.schmidt at sap.com  Tue Sep  4 08:29:09 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 4 Sep 2018 08:29:09 +0000
Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp
 standard
Message-ID: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com>

Dear All,

May I please request reviews for this small, s390-only patch? It fixes some shift operations that relied on behavior not covered by the C++ language standard.
Bug:    https://bugs.openjdk.java.net/browse/JDK-8210319
Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/
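The webrev itself is not reproduced in this archive, so the following is only a generic C++ sketch of the kinds of shifts the standard leaves undefined, together with well-defined alternatives. The helper names `shl_signed` and `low_bits_mask` are hypothetical and are not taken from the 8210319 patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not taken from the 8210319 webrev.
//
// Before C++20, left-shifting a negative signed value, or shifting a 1 into
// or past the sign bit, is undefined behavior. Shifting by a count that is
// >= the bit width of the (promoted) left operand is undefined in every
// standard version. Shifting the unsigned representation is always defined:
// bits shifted out are simply discarded.
inline int64_t shl_signed(int64_t x, unsigned count) {
  assert(count < 64 && "shift count must be smaller than the operand width");
  // Signed-to-unsigned conversion is defined (modulo 2^64), and so is the shift.
  uint64_t u = static_cast<uint64_t>(x) << count;
  // Converting an out-of-range value back to int64_t is implementation-defined
  // before C++20 (two's complement wrap on mainstream compilers), defined in C++20.
  return static_cast<int64_t>(u);
}

// Mask of the n lowest bits. The naive (1u << n) - 1u is undefined for n == 32,
// so the full-width case is handled explicitly.
inline uint32_t low_bits_mask(unsigned n) {
  assert(n <= 32);
  return n == 32 ? 0xFFFFFFFFu : (1u << n) - 1u;
}
```

For example, `shl_signed(-1, 4)` yields -16 without ever left-shifting a negative signed operand. Doing the shift on unsigned types is one common way to avoid this class of problem; the thread does not say which approach the s390 patch takes.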

Thank you!
Lutz

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180904/071e18ae/attachment.html>

From martin.doerr at sap.com  Tue Sep  4 09:28:06 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 4 Sep 2018 09:28:06 +0000
Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by
 cpp standard
In-Reply-To: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com>
References: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com>
Message-ID: <bd3b71bbce404b7eadcf808c4ec68949@sap.com>

Hi Lutz,

looks good. Thanks for improving.

Best regards,
Martin


From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz
Sent: Dienstag, 4. September 2018 10:29
To: hotspot-compiler-dev at openjdk.java.net
Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard

Dear All,

May I please request reviews for this small, s390-only patch? It fixes some shift operations that relied on behavior not covered by the C++ language standard.
Bug:    https://bugs.openjdk.java.net/browse/JDK-8210319
Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/

Thank you!
Lutz

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180904/d1731be9/attachment.html>

From shade at redhat.com  Tue Sep  4 10:28:55 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 4 Sep 2018 12:28:55 +0200
Subject: RFR (XS) 8210355: Minimal and Zero non-PCH builds fail after
 JDK-8207343 (Automate vtable/itable stub size calculation)
Message-ID: <f8f15a73-98f6-1f08-f0b8-396b7d6d3f8f@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8210355

Fix:

diff -r 3ee917225506 src/hotspot/share/code/vtableStubs.cpp
--- a/src/hotspot/share/code/vtableStubs.cpp	Tue Sep 04 14:47:55 2018 +0800
+++ b/src/hotspot/share/code/vtableStubs.cpp	Tue Sep 04 12:23:23 2018 +0200
@@ -26,6 +26,7 @@
 #include "code/vtableStubs.hpp"
 #include "compiler/compileBroker.hpp"
 #include "compiler/disassembler.hpp"
+#include "logging/log.hpp"
 #include "memory/allocation.inline.hpp"
 #include "memory/resourceArea.hpp"
 #include "oops/instanceKlass.hpp"

Seems like it is transitively included from somewhere (compiler?) in most configurations, but the
build breaks for Minimal and Zero non-PCH builds, which are configured without C1/C2. Zero is still
broken by something else.

Testing: Linux x86_64 minimal builds

Thanks,
-Aleksey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180904/d3b08606/signature.asc>

From erik.osterlund at oracle.com  Tue Sep  4 10:32:43 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 4 Sep 2018 12:32:43 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <d1a91628-4cef-5a02-6b78-825cc096e79e@oracle.com>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <dk6h8jc2j7w.fsf@rwestrel.remote.csb>
 <d1a91628-4cef-5a02-6b78-825cc096e79e@oracle.com>
Message-ID: <5B8E5F4B.5060707@oracle.com>

Hi,

Any more takers?

Full:
http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/

Thanks,
/Erik

On 2018-08-30 17:06, Erik Österlund wrote:
> Hi Roland,
>
> Thank you for the review.
>
> On 2018-08-30 13:21, Roland Westrelin wrote:
>> Hi Erik,
>>
>>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00
>> make_load() already calls _gvn.transform(), right?
>
> Yes, you are right. I will remove the _gvn.transform call on the
> access_load; it is indeed redundant.
>
>> You don't set MO_UNORDERED. Why is it not required?
>
> MO_UNORDERED is the default MO of loads and stores. It is set up in
> the C2Access object using fixup_decorators(), which sets sane defaults
> for various decorators, including MO.
>
> Thanks,
> /Erik
>
>> Roland.
>
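Erik's answer describes a defaulting rule: when no memory-ordering (MO) decorator is given, fixup_decorators() fills in MO_UNORDERED. A minimal C++ sketch of that kind of rule over a bit set follows. The bit values and the `fixup_memory_ordering` function are hypothetical stand-ins that only borrow HotSpot's MO_* names; they are not HotSpot's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decorator bits; names mimic HotSpot's MO_* decorators,
// but the values are made up for this sketch.
using DecoratorSet = uint64_t;
constexpr DecoratorSet MO_UNORDERED = DecoratorSet{1} << 0;
constexpr DecoratorSet MO_VOLATILE  = DecoratorSet{1} << 1;
constexpr DecoratorSet MO_ACQUIRE   = DecoratorSet{1} << 2;
constexpr DecoratorSet MO_RELEASE   = DecoratorSet{1} << 3;
constexpr DecoratorSet MO_SEQ_CST   = DecoratorSet{1} << 4;
constexpr DecoratorSet MO_DECORATOR_MASK =
    MO_UNORDERED | MO_VOLATILE | MO_ACQUIRE | MO_RELEASE | MO_SEQ_CST;
constexpr DecoratorSet IN_HEAP = DecoratorSet{1} << 8;  // an unrelated decorator

// If the caller picked no memory ordering, default to unordered;
// otherwise keep the caller's explicit choice untouched.
constexpr DecoratorSet fixup_memory_ordering(DecoratorSet d) {
  return (d & MO_DECORATOR_MASK) == 0 ? (d | MO_UNORDERED) : d;
}
```

Under this rule, a plain load built with only IN_HEAP ends up as IN_HEAP | MO_UNORDERED, which is why callers do not need to pass MO_UNORDERED explicitly.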


From shade at redhat.com  Tue Sep  4 10:33:03 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 4 Sep 2018 12:33:03 +0200
Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate
 vtable/itable stub size calculation)
Message-ID: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8210357

Seems like VtableStub::pd_code_size_limit is gone, and should be purged from Zero too.

Fix:

diff -r bc76fd44b029 src/hotspot/cpu/zero/vtableStubs_zero.cpp
--- a/src/hotspot/cpu/zero/vtableStubs_zero.cpp	Tue Sep 04 12:28:12 2018 +0200
+++ b/src/hotspot/cpu/zero/vtableStubs_zero.cpp	Tue Sep 04 12:30:21 2018 +0200
@@ -37,11 +37,6 @@
   return NULL;
 }

-int VtableStub::pd_code_size_limit(bool is_vtable_stub) {
-  ShouldNotCallThis();
-  return 0;
-}
-
 int VtableStub::pd_code_alignment() {
   ShouldNotCallThis();
   return 0;


Testing: Linux x86_64 zero build

Thanks,
-Aleksey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180904/d32e5f76/signature-0001.asc>

From rickard.backman at oracle.com  Tue Sep  4 10:36:22 2018
From: rickard.backman at oracle.com (Rickard Bäckman)
Date: Tue, 4 Sep 2018 12:36:22 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
Message-ID: <20180904103622.sijpgfiltco4mxd2@rbackman>

Looks good.

/R

On 08/30, Erik Österlund wrote:
> Hi,
> 
> The JFR getEventWriter() intrinsics have code in C1 and C2 that manually
> resolves jobjects. This should go through the Access API to make sure the
> necessary GC barriers are inserted.
> 
> I noticed this in an attempt to move JNI handle processing out of the pause
> (among other things). It crashed in kitchensink.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8210158
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00
> 
> I tested the patch by running it, together with a patch that moves JNI
> handle processing outside of the pause, through hs-tier1-3, and also
> through Kitchensink24H (since it originally crashed in kitchensink).
> 
> Thanks,
> /Erik

From erik.osterlund at oracle.com  Tue Sep  4 10:40:11 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 4 Sep 2018 12:40:11 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <20180904103622.sijpgfiltco4mxd2@rbackman>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <20180904103622.sijpgfiltco4mxd2@rbackman>
Message-ID: <5B8E610B.70909@oracle.com>

Hi Rickard,

Thank you for the review.

/Erik

On 2018-09-04 12:36, Rickard Bäckman wrote:
> Looks good.
>
> /R
>
> On 08/30, Erik Österlund wrote:
>> Hi,
>>
>> The JFR getEventWriter() intrinsics have code in C1 and C2 that manually
>> resolves jobjects. This should go through the Access API to make sure the
>> necessary GC barriers are inserted.
>>
>> I noticed this in an attempt to move JNI handle processing out of the pause
>> (among other things). It crashed in kitchensink.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8210158
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00
>>
>> I tested the patch by running it, together with a patch that moves JNI
>> handle processing outside of the pause, through hs-tier1-3, and also
>> through Kitchensink24H (since it originally crashed in kitchensink).
>>
>> Thanks,
>> /Erik


From tobias.hartmann at oracle.com  Tue Sep  4 11:09:07 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 4 Sep 2018 13:09:07 +0200
Subject: RFR (XS) 8210355: Minimal and Zero non-PCH builds fail after
 JDK-8207343 (Automate vtable/itable stub size calculation)
In-Reply-To: <f8f15a73-98f6-1f08-f0b8-396b7d6d3f8f@redhat.com>
References: <f8f15a73-98f6-1f08-f0b8-396b7d6d3f8f@redhat.com>
Message-ID: <097fa86d-7e07-d125-5b54-2b91ebb75fdf@oracle.com>

Hi Aleksey,

looks good to me and can be considered trivial.

Best regards,
Tobias

On 04.09.2018 12:28, Aleksey Shipilev wrote:
> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8210355
> 
> Fix:
> 
> diff -r 3ee917225506 src/hotspot/share/code/vtableStubs.cpp
> --- a/src/hotspot/share/code/vtableStubs.cpp	Tue Sep 04 14:47:55 2018 +0800
> +++ b/src/hotspot/share/code/vtableStubs.cpp	Tue Sep 04 12:23:23 2018 +0200
> @@ -26,6 +26,7 @@
>  #include "code/vtableStubs.hpp"
>  #include "compiler/compileBroker.hpp"
>  #include "compiler/disassembler.hpp"
> +#include "logging/log.hpp"
>  #include "memory/allocation.inline.hpp"
>  #include "memory/resourceArea.hpp"
>  #include "oops/instanceKlass.hpp"
> 
> Seems like it is transitively included from somewhere (compiler?) in most configurations, but the
> build breaks for Minimal and Zero non-PCH builds, which are configured without C1/C2. Zero is still
> broken by something else.
> 
> Testing: Linux x86_64 minimal builds
> 
> Thanks,
> -Aleksey
> 

From tobias.hartmann at oracle.com  Tue Sep  4 11:09:22 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 4 Sep 2018 13:09:22 +0200
Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate
 vtable/itable stub size calculation)
In-Reply-To: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com>
References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com>
Message-ID: <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com>

Hi Aleksey,

looks good to me and can be considered trivial.

Best regards,
Tobias


On 04.09.2018 12:33, Aleksey Shipilev wrote:
> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8210357
> 
> Seems like VtableStub::pd_code_size_limit is gone, and should be purged from Zero too.
> 
> Fix:
> 
> diff -r bc76fd44b029 src/hotspot/cpu/zero/vtableStubs_zero.cpp
> --- a/src/hotspot/cpu/zero/vtableStubs_zero.cpp	Tue Sep 04 12:28:12 2018 +0200
> +++ b/src/hotspot/cpu/zero/vtableStubs_zero.cpp	Tue Sep 04 12:30:21 2018 +0200
> @@ -37,11 +37,6 @@
>    return NULL;
>  }
> 
> -int VtableStub::pd_code_size_limit(bool is_vtable_stub) {
> -  ShouldNotCallThis();
> -  return 0;
> -}
> -
>  int VtableStub::pd_code_alignment() {
>    ShouldNotCallThis();
>    return 0;
> 
> 
> Testing: Linux x86_64 zero build
> 
> Thanks,
> -Aleksey
> 

From shade at redhat.com  Tue Sep  4 11:21:55 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 4 Sep 2018 13:21:55 +0200
Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate
 vtable/itable stub size calculation)
In-Reply-To: <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com>
References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com>
 <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com>
Message-ID: <0d67f0f0-ffe0-a6d4-5d49-896eb2621a0f@redhat.com>

Thanks, pushed.

-Aleksey

On 09/04/2018 01:09 PM, Tobias Hartmann wrote:
> Hi Aleksey,
> 
> looks good to me and can be considered trivial.
> 
> Best regards,
> Tobias
> 
> 
> On 04.09.2018 12:33, Aleksey Shipilev wrote:
>> Bug:
>>   https://bugs.openjdk.java.net/browse/JDK-8210357
>>
>> Seems like VtableStub::pd_code_size_limit is gone, and should be purged from Zero too.
>>
>> Fix:
>>
>> diff -r bc76fd44b029 src/hotspot/cpu/zero/vtableStubs_zero.cpp
>> --- a/src/hotspot/cpu/zero/vtableStubs_zero.cpp	Tue Sep 04 12:28:12 2018 +0200
>> +++ b/src/hotspot/cpu/zero/vtableStubs_zero.cpp	Tue Sep 04 12:30:21 2018 +0200
>> @@ -37,11 +37,6 @@
>>    return NULL;
>>  }
>>
>> -int VtableStub::pd_code_size_limit(bool is_vtable_stub) {
>> -  ShouldNotCallThis();
>> -  return 0;
>> -}
>> -
>>  int VtableStub::pd_code_alignment() {
>>    ShouldNotCallThis();
>>    return 0;
>>
>>
>> Testing: Linux x86_64 zero build
>>
>> Thanks,
>> -Aleksey
>>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180904/b3c8eae4/signature.asc>

From shade at redhat.com  Tue Sep  4 11:22:09 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 4 Sep 2018 13:22:09 +0200
Subject: RFR (XS) 8210355: Minimal and Zero non-PCH builds fail after
 JDK-8207343 (Automate vtable/itable stub size calculation)
In-Reply-To: <097fa86d-7e07-d125-5b54-2b91ebb75fdf@oracle.com>
References: <f8f15a73-98f6-1f08-f0b8-396b7d6d3f8f@redhat.com>
 <097fa86d-7e07-d125-5b54-2b91ebb75fdf@oracle.com>
Message-ID: <d72a1189-d893-34f7-6449-f047e3be2be0@redhat.com>

Thanks, pushed.

-Aleksey


On 09/04/2018 01:09 PM, Tobias Hartmann wrote:
> Hi Aleksey,
> 
> looks good to me and can be considered trivial.
> 
> Best regards,
> Tobias
> 
> On 04.09.2018 12:28, Aleksey Shipilev wrote:
>> Bug:
>>   https://bugs.openjdk.java.net/browse/JDK-8210355
>>
>> Fix:
>>
>> diff -r 3ee917225506 src/hotspot/share/code/vtableStubs.cpp
>> --- a/src/hotspot/share/code/vtableStubs.cpp	Tue Sep 04 14:47:55 2018 +0800
>> +++ b/src/hotspot/share/code/vtableStubs.cpp	Tue Sep 04 12:23:23 2018 +0200
>> @@ -26,6 +26,7 @@
>>  #include "code/vtableStubs.hpp"
>>  #include "compiler/compileBroker.hpp"
>>  #include "compiler/disassembler.hpp"
>> +#include "logging/log.hpp"
>>  #include "memory/allocation.inline.hpp"
>>  #include "memory/resourceArea.hpp"
>>  #include "oops/instanceKlass.hpp"
>>
>> Seems like it is transitively included from somewhere (compiler?) in most configurations, but it
>> breaks for Minimal and Zero non-PCH builds, which are configured without C1/C2. Zero is still broken
>> by another issue.
>>
>> Testing: Linux x86_64 minimal builds
>>
>> Thanks,
>> -Aleksey
>>



From sgehwolf at redhat.com  Tue Sep  4 11:44:39 2018
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Tue, 04 Sep 2018 13:44:39 +0200
Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate
 vtable/itable stub size calculation)
In-Reply-To: <0d67f0f0-ffe0-a6d4-5d49-896eb2621a0f@redhat.com>
References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com>
 <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com>
 <0d67f0f0-ffe0-a6d4-5d49-896eb2621a0f@redhat.com>
Message-ID: <cfce3af80b5cf1aad399f18b67fd864caccf951c.camel@redhat.com>

On Tue, 2018-09-04 at 13:21 +0200, Aleksey Shipilev wrote:
> Thanks, pushed.

Thanks for the Zero build fixes, Aleksey!

Cheers,
Severin


From lutz.schmidt at sap.com  Tue Sep  4 12:58:50 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 4 Sep 2018 12:58:50 +0000
Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by
 cpp standard
In-Reply-To: <bd3b71bbce404b7eadcf808c4ec68949@sap.com>
References: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com>
 <bd3b71bbce404b7eadcf808c4ec68949@sap.com>
Message-ID: <B42550A0-F7E7-42B6-9FB7-88A73EB775FB@sap.com>

Hi Martin,
thanks for the review!
Regards,
Lutz

From: "Doerr, Martin (martin.doerr at sap.com)" <martin.doerr at sap.com>
Date: Tuesday, 4. September 2018 at 11:28
To: Lutz Schmidt <lutz.schmidt at sap.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
Subject: RE: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard

Hi Lutz,

looks good. Thanks for improving.

Best regards,
Martin


From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz
Sent: Dienstag, 4. September 2018 10:29
To: hotspot-compiler-dev at openjdk.java.net
Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard

Dear All,

may I please request reviews for this small, s390-only patch. It fixes some shift operations which relied on behavior not covered by the language standard.
Bug:    https://bugs.openjdk.java.net/browse/JDK-8210319
Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/
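
The webrev is not reproduced in this thread, so the exact s390 changes are not known here. As a general illustration of this class of problem, the sketch below uses a hypothetical helper (not code from the patch) to show one common way of making a left shift well defined in C++: shift the unsigned representation, and keep the shift count below the operand width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the 8210319 webrev.
// Left-shifting a negative signed value is undefined behavior before C++20,
// while shifting an unsigned value is always defined (modulo 2^64).
// Converting the result back to a signed type is implementation-defined
// before C++20 (two's complement on all mainstream compilers) and fully
// defined since C++20.
inline std::int64_t shl_defined(std::int64_t value, unsigned shift) {
  // Shifting by a count >= the operand width is undefined for every integer type.
  assert(shift < 64 && "shift count must be smaller than 64");
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(value) << shift);
}
```

For example, `shl_defined(-1, 4)` yields -16, whereas writing `-1 << 4` directly on a signed operand is undefined before C++20.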

Thank you!
Lutz


From lutz.schmidt at sap.com  Tue Sep  4 13:01:15 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 4 Sep 2018 13:01:15 +0000
Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate
 vtable/itable stub size calculation)
In-Reply-To: <cfce3af80b5cf1aad399f18b67fd864caccf951c.camel@redhat.com>
References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com>
 <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com>
 <0d67f0f0-ffe0-a6d4-5d49-896eb2621a0f@redhat.com>
 <cfce3af80b5cf1aad399f18b67fd864caccf951c.camel@redhat.com>
Message-ID: <F16C5923-8957-4E4F-8DB6-FD235C498347@sap.com>

Hi folks, 
thanks for fixing my failures!
Best regards,
Lutz

On 04.09.18, 13:44, "hotspot-compiler-dev on behalf of Severin Gehwolf" <hotspot-compiler-dev-bounces at openjdk.java.net on behalf of sgehwolf at redhat.com> wrote:

    On Tue, 2018-09-04 at 13:21 +0200, Aleksey Shipilev wrote:
    > Thanks, pushed.
    
    Thanks for the Zero build fixes, Aleksey!
    
    Cheers,
    Severin
    
    
From lutz.schmidt at sap.com  Tue Sep  4 13:14:56 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 4 Sep 2018 13:14:56 +0000
Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate
 vtable/itable stub size calculation)
In-Reply-To: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com>
References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com>
Message-ID: <EAE620EB-3311-45FB-9863-68492AE2E734@sap.com>

Sorry for the mess! I received the mail below only 10 minutes ago. This may be related to internet connectivity issues here this morning.

Yes, pd_code_size_limit() should be gone on all platforms. I grepped the source tree for all occurrences and obviously missed Zero. I don't know why.

Regards, 
Lutz


On 04.09.18, 12:33, "hotspot-compiler-dev on behalf of Aleksey Shipilev" <hotspot-compiler-dev-bounces at openjdk.java.net on behalf of shade at redhat.com> wrote:

    Bug:
      https://bugs.openjdk.java.net/browse/JDK-8210357
    
    Seems like VtableStub::pd_code_size_limit is gone, and should be purged from Zero too.
    
    Fix:
    
    diff -r bc76fd44b029 src/hotspot/cpu/zero/vtableStubs_zero.cpp
    --- a/src/hotspot/cpu/zero/vtableStubs_zero.cpp	Tue Sep 04 12:28:12 2018 +0200
    +++ b/src/hotspot/cpu/zero/vtableStubs_zero.cpp	Tue Sep 04 12:30:21 2018 +0200
    @@ -37,11 +37,6 @@
       return NULL;
     }
    
    -int VtableStub::pd_code_size_limit(bool is_vtable_stub) {
    -  ShouldNotCallThis();
    -  return 0;
    -}
    -
     int VtableStub::pd_code_alignment() {
       ShouldNotCallThis();
       return 0;
    
    
    Testing: Linux x86_64 zero build
    
    Thanks,
    -Aleksey
    
    
From gromero at linux.vnet.ibm.com  Tue Sep  4 13:42:02 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 4 Sep 2018 10:42:02 -0300
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
Message-ID: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>

Hi,

May I please request reviews for this tiny change that fixes two
uninitialized variables in PPC64 C1 LIR code?

Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/

GCC 4.8 does not complain about these two uninitialized pointers ('data'
and 'md') but more recent versions, like 5.4.0 and 7.3.1, complain about
it:

In file included from /home/gromero/hg/jdk/jdk/src/hotspot/share/c1/c1_Compilation.hpp:29:0,
                  from /home/gromero/hg/jdk/jdk/src/hotspot/share/precompiled/precompiled.hpp:286:
/home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp: In member function ‘void LIR_Assembler::emit_typecheck_helper(LIR_OpTypeCheck*, Label*, Label*, Label*)’:
/home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp:595:100: warning: ‘data’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    int      byte_offset_of_slot(ciProfileData* data, ByteSize slot_offset_in_data) { return in_bytes(offset_of_slot(data, slot_offset_in_data)); }
                                                                                                     ^
/home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2400:18: note: ‘data’ was declared here
    ciProfileData* data;
                   ^
/home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2483:78: warning: ‘md’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      type_profile_helper(mdo, mdo_offset_bias, md, data, recv, Rtmp1, success);
                                                                               ^
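
The v1/v2 webrevs are not shown in this thread; presumably the fix initializes the two pointers explicitly. The self-contained sketch below uses hypothetical stand-in types (not the real ciMethodData/ciProfileData) to reproduce the shape of the flagged code and that kind of fix: the pointers are assigned and read under the same condition, which is correct, but GCC's flow analysis cannot always prove it.

```cpp
#include <cassert>

// Hypothetical stand-ins for ciMethodData / ciProfileData; the real HotSpot
// types are much richer.
struct ProfileData { int slot_offset; };
struct MethodData {
  ProfileData entry;
  ProfileData* data_at_bci() { return &entry; }
};

// md and data are written only when profiling is enabled and read only under
// the same condition. Without the "= nullptr" initializers, some GCC versions
// may report -Wmaybe-uninitialized here; initializing them changes nothing
// at run time.
int emit_check(bool should_profile, MethodData* mdo) {
  MethodData* md = nullptr;
  ProfileData* data = nullptr;
  if (should_profile) {
    assert(mdo != nullptr);
    md = mdo;
    data = md->data_at_bci();
  }
  // ... code generation that does not touch md or data ...
  if (should_profile) {
    return data->slot_offset;
  }
  return -1;
}
```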

Thank you.

Best regards,
Gustavo


From shade at redhat.com  Tue Sep  4 13:49:12 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 4 Sep 2018 15:49:12 +0200
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
Message-ID: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com>

On 09/04/2018 03:42 PM, Gustavo Romero wrote:
> May I please request reviews for this tiny change that fixes two
> uninitialized variables in PPC64 C1 LIR code?
> 
> Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/

Looks good and trivial to me.

-Aleksey



From matthias.baesken at sap.com  Tue Sep  4 13:48:54 2018
From: matthias.baesken at sap.com (Baesken, Matthias)
Date: Tue, 4 Sep 2018 13:48:54 +0000
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
Message-ID: <4ca7e1f4303a44df8c0ada10cb402718@sap.com>


Hi Gustavo, looks good (not a Reviewer, however).
It might not hurt to also initialize md and data in emit_opTypeCheck in the same file (even without GCC complaints):


void LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck* op) {
  LIR_Code code = op->code();
  if (code == lir_store_check) {
    Register value = op->object()->as_register();
    Register array = op->array()->as_register();
    Register k_RInfo = op->tmp1()->as_register();
    Register klass_RInfo = op->tmp2()->as_register();
    Register Rtmp1 = op->tmp3()->as_register();
    bool should_profile = op->should_profile();

    __ verify_oop(value);
    CodeStub* stub = op->stub();
    // Check if it needs to be profiled.
    ciMethodData* md;
    ciProfileData* data;
 ...


Best regards, Matthias


> -----Original Message-----
> From: ppc-aix-port-dev <ppc-aix-port-dev-bounces at openjdk.java.net> On
> Behalf Of Gustavo Romero
> Sent: Dienstag, 4. September 2018 15:42
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler
> code
> Importance: High
> 
> Hi,
> 
> May I please request reviews for this tiny change that fixes two
> uninitialized variables in PPC64 C1 LIR code?
> 
> Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/
> 
> GCC 4.8 does not complain about these two uninitialized pointers ('data'
> and 'md') but more recent versions, like 5.4.0 and 7.3.1, complain about
> it:
> 
> In file included from /home/gromero/hg/jdk/jdk/src/hotspot/share/c1/c1_Compilation.hpp:29:0,
>                   from /home/gromero/hg/jdk/jdk/src/hotspot/share/precompiled/precompiled.hpp:286:
> /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp: In member function ‘void LIR_Assembler::emit_typecheck_helper(LIR_OpTypeCheck*, Label*, Label*, Label*)’:
> /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp:595:100: warning: ‘data’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>     int      byte_offset_of_slot(ciProfileData* data, ByteSize slot_offset_in_data) { return in_bytes(offset_of_slot(data, slot_offset_in_data)); }
>                                                                                                      ^
> /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2400:18: note: ‘data’ was declared here
>     ciProfileData* data;
>                    ^
> /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2483:78: warning: ‘md’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>       type_profile_helper(mdo, mdo_offset_bias, md, data, recv, Rtmp1, success);
>                                                                                ^
> 
> Thank you.
> 
> Best regards,
> Gustavo


From gromero at linux.vnet.ibm.com  Tue Sep  4 14:11:05 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 4 Sep 2018 11:11:05 -0300
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
 <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com>
Message-ID: <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com>

Hi Matthias and Aleksey,

Thanks for reviewing it.

On 09/04/2018 10:49 AM, Aleksey Shipilev wrote:
> On 09/04/2018 03:42 PM, Gustavo Romero wrote:
>> May I please request reviews for this tiny change that fixes two
>> uninitialized variables in PPC64 C1 LIR code?
>>
>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
>> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/
> 
> Looks good and trivial to me.

Aleksey, I've updated that change to include another case pointed out by Matthias:

http://cr.openjdk.java.net/~gromero/8210320/v2/

I think it's still trivial as before?

If so it means I can push it once I receive a second OK from you?

I also think I don't need to push it first to the 'submit' repo since it's
a PPC64-only change. Is that correct?

Thank you.

Best regards,
Gustavo


From martin.doerr at sap.com  Tue Sep  4 14:12:09 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 4 Sep 2018 14:12:09 +0000
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
Message-ID: <dd3c702e218f4ec082b22aba4da3bfb5@sap.com>

Hi Gustavo,

it's not a real bug, just a build warning. But it needs to get fixed. Thanks for doing it. Reviewed.

Best regards,
Martin


-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
Sent: Dienstag, 4. September 2018 15:42
To: hotspot-compiler-dev at openjdk.java.net
Cc: ppc-aix-port-dev at openjdk.java.net
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code
Importance: High

Hi,

May I please request reviews for this tiny change that fixes two
uninitialized variables in PPC64 C1 LIR code?

Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/

GCC 4.8 does not complain about these two uninitialized pointers ('data'
and 'md') but more recent versions, like 5.4.0 and 7.3.1, complain about
it:

In file included from /home/gromero/hg/jdk/jdk/src/hotspot/share/c1/c1_Compilation.hpp:29:0,
                  from /home/gromero/hg/jdk/jdk/src/hotspot/share/precompiled/precompiled.hpp:286:
/home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp: In member function ‘void LIR_Assembler::emit_typecheck_helper(LIR_OpTypeCheck*, Label*, Label*, Label*)’:
/home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp:595:100: warning: ‘data’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    int      byte_offset_of_slot(ciProfileData* data, ByteSize slot_offset_in_data) { return in_bytes(offset_of_slot(data, slot_offset_in_data)); }
                                                                                                     ^
/home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2400:18: note: ‘data’ was declared here
    ciProfileData* data;
                   ^
/home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2483:78: warning: ‘md’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      type_profile_helper(mdo, mdo_offset_bias, md, data, recv, Rtmp1, success);
                                                                               ^

Thank you.

Best regards,
Gustavo


From martin.doerr at sap.com  Tue Sep  4 14:15:29 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 4 Sep 2018 14:15:29 +0000
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
 <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com>
 <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com>
Message-ID: <afcd61a67db44b0f8fc501d22d51fde6@sap.com>

Hi Gustavo,

> I think it's still trivial as before?
Yes.

> If so it means I can push it once I receive a second OK from you?
> 
> I also think I don't need to push it first to the 'submit' repo since it's
> a PPC64-only change. Is that correct?
That's fine (assuming you have run a local build).

Best regards,
Martin


-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
Sent: Dienstag, 4. September 2018 16:11
To: Aleksey Shipilev <shade at redhat.com>; hotspot-compiler-dev at openjdk.java.net; Baesken, Matthias <matthias.baesken at sap.com>
Cc: ppc-aix-port-dev at openjdk.java.net
Subject: Re: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code

Hi Matthias and Aleksey,

Thanks for reviewing it.

On 09/04/2018 10:49 AM, Aleksey Shipilev wrote:
> On 09/04/2018 03:42 PM, Gustavo Romero wrote:
>> May I please request reviews for this tiny change that fixes two
>> uninitialized variables in PPC64 C1 LIR code?
>>
>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
>> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/
> 
> Looks good and trivial to me.

Aleksey, I've updated that change to include another case pointed out by Matthias:

http://cr.openjdk.java.net/~gromero/8210320/v2/

I think it's still trivial as before?

If so it means I can push it once I receive a second OK from you?

I also think I don't need to push it first to the 'submit' repo since it's
a PPC64-only change. Is that correct?

Thank you.

Best regards,
Gustavo


From shade at redhat.com  Tue Sep  4 14:15:32 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 4 Sep 2018 16:15:32 +0200
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
 <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com>
 <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com>
Message-ID: <ec77d35b-279a-1802-ca0f-0eea766bfcdb@redhat.com>

On 09/04/2018 04:11 PM, Gustavo Romero wrote:
> http://cr.openjdk.java.net/~gromero/8210320/v2/
> 
> I think it's still trivial as before?

Yes.

> If so it means I can push it once I receive a second OK from you?

Yes, this is trivial, and AFAIU only one Reviewer is required for trivial issues.

> I also think I don't need to push it first to the 'submit' repo since it's
> a PPC64-only change. Is that correct?

Yes, I don't see the need to test trivial patches like this with submit repo.

Thanks,
-Aleksey



From gromero at linux.vnet.ibm.com  Tue Sep  4 14:47:54 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 4 Sep 2018 11:47:54 -0300
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <dd3c702e218f4ec082b22aba4da3bfb5@sap.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
 <dd3c702e218f4ec082b22aba4da3bfb5@sap.com>
Message-ID: <9accaa5c-5bd5-b07f-e246-c3956bff5643@linux.vnet.ibm.com>

Hi Martin!

On 09/04/2018 11:12 AM, Doerr, Martin wrote:
> Hi Gustavo,
> 
> it's not a real bug, just a build warning. But it needs to get fixed. Thanks for doing it. Reviewed.

Thanks for reviewing it. Yes, I agree.

Btw, I tried to determine precisely which change in gcc 7.3 (for instance)
was responsible, hoping it was only a matter of an additional -Wextra or
-Wall in a gcc spec, but apparently that was not the case... I could not
find a plausible change in the gcc flags or source code that might cause
such warnings when gcc 7.3 is used.

I've created a "test case" from JVM code for that [1] (which is still
4.4 MiB since I didn't have the chance to prune it further). Curiously
enough, although the following simple code triggers the same warning on
_both_ gcc 4.8 and 7.3 when compiled with:

$ g++  -Wuninitialized -O3 mu.cpp -c -o mu

mu.cpp:

#include <cstdio>

void foo(int x) {
   printf("%d\n", x+1);
}

int main(int argc, char** argv)
{
   int x;  // intentionally left uninitialized when argc is not 1, 2 or 3
   switch (argc) {
     case 1: x = 1;
       break;
     case 2: x = 4;
       break;
     case 3: x = 5;
     }
   foo(x);
}

the code in [1] triggers the warning in question only when gcc 7.3 is used
(with the exact same flags):

$ g++ -Wuninitialized -O3 ok.cpp -c -o ok.o

Passing '-v' to gcc to check the flags from the spec didn't reveal any clue.

The toolchain folks were also unable to spot any difference that could
account for that behavior on gcc 7.3 without a detailed look...
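
For comparison, here is a hypothetical variant of mu.cpp (not part of the experiment above) in which x is assigned on every path through the switch, here via a default case, so the flow analysis has nothing to flag. It is the same kind of remedy as the explicit initialization in the patch, expressed differently; the fallback value 0 is an arbitrary choice for illustration.

```cpp
#include <cassert>

// Every path through the switch assigns x, so neither -Wuninitialized nor
// -Wmaybe-uninitialized has anything to report.
int value_for(int argc) {
  assert(argc >= 0 && "argc is never negative");
  int x;
  switch (argc) {
    case 1: x = 1; break;
    case 2: x = 4; break;
    case 3: x = 5; break;
    default: x = 0; break;  // hypothetical fallback value
  }
  return x;
}
```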

Anyway, it's only a note :)

Thanks.


Best regards,
Gustavo

[1] http://cr.openjdk.java.net/~gromero/misc/ok.cpp

> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
> Sent: Dienstag, 4. September 2018 15:42
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code
> Importance: High
> 
> Hi,
> 
> May I please request reviews for this tiny change that fixes two
> uninitialized variables in PPC64 C1 LIR code?
> 
> Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/
> 
> GCC 4.8 does not complain about these two uninitialized pointers ('data'
> and 'md') but more recent versions, like 5.4.0 and 7.3.1, complain about
> it:
> 
> In file included from /home/gromero/hg/jdk/jdk/src/hotspot/share/c1/c1_Compilation.hpp:29:0,
>                    from /home/gromero/hg/jdk/jdk/src/hotspot/share/precompiled/precompiled.hpp:286:
> /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp: In member function ‘void LIR_Assembler::emit_typecheck_helper(LIR_OpTypeCheck*, Label*, Label*, Label*)’:
> /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp:595:100: warning: ‘data’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>      int      byte_offset_of_slot(ciProfileData* data, ByteSize slot_offset_in_data) { return in_bytes(offset_of_slot(data, slot_offset_in_data)); }
>                                                                                                       ^
> /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2400:18: note: ‘data’ was declared here
>      ciProfileData* data;
>                     ^
> /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2483:78: warning: ‘md’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>        type_profile_helper(mdo, mdo_offset_bias, md, data, recv, Rtmp1, success);
>                                                                                 ^
> 
> Thank you.
> 
> Best regards,
> Gustavo
> 


From gromero at linux.vnet.ibm.com  Tue Sep  4 14:49:36 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 4 Sep 2018 11:49:36 -0300
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <afcd61a67db44b0f8fc501d22d51fde6@sap.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
 <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com>
 <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com>
 <afcd61a67db44b0f8fc501d22d51fde6@sap.com>
Message-ID: <04bb4c95-6252-1837-e7bf-4ad2d2e411c0@linux.vnet.ibm.com>

On 09/04/2018 11:15 AM, Doerr, Martin wrote:
> Hi Gustavo,
> 
>> I think it's still trivial as before?
> Yes.
> 
>> If so it means I can push it once I receive a second OK from you?
>>
>> I also think I don't need to push it first to the 'submit' repo since it's
>> a PPC64-only change. Is that correct?
> That's fine (assuming you have run a local build).

Sure :)


Regards,
Gustavo
  
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
> Sent: Dienstag, 4. September 2018 16:11
> To: Aleksey Shipilev <shade at redhat.com>; hotspot-compiler-dev at openjdk.java.net; Baesken, Matthias <matthias.baesken at sap.com>
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: Re: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code
> 
> Hi Matthias and Aleksey,
> 
> Thanks for reviewing it.
> 
> On 09/04/2018 10:49 AM, Aleksey Shipilev wrote:
>> On 09/04/2018 03:42 PM, Gustavo Romero wrote:
>>> May I please request reviews for this tiny change that fixes two
>>> uninitialized variables in PPC64 C1 LIR code?
>>>
>>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
>>> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/
>>
>> Looks good and trivial to me.
> 
> Aleksey, I've updated that change to include another case pointed out by Matthias:
> 
> http://cr.openjdk.java.net/~gromero/8210320/v2/
> 
> I think it's still trivial as before?
> 
> If so it means I can push it once I receive a second OK from you?
> 
> I also think I don't need to push it first to the 'submit' repo since it's
> a PPC64-only change. Is that correct?
> 
> Thank you.
> 
> Best regards,
> Gustavo
> 


From gromero at linux.vnet.ibm.com  Tue Sep  4 14:52:35 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 4 Sep 2018 11:52:35 -0300
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR
 assembler code
In-Reply-To: <ec77d35b-279a-1802-ca0f-0eea766bfcdb@redhat.com>
References: <fb8bc7c4-05f8-001b-316a-75c316a02417@linux.vnet.ibm.com>
 <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com>
 <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com>
 <ec77d35b-279a-1802-ca0f-0eea766bfcdb@redhat.com>
Message-ID: <e66755c2-bbdc-9d1a-4247-bd4dafd72d94@linux.vnet.ibm.com>

On 09/04/2018 11:15 AM, Aleksey Shipilev wrote:
> On 09/04/2018 04:11 PM, Gustavo Romero wrote:
>> http://cr.openjdk.java.net/~gromero/8210320/v2/
>>
>> I think it's still trivial as before?
> 
> Yes.
> 
>> If so it means I can push it once I receive a second OK from you?
> 
> Yes, this is trivial, and AFAIU only one Reviewer is required for trivial issues.

Got it. Thanks for confirming. Either way, Martin has also reviewed it by now.


>> I also think I don't need to push it first to the 'submit' repo since it's
>> a PPC64-only change. Is that correct?
> 
> Yes, I don't see the need to test trivial patches like this with submit repo.

OK. Thanks!


Best regards,
Gustavo

> Thanks,
> -Aleksey
> 
> 


From martin.doerr at sap.com  Tue Sep  4 16:20:58 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 4 Sep 2018 16:20:58 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
Message-ID: <57ebd30a66504577a6b2ec267aee4b69@sap.com>

Hi Michihiro,

thanks for looking into the problems.

I also prefer "vnoreg" and "vsnoreg".

I'd be fine with just adding "&& SuperwordUseVSX" for the new rules in "match_rule_supported".
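
A minimal, hypothetical sketch of the gating suggested above: the new vector opcodes are reported as unsupported when SuperwordUseVSX is off, so that, presumably, SuperWord never creates vector nodes the backend has no rule to match. The enum values and the global flag are stand-ins, not HotSpot's real Op_* constants or flag machinery, and the real Matcher::match_rule_supported in ppc.ad handles many more cases.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's Op_* opcodes.
enum Opcode { Op_AddI, Op_AddVI, Op_MulVF };

// Models the -XX:+/-SuperwordUseVSX product flag.
bool SuperwordUseVSX = true;

// Vector rules are only reported as supported when VSX use is enabled;
// scalar rules are unaffected.
bool match_rule_supported(int opcode) {
  switch (opcode) {
    case Op_AddVI:
    case Op_MulVF:
      return SuperwordUseVSX;
    default:
      return true;
  }
}
```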

Can you reproduce the test failures?
The very same VM works fine on a different Power8 machine, on which C2 emits the same instructions.
The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
I have seen several linux kernel changes regarding saving and restoring the VSX registers.
I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC".
Maybe something is missing which tells the kernel that we're using it. But that's just a guess.

Best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Dienstag, 4. September 2018 07:32
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
Subject: RE: RFR: 8208171: PPC64: Enrich SLP support


Hi Goetz, Martin, and Gustavo,


>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>a compiler change.
>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
>while hotspot-compiler-dev is for
>"Technical discussion about the development of the HotSpot bytecode compilers"
Understood; I will use hotspot-compiler-dev for future RFRs, thanks.


> Why do you rename vnoreg to vnoregi?
I followed the existing naming of vsnoreg and vsnoregi, but the renaming does not seem necessary. I will revert this part. Should I also rename vsnoregi to vsnoreg?


>we noticed jtreg test failures when using this change:
>compiler/runtime/safepoints/TestRegisterRestoring.java
>compiler/runtime/Test7196199.java
>
>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>
>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine.


>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
Thank you for pointing out another issue. Currently I cannot reproduce this problem, but preventing the vector nodes from being matched makes sense to avoid the crash. I did not provide match rules for non-vector nodes, so it might be better to provide them in any case, similar to the Replicate* rules.


Gustavo, thanks for the wrap-up!


Best regards,
--
Michihiro,
IBM Research - Tokyo


From: "Doerr, Martin" <martin.doerr at sap.com>
To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
Date: 2018/09/04 02:18
Subject: RE: RFR: 8208171: PPC64: Enrich SLP support

________________________________


Hi Gustavo and Michihiro,

we noticed jtreg test failures when using this change:
compiler/runtime/safepoints/TestRegisterRestoring.java
compiler/runtime/Test7196199.java

TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.

We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.

That's what I found out so far. Maybe you have an idea?

I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.

Best regards,
Martin


-----Original Message-----
From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
Sent: Monday, 3 September 2018 14:57
To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support

Hi Goetz,

On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
> Also, I can not find all of the mail traffic in
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
> Is this a problem of the pipermail server?
>
> For some reason this webrev lacks the links to browse the diffs.
> Do you need to use a more recent webrev?  You can obtain it with
> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .

Yes, it was probably a problem with pipermail or some relay.
I noticed the same thing, i.e. at least one of Michi's replies reached
me but was missing from a mailing list.

The initial discussion is here:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html

I understand Martin reviewed the last webrev in that thread, which is
http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)

Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html

and Michi's reply to Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).

and your last review:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html


HTH.

Best regards,
Gustavo

> Why do you rename vnoreg to vnoregi?
>
> Besides that the change is fine, thanks for implementing this!
>
> Best regards,
>    Goetz.
>
>
>> -----Original Message-----
>> From: Doerr, Martin
>> Sent: Tuesday, 28 August 2018 19:35
>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie <HORIE at jp.ibm.com>
>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Michihiro,
>>
>> thank you for implementing it. I have just taken a first look at your
>> webrev.01.
>>
>> It looks basically good. Only the Power version check seems to be incorrect.
>> VM_Version::has_popcntb() checks for Power5.
>> I believe most instructions are available with Power7.
>> Were some of them (vsubudm, ..., vmmuluwm, vpopcntw) introduced with
>> Power8?
>> We should check this carefully.
>>
>> Also, the indentation in register_ppc.hpp could be improved.
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>> Sent: Thursday, 26 July 2018 16:02
>> To: Michihiro Horie <HORIE at jp.ibm.com>
>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Michi,
>>
>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>> I updated webrev:
>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>
>> Thanks for providing an updated webrev and for fixing indentation and
>> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>
>>
>> Best Regards,
>> Gustavo
>>
>>>
>>> Best regards,
>>> --
>>> Michihiro,
>>> IBM Research - Tokyo
>>>
>>>
>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
>>> Date: 2018/07/25 23:05
>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> -------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>> Hi Michi,
>>>
>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>   > Dear all,
>>>   >
>>>   > Would you review the following change?
>>>   > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>>   > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>>   >
>>>   > This change adds support for vectorized arithmetic calculation with SLP.
>>>   >
>>>   > The to_vr function is added to convert VSR to VR. Currently, vecX is
>>>   > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>>>   > which are exactly overlapped with VRs. Instruction APIs receiving VRs use
>>>   > the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the
>>>   > matching with SqrtVF. I think the change in sqrtF_reg would be fine due to
>>>   > the ConvD2FNode::Value in convertnode.cpp.
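On POWER, the 64 VSX registers overlay the older register files: VSR0-31 share storage with FPR0-31, and VSR32-63 share storage with VR0-31. Assuming a to_vr()-style helper simply rebases the register index, a Java model of the mapping looks like this (names are hypothetical; the actual helper is C++ in the PPC port):

```java
// Hypothetical Java model of the index rebasing a to_vr()-style helper performs.
// VSR0-31 overlay FPR0-31; VSR32-63 overlay VR0-31.
final class VsrToVr {
    static final int FIRST_VR_ALIAS = 32; // VSR32 is the same register as VR0

    static int toVr(int vsr) {
        if (vsr < FIRST_VR_ALIAS || vsr > 63) {
            throw new IllegalArgumentException("VSR" + vsr + " does not alias a VR");
        }
        return vsr - FIRST_VR_ALIAS;
    }

    public static void main(String[] args) {
        // The vs_reg class described above covers VSR32-51, i.e. VR0-19.
        System.out.println("VSR32 -> VR" + toVr(32)); // VSR32 -> VR0
        System.out.println("VSR51 -> VR" + toVr(51)); // VSR51 -> VR19
    }
}
```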
>>>
>>> Looks good. Just a few comments:
>>>
>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp
>>>     in order to avoid the splat?
>>>
>>> - Although all instructions added by your change were introduced in ISA 2.06,
>>>     so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5
>>>     in vm_version_ppc.cpp (line 64), I'm wondering if there is any control
>>>     point to guarantee that these instructions won't be emitted on a CPU that
>>>     does not support them.
>>>
>>> - I think that in general strings in format %{} are in upper case. For instance,
>>>     this is the current optoassembly output for vmul4F:
>>>
>>> 2941835 5b4     ADDI    R24, R24, #64
>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>>>
>>>     I think it would be better in upper case instead. I also think that if
>>>     the node match emits more than one instruction, all instructions must be
>>>     listed in format %{}, since it's meant for detailed debugging. Finally, I
>>>     think it would be better to replace \t! by \t// in that string (unless I'm
>>>     missing any special meaning for that char). So for vmul4F it would be
>>>     something like:
>>>
>>> 2941835 5b4     ADDI      R24, R24, #64
>>>                   VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>>>
>>>
>>> But feel free to change anything just after you get additional reviews :)
>>>
>>>
>>>   > I confirmed this change with JTREG. In addition, I used the attached
>>>   > micro benchmarks.
>>>   > /(See attached file: slp_microbench.zip)/
>>>
>>> Thanks for sharing it.
>>> Btw, another option to host it would be in the CR
>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>   >
>>>   > Best regards,
>>>   > --
>>>   > Michihiro,
>>>   > IBM Research - Tokyo
>>>   >
>>>
>>>
>>>
>



From vladimir.kozlov at oracle.com  Tue Sep  4 17:23:23 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 4 Sep 2018 10:23:23 -0700
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <20180904103622.sijpgfiltco4mxd2@rbackman>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <20180904103622.sijpgfiltco4mxd2@rbackman>
Message-ID: <fbd993d4-3455-de8a-1d50-9fb9afd7d757@oracle.com>

+1

Thanks,
Vladimir

On 9/4/18 3:36 AM, Rickard Bäckman wrote:
> Looks good.
> 
> /R
> 
> On 08/30, Erik Österlund wrote:
>> Hi,
>>
>> The JFR getEventWriter() intrinsics have code in C1 and C2 that manually
>> resolves jobjects. This should go through the Access API to make sure the
>> necessary GC barriers are inserted.
>>
>> I noticed this in an attempt to move JNI handle processing out of the pause
>> (among other things). It crashed in kitchensink.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8210158
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00
>>
>> I tested the patch by running it, together with a patch that moves out JNI
>> handle processing outside of the pause, through hs-tier1-3, as well as
>> running it through Kitchensink24H (as it originally crashed in kitchensink).
>>
>> Thanks,
>> /Erik

From vladimir.kozlov at oracle.com  Tue Sep  4 17:25:58 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 4 Sep 2018 10:25:58 -0700
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <dk67ek311xv.fsf@rwestrel.remote.csb>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
Message-ID: <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>

Looks good.

Thanks,
Vladimir

On 9/3/18 12:21 AM, Roland Westrelin wrote:
> 
> Hi Vladimir,
> 
> Thanks for the review. Here is a new webrev that should address your
> comment.
> 
> http://cr.openjdk.java.net/~roland/8209544/webrev.01/
> 
> Roland.
> 

From erik.osterlund at oracle.com  Tue Sep  4 17:55:30 2018
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Tue, 4 Sep 2018 19:55:30 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <fbd993d4-3455-de8a-1d50-9fb9afd7d757@oracle.com>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <20180904103622.sijpgfiltco4mxd2@rbackman>
 <fbd993d4-3455-de8a-1d50-9fb9afd7d757@oracle.com>
Message-ID: <1D400C8F-D4CF-44E3-82C4-00EB32AE103C@oracle.com>

Hi Vladimir,

Thank you for the review.

/Erik

> On 4 Sep 2018, at 19:23, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> +1
> 
> Thanks,
> Vladimir
> 
>> On 9/4/18 3:36 AM, Rickard Bäckman wrote:
>> Looks good.
>> /R
>>> On 08/30, Erik Österlund wrote:
>>> Hi,
>>> 
>>> The JFR getEventWriter() intrinsics have code in C1 and C2 that manually
>>> resolves jobjects. This should go through the Access API to make sure the
>>> necessary GC barriers are inserted.
>>> 
>>> I noticed this in an attempt to move JNI handle processing out of the pause
>>> (among other things). It crashed in kitchensink.
>>> 
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8210158
>>> 
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00
>>> 
>>> I tested the patch by running it, together with a patch that moves out JNI
>>> handle processing outside of the pause, through hs-tier1-3, as well as
>>> running it through Kitchensink24H (as it originally crashed in kitchensink).
>>> 
>>> Thanks,
>>> /Erik


From vladimir.kozlov at oracle.com  Tue Sep  4 18:40:42 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 4 Sep 2018 11:40:42 -0700
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com>
 <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
 <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>
Message-ID: <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com>

Thank you Gustavo for detailed answer.

I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now.

About the property check (flavor == "server" & !emulatedClient) in tests: if C2 (and Graal in the future, when it supports RTM)
is used, this check is always true, because C2 (and Graal) is only available in the Server VM and is disabled in the
emulatedClient setting. This check was added before we had the C2 usage check, which can now be done with vm.rtm.compiler.
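The argument above is a simple implication: if vm.rtm.compiler is true only when C2 is in use, and C2 only exists in a non-emulated Server VM, then the P = (flavor == "server" & !emulatedClient) conjunct adds nothing. A brute-force check over the three booleans, under that assumed implication (names are illustrative, not actual jtreg properties):

```java
// Exhaustive check: if "C2 in use" implies "server flavor and not emulatedClient",
// then requiring (P & compiler) selects exactly the same configurations as compiler alone.
final class RedundantFlavorCheck {
    static boolean allEquivalent() {
        boolean[] vals = {false, true};
        for (boolean server : vals) {
            for (boolean emulatedClient : vals) {
                for (boolean c2InUse : vals) {
                    boolean p = server && !emulatedClient;
                    // Assumption: C2 is unavailable outside a non-emulated Server VM,
                    // so combinations with C2 in use but !P cannot occur.
                    if (c2InUse && !p) {
                        continue;
                    }
                    if ((p && c2InUse) != c2InUse) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allEquivalent()); // true
    }
}
```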

Thanks,
Vladimir

On 9/3/18 3:15 PM, Gustavo Romero wrote:
> Hi Vladimir,
> 
> Thanks a lot for reviewing it and for your comments.
> 
> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>> Hi Gustavo,
>>
>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with the
>> TieredStopAtLevel < 4 flag.
> 
> Yes, although as far as I can see all tests currently enable C2 explicitly
> (for instance, by passing -Xcomp and -XX:-TieredCompilation) or implicitly
> through a warm-up before testing, I agree that nothing forbids one from
> switching off C2 with TieredStopAtLevel < 4 for a test case. It also
> looks better to list explicitly which compilers do support RTM instead of
> the ones that don't support it.
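The difference between the two checks can be modelled with a few booleans. The sketch below is a simplified, hypothetical model (not the jtreg `Compiler` utility class): it contrasts a "not Graal" check with a check that C2 is actually reachable, assuming that with tiered compilation a TieredStopAtLevel below 4 keeps all compilation in C1, and that without tiered compilation C2 is used directly.

```java
// Simplified, hypothetical model of whether C2 ends up compiling code.
// It ignores other details such as -Xint or client-only builds.
final class CompilerSelection {
    // Old style of check: "not Graal" is taken to mean "C2 is in use".
    static boolean notGraal(boolean useGraal) {
        return !useGraal;
    }

    // New style of check: C2 must actually be reachable. With tiered compilation,
    // a TieredStopAtLevel below 4 (assumed) keeps methods from ever reaching C2.
    static boolean c2Enabled(boolean useGraal, boolean tiered, int tieredStopAtLevel) {
        return !useGraal && (!tiered || tieredStopAtLevel >= 4);
    }

    public static void main(String[] args) {
        // -XX:TieredStopAtLevel=1 without Graal: only C1 compiles.
        System.out.println(notGraal(false));            // true  (would select RTM tests)
        System.out.println(c2Enabled(false, true, 1));  // false (skips them)
        System.out.println(c2Enabled(false, false, 1)); // true  (non-tiered: C2 used)
        System.out.println(c2Enabled(true, true, 4));   // false (Graal selected)
    }
}
```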
> 
> I've updated the webrev accordingly:
> 
> http://cr.openjdk.java.net/~gromero/8209972/v2/
> 
> The diff there looks odd, so I generated another one with --patience for a
> better (IMO) diff format:
> 
> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
> 
> 
>> Also, can the platforms check be replaced with one vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
> 
> For example, on Linux the following cases are possible regarding CPU / OS
> RTM support:
> 
> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
> 
> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it
> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies
> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise
> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for
> Linux and for AIX.
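The tautology can be checked mechanically. Below is a Java model of the table rows above, written as a sketch that assumes vm.rtm.os simply reflects OS support and vm.rtm.cpu is the CPU-and-OS conjunction described in the text (the names are illustrative, not VMProps.java code):

```java
// Models the table above under the stated rule: the JVM advertises "rtm"
// (so vm.rtm.cpu is true) only when both the CPU and the OS support RTM.
final class RtmCpuProperty {
    static boolean vmRtmCpu(boolean cpuSupportsRtm, boolean osSupportsRtm) {
        return cpuSupportsRtm && osSupportsRtm;
    }

    // True if "vm.rtm.cpu & vm.rtm.os" never differs from "vm.rtm.cpu" alone.
    static boolean osConjunctIsRedundant() {
        boolean[] vals = {false, true};
        for (boolean cpu : vals) {
            for (boolean os : vals) {
                boolean rtmCpu = vmRtmCpu(cpu, os);
                if ((rtmCpu && os) != rtmCpu) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("POWER7:    " + vmRtmCpu(false, false)); // false
        System.out.println("POWER8/OS: " + vmRtmCpu(true, true));   // true
        System.out.println("POWER9 NV: " + vmRtmCpu(true, false));  // false
        System.out.println("os conjunct redundant: " + osConjunctIsRedundant()); // true
    }
}
```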
> 
> That said I don't think that the platforms check can be replaced with one
> vmRTMCPU(), because in some cases it's necessary to run a test for
> cpu = false and compiler = true, i.e. it's necessary to run a test on an
> unsupported CPU for a given platform _only if_ the compiler in use supports
> RTM (like C2). So if, for instance, we do:
> 
> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires
> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation
> returns 'false' for cpu = false and compiler = true, skipping the test
> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler'
> as 'true' and run the test in that case one could match for
> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will
> be evaluated as 'true' and the test will run even though the Graal
> compiler is selected, which is wrong.
> 
> Hence 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must
> contain its own list of supported compilers with RTM support for each
> platform IMO. Basically we can't ask the JVM about the compiler's support
> for RTM, since the JVM can only tell us whether the CPU and OS it is
> running on support RTM.
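The selection problem described above can be tabulated with a small Java model. It compares, for a test that must run on an RTM-less CPU only when the compiler supports RTM (such as TestUseRTMLockingOptionOnUnsupportedCPU.java with C2), a compiler property tied to vmRTMCPU() against an independent one. This is an illustrative sketch with hypothetical method names, not the jtreg evaluator:

```java
// Hypothetical model contrasting two definitions of vm.rtm.compiler.
final class RtmCompilerProperty {
    // Desired selection: RTM-capable compiler, RTM-less CPU+OS.
    static boolean shouldRun(boolean rtmCompiler, boolean rtmCpu) {
        return rtmCompiler && !rtmCpu;
    }

    // Rejected: vm.rtm.compiler = C2 && vmRTMCPU(), used as "@requires vm.rtm.compiler".
    static boolean tiedPositive(boolean rtmCompiler, boolean rtmCpu) {
        return rtmCompiler && rtmCpu;
    }

    // Rejected: the same tied property, used as "@requires !vm.rtm.compiler".
    static boolean tiedNegated(boolean rtmCompiler, boolean rtmCpu) {
        return !(rtmCompiler && rtmCpu);
    }

    // Adopted: independent property, used as "@requires !vm.rtm.cpu & vm.rtm.compiler".
    static boolean independent(boolean rtmCompiler, boolean rtmCpu) {
        return !rtmCpu && rtmCompiler;
    }

    public static void main(String[] args) {
        boolean[] vals = {false, true};
        for (boolean compiler : vals) {
            for (boolean cpu : vals) {
                System.out.printf("compiler=%-5b cpu=%-5b want=%-5b tied=%-5b !tied=%-5b indep=%b%n",
                        compiler, cpu, shouldRun(compiler, cpu),
                        tiedPositive(compiler, cpu), tiedNegated(compiler, cpu),
                        independent(compiler, cpu));
            }
        }
    }
}
```

In the printed table, only the independent definition matches the desired column in every row: the tied property skips the test for C2 on an RTM-less CPU, and its negation runs it when Graal is selected.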
> 
> 
>> And since you check cpu in here why not to replace all vm.rtm.* with one vm.rtm.supported? In such case you would need 
>> only one @requires checks in tests instead of:
>>
>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
> 
> I think it's not possible either. Currently there are 5 match cases in
> RTM tests:
> 
> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
> 
> which can be written as 5 cases:
> 
> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
> 2:            flavor == "server" & !emulatedClient  & cpu & os
> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
> 4:   cpu  & !(flavor == "server" & !emulatedClient)
> 5: no @requires
> 
> I understand that cases 1 and 2 (since CPU implies OS) can be simplified as:
>
>
> 1:          !(flavor == "server" & !emulatedClient  & cpu)
> 2:            flavor == "server" & !emulatedClient  & cpu
> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
> 4:   cpu  & !(flavor == "server" & !emulatedClient)
> 5: no @requires
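Both simplification steps can be verified by brute force over every combination of the four booleans. The Java sketch below is illustrative, not part of the webrev. It checks that dropping "& os" leaves cases 1 and 2 unchanged once cpu implies os, and that cases 1 and 2 are exact complements. Passing `false` shows the first step relies on that implication:

```java
// Brute-force check of the @requires simplifications above.
// With assumeCpuImpliesOs, combinations where cpu is true but os is false are skipped.
final class RequiresSimplification {
    static boolean check(boolean assumeCpuImpliesOs) {
        boolean[] vals = {false, true};
        for (boolean server : vals) {
            for (boolean emulatedClient : vals) {
                for (boolean cpu : vals) {
                    for (boolean os : vals) {
                        if (assumeCpuImpliesOs && cpu && !os) {
                            continue; // ruled out by the assumption
                        }
                        boolean p = server && !emulatedClient;
                        boolean case1Before = !(p && cpu && os);
                        boolean case2Before = p && cpu && os;
                        boolean case1After = !(p && cpu);
                        boolean case2After = p && cpu;
                        if (case1Before != case1After || case2Before != case2After) {
                            return false; // dropping "& os" changed the selection
                        }
                        if (case1After == case2After) {
                            return false; // cases 1 and 2 should be exact opposites
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(check(true));  // true
        System.out.println(check(false)); // false: needs cpu => os
    }
}
```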
> 
> and cases 1 and 2 are mere opposites, so we have 4 cases:
>
> 1:          !(flavor == "server" & !emulatedClient  & cpu)
> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
> 4:   cpu  & !(flavor == "server" & !emulatedClient)
> 5: no @requires
> 
> We could simplify further making P = (flavor == "server" & !emulatedClient),
> and make:
> 
> 1:          !(P & cpu)
> 3: (!cpu) &  (P)
> 4:   cpu  & !(P)
> 5: no @requires
> 
> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
> order to control running the tests only if the selected compiler on a
> given platform has RTM support (skipping Graal, for instance):
> 
> 1:          !(P & cpu) & compiler
> 3: (!cpu) &  (P)       & compiler
> 4:   cpu  & !(P)       & compiler
> 5: no @requires        & compiler
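To see why the added compiler conjunct skips every RTM test when Graal is selected, the Java sketch below evaluates the four shapes above. It treats P, cpu and compiler as plain booleans and is an illustrative model, not the jtreg @requires evaluator:

```java
import java.util.Arrays;

// Evaluates the four remaining @requires shapes once the compiler conjunct is added.
// p = (flavor == "server" & !emulatedClient); compiler = vm.rtm.compiler.
final class RequiresWithCompiler {
    static boolean[] selected(boolean p, boolean cpu, boolean compiler) {
        return new boolean[] {
            !(p && cpu) && compiler, // case 1
            !cpu && p && compiler,   // case 3
            cpu && !p && compiler,   // case 4
            compiler                 // case 5 (previously no @requires)
        };
    }

    static boolean anySelected(boolean p, boolean cpu, boolean compiler) {
        for (boolean s : selected(p, cpu, compiler)) {
            if (s) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Server VM on an RTM-capable x86_64 with C2 (case 2, the complement of
        // case 1, is omitted here and would also be selected).
        System.out.println(Arrays.toString(selected(true, true, true))); // [false, false, false, true]
        // Same machine with Graal selected: nothing is selected.
        System.out.println(anySelected(true, true, false));              // false
    }
}
```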
> 
> So it looks like we would need at least 3 properties, but IMO it's
> not worth adding another property P = (flavor == "server" & !emulatedClient)
> just to simplify the @requires line further.
> 
> In summary, I think it's only possible to replace 'cpu & os' by 'cpu',
> so I updated the webrev, removing the vm.rtm.os property and keeping only
> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks).
> 
> I've tested the following scenarios and observed no regression [1]:
> 
> 1. X86_64 w/ RTM
> 2. X86_64 w/ RTM + Graal enabled
> 3. POWER7: no CPU+OS support for RTM
> 4. POWER8: CPU+OS support for RTM
> 
> But I think we need a confirmation from SAP about AIX.
> 
> 
> Best regards,
> Gustavo
> 
> [1]
> 
> ** X86_64 w/ RTM **
> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
> Passed: compiler/rtm/locking/TestRTMRetryCount.java
> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
> Test results: passed: 30
> 
> 
> ** X86_64 w/ RTM + Graal enabled **
> Test results: no tests selected (all RTM tests skipped)
> 
> 
> ** POWER7: no CPU+OS support for RTM **
> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
> Test results: passed: 10
> 
> 
> ** POWER8: CPU+OS support for RTM **
> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
> Passed: compiler/rtm/locking/TestRTMRetryCount.java
> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
> Test results: passed: 30
> 
> 
>> Thanks,
>> Vladimir
>>
>> On 8/31/18 8:38 AM, Gustavo Romero wrote:
>>> Hi,
>>>
>>> Could the following small change be reviewed please?
>>>
>>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972
>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/
>>>
>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal)
>>> is selected on platforms that can have CPU/OS with RTM support.
>>>
>>> It also disables all RTM tests for any other platform that does not have a
>>> single compiler supporting RTM.
>>>
>>> RTM support was first added to the C2 compiler, and once the RTM checkers
>>> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM, they
>>> assume that a compiler supporting RTM is certainly available ("rtm" is
>>> advertised only if RTM is supported by both the CPU and the OS). Later the
>>> JVM began to allow the selection of a compiler other than C2, like Graal,
>>> so it became possible to select a compiler without RTM support even though
>>> both the CPU and the OS support RTM. Thus, for platforms supporting Graal or
>>> any other specific compiler, the compiler availability for the RTM tests
>>> must be adjusted, and if the selected compiler does not support RTM then
>>> all RTM tests must be skipped, including the ones meant for platforms
>>> without CPU or OS RTM support (e.g. *Unsupported*.java). In some cases,
>>> like TestUseRTMLockingOptionOnUnsupportedCPU.java, the test expects JVM
>>> initialization errors that will never occur, because the problem is not
>>> missing RTM support in the CPU or OS, but rather that the selected
>>> compiler does not support RTM.
>>>
>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to
>>> filter out compilers without RTM support for specific platforms and adapts
>>> the current RTM tests to use that new property.
>>>
>>> Nothing changes regarding the number of passing/selected tests for the
>>> various cpu/os/compiler combinations on platforms that currently might
>>> support RTM [1], except when Graal is in use.
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>
>>> [1]
>>>
>>> ** X64 w/ CPU and OS supporting RTM **
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>> Test results: passed: 30
>>>
>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>>> Test results: no tests selected (all RTM tests skipped)
>>>
>>> ** POWER8 w/ CPU and OS supporting RTM **
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>> Test results: passed: 30
>>>
>>> ** POWER7 wo/ CPU and OS supporting RTM **
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Test results: passed: 10
>>>
>>
> 

From nils.eliasson at oracle.com  Tue Sep  4 19:50:49 2018
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 4 Sep 2018 21:50:49 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <5B8E5F4B.5060707@oracle.com>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <dk6h8jc2j7w.fsf@rwestrel.remote.csb>
 <d1a91628-4cef-5a02-6b78-825cc096e79e@oracle.com>
 <5B8E5F4B.5060707@oracle.com>
Message-ID: <29357f36-4afa-9f98-5430-d12df410ca4d@oracle.com>

Looks good!

// Nils


On 2018-09-04 12:32, Erik Österlund wrote:
> Hi,
>
> Any more takers?
>
> Full:
> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/
>
> Thanks,
> /Erik
>
> On 2018-08-30 17:06, Erik Österlund wrote:
>> Hi Roland,
>>
>> Thank you for the review.
>>
>> On 2018-08-30 13:21, Roland Westrelin wrote:
>>> Hi Erik,
>>>
>>>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00
>>> make_load() already calls _gvn.transform(), right?
>>
>> Yes, you are right. I will remove the redundant _gvn.transform call on
>> the access_load.
>>
>>> You don't set MO_UNORDERED. Why is it not required?
>>
>> MO_UNORDERED is the default MO of loads and stores. It is set up in 
>> the C2Access object using fixup_decorators() which sets sane defaults 
>> for various decorators, including MO.
>>
>> Thanks,
>> /Erik
>>
>>> Roland.
>>
>
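Erik's point above, that an access which sets no memory-ordering decorator still ends up MO_UNORDERED because a fixup step supplies sane defaults, can be illustrated with a minimal C sketch. The bit values and the function name `fixup_memory_ordering` are hypothetical; this is not HotSpot's `fixup_decorators()`, only the defaulting idea it was described as performing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory-ordering decorator bits. The names mirror the MO_*
 * decorators mentioned in this thread, but the bit values are made up. */
typedef uint64_t DecoratorSet;

#define MO_UNORDERED ((DecoratorSet)1 << 0)
#define MO_VOLATILE  ((DecoratorSet)1 << 1)
#define MO_RELAXED   ((DecoratorSet)1 << 2)
#define MO_ACQUIRE   ((DecoratorSet)1 << 3)
#define MO_RELEASE   ((DecoratorSet)1 << 4)
#define MO_SEQ_CST   ((DecoratorSet)1 << 5)
#define MO_DECORATOR_MASK (MO_UNORDERED | MO_VOLATILE | MO_RELAXED | \
                           MO_ACQUIRE | MO_RELEASE | MO_SEQ_CST)

/* Hypothetical defaulting step: if the caller requested no memory
 * ordering at all, fall back to MO_UNORDERED; an explicit ordering and
 * any unrelated decorator bits are left untouched. */
static DecoratorSet fixup_memory_ordering(DecoratorSet ds) {
  if ((ds & MO_DECORATOR_MASK) == 0) {
    ds |= MO_UNORDERED;
  }
  return ds;
}
```

This is why a plain load or store built without an explicit MO_* bit does not need to spell out MO_UNORDERED itself.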


From erik.osterlund at oracle.com  Tue Sep  4 20:00:56 2018
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Tue, 4 Sep 2018 22:00:56 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <29357f36-4afa-9f98-5430-d12df410ca4d@oracle.com>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <dk6h8jc2j7w.fsf@rwestrel.remote.csb>
 <d1a91628-4cef-5a02-6b78-825cc096e79e@oracle.com>
 <5B8E5F4B.5060707@oracle.com>
 <29357f36-4afa-9f98-5430-d12df410ca4d@oracle.com>
Message-ID: <66BA2B51-3F67-4A15-ADC2-51529DEB14E9@oracle.com>

Hi Nils,

Thank you for the review!

/Erik

> On 4 Sep 2018, at 21:50, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> Looks good!
> 
> // Nils
> 
> 
>> On 2018-09-04 12:32, Erik Österlund wrote:
>> Hi,
>> 
>> Any more takers?
>> 
>> Full:
>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/
>> 
>> Thanks,
>> /Erik
>> 
>>> On 2018-08-30 17:06, Erik Österlund wrote:
>>> Hi Roland,
>>> 
>>> Thank you for the review.
>>> 
>>>> On 2018-08-30 13:21, Roland Westrelin wrote:
>>>> Hi Erik,
>>>> 
>>>>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00
>>>> make_load() already calls _gvn.transform(), right?
>>> 
>>> Yes, you are right. I will remove the redundant _gvn.transform call on the access_load.
>>> 
>>>> You don't set MO_UNORDERED. Why is it not required?
>>> 
>>> MO_UNORDERED is the default MO of loads and stores. It is set up in the C2Access object using fixup_decorators() which sets sane defaults for various decorators, including MO.
>>> 
>>> Thanks,
>>> /Erik
>>> 
>>>> Roland.
>>> 
>> 
> 


From gromero at linux.vnet.ibm.com  Tue Sep  4 22:03:22 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 4 Sep 2018 19:03:22 -0300
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <57ebd30a66504577a6b2ec267aee4b69@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
 <57ebd30a66504577a6b2ec267aee4b69@sap.com>
Message-ID: <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com>

Hi Martin and Michi,

On 09/04/2018 01:20 PM, Doerr, Martin wrote:
> Can you reproduce the test failures?
> 
> The very same VM works fine on a different Power8 machine which uses the same instructions by C2.
> 
> The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
> 
> I have seen several linux kernel changes regarding saving and restoring the VSX registers.
> 
> I still haven't found out how the kernel determines things like 'tsk->thread.used_vsr' which is used to set 'msr |= MSR_VEC'.
> 
> Maybe something is missing which tells the kernel that we're using it. But that's just a guess.

Facilities like FP (floating-point registers), VEC (vector registers, aka VMX/Altivec), and
VSX (vector-scalar registers) are usually disabled in a newly created process. Once
any instruction associated with these facilities is used in the process, it causes
an exception that is handled by the kernel [1, 2, 3]: the kernel enables the
facility that caused the exception (see load_up_fp & friends) and re-executes the
instruction when it returns control back to the process in userspace.

Starting from kernel v4.6 [4] there is a simple heuristic that employs an 8-bit
counter to help track whether a process, after using these facilities for the first
time, continues to use them. The counters (load_fp and load_vec) are
incremented on each context switch, and if the process stops using the FP or VEC
facilities they are disabled again, with FP/VEC/VSX save/restore on context
switches disabled as well. This improves context-switch performance by
avoiding the FP/VEC/VSX register save/restore.
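The counter heuristic described above can be approximated by a small C model. This is a simplified sketch based on the description in this message, not kernel code: `struct math_state`, `use_fp()` and `context_switch()` are hypothetical names, a single FP facility stands in for FP/VEC/VSX, and I assume the counter is reset to 1 whenever the facility-unavailable exception is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the lazy FP restore heuristic. */
struct math_state {
  bool     fp_enabled; /* stands in for MSR_FP in the user MSR */
  uint8_t  load_fp;    /* 8-bit counter; wraps from 255 back to 0 */
  unsigned traps;      /* "FP unavailable" exceptions taken so far */
};

/* The thread executes an FP instruction. */
static void use_fp(struct math_state *s) {
  if (!s->fp_enabled) {
    /* Facility-unavailable exception: the kernel enables FP, loads the
     * registers and re-executes the instruction (assumed to reset the
     * counter to 1). */
    s->traps++;
    s->fp_enabled = true;
    s->load_fp = 1;
  }
}

/* The thread is switched out and later switched back in. */
static void context_switch(struct math_state *s) {
  s->fp_enabled = false; /* state saved, facility off on switch-out */
  if (s->load_fp != 0) {
    /* Eager restore on switch-in and bump the counter; once it wraps to
     * 0 the eager restore stops and the next FP use traps again. */
    s->load_fp++;
    s->fp_enabled = true;
  }
}
```

In this model a thread that keeps using FP takes a trap roughly once every 256 context switches rather than on every switch, while a thread that stops using FP stops paying for eager restores once the counter wraps.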

Either way (before or after the change introduced in v4.6) *that mechanism is
opaque to userspace*, particularly to the process using these facilities. If a
given facility cannot be enabled by the kernel (because the CPU does not support
it), the kernel sends a SIGILL to the process. It's possible to inspect the thread
member dynamics/state from userspace using tools like 'systemtap' (for
example, this simple script can be used to inspect the VRSAVE register of a given
thread that is running a program called 'vrsave_' [5]) or using the 'perf' tool.

"tsk->thread.used_vsr" [6] is actually associated with the VSX facility, whilst
MSR_VEC is associated with the VEC/VMX/Altivec facility [7], so
"tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (in a new
process, after the load_fp and load_vec counters overflowed to zero and disabled
VSX, or if only FP or only VEC, but not both, had been used in the process).
In that case the kernel will also enable VSX via MSR |= MSR_VSX. A similar
mechanism drives the FP (MSR_FP) and VEC (MSR_VEC) facilities.

If both the FP and VEC facilities are used, the VSX facility is enabled automatically,
since the FP and VEC register sets together make up the VSX register set [8].
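The FP+VEC == VSX relationship mentioned above is also what makes a VSR-to-VR conversion, such as the to_vr function described in the original RFR, a simple offset. A minimal C sketch, assuming the Power ISA layout in which FPRn overlays doubleword 0 of VSRn (n = 0..31) and VRn is the same register as VSR(n+32); the helper names are hypothetical, not HotSpot's.

```c
#include <assert.h>
#include <stdbool.h>

/* 64 VSX registers = 32 FP-overlaid ones followed by 32 VR-overlaid ones. */
enum { NUM_FPRS = 32, NUM_VRS = 32, NUM_VSRS = NUM_FPRS + NUM_VRS };

/* VSR0-31: FPRn lives in doubleword 0 of VSRn. */
static bool vsr_overlaps_fpr(int vsr) { return vsr >= 0 && vsr < NUM_FPRS; }

/* VSR32-63: VSR(n+32) is exactly VRn. */
static bool vsr_overlaps_vr(int vsr) { return vsr >= NUM_FPRS && vsr < NUM_VSRS; }

/* Hypothetical to_vr-style helper: only valid for the VR half of the file. */
static int vsr_to_vr(int vsr) {
  assert(vsr_overlaps_vr(vsr));
  return vsr - NUM_FPRS;
}

/* Inverse mapping, VRn -> VSR(n+32). */
static int vr_to_vsr(int vr) {
  assert(vr >= 0 && vr < NUM_VRS);
  return vr + NUM_FPRS;
}
```

For example, VSR51, the top of the VSR32-51 range used by vecX in the patch, maps to VR19.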

Thus, as this mechanism is entirely opaque to userspace, I understand that if a
program has to tell the kernel it wants to use any of these facilities
(FP/VEC/VSX) before using them, then something is going wrong in kernelspace.

Martin and Michi, if you want any help drilling into this further on the kernel
side, please let me know; maybe I can help.

I haven't had the chance to reproduce the crash yet; if I find anything
meaningful about it tomorrow I'll keep you posted.


Kind regards,
Gustavo

[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869   (FP)
[2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec)
[3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX)
[4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239
[5] http://cr.openjdk.java.net/~gromero/script.d
[6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310
[7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250
[8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437

> Best regards,
> 
> Martin
> 
> *From:*Michihiro Horie <HORIE at jp.ibm.com>
> *Sent:* Dienstag, 4. September 2018 07:32
> *To:* Doerr, Martin <martin.doerr at sap.com>
> *Cc:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Goetz, Martin, and Gustavo,
> 
> 
>>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>>a compiler change.
>>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
>>while hotspot-compiler-dev is for
>>"Technical discussion about the development of the HotSpot bytecode compilers"
> Understood; I will use hotspot-compiler-dev for future RFRs, thanks.
> 
> 
>> Why do you rename vnoreg to vnoregi?
> I followed the naming used for vsnoreg and vsnoregi, but the renaming does not look necessary, so I will revert this part. Should I also rename vsnoregi to vsnoreg?
> 
> 
>>we noticed jtreg test failures when using this change:
>>compiler/runtime/safepoints/TestRegisterRestoring.java
>>compiler/runtime/Test7196199.java
>>
>>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>>
>>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
> Thank you for letting me know about the issue; I will try to reproduce it on a SUSE machine.
> 
> 
>>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
> Thank you for pointing out another issue. I do not currently hit this problem, but preventing the vector nodes from being matched makes sense to avoid the crash. I did not prepare match rules for the non-vector case, so in any case it might be better to prepare them similarly to the Replicate* rules.
> 
> 
> Gustavo, thanks for the wrap-up!
> 
> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
> 
> 
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
> Date: 2018/09/04 02:18
> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
> 
> 
> 
> 
> 
> Hi Gustavo and Michihiro,
> 
> we noticed jtreg test failures when using this change:
> compiler/runtime/safepoints/TestRegisterRestoring.java
> compiler/runtime/Test7196199.java
> 
> TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
> 
> We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
> 
> That's what I found out so far. Maybe you have an idea?
> 
> I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
> 
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
> Sent: Montag, 3. September 2018 14:57
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Goetz,
> 
> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>> Also, I can not find all of the mail traffic in
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>> Is this a problem of the pipermail server?
>> 
>> For some reason this webrev lacks the links to browse the diffs.
>> Do you need to use a more recent webrev?  You can obtain it with
>> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
> 
> Yes, probably it was a problem with pipermail or with some relay.
> I noticed the same thing, i.e. at least one of Michi's replies reached
> me but is missing from the ML archive.
> 
> The initial discussion is here:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
> 
> I understand Martin reviewed the last webrev in that thread, which is
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
> 
> Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
> 
> and Michi's reply to Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
> 
> and your last review:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
> 
> 
> HTH.
> 
> Best regards,
> Gustavo
> 
>> Why do you rename vnoreg to vnoregi?
>> 
>> Besides that the change is fine, thanks for implementing this!
>> 
>> Best regards,
>>    Goetz.
>> 
>> 
>>> -----Original Message-----
>>> From: Doerr, Martin
>>> Sent: Dienstag, 28. August 2018 19:35
>>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michihiro,
>>>
>>> thank you for implementing it. I have just taken a first look at your
>>> webrev.01.
>>>
>>> It looks basically good. Only the Power version check seems to be incorrect.
>>> VM_Version::has_popcntb() checks for Power5.
>>> I believe most instructions are available with Power7.
>>> Some of them (vsubudm, ..., vmuluwm, vpopcntw) were introduced with
>>> Power8?
>>> We should check this carefully.
>>>
>>> Also, indentation in register_ppc.hpp could get improved.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>> Sent: Donnerstag, 26. Juli 2018 16:02
>>> To: Michihiro Horie <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michi,
>>>
>>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>>> I updated webrev:
>>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>>
>>> Thanks for providing an updated webrev and for fixing indentation and
>>> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>>
>>>
>>> Best Regards,
>>> Gustavo
>>>
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro,
>>>> IBM Research - Tokyo
>>>>
>>>>
>>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
>>>> Date: 2018/07/25 23:05
>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>>
>>>>
>>>>
>>>> Hi Michi,
>>>>
>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>>   > Dear all,
>>>>   >
>>>>   > Would you review the following change?
>>>>   > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>>>   > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>>>   >
>>>>   > This change adds support for vectorized arithmetic calculation with SLP.
>>>>   >
>>>>   > The to_vr function is added to convert VSR to VR. Currently, vecX is
>>>>   > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>>>>   > which exactly overlap the VRs. Instruction APIs receiving VRs use to_vr
>>>>   > via vecX. Another thing is the change in sqrtF_reg to enable matching
>>>>   > with SqrtVF. I think the change in sqrtF_reg would be fine due to
>>>>   > ConvD2FNode::Value in convertnode.cpp.
>>>>
>>>> Looks good. Just a few comments:
>>>>
>>>> - In vmul4F_reg(), would it be reasonable to use xvmulsp instead of
>>>>     vmaddfp in order to avoid the splat?
>>>>
>>>> - Although all instructions added by your change were introduced in
>>>>     ISA 2.06, so POWER7 and above are OK, I see probes for
>>>>     PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), so I'm
>>>>     wondering whether there is any control point to guarantee that these
>>>>     instructions won't be emitted on a CPU that does not support them.
>>>>
>>>> - I think that, in general, strings in format %{} are upper case. For
>>>>     instance, this is the current output on optoassembly for vmul4F:
>>>>
>>>> 2941835 5b4     ADDI    R24, R24, #64
>>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>>>>
>>>>     I think it would be better in upper case instead. I also think that if
>>>>     the node match emits more than one instruction, all instructions must
>>>>     be listed in format %{}, since it's meant for detailed debugging.
>>>>     Finally, I think it would be better to replace \t! with \t// in that
>>>>     string (unless I'm missing some special meaning of that char). So for
>>>>     vmul4F it would be something like:
>>>>
>>>> 2941835 5b4     ADDI      R24, R24, #64
>>>>                   VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>>>>
>>>>
>>>> But feel free to change anything just after you get additional reviews :)
>>>>
>>>>
>>>>   > I confirmed this change with JTREG. In addition, I used the attached
>>>>   > micro benchmarks.
>>>>   > /(See attached file: slp_microbench.zip)/
>>>>
>>>> Thanks for sharing it.
>>>> Btw, another option would be to host it on the CR
>>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>   >
>>>>   > Best regards,
>>>>   > --
>>>>   > Michihiro,
>>>>   > IBM Research - Tokyo
>>>>   >
>>>>
>>>>
>>>>
>> 
> 
> 
> 
> 
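The question raised in the quoted review above, namely which POWER generation each new instruction requires (Martin's note that some need Power8, Gustavo's question about a control point that keeps them off older CPUs), amounts to checking a minimum ISA level before emitting an instruction. A minimal C sketch follows; the table, `can_emit()` and the numeric level encoding are hypothetical, and the levels reflect my reading of the Power ISA (2.06 = POWER7, 2.07 = POWER8), so verify them against the ISA book. In HotSpot such a gate would presumably be expressed as a predicate on the match rules rather than a lookup like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical ISA level encoding: 206 = ISA 2.06, 207 = ISA 2.07. */
typedef enum { ISA_2_06 = 206, ISA_2_07 = 207 } IsaLevel;

struct insn_req {
  const char *mnemonic;
  IsaLevel    min_isa;
};

/* A few instructions named in this thread (assumed minimum levels). */
static const struct insn_req kReqs[] = {
  { "xvmulsp",  ISA_2_06 }, /* VSX single-precision vector multiply */
  { "vsubudm",  ISA_2_07 }, /* doubleword subtract modulo */
  { "vmuluwm",  ISA_2_07 }, /* word multiply modulo */
  { "vpopcntw", ISA_2_07 }, /* word population count */
};

/* True if a CPU at cpu_isa may execute the instruction; unknown
 * mnemonics are conservatively rejected. */
static bool can_emit(const char *mnemonic, int cpu_isa) {
  for (size_t i = 0; i < sizeof kReqs / sizeof kReqs[0]; i++) {
    if (strcmp(kReqs[i].mnemonic, mnemonic) == 0) {
      return cpu_isa >= (int)kReqs[i].min_isa;
    }
  }
  return false;
}
```

Under these assumptions a POWER7 (ISA 2.06) CPU could use xvmulsp but not vpopcntw, which is the kind of distinction a single has_popcntb()-style check cannot make.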

From rwestrel at redhat.com  Wed Sep  5 08:05:00 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 05 Sep 2018 10:05:00 +0200
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
Message-ID: <dk65zzkz7wz.fsf@rwestrel.remote.csb>


Thanks for the review. Anyone else?

Roland.

From rwestrel at redhat.com  Wed Sep  5 08:06:06 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 05 Sep 2018 10:06:06 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <5B8E5F4B.5060707@oracle.com>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <dk6h8jc2j7w.fsf@rwestrel.remote.csb>
 <d1a91628-4cef-5a02-6b78-825cc096e79e@oracle.com>
 <5B8E5F4B.5060707@oracle.com>
Message-ID: <dk636uoz7v5.fsf@rwestrel.remote.csb>


Hi Erik,

> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/

This one still has useless _gvn.transform() calls.

Roland.
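Roland's remark about redundant _gvn.transform() calls rests on value numbering being idempotent: transforming a node that is already the canonical value simply returns it. A toy C sketch of that property follows; `Node`, `transform()` and the linear lookup table are hypothetical simplifications and not C2's PhaseGVN, which does considerably more work per node.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: an opcode plus up to two inputs. */
typedef struct Node {
  int op;
  const struct Node *in1, *in2;
} Node;

#define TABLE_SIZE 64
static const Node *table[TABLE_SIZE]; /* canonical nodes seen so far */
static size_t table_len;

static int same_shape(const Node *a, const Node *b) {
  return a->op == b->op && a->in1 == b->in1 && a->in2 == b->in2;
}

/* Return the canonical node for n's (op, in1, in2) shape. Calling it again
 * on its own result finds the same entry, so a second call is redundant. */
static const Node *transform(const Node *n) {
  for (size_t i = 0; i < table_len; i++) {
    if (same_shape(table[i], n)) {
      return table[i]; /* reuse the existing value */
    }
  }
  assert(table_len < TABLE_SIZE);
  table[table_len++] = n; /* first occurrence becomes canonical */
  return n;
}
```

In the toy, a load that was already transformed once returns unchanged from a second transform() call, so removing that extra call changes nothing but wasted work.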

From erik.osterlund at oracle.com  Wed Sep  5 08:16:19 2018
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Wed, 5 Sep 2018 10:16:19 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <dk636uoz7v5.fsf@rwestrel.remote.csb>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <dk6h8jc2j7w.fsf@rwestrel.remote.csb>
 <d1a91628-4cef-5a02-6b78-825cc096e79e@oracle.com>
 <5B8E5F4B.5060707@oracle.com> <dk636uoz7v5.fsf@rwestrel.remote.csb>
Message-ID: <5B8F90D3.5000000@oracle.com>

Hi Roland,

On 2018-09-05 10:06, Roland Westrelin wrote:
> Hi Erik,
>
>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/
> This one still has useless _gvn.transform() calls.

Fixed.

Full:
http://cr.openjdk.java.net/~eosterlund/8210158/webrev.02

Incremental:
http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01_02

Thanks,
/Erik

>
> Roland.


From rwestrel at redhat.com  Wed Sep  5 08:16:35 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 05 Sep 2018 10:16:35 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <5B8F90D3.5000000@oracle.com>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <dk6h8jc2j7w.fsf@rwestrel.remote.csb>
 <d1a91628-4cef-5a02-6b78-825cc096e79e@oracle.com>
 <5B8E5F4B.5060707@oracle.com> <dk636uoz7v5.fsf@rwestrel.remote.csb>
 <5B8F90D3.5000000@oracle.com>
Message-ID: <dk6wos0xst8.fsf@rwestrel.remote.csb>


> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.02

Looks good. Thank you.

Roland.

From erik.osterlund at oracle.com  Wed Sep  5 08:20:46 2018
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Wed, 5 Sep 2018 10:20:46 +0200
Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics
In-Reply-To: <dk6wos0xst8.fsf@rwestrel.remote.csb>
References: <c4823bb8-ebb1-62ea-153e-3ea3311d1a61@oracle.com>
 <dk6h8jc2j7w.fsf@rwestrel.remote.csb>
 <d1a91628-4cef-5a02-6b78-825cc096e79e@oracle.com>
 <5B8E5F4B.5060707@oracle.com> <dk636uoz7v5.fsf@rwestrel.remote.csb>
 <5B8F90D3.5000000@oracle.com> <dk6wos0xst8.fsf@rwestrel.remote.csb>
Message-ID: <5B8F91DE.3040709@oracle.com>

Hi Roland,

Thank you for the review.

/Erik

On 2018-09-05 10:16, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.02
> Looks good. Thank you.
>
> Roland.


From vladimir.x.ivanov at oracle.com  Wed Sep  5 09:22:05 2018
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 5 Sep 2018 12:22:05 +0300
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <dk67ek311xv.fsf@rwestrel.remote.csb>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
Message-ID: <3c8ae9e3-e3f2-df0e-0add-4d1c54589198@oracle.com>


> http://cr.openjdk.java.net/~roland/8209544/webrev.01/

Looks good.

Best regards,
Vladimir Ivanov

From HORIE at jp.ibm.com  Wed Sep  5 10:22:57 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Wed, 5 Sep 2018 19:22:57 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
 <57ebd30a66504577a6b2ec267aee4b69@sap.com>
 <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com>
Message-ID: <OFD4AAA261.28219E95-ON002582FF.00371825-492582FF.00390865@notes.na.collabserv.com>


Hi Martin, Gustavo,

I still cannot reproduce the problem. I noticed that the machine I have is not
SUSE but openSUSE, with kernel 4.1.21-14-default. I have also tried kernel
4.4.0-31-generic, but on Ubuntu.

Gustavo, is there any suspicious change around v4.4, the kernel version on
which Martin got the crash?


Apart from the problem, I uploaded the latest webrev:
http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/


Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	Gustavo Romero <gromero at linux.vnet.ibm.com>
To:	"Doerr, Martin" <martin.doerr at sap.com>, Michihiro Horie/Japan/IBM at IBMJP
Cc:	"Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Date:	2018/09/05 07:03
Subject:	Re: RFR: 8208171: PPC64: Enrich SLP support


Hi Martin and Michi,

On 09/04/2018 01:20 PM, Doerr, Martin wrote:
> Can you reproduce the test failures?
>
> The very same VM works fine on a different Power8 machine which uses the same instructions by C2.
>
> The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
>
> I have seen several linux kernel changes regarding saving and restoring the VSX registers.
>
> I still haven't found out how the kernel determines things like 'tsk->thread.used_vsr' which is used to set 'msr |= MSR_VEC'.
>
> Maybe something is missing which tells the kernel that we're using it. But that's just a guess.

Facilities like FP (floating-point registers), VEC (vector registers, aka VMX/Altivec), and
VSX (vector-scalar registers) are usually disabled in a newly created process. Once
any instruction associated with these facilities is used in the process, it causes
an exception that is handled by the kernel [1, 2, 3]: the kernel enables the
facility that caused the exception (see load_up_fp & friends) and re-executes the
instruction when it returns control back to the process in userspace.

Starting from kernel v4.6 [4] there is a simple heuristic that employs an
8-bit counter to help track whether a process, after using these
facilities for the first time, continues to use them. The counters
(load_fp and load_vec) are incremented on each context switch, and if the
process stops using the FP or VEC facilities they are disabled again. The
FP/VEC/VSX save/restore on context switches is then disabled as well,
which improves context-switch performance by avoiding the FP/VEC/VSX
register save/restore.
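Below is a rough model of the v4.6 heuristic. It rests only on the description above and the commit in [4], so treat it as a simplified sketch. One detail is an assumption of this model rather than confirmed kernel behaviour: how the counter gets re-armed after a trap. Only FP is modelled; load_vec would behave the same way.

```python
class LazyRestoreModel:
    """Simplified model of the v4.6+ heuristic for the FP facility only.

    load_fp behaves like an 8-bit (u8) counter: while it is non-zero the
    FP state is restored eagerly on every context switch and the counter
    is incremented; once it wraps to zero, the eager restore stops and FP
    is left disabled until the next FP instruction traps.
    """

    def __init__(self):
        self.load_fp = 0        # 0 means "do not restore FP eagerly"
        self.fp_enabled = False
        self.traps = 0

    def use_fp(self):
        if not self.fp_enabled:
            # Facility-unavailable exception. Assumption of this model:
            # the trap re-arms the counter so later switches restore eagerly.
            self.traps += 1
            self.fp_enabled = True
            if self.load_fp == 0:
                self.load_fp = 1

    def context_switch(self):
        if self.load_fp:
            # Eager restore on switch-in; the counter wraps 255 -> 0.
            self.fp_enabled = True
            self.load_fp = (self.load_fp + 1) & 0xFF
        else:
            # Counter is zero: no FP save/restore, FP stays disabled.
            self.fp_enabled = False


m = LazyRestoreModel()
m.use_fp()              # first FP use traps once and arms the counter
for _ in range(256):    # 255 switches wrap the counter; the 256th drops FP
    m.context_switch()
print(m.fp_enabled, m.traps)  # False 1
m.use_fp()              # FP is still in use, so it traps again
print(m.fp_enabled, m.traps)  # True 2
```

In this model, a process that keeps using FP pays one extra trap roughly every 256 context switches. A process that stops using FP stops paying the save/restore cost.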

Either way (before or after the change introduced in v4.6) *that mechanism
is opaque to userspace*, particularly to the process using these
facilities. If the CPU does not support a given facility, the kernel does
not enable it and sends a SIGILL to the process instead. It's possible to
inspect the thread member dynamics/state from userspace using tools like
'systemtap' (for example, this simple script can be used to inspect the
VRSAVE register of a given thread running a program called 'vrsave_' [5])
or the 'perf' tool.

"tsk->thread.used_vsr" [6] is actually associated with the VSX facility,
whilst MSR_VEC is associated with the VEC/VMX/Altivec facility [7]. So
"tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's
a new process, or if the load_fp and load_vec counters overflowed and
became zero, disabling VSX, or if only FP or only VEC, not both, were used
in the process). In that case the kernel will also enable VSX via
MSR |= MSR_VSX. A similar mechanism drives the FP (MSR_FP) and VEC
(MSR_VEC) facilities.

If both the FP and VEC facilities are used, the VSX facility is enabled
automatically, since the FP + VEC register sets == the VSX register set [8].
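The register overlap behind "FP + VEC == VSX" can be written as an index mapping:

- VSR0-31 overlay FPR0-31, with each FPR being the high-order doubleword of its VSR.
- VSR32-63 are VR0-31.

The same mapping explains why the VSR32-51 range used for vecX in the original RFR lines up with VR0-19. Both vsr_to_underlying() and to_vr() below are hypothetical Python illustrations, not the HotSpot C++ code.

```python
def vsr_to_underlying(vsr):
    """Map a VSX register number (0-63) to the legacy register it overlays."""
    if not 0 <= vsr <= 63:
        raise ValueError("VSR index must be in 0..63")
    if vsr < 32:
        # FPR n is the high-order doubleword of VSR n (FP facility).
        return ("FPR", vsr)
    # VR n is all 128 bits of VSR n+32 (VEC/VMX/Altivec facility).
    return ("VR", vsr - 32)


def to_vr(vsr):
    """Hypothetical helper in the spirit of to_vr: VSR32..63 -> VR0..31."""
    kind, idx = vsr_to_underlying(vsr)
    if kind != "VR":
        raise ValueError(f"VSR{vsr} does not overlay a VR")
    return idx


# Both halves together cover every VSX register exactly once.
covered = {vsr_to_underlying(n) for n in range(64)}
print(len(covered))                  # 64
print([to_vr(n) for n in (32, 51)])  # [0, 19]
```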

Thus, as this mechanism is entirely opaque to userspace, my understanding
is that if a program has to tell the kernel it wants to use any of these
facilities (FP/VEC/VSX) before using them, something is going wrong in
kernel space.

Martin and Michi, if you want any help drilling further into this on the
kernel side, please let me know; maybe I can help.

I haven't had the chance to reproduce the crash yet; if I find anything
meaningful about it tomorrow, I'll keep you posted.


Kind regards,
Gustavo

[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP)
[2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec)
[3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX)
[4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239
[5] http://cr.openjdk.java.net/~gromero/script.d
[6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310
[7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250
[8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437


> Best regards,
>
> Martin
>
> *From:* Michihiro Horie <HORIE at jp.ibm.com>
> *Sent:* Tuesday, 4 September 2018 07:32
> *To:* Doerr, Martin <martin.doerr at sap.com>
> *Cc:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support
>
> Hi Goetz, Martin, and Gustavo,
>
>
>>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>>a compiler change.
>>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
>>while hotspot-compiler-dev is for
>>"Technical discussion about the development of the HotSpot bytecode compilers"
> I understood the instruction and will use hotspot-compiler-dev for future RFRs, thanks.
>
>
>> Why do you rename vnoreg to vnoregi?
> I followed the coding convention of vsnoreg and vsnoregi, but the renaming does not look necessary. I will revert this part. Should I also rename vsnoregi to vsnoreg?
>
>
>>we noticed jtreg test failures when using this change:
>>compiler/runtime/safepoints/TestRegisterRestoring.java
>>compiler/runtime/Test7196199.java
>>
>>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>>
>>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
> Thank you for letting me know about the issue; I will try to reproduce it on a SUSE machine.
>
>
>>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
> Thank you for pointing out another issue. Currently I do not hit this problem, but preventing the vector nodes from being matched makes sense to avoid the crash. In any case, since I did not prepare match rules for non-vector nodes, it might be better to prepare them similarly to the Replicate* rules.
>
>
> Gustavo, thanks for the wrap-up!
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
>
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
> Date: 2018/09/04 02:18
> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>
>
>
>
>
>
> Hi Gustavo and Michihiro,
>
> we noticed jtreg test failures when using this change:
> compiler/runtime/safepoints/TestRegisterRestoring.java
> compiler/runtime/Test7196199.java
>
> TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>
> We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>
> That's what I found out so far. Maybe you have an idea?
>
> I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
> Sent: Monday, 3 September 2018 14:57
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
> Hi Goetz,
>
> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>> Also, I cannot find all of the mail traffic in
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>> Is this a problem of the pipermail server?
>>
>> For some reason this webrev lacks the links to browse the diffs.
>> Do you need to use a more recent webrev?  You can obtain it with
>> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
>
> Yes, probably it was a problem of the pipermail or in some relay.
> I noted the same thing, i.e. at least one Michi reply arrived
> to me but missed a ML.
>
> The initial discussion is here:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
>
> I understand Martin reviewed the last webrev in that thread, which is
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
>
> Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
>
> and Michi's reply to Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
>
> and your last review:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
>
>
> HTH.
>
> Best regards,
> Gustavo
>
>> Why do you rename vnoreg to vnoregi?
>>
>> Besides that the change is fine, thanks for implementing this!
>>
>> Best regards,
>>    Goetz.
>>
>>
>>> -----Original Message-----
>>> From: Doerr, Martin
>>> Sent: Tuesday, 28 August 2018 19:35
>>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michihiro,
>>>
>>> thank you for implementing it. I have just taken a first look at your
>>> webrev.01.
>>>
>>> It looks basically good. Only the Power version check seems to be incorrect.
>>> VM_Version::has_popcntb() checks for Power5.
>>> I believe most instructions are available with Power7.
>>> Some (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>>> Power8?
>>> We should check this carefully.
>>>
>>> Also, indentation in register_ppc.hpp could be improved.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>> Sent: Thursday, 26 July 2018 16:02
>>> To: Michihiro Horie <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michi,
>>>
>>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>>> I updated webrev:
>>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>>
>>> Thanks for providing an updated webrev and for fixing indentation and
>>> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>>
>>>
>>> Best Regards,
>>> Gustavo
>>>
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro,
>>>> IBM Research - Tokyo
>>>>
>>>>
>>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
>>>> Date: 2018/07/25 23:05
>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Hi Michi,
>>>>
>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>>   > Dear all,
>>>>   >
>>>>   > Would you review the following change?
>>>>   > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>>>   > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>>>   >
>>>>   > This change adds support for vectorized arithmetic calculation with SLP.
>>>>   >
>>>>   > The to_vr function is added to convert VSR to VR. Currently, vecX is
>>>>   > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>>>>   > which are exactly overlapped with VRs. Instruction APIs receiving VRs use the
>>>>   > to_vr via vecX. Another thing is the change in sqrtF_reg to enable the
>>>>   > matching with SqrtVF. I think the change in sqrtF_reg would be fine due to
>>>>   > the ConvD2FNode::Value in convertnode.cpp.
>>>>
>>>> Looks good. Just a few comments:
>>>>
>>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in
>>>>     order to avoid the splat?
>>>>
>>>> - Although all instructions added by your change were introduced in ISA 2.06,
>>>>     so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in
>>>>     vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to
>>>>     guarantee that these instructions won't be emitted on a CPU that does not
>>>>     support them.
>>>>
>>>> - I think that in general strings in format %{} are in upper case. For instance,
>>>>     this is the current output on optoassembly for vmul4F:
>>>>
>>>> 2941835 5b4     ADDI    R24, R24, #64
>>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>>>>
>>>>     I think it would be better to be in upper case instead. I also think that if
>>>>     the node match emits more than one instruction, all instructions must be listed
>>>>     in format %{}, since it's meant for detailed debugging. Finally, I think it
>>>>     would be better to replace \t! by \t// in that string (unless I'm missing any
>>>>     special meaning for that char). So for vmul4F it would be something like:
>>>>
>>>> 2941835 5b4     ADDI      R24, R24, #64
>>>>                   VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>>>>
>>>>
>>>> But feel free to change anything just after you get additional reviews :)
>>>>
>>>>
>>>>   > I confirmed this change with JTREG. In addition, I used attached micro
>>>>   > benchmarks.
>>>>   > /(See attached file: slp_microbench.zip)/
>>>>
>>>> Thanks for sharing it.
>>>> Btw, another option to host it would be in the CR
>>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>   >
>>>>   > Best regards,
>>>>   > --
>>>>   > Michihiro,
>>>>   > IBM Research - Tokyo
>>>>   >
>>>>
>>>>
>>>>
>>
>
>
>
>



From dmitry.chuyko at bell-sw.com  Wed Sep  5 15:50:34 2018
From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko)
Date: Wed, 5 Sep 2018 18:50:34 +0300
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <dk65zzkz7wz.fsf@rwestrel.remote.csb>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
Message-ID: <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>

I made a few runs on ThunderX2 (aarch64). Interestingly, I see almost the
reverse difference in small.AESBench.encrypt: a ~4% regression for both
-XX:-UseSwitchProfiling and the patched version against the current code.
No difference for full.AESBench.encrypt.

Stub code is the same and profiles differ slightly:

Mainline
  53.91%  runtime stub  StubRoutines::aescrypt_encryptBlock (128 bytes)
  29.76%  runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
   7.64%  c2, level 4   com.sun.crypto.provider.CipherCore::doFinal, version 868 (356 bytes)

-XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling
  57.08%  runtime stub  StubRoutines::aescrypt_encryptBlock (132 bytes)
  26.95%  runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
   7.85%  c2, level 4   com.sun.crypto.provider.CipherCore::doFinal, version 860 (384 bytes)

Patched
  58.15%  runtime stub  StubRoutines::aescrypt_encryptBlock (132 bytes)
  26.44%  runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
   6.67%  c2, level 4   com.sun.crypto.provider.CipherCore::doFinal, version 866 (128 bytes)

-Dmitry

On 09/05/2018 11:05 AM, Roland Westrelin wrote:
> Thanks for the review. Anyone else?
>
> Roland.


From vladimir.kozlov at oracle.com  Wed Sep  5 16:00:44 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 5 Sep 2018 09:00:44 -0700
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
Message-ID: <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>

Hi Dmitry,

What are the (* bytes) values? Is it bytecode size? Why is it different?

Thanks,
Vladimir

On 9/5/18 8:50 AM, Dmitry Chuyko wrote:
> I made few runs on ThunderX2 (aarch64). It is funny but I see almost reverse difference in small.AESBench.encrypt: ~4% 
> regression for both -XX:-UseSwitchProfiling and patched version against current code. No difference for 
> full.AESBench.encrypt.
> 
> Stub code is the same and profiles differ slightly:
> 
> Mainline
>   53.91%  runtime stub  StubRoutines::aescrypt_encryptBlock (128 bytes)
>   29.76%  runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>    7.64%  c2, level 4   com.sun.crypto.provider.CipherCore::doFinal, version 868 (356 bytes)
>
> -XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling
>   57.08%  runtime stub  StubRoutines::aescrypt_encryptBlock (132 bytes)
>   26.95%  runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>    7.85%  c2, level 4   com.sun.crypto.provider.CipherCore::doFinal, version 860 (384 bytes)
>
> Patched
>   58.15%  runtime stub  StubRoutines::aescrypt_encryptBlock (132 bytes)
>   26.44%  runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>    6.67%  c2, level 4   com.sun.crypto.provider.CipherCore::doFinal, version 866 (128 bytes)
> 
> -Dmitry
> 
> On 09/05/2018 11:05 AM, Roland Westrelin wrote:
>> Thanks for the review. Anyone else?
>>
>> Roland.
> 

From gromero at linux.vnet.ibm.com  Wed Sep  5 16:20:31 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 5 Sep 2018 13:20:31 -0300
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
Message-ID: <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com>

Hi Martin,

On 09/03/2018 02:18 PM, Doerr, Martin wrote:
> Hi Gustavo and Michihiro,
> 
> we noticed jtreg test failures when using this change:
> compiler/runtime/safepoints/TestRegisterRestoring.java
> compiler/runtime/Test7196199.java
> 
> TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.

Just to confirm I understood the description correctly:

Were you able to check that it returns random values for the array
instead of 10_000, or did you just check that the test failed?

Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm asking
because I can make that test fail with a timeout, but I'm not sure if
it's the same failure you got. Look (I'm using the same kernel as yours):

http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt


Thank you.

Best regards,
Gustavo

> We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
> 
> That's what I found out so far. Maybe you have an idea?
> 
> I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
> 
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
> Sent: Montag, 3. September 2018 14:57
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Goetz,
> 
> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>> Also, I can not find all of the mail traffic in
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>> Is this a problem of the pipermail server?
>>
>> For some reason this webrev lacks the links to browse the diffs.
>> Do you need to use a more recent webrev?  You can obtain it with
>> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
> 
> Yes, probably it was a problem of the pipermail or in some relay.
> I noted the same thing, i.e. at least one Michi reply arrived
> to me but missed a ML.
> 
> The initial discussion is here:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
> 
> I understand Martin reviewed the last webrev in that thread, which is
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
> 
> Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
> 
> and Michi's reply to Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
> 
> and your last review:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
> 
> 
> HTH.
> 
> Best regards,
> Gustavo
>    
>> Why do you rename vnoreg to vnoregi?
>>
>> Besides that the change is fine, thanks for implementing this!
>>
>> Best regards,
>>     Goetz.
>>
>>
>>> -----Original Message-----
>>> From: Doerr, Martin
>>> Sent: Dienstag, 28. August 2018 19:35
>>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie
>>> <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
>>> <volker.simonis at sap.com>
>>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michihiro,
>>>
>>> thank you for implementing it. I have just taken a first look at your
>>> webrev.01.
>>>
>>> It looks basically good. Only the Power version check seems to be incorrect.
>>> VM_Version::has_popcntb() checks for Power5.
>>> I believe most instructions are available with Power7.
>>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>>> Power8?
>>> We should check this carefully.
>>>
>>> Also, indentation in register_ppc.hpp could get improved.
>>>
>>> Thanks and best regard,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>> Sent: Donnerstag, 26. Juli 2018 16:02
>>> To: Michihiro Horie <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>> dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-
>>> port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michi,
>>>
>>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>>> I updated webrev:
>>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>>
>>> Thanks for providing an updated webrev and for fixing indentation and
>>> function
>>> order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>>
>>>
>>> Best Regards,
>>> Gustavo
>>>
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro,
>>>> IBM Research - Tokyo
>>>>
>>>> Inactive hide details for Gustavo Romero ---2018/07/25 23:05:32---Hi Michi,
>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:Gustavo Romero ---
>>> 2018/07/25 23:05:32---Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie
>>> wrote:
>>>>
>>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-
>>> dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
>>> <martin.doerr at sap.com>
>>>> Date: 2018/07/25 23:05
>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>> -------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> ----------------------------------------------------------------------------------------------
>>> -----------------------------------------------------
>>>>
>>>>
>>>>
>>>> Hi Michi,
>>>>
>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>>    > Dear all,
>>>>    >
>>>>    > Would you review the following change?
>>>>    > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>>>    > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>>>    >
>>>>    > This change adds support for vectorized arithmetic calculation with SLP.
>>>>    >
>>>>    > The to_vr function is added to convert VSR to VR. Currently, vecX is
>>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the
>>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the
>>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to
>>> the ConvD2FNode::Value in convertnode.cpp.
>>>>
>>>> Looks good. Just a few comments:
>>>>
>>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of
>>> vmaddfp in
>>>>      order to avoid the splat?
>>>>
>>>> - Although all instructions added by your change where introduced in ISA
>>> 2.06,
>>>>      so POWER7 and above are OK, as I see probes for
>>> PowerArchictecturePPC64=6|5 in
>>>>      vm_version_ppc.cpp (line 64),  I'm wondering if there is any control point
>>> to
>>>>      guarantee that these instructions won't be emitted on a CPU that does
>>> not
>>>>      support them.
>>>>
>>>> - I think that in general string in format %{} are in upper case. For instance,
>>>>      this the current output on optoassembly for vmul4F:
>>>>
>>>> 2941835 5b4     ADDI    R24, R24, #64
>>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>>>>
>>>>      I think it would be better to be in upper case instead. I also think that if
>>>>      the node match emits more than one instruction all instructions must be
>>> listed
>>>>      in format %{}, since it's meant for detailed debugging. Finally I think it
>>>>      would be better to replace \t! by \t// in that string (unless I'm missing any
>>>>      special meaning for that char). So for vmul4F it would be something like:
>>>>
>>>> 2941835 5b4     ADDI      R24, R24, #64
>>>>                    VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>>>>
>>>>
>>>> But feel free to change anything just after you get additional reviews :)
>>>>
>>>>
>>>>    > I confirmed this change with JTREG. In addition, I used the attached micro
>>>>    > benchmarks.
>>>>    > /(See attached file: slp_microbench.zip)/
>>>>
>>>> Thanks for sharing it.
>>>> Btw, another option to host it would be in the CR
>>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>    >
>>>>    > Best regards,
>>>>    > --
>>>>    > Michihiro,
>>>>    > IBM Research - Tokyo
>>>>    >
>>>>
>>>>
>>>>
>>
> 


From dmitry.chuyko at bell-sw.com  Wed Sep  5 16:24:51 2018
From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko)
Date: Wed, 5 Sep 2018 19:24:51 +0300
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
Message-ID: <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>

On 09/05/2018 07:00 PM, Vladimir Kozlov wrote:
> Hi Dmitry,
>
> What are (* bytes) values? Is it bytecode size? Why it is different?
It is the distance between the first and last captured event addresses in
a particular hot region.
In the 132-byte case, perf attributes one more instruction (two instructions
further down); it is just the comparison with 52 (0.37%). The code is the
same, so this doesn't look too suspicious to me. But the different
percentages for the stub parts do. Note that the percentage distribution
after inlining looks the same, e.g.

....[Hottest Methods (after inlining)]..............................................................
 83.67%        runtime stub  StubRoutines::aescrypt_encryptBlock
  7.69%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 868
  4.34%         c2, level 4  org.openjdk.bench.javax.crypto.small.generated.AESBench_encrypt_jmhTest::encrypt_thrpt_jmhStub, version 889

and

 84.03%        runtime stub  StubRoutines::aescrypt_encryptBlock
  7.85%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 860
  4.22%         c2, level 4  org.openjdk.bench.javax.crypto.small.generated.AESBench_encrypt_jmhTest::encrypt_thrpt_jmhStub, version 880

-Dmitry

>
> Thanks,
> Vladimir
>
> On 9/5/18 8:50 AM, Dmitry Chuyko wrote:
>> I made a few runs on ThunderX2 (aarch64). Funnily enough, I see almost the
>> reverse difference in small.AESBench.encrypt: a ~4% regression for both
>> -XX:-UseSwitchProfiling and the patched version against the current code.
>> No difference for full.AESBench.encrypt.
>>
>> The stub code is the same and the profiles differ slightly:
>>
>> Mainline
>>   53.91%        runtime stub  StubRoutines::aescrypt_encryptBlock (128 bytes)
>>   29.76%        runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>>    7.64%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 868 (356 bytes)
>>
>> -XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling
>>   57.08%        runtime stub  StubRoutines::aescrypt_encryptBlock (132 bytes)
>>   26.95%        runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>>    7.85%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 860 (384 bytes)
>>
>> Patched
>>   58.15%        runtime stub  StubRoutines::aescrypt_encryptBlock (132 bytes)
>>   26.44%        runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>>    6.67%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 866 (128 bytes)
>> -Dmitry
>>
>> On 09/05/2018 11:05 AM, Roland Westrelin wrote:
>>> Thanks for the review. Anyone else?
>>>
>>> Roland.
>>


From dmitry.chuyko at bell-sw.com  Wed Sep  5 17:11:44 2018
From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko)
Date: Wed, 5 Sep 2018 20:11:44 +0300
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
 <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
Message-ID: <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>

On 09/05/2018 07:24 PM, Dmitry Chuyko wrote:
> On 09/05/2018 07:00 PM, Vladimir Kozlov wrote:
>> Hi Dmitry,
>>
>> What are (* bytes) values? Is it bytecode size? Why it is different?
> It is the distance between the first and last captured event addresses
> in a particular hot region.
> In the 132-byte case, perf attributes one more instruction (two
> instructions further down); it is just the comparison with 52 (0.37%).
> The code is the same, so this doesn't look too suspicious to me.
Or it does :-) That may be a branch / branch miss inside the stub. Then
we may see extra instructions attributed, as well as the branch itself.
The extra part of region 1 is

    __ cmpw(keylen, 44);
    __ br(Assembler::EQ, L_doLast);

    __ aese(v0, v1);
    __ aesmc(v0, v0);
    __ aese(v0, v2);
    __ aesmc(v0, v0);

    __ ld1(v1, v2, __ T16B, __ post(key, 32));
    __ rev32(v1, __ T16B, v1);
    __ rev32(v2, __ T16B, v2);

    __ cmpw(keylen, 52);
    __ br(Assembler::EQ, L_doLast);


Region 2 is what happens in L_doLast.

-prof perfnorm shows 7-14% more branch misses.
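For context on the two cmpw/br pairs above: keylen is the length of the expanded AES key schedule in 32-bit words, which is 4 * (Nr + 1) for Nr rounds, i.e. 44 for AES-128 (10 rounds), 52 for AES-192 (12 rounds) and 60 for AES-256 (14 rounds). So the comparisons pick how many rounds run before L_doLast. A small Java sketch of that mapping; the class and method names are made up for illustration and are not part of HotSpot or the JDK:

```java
/** Illustrative only: relates an AES expanded-key length (in 32-bit words) to its round count. */
public final class AesKeySchedule {

    /** Expanded key length in words for Nr rounds: 4 * (Nr + 1). */
    static int expandedKeyWords(int rounds) {
        return 4 * (rounds + 1);
    }

    /** Inverse mapping, mirroring the keylen == 44 / keylen == 52 checks in the stub. */
    static int roundsFor(int keylenWords) {
        switch (keylenWords) {
            case 44: return 10; // AES-128
            case 52: return 12; // AES-192
            case 60: return 14; // AES-256
            default: throw new IllegalArgumentException("unexpected keylen: " + keylenWords);
        }
    }

    public static void main(String[] args) {
        for (int bits : new int[] {128, 192, 256}) {
            int rounds = bits / 32 + 6; // Nr = Nk + 6, where Nk is the key size in words
            System.out.println("AES-" + bits + ": " + rounds + " rounds, keylen " + expandedKeyWords(rounds));
        }
    }
}
```

Running main prints one line per key size, e.g. "AES-128: 10 rounds, keylen 44".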

> But the different percentages for the stub parts do. Note that the
> percentage distribution after inlining looks the same, e.g.
>
> ....[Hottest Methods (after inlining)]..............................................................
>  83.67%        runtime stub  StubRoutines::aescrypt_encryptBlock
>   7.69%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 868
>   4.34%         c2, level 4  org.openjdk.bench.javax.crypto.small.generated.AESBench_encrypt_jmhTest::encrypt_thrpt_jmhStub, version 889
>
> and
>
>  84.03%        runtime stub  StubRoutines::aescrypt_encryptBlock
>   7.85%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 860
>   4.22%         c2, level 4  org.openjdk.bench.javax.crypto.small.generated.AESBench_encrypt_jmhTest::encrypt_thrpt_jmhStub, version 880
>
> -Dmitry
>
>>
>> Thanks,
>> Vladimir
>>
>> On 9/5/18 8:50 AM, Dmitry Chuyko wrote:
>>> I made a few runs on ThunderX2 (aarch64). Funnily enough, I see almost
>>> the reverse difference in small.AESBench.encrypt: a ~4% regression for
>>> both -XX:-UseSwitchProfiling and the patched version against the current
>>> code. No difference for full.AESBench.encrypt.
>>>
>>> The stub code is the same and the profiles differ slightly:
>>>
>>> Mainline
>>>   53.91%        runtime stub  StubRoutines::aescrypt_encryptBlock (128 bytes)
>>>   29.76%        runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>>>    7.64%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 868 (356 bytes)
>>>
>>> -XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling
>>>   57.08%        runtime stub  StubRoutines::aescrypt_encryptBlock (132 bytes)
>>>   26.95%        runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>>>    7.85%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 860 (384 bytes)
>>>
>>> Patched
>>>   58.15%        runtime stub  StubRoutines::aescrypt_encryptBlock (132 bytes)
>>>   26.44%        runtime stub  StubRoutines::aescrypt_encryptBlock (40 bytes)
>>>    6.67%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 866 (128 bytes)
>>>
>>> -Dmitry
>>>
>>> On 09/05/2018 11:05 AM, Roland Westrelin wrote:
>>>> Thanks for the review. Anyone else?
>>>>
>>>> Roland.
>>>
>


From martin.doerr at sap.com  Wed Sep  5 17:45:22 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 5 Sep 2018 17:45:22 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com>
Message-ID: <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com>

Hi Gustavo,

thank you for your detailed explanation. I wonder what happens to the registers when VSX gets disabled but the registers are read again many context switches later. But I guess this is handled somehow.

I'm getting different incorrect results every time I run the test on some machines, while other machines always compute the correct result and the test passes.

But I found out that the problem shows up with different kernel versions (4.4.0-101-generic, 3.10.0-693.1.1.el7.ppc64le, 4.4.126-94.22-default). So I guess it's rather unlikely that the problem is only caused by the OS.

After more investigation, it rather looks like v0 is not preserved across the safepoint:

vs32 = v0, vs36 = v4, vs40 = v8

  0x00007fff6813e6d0: extsw   r15,r17
  0x00007fff6813e6d4: rldic   r18,r17,2,30
  0x00007fff6813e6d8: add     r18,r21,r18
  0x00007fff6813e6dc: addi    r20,r18,16
  0x00007fff6813e6e0: addi    r18,r18,16
  0x00007fff6813e6e4: lxvd2x  vs36,0,r18
  0x00007fff6813e6e8: vaddfp  v4,v4,v0
  0x00007fff6813e6ec: rldicr  r15,r15,2,61
  0x00007fff6813e6f0: add     r15,r21,r15
  0x00007fff6813e6f4: addi    r18,r15,32
  0x00007fff6813e6f8: addi    r15,r15,32
  0x00007fff6813e6fc: lxvd2x  vs40,0,r15                    ;*faload {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@21 (line 62)

  0x00007fff6813e700: stxvd2x vs36,0,r20
  0x00007fff6813e704: vaddfp  v4,v8,v0
  0x00007fff6813e708: stxvd2x vs36,0,r18                    ;*fastore {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@24 (line 62)

  0x00007fff6813e70c: addi    r17,r17,8                     ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@25 (line 61)

  0x00007fff6813e710: cmpw    cr5,r17,r24
  0x00007fff6813e714: blt     cr5,0x00007fff6813e6d0        ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)

 ;; B15: #      B14 B16 <- B14  Freq: 12356.3

  0x00007fff6813e718: ld      r15,288(r16)                  ; ImmutableOopMap{R21=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)

  0x00007fff6813e71c: tdlgei  r15,8                         ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)
                                                            ;   {poll}
  0x00007fff6813e720: cmpw    cr6,r17,r24
  0x00007fff6813e724: blt     cr6,0x00007fff6813e6d0


At the end of the method, I see v4_float = {10000, 10000, 10000, 10000} on machines on which the test passes.
On a machine on which it fails, I see e.g. v4_float = {0xffffffff, 0x8a296200, 0xffffffff, 0xffffffff}.

I thought we had already checked saving and restoring vector registers at safepoints, but it seems we have missed something.
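The loop in the disassembly above is a vectorized counted loop (faload, vector add, fastore, iinc) with a safepoint poll on the back edge, and v0 is never written inside it, so it holds a loop-invariant operand (presumably the splatted increment). A hypothetical Java reduction of that shape is sketched below; it is written for illustration and is not the source of compiler/runtime/safepoints/TestRegisterRestoring.java. On a correct VM every element ends up at 10000; if a live vector register is clobbered at the poll, the elements come out as garbage like the values above. Whether C2 actually vectorizes this loop and keeps a poll inside it depends on the VM and its flags.

```java
/** Hypothetical reduction of a vectorizable float-increment loop; not the actual jtreg test. */
public final class VectorIncrementSketch {

    // Adds 1.0f to every element. C2's SuperWord pass may turn this into vector adds,
    // with the 1.0f constant kept in a vector register across the loop's safepoint poll.
    static void increment(float[] array) {
        for (int i = 0; i < array.length; i++) {
            array[i] += 1.0f;
        }
    }

    // Calls increment() repeatedly so the method gets hot and compiled.
    static float[] run(int length, int iterations) {
        float[] array = new float[length];
        for (int k = 0; k < iterations; k++) {
            increment(array);
        }
        return array;
    }

    public static void main(String[] args) {
        float[] result = run(100, 10_000);
        for (float f : result) {
            if (f != 10_000f) { // small integers are exact in float, so any deviation is a bug
                throw new AssertionError("expected 10000.0 but got " + f);
            }
        }
        System.out.println("ok");
    }
}
```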

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero <gromero at linux.vnet.ibm.com> 
Sent: Mittwoch, 5. September 2018 18:21
To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support

Hi Martin,

On 09/03/2018 02:18 PM, Doerr, Martin wrote:
> Hi Gustavo and Michihiro,
> 
> we noticed jtreg test failures when using this change:
> compiler/runtime/safepoints/TestRegisterRestoring.java
> compiler/runtime/Test7196199.java
> 
> TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.

Just to confirm I understood the description correctly:

Were you able to check that it returns random values for the
array instead of 10_000, or did you just check that the test failed?

Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm
asking because I'm able to make that test fail due to a timeout, but I'm not sure
if it's the same failure you got there. Look (I'm using the same kernel as yours):

http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt


Thank you.

Best regards,
Gustavo

> We didn't see it on all machines, so it might be an issue with saving and restoring VRs in the signal handler.
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
> 
> That's what I found out so far. Maybe you have an idea?
> 
> I also noticed that "-XX:-SuperwordUseVSX" crashes with "bad AD file" when your patch is applied. It looks like matching of the vector nodes needs to be prevented.
> 
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
> Sent: Montag, 3. September 2018 14:57
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Goetz,
> 
> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>> Also, I cannot find all of the mail traffic in
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>> Is this a problem with the pipermail server?
>>
>> For some reason this webrev lacks the links to browse the diffs.
>> Maybe you need to use a more recent webrev? You can obtain it with
>> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
> 
> Yes, it was probably a problem with pipermail or with some relay.
> I noticed the same thing, i.e. at least one of Michi's replies reached
> me but missed the ML.
> 
> The initial discussion is here:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
> 
> I understand Martin reviewed the last webrev in that thread, which is
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
> 
> Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
> 
> and Michi's reply to Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
> 
> and your last review:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
> 
> 
> HTH.
> 
> Best regards,
> Gustavo
>    
>> Why do you rename vnoreg to vnoregi?
>>
>> Besides that the change is fine, thanks for implementing this!
>>
>> Best regards,
>>     Goetz.
>>
>>
>>> -----Original Message-----
>>> From: Doerr, Martin
>>> Sent: Dienstag, 28. August 2018 19:35
>>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie
>>> <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
>>> <volker.simonis at sap.com>
>>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michihiro,
>>>
>>> thank you for implementing it. I have just taken a first look at your
>>> webrev.01.
>>>
>>> It looks basically good. Only the Power version check seems to be incorrect.
>>> VM_Version::has_popcntb() checks for Power5.
>>> I believe most instructions are available with Power7.
>>> Some of them (vsubudm, ..., vmuluwm, vpopcntw) were introduced with
>>> Power8?
>>> We should check this carefully.
>>>
>>> Also, the indentation in register_ppc.hpp could be improved.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>> Sent: Donnerstag, 26. Juli 2018 16:02
>>> To: Michihiro Horie <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>> dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-
>>> port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michi,
>>>
>>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>>> I updated webrev:
>>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>>
>>> Thanks for providing an updated webrev and for fixing indentation and
>>> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>>
>>>
>>> Best Regards,
>>> Gustavo
>>>
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro,
>>>> IBM Research - Tokyo
>>>>
>>>>
>>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-
>>> dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
>>> <martin.doerr at sap.com>
>>>> Date: 2018/07/25 23:05
>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>>
>>>>
>>>>
>>>> Hi Michi,
>>>>
>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>>    > Dear all,
>>>>    >
>>>>    > Would you review the following change?
>>>>    > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>>>    > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>>>    >
>>>>    > This change adds support for vectorized arithmetic calculation with SLP.
>>>>    >
>>>>    > The to_vr function is added to convert VSR to VR. Currently, vecX is
>>>>    > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>>>>    > which exactly overlap the VRs. Instruction APIs receiving VRs use
>>>>    > to_vr via vecX. Another thing is the change in sqrtF_reg to enable
>>>>    > matching with SqrtVF. I think the change in sqrtF_reg would be fine given
>>>>    > ConvD2FNode::Value in convertnode.cpp.
>>>>
>>>> Looks good. Just a few comments:
>>>>
>>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of
>>>>      vmaddfp in order to avoid the splat?
>>>>
>>>> - Although all instructions added by your change were introduced in ISA
>>>>      2.06, so POWER7 and above are OK, as I see probes for
>>>>      PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), I'm
>>>>      wondering if there is any control point to guarantee that these
>>>>      instructions won't be emitted on a CPU that does not support them.
>>>>
>>>> - I think that in general strings in format %{} are in upper case. For instance,
>>>>      this is the current OptoAssembly output for vmul4F:
>>>>
>>>> 2941835 5b4     ADDI    R24, R24, #64
>>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>>>>
>>>>      I think it would be better in upper case instead. I also think that if
>>>>      the node match emits more than one instruction, all instructions must be
>>>>      listed in format %{}, since it's meant for detailed debugging. Finally, I think it
>>>>      would be better to replace \t! by \t// in that string (unless I'm missing any
>>>>      special meaning for that char). So for vmul4F it would be something like:
>>>>
>>>> 2941835 5b4     ADDI      R24, R24, #64
>>>>                    VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>>>>
>>>>
>>>> But feel free to change anything just after you get additional reviews :)
>>>>
>>>>
>>>>    > I confirmed this change with JTREG. In addition, I used the attached micro
>>>>    > benchmarks.
>>>>    > /(See attached file: slp_microbench.zip)/
>>>>
>>>> Thanks for sharing it.
>>>> Btw, another option to host it would be in the CR
>>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>    >
>>>>    > Best regards,
>>>>    > --
>>>>    > Michihiro,
>>>>    > IBM Research - Tokyo
>>>>    >
>>>>
>>>>
>>>>
>>
> 


From martin.doerr at sap.com  Wed Sep  5 18:10:01 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 5 Sep 2018 18:10:01 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com>
 <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com>
Message-ID: <90edf96036f24d91acfd6b649d65c41b@sap.com>

Hi Michihiro,

support for POLL_AT_VECTOR_LOOP is required in the handler_blob / RegisterSaver, like on x86.
We haven't seen any issues with the current code, but I think this affects jdk11, too. (We could also switch off SuperwordUseVSX for jdk11u.) Do you agree?
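To illustrate the point: as far as I recall, the x86 SharedRuntime::generate_handler_blob derives a save_vectors flag from poll_type == POLL_AT_VECTOR_LOOP and passes it on to RegisterSaver::save_live_registers. The toy Java model below is not HotSpot code; the register file, the clobbering runtime call and all names are hypothetical. It only shows why that flag matters: without the vector save, a loop-invariant value in v0, like the one in the failing loop quoted below, does not survive the poll.

```java
import java.util.Arrays;

/** Toy model, not HotSpot code: why a poll in a vectorized loop must save vector registers. */
public final class SafepointSaveSketch {

    // Poll kinds named after HotSpot's SharedRuntime constants; only the names are borrowed.
    enum PollType { POLL_AT_RETURN, POLL_AT_LOOP, POLL_AT_VECTOR_LOOP }

    // Simulated vector register file: 32 registers with 4 float lanes each.
    static final float[][] vr = new float[32][4];

    // Hypothetical stand-in for runtime code reached from the poll that uses vector registers freely.
    static void runtimeCallClobbersVectors() {
        for (float[] reg : vr) {
            Arrays.fill(reg, Float.NaN);
        }
    }

    // Models the decision save_vectors = (poll_type == POLL_AT_VECTOR_LOOP).
    static void handlerBlob(PollType pollType) {
        boolean saveVectors = pollType == PollType.POLL_AT_VECTOR_LOOP;
        float[][] saved = new float[vr.length][];
        if (saveVectors) {
            for (int i = 0; i < vr.length; i++) {
                saved[i] = vr[i].clone();
            }
        }
        runtimeCallClobbersVectors();
        if (saveVectors) {
            for (int i = 0; i < vr.length; i++) {
                vr[i] = saved[i];
            }
        }
    }

    // Puts a loop-invariant splat of 1.0f in v0, takes the poll, and returns lane 0 of v0.
    static float lane0AfterPoll(PollType pollType) {
        Arrays.fill(vr[0], 1.0f);
        handlerBlob(pollType);
        return vr[0][0];
    }

    public static void main(String[] args) {
        System.out.println(lane0AfterPoll(PollType.POLL_AT_LOOP));        // NaN: v0 was clobbered
        System.out.println(lane0AfterPoll(PollType.POLL_AT_VECTOR_LOOP)); // 1.0: v0 was restored
    }
}
```

The real fix has to do this in the PPC64 handler blob itself; the sketch only shows the save/restore decision keyed on the poll type.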

Best regards,
Martin


-----Original Message-----
From: Doerr, Martin 
Sent: Mittwoch, 5. September 2018 19:45
To: 'Gustavo Romero' <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
Subject: RE: RFR: 8208171: PPC64: Enrich SLP support

Hi Gustavo,

thank you for your detailed explanation. I wonder what happens to the registers when VSX gets disabled but the registers are read again many context switches later. But I guess this is handled somehow.

I'm getting different incorrect results every time I run the test on some machines, while other machines always compute the correct result and the test passes.

But I found out that the problem shows up with different kernel versions (4.4.0-101-generic, 3.10.0-693.1.1.el7.ppc64le, 4.4.126-94.22-default). So I guess it's rather unlikely that the problem is only caused by the OS.

After more investigation, it rather looks like v0 is not preserved across the safepoint:

vs32 = v0, vs36 = v4, vs40 = v8

  0x00007fff6813e6d0: extsw   r15,r17
  0x00007fff6813e6d4: rldic   r18,r17,2,30
  0x00007fff6813e6d8: add     r18,r21,r18
  0x00007fff6813e6dc: addi    r20,r18,16
  0x00007fff6813e6e0: addi    r18,r18,16
  0x00007fff6813e6e4: lxvd2x  vs36,0,r18
  0x00007fff6813e6e8: vaddfp  v4,v4,v0
  0x00007fff6813e6ec: rldicr  r15,r15,2,61
  0x00007fff6813e6f0: add     r15,r21,r15
  0x00007fff6813e6f4: addi    r18,r15,32
  0x00007fff6813e6f8: addi    r15,r15,32
  0x00007fff6813e6fc: lxvd2x  vs40,0,r15                    ;*faload {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@21 (line 62)

  0x00007fff6813e700: stxvd2x vs36,0,r20
  0x00007fff6813e704: vaddfp  v4,v8,v0
  0x00007fff6813e708: stxvd2x vs36,0,r18                    ;*fastore {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@24 (line 62)

  0x00007fff6813e70c: addi    r17,r17,8                     ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@25 (line 61)

  0x00007fff6813e710: cmpw    cr5,r17,r24
  0x00007fff6813e714: blt     cr5,0x00007fff6813e6d0        ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)

 ;; B15: #      B14 B16 <- B14  Freq: 12356.3

  0x00007fff6813e718: ld      r15,288(r16)                  ; ImmutableOopMap{R21=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)

  0x00007fff6813e71c: tdlgei  r15,8                         ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)
                                                            ;   {poll}
  0x00007fff6813e720: cmpw    cr6,r17,r24
  0x00007fff6813e724: blt     cr6,0x00007fff6813e6d0


At the end of the method, I see v4_float = {10000, 10000, 10000, 10000} on machines on which the test passes.
On a machine on which it fails, I see e.g. v4_float = {0xffffffff, 0x8a296200, 0xffffffff, 0xffffffff}.

I thought we had already checked saving and restoring vector registers at safepoints, but it seems we have missed something.

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero <gromero at linux.vnet.ibm.com> 
Sent: Mittwoch, 5. September 2018 18:21
To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support

Hi Martin,

On 09/03/2018 02:18 PM, Doerr, Martin wrote:
> Hi Gustavo and Michihiro,
> 
> we noticed jtreg test failures when using this change:
> compiler/runtime/safepoints/TestRegisterRestoring.java
> compiler/runtime/Test7196199.java
> 
> TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.

Just to confirm I understood the description correctly:

Were you able to check that it returns random values for the
array instead of 10_000, or did you just check that the test failed?

Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm
asking because I'm able to make that test fail due to a timeout, but I'm not sure
if it's the same failure you got there. Look (I'm using the same kernel as yours):

http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt


Thank you.

Best regards,
Gustavo

> We didn't see it on all machines, so it might be an issue with saving and restoring VRs in the signal handler.
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
> 
> That's what I found out so far. Maybe you have an idea?
> 
> I also noticed that "-XX:-SuperwordUseVSX" crashes with "bad AD file" when your patch is applied. It looks like matching of the vector nodes needs to be prevented.
> 
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
> Sent: Montag, 3. September 2018 14:57
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Goetz,
> 
> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>> Also, I cannot find all of the mail traffic in
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>> Is this a problem with the pipermail server?
>>
>> For some reason this webrev lacks the links to browse the diffs.
>> Maybe you need to use a more recent webrev? You can obtain it with
>> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
> 
> Yes, it was probably a problem with pipermail or with some relay.
> I noticed the same thing, i.e. at least one of Michi's replies reached
> me but missed the ML.
> 
> The initial discussion is here:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
> 
> I understand Martin reviewed the last webrev in that thread, which is
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
> 
> Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
> 
> and Michi's reply to Martin's review of webrev.01:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
> 
> and your last review:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
> 
> 
> HTH.
> 
> Best regards,
> Gustavo
>    
>> Why do you rename vnoreg to vnoregi?
>>
>> Besides that the change is fine, thanks for implementing this!
>>
>> Best regards,
>>     Goetz.
>>
>>
>>> -----Original Message-----
>>> From: Doerr, Martin
>>> Sent: Dienstag, 28. August 2018 19:35
>>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie
>>> <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
>>> <volker.simonis at sap.com>
>>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michihiro,
>>>
>>> thank you for implementing it. I have just taken a first look at your
>>> webrev.01.
>>>
>>> It looks basically good. Only the Power version check seems to be incorrect.
>>> VM_Version::has_popcntb() checks for Power5.
>>> I believe most instructions are available with Power7.
>>> Some of them (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>>> Power8?
>>> We should check this carefully.
>>>
>>> Also, the indentation in register_ppc.hpp could be improved.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>> Sent: Donnerstag, 26. Juli 2018 16:02
>>> To: Michihiro Horie <HORIE at jp.ibm.com>
>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> Hi Michi,
>>>
>>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>>> I updated webrev:
>>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>>
>>> Thanks for providing an updated webrev and for fixing indentation and
>>> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>>
>>>
>>> Best Regards,
>>> Gustavo
>>>
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro,
>>>> IBM Research - Tokyo
>>>>
>>>>
>>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
>>>> Date: 2018/07/25 23:05
>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>>
>>>>
>>>>
>>>> Hi Michi,
>>>>
>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>>    > Dear all,
>>>>    >
>>>>    > Would you review the following change?
>>>>    > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>>>    > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>>>    >
>>>>    > This change adds support for vectorized arithmetic calculation with SLP.
>>>>    >
>>>>    > The to_vr function is added to convert VSR to VR. Currently, vecX is
>>>>    > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>>>>    > which exactly overlap with the VRs. Instruction APIs receiving VRs use
>>>>    > to_vr via vecX. Another thing is the change in sqrtF_reg to enable
>>>>    > matching with SqrtVF. I think the change in sqrtF_reg is fine due to
>>>>    > ConvD2FNode::Value in convertnode.cpp.
>>>>
>>>> Looks good. Just a few comments:
>>>>
>>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp
>>>>      in order to avoid the splat?
>>>>
>>>> - All instructions added by your change were introduced in ISA 2.06, so
>>>>      POWER7 and above are OK. However, since I see probes for
>>>>      PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), I'm wondering
>>>>      whether there is any control point to guarantee that these instructions
>>>>      won't be emitted on a CPU that does not support them.
>>>>
>>>> - I think that, in general, strings in format %{} are in upper case. For instance,
>>>>      this is the current optoassembly output for vmul4F:
>>>>
>>>> 2941835 5b4     ADDI    R24, R24, #64
>>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>>>>
>>>>      I think it would be better in upper case instead. I also think that if
>>>>      the node match emits more than one instruction, all instructions must be
>>>>      listed in format %{}, since it's meant for detailed debugging. Finally, I
>>>>      think it would be better to replace \t! with \t// in that string (unless
>>>>      I'm missing some special meaning of that char). So for vmul4F it would be
>>>>      something like:
>>>>
>>>> 2941835 5b4     ADDI      R24, R24, #64
>>>>                    VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>>>>
>>>>
>>>> But feel free to change anything just after you get additional reviews :)
>>>>
>>>>
>>>>    > I confirmed this change with JTREG. In addition, I used the attached micro benchmarks.
>>>>    > /(See attached file: slp_microbench.zip)/
>>>>
>>>> Thanks for sharing it.
>>>> Btw, another option to host it would be in the CR
>>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>    >
>>>>    > Best regards,
>>>>    > --
>>>>    > Michihiro,
>>>>    > IBM Research - Tokyo
>>>>    >
>>>>
>>>>
>>>>
>>
> 


From gromero at linux.vnet.ibm.com  Wed Sep  5 18:29:25 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 5 Sep 2018 15:29:25 -0300
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com>
 <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com>
Message-ID: <71b0c876-25a2-f757-375f-686477d2d2c7@linux.vnet.ibm.com>

Hi Martin,

On 09/05/2018 02:45 PM, Doerr, Martin wrote:
> Hi Gustavo,
> 
> thank you for your detailed explanation. I wonder what happens with the registers when VSX gets disabled, but the regs are read again many context switches later. But I guess this is solved somehow.

No problem!

Yes, the kernel handles that too: if a VSX instruction is used again while VSX
is disabled, it raises an exception and the exception handler calls
load_up_vsx():

https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/vector.S#L119

load_up_vsx() in turn calls load_up_fpu() and load_up_vec() to load the FP and
VEC registers from the thread struct associated with the task that wants to use
the VSX facility again. That thread struct contains the correct FP/VEC/VSX
registers, saved many context switches earlier when the facilities were disabled.

The best description of what happens in this case (valid for VEC and VSX as
well) can be found in the load_up_fpu() description:

https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/fpu.S#L79-L83 :

  * This task wants to use the FPU now.
  * On UP, disable FP for the task which had the FPU previously,
  * and save its floating-point registers in its thread_struct.
  * Load up this task's FP registers from its thread_struct,
  * enable the FPU for the current task and return to the task.

Here UP stands for uniprocessor (i.e. a machine with only one CPU).
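
A toy user-space model of that uniprocessor lazy switch may make the flow
easier to follow: a task only gets the physical FP registers when it traps, and
the previous owner's values are parked in its save area until it traps again.
This is only a sketch; struct task, fp_unavailable(), write_fpr() and
read_fpr() are hypothetical names, not kernel code (the real load_up_fpu() is
assembly that saves into thread_struct and toggles MSR_FP).

```c
#include <assert.h>
#include <string.h>

#define NFPR 32

/* Hypothetical stand-in for the per-task FP save area in thread_struct. */
struct task {
    double saved_fpr[NFPR];
    int fp_enabled;                 /* models MSR_FP being set for this task */
};

static double cpu_fpr[NFPR];        /* the single physical FP register file (UP) */
static struct task *fpu_owner;      /* task whose values currently live in cpu_fpr */

/* Models the "FP unavailable" trap handler on a uniprocessor. */
static void fp_unavailable(struct task *current)
{
    assert(current != NULL);
    if (fpu_owner != NULL) {
        /* Save the previous owner's registers and disable FP for it. */
        memcpy(fpu_owner->saved_fpr, cpu_fpr, sizeof cpu_fpr);
        fpu_owner->fp_enabled = 0;
    }
    /* Load this task's registers and enable FP for it. */
    memcpy(cpu_fpr, current->saved_fpr, sizeof cpu_fpr);
    current->fp_enabled = 1;
    fpu_owner = current;
}

/* FP accesses trap first whenever FP is disabled for the task. */
static void write_fpr(struct task *t, int r, double v)
{
    assert(r >= 0 && r < NFPR);
    if (!t->fp_enabled)
        fp_unavailable(t);
    cpu_fpr[r] = v;
}

static double read_fpr(struct task *t, int r)
{
    assert(r >= 0 && r < NFPR);
    if (!t->fp_enabled)
        fp_unavailable(t);
    return cpu_fpr[r];
}
```

In this model a task's values survive any number of other tasks using the FPU
in between, because they are restored from its save area on its next trap.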

  
> I'm getting different incorrect results every time I run the test on some machines, while other machines always compute the correct result and the test passes.
> 
> But I found out that the problem shows up with different kernel versions (4.4.0-101-generic, 3.10.0-693.1.1.el7.ppc64le, 4.4.126-94.22-default). So I guess it's rather unlikely that the problem is only caused by the OS.
> 
> After more investigation, it rather looks like v0 is not preserved across the safepoint:
> 
> vs32 = v0, vs36 = v4, vs40 = v8
> 
>    0x00007fff6813e6d0: extsw   r15,r17
>    0x00007fff6813e6d4: rldic   r18,r17,2,30
>    0x00007fff6813e6d8: add     r18,r21,r18
>    0x00007fff6813e6dc: addi    r20,r18,16
>    0x00007fff6813e6e0: addi    r18,r18,16
>    0x00007fff6813e6e4: lxvd2x  vs36,0,r18
>    0x00007fff6813e6e8: vaddfp  v4,v4,v0
>    0x00007fff6813e6ec: rldicr  r15,r15,2,61
>    0x00007fff6813e6f0: add     r15,r21,r15
>    0x00007fff6813e6f4: addi    r18,r15,32
>    0x00007fff6813e6f8: addi    r15,r15,32
>    0x00007fff6813e6fc: lxvd2x  vs40,0,r15                    ;*faload {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 21 (line 62)
> 
>    0x00007fff6813e700: stxvd2x vs36,0,r20
>    0x00007fff6813e704: vaddfp  v4,v8,v0
>    0x00007fff6813e708: stxvd2x vs36,0,r18                    ;*fastore {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 24 (line 62)
> 
>    0x00007fff6813e70c: addi    r17,r17,8                     ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 25 (line 61)
> 
>    0x00007fff6813e710: cmpw    cr5,r17,r24
>    0x00007fff6813e714: blt     cr5,0x00007fff6813e6d0        ;*goto {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61)
> 
>   ;; B15: #      B14 B16 <- B14  Freq: 12356.3
> 
>    0x00007fff6813e718: ld      r15,288(r16)                  ; ImmutableOopMap{R21=Oop }
>                                                              ;*goto {reexecute=1 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61)
> 
>    0x00007fff6813e71c: tdlgei  r15,8                         ;*goto {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61)
>                                                              ;   {poll}
>    0x00007fff6813e720: cmpw    cr6,r17,r24
>    0x00007fff6813e724: blt     cr6,0x00007fff6813e6d0
> 
> 
> At the end of the method, I see v4_float = {10000, 10000, 10000, 10000} on machines on which the test passes.
> On a machine on which it fails, e.g. v4_float = {0xffffffff, 0x8a296200, 0xffffffff, 0xffffffff}
> 
> I thought we had already checked saving and restoring vector registers at safepoints, but it seems we have missed something.

OK. I have not been able to reproduce it yet... But it looks like you have
already pointed Michi to a solution, so I'll stay tuned.
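
For reference, the register names in Martin's dump line up because of the VSX
register layout: the 64 VSRs overlay the 32 FPRs (VSR0-31, with FPRn in
doubleword 0 of VSRn) and the 32 VRs (VSR32-63, with VRn = VSR(n+32)). So
vs32 = v0, vs36 = v4, vs40 = v8, and the vs_reg class (VSR32-51) from the
original change covers exactly VR0-19. A minimal sketch of that mapping;
vsr_to_vr() and vr_to_vsr() are hypothetical helpers, not HotSpot's to_vr():

```c
#include <assert.h>

/* VSR0-31 overlap the FPRs; VSR32-63 are the VMX/Altivec registers VR0-31.
   Returns the VR aliased by a VSR, or -1 for the FPR-overlapping half. */
static int vsr_to_vr(int vsr)
{
    assert(vsr >= 0 && vsr < 64);
    return vsr >= 32 ? vsr - 32 : -1;
}

/* Inverse mapping: VRn is VSR(n + 32). */
static int vr_to_vsr(int vr)
{
    assert(vr >= 0 && vr < 32);
    return vr + 32;
}
```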

Thanks.


Best regards,
Gustavo

> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> Sent: Mittwoch, 5. September 2018 18:21
> To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Martin,
> 
> On 09/03/2018 02:18 PM, Doerr, Martin wrote:
>> Hi Gustavo and Michihiro,
>>
>> we noticed jtreg test failures when using this change:
>> compiler/runtime/safepoints/TestRegisterRestoring.java
>> compiler/runtime/Test7196199.java
>>
>> TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
> 
> Just to confirm I understood the description correctly:
> 
> Were you able to check that it returns random values for the
> array instead of 10_000, or did you just check that the test failed?
> 
> Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm
> asking because I can make that test fail due to a timeout, but I'm not sure
> it's the same failure you got there. Look (I'm using the same kernel as yours):
> 
> http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt
> 
> 
> Thank you.
> 
> Best regards,
> Gustavo
> 
>> We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>>
>> That's what I found out so far. Maybe you have an idea?
>>
>> I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
>> Sent: Montag, 3. September 2018 14:57
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Goetz,
>>
>> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>>> Also, I can not find all of the mail traffic in
>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>>> Is this a problem of the pipermail server?
>>>
>>> For some reason this webrev lacks the links to browse the diffs.
>>> Do you need to use a more recent webrev?  You can obtain it with
>>> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
>>
>> Yes, it was probably a problem with pipermail or some relay.
>> I noticed the same thing, i.e. at least one of Michi's replies reached
>> me but missed the ML.
>>
>> The initial discussion is here:
>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
>>
>> I understand Martin reviewed the last webrev in that thread, which is
>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
>>
>> Martin's review of webrev.01:
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
>>
>> and Michi's reply to Martin's review of webrev.01:
>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
>> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
>>
>> and your last review:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
>>
>>
>> HTH.
>>
>> Best regards,
>> Gustavo
>>     
>>> Why do you rename vnoreg to vnoregi?
>>>
>>> Besides that the change is fine, thanks for implementing this!
>>>
>>> Best regards,
>>>      Goetz.
>>>
>>>
>>
> 


From gromero at linux.vnet.ibm.com  Wed Sep  5 18:34:25 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 5 Sep 2018 15:34:25 -0300
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <OFD4AAA261.28219E95-ON002582FF.00371825-492582FF.00390865@notes.na.collabserv.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
 <57ebd30a66504577a6b2ec267aee4b69@sap.com>
 <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com>
 <OFD4AAA261.28219E95-ON002582FF.00371825-492582FF.00390865@notes.na.collabserv.com>
Message-ID: <a8132254-dd87-69e1-69b2-06c8a58565bf@linux.vnet.ibm.com>

Hi Michi,

On 09/05/2018 07:22 AM, Michihiro Horie wrote:
> Hi Martin, Gustavo,
> 
> I still cannot reproduce the problem. I noticed that the machine I have is not SUSE but openSUSE with 4.1.21-14-default. I've also tried kernel 4.4.0-31-generic, but that one is Ubuntu.
> 
> Gustavo, is there any suspicious change before/after v4.4, the version on which Martin got the crash?

Nope, nothing I'm aware of... However, it looks like Martin found no issues with
your last revision. Anyway, if you need a machine with SLES 12 SP3 installed
I have one that I can share. Drop me a Slack message if you need it.
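
For reference, a rough user-space model of the v4.6 heuristic described in the
quoted message below: load_fp/load_vec are 8-bit counters bumped on each context
switch in which the state is eagerly restored, they wrap to 0 after enough
switches (at which point eager restore stops), and VSX is enabled when both FP
and VEC are. This is only a sketch: struct task_math, fp_trap(), vec_trap(),
switch_in() and the assumption that the trap re-arms the counter to 1 are
hypothetical, not the actual kernel code from commit 70fe3d980 ([4] below).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct task_math {
    uint8_t load_fp, load_vec;      /* 8-bit: 255 + 1 wraps back to 0 */
    bool msr_fp, msr_vec, msr_vsx;  /* facility-enable bits for this task */
};

/* Models the "facility unavailable" trap: the task just used FP (or VEC),
   so enable the facility and (by assumption) re-arm its counter. */
static void fp_trap(struct task_math *t)  { assert(t); t->msr_fp = true;  t->load_fp = 1; }
static void vec_trap(struct task_math *t) { assert(t); t->msr_vec = true; t->load_vec = 1; }

/* Models switching the task back in: eagerly restore (and keep enabled)
   only the facilities whose counter is non-zero, then bump the counter. */
static void switch_in(struct task_math *t)
{
    assert(t);
    t->msr_fp = t->load_fp != 0;
    if (t->msr_fp)
        t->load_fp++;               /* wraps to 0 after 255: stop restoring */
    t->msr_vec = t->load_vec != 0;
    if (t->msr_vec)
        t->load_vec++;
    /* The FP and VEC register sets together make up the VSX register set. */
    t->msr_vsx = t->msr_fp && t->msr_vec;
}
```

In this model a task that stops touching FP still gets 255 eager restores
before FP is dropped from its context switches; if it uses FP again later, it
simply traps and the counter is re-armed. None of this is visible to the task.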


Regards,
Gustavo

> 
> Apart from the problem, I uploaded the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/
> 
> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
> 
> 
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> To: "Doerr, Martin" <martin.doerr at sap.com>, Michihiro Horie/Japan/IBM at IBMJP
> Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Date: 2018/09/05 07:03
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> 
> 
> 
> Hi Martin and Michi,
> 
> On 09/04/2018 01:20 PM, Doerr, Martin wrote:
>  > Can you reproduce the test failures?
>  >
>  > The very same VM works fine on a different Power8 machine which uses the same instructions by C2.
>  >
>  > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
>  >
>  > I have seen several linux kernel changes regarding saving and restoring the VSX registers.
>  >
>  > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr", which is used to set "msr |= MSR_VEC".
>  >
>  > Maybe something is missing which tells the kernel that we're using it. But that's just a guess.
> 
> Facilities like FP (FP registers), VEC (vector registers, aka VMX/Altivec), and
> VSX (vector-scalar registers) are usually disabled in a newborn process. Once
> any instruction associated with these facilities is used in the process, it
> causes an exception that is handled by the kernel [1, 2, 3]: the kernel enables
> the facility that caused the exception (see load_up_fpu & friends) and
> re-executes the instruction when it returns control to the process in userspace.
> 
> Starting with kernel v4.6 [4] there is a simple heuristic that uses an 8-bit
> counter to help track whether a process, after using these facilities for the
> first time, continues to use them. The counters (load_fp and load_vec) are
> incremented on each context switch, and if the process stops using the FP or
> VEC facilities they are disabled again, with FP/VEC/VSX save/restore on context
> switches disabled as well. This improves context-switch performance by avoiding
> the FP/VEC/VSX register save/restore.
> 
> Either way (before or after the change introduced in v4.6) *that mechanism is
> opaque to userspace*, particularly to the process using these facilities. If a
> given facility cannot be enabled by the kernel (i.e. the CPU does not support
> it), the kernel sends a SIGILL to the process. It's possible to inspect the
> thread member dynamics/state from userspace using tools like 'systemtap' (for
> example, this simple script can be used to inspect the VRSAVE register of a
> given thread that is running a program called 'vrsave_' [5]) or the 'perf' tool.
> 
> "tsk->thread.used_vsr" [6] is actually associated with the VSX facility, whilst
> MSR_VEC is associated with the VEC/VMX/Altivec facility [7], so
> "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a new
> process, if the load_fp and load_vec counters overflowed and became zero,
> disabling VSX, or if only FP or only VEC (not both) were used in the process).
> In that case the kernel will also enable VSX via MSR |= MSR_VSX. A similar
> mechanism drives the FP (MSR_FP) and VEC (MSR_VEC) facilities.
> 
> If both the FP and VEC facilities are used, the VSX facility is enabled
> automatically, since the FP+VEC regsets == the VSX regset [8].
> 
> Thus, since this mechanism is entirely opaque to userspace, I understand that
> if a program has to tell the kernel it wants to use any of these facilities
> (FP/VEC/VSX) before using them, something is going wrong in kernel space.
> 
> Martin and Michi, if you want any help drilling further into it on the kernel
> side, please let me know; maybe I can help.
> 
> I haven't had the chance to reproduce the crash yet, so if I find anything
> meaningful about it tomorrow I'll keep you posted.
> 
> 
> Kind regards,
> Gustavo
> 
> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP)
> [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec)
> [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX)
> [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239
> [5] http://cr.openjdk.java.net/~gromero/script.d
> [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310
> [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250
> [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437
> 
>  > Best regards,
>  >
>  > Martin
>  >
>  > *From:*Michihiro Horie <HORIE at jp.ibm.com>
>  > *Sent:* Dienstag, 4. September 2018 07:32
>  > *To:* Doerr, Martin <martin.doerr at sap.com>
>  > *Cc:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>  > *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > Hi Goetz, Martin, and Gustavo,
>  >
>  >
>  >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>  >>a compiler change.
>  >>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>  >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
>  >>while hotspot-compiler-dev is for
>  >>"Technical discussion about the development of the HotSpot bytecode compilers"
>  > I understood the instruction and would use hotspot-compiler-dev in future RFRs, thanks.
>  >
>  >
>  >> Why do you rename vnoreg to vnoregi?
>  > I followed the naming used for vsnoreg and vsnoregi, but the renaming does not look necessary, so I will revert this part. Should I also rename vsnoregi to vsnoreg?
>  >
>  >
>  >>we noticed jtreg test failures when using this change:
>  >>compiler/runtime/safepoints/TestRegisterRestoring.java
>  >>compiler/runtime/Test7196199.java
>  >>
>  >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>  >>
>  >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>  >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>  > Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine.
>  >
>  >
>  >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>  > Thank you for pointing out another issue. I do not currently hit this problem, but preventing the vector nodes from being matched makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them in any case, similarly to the Replicate* rules.
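The fix Martin suggests amounts to refusing vector nodes when the flag is off, so that no vector rule is ever selected. A minimal sketch, in Java for illustration only: HotSpot would do this in C++ in ppc.ad (for example in a check along the lines of `Matcher::match_rule_supported`), and the class name, the string node names and the node list below are hypothetical.

```java
import java.util.Set;

// Hypothetical model of gating vector match rules on -XX:SuperwordUseVSX.
// Illustration only: the real check would be C++ code in ppc.ad.
final class VectorRuleGate {
    // Illustrative subset of C2 ideal vector node names; not exhaustive.
    private static final Set<String> VECTOR_NODES =
        Set.of("AddVF", "SubVF", "MulVF", "SqrtVF", "ReplicateF");

    private final boolean superwordUseVSX; // value of -XX:+/-SuperwordUseVSX

    VectorRuleGate(boolean superwordUseVSX) {
        this.superwordUseVSX = superwordUseVSX;
    }

    // Reject vector nodes when VSX use is disabled; scalar rules are unaffected.
    boolean matchRuleSupported(String node) {
        if (VECTOR_NODES.contains(node)) {
            return superwordUseVSX;
        }
        return true;
    }
}
```

Keeping the decision in one predicate means the flag only has to be consulted in a single place rather than in every vector rule's predicate.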
>  >
>  >
>  > Gustavo, thanks for the wrap-up!
>  >
>  >
>  > Best regards,
>  > --
>  > Michihiro,
>  > IBM Research - Tokyo
>  >
>  >
>  > From: "Doerr, Martin" <martin.doerr at sap.com>
>  > To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
>  > Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
>  > Date: 2018/09/04 02:18
>  > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > ------------------------------------------------------------------------
>  >
>  >
>  >
>  >
>  > Hi Gustavo and Michihiro,
>  >
>  > we noticed jtreg test failures when using this change:
>  > compiler/runtime/safepoints/TestRegisterRestoring.java
>  > compiler/runtime/Test7196199.java
>  >
>  > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>  >
>  > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>  > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>  >
>  > That's what I found out so far. Maybe you have an idea?
>  >
>  > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>  >
>  > Best regards,
>  > Martin
>  >
>  >
>  > -----Original Message-----
>  > From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
>  > Sent: Monday, 3 September 2018 14:57
>  > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
>  > Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>  > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > Hi Goetz,
>  >
>  > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>  >> Also, I can not find all of the mail traffic in
>  >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>  >> Is this a problem of the pipermail server?
>  >>
>  >> For some reason this webrev lacks the links to browse the diffs.
>  >> Do you need to use a more recent webrev? You can obtain it with
>  >> hg clone http://hg.openjdk.java.net/code-tools/webrev/
>  >
>  > Yes, it was probably a problem with pipermail or some relay.
>  > I noticed the same thing, i.e. at least one of Michi's replies
>  > reached me but did not reach a mailing list.
>  >
>  > The initial discussion is here:
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
>  >
>  > I understand Martin reviewed the last webrev in that thread, which is
>  > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
>  >
>  > Martin's review of webrev.01:
>  > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
>  >
>  > and Michi's reply to Martin's review of webrev.01:
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
>  > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
>  >
>  > and your last review:
>  > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
>  >
>  >
>  > HTH.
>  >
>  > Best regards,
>  > Gustavo
>  >
>  >> Why do you rename vnoreg to vnoregi?
>  >>
>  >> Besides that the change is fine, thanks for implementing this!
>  >>
>  >> Best regards,
>  >>    Goetz.
>  >>
>  >>
>  >>> -----Original Message-----
>  >>> From: Doerr, Martin
>  >>> Sent: Tuesday, 28 August 2018 19:35
>  >>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie <HORIE at jp.ibm.com>
>  >>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>  >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>  >>>
>  >>> Hi Michihiro,
>  >>>
>  >>> thank you for implementing it. I have just taken a first look at your
>  >>> webrev.01.
>  >>>
>  >>> It looks basically good. Only the Power version check seems to be incorrect.
>  >>> VM_Version::has_popcntb() checks for Power5.
>  >>> I believe most instructions are available with Power7.
>  >>> Some of them (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>  >>> Power8?
>  >>> We should check this carefully.
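Martin's point is that the guard has to reflect the newest instruction a rule emits, not Power5 (`has_popcntb()`). A sketch of that bookkeeping in Java, for illustration only: the class and method names are hypothetical, and the table lists only instructions whose level is well established (VSX instructions such as xvmulsp and xvsqrtsp arrived with Power ISA 2.06/POWER7; vsubudm and vpopcntw with ISA 2.07/POWER8). Every entry should still be checked against the Power ISA book.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive the minimum Power level a match rule needs from
// the instructions it emits, then compare it with the detected CPU level.
final class PowerLevelCheck {
    // Minimum Power level per instruction (ISA 2.06 = POWER7, ISA 2.07 = POWER8).
    private static final Map<String, Integer> MIN_LEVEL = Map.of(
        "xvmulsp", 7,
        "xvsqrtsp", 7,
        "vsubudm", 8,
        "vpopcntw", 8);

    // Highest level required by any emitted instruction; unknown mnemonics are
    // rejected so that a missing table entry cannot silently pass the check.
    static int requiredLevel(List<String> emitted) {
        int level = 0;
        for (String insn : emitted) {
            Integer min = MIN_LEVEL.get(insn);
            if (min == null) {
                throw new IllegalArgumentException("no level recorded for " + insn);
            }
            level = Math.max(level, min);
        }
        return level;
    }

    static boolean canEmit(List<String> emitted, int detectedPowerLevel) {
        return detectedPowerLevel >= requiredLevel(emitted);
    }
}
```

Failing loudly on an unrecorded mnemonic is a deliberate choice here: the risk Martin describes is exactly an instruction whose level nobody checked.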
>  >>>
>  >>> Also, indentation in register_ppc.hpp could be improved.
>  >>>
>  >>> Thanks and best regards,
>  >>> Martin
>  >>>
>  >>>
>  >>> -----Original Message-----
>  >>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>  >>> Sent: Thursday, 26 July 2018 16:02
>  >>> To: Michihiro Horie <HORIE at jp.ibm.com>
>  >>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>  >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >>>
>  >>> Hi Michi,
>  >>>
>  >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>  >>>> I updated webrev:
>  >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>  >>>
>  >>> Thanks for providing an updated webrev and for fixing indentation and
>  >>> function order in assembler_ppc.inline.hpp as well. I have no further comments :)
>  >>>
>  >>>
>  >>> Best Regards,
>  >>> Gustavo
>  >>>
>  >>>>
>  >>>> Best regards,
>  >>>> --
>  >>>> Michihiro,
>  >>>> IBM Research - Tokyo
>  >>>>
>  >>>>
>  >>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>  >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>  >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
>  >>>> Date: 2018/07/25 23:05
>  >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >>>>
>  >>>> ------------------------------------------------------------------------
>  >>>>
>  >>>>
>  >>>>
>  >>>> Hi Michi,
>  >>>>
>  >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>  >>>>  > Dear all,
>  >>>>  >
>  >>>>  > Would you review the following change?
>  >>>>  > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>  >>>>  > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>  >>>>  >
>  >>>>  > This change adds support for vectorized arithmetic calculation with SLP.
>  >>>>  >
>  >>>>  > The to_vr function is added to convert VSR to VR. Currently, vecX is
>  >>>>  > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>  >>>>  > which overlap exactly with the VRs. Instruction APIs receiving VRs use
>  >>>>  > to_vr via vecX. Another thing is the change in sqrtF_reg to enable
>  >>>>  > matching with SqrtVF. I think the change in sqrtF_reg is fine because of
>  >>>>  > ConvD2FNode::Value in convertnode.cpp.
>  >>>>
>  >>>> Looks good. Just a few comments:
>  >>>>
>  >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in
>  >>>>   order to avoid the splat?
>  >>>>
>  >>>> - Although all instructions added by your change were introduced in ISA 2.06,
>  >>>>   so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in
>  >>>>   vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to
>  >>>>   guarantee that these instructions won't be emitted on a CPU that does not
>  >>>>   support them.
>  >>>>
>  >>>> - I think that in general strings in format %{} are in upper case. For instance,
>  >>>>   this is the current optoassembly output for vmul4F:
>  >>>>
>  >>>> 2941835 5b4     ADDI    R24, R24, #64
>  >>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>  >>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>  >>>>
>  >>>>   I think it would be better in upper case instead. I also think that if
>  >>>>   the node match emits more than one instruction, all instructions must be
>  >>>>   listed in format %{}, since it's meant for detailed debugging. Finally, I think
>  >>>>   it would be better to replace \t! by \t// in that string (unless I'm missing
>  >>>>   any special meaning of that char). So for vmul4F it would be something like:
>  >>>>
>  >>>> 2941835 5b4     ADDI      R24, R24, #64
>  >>>>                 VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>  >>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>  >>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>  >>>>
>  >>>>
>  >>>> But feel free to change anything just after you get additional reviews :)
>  >>>>
>  >>>>
>  >>>>  > I confirmed this change with JTREG. In addition, I used the attached micro
>  >>>>  > benchmarks.
>  >>>>  > (See attached file: slp_microbench.zip)
>  >>>>
>  >>>> Thanks for sharing it.
>  >>>> Btw, another option would be to host it on the CR
>  >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>  >>>>
>  >>>>
>  >>>> Best regards,
>  >>>> Gustavo
>  >>>>
>  >>>>  >
>  >>>>  > Best regards,
>  >>>>  > --
>  >>>>  > Michihiro,
>  >>>>  > IBM Research - Tokyo
>  >>>>  >
>  >>>>
>  >>>>
>  >>>>
>  >>
>  >
>  >
>  >
>  >
> 
> 
> 

From ekaterina.pavlova at oracle.com  Wed Sep  5 20:17:18 2018
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Wed, 5 Sep 2018 13:17:18 -0700
Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times
 out intermittently on Linux-X64
In-Reply-To: <FD2871FD-89DA-44B7-B7AC-CDFA9EC26FCC@oracle.com>
References: <e4b915f5-a678-d5a5-f85f-600f38b577f4@oracle.com>
 <eec6f7ed-b122-fa9c-d001-4d52428a4902@oracle.com>
 <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com>
 <ebffcde6-6687-27f5-f4c8-30cc409f882e@oracle.com>
 <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com>
 <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com>
 <FD2871FD-89DA-44B7-B7AC-CDFA9EC26FCC@oracle.com>
Message-ID: <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com>

On 8/29/18 11:41 AM, Doug Simon wrote:
> 
> 
>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote:
>>
>> On 8/29/18 2:11 AM, Doug Simon wrote:
>>> When running these tests on Graal tip against JDK 11, I get:
>>>
>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math
>>> ...
>>> 10 longest running test classes:
>>>       21.115 ms    org.graalvm.compiler.jtt.lang.Math_log10
>>>       11.921 ms    org.graalvm.compiler.jtt.lang.Math_log
>>>       10.460 ms    org.graalvm.compiler.jtt.lang.Math_sqrt
>>>        3.525 ms    org.graalvm.compiler.jtt.lang.Math_pow
>>>        1.937 ms    org.graalvm.compiler.jtt.lang.Math_sin
>>>        1.689 ms    org.graalvm.compiler.jtt.lang.Math_tan
>>>        1.550 ms    org.graalvm.compiler.jtt.lang.Math_exp
>>>        1.537 ms    org.graalvm.compiler.jtt.lang.Math_cos
>>>        1.526 ms    org.graalvm.compiler.jtt.lang.Math_abs
>>>          338 ms    org.graalvm.compiler.jtt.lang.Math_round
>>> 10 longest running tests:
>>>       10.583 ms    run7(org.graalvm.compiler.jtt.lang.Math_log)
>>>       10.335 ms    run7(org.graalvm.compiler.jtt.lang.Math_sqrt)
>>>        3.468 ms    run11(org.graalvm.compiler.jtt.lang.Math_pow)
>>>        1.666 ms    run5(org.graalvm.compiler.jtt.lang.Math_sin)
>>>        1.533 ms    run5(org.graalvm.compiler.jtt.lang.Math_tan)
>>>        1.519 ms    run8(org.graalvm.compiler.jtt.lang.Math_exp)
>>>        1.456 ms    run3(org.graalvm.compiler.jtt.lang.Math_cos)
>>>        1.371 ms    run7(org.graalvm.compiler.jtt.lang.Math_abs)
>>>        1.024 ms    run0(org.graalvm.compiler.jtt.lang.Math_log)
>>>           84 ms    run0(org.graalvm.compiler.jtt.lang.Math_sin)
>>>
>>> All seems as expected.
>>>
>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK:
>>>
>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol
>>>              this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName();
>>>                                      ^
>>>    symbol:   method isAnonymous()
>>>    location: variable type of type HotSpotResolvedObjectType
>>> 1 error
>>>
>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.is[Unsafe]Anonymous or a jdk12 versioned project will have to be added.
>>
>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null.  I think I added isAnonymous() first and then getHostClass() was added later.
> 
> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync.

Doug, could you please point to the bug id this issue is going to be tracked by.

thanks,
-katya

> -Doug
> 
>>> -Doug
>>>
>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>>>
>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>
>>>>
>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote:
>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log have always been more than 10 times slower than the other org.graalvm.compiler.jtt.lang.Math tests. But I agree, let's wait to see what the Graal team says about this slowness.
>>>>> thanks,
>>>>> -katya
>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote:
>>>>>> Before increasing the timeout, Labs should look at this test and investigate its strange behavior; the last iteration takes a long time:
>>>>>>
>>>>>>     run5: Passed 228.9 ms
>>>>>>     run6: Passed 145.7 ms
>>>>>>     run7: Passed 833395.5 ms
>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines.
>>>>>>> Doubled the default timeout (120 seconds). Please review.
>>>>>>>
>>>>>>>       JBS: https://bugs.openjdk.java.net/browse/JDK-8208100
>>>>>>>    webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html
>>>>>>> testing: Tested by running the test 10 times on all platforms.
>>>>>>>
>>>>>>>
>>>>>>> thanks,
>>>>>>> -katya
> 
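Doug's first option, calling the renamed predicate through reflection, can be sketched as below. Assumptions: the JDK 12 name is taken to be isUnsafeAnonymous() (following the JDK-8209301 rename), the lookup goes through a public API type such as HotSpotResolvedObjectType rather than its implementation class, and the two interfaces are hypothetical stand-ins so the example runs without JVMCI. The module-export setup Graal needs for JVMCI is not shown.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Sketch of the reflection option: call whichever predicate the API type declares.
// "isUnsafeAnonymous" is an assumed JDK 12 name; "isAnonymous" is the JDK 11 name.
final class AnonymousCheck {
    private static final String[] CANDIDATES = {"isUnsafeAnonymous", "isAnonymous"};

    static boolean isAnonymous(Class<?> api, Object type) {
        for (String name : CANDIDATES) {
            Method m;
            try {
                m = api.getMethod(name);
            } catch (NoSuchMethodException e) {
                continue; // not declared under this name; try the next one
            }
            try {
                return (Boolean) m.invoke(type);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("cannot call " + name, e);
            }
        }
        throw new IllegalStateException(api.getName() + " declares neither predicate");
    }
}

// Hypothetical stand-ins for the JDK 11 and JDK 12 shapes of the JVMCI type.
interface Jdk11Type { boolean isAnonymous(); }
interface Jdk12Type { boolean isUnsafeAnonymous(); }
```

In real code the resolved Method would likely be cached in a static field; it is looked up inline here to keep the sketch short. The other option Doug mentions, a jdk12-versioned project, avoids reflection at the cost of extra build configuration.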


From doug.simon at oracle.com  Wed Sep  5 20:29:06 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 5 Sep 2018 22:29:06 +0200
Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times
 out intermittently on Linux-X64
In-Reply-To: <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com>
References: <e4b915f5-a678-d5a5-f85f-600f38b577f4@oracle.com>
 <eec6f7ed-b122-fa9c-d001-4d52428a4902@oracle.com>
 <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com>
 <ebffcde6-6687-27f5-f4c8-30cc409f882e@oracle.com>
 <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com>
 <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com>
 <FD2871FD-89DA-44B7-B7AC-CDFA9EC26FCC@oracle.com>
 <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com>
Message-ID: <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com>

Hi Katya,

> On 5 Sep 2018, at 22:17, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
> 
> On 8/29/18 11:41 AM, Doug Simon wrote:
>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote:
>>> 
>>> On 8/29/18 2:11 AM, Doug Simon wrote:
>>>> When running these tests on Graal tip against JDK 11, I get:
>>>> 
>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math
>>>> ...
>>>> 10 longest running test classes:
>>>>      21.115 ms    org.graalvm.compiler.jtt.lang.Math_log10
>>>>      11.921 ms    org.graalvm.compiler.jtt.lang.Math_log
>>>>      10.460 ms    org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>       3.525 ms    org.graalvm.compiler.jtt.lang.Math_pow
>>>>       1.937 ms    org.graalvm.compiler.jtt.lang.Math_sin
>>>>       1.689 ms    org.graalvm.compiler.jtt.lang.Math_tan
>>>>       1.550 ms    org.graalvm.compiler.jtt.lang.Math_exp
>>>>       1.537 ms    org.graalvm.compiler.jtt.lang.Math_cos
>>>>       1.526 ms    org.graalvm.compiler.jtt.lang.Math_abs
>>>>         338 ms    org.graalvm.compiler.jtt.lang.Math_round
>>>> 10 longest running tests:
>>>>      10.583 ms    run7(org.graalvm.compiler.jtt.lang.Math_log)
>>>>      10.335 ms    run7(org.graalvm.compiler.jtt.lang.Math_sqrt)
>>>>       3.468 ms    run11(org.graalvm.compiler.jtt.lang.Math_pow)
>>>>       1.666 ms    run5(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>       1.533 ms    run5(org.graalvm.compiler.jtt.lang.Math_tan)
>>>>       1.519 ms    run8(org.graalvm.compiler.jtt.lang.Math_exp)
>>>>       1.456 ms    run3(org.graalvm.compiler.jtt.lang.Math_cos)
>>>>       1.371 ms    run7(org.graalvm.compiler.jtt.lang.Math_abs)
>>>>       1.024 ms    run0(org.graalvm.compiler.jtt.lang.Math_log)
>>>>          84 ms    run0(org.graalvm.compiler.jtt.lang.Math_sin)
>>>> 
>>>> All seems as expected.
>>>> 
>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK:
>>>> 
>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol
>>>>             this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName();
>>>>                                     ^
>>>>   symbol:   method isAnonymous()
>>>>   location: variable type of type HotSpotResolvedObjectType
>>>> 1 error
>>>> 
>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.is[Unsafe]Anonymous or a jdk12 versioned project will have to be added.
>>> 
>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null.  I think I added isAnonymous() first and then getHostClass() was added later.
>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync.
> 
> Doug, could you please point to the bug id this issue is going to be tracked by.

I don't have a bug id for this issue - feel free to open one and assign it to me.

I left a note pointing out the Graal compilation issue along with Dean's recommended fix:

https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481

-Doug
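Dean's recommended fix replaces the call that no longer compiles with a host-class check. A sketch, assuming (as in JVMCI) that getHostClass() returns null for a type that is not anonymous; ResolvedType and MetadataNames are hypothetical stand-ins for HotSpotResolvedObjectType and the affected line of AOTCompiledClass.java.

```java
// Hypothetical stand-in for HotSpotResolvedObjectType, reduced to the two
// methods the metadata name needs.
interface ResolvedType {
    String getName();

    // Assumed contract: the host class of an (unsafe) anonymous class, otherwise null.
    ResolvedType getHostClass();
}

final class MetadataNames {
    // Mirrors the AOTCompiledClass.java:77 expression with isAnonymous() replaced
    // by a host-class check; per Dean's note, getHostClass() was not renamed.
    static String metadataName(ResolvedType type, int classId) {
        return type.getHostClass() != null ? "anon<" + classId + ">" : type.getName();
    }
}
```

As Doug notes, the null check may cost slightly more than a dedicated predicate, which is unlikely to matter for AOT metadata naming.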



From ekaterina.pavlova at oracle.com  Wed Sep  5 21:10:50 2018
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Wed, 5 Sep 2018 14:10:50 -0700
Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times
 out intermittently on Linux-X64
In-Reply-To: <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com>
References: <e4b915f5-a678-d5a5-f85f-600f38b577f4@oracle.com>
 <eec6f7ed-b122-fa9c-d001-4d52428a4902@oracle.com>
 <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com>
 <ebffcde6-6687-27f5-f4c8-30cc409f882e@oracle.com>
 <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com>
 <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com>
 <FD2871FD-89DA-44B7-B7AC-CDFA9EC26FCC@oracle.com>
 <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com>
 <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com>
Message-ID: <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com>

Hi Doug,

I have created JDK-8210434.

-katya

On 9/5/18 1:29 PM, Doug Simon wrote:
> Hi Katya,
> 
>> On 5 Sep 2018, at 22:17, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>
>> On 8/29/18 11:41 AM, Doug Simon wrote:
>>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote:
>>>>
>>>> On 8/29/18 2:11 AM, Doug Simon wrote:
>>>>> When running these tests on Graal tip against JDK 11, I get:
>>>>>
>>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math
>>>>> ...
>>>>> 10 longest running test classes:
>>>>>       21.115 ms    org.graalvm.compiler.jtt.lang.Math_log10
>>>>>       11.921 ms    org.graalvm.compiler.jtt.lang.Math_log
>>>>>       10.460 ms    org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>>        3.525 ms    org.graalvm.compiler.jtt.lang.Math_pow
>>>>>        1.937 ms    org.graalvm.compiler.jtt.lang.Math_sin
>>>>>        1.689 ms    org.graalvm.compiler.jtt.lang.Math_tan
>>>>>        1.550 ms    org.graalvm.compiler.jtt.lang.Math_exp
>>>>>        1.537 ms    org.graalvm.compiler.jtt.lang.Math_cos
>>>>>        1.526 ms    org.graalvm.compiler.jtt.lang.Math_abs
>>>>>          338 ms    org.graalvm.compiler.jtt.lang.Math_round
>>>>> 10 longest running tests:
>>>>>       10.583 ms    run7(org.graalvm.compiler.jtt.lang.Math_log)
>>>>>       10.335 ms    run7(org.graalvm.compiler.jtt.lang.Math_sqrt)
>>>>>        3.468 ms    run11(org.graalvm.compiler.jtt.lang.Math_pow)
>>>>>        1.666 ms    run5(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>>        1.533 ms    run5(org.graalvm.compiler.jtt.lang.Math_tan)
>>>>>        1.519 ms    run8(org.graalvm.compiler.jtt.lang.Math_exp)
>>>>>        1.456 ms    run3(org.graalvm.compiler.jtt.lang.Math_cos)
>>>>>        1.371 ms    run7(org.graalvm.compiler.jtt.lang.Math_abs)
>>>>>        1.024 ms    run0(org.graalvm.compiler.jtt.lang.Math_log)
>>>>>           84 ms    run0(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>>
>>>>> All seems as expected.
>>>>>
>>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK:
>>>>>
>>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol
>>>>>              this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName();
>>>>>                                      ^
>>>>>    symbol:   method isAnonymous()
>>>>>    location: variable type of type HotSpotResolvedObjectType
>>>>> 1 error
>>>>>
>>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.is[Unsafe]Anonymous or a jdk12 versioned project will have to be added.
>>>>
>>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null.  I think I added isAnonymous() first and then getHostClass() was added later.
>>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync.
>>
>> Doug, could you please point to the bug id this issue is going to be tracked by.
> 
> I don't have a bug id for this issue - feel free to open one and assign it to me.
> 
> I left a note pointing out the Graal compilation issue along with Dean's recommended fix:
> 
> https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481
> 
> -Doug
> 
> 


From doug.simon at oracle.com  Wed Sep  5 21:14:43 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 5 Sep 2018 23:14:43 +0200
Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times
 out intermittently on Linux-X64
In-Reply-To: <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com>
References: <e4b915f5-a678-d5a5-f85f-600f38b577f4@oracle.com>
 <eec6f7ed-b122-fa9c-d001-4d52428a4902@oracle.com>
 <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com>
 <ebffcde6-6687-27f5-f4c8-30cc409f882e@oracle.com>
 <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com>
 <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com>
 <FD2871FD-89DA-44B7-B7AC-CDFA9EC26FCC@oracle.com>
 <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com>
 <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com>
 <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com>
Message-ID: <CBE9BD6A-DE40-4CD2-A497-77974EE6B447@oracle.com>


> On 5 Sep 2018, at 23:10, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
> 
> Hi Doug,
> 
> I have created JDK-8210434.

Ok. I thought you were talking about a bug id for the failing tests.

Dean, I'll re-assign JDK-8210434 to you since it's a jaotc issue.

-Doug

> On 9/5/18 1:29 PM, Doug Simon wrote:
>> Hi Katya,
>>> On 5 Sep 2018, at 22:17, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>> 
>>> On 8/29/18 11:41 AM, Doug Simon wrote:
>>>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote:
>>>>> 
>>>>> On 8/29/18 2:11 AM, Doug Simon wrote:
>>>>>> When running these tests on Graal tip against JDK 11, I get:
>>>>>> 
>>>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math
>>>>>> ...
>>>>>> 10 longest running test classes:
>>>>>>      21.115 ms    org.graalvm.compiler.jtt.lang.Math_log10
>>>>>>      11.921 ms    org.graalvm.compiler.jtt.lang.Math_log
>>>>>>      10.460 ms    org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>>>       3.525 ms    org.graalvm.compiler.jtt.lang.Math_pow
>>>>>>       1.937 ms    org.graalvm.compiler.jtt.lang.Math_sin
>>>>>>       1.689 ms    org.graalvm.compiler.jtt.lang.Math_tan
>>>>>>       1.550 ms    org.graalvm.compiler.jtt.lang.Math_exp
>>>>>>       1.537 ms    org.graalvm.compiler.jtt.lang.Math_cos
>>>>>>       1.526 ms    org.graalvm.compiler.jtt.lang.Math_abs
>>>>>>         338 ms    org.graalvm.compiler.jtt.lang.Math_round
>>>>>> 10 longest running tests:
>>>>>>      10.583 ms    run7(org.graalvm.compiler.jtt.lang.Math_log)
>>>>>>      10.335 ms    run7(org.graalvm.compiler.jtt.lang.Math_sqrt)
>>>>>>       3.468 ms    run11(org.graalvm.compiler.jtt.lang.Math_pow)
>>>>>>       1.666 ms    run5(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>>>       1.533 ms    run5(org.graalvm.compiler.jtt.lang.Math_tan)
>>>>>>       1.519 ms    run8(org.graalvm.compiler.jtt.lang.Math_exp)
>>>>>>       1.456 ms    run3(org.graalvm.compiler.jtt.lang.Math_cos)
>>>>>>       1.371 ms    run7(org.graalvm.compiler.jtt.lang.Math_abs)
>>>>>>       1.024 ms    run0(org.graalvm.compiler.jtt.lang.Math_log)
>>>>>>          84 ms    run0(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>>> 
>>>>>> All seems as expected.
>>>>>> 
>>>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK:
>>>>>> 
>>>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol
>>>>>>             this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName();
>>>>>>                                     ^
>>>>>>   symbol:   method isAnonymous()
>>>>>>   location: variable type of type HotSpotResolvedObjectType
>>>>>> 1 error
>>>>>> 
>>>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added.
>>>>> 
>>>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null.  I think I added isAnonymous() first and then getHostClass() was added later.
>>>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync.
>>> 
>>> Doug, could you please point to the bug id this issue is going to be tracked by.
>> I don't have a bug id for this issue - feel free to open one and assign it to me.
>> I left a note pointing out the Graal compilation issue along with Dean's recommended fix:
>> https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481
>> -Doug
>>>> -Doug
>>>>>> -Doug
>>>>>> 
>>>>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>>>>>> 
>>>>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>>>> 
>>>>>>> 
>>>>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote:
>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log always were more than 10 times slower than other org.graalvm.compiler.jtt.lang.Math tests. But I agree, lets wait what Graal team will say regarding this slowness.
>>>>>>>> thanks,
>>>>>>>> -katya
>>>>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote:
>>>>>>>>> Before increase timeout Labs should look on this test and investigate it strange behavior - last iteration takes long time:
>>>>>>>>> 
>>>>>>>>>    run5: Passed 228.9 ms
>>>>>>>>>    run6: Passed 145.7 ms
>>>>>>>>>    run7: Passed 833395.5 ms
>>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>> 
>>>>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>> 
>>>>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines.
>>>>>>>>>> Increased default timeout (120 seconds) in twice. Please review.
>>>>>>>>>> 
>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8208100
>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html
>>>>>>>>>> testing: Tested by running the test 10 times on all platforms.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> thanks,
>>>>>>>>>> -katya
> 
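The quoted thread mentions using reflection so that AOTCompiledClass.java compiles against both JDK 11 and JDK tip, where the HotSpotResolvedObjectType method was renamed. What follows is a minimal sketch of that idea, not the actual jaotc fix. It assumes the new name is isUnsafeAnonymous(), as the "isAnonymous[Unsafe]Anonymous" wording suggests. AnonymousQuery, OldType and NewType are hypothetical stand-ins, not JVMCI classes. Dean's alternative, getHostClass() != null, avoids reflection altogether.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class AnonymousQuery {
    // Hypothetical stand-ins for the same JVMCI type as seen on two JDK versions
    // (assumption: the method was renamed from isAnonymous to isUnsafeAnonymous).
    public static class OldType { public boolean isAnonymous() { return true; } }
    public static class NewType { public boolean isUnsafeAnonymous() { return false; } }

    // Calls the first public no-arg method that exists on 'type', trying the
    // pre-rename name first and the post-rename name second.
    public static boolean isAnonymous(Object type) {
        for (String name : new String[] {"isAnonymous", "isUnsafeAnonymous"}) {
            Method m;
            try {
                m = type.getClass().getMethod(name);
            } catch (NoSuchMethodException e) {
                continue; // not present on this JDK; try the next name
            }
            try {
                return (Boolean) m.invoke(type); // boxed boolean result, unboxed here
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("failed to call " + name, e);
            }
        }
        throw new IllegalStateException("no anonymous-class query on " + type.getClass());
    }

    public static void main(String[] args) {
        System.out.println(isAnonymous(new OldType()) + " " + isAnonymous(new NewType()));
        // prints "true false"
    }
}
```

In real code the Method would be looked up once and cached rather than resolved on every call. The sketch keeps it inline for brevity.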


From ekaterina.pavlova at oracle.com  Wed Sep  5 21:22:35 2018
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Wed, 5 Sep 2018 14:22:35 -0700
Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times
 out intermittently on Linux-X64
In-Reply-To: <CBE9BD6A-DE40-4CD2-A497-77974EE6B447@oracle.com>
References: <e4b915f5-a678-d5a5-f85f-600f38b577f4@oracle.com>
 <eec6f7ed-b122-fa9c-d001-4d52428a4902@oracle.com>
 <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com>
 <ebffcde6-6687-27f5-f4c8-30cc409f882e@oracle.com>
 <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com>
 <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com>
 <FD2871FD-89DA-44B7-B7AC-CDFA9EC26FCC@oracle.com>
 <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com>
 <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com>
 <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com>
 <CBE9BD6A-DE40-4CD2-A497-77974EE6B447@oracle.com>
Message-ID: <1ffdeaed-892c-cb5a-31b6-21955147008c@oracle.com>

Well, compiler/graalunit/JttLangMathALTest.java doesn't really fail; the test just runs slowly because of
the slow org.graalvm.compiler.jtt.lang.Math_log sub-tests. The Graal team doesn't see this slowness
when running these tests from the Graal workspace. However, the latest JDK is not used there by default,
and the attempt to use the latest JDK failed because of 8209301.

Let me know if I am missing anything.

thanks,
-katya

On 9/5/18 2:14 PM, Doug Simon wrote:
> 
> 
>> On 5 Sep 2018, at 23:10, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>
>> Hi Doug,
>>
>> I have created JDK-8210434.
> 
> Ok. I thought you were talking about a bug id for the failing tests.
> 
> Dean, I'll re-assign JDK-8210434 to you since it's a jaotc issue.
> 
> -Doug
> 
>> On 9/5/18 1:29 PM, Doug Simon wrote:
>>> Hi Katya,
>>>> On 5 Sep 2018, at 22:17, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>>>
>>>> On 8/29/18 11:41 AM, Doug Simon wrote:
>>>>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote:
>>>>>>
>>>>>> On 8/29/18 2:11 AM, Doug Simon wrote:
>>>>>>> When running these tests on Graal tip against JDK 11, I get:
>>>>>>>
>>>>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math
>>>>>>> ...
>>>>>>> 10 longest running test classes:
>>>>>>>       21.115 ms    org.graalvm.compiler.jtt.lang.Math_log10
>>>>>>>       11.921 ms    org.graalvm.compiler.jtt.lang.Math_log
>>>>>>>       10.460 ms    org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>>>>        3.525 ms    org.graalvm.compiler.jtt.lang.Math_pow
>>>>>>>        1.937 ms    org.graalvm.compiler.jtt.lang.Math_sin
>>>>>>>        1.689 ms    org.graalvm.compiler.jtt.lang.Math_tan
>>>>>>>        1.550 ms    org.graalvm.compiler.jtt.lang.Math_exp
>>>>>>>        1.537 ms    org.graalvm.compiler.jtt.lang.Math_cos
>>>>>>>        1.526 ms    org.graalvm.compiler.jtt.lang.Math_abs
>>>>>>>          338 ms    org.graalvm.compiler.jtt.lang.Math_round
>>>>>>> 10 longest running tests:
>>>>>>>       10.583 ms    run7(org.graalvm.compiler.jtt.lang.Math_log)
>>>>>>>       10.335 ms    run7(org.graalvm.compiler.jtt.lang.Math_sqrt)
>>>>>>>        3.468 ms    run11(org.graalvm.compiler.jtt.lang.Math_pow)
>>>>>>>        1.666 ms    run5(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>>>>        1.533 ms    run5(org.graalvm.compiler.jtt.lang.Math_tan)
>>>>>>>        1.519 ms    run8(org.graalvm.compiler.jtt.lang.Math_exp)
>>>>>>>        1.456 ms    run3(org.graalvm.compiler.jtt.lang.Math_cos)
>>>>>>>        1.371 ms    run7(org.graalvm.compiler.jtt.lang.Math_abs)
>>>>>>>        1.024 ms    run0(org.graalvm.compiler.jtt.lang.Math_log)
>>>>>>>           84 ms    run0(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>>>>
>>>>>>> All seems as expected.
>>>>>>>
>>>>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK:
>>>>>>>
>>>>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol
>>>>>>>              this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName();
>>>>>>>                                      ^
>>>>>>>    symbol:   method isAnonymous()
>>>>>>>    location: variable type of type HotSpotResolvedObjectType
>>>>>>> 1 error
>>>>>>>
>>>>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added.
>>>>>>
>>>>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null.  I think I added isAnonymous() first and then getHostClass() was added later.
>>>>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync.
>>>>
>>>> Doug, could you please point to the bug id this issue is going to be tracked by.
>>> I don't have a bug id for this issue - feel free to open one and assign it to me.
>>> I left a note pointing out the Graal compilation issue along with Dean's recommended fix:
>>> https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481
>>> -Doug
>>>>> -Doug
>>>>>>> -Doug
>>>>>>>
>>>>>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>>>>>>>
>>>>>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote:
>>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log always were more than 10 times slower than other org.graalvm.compiler.jtt.lang.Math tests. But I agree, lets wait what Graal team will say regarding this slowness.
>>>>>>>>> thanks,
>>>>>>>>> -katya
>>>>>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote:
>>>>>>>>>> Before increase timeout Labs should look on this test and investigate it strange behavior - last iteration takes long time:
>>>>>>>>>>
>>>>>>>>>>     run5: Passed 228.9 ms
>>>>>>>>>>     run6: Passed 145.7 ms
>>>>>>>>>>     run7: Passed 833395.5 ms
>>>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines.
>>>>>>>>>>> Increased default timeout (120 seconds) in twice. Please review.
>>>>>>>>>>>
>>>>>>>>>>>       JBS: https://bugs.openjdk.java.net/browse/JDK-8208100
>>>>>>>>>>>    webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html
>>>>>>>>>>> testing: Tested by running the test 10 times on all platforms.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>> -katya
>>
> 


From doug.simon at oracle.com  Wed Sep  5 21:35:50 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 5 Sep 2018 23:35:50 +0200
Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times
 out intermittently on Linux-X64
In-Reply-To: <1ffdeaed-892c-cb5a-31b6-21955147008c@oracle.com>
References: <e4b915f5-a678-d5a5-f85f-600f38b577f4@oracle.com>
 <eec6f7ed-b122-fa9c-d001-4d52428a4902@oracle.com>
 <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com>
 <ebffcde6-6687-27f5-f4c8-30cc409f882e@oracle.com>
 <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com>
 <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com>
 <FD2871FD-89DA-44B7-B7AC-CDFA9EC26FCC@oracle.com>
 <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com>
 <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com>
 <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com>
 <CBE9BD6A-DE40-4CD2-A497-77974EE6B447@oracle.com>
 <1ffdeaed-892c-cb5a-31b6-21955147008c@oracle.com>
Message-ID: <7DFCD9B1-CADB-428A-9C22-3331A8FBE28E@oracle.com>


> On 5 Sep 2018, at 23:22, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
> 
> Well, compiler/graalunit/JttLangMathALTest.java doesn't really fail; the test just runs slowly because of
> the slow org.graalvm.compiler.jtt.lang.Math_log sub-tests. The Graal team doesn't see this slowness
> when running these tests from the Graal workspace. However, the latest JDK is not used there by default,
> and the attempt to use the latest JDK failed because of 8209301.
> 
> Let me know if I am missing anything.

Nope - that clears things up for me - thanks. Once 8209301 is resolved, I can help with 8208100 (assuming I can reproduce it).

-Doug

> 
> thanks,
> -katya
> 
> On 9/5/18 2:14 PM, Doug Simon wrote:
>>> On 5 Sep 2018, at 23:10, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>> 
>>> Hi Doug,
>>> 
>>> I have created JDK-8210434.
>> Ok. I thought you were talking about a bug id for the failing tests.
>> Dean, I'll re-assign JDK-8210434 to you since it's a jaotc issue.
>> -Doug
>>> On 9/5/18 1:29 PM, Doug Simon wrote:
>>>> Hi Katya,
>>>>> On 5 Sep 2018, at 22:17, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>>>> 
>>>>> On 8/29/18 11:41 AM, Doug Simon wrote:
>>>>>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote:
>>>>>>> 
>>>>>>> On 8/29/18 2:11 AM, Doug Simon wrote:
>>>>>>>> When running these tests on Graal tip against JDK 11, I get:
>>>>>>>> 
>>>>>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math
>>>>>>>> ...
>>>>>>>> 10 longest running test classes:
>>>>>>>>      21.115 ms    org.graalvm.compiler.jtt.lang.Math_log10
>>>>>>>>      11.921 ms    org.graalvm.compiler.jtt.lang.Math_log
>>>>>>>>      10.460 ms    org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>>>>>       3.525 ms    org.graalvm.compiler.jtt.lang.Math_pow
>>>>>>>>       1.937 ms    org.graalvm.compiler.jtt.lang.Math_sin
>>>>>>>>       1.689 ms    org.graalvm.compiler.jtt.lang.Math_tan
>>>>>>>>       1.550 ms    org.graalvm.compiler.jtt.lang.Math_exp
>>>>>>>>       1.537 ms    org.graalvm.compiler.jtt.lang.Math_cos
>>>>>>>>       1.526 ms    org.graalvm.compiler.jtt.lang.Math_abs
>>>>>>>>         338 ms    org.graalvm.compiler.jtt.lang.Math_round
>>>>>>>> 10 longest running tests:
>>>>>>>>      10.583 ms    run7(org.graalvm.compiler.jtt.lang.Math_log)
>>>>>>>>      10.335 ms    run7(org.graalvm.compiler.jtt.lang.Math_sqrt)
>>>>>>>>       3.468 ms    run11(org.graalvm.compiler.jtt.lang.Math_pow)
>>>>>>>>       1.666 ms    run5(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>>>>>       1.533 ms    run5(org.graalvm.compiler.jtt.lang.Math_tan)
>>>>>>>>       1.519 ms    run8(org.graalvm.compiler.jtt.lang.Math_exp)
>>>>>>>>       1.456 ms    run3(org.graalvm.compiler.jtt.lang.Math_cos)
>>>>>>>>       1.371 ms    run7(org.graalvm.compiler.jtt.lang.Math_abs)
>>>>>>>>       1.024 ms    run0(org.graalvm.compiler.jtt.lang.Math_log)
>>>>>>>>          84 ms    run0(org.graalvm.compiler.jtt.lang.Math_sin)
>>>>>>>> 
>>>>>>>> All seems as expected.
>>>>>>>> 
>>>>>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK:
>>>>>>>> 
>>>>>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol
>>>>>>>>             this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName();
>>>>>>>>                                     ^
>>>>>>>>   symbol:   method isAnonymous()
>>>>>>>>   location: variable type of type HotSpotResolvedObjectType
>>>>>>>> 1 error
>>>>>>>> 
>>>>>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added.
>>>>>>> 
>>>>>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null.  I think I added isAnonymous() first and then getHostClass() was added later.
>>>>>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync.
>>>>> 
>>>>> Doug, could you please point to the bug id this issue is going to be tracked by.
>>>> I don't have a bug id for this issue - feel free to open one and assign it to me.
>>>> I left a note pointing out the Graal compilation issue along with Dean's recommended fix:
>>>> https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481
>>>> -Doug
>>>>>> -Doug
>>>>>>>> -Doug
>>>>>>>> 
>>>>>>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>>>>>>>> 
>>>>>>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote:
>>>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log always were more than 10 times slower than other org.graalvm.compiler.jtt.lang.Math tests. But I agree, lets wait what Graal team will say regarding this slowness.
>>>>>>>>>> thanks,
>>>>>>>>>> -katya
>>>>>>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote:
>>>>>>>>>>> Before increase timeout Labs should look on this test and investigate it strange behavior - last iteration takes long time:
>>>>>>>>>>> 
>>>>>>>>>>>    run5: Passed 228.9 ms
>>>>>>>>>>>    run6: Passed 145.7 ms
>>>>>>>>>>>    run7: Passed 833395.5 ms
>>>>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>> 
>>>>>>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>> 
>>>>>>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines.
>>>>>>>>>>>> Increased default timeout (120 seconds) in twice. Please review.
>>>>>>>>>>>> 
>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8208100
>>>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html
>>>>>>>>>>>> testing: Tested by running the test 10 times on all platforms.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> thanks,
>>>>>>>>>>>> -katya
>>> 
> 


From gromero at linux.vnet.ibm.com  Wed Sep  5 22:18:27 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 5 Sep 2018 19:18:27 -0300
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com>
 <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
 <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>
 <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com>
Message-ID: <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com>

Hi Vladimir,

On 09/04/2018 03:40 PM, Vladimir Kozlov wrote:
> Thank you Gustavo for detailed answer.
> 
> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now.

Thanks for reviewing it!


> About property check (flavor == "server" & !emulatedClient) in tests. If C2 (and Graal in a future when it supports RTM) is used, this check is always true because C2 (and Graal) is only available in Server VM and it is disabled in emulatedClient setting. This check was added before we have C2 usage check which can be done now with vm.rtm.compiler.

Thanks, I was not aware of that. I've updated the webrev, removing the
'flavor == "server" & !emulatedClient' check:

http://cr.openjdk.java.net/~gromero/8209972/v3/

"hg diff --patience":

http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff
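The simplification Vladimir describes, dropping the flavor check once vm.rtm.compiler exists, can be checked mechanically with a truth table. What follows is a minimal sketch that rests on two assumptions. The first, which Vladimir states, is that an RTM-capable compiler is only in use when 'flavor == "server" & !emulatedClient' holds. The second, argued earlier in the thread, is that vm.rtm.cpu implies vm.rtm.os. RtmRequiresCheck and Pred are hypothetical names, and the predicates are plain Java booleans, not jtreg @requires syntax.

```java
public class RtmRequiresCheck {
    // A @requires-style predicate over the four properties discussed in the thread.
    public interface Pred { boolean test(boolean server, boolean cpu, boolean os, boolean compiler); }

    // True if 'a' and 'b' agree on every assignment allowed by the assumptions.
    public static boolean equivalent(Pred a, Pred b) {
        for (int bits = 0; bits < 16; bits++) {
            boolean server = (bits & 1) != 0;   // flavor == "server" & !emulatedClient
            boolean cpu = (bits & 2) != 0;      // vm.rtm.cpu
            boolean os = (bits & 4) != 0;       // vm.rtm.os
            boolean compiler = (bits & 8) != 0; // vm.rtm.compiler
            // Assumption (a): an RTM-capable compiler only runs in a non-emulated server VM.
            if (compiler && !server) continue;
            // Assumption (b): the JVM reports RTM CPU support only if the OS supports it too.
            if (cpu && !os) continue;
            if (a.test(server, cpu, os, compiler) != b.test(server, cpu, os, compiler)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        boolean positive = equivalent(
            (p, cpu, os, c) -> p && cpu && os && c,
            (p, cpu, os, c) -> cpu && c);
        boolean negative = equivalent(
            (p, cpu, os, c) -> !cpu && p && c,
            (p, cpu, os, c) -> !cpu && c);
        System.out.println(positive + " " + negative); // prints "true true"
    }
}
```

Dropping assumption (a) makes the first pair differ. For example, with server = false and cpu = os = compiler = true, the long form is false and the short form is true. This matches Vladimir's point that the flavor check only became redundant once the compiler check existed.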

Testing (on Linux):

** X86_64 w/ CPU+OS RTM support + Graal VM **
Test results: no tests selected (all RTM tests skipped)

** POWER8 w/ CPU+OS support **
Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
Passed: compiler/rtm/locking/TestRTMAbortRatio.java
Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
Passed: compiler/rtm/locking/TestRTMRetryCount.java
Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
Passed: compiler/rtm/locking/TestUseRTMDeopt.java
Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
Test results: passed: 30

** X86_64 w/ CPU+OS support **
Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
Passed: compiler/rtm/locking/TestRTMAbortRatio.java
Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
Passed: compiler/rtm/locking/TestRTMRetryCount.java
Passed: compiler/rtm/locking/TestUseRTMDeopt.java
Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
Test results: passed: 30

** POWER7 wo/ CPU+OS RTM support **
Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
Test results: passed: 10

** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support **
Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
Test results: passed: 10


Best regards,
Gustavo
  
> Thanks,
> Vladimir
> 
> On 9/3/18 3:15 PM, Gustavo Romero wrote:
>> Hi Vladimir,
>>
>> Thanks a lot for reviewing it and for your comments.
>>
>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>>> Hi Gustavo,
>>>
>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with TieredStopAtLevel < 4 flag
>>
>> Yes, although currently afaics all tests will explicitly enabled C2 (for
>> instance, passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2
>> through a warming up before testing, I agree that nothing forbids one to
>> switch off C2 with TieredStopAtLevel = level < 4 for a test case. It also
>> looks better to list explicitly which compilers do support RTM instead of
>> the ones that don't support it.
>>
>> I've updated the webrev accordingly:
>>
>> http://cr.openjdk.java.net/~gromero/8209972/v2/
>>
>> diff in there looks odd so I generated another one with --patience for a
>> better (IMO) diff format:
>>
>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
>>
>>
>>> Also can platforms check be replaced with one vmRTMCPU()? Is ppc64 return true from vmRTMCPU()?
>>
>> For example, on Linux the following cases are possible regarding CPU / OS
>> RTM support:
>>
>> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
>> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>>
>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it
>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies
>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise
>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for
>> Linux and for AIX.
>>
>> That said I don't think that the platforms check can be replaced with one
>> vmRTMCPU(), because in some cases it's necessary to run a test for
>> cpu = false and compiler = true, i.e. it's necessary to run a test on an
>> unsupported CPU for a given platform _only if_ the compiler in use supports
>> RTM (like C2). So if, for instance, we do:
>>
>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires
>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation
>> returns 'false' for cpu = false and compiler = true, skipping the test
>> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler'
>> as 'true' and run the test in that case one could match for
>> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will
>> be evaluated as 'true' and the test will run even thought the Graal
>> compiler is selected, which is wrong.
>>
>> Hence 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must
>> contain its own list of supported compilers with RTM support for each
>> platform IMO. Basically we can't ask the JVM about the compiler's support
>> for RTM, since the JVM can only tell us about the CPU+OS support for RTM
>> regarding the CPU and OS in which the JVM is running on.
>>
>>
>>> And since you check cpu in here why not to replace all vm.rtm.* with one vm.rtm.supported? In such case you would need only one @requires checks in tests instead of:
>>>
>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>
>> I think it's not possible either. Currently there are 5 match cases in
>> RTM tests:
>>
>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>>
>> which can be simplified 5 cases as:
>>
>> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
>> 2:            flavor == "server" & !emulatedClient  & cpu & os
>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>> 5: no @requires
>>
>> I understand that case 1 and 2 (since CPU implies OS) can be simplified as:
>>
>>
>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>> 2:            flavor == "server" & !emulatedClient  & cpu
>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>> 5: no @requires
>>
>> and case 1 and 2 are mere opposites, so we have 4 cases:
>>
>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>> 5: no @requires
>>
>> We could simplify further making P = (flavor == "server" & !emulatedClient),
>> and make:
>>
>> 1:          !(P & cpu)
>> 3: (!cpu) &  (P)
>> 4:   cpu  & !(P)
>> 5: no @requires
>>
>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
>> order to control running the tests only if the selected compiler on a
>> given platform has RTM support (skipping Graal, for instance):
>>
>> 1:          !(P & cpu) & compiler
>> 3: (!cpu) &  (P)       & compiler
>> 4:   cpu  & !(P)       & compiler
>> 5: no @requires        & compiler
>>
>> So it looks like we would need at least 3 properties, but IMO it's
>> not worth adding another property P = (flavor == "server" & !emulatedClient)
>> just to further simplify the @requires line.
>>
>> In summary, I think it's only possible to replace 'cpu & os' by 'cpu',
>> so I updated the webrev, removing the vm.rtm.os property and keeping only
>> vm.rtm.cpu and vm.rtm.compiler (plus the flavor and emulatedClient checks).
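The simplification above can be checked mechanically. The following Java sketch is hypothetical (the class and method names are not part of the webrev); it enumerates every combination of P, cpu and os, skips the cpu-without-os states on the stated assumption that vm.rtm.cpu implies vm.rtm.os, and confirms that dropping 'os' leaves case 1 unchanged and that cases 1 and 2 are complements.

```java
// Hypothetical sketch: brute-force check of the @requires simplification.
// P stands for (vm.flavor == "server" & !vm.emulatedClient).
// Assumption: vm.rtm.cpu is only true when the OS also supports RTM.
public class RequiresSimplification {

    // Case 1 before simplification: !(P & cpu & os)
    static boolean case1Before(boolean p, boolean cpu, boolean os) {
        return !(p && cpu && os);
    }

    // Case 1 after dropping 'os': !(P & cpu)
    static boolean case1After(boolean p, boolean cpu) {
        return !(p && cpu);
    }

    // Case 2 after dropping 'os': P & cpu
    static boolean case2After(boolean p, boolean cpu) {
        return p && cpu;
    }

    // True if, for every reachable (P, cpu, os) state, dropping 'os' changes
    // nothing and cases 1 and 2 never select the same configuration.
    static boolean simplificationHolds() {
        boolean[] values = {false, true};
        for (boolean p : values) {
            for (boolean cpu : values) {
                for (boolean os : values) {
                    if (cpu && !os) {
                        continue; // unreachable: cpu implies os
                    }
                    if (case1Before(p, cpu, os) != case1After(p, cpu)) {
                        return false;
                    }
                    if (case1After(p, cpu) == case2After(p, cpu)) {
                        return false; // cases 1 and 2 must be opposites
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("simplification holds: " + simplificationHolds());
    }
}
```

Without the cpu-implies-os assumption the check would fail (P = true, cpu = true, os = false gives different results for case 1), which is why the reasoning hinges on how the JVM advertises "rtm".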
>>
>> I've tested the following scenarios and observed no regression [1]:
>>
>> 1. X86_64 w/ RTM
>> 2. X86_64 w/ RTM + Graal enabled
>> 3. POWER7: no CPU+OS support for RTM
>> 4. POWER8: CPU+OS support for RTM
>>
>> But I think we need a confirmation from SAP about AIX.
>>
>>
>> Best regards,
>> Gustavo
>>
>> [1]
>>
>> ** X86_64 w/ RTM **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> Test results: passed: 30
>>
>>
>> ** X86_64 w/ RTM + Graal enabled **
>> Test results: no tests selected (all RTM tests skipped)
>>
>>
>> ** POWER7: no CPU+OS support for RTM **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Test results: passed: 10
>>
>>
>> ** POWER8: CPU+OS support for RTM **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> Test results: passed: 30
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/31/18 8:38 AM, Gustavo Romero wrote:
>>>> Hi,
>>>>
>>>> Could the following small change be reviewed please?
>>>>
>>>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972
>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/
>>>>
>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal)
>>>> is selected on platforms that can have CPU/OS with RTM support.
>>>>
>>>> It also disables all RTM tests on any other platform that does not have a
>>>> single compiler supporting RTM.
>>>>
>>>> RTM support was first added to the C2 compiler, and once the RTM checkers
>>>> (notably vm.rtm.cpu) find the "rtm" feature advertised by the JVM, they
>>>> assume that a compiler supporting RTM is certainly available ("rtm" is
>>>> advertised only if RTM is supported by both the CPU and the OS). Later the
>>>> JVM began to allow the selection of a compiler other than C2, like Graal,
>>>> so it became possible to select a compiler without RTM support even though
>>>> both the CPU and the OS support RTM. Thus, for platforms supporting Graal
>>>> or any other specific compiler, the compiler availability check for the RTM
>>>> tests must be adjusted: if the selected compiler does not support RTM,
>>>> then all RTM tests must be skipped, including the ones meant for platforms
>>>> without CPU or OS RTM support (e.g. *Unsupported*.java). In some cases,
>>>> like TestUseRTMLockingOptionOnUnsupportedCPU.java, the test expects JVM
>>>> initialization errors that will never occur, because the problem is not
>>>> missing CPU or OS RTM support but rather that the selected compiler does
>>>> not support RTM.
>>>>
>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to
>>>> filter out compilers without RTM support for specific platforms and adapts
>>>> the current RTM tests to use that new property.
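The effect of adding '& vm.rtm.compiler' can be sketched as a pair of selection predicates. This Java sketch is hypothetical and only models two of the @requires forms quoted in this thread (ignoring vm.rtm.os, which vm.rtm.cpu already implies); 'p' stands for (vm.flavor == "server" & !vm.emulatedClient) and 'compiler' for the new vm.rtm.compiler property.

```java
// Hypothetical sketch: how '& vm.rtm.compiler' changes RTM test selection.
// Names are illustrative; they are not the real VMProps/jtreg code.
public class RtmSelection {

    // *OnSupportedConfig-style tests: before and after the change.
    static boolean supportedBefore(boolean p, boolean cpu) {
        return p && cpu;
    }

    static boolean supportedAfter(boolean p, boolean cpu, boolean compiler) {
        return p && cpu && compiler;
    }

    // *OnUnsupportedCPU-style tests: before and after the change.
    static boolean unsupportedCpuBefore(boolean p, boolean cpu) {
        return !cpu && p;
    }

    static boolean unsupportedCpuAfter(boolean p, boolean cpu, boolean compiler) {
        return !cpu && p && compiler;
    }

    public static void main(String[] args) {
        // Server VM with Graal selected (compiler = false), CPU+OS with RTM:
        System.out.println(supportedBefore(true, true));             // selected, runs with Graal
        System.out.println(supportedAfter(true, true, false));       // skipped
        // Server VM with Graal selected, CPU without RTM:
        System.out.println(unsupportedCpuBefore(true, false));       // selected, expected error never occurs
        System.out.println(unsupportedCpuAfter(true, false, false)); // skipped
    }
}
```

With C2 selected (compiler = true) both "after" predicates reduce to the "before" ones, which matches the claim that only Graal runs change.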
>>>>
>>>> Nothing changes regarding the number of passing/selected tests for the
>>>> various cpu/os/compiler combinations on platforms that currently might
>>>> support RTM [1], except when Graal is in use.
>>>>
>>>> Thank you.
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>
>>>> [1]
>>>>
>>>> ** X64 w/ CPU and OS supporting RTM **
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>> Test results: passed: 30
>>>>
>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>>>> Test results: no tests selected (all RTM tests skipped)
>>>>
>>>> ** POWER8 w/ CPU and OS supporting RTM **
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>> Test results: passed: 30
>>>>
>>>> ** POWER7 wo/ CPU and OS supporting RTM **
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Test results: passed: 10
>>>>
>>>
>>
> 


From vladimir.kozlov at oracle.com  Wed Sep  5 22:54:32 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 5 Sep 2018 15:54:32 -0700
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com>
 <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
 <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>
 <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com>
 <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com>
Message-ID: <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com>

v3 looks good.

Thanks,
Vladimir

On 9/5/18 3:18 PM, Gustavo Romero wrote:
> Hi Vladimir,
> 
> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote:
>> Thank you, Gustavo, for the detailed answer.
>>
>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now.
> 
> Thanks for reviewing it!
> 
> 
>> About the property check (flavor == "server" & !emulatedClient) in tests: if C2 (and Graal in the future, when it supports 
>> RTM) is used, this check is always true because C2 (and Graal) is only available in the Server VM and is disabled in the 
>> emulatedClient setting. This check was added before we had the C2 usage check, which can now be done with vm.rtm.compiler.
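The point about the flavor check can be illustrated with a small truth-table check. The Java sketch below is hypothetical (it is not part of VMProps.java or the webrev); it encodes the premise that vm.rtm.compiler can only be true when P = (flavor == "server" & !emulatedClient) holds, and confirms that P then adds nothing to the 'cpu & compiler' and '!cpu & compiler' expressions.

```java
// Hypothetical sketch: if vm.rtm.compiler implies P, then P is redundant in
// any @requires expression that already contains vm.rtm.compiler.
// P stands for (vm.flavor == "server" & !vm.emulatedClient).
public class RedundantFlavorCheck {

    static boolean pIsRedundant() {
        boolean[] values = {false, true};
        for (boolean p : values) {
            for (boolean cpu : values) {
                for (boolean compiler : values) {
                    if (compiler && !p) {
                        continue; // unreachable: C2 needs the Server VM, no emulatedClient
                    }
                    boolean withP = p && cpu && compiler;
                    boolean withoutP = cpu && compiler;
                    boolean negWithP = !cpu && p && compiler;
                    boolean negWithoutP = !cpu && compiler;
                    if (withP != withoutP || negWithP != negWithoutP) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("P is redundant: " + pIsRedundant());
    }
}
```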
> 
> Thanks, I was not aware of it. I've updated the webrev removing
> "flavor == "server" & !emulatedClient":
> 
> http://cr.openjdk.java.net/~gromero/8209972/v3/
> 
> "hg diff --patience":
> 
> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff
> 
> Testing (on Linux):
> 
> ** X86_64 w/ CPU+OS RTM support + Graal VM **
> Test results: no tests selected (all RTM tests skipped)
> 
> ** POWER8 w/ CPU+OS support **
> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
> Passed: compiler/rtm/locking/TestRTMRetryCount.java
> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
> Test results: passed: 30
> 
> ** X86_64 w/ CPU+OS support **
> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
> Passed: compiler/rtm/locking/TestRTMRetryCount.java
> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
> Test results: passed: 30
> 
> ** POWER7 wo/ CPU+OS RTM support **
> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
> Test results: passed: 10
> 
> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support **
> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
> Test results: passed: 10
> 
> 
> Best regards,
> Gustavo
> 
>> Thanks,
>> Vladimir
>>
>> On 9/3/18 3:15 PM, Gustavo Romero wrote:
>>> Hi Vladimir,
>>>
>>> Thanks a lot for reviewing it and for your comments.
>>>
>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>>>> Hi Gustavo,
>>>>
>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off 
>>>> with the TieredStopAtLevel < 4 flag.
>>>
>>> Yes. Although currently, afaics, all tests either explicitly enable C2 (for
>>> instance, by passing -Xcomp and -XX:-TieredCompilation) or implicitly enable
>>> C2 through warm-up before testing, I agree that nothing prevents one from
>>> switching off C2 with TieredStopAtLevel < 4 for a test case. It also
>>> looks better to explicitly list which compilers do support RTM instead of
>>> the ones that don't.
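Why "!Graal" is a weaker check than "C2 enabled" can be shown with a simplified model. This Java sketch is hypothetical: it does not use the real test-library Compiler class, and it assumes that with TieredCompilation on, a TieredStopAtLevel below 4 means C2 never compiles anything, while with TieredCompilation off TieredStopAtLevel is ignored.

```java
// Hypothetical, simplified model of the two checks discussed above.
// Assumption: level 4 is the C2 tier; with tiered compilation enabled,
// TieredStopAtLevel < 4 keeps compilation in C1.
public class C2Check {

    // The check being replaced: anything that is not Graal.
    static boolean notGraal(boolean graal) {
        return !graal;
    }

    // The suggested check: C2 is actually in use.
    static boolean c2Enabled(boolean graal, boolean tiered, int tieredStopAtLevel) {
        if (graal) {
            return false; // Graal replaces C2 as the top-tier compiler
        }
        return !tiered || tieredStopAtLevel >= 4;
    }

    public static void main(String[] args) {
        // e.g. -XX:TieredStopAtLevel=1 without Graal
        System.out.println(notGraal(false));          // would still select the RTM tests
        System.out.println(c2Enabled(false, true, 1)); // RTM tests skipped
    }
}
```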
>>>
>>> I've updated the webrev accordingly:
>>>
>>> http://cr.openjdk.java.net/~gromero/8209972/v2/
>>>
>>> The diff there looks odd, so I generated another one with --patience for a
>>> better (IMO) diff format:
>>>
>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
>>>
>>>
>>>> Also, can the platform checks be replaced with a single vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
>>>
>>> For example, on Linux the following cases are possible regarding CPU / OS
>>> RTM support:
>>>
>>> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
>>> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>>>
>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it
>>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies
>>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise
>>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for
>>> Linux and for AIX.
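The rule described above, and why 'vm.rtm.cpu & vm.rtm.os' collapses to 'vm.rtm.cpu', can be sketched as follows. This Java sketch is hypothetical: the method names are illustrative, and it assumes vm.rtm.os simply reports whether the OS supports RTM.

```java
// Hypothetical sketch: the JVM advertises "rtm" (so vm.rtm.cpu is true)
// only when both the CPU and the OS support RTM.
public class RtmFeatureRule {

    // vm.rtm.cpu as seen by jtreg: CPU support alone is not enough.
    static boolean vmRtmCpu(boolean cpuHasRtm, boolean osHasRtm) {
        return cpuHasRtm && osHasRtm;
    }

    // Checks that 'vm.rtm.cpu & vm.rtm.os' always equals 'vm.rtm.cpu'.
    static boolean cpuAndOsEqualsCpu() {
        boolean[] values = {false, true};
        for (boolean cpuHasRtm : values) {
            for (boolean osHasRtm : values) {
                boolean cpu = vmRtmCpu(cpuHasRtm, osHasRtm);
                boolean os = osHasRtm; // assumption: vm.rtm.os reports OS support
                if ((cpu && os) != cpu) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Rows from the Linux/POWER table above.
        System.out.println("POWER7              : " + vmRtmCpu(false, false));
        System.out.println("POWER8, OS w/o RTM  : " + vmRtmCpu(true, false));
        System.out.println("POWER8, OS with RTM : " + vmRtmCpu(true, true));
        System.out.println("POWER9 NV           : " + vmRtmCpu(true, false));
        System.out.println("cpu & os == cpu     : " + cpuAndOsEqualsCpu());
    }
}
```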
>>>
>>> That said, I don't think the platform checks can be replaced with a single
>>> vmRTMCPU(), because in some cases it's necessary to run a test for
>>> cpu = false and compiler = true, i.e. it's necessary to run a test on an
>>> unsupported CPU for a given platform _only if_ the compiler in use supports
>>> RTM (like C2). So if, for instance, we define:
>>>
>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires,
>>> we "tie" CPU+OS RTM support to compiler RTM support, and the evaluation
>>> returns 'false' for cpu = false and compiler = true, skipping the test
>>> (vm.rtm.compiler = false). On the other hand, to get the test to run in
>>> that case one could match on '!vm.rtm.compiler', but if compiler = false
>>> (Graal) '!vm.rtm.compiler' also evaluates to 'true' and the test runs even
>>> though the Graal compiler is selected, which is wrong.
>>>
>>> Hence the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must,
>>> IMO, contain its own per-platform list of compilers with RTM support.
>>> Basically, we can't ask the JVM about the compiler's RTM support, since the
>>> JVM can only tell us whether the CPU and OS it is running on support RTM.
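The two designs for vm.rtm.compiler discussed above can be contrasted with a small model of an *OnUnsupportedCPU-style test, which should run only on a CPU without RTM and only when the selected compiler supports RTM. This Java sketch is hypothetical; the "C2 on x64 or PPC" rule is taken from the thread, while all names are illustrative.

```java
// Hypothetical sketch contrasting a vm.rtm.compiler tied to vmRTMCPU()
// with an independent per-platform compiler list.
public class RtmCompilerProperty {

    // Rejected design: compiler support tied to CPU+OS support.
    static boolean tiedProperty(boolean c2Selected, boolean cpu) {
        return c2Selected && cpu;
    }

    // Independent design: C2 on x64 or PPC supports RTM.
    static boolean independentProperty(boolean c2Selected, boolean x64OrPpc) {
        return c2Selected && x64OrPpc;
    }

    // '!vm.rtm.cpu & vm.rtm.compiler' with the tied property.
    static boolean selectWithTied(boolean cpu, boolean c2Selected) {
        return !cpu && tiedProperty(c2Selected, cpu);
    }

    // Workaround: match on '!vm.rtm.compiler' instead.
    static boolean selectWithNegatedTied(boolean cpu, boolean c2Selected) {
        return !tiedProperty(c2Selected, cpu);
    }

    // '!vm.rtm.cpu & vm.rtm.compiler' with the independent property.
    static boolean selectWithIndependent(boolean cpu, boolean c2Selected, boolean x64OrPpc) {
        return !cpu && independentProperty(c2Selected, x64OrPpc);
    }

    public static void main(String[] args) {
        boolean cpu = false; // e.g. POWER7: no CPU+OS RTM support
        System.out.println("C2, tied         : " + selectWithTied(cpu, true));
        System.out.println("Graal, !tied     : " + selectWithNegatedTied(cpu, false));
        System.out.println("C2, independent  : " + selectWithIndependent(cpu, true, true));
        System.out.println("Graal, independent: " + selectWithIndependent(cpu, false, true));
    }
}
```

Only the independent property gives the intended result in both cases: the test runs with C2 and is skipped with Graal.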
>>>
>>>
>>>> And since you check cpu in here, why not replace all vm.rtm.* with one vm.rtm.supported? In that case you would 
>>>> need only one @requires check in tests instead of:
>>>>
>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>>
>>> I think it's not possible either. Currently there are 5 match cases in
>>> RTM tests:
>>>
>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>>>
>>> which can be simplified to 5 cases:
>>>
>>> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
>>> 2:            flavor == "server" & !emulatedClient  & cpu & os
>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>> 5: no @requires
>>>
>>> I understand that cases 1 and 2 can be simplified (since CPU implies OS) as:
>>>
>>>
>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>> 2:            flavor == "server" & !emulatedClient  & cpu
>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>> 5: no @requires
>>>
>>> and cases 1 and 2 are mere opposites, so we have 4 cases:
>>>
>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>> 5: no @requires
>>>
>>> We could simplify further by defining P = (flavor == "server" & !emulatedClient),
>>> which gives:
>>>
>>> 1:          !(P & cpu)
>>> 3: (!cpu) &  (P)
>>> 4:   cpu  & !(P)
>>> 5: no @requires
>>>
>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
>>> order to control running the tests only if the selected compiler on a
>>> given platform has RTM support (skipping Graal, for instance):
>>>
>>> 1:          !(P & cpu) & compiler
>>> 3: (!cpu) &  (P)       & compiler
>>> 4:   cpu  & !(P)       & compiler
>>> 5: no @requires        & compiler
>>>
>>> So it looks like we would need at least 3 properties, but IMO it's
>>> not worth adding another property P = (flavor == "server" & !emulatedClient)
>>> just to further simplify the @requires line.
>>>
>>> In summary, I think it's only possible to replace 'cpu & os' by 'cpu',
>>> so I updated the webrev, removing the vm.rtm.os property and keeping only
>>> vm.rtm.cpu and vm.rtm.compiler (plus the flavor and emulatedClient checks).
>>>
>>> I've tested the following scenarios and observed no regression [1]:
>>>
>>> 1. X86_64 w/ RTM
>>> 2. X86_64 w/ RTM + Graal enabled
>>> 3. POWER7: no CPU+OS support for RTM
>>> 4. POWER8: CPU+OS support for RTM
>>>
>>> But I think we need a confirmation from SAP about AIX.
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>> [1]
>>>
>>> ** X86_64 w/ RTM **
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>> Test results: passed: 30
>>>
>>>
>>> ** X86_64 w/ RTM + Graal enabled **
>>> Test results: no tests selected (all RTM tests skipped)
>>>
>>>
>>> ** POWER7: no CPU+OS support for RTM **
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Test results: passed: 10
>>>
>>>
>>> ** POWER8: CPU+OS support for RTM **
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>> Test results: passed: 30
>>>
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote:
>>>>> Hi,
>>>>>
>>>>> Could the following small change be reviewed please?
>>>>>
>>>>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972
>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/
>>>>>
>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal)
>>>>> is selected on platforms that can have CPU/OS with RTM support.
>>>>>
>>>>> It also disables all RTM tests for any other platform that does not have a
>>>>> single compiler supporting RTM.
>>>>>
>>>>> RTM support was first added to the C2 compiler, and once the RTM checkers
>>>>> (notably vm.rtm.cpu) find the "rtm" feature advertised by the JVM, they
>>>>> assume that a compiler supporting RTM is certainly available ("rtm" is
>>>>> advertised only if RTM is supported by both the CPU and the OS). Later the
>>>>> JVM began to allow the selection of a compiler other than C2, like Graal,
>>>>> and it became possible to select a compiler without RTM support even though
>>>>> both the CPU and the OS support RTM. Thus, on platforms supporting Graal or
>>>>> any other specific compiler, the compiler availability check for the RTM
>>>>> tests must be adjusted: if the selected compiler does not support RTM, all
>>>>> RTM tests must be skipped, including the ones meant for platforms without
>>>>> CPU or OS RTM support (e.g. *Unsupported*.java). In some cases, like
>>>>> TestUseRTMLockingOptionOnUnsupportedCPU.java, the test expects JVM
>>>>> initialization errors that will never occur, because the problem is not
>>>>> missing RTM support in the CPU or OS but rather that the selected compiler
>>>>> does not support RTM.
>>>>>
>>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to
>>>>> filter out compilers without RTM support for specific platforms and adapts
>>>>> the current RTM tests to use that new property.
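As a rough illustration of the idea (not the code in the webrev), a property like `vm.rtm.compiler` can be derived from the target architecture plus whether C2 is the active compiler, independently of what the CPU reports. The class name `RtmCompilerProp`, the architecture strings, and the `c2Enabled` input are assumptions standing in for whatever the real test library provides.

```java
import java.util.Set;

/**
 * Hypothetical sketch of deriving a "vm.rtm.compiler" property.
 * Assumption: only C2 emits RTM code, and only on x86_64 and ppc64 builds.
 */
final class RtmCompilerProp {
    // Assumed architecture names; the real test library may spell these differently.
    private static final Set<String> RTM_ARCHES = Set.of("amd64", "x86_64", "ppc64", "ppc64le");

    /** Returns "true" only when the selected compiler can generate RTM code on this arch. */
    static String vmRtmCompiler(String arch, boolean c2Enabled) {
        return String.valueOf(c2Enabled && RTM_ARCHES.contains(arch));
    }

    public static void main(String[] args) {
        System.out.println(vmRtmCompiler("amd64", true));   // C2 on x64
        System.out.println(vmRtmCompiler("amd64", false));  // e.g. Graal selected
        System.out.println(vmRtmCompiler("aarch64", true)); // no RTM on this arch
    }
}
```

A test would then add `vm.rtm.compiler` to its `@requires` expression, so that it is skipped whenever, for example, Graal is the selected compiler, regardless of CPU/OS RTM support.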
>>>>>
>>>>> Nothing changes regarding the number of passing/selected tests for the
>>>>> various cpu/os/compiler combinations on platforms that currently might
>>>>> support RTM [1], except when Graal is in use.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Best regards,
>>>>> Gustavo
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>> ** X64 w/ CPU and OS supporting RTM **
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>> Test results: passed: 30
>>>>>
>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>>
>>>>> ** POWER8 w/ CPU and OS supporting RTM **
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>> Test results: passed: 30
>>>>>
>>>>> ** POWER7 wo/ CPU and OS supporting RTM **
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Test results: passed: 10
>>>>>
>>>>
>>>
>>
> 

From sandhya.viswanathan at intel.com  Wed Sep  5 23:09:33 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Wed, 5 Sep 2018 23:09:33 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>

Recently there have been a couple of high-priority issues regarding C2's usage of the high bank of XMM registers (XMM16-XMM31):
https://bugs.openjdk.java.net/browse/JDK-8207746
https://bugs.openjdk.java.net/browse/JDK-8209735

Please find below a patch which attempts to clean up the XMM register handling by using register groups.
http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/

The patch provides a restricted set of registers to the match rules in the ad file based on the underlying architecture.
The aim is to remove special handling/workaround from macro assembler and assembler.
By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
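The patch expresses this through register groups in the .ad file. Purely as an illustration of the kind of per-CPU-feature decision involved, the hypothetical helper below decides whether a vector operation may use all 32 XMM registers or only XMM0-XMM15. It assumes the x86-64 encoding rules: XMM16-XMM31 are reachable only with EVEX-encoded (AVX-512) instructions, EVEX forms of 128/256-bit vector operations need AVX512VL, and byte/word element operations need AVX512BW. The method name and parameters are invented for this sketch and are not taken from the patch.

```java
/**
 * Hypothetical sketch: how many XMM registers a vector match rule may allocate from.
 * Assumptions (x86-64): XMM16-XMM31 need EVEX encoding (AVX512F); EVEX 128/256-bit
 * vector forms need AVX512VL; byte/word element ops need AVX512BW.
 */
final class XmmRegisterGroups {
    /** vectorBits is the vector width of the operation: 128, 256 or 512. */
    static int usableXmmRegisters(boolean avx512f, boolean avx512vl, boolean avx512bw,
                                  int vectorBits, boolean byteOrWordElements) {
        if (!avx512f) {
            return 16; // legacy SSE/VEX encodings only reach XMM0-XMM15
        }
        if (byteOrWordElements && !avx512bw) {
            return 16; // EVEX byte/word forms are unavailable
        }
        if (vectorBits < 512 && !avx512vl) {
            return 16; // EVEX 128/256-bit forms are unavailable
        }
        return 32;     // EVEX encoding can address XMM0-XMM31
    }

    public static void main(String[] args) {
        // AVX512F without AVX512VL: 512-bit ops may use the high bank, 256-bit ops may not.
        System.out.println(usableXmmRegisters(true, false, false, 512, false)); // 32
        System.out.println(usableXmmRegisters(true, false, false, 256, false)); // 16
    }
}
```

Deciding this once per rule, rather than inside each assembler routine, is the kind of restriction that lets special-case workarounds be dropped from the macro assembler.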

Your review and feedback is very welcome.

Best Regards,
Sandhya



From gromero at linux.vnet.ibm.com  Wed Sep  5 23:53:07 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 5 Sep 2018 20:53:07 -0300
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com>
 <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
 <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>
 <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com>
 <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com>
 <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com>
Message-ID: <c10ed3d2-b76a-4b52-c7ae-25bddb9ab721@linux.vnet.ibm.com>

On 09/05/2018 07:54 PM, Vladimir Kozlov wrote:
> v3 looks good.

Thanks a lot Vladimir.

@Goetz, would you mind reviewing v3, please? It touches code meant for AIX,
but in the end I don't expect any change.

http://cr.openjdk.java.net/~gromero/8209972/v3/

Thank you.


Best regards,
Gustavo

> Thanks,
> Vladimir
> 
> On 9/5/18 3:18 PM, Gustavo Romero wrote:
>> Hi Vladimir,
>>
>> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote:
>>> Thank you Gustavo for detailed answer.
>>>
>>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now.
>>
>> Thanks for reviewing it!
>>
>>
>>> About the property check (flavor == "server" & !emulatedClient) in tests: if C2 (and Graal in the future, when it supports RTM) is used, this check is always true, because C2 (and Graal) is only available in the Server VM and is disabled in the emulatedClient setting. This check was added before we had the C2 usage check, which can now be done with vm.rtm.compiler.
>>
>> Thanks, I was not aware of it. I've updated the webrev removing
>> "flavor == "server" & !emulatedClient":
>>
>> http://cr.openjdk.java.net/~gromero/8209972/v3/
>>
>> "hg diff --patience":
>>
>> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff
>>
>> Testing (on Linux):
>>
>> ** X86_64 w/ CPU+OS RTM support + Graal VM **
>> Test results: no tests selected (all RTM tests skipped)
>>
>> ** POWER8 w/ CPU+OS support **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> Test results: passed: 30
>>
>> ** X86_64 w/ CPU+OS support **
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>> Test results: passed: 30
>>
>> ** POWER7 wo/ CPU+OS RTM support **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Test results: passed: 10
>>
>> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Test results: passed: 10
>>
>>
>> Best regards,
>> Gustavo
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/3/18 3:15 PM, Gustavo Romero wrote:
>>>> Hi Vladimir,
>>>>
>>>> Thanks a lot for reviewing it and for your comments.
>>>>
>>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>>>>> Hi Gustavo,
>>>>>
>>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled(), because C2 may be switched off with the TieredStopAtLevel < 4 flag.
>>>>
>>>> Yes. Although, as far as I can see, all tests currently enable C2 either
>>>> explicitly (for instance, by passing -Xcomp and -XX:-TieredCompilation) or
>>>> implicitly through a warm-up before testing, I agree that nothing prevents
>>>> one from switching off C2 with TieredStopAtLevel < 4 for a test case. It
>>>> also looks better to list explicitly which compilers do support RTM instead
>>>> of the ones that don't.
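A simplified model of why the two checks differ, under stated assumptions: Graal, when enabled, replaces C2 as the top-tier JIT, and with tiered compilation on, `-XX:TieredStopAtLevel` below 4 keeps methods from ever reaching C2. The class `C2Check` and both methods are hypothetical stand-ins for the test library's `Compiler` helpers, not their actual implementation.

```java
/**
 * Hypothetical model of the two compiler checks discussed above.
 * Assumptions: Graal replaces C2 when enabled; with tiered compilation on,
 * TieredStopAtLevel < 4 means C2 never compiles anything.
 */
final class C2Check {
    /** Old style: treat every non-Graal configuration as RTM-capable. */
    static boolean notGraal(boolean graalEnabled) {
        return !graalEnabled;
    }

    /** Suggested style: RTM-capable only when C2 can actually compile code. */
    static boolean c2Enabled(boolean graalEnabled, boolean tieredCompilation, int tieredStopAtLevel) {
        if (graalEnabled) {
            return false;
        }
        return !tieredCompilation || tieredStopAtLevel >= 4;
    }

    public static void main(String[] args) {
        // -XX:TieredStopAtLevel=1 without Graal: the two checks disagree.
        System.out.println(notGraal(false));           // true: RTM tests would still run
        System.out.println(c2Enabled(false, true, 1)); // false: RTM tests are skipped
    }
}
```

Listing the compilers that support RTM (a positive check) stays correct when new ways of disabling C2 appear, whereas a negative check has to enumerate every exception.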
>>>>
>>>> I've updated the webrev accordingly:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/
>>>>
>>>> The diff there looks odd, so I generated another one with --patience for a
>>>> better (IMO) diff format:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
>>>>
>>>>
>>>>> Also, can the platform checks be replaced with a single vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
>>>>
>>>> For example, on Linux the following cases are possible regarding CPU / OS
>>>> RTM support:
>>>>
>>>> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
>>>> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>>> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>>> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>>>>
>>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both the CPU and the OS
>>>> support RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os"
>>>> the second term is redundant: if "vm.rtm.cpu" is 'true', then "vm.rtm.os" is
>>>> 'true' as well, otherwise the JVM would never advertise the "rtm" feature
>>>> (which is what "vm.rtm.cpu" checks). That seems to hold for both Linux and
>>>> AIX.
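The table and the redundancy argument can be checked mechanically. The sketch below assumes, as stated above, that "rtm" is advertised (so vm.rtm.cpu is true) only when both CPU and OS support RTM, and it models vm.rtm.os as plain OS support; `PowerRtmTable` and its methods are hypothetical names for this illustration.

```java
/**
 * Hypothetical model of the CPU/OS table above.
 * Assumption: vm.rtm.cpu == (CPU supports RTM && OS supports RTM); vm.rtm.os == OS support.
 */
final class PowerRtmTable {
    static boolean vmRtmCpu(boolean cpu, boolean os) {
        return cpu && os;
    }

    /** True if "vm.rtm.cpu & vm.rtm.os" equals "vm.rtm.cpu" for every CPU/OS combination. */
    static boolean osCheckIsRedundant() {
        boolean[] values = {false, true};
        for (boolean cpu : values) {
            for (boolean os : values) {
                boolean rtmCpu = vmRtmCpu(cpu, os);
                if ((rtmCpu && os) != rtmCpu) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("POWER7   : " + vmRtmCpu(false, false));
        System.out.println("POWER8   : " + vmRtmCpu(true, false) + " | " + vmRtmCpu(true, true));
        System.out.println("POWER9 NV: " + vmRtmCpu(true, false));
        System.out.println("os check redundant: " + osCheckIsRedundant());
    }
}
```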
>>>>
>>>> That said, I don't think the platform checks can be replaced with a single
>>>> vmRTMCPU(), because in some cases it's necessary to run a test for
>>>> cpu = false and compiler = true, i.e. a test must run on an unsupported CPU
>>>> for a given platform _only if_ the compiler in use supports RTM (like C2).
>>>> So if, for instance, we define:
>>>>
>>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires,
>>>> we "tie" CPU+OS RTM support to compiler RTM support, and the evaluation
>>>> returns 'false' for cpu = false and compiler = true, skipping the test
>>>> (vm.rtm.compiler = false). On the other hand, to run the test in that case
>>>> one could match on '!vm.rtm.compiler', but if compiler = false (Graal),
>>>> '!vm.rtm.compiler' also evaluates to 'true' and the test will run even though
>>>> the Graal compiler is selected, which is wrong.
>>>>
>>>> Hence, IMO, the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and
>>>> must contain its own list of compilers with RTM support for each platform.
>>>> Basically, we can't ask the JVM about the compiler's support for RTM, since
>>>> the JVM can only tell us whether the CPU and OS on which it is running
>>>> support RTM.
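The argument above can be checked by brute force. Take a test such as TestUseRTMLockingOptionOnUnsupportedCPU.java, which should run only when the CPU lacks RTM but the selected compiler supports it. The sketch compares @requires-like predicates built on a property tied to vmRTMCPU() against one built on compiler support alone; `RtmSelection` and its methods are hypothetical names, and booleans stand in for the real properties.

```java
import java.util.function.BiPredicate;

/** Hypothetical truth-table check of the two ways to define vm.rtm.compiler. */
final class RtmSelection {
    // Intended: run only when the CPU lacks RTM but the compiler could emit RTM code.
    static boolean intended(boolean cpu, boolean compiler) {
        return !cpu && compiler;
    }

    // Property tied to vmRTMCPU(): true only if CPU+OS and the compiler all support RTM.
    static boolean tiedProp(boolean cpu, boolean compiler) {
        return compiler && cpu;
    }

    /** True if the @requires-like predicate agrees with intended() on every input. */
    static boolean matchesIntended(BiPredicate<Boolean, Boolean> requires) {
        for (boolean cpu : new boolean[] {false, true}) {
            for (boolean compiler : new boolean[] {false, true}) {
                if (requires.test(cpu, compiler) != intended(cpu, compiler)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Tied property: neither "!cpu & prop" nor "!cpu & !prop" selects the right runs.
        System.out.println(matchesIntended((cpu, c) -> !cpu && tiedProp(cpu, c)));  // false
        System.out.println(matchesIntended((cpu, c) -> !cpu && !tiedProp(cpu, c))); // false
        // Independent property (compiler support only): "!cpu & prop" works.
        System.out.println(matchesIntended((cpu, c) -> !cpu && c));                // true
    }
}
```

The first predicate never runs the test, and the second also runs it under Graal, matching the two failure modes described above.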
>>>>
>>>>
>>>>> And since you check the CPU here, why not replace all vm.rtm.* properties with a single vm.rtm.supported? In that case you would need only one @requires check in tests instead of:
>>>>>
>>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>>>
>>>> I think it's not possible either. Currently there are 5 match cases in
>>>> RTM tests:
>>>>
>>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
>>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
>>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
>>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
>>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>>>>
>>>> which can be written compactly as the following 5 cases (case 5 being tests
>>>> without any @requires):
>>>>
>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
>>>> 2:            flavor == "server" & !emulatedClient  & cpu & os
>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>> 5: no @requires
>>>>
>>>> As I understand it, cases 1 and 2 can be simplified (since CPU implies OS) as:
>>>>
>>>>
>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>>> 2:            flavor == "server" & !emulatedClient  & cpu
>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>> 5: no @requires
>>>>
>>>> and cases 1 and 2 are mere opposites, so we have 4 cases:
>>>>
>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>> 5: no @requires
>>>>
>>>> We could simplify further making P = (flavor == "server" & !emulatedClient),
>>>> and make:
>>>>
>>>> 1:          !(P & cpu)
>>>> 3: (!cpu) &  (P)
>>>> 4:   cpu  & !(P)
>>>> 5: no @requires
>>>>
>>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
>>>> order to control running the tests only if the selected compiler on a
>>>> given platform has RTM support (skipping Graal, for instance):
>>>>
>>>> 1:          !(P & cpu) & compiler
>>>> 3: (!cpu) &  (P)       & compiler
>>>> 4:   cpu  & !(P)       & compiler
>>>> 5: no @requires        & compiler
>>>>
>>>> So it looks like we would need at least 3 properties, but IMO it's not
>>>> worth adding another property P = (flavor == "server" & !emulatedClient)
>>>> just to simplify the @requires line further.
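The rewrites of cases 1 and 2 can be verified exhaustively. This sketch writes P for (vm.flavor == "server" & !vm.emulatedClient) and skips the impossible assignment cpu = true, os = false (the JVM advertises "rtm" only when the OS supports it). The class and method names are hypothetical.

```java
/** Hypothetical exhaustive check of the @requires simplifications above. */
final class RequiresSimplification {
    // p abbreviates (vm.flavor == "server" & !vm.emulatedClient).
    static boolean case1(boolean p, boolean cpu, boolean os) { return !(p && cpu && os); }
    static boolean case2(boolean p, boolean cpu, boolean os) { return p && cpu && os; }
    static boolean case1Simplified(boolean p, boolean cpu) { return !(p && cpu); }
    static boolean case2Simplified(boolean p, boolean cpu) { return p && cpu; }

    /** Checks the rewrites on every assignment where vm.rtm.cpu implies vm.rtm.os. */
    static boolean rewritesHold() {
        boolean[] values = {false, true};
        for (boolean p : values) {
            for (boolean cpu : values) {
                for (boolean os : values) {
                    if (cpu && !os) {
                        continue; // impossible: "rtm" is advertised only with OS support
                    }
                    if (case1(p, cpu, os) != case1Simplified(p, cpu)) return false;
                    if (case2(p, cpu, os) != case2Simplified(p, cpu)) return false;
                    // Cases 1 and 2 must be exact complements of each other.
                    if (case1Simplified(p, cpu) == case2Simplified(p, cpu)) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(rewritesHold()); // true
    }
}
```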
>>>>
>>>> In summary, I think it's only possible to replace 'cpu & os' with 'cpu',
>>>> so I updated the webrev, removing the vm.rtm.os property and keeping only
>>>> vm.rtm.cpu and vm.rtm.compiler (plus the flavor and emulatedClient checks).
>>>>
>>>> I've tested the following scenarios and observed no regression [1]:
>>>>
>>>> 1. X86_64 w/ RTM
>>>> 2. X86_64 w/ RTM + Graal enabled
>>>> 3. POWER7: no CPU+OS support for RTM
>>>> 4. POWER8: CPU+OS support for RTM
>>>>
>>>> But I think we need confirmation from SAP regarding AIX.
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>> [1]
>>>>
>>>> ** X86_64 w/ RTM **
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>> Test results: passed: 30
>>>>
>>>>
>>>> ** X86_64 w/ RTM + Graal enabled **
>>>> Test results: no tests selected (all RTM tests skipped)
>>>>
>>>>
>>>> ** POWER7: no CPU+OS support for RTM **
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Test results: passed: 10
>>>>
>>>>
>>>> ** POWER8: CPU+OS support for RTM **
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>> Test results: passed: 30
>>>>
>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Could the following small change be reviewed please?
>>>>>>
>>>>>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972
>>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/
>>>>>>
>>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal)
>>>>>> is selected on platforms that can have CPU/OS with RTM support.
>>>>>>
>>>>>> It also disables all RTM tests for any other platform that does not have a
>>>>>> single compiler supporting RTM.
>>>>>>
>>>>>> RTM support was first added to the C2 compiler, and once the RTM checkers
>>>>>> (notably vm.rtm.cpu) find the "rtm" feature advertised by the JVM, they
>>>>>> assume that a compiler supporting RTM is certainly available ("rtm" is
>>>>>> advertised only if RTM is supported by both the CPU and the OS). Later the
>>>>>> JVM began to allow the selection of a compiler other than C2, like Graal,
>>>>>> and it became possible to select a compiler without RTM support even though
>>>>>> both the CPU and the OS support RTM. Thus, on platforms supporting Graal or
>>>>>> any other specific compiler, the compiler availability check for the RTM
>>>>>> tests must be adjusted: if the selected compiler does not support RTM, all
>>>>>> RTM tests must be skipped, including the ones meant for platforms without
>>>>>> CPU or OS RTM support (e.g. *Unsupported*.java). In some cases, like
>>>>>> TestUseRTMLockingOptionOnUnsupportedCPU.java, the test expects JVM
>>>>>> initialization errors that will never occur, because the problem is not
>>>>>> missing RTM support in the CPU or OS but rather that the selected compiler
>>>>>> does not support RTM.
>>>>>>
>>>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to
>>>>>> filter out compilers without RTM support for specific platforms and adapts
>>>>>> the current RTM tests to use that new property.
>>>>>>
>>>>>> Nothing changes regarding the number of passing/selected tests for the
>>>>>> various cpu/os/compiler combinations on platforms that currently might
>>>>>> support RTM [1], except when Graal is in use.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Best regards,
>>>>>> Gustavo
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>> ** X64 w/ CPU and OS supporting RTM **
>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>> Test results: passed: 30
>>>>>>
>>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>>>
>>>>>> ** POWER8 w/ CPU and OS supporting RTM **
>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>> Test results: passed: 30
>>>>>>
>>>>>> ** POWER7 wo/ CPU and OS supporting RTM **
>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>> Test results: passed: 10
>>>>>>
>>>>>
>>>>
>>>
>>
> 
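For illustration, a jtreg test could opt in to the new 'vm.rtm.compiler' property through its @requires tag. The test is then skipped rather than failed when the selected JIT (Graal, in the case above) lacks RTM support. This is a hypothetical header fragment, not one of the listed tests. Real RTM tests presumably combine this property with CPU/OS checks.

```java
/*
 * @test
 * @summary Hypothetical RTM test, selected only when the running JIT supports RTM
 * @requires vm.rtm.compiler
 */
```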


From HORIE at jp.ibm.com  Thu Sep  6 03:27:34 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Thu, 6 Sep 2018 12:27:34 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <a8132254-dd87-69e1-69b2-06c8a58565bf@linux.vnet.ibm.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
 <57ebd30a66504577a6b2ec267aee4b69@sap.com>
 <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com>
 <OFD4AAA261.28219E95-ON002582FF.00371825-492582FF.00390865@notes.na.collabserv.com>
 <a8132254-dd87-69e1-69b2-06c8a58565bf@linux.vnet.ibm.com>
Message-ID: <OF3264BF03.B675D915-ON00258300.00076F89-49258300.00130124@notes.na.collabserv.com>

Hi Martin, Gustavo,

Thank you for the detailed discussion and for narrowing down the
current issue on ppc64.

> We haven't seen any issues with the current code, but I think this
> affects jdk11, too. (We could also switch off SuperwordUseVSX for jdk11u.)
> Do you agree?
Yes, I agree. Here is the latest webrev, which switches off SuperwordUseVSX
by default:
http://cr.openjdk.java.net/~mhorie/8208171/webrev.04/


Best regards,
--
Michihiro,
IBM Research - Tokyo
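The jtreg failure discussed in the quoted messages below involves loops that C2 may vectorize: TestRegisterRestoring returns arbitrary results instead of 10000. The sketch below shows that kind of check. It is hypothetical and is not the test's actual source. A float array is incremented 10000 times by a simple counted loop, and every element is expected to equal 10000. If a vector register holding live values were not preserved correctly, it would show up here as a wrong value. Whether C2's SuperWord pass actually vectorizes this loop depends on JIT heuristics and flags not shown here.

```java
// Hypothetical sketch, not the source of any jtreg test.
class VectorIncrementSketch {
    // Simple counted loop over a float[]: the kind of loop C2's SuperWord (SLP)
    // pass may turn into packed float adds, e.g. VSX code on PPC64 when
    // SuperwordUseVSX is enabled. Whether it is vectorized depends on the JIT.
    static void increment(float[] array) {
        for (int i = 0; i < array.length; i++) {
            array[i] += 1.0f;
        }
    }

    // Runs the loop 'iterations' times; every element should end up equal to
    // 'iterations' (exact in float arithmetic for integers up to 2^24).
    static float[] run(int length, int iterations) {
        float[] array = new float[length];
        for (int j = 0; j < iterations; j++) {
            increment(array);
        }
        return array;
    }

    public static void main(String[] args) {
        float[] result = run(100, 10_000);
        for (int i = 0; i < result.length; i++) {
            if (result[i] != 10_000f) {
                // A corrupted vector register would surface as an unexpected value here.
                throw new IllegalStateException(
                        "array[" + i + "] = " + result[i] + ", expected 10000");
            }
        }
        System.out.println("OK: all elements are 10000");
    }
}
```

On an affected machine, compare a run with SuperwordUseVSX enabled against one with it disabled (on a build where disabling it works). This can hint at whether vector code is involved.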


From:	Gustavo Romero <gromero at linux.vnet.ibm.com>
To:	Michihiro Horie/Japan/IBM at IBMJP
Cc:	"Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "Doerr, Martin" <martin.doerr at sap.com>
Date:	2018/09/06 03:34
Subject:	Re: RFR: 8208171: PPC64: Enrich SLP support


Hi Michi,

On 09/05/2018 07:22 AM, Michihiro Horie wrote:
> Hi Martin, Gustavo,
>
> I still cannot reproduce the problem. I noticed that the machine I have is
> not SUSE but openSUSE, with kernel 4.1.21-14-default. I've also tried kernel
> 4.4.0-31-generic, but on Ubuntu.
>
> Gustavo, is there any suspicious change before/after v4.4, the kernel on
> which Martin got the crash?

Nope, nothing I'm aware of... However, it looks like Martin found no issues
with your last revision. Anyway, if you need a machine with SLES 12 SP3
installed, I have one that I can share. Drop me a Slack message if you need it.


Regards,
Gustavo
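The quoted message below makes two points about the v4.6+ heuristic: lazy FP/VEC enablement is driven by the kernel, and it is opaque to userspace. A toy model makes this concrete. This is a simplified Java sketch, not kernel code. The 8-bit counter and the names load_fp, load_vec and MSR_FP come from the explanation. Two details are assumptions: restarting the counter at 1 after a trap, and dropping the facility once the counter wraps (a thread still using it then traps and re-enables it). The program never announces that it uses the facility. The first instruction simply traps, and the kernel enables the facility.

```java
// Toy model of the kernel's lazy FP/VEC enablement (v4.6+), as summarized in
// this thread. Hypothetical and simplified; this is not kernel code.
final class LazyFacilityModel {
    private boolean enabled;  // models the facility bit (e.g. MSR_FP) for the thread
    private int loadCounter;  // models the 8-bit load_fp/load_vec counter (0..255)
    private int traps;        // number of "facility unavailable" exceptions taken

    /** The thread executes an FP/VEC instruction; it never asks the kernel first. */
    void useFacility() {
        if (!enabled) {
            traps++;           // exception handled by the kernel (load_up_fp & friends)
            enabled = true;    // facility enabled, instruction re-executed
            loadCounter = 1;   // assumption: counting restarts after a trap
        }
    }

    /** The thread is switched out and later switched back in. */
    void contextSwitch() {
        enabled = false;                             // state saved, facility bit cleared
        if (loadCounter != 0) {                      // still treated as "in use":
            enabled = true;                          // restore registers eagerly
            loadCounter = (loadCounter + 1) & 0xFF;  // 8-bit counter wraps 255 -> 0
        }
        // Once the counter is 0, the facility stays disabled until the next trap.
    }

    boolean isEnabled() { return enabled; }

    int traps() { return traps; }

    public static void main(String[] args) {
        LazyFacilityModel thread = new LazyFacilityModel();
        thread.useFacility();             // first use traps once
        for (int i = 0; i < 256; i++) {
            thread.contextSwitch();       // counter wraps after 255 eager restores
        }
        thread.useFacility();             // facility was dropped, so it traps again
        System.out.println("traps = " + thread.traps());
    }
}
```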

>
> Apart from the problem, I uploaded the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
>
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> To: "Doerr, Martin" <martin.doerr at sap.com>, Michihiro Horie/Japan/IBM at IBMJP
> Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Date: 2018/09/05 07:03
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
>
>
>
>
> Hi Martin and Michi,
>
> On 09/04/2018 01:20 PM, Doerr, Martin wrote:
>  > Can you reproduce the test failures?
>  >
>  > The very same VM works fine on a different Power8 machine which uses the same instructions by C2.
>  >
>  > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
>  >
>  > I have seen several linux kernel changes regarding saving and restoring the VSX registers.
>  >
>  > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC".
>  >
>  > Maybe something is missing which tells the kernel that we're using it. But that's just a guess.
>
> Facilities like FP (FP registers), VEC (vector registers, aka VMX/Altivec),
> and VSX (vector-scalar registers) are usually disabled in a newly created
> process. Once any instruction associated with one of these facilities is
> used in the process, it causes an exception that is handled by the kernel
> [1, 2, 3]: the kernel enables the facility that caused the exception (see
> load_up_fp & friends) and re-executes the instruction when it returns
> control to the process in userspace.
>
> Starting from kernel v4.6 [4] there is a simple heuristic that employs an
> 8-bit counter to help track whether a process, after using these facilities
> for the first time, continues to use them. The counters (load_fp and
> load_vec) are incremented on each context switch, and if the process stops
> using the FP or VEC facilities they are disabled again. FP/VEC/VSX
> save/restore on context switches is then disabled as well, which improves
> context-switch performance by avoiding the FP/VEC/VSX register save/restore.
>
> Either way (before or after the change introduced in v4.6) *that mechanism
> is opaque to userspace*, particularly to the process using these facilities.
> If a given facility cannot be enabled by the kernel (because the CPU does
> not support it), the kernel sends a SIGILL to the process. It's possible to
> inspect the thread member dynamics/state from userspace using tools like
> 'systemtap' (for example, this simple script can be used to inspect the
> VRSAVE register of a given thread that is running a program called
> 'vrsave_' [5]) or using the 'perf' tool.
>
> "tsk->thread.used_vsr" [6] is actually associated with the VSX facility,
> whilst MSR_VEC is associated with the VEC/VMX/Altivec facility [7]. So
> "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used in three
> cases: if it's a new process; if the load_fp and load_vec counters
> overflowed and became zero, disabling VSX; or if only FP or only VEC (not
> both) was used in the process. In that case the kernel will also enable
> VSX by MSR |= MSR_VSX. A similar mechanism drives the FP (MSR_FP) and the
> VEC (MSR_VEC) facilities.
>
> If both the FP and VEC facilities are used, the VSX facility is enabled
> automatically, since FP+VEC regsets == VSX regset [8].
>
> Thus, since this mechanism is entirely opaque to userspace, I understand
> that if a program had to tell the kernel it wants to use any of these
> facilities (FP/VEC/VSX) before using them, something would be wrong in
> kernel space.
>
> Martin and Michi, if you want any help drilling into this further on the
> kernel side, please let me know; maybe I can help.
>
> I didn't have the chance to reproduce the crash yet, so if I find anything
> meaningful about it tomorrow I'll keep you posted.
>
>
> Kind regards,
> Gustavo
>
> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP)
> [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec)
> [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX)
> [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239
> [5] http://cr.openjdk.java.net/~gromero/script.d
> [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310
> [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250
> [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437
>
>  > Best regards,
>  >
>  > Martin
>  >
>  > *From:* Michihiro Horie <HORIE at jp.ibm.com>
>  > *Sent:* Tuesday, 4 September 2018 07:32
>  > *To:* Doerr, Martin <martin.doerr at sap.com>
>  > *Cc:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>  > *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > Hi Goetz, Martin, and Gustavo,
>  >
>  >
>  >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>  >>a compiler change.
>  >>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>  >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
>  >>while hotspot-compiler-dev is for
>  >>"Technical discussion about the development of the HotSpot bytecode compilers"
>  > Understood; I will use hotspot-compiler-dev for future RFRs, thanks.
>  >
>  >
>  >> Why do you rename vnoreg to vnoregi?
>  > I followed the existing naming of vsnoreg and vsnoregi, but the renaming does not look necessary, so I will revert this part. Should I also rename vsnoregi to vsnoreg?
>  >
>  >
>  >>we noticed jtreg test failures when using this change:
>  >>compiler/runtime/safepoints/TestRegisterRestoring.java
>  >>compiler/runtime/Test7196199.java
>  >>
>  >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>  >>
>  >>We didn't see it on all machines, so it might be an issue with saving and restoring VR registers in the signal handler.
>  >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>  > Thank you for letting me know about the issue; I will try to reproduce it on a SUSE machine.
>  >
>  >
>  >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>  > Thank you for pointing out another issue. I do not currently hit this problem, but preventing the vector nodes from being matched makes sense to avoid the crash. I did not add match rules for non-vector nodes, so it might be better to add them in any case, similar to the Replicate* rules.
>  >
>  >
>  > Gustavo, thanks for the wrap-up!
>  >
>  >
>  > Best regards,
>  > --
>  > Michihiro,
>  > IBM Research - Tokyo
>  >
>  >
>  > From: "Doerr, Martin" <martin.doerr at sap.com>
>  > To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
>  > Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
>  > Date: 2018/09/04 02:18
>  > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  >
>  >
>  >
>  >
>  >
>  > Hi Gustavo and Michihiro,
>  >
>  > we noticed jtreg test failures when using this change:
>  > compiler/runtime/safepoints/TestRegisterRestoring.java
>  > compiler/runtime/Test7196199.java
>  >
>  > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>  >
>  > We didn't see it on all machines, so it might be an issue with saving and restoring VR registers in the signal handler.
>  > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>  >
>  > That's what I found out so far. Maybe you have an idea?
>  >
>  > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>  >
>  > Best regards,
>  > Martin
>  >
>  >
>  > -----Original Message-----
>  > From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
>  > Sent: Monday, 3 September 2018 14:57
>  > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
>  > Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>  > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > Hi Goetz,
>  >
>  > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>  >> Also, I can not find all of the mail traffic in
>  >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>  >> Is this a problem of the pipermail server?
>  >>
>  >> For some reason this webrev lacks the links to browse the diffs.
>  >> Maybe you need to use a more recent webrev? You can obtain it with
>  >> hg clone http://hg.openjdk.java.net/code-tools/webrev/
>  >
>  > Yes, it was probably a problem with pipermail or some relay.
>  > I noticed the same thing, i.e. at least one of Michi's replies reached
>  > me but was missing from the ML.
>  >
>  > The initial discussion is here:
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
>  >
>  > I understand Martin reviewed the last webrev in that thread, which is
>  > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
>  >
>  > Martin's review of webrev.01:
>  > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
>  >
>  > and Michi's reply to Martin's review of webrev.01 (with webrev.02):
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html
>  >
>  > and your last review:
>  > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
>  >
>  >
>  > HTH.
>  >
>  > Best regards,
>  > Gustavo
>  >
>  >> Why do you rename vnoreg to vnoregi?
>  >>
>  >> Besides that the change is fine, thanks for implementing this!
>  >>
>  >> Best regards,
>  >> Goetz.
>  >>
>  >>
>  >>> -----Original Message-----
>  >>> From: Doerr, Martin
>  >>> Sent: Tuesday, 28 August 2018 19:35
>  >>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie <HORIE at jp.ibm.com>
>  >>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>  >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>  >>>
>  >>> Hi Michihiro,
>  >>>
>  >>> thank you for implementing it. I have just taken a first look at your
>  >>> webrev.01.
>  >>>
>  >>> It looks basically good. Only the Power version check seems to be incorrect.
>  >>> VM_Version::has_popcntb() checks for Power5.
>  >>> I believe most instructions are available with Power7.
>  >>> Some (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>  >>> Power8?
>  >>> We should check this carefully.
>  >>>
>  >>> Also, indentation in register_ppc.hpp could be improved.
>  >>>
>  >>> Thanks and best regards,
>  >>> Martin
>  >>>
>  >>>
>  >>> -----Original Message-----
>  >>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>  >>> Sent: Thursday, 26 July 2018 16:02
>  >>> To: Michihiro Horie <HORIE at jp.ibm.com>
>  >>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>  >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >>>
>  >>> Hi Michi,
>  >>>
>  >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>  >>>> I updated webrev:
>  >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>  >>>
>  >>> Thanks for providing an updated webrev and for fixing indentation and
>  >>> function order in assembler_ppc.inline.hpp as well. I have no further
>  >>> comments :)
>  >>>
>  >>>
>  >>> Best Regards,
>  >>> Gustavo
>  >>>
>  >>>>
>  >>>> Best regards,
>  >>>> --
>  >>>> Michihiro,
>  >>>> IBM Research - Tokyo
>  >>>>
>  >>>>
>  >>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>  >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>  >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
>  >>>> Date: 2018/07/25 23:05
>  >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >>>>
>  >>>>
>  >>>>
>  >>>>
>  >>>>
>  >>>> Hi Michi,
>  >>>>
>  >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>  >>>> > Dear all,
>  >>>> >
>  >>>> > Would you review the following change?
>  >>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>  >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>  >>>> >
>  >>>> > This change adds support for vectorized arithmetic calculation with SLP.
>  >>>> >
>  >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is
>  >>>> > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>  >>>> > which exactly overlap with the VRs. Instruction APIs receiving VRs use
>  >>>> > to_vr via vecX. Another thing is the change in sqrtF_reg to enable
>  >>>> > matching with SqrtVF. I think the change in sqrtF_reg would be fine due
>  >>>> > to ConvD2FNode::Value in convertnode.cpp.
>  >>>>
>  >>>> Looks good. Just a few comments:
>  >>>>
>  >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of
>  >>>>   vmaddfp in order to avoid the splat?
>  >>>>
>  >>>> - All instructions added by your change were introduced in ISA 2.06,
>  >>>>   so POWER7 and above are OK. However, as I see probes for
>  >>>>   PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), I'm
>  >>>>   wondering if there is any control point to guarantee that these
>  >>>>   instructions won't be emitted on a CPU that does not support them.
>  >>>>
>  >>>> - I think that, in general, strings in format %{} are in upper case.
>  >>>>   For instance, this is the current OptoAssembly output for vmul4F:
>  >>>>
>  >>>> 2941835 5b4     ADDI    R24, R24, #64
>  >>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>  >>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>  >>>>
>  >>>>   I think it would be better to be in upper case instead. I also think
>  >>>>   that if the node match emits more than one instruction, all
>  >>>>   instructions must be listed in format %{}, since it's meant for
>  >>>>   detailed debugging. Finally, I think it would be better to replace \t!
>  >>>>   by \t// in that string (unless I'm missing any special meaning for
>  >>>>   that char). So for vmul4F it would be something like:
>  >>>>
>  >>>> 2941835 5b4     ADDI      R24, R24, #64
>  >>>>                 VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>  >>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>  >>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>  >>>>
>  >>>>
>  >>>> But feel free to change anything just after you get additional reviews :)
>  >>>>
>  >>>>
>  >>>> > I confirmed this change with JTREG. In addition, I used the attached
>  >>>> > micro benchmarks.
>  >>>> > /(See attached file: slp_microbench.zip)/
>  >>>>
>  >>>> Thanks for sharing it.
>  >>>> Btw, another option to host it would be the CR
>  >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>  >>>>
>  >>>>
>  >>>> Best regards,
>  >>>> Gustavo
>  >>>>
>  >>>> >
>  >>>> > Best regards,
>  >>>> > --
>  >>>> > Michihiro,
>  >>>> > IBM Research - Tokyo
>  >>>> >
>  >>>>
>  >>>>
>  >>>>
>  >>
>  >
>  >
>  >
>  >
>
>
>
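Martin's review of webrev.01 (quoted above) points out that a Power5-level check such as VM_Version::has_popcntb() is too weak. Most of the new instructions need POWER7, and some need POWER8. The idea of gating emission on a per-instruction minimum generation can be sketched as a small Java model. The class, the table and the method are hypothetical and are not HotSpot's VM_Version API. The table covers only a few instructions named in this thread.

```java
import java.util.Map;

// Hypothetical model: gate each instruction on the minimum POWER generation.
final class PowerIsaGate {
    // Minimum POWER generation per instruction: xvmulsp is a VSX instruction
    // (ISA 2.06, POWER7); vsubudm and vpopcntw were added in ISA 2.07 (POWER8).
    private static final Map<String, Integer> MIN_POWER = Map.of(
            "xvmulsp", 7,
            "vsubudm", 8,
            "vpopcntw", 8);

    // Returns true if the instruction may be emitted on a CPU of the given generation.
    static boolean canEmit(String insn, int cpuPowerLevel) {
        Integer min = MIN_POWER.get(insn);
        if (min == null) {
            throw new IllegalArgumentException("unknown instruction: " + insn);
        }
        return cpuPowerLevel >= min;
    }

    public static void main(String[] args) {
        // A Power5-level check would accept all of these on POWER5/6;
        // checking the real minimum generation rejects them there.
        for (String insn : new String[] {"xvmulsp", "vsubudm", "vpopcntw"}) {
            System.out.println(insn + ": POWER6=" + canEmit(insn, 6)
                    + ", POWER7=" + canEmit(insn, 7) + ", POWER8=" + canEmit(insn, 8));
        }
    }
}
```

A table keyed by instruction keeps the requirement next to each instruction. A single global check can silently over-approve the newest instructions.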


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180906/48d103fd/attachment-0001.html>

From rwestrel at redhat.com  Thu Sep  6 07:16:57 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 06 Sep 2018 09:16:57 +0200
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
 <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
 <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>
Message-ID: <dk6efe7xfh2.fsf@rwestrel.remote.csb>


Hi Dmitry,

> -prof perfnorm shows 7-14% more branch misses.

My patch doesn't make any change to the stubs. It only tweaks c2
compiled code. Do you see any difference in the code generated for
com.sun.crypto.provider.CipherCore::doFinal?

Roland.

From rwestrel at redhat.com  Thu Sep  6 07:17:13 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 06 Sep 2018 09:17:13 +0200
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <3c8ae9e3-e3f2-df0e-0add-4d1c54589198@oracle.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <3c8ae9e3-e3f2-df0e-0add-4d1c54589198@oracle.com>
Message-ID: <dk6bm9bxfgm.fsf@rwestrel.remote.csb>


Thanks for the review, Vladimir.

Roland.

From dmitry.chuyko at bell-sw.com  Thu Sep  6 10:59:28 2018
From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko)
Date: Thu, 6 Sep 2018 13:59:28 +0300
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <dk6efe7xfh2.fsf@rwestrel.remote.csb>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
 <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
 <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>
 <dk6efe7xfh2.fsf@rwestrel.remote.csb>
Message-ID: <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com>

On 09/06/2018 10:16 AM, Roland Westrelin wrote:
> Hi Dmitry,
>
>> -prof perfnorm shows 7-14% more branch misses.
> My patch doesn't make any change to the stubs. It only tweaks c2
> compiled code.
One guess is that other code influences branch prediction in the
stub.
>   Do you see any difference in the code generated for
> com.sun.crypto.provider.CipherCore::doFinal?
Yes. Here is how it looks:

Current

             0x0000fffca85ffd68: add    w16, w4, w10                ;*invokestatic addExact {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 17 (line 328)
                                                                       ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
   0.02%     0x0000fffca85ffd6c: cmp    w14, #0x1
             0x0000fffca85ffd70: b.cc    0x0000fffca85fff20  // b.lo, b.ul, b.last
                                                                       ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 25 (line 329)
                                                                       ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
             0x0000fffca85ffd74: lsl    x11, x11, #3                ;*getfield padding {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 92 (line 344)
                                                                       ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)

Patched/-XX:-UseSwitchProfiling

            0x0000fffcd86000e4: add    w16, w4, w15                ;*invokestatic addExact {reexecute=0 rethrow=0 return_oop=0}
                                                                      ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 17 (line 328)
                                                                      ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
   0.01%    0x0000fffcd86000e8: cmp    w14, #0x7
            0x0000fffcd86000ec: b.eq    0x0000fffcd8600234  // b.none  ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
                                                                      ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 25 (line 329)
                                                                      ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
            0x0000fffcd86000f0: lsl    x11, x12, #3                ;*getfield padding {reexecute=0 rethrow=0 return_oop=0}
                                                                      ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 92 (line 344)
                                                                      ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)

There are also several instruction/block rearrangements.

-Dmitry

>
> Roland.


From rwestrel at redhat.com  Thu Sep  6 13:10:43 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 06 Sep 2018 15:10:43 +0200
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
 <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
 <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>
 <dk6efe7xfh2.fsf@rwestrel.remote.csb>
 <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com>
Message-ID: <dk68t4eydnw.fsf@rwestrel.remote.csb>


> Yes. Here is how it looks like:
>
> Current
>
>              0x0000fffca85ffd68: add    w16, w4, w10                ;*invokestatic addExact {reexecute=0 rethrow=0 return_oop=0}
>                                                                        ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 17 (line 328)
>                                                                        ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
>    0.02%     0x0000fffca85ffd6c: cmp    w14, #0x1
>              0x0000fffca85ffd70: b.cc    0x0000fffca85fff20  // b.lo, b.ul, b.last
>                                                                        ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
>                                                                        ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 25 (line 329)
>                                                                        ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
>              0x0000fffca85ffd74: lsl    x11, x11, #3                ;*getfield padding {reexecute=0 rethrow=0 return_oop=0}
>                                                                        ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 92 (line 344)
>                                                                        ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
>
> Patched/-XX:-UseSwitchProfiling
>
>             0x0000fffcd86000e4: add    w16, w4, w15                ;*invokestatic addExact {reexecute=0 rethrow=0 return_oop=0}
>                                                                       ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 17 (line 328)
>                                                                       ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
>    0.01%    0x0000fffcd86000e8: cmp    w14, #0x7
>             0x0000fffcd86000ec: b.eq    0x0000fffcd8600234  // b.none  ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
>                                                                       ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 25 (line 329)
>                                                                       ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
>             0x0000fffcd86000f0: lsl    x11, x12, #3                ;*getfield padding {reexecute=0 rethrow=0 return_oop=0}
>                                                                       ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 92 (line 344)
>                                                                       ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917)
>
> There are also several instructions/blocks rearrangements.

That does seem like a pretty minimal difference and not a reason not to
push that change. What do you think?

Roland.

From dmitry.chuyko at bell-sw.com  Thu Sep  6 13:20:43 2018
From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko)
Date: Thu, 6 Sep 2018 16:20:43 +0300
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <dk68t4eydnw.fsf@rwestrel.remote.csb>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
 <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
 <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>
 <dk6efe7xfh2.fsf@rwestrel.remote.csb>
 <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com>
 <dk68t4eydnw.fsf@rwestrel.remote.csb>
Message-ID: <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com>

On 09/06/2018 04:10 PM, Roland Westrelin wrote:
>> Yes. Here is how it looks like:
>> ...................................
> That does seem like a pretty minimal difference and not a reason not to
> push that change. What do you think?
I agree, it looks like something we should investigate in the aarch64 port.

-Dmitry
>
> Roland.


From adinn at redhat.com  Fri Sep  7 12:58:59 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 7 Sep 2018 13:58:59 +0100
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
Message-ID: <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>

Hi Dmitrij

On 22/08/18 11:04, Andrew Dinn wrote:
> Thank you for the revised webrev and new test results. I am now working
> through them. I will post comments as soon as I have given the new code
> a full read and assessed the new results. I am afraid that may take a
> day or two, for which delay advance apologies.

This review has taken a great deal longer than expected. I am sorry but
that is because the documentation for the code you have submitted is
still seriously inadequate and I have had to put a lot of work into
revising it before I can fully review the code.

I am still finishing off that last task but I wanted to start providing
you with some feedback and also to enlist your help in checking that my
revisions are correct. I plan to provide feedback in 3 stages to match
the 3 steps in the review that I am doing as follows:

1) Correct the original 'algorithm' you started from

2) Correct the 'modified algorithm' that is meant to describe the
behaviour of your code

3) Propose any necessary corrections/improvements to the generated code

So, let's start with step 1.

The 'original' algorithm located in file macroAssembler_aarch64_pow.cpp
is really just a fragment of C code with a few missing elements (e.g.
the origin of values P1, P2, ... is not explained, and hugeX and tiny are
not defined). Although this code has the virtue that it is known to be
correct (or at least has been verified by long use and the eyes of
experts in numerical computation) it still fails to provide important
information about what the 'algorithm' is supposed to do. That
information is critical for anyone coming to it fresh to be able to
understand what is happening.

The first omission is several pieces of background mathematics that are
neither explained in the code nor referenced. The mathematics includes
the formulae on which the algorithm is based and the numerical
approximation to these formulae that is employed to define the
algorithm. This is needed to explain /how/ and /why/ a) the two
different computations of log2(x) and b) the computation of exp(x) are
performed as they are and to justify that the results are valid.
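For readers coming to this fresh, the standard identities that fdlibm-style pow code (which the 'original' algorithm appears to derive from) is built on can be summarised as below. This is general background for x > 0 with the special cases handled separately, not a transcription of the redrafted file; it shows why log2(x) is obtained by scaling with ivln2 = 1/ln2 and why the exp step needs ln2:

```latex
% Background identities, x > 0; special cases (0, Inf, NaN, x < 0) are handled separately.
x^{y} = 2^{\,y \log_2 x}, \qquad
\log_2 x = \ln x \cdot \mathrm{ivln2}, \quad \mathrm{ivln2} = \frac{1}{\ln 2}

z = y \log_2 x = n + r, \quad n = \text{integer nearest } z, \ |r| \le \tfrac{1}{2}, \qquad
2^{z} = 2^{n} \cdot e^{\,r \ln 2}
```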

The second omission is detailed descriptions of what most of the more
complex individual steps in the algorithm do. Many of the logic,
floating point and branching operations which compute intermediate
results are extremely opaque. This is particularly so for the steps
which manipulate bit patterns in the long representation of the fp
values being used. However, some of the straight fp arithmetic is also
highly problematic.
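To make the bit-pattern point concrete, here is a small Java sketch of the 32-bit half-word extraction that fdlibm-style C code performs with its __HI(x)/__LO(x) macros, producing values like hx, lx and ix. The class and method names are hypothetical and this only illustrates the idiom; it is not code from the webrev:

```java
// Illustration only: the fdlibm-style split of a double into two 32-bit
// words, as done in C by the __HI(x)/__LO(x) macros. Names are hypothetical.
final class DoubleWords {

    // hx: sign bit, 11-bit biased exponent and top 20 fraction bits.
    static int hi(double x) {
        return (int) (Double.doubleToRawLongBits(x) >>> 32);
    }

    // lx: the remaining 32 fraction bits.
    static int lo(double x) {
        return (int) Double.doubleToRawLongBits(x);
    }

    // ix = hx & 0x7fffffff: high word of |x|; ix >= 0x7ff00000 means x is Inf or NaN.
    static int absHi(double x) {
        return hi(x) & 0x7fffffff;
    }

    public static void main(String[] args) {
        double x = -1.5; // bit pattern 0xbff8000000000000
        System.out.printf("hx=%08x lx=%08x ix=%08x%n", hi(x), lo(x), absHi(x));
    }
}
```

Running main prints hx=bff80000 lx=00000000 ix=3ff80000. Tests such as "is x Inf or NaN?" then become integer comparisons on ix, which is exactly the kind of step that needs an explanatory comment.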

The other thing I think needs to be made clearer is the relationship
between the various special case return points in the code and the
special case rules they relate to. This is not so critical for the
original algorithm because the C code at least has a regular and
standardised control flow. However, labelling the exit paths is still
useful here and will be much more useful if used both here and in the
modified algorithm (and we'll come to that later).

I have rewritten the algorithm to achieve what I think is needed to
patch these omissions. The redraft of this part of the code is available
here:

  http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt

I assume you are familiar with the relevant mathematics and how it has
been used to derive the algorithm. If so then I would like you to review
this rewrite and ensure that there are no mathematical errors in it. I
would also like you to check that the explanatory comments for the
individual steps in the algorithm do not contain any errors.

If you are not familiar with the mathematics then please let me know. I
need to know whether this has been reviewed by someone competent to do so.

n.b. one little detail you might easily miss. I removed lg2, lg2_h and
lg2_l from the first table of constants as neither log(x) algorithm
needs them (they rely on ivln2 instead). I renamed the entries in the
second table from LG2, etc. to /ln2/, etc. and changed the names
accordingly at the point of use. The computation of exp(x) actually does
need ln2. One of the code changes is to remove these redundant entries
from your table pow_coeff1.

OK, as for the next 2 steps, I will post a follow-up to deal with them
once I have completed my review. That will include a heavily revised
version of your 'modified algorithm' (which is still in progress) plus
suggestions for changes to the code that I have found along the way.
Just as a preliminary, I'll summarize what is wrong below.

Note that I have not yet found any errors in how the generated code
implements the mathematics but I am still not happy with it because it
is extremely unclear. Correcting the 'modified algorithm' is a
necessary, critical step to improving the clarity of the code.

So, in overview, what is wrong with your 'modified algorithm'? Well, the
thing that is immediately obvious is that it is /not/ actually the
algorithm you have employed. It is simply a mangled version of the C
code you started from that bears only a tenuous relation to the code
structure it is supposed to summarize. Now, I'm happy for you to use C
to model the generated code if possible and, in fact, am in the process
of writing a proper algorithm that looks as much like C code as possible
/but/ also actually describes what your generated code does. The problem
is that what you have written is not only /not/ C; it also i) is
incoherent, ii) retains elements from the original code that don't exist
at all in the generated code and iii) omits important elements of the
generated code.

So, firstly, let's deal with the problem as it relates to control flow.
Your 'modified algorithm' includes various tags mentioning the word
'label' suggesting that some transfer of control is to be effected.
However, these are tacked onto statement blocks connected via 'if
(cond)' tests or 'else' keywords that are meant to imply some
alternative control flow. Essentially, your generated code relies on
gotos which do not fit a standard if/else flow model and you have tried
to bodge some sort of goto model on top of the original valid, gotoless
C control flow with no clear definition of how that is meant to work.
Honestly, if your generated code uses a goto control flow then your C
algorithm is going to have to do the same in order to clearly summarize
what the code actually does.

The second major problem is one I pointed out in my earlier note, i.e.
that the data values described in the 'modified algorithm' do not
correctly match the ones operated on in your generated code. Your
algorithm lists many redundant values used in the original algorithm
(e.g. ix, iy, ax, yisint) even though your code doesn't ever explicitly
construct most of those values (n.b. this includes, but is not limited
to, the 32 bit half-word quantities). Instead the code frequently pulls the
relevant value, as needed, out of other data that it does construct and
holds in registers -- sometimes across control branches. At other times
it performs an equivalent operation on a different, related data value.
Your response to my request was to add comments which labelled some of
these on-the-fly created values or alternative values with the original
names but that ignores the fact that the names and the values referenced
in the comments do not actually match.

Contrariwise, a lot of the values the code does actually operate on are
not mentioned in the algorithm. Indeed, it is worse than that because
they are not coherently identified even in the generated code. Data
items stored in registers are referred to using the utterly redundant
symbolic aliases tmp0, tmp1, etc. for registers r0, r1, etc. What is
worse, the same meaningless symbolic names get reused for completely
different data items.

For example, at one point tmp2 identifies the exponent of y stored in r2
and later it identifies the absolute value of y also stored in r2,
overwriting the exponent. Your algorithm really ought to mention values
like exp_y or ay (or even |y|) for these cases and the code should
correspondingly define exp_y and ay as aliases for register r2. These
meaningful names should then be used when loading the constructed value
into a register and at every subsequent point of use where that
constructed value is valid.
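A hypothetical Java sketch of the kind of naming being asked for: each value derived from y gets its own name that says what it is (expY for the biased exponent field of y, ay for the bit pattern of |y|) instead of one generic tmp2 reused for both. It models only the values, not the AArch64 register assignment in the webrev, and the names are illustrative:

```java
// Hypothetical illustration: distinct, meaningful names for distinct values
// derived from y, instead of one generic name (tmp2) reused for both.
final class PowOperands {

    private static final long SIGN_BIT = 0x8000000000000000L;
    private static final int FRACTION_BITS = 52;
    private static final int EXP_FIELD_MASK = 0x7ff;

    // exp_y: the 11-bit biased exponent field of y (1023 for y in [1, 2)).
    static int expY(double y) {
        long bits = Double.doubleToRawLongBits(y);
        return (int) ((bits >>> FRACTION_BITS) & EXP_FIELD_MASK);
    }

    // ay: the bit pattern of |y|, i.e. y with its sign bit cleared.
    static long ay(double y) {
        return Double.doubleToRawLongBits(y) & ~SIGN_BIT;
    }

    public static void main(String[] args) {
        double y = -3.0;
        System.out.println("exp_y = " + expY(y) + ", |y| = " + Double.longBitsToDouble(ay(y)));
    }
}
```

In generated code the same idea means declaring, for example, an alias named exp_y and a separate alias named ay, even if both end up mapped to the same register at different points.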

This is not all that is wrong with the 'modified algorithm' but it is
enough to make it not just useless but worse than useless. What you have
written provides a hand-wave towards what the code does that fails to
summarize it with any accuracy or clarity and equally fails to clarify
the difference between it and the C code you started from. That only
makes the whole picture less clear not more so.

As I said, I will provide a better version of the 'modified algorithm'
in a follow-up and then discuss possible code changes. Please review the
linked file above while I prepare that.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From martin.doerr at sap.com  Fri Sep  7 13:39:25 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 7 Sep 2018 13:39:25 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <OF3264BF03.B675D915-ON00258300.00076F89-49258300.00130124@notes.na.collabserv.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <b235720a-353b-c0e4-d494-eb337eba06a7@linux.vnet.ibm.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
 <57ebd30a66504577a6b2ec267aee4b69@sap.com>
 <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com>
 <OFD4AAA261.28219E95-ON002582FF.00371825-492582FF.00390865@notes.na.collabserv.com>
 <a8132254-dd87-69e1-69b2-06c8a58565bf@linux.vnet.ibm.com>
 <OF3264BF03.B675D915-ON00258300.00076F89-49258300.00130124@notes.na.collabserv.com>
Message-ID: <baadeab733f54bd5bb590533be5371db@sap.com>

Hi Michihiro,

I've created a new bug for the vector register save issue:
https://bugs.openjdk.java.net/browse/JDK-8210497
I'd like to fix that one first. I can push your webrev.03 afterwards when tests are passing and review is completed.

Best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Thursday, 6 September 2018 05:28
To: Gustavo Romero <gromero at linux.vnet.ibm.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; Doerr, Martin <martin.doerr at sap.com>
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support


Hi Martin, Gustavo,

Thank you for giving the detailed discussions and narrowing down the current issue on ppc64.

> We haven't seen any issues with the current code, but I think this affects jdk11, too. (We could also switch off SuperwordUseVSX for jdk11u.) Do you agree?
Yes, I agree. Following is the latest webrev switching off SuperwordUseVSX by default.
http://cr.openjdk.java.net/~mhorie/8208171/webrev.04/


Best regards,
--
Michihiro,
IBM Research - Tokyo


From: Gustavo Romero <gromero at linux.vnet.ibm.com>
To: Michihiro Horie/Japan/IBM at IBMJP
Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "Doerr, Martin" <martin.doerr at sap.com>
Date: 2018/09/06 03:34
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support

________________________________


Hi Michi,

On 09/05/2018 07:22 AM, Michihiro Horie wrote:
> Hi Martin, Gustavo,
>
> I still cannot reproduce the problem. I noticed the machine I have is not SUSE but OpenSUSE with 4.1.21-14-default. I've also tried kernel 4.4.0-31-generic but it's Ubuntu.
>
> Gustavo, is there any suspicious change before/after v4.4, the version on which Martin got the crash?

Nope, nothing I'm aware of... However, it looks like Martin found no issues with
your last revision. Anyway, if you need a machine with SLES 12 SP3 installed
I have one that I can share. Drop me a Slack message if you need it.


Regards,
Gustavo

>
> Apart from the problem, I uploaded the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
>
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> To: "Doerr, Martin" <martin.doerr at sap.com>, Michihiro Horie/Japan/IBM at IBMJP
> Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Date: 2018/09/05 07:03
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
> --------------------------------
>
>
>
> Hi Martin and Michi,
>
> On 09/04/2018 01:20 PM, Doerr, Martin wrote:
>  > Can you reproduce the test failures?
>  >
>  > The very same VM works fine on a different Power8 machine which uses the same instructions by C2.
>  >
>  > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
>  >
>  > I have seen several linux kernel changes regarding saving and restoring the VSX registers.
>  >
>  > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC".
>  >
>  > Maybe something is missing which tells the kernel that we're using it. But that's just a guess.
>
> Facilities like FP (fp registers), VEC (vector registers - aka VMX/Altivec), and
> VSX (vector-scalar registers) are usually disabled in a newborn process. Once
> any instruction associated with these facilities is used in the process it causes
> an exception that is handled by the kernel [1, 2, 3]: the kernel enables the
> facility that caused the exception (see load_up_fp & friends) and re-executes the
> instruction when it returns control back to the process in userspace.
>
> Starting from kernel v4.6 [4] there is a simple heuristic that employs an 8-bit
> counter to help track whether a process, after using these facilities for the first
> time, continues to use them. The counters (load_fp and load_vec) are
> incremented on each context switch, and if the process stops using the FP or VEC
> facilities then they are disabled again, with FP/VEC/VSX save/restore on context
> switches being disabled as well, in order to improve context-switch performance
> by avoiding the FP/VEC/VSX register save/restore.
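A simplified Java model of the heuristic described above, written only to illustrate the wrap-around of the 8-bit counter for the FP case. It is based on the description and commit [4], not on the kernel source; in particular the assumptions that the first FP-unavailable exception sets the counter to 1 and that each eager restore increments it are labelled as such in the code:

```java
// Simplified model (not kernel code) of the v4.6+ powerpc heuristic for FP:
// after a first FP use the state is restored eagerly on context switches
// until an 8-bit counter wraps to zero, then it falls back to lazy mode.
final class LazyFpModel {
    private int loadFp;        // models the 8-bit thread.load_fp counter
    private boolean fpEnabled; // models MSR_FP being set for the thread

    // Assumption: an FP instruction while FP is disabled raises the
    // "FP unavailable" exception, which enables FP and arms the counter at 1.
    void onFpUnavailable() {
        fpEnabled = true;
        loadFp = 1;
    }

    // The thread is switched back in after its FP state was saved.
    void onContextSwitchIn() {
        fpEnabled = false;
        if (loadFp != 0) {
            fpEnabled = true;              // eager restore (assumed to bump the counter)
            loadFp = (loadFp + 1) & 0xff;  // 8-bit counter: 255 + 1 wraps to 0
        }
    }

    boolean fpEnabled() { return fpEnabled; }
    int loadFp() { return loadFp; }

    public static void main(String[] args) {
        LazyFpModel t = new LazyFpModel();
        t.onFpUnavailable();
        int eager = 0;
        while (true) {
            t.onContextSwitchIn();
            if (!t.fpEnabled()) break;
            eager++;
        }
        System.out.println("eager restores before falling back to lazy mode: " + eager);
    }
}
```

In this model main reports 255 eager restores. A process that keeps using FP just takes another facility-unavailable exception after the wrap, which re-arms the counter; a process that stopped using FP stays in the cheaper lazy mode. Either way nothing is visible to userspace.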
>
> Either way (before or after the change introduced in v4.6) *that mechanism is
> opaque to userspace*, particularly to the process using these facilities. If the
> CPU does not support a given facility, the kernel does not enable it and sends a
> SIGILL to the process instead. It's possible to inspect the thread
> member dynamics/state from userspace using tools like 'systemtap' (for
> example, this simple script can be used to inspect the VRSAVE register on a given
> thread that is running a program called 'vrsave_' [5]) or using the 'perf' tool.
>
> "tsk->thread.used_vsr" [6] is actually associated with the VSX facility whilst
> MSR_VEC is associated with the VEC/VMX/Altivec facility [7], so
> "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a new
> process, or if the load_fp and load_vec counters overflowed and became zero,
> disabling VSX, or if only FP or only VEC - not both - were used in the process).
> In that case the kernel will also enable VSX by MSR |= MSR_VSX. A similar
> mechanism drives the FP (MSR_FP) and the VEC (MSR_VEC) facilities.
>
> If both FP and VEC facilities are used the VSX facility is enabled automatically
> since FP+VEC regsets == VSX regset [8].
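The "FP+VEC regsets == VSX regset" relation follows from how the Power ISA lays out the 64 VSX registers: VSR0-31 overlay the 32 FP registers and VSR32-63 are the 32 VMX/Altivec vector registers. A small Java sketch of that mapping, with hypothetical names and for illustration only:

```java
// Sketch of the Power ISA VSX register overlay: VSR0-31 share storage with
// FPR0-31 and VSR32-63 are VR0-31 (the VMX/Altivec registers).
final class VsxOverlay {

    enum Facility { FP, VEC }

    private static void checkIndex(int vsr) {
        if (vsr < 0 || vsr > 63) {
            throw new IllegalArgumentException("VSR index out of range: " + vsr);
        }
    }

    // Facility whose register file also holds VSR n.
    static Facility backingFacility(int vsr) {
        checkIndex(vsr);
        return vsr < 32 ? Facility.FP : Facility.VEC;
    }

    // FPR number (VSR0-31) or VR number (VSR32-63) aliased by VSR n.
    // Note: FPR n is only doubleword 0 of VSR n, not the full 128 bits.
    static int backingIndex(int vsr) {
        checkIndex(vsr);
        return vsr & 31;
    }

    public static void main(String[] args) {
        for (int vsr : new int[] {0, 31, 32, 63}) {
            System.out.println("VSR" + vsr + " -> " + backingFacility(vsr) + backingIndex(vsr));
        }
    }
}
```

So a VSX instruction that names VSR32-63 is touching exactly the registers that the VEC facility (and any VR save/restore path) is responsible for.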
>
> Since this mechanism is entirely opaque to userspace, my understanding is that if a
> program has to tell the kernel it wants to use any of these facilities
> (FP/VEC/VSX) before using them, then something is going wrong in kernelspace.
>
> Martin and Michi, if you want any help on drilling it further at kernel side
> please let me know, maybe I can help.
>
> I didn't have the chance to reproduce the crash yet, so if I find anything
> meaningful about it tomorrow I'll keep you posted.
>
>
> Kind regards,
> Gustavo
>
> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869   (FP)
> [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec)
> [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX)
> [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239
> [5] http://cr.openjdk.java.net/~gromero/script.d
> [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310
> [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250
> [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437
>
>  > Best regards,
>  >
>  > Martin
>  >
>  > *From:* Michihiro Horie <HORIE at jp.ibm.com>
>  > *Sent:* Tuesday, 4 September 2018 07:32
>  > *To:* Doerr, Martin <martin.doerr at sap.com>
>  > *Cc:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>  > *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > Hi Goetz, Martin, and Gustavo,
>  >
>  >
>  >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>  >>a compiler change.
>  >>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>  >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
>  >>while hotspot-compiler-dev is for
>  >>"Technical discussion about the development of the HotSpot bytecode compilers"
>  > I understood the instruction and would use hotspot-compiler-dev in future RFRs, thanks.
>  >
>  >
>  >> Why do you rename vnoreg to vnoregi?
>  > I followed the existing convention for vsnoreg and vsnoregi, but the renaming does not look necessary. I will revert this part. Should I also rename vsnoregi to vsnoreg?
>  >
>  >
>  >>we noticed jtreg test failures when using this change:
>  >>compiler/runtime/safepoints/TestRegisterRestoring.java
>  >>compiler/runtime/Test7196199.java
>  >>
>  >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>  >>
>  >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>  >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>  > Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine.
>  >
>  >
>  >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>  > Thank you for pointing out another issue. I do not currently hit this problem, but preventing the vector nodes from being matched makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them in any case, similar to the Replicate* rules.
>  >
>  >
>  > Gustavo, thanks for the wrap-up!
>  >
>  >
>  > Best regards,
>  > --
>  > Michihiro,
>  > IBM Research - Tokyo
>  >
>  >
>  > From: "Doerr, Martin" <martin.doerr at sap.com>
>  > To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
>  > Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
>  > Date: 2018/09/04 02:18
>  > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > --------------------------------
>  >
>  >
>  >
>  >
>  > Hi Gustavo and Michihiro,
>  >
>  > we noticed jtreg test failures when using this change:
>  > compiler/runtime/safepoints/TestRegisterRestoring.java
>  > compiler/runtime/Test7196199.java
>  >
>  > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>  >
>  > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>  > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>  >
>  > That's what I found out so far. Maybe you have an idea?
>  >
>  > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>  >
>  > Best regards,
>  > Martin
>  >
>  >
>  > -----Original Message-----
> >  > From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
> >  > Sent: Monday, 3 September 2018 14:57
> >  > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
> >  > Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>  > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > Hi Goetz,
>  >
>  > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>  >> Also, I can not find all of the mail traffic in
>  >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>  >> Is this a problem of the pipermail server?
>  >>
>  >> For some reason this webrev lacks the links to browse the diffs.
>  >> Do you need to use a more recent webrev?  You can obtain it with
>  >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
>  >
> >  > Yes, it was probably a problem with pipermail or some relay.
> >  > I noticed the same thing, i.e. at least one of Michi's replies reached
> >  > me but not the ML.
>  >
>  > The initial discussion is here:
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
>  >
>  > I understand Martin reviewed the last webrev in that thread, which is
> >  > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
>  >
>  > Martin's review of webrev.01:
>  > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
>  >
>  > and Michi's reply to Martin's review of webrev.01:
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
>  > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
>  >
>  > and your last review:
>  > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
>  >
>  >
>  > HTH.
>  >
>  > Best regards,
>  > Gustavo
>  >
>  >> Why do you rename vnoreg to vnoregi?
>  >>
>  >> Besides that the change is fine, thanks for implementing this!
>  >>
>  >> Best regards,
>  >>    Goetz.
>  >>
>  >>
>  >>> -----Original Message-----
>  >>> From: Doerr, Martin
> >  >>> Sent: Tuesday, 28 August 2018 19:35
> >  >>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie <HORIE at jp.ibm.com>
> >  >>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>  >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>  >>>
>  >>> Hi Michihiro,
>  >>>
>  >>> thank you for implementing it. I have just taken a first look at your
>  >>> webrev.01.
>  >>>
> >  >>> It looks basically good. Only the Power version check seems to be incorrect:
> >  >>> VM_Version::has_popcntb() checks for Power5.
> >  >>> I believe most instructions are available with Power7.
> >  >>> Were some of them (vsubudm, ..., vmuluwm, vpopcntw) introduced with
> >  >>> Power8? We should check this carefully.
>  >>>
> >  >>> Also, indentation in register_ppc.hpp could be improved.
> >  >>>
> >  >>> Thanks and best regards,
>  >>> Martin
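A minimal C sketch of the kind of guard being discussed here: emission of each instruction is gated on the detected Power level. The `power_level` enum, `insn_table` and `can_emit` are hypothetical names (HotSpot's real detection lives in `VM_Version`), and the level assignments are assumptions based on the ISA versions mentioned in this thread: VSX instructions such as xvmulsp from ISA 2.06 (POWER7), and vsubudm, vmuluwm and vpopcntw from ISA 2.07 (POWER8).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical Power levels (HotSpot's real detection lives in VM_Version). */
typedef enum { POWER5 = 5, POWER6 = 6, POWER7 = 7, POWER8 = 8 } power_level;

/* Minimum Power level per instruction: an illustrative subset, not a full table. */
static const struct {
  const char *mnemonic;
  power_level min_level;
} insn_table[] = {
  { "xvmulsp",  POWER7 },  /* VSX, ISA 2.06 */
  { "vsubudm",  POWER8 },  /* ISA 2.07 */
  { "vmuluwm",  POWER8 },  /* ISA 2.07 */
  { "vpopcntw", POWER8 },  /* ISA 2.07 */
};

/* True only if the mnemonic is known and the CPU level is high enough. */
bool can_emit(const char *mnemonic, power_level cpu) {
  for (size_t i = 0; i < sizeof insn_table / sizeof insn_table[0]; i++) {
    if (strcmp(insn_table[i].mnemonic, mnemonic) == 0) {
      return cpu >= insn_table[i].min_level;
    }
  }
  return false;  /* unknown instruction: refuse conservatively */
}
```

With this table, `can_emit("vpopcntw", POWER7)` is false, which illustrates why a single check for an older level (such as the Power5 check behind has_popcntb()) would not be enough to cover every new instruction.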
>  >>>
>  >>>
>  >>> -----Original Message-----
> >  >>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> >  >>> Sent: Thursday, 26 July 2018 16:02
> >  >>> To: Michihiro Horie <HORIE at jp.ibm.com>
> >  >>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>  >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >>>
>  >>> Hi Michi,
>  >>>
>  >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>  >>>> I updated webrev:
> >  >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>  >>>
> >  >>> Thanks for providing an updated webrev and for fixing indentation and
> >  >>> function order in assembler_ppc.inline.hpp as well. I have no further
> >  >>> comments :)
>  >>>
>  >>>
>  >>> Best Regards,
>  >>> Gustavo
>  >>>
>  >>>>
>  >>>> Best regards,
>  >>>> --
>  >>>> Michihiro,
>  >>>> IBM Research - Tokyo
>  >>>>
>  >>>>
> >  >>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> >  >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> >  >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
>  >>>> Date: 2018/07/25 23:05
>  >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >>>>
>  >>>> -------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> ----------------------------------------------------------------------------------------------
>  >>> -----------------------------------------------------
>  >>>>
>  >>>>
>  >>>>
>  >>>> Hi Michi,
>  >>>>
>  >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>  >>>>   > Dear all,
>  >>>>   >
>  >>>>   > Would you review the following change?
>  >>>>   > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
> >  >>>>   > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>  >>>>   >
>  >>>>   > This change adds support for vectorized arithmetic calculation with SLP.
>  >>>>   >
> >  >>>>   > The to_vr function is added to convert a VSR to a VR. Currently, vecX is
> >  >>>>   > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
> >  >>>>   > which exactly overlap the VRs. Instruction APIs receiving VRs use
> >  >>>>   > to_vr via vecX. Another thing is the change in sqrtF_reg to enable
> >  >>>>   > matching with SqrtVF. I think the change in sqrtF_reg would be fine due
> >  >>>>   > to ConvD2FNode::Value in convertnode.cpp.
>  >>>>
>  >>>> Looks good. Just a few comments:
>  >>>>
> >  >>>> - In vmul4F_reg(), would it be reasonable to use xvmulsp instead of
> >  >>>>     vmaddfp in order to avoid the splat?
> >  >>>>
> >  >>>> - Although all instructions added by your change were introduced in ISA
> >  >>>>     2.06, so POWER7 and above are OK, I see probes for
> >  >>>>     PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), so I'm
> >  >>>>     wondering if there is any control point to guarantee that these
> >  >>>>     instructions won't be emitted on a CPU that does not support them.
>  >>>>
> >  >>>> - I think that, in general, strings in format %{} are in upper case. For
> >  >>>>     instance, this is the current output in OptoAssembly for vmul4F:
>  >>>>
> >  >>>> 5b4     ADDI    R24, R24, #64
> >  >>>> 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
> >  >>>> 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>  >>>>
> >  >>>>     I think it would be better in upper case instead. I also think that if
> >  >>>>     the node match emits more than one instruction, all instructions must
> >  >>>>     be listed in format %{}, since it's meant for detailed debugging.
> >  >>>>     Finally, I think it would be better to replace \t! by \t// in that
> >  >>>>     string (unless I'm missing some special meaning of that char). So for
> >  >>>>     vmul4F it would be something like:
>  >>>>
> >  >>>> 5b4     ADDI      R24, R24, #64
> >  >>>>         VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
> >  >>>> 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
> >  >>>> 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>  >>>>
>  >>>>
>  >>>> But feel free to change anything just after you get additional reviews :)
>  >>>>
>  >>>>
> >  >>>>   > I confirmed this change with jtreg. In addition, I used the attached
> >  >>>>   > micro benchmarks.
>  >>>>   > /(See attached file: slp_microbench.zip)/
>  >>>>
>  >>>> Thanks for sharing it.
> >  >>>> Btw, another option would be to host it on the CR
> >  >>>> server, at http://cr.openjdk.java.net/~mhorie/8208171
>  >>>>
>  >>>>
>  >>>> Best regards,
>  >>>> Gustavo
>  >>>>
>  >>>>   >
>  >>>>   > Best regards,
>  >>>>   > --
>  >>>>   > Michihiro,
>  >>>>   > IBM Research - Tokyo
>  >>>>   >
>  >>>>
>  >>>>
>  >>>>
>  >>
>  >
>  >
>  >
>  >
>
>
>



From dmitrij.pochepko at bell-sw.com  Fri Sep  7 13:40:23 2018
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Fri, 7 Sep 2018 16:40:23 +0300
Subject: RFR(S): 8210461 - AArch64: Math.cos intrinsic gives incorrect results
Message-ID: <d4dc48cc-d12e-9b62-2481-1e2d9c6b5eed@bell-sw.com>

Hi,

please review a small fix for 8210461 - AArch64: Math.cos intrinsic gives
incorrect results

The large argument reduction code has a bug in one of its branches.

C code of the affected place:

iq[jz] = (int)(z-two24B*fw);

With the bug, it was calculated as iq[jz] = (int)(z+two24B*fw); (by an
fmadd instruction).

The fix is to use an fmsub instruction instead, so that the subtraction is
computed correctly.
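To make the sign error concrete, here is a small C sketch of the quoted step. AArch64 FMADD computes a + n*m and FMSUB computes a - n*m; the helpers below model them with plain arithmetic (ignoring fused rounding, which does not matter here because two24*fw is exact). The constants two24 = 2^24 and twon24 = 2^-24, and the way fw is derived, follow fdlibm's __kernel_rem_pio2 and are assumptions standing in for the intrinsic's two24B; this is not code from the webrev.

```c
/* Assumed constants, named after fdlibm's __kernel_rem_pio2. */
static const double two24  = 16777216.0;              /* 2^24  */
static const double twon24 = 5.9604644775390625e-08;  /* 2^-24 */

/* AArch64 FMADD d, n, m, a computes a + n*m; FMSUB computes a - n*m.
   Plain arithmetic is enough here because two24 * fw is exact. */
static double fmadd(double n, double m, double a) { return a + n * m; }
static double fmsub(double n, double m, double a) { return a - n * m; }

/* fw: the part of z above the low 24 bits, computed as in fdlibm. */
static double high_part(double z) { return (double)(int)(twon24 * z); }

/* Intended step: iq[jz] = (int)(z - two24*fw). */
int low24_correct(double z) { return (int)fmsub(two24, high_part(z), z); }

/* Buggy step: (int)(z + two24*fw), as produced by fmadd. */
int low24_buggy(double z) { return (int)fmadd(two24, high_part(z), z); }
```

For z = 50000000.0, fw is 2, so `low24_correct` yields 16445568 (a value below 2^24, as intended) while `low24_buggy` yields 83554432.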

I also re-read most of the code in search of similar errors. It seems there
are no other issues.


This bug wasn't caught by jtreg and JCK tests, so I added a separate small
test for this case.

webrev: http://cr.openjdk.java.net/~dpochepk/8210461/webrev.01/

CR: https://bugs.openjdk.java.net/browse/JDK-8210461

Testing: I tested this patch with the new and existing tests. All passed. I
also ran the new test on x86.


This patch should be pushed into jdk12 and backported into jdk11.


Thanks,

Dmitrij


From aph at redhat.com  Fri Sep  7 13:52:06 2018
From: aph at redhat.com (Andrew Haley)
Date: Fri, 7 Sep 2018 14:52:06 +0100
Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic
 gives incorrect results
In-Reply-To: <d4dc48cc-d12e-9b62-2481-1e2d9c6b5eed@bell-sw.com>
References: <d4dc48cc-d12e-9b62-2481-1e2d9c6b5eed@bell-sw.com>
Message-ID: <ea8cb840-1696-a141-eade-e0247da166a7@redhat.com>

On 09/07/2018 02:40 PM, Dmitrij Pochepko wrote:
> C-code: of affected place:
> 
> iq[jz] = (int)(z-two24B*fw);
> 
> with bug it was calculated as iq[jz] = (int)(z+two24B*fw); // by fmadd
> instruction
> 
> Fix is to change it into fmsub instruction for correct calculation.

Am I right to think that this code branch has never been tested?

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From dmitrij.pochepko at bell-sw.com  Fri Sep  7 14:03:18 2018
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Fri, 7 Sep 2018 17:03:18 +0300
Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic
 gives incorrect results
In-Reply-To: <ea8cb840-1696-a141-eade-e0247da166a7@redhat.com>
References: <d4dc48cc-d12e-9b62-2481-1e2d9c6b5eed@bell-sw.com>
 <ea8cb840-1696-a141-eade-e0247da166a7@redhat.com>
Message-ID: <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com>

I remember debugging this branch while running JCK tests.

I haven't checked precisely, but fw was probably 0 in those cases, so
z - two24B*fw and z + two24B*fw give the same value. That would explain
such behavior.
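This hypothesis can be checked in isolation with a short C sketch: the buggy and correct forms differ by 2*two24B*fw, so they can only disagree when fw is nonzero, which, with fw computed as (int)(z * 2^-24), means |z| >= 2^24. The constant names and the `bug_observable` helper are hypothetical, modeled on fdlibm's __kernel_rem_pio2 rather than on the AArch64 stub itself, and say nothing about which inputs actually reach this branch.

```c
#include <stdbool.h>

/* Assumed constants, named after fdlibm's __kernel_rem_pio2. */
static const double two24  = 16777216.0;              /* 2^24  */
static const double twon24 = 5.9604644775390625e-08;  /* 2^-24 */

/* True when z + two24*fw (buggy) and z - two24*fw (correct) differ.
   They differ by 2*two24*fw, so this holds exactly when fw != 0,
   i.e. when |z| >= 2^24 (the (int) cast truncates toward zero). */
bool bug_observable(double z) {
  double fw = (double)(int)(twon24 * z);
  return (z + two24 * fw) != (z - two24 * fw);
}
```

Any test whose intermediate z stays below 2^24 in magnitude would therefore pass with either instruction.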


On 07/09/18 16:52, Andrew Haley wrote:
> On 09/07/2018 02:40 PM, Dmitrij Pochepko wrote:
>> C-code: of affected place:
>>
>> iq[jz] = (int)(z-two24B*fw);
>>
>> with bug it was calculated as iq[jz] = (int)(z+two24B*fw); // by fmadd
>> instruction
>>
>> Fix is to change it into fmsub instruction for correct calculation.
> Am I right to think that this code branch has never been tested?
>


From HORIE at jp.ibm.com  Fri Sep  7 14:55:43 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Fri, 7 Sep 2018 23:55:43 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <baadeab733f54bd5bb590533be5371db@sap.com>
References: <OF27BFB560.0558A39A-ON002582D5.000EEFEA-492582D5.001F7E2E@notes.na.collabserv.com>
 <OFCAFD0563.FF8B85D0-ON002582D6.0019C339-492582D6.0019F896@notes.na.collabserv.com>
 <f9be32d4-556e-5833-1c4c-8fed133f26d8@linux.vnet.ibm.com>
 <346da54af45243c4bdaf475f118a450d@sap.com>
 <9553d65d98f74f37a35b49a1e39f015e@sap.com>
 <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
 <fc4310e4a8544c69b3d14fd593a065fc@sap.com>
 <OF1DDE5FDC.048940CF-ON002582FD.007F93C0-492582FE.001E65D8@notes.na.collabserv.com>
 <57ebd30a66504577a6b2ec267aee4b69@sap.com>
 <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com>
 <OFD4AAA261.28219E95-ON002582FF.00371825-492582FF.00390865@notes.na.collabserv.com>
 <a8132254-dd87-69e1-69b2-06c8a58565bf@linux.vnet.ibm.com>
 <baadeab733f54bd5bb590533be5371db@sap.com>
Message-ID: <OF9BFFD6D9.BD510F54-ON00258301.0050C704-49258301.00520196@notes.na.collabserv.com>


Hi Martin,

>I've created a new bug for the vector register save issue:
>https://bugs.openjdk.java.net/browse/JDK-8210497
>I'd like to fix that one first.
Thank you very much for handling the issue. I really appreciate it.

>I can push your webrev.03 afterwards when tests are passing and review is completed.
This would be great, thanks again.


Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Michihiro Horie <HORIE at jp.ibm.com>, Gustavo Romero
            <gromero at linux.vnet.ibm.com>
Cc:	"Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot
            compiler <hotspot-compiler-dev at openjdk.java.net>
Date:	2018/09/07 22:39
Subject:	RE: RFR: 8208171: PPC64: Enrich SLP support


Hi Michihiro,

I've created a new bug for the vector register save issue:
https://bugs.openjdk.java.net/browse/JDK-8210497
I'd like to fix that one first. I can push your webrev.03 afterwards when
tests are passing and review is completed.

Best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Thursday, 6 September 2018 05:28
To: Gustavo Romero <gromero at linux.vnet.ibm.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot compiler
<hotspot-compiler-dev at openjdk.java.net>; Doerr, Martin
<martin.doerr at sap.com>
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support


Hi Martin, Gustavo,

Thank you for the detailed discussion and for narrowing down the
current issue on ppc64.

> We haven't seen any issues with the current code, but I think this
> affects jdk11, too. (We could also switch off SuperwordUseVSX for jdk11u.)
> Do you agree?
Yes, I agree. The following is the latest webrev, which switches off
SuperwordUseVSX by default:
http://cr.openjdk.java.net/~mhorie/8208171/webrev.04/


Best regards,
--
Michihiro,
IBM Research - Tokyo


From: Gustavo Romero <gromero at linux.vnet.ibm.com>
To: Michihiro Horie/Japan/IBM at IBMJP
Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "Doerr, Martin" <martin.doerr at sap.com>
Date: 2018/09/06 03:34
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support


Hi Michi,

On 09/05/2018 07:22 AM, Michihiro Horie wrote:
> Hi Martin, Gustavo,
>
> I still cannot reproduce the problem. I noticed the machine I have is not
> SUSE but openSUSE with kernel 4.1.21-14-default. I've also tried kernel
> 4.4.0-31-generic, but that one is Ubuntu.
>
> Gustavo, is there any suspicious change before/after v4.4, the version on
> which Martin got the crash?

Nope, nothing I'm aware of... However, it looks like Martin found no issues
with your last revision. Anyway, if you need a machine with SLES 12 SP3
installed, I have one that I can share. Drop me a Slack message if you need it.


Regards,
Gustavo

>
> Apart from the problem, I uploaded the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
>
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> To: "Doerr, Martin" <martin.doerr at sap.com>, Michihiro Horie/Japan/IBM at IBMJP
> Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Date: 2018/09/05 07:03
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
>

>
>
>
> Hi Martin and Michi,
>
> On 09/04/2018 01:20 PM, Doerr, Martin wrote:
>  > Can you reproduce the test failures?
>  >
>  > The very same VM works fine on a different Power8 machine which uses the same instructions generated by C2.
>  >
>  > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
>  >
>  > I have seen several Linux kernel changes regarding saving and restoring the VSX registers.
>  >
>  > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr", which is used to set "msr |= MSR_VEC".
>  >
>  > Maybe something is missing which tells the kernel that we're using it. But that's just a guess.
>
> Facilities like FP (FP registers), VEC (vector registers, aka VMX/Altivec),
> and VSX (vector-scalar registers) are usually disabled in a newly created
> process. Once any instruction associated with these facilities is used in
> the process, it causes an exception that is handled by the kernel [1, 2, 3]:
> the kernel enables the facility that caused the exception (see load_up_fp &
> friends) and re-executes the instruction when it returns control to the
> process in userspace.
>
> Starting from kernel v4.6 [4] there is a simple heuristic that employs an
> 8-bit counter to help track whether a process, after using these facilities
> for the first time, continues to use them. The counters (load_fp and
> load_vec) are incremented on each context switch, and if the process stops
> using the FP or VEC facilities, they are disabled again. FP/VEC/VSX
> save/restore on context switches is then disabled as well, which improves
> context-switch performance by avoiding the FP/VEC/VSX register save/restore.
>
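The heuristic described above can be pictured with a toy C model. This is not kernel code: the struct, field and function names are hypothetical, and the model assumes, as a simplification, that the counter restarts at 1 when a facility-unavailable trap enables the facility, is incremented on every context switch while the facility is enabled, and drops the facility when the 8-bit counter wraps to 0. A process that still uses the facility then simply takes one more trap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; not kernel code. */
typedef struct {
  bool    enabled;  /* facility bit set in the MSR, e.g. MSR_FP */
  uint8_t load;     /* 8-bit counter in the spirit of load_fp / load_vec */
} facility_state;

/* The process executes an FP/VEC instruction. */
void on_facility_use(facility_state *f) {
  if (!f->enabled) {    /* facility-unavailable trap taken by the kernel */
    f->enabled = true;  /* enable it; the instruction is then re-executed */
    f->load = 1;        /* assumption: counter restarts on (re)enable */
  }
}

/* A context switch while the facility is enabled restores its registers
   eagerly and bumps the counter; on wrap-around the facility is dropped. */
void on_context_switch(facility_state *f) {
  if (!f->enabled) return;
  f->load++;            /* uint8_t wraps from 255 to 0 */
  if (f->load == 0) {
    f->enabled = false; /* no more save/restore; next use traps again */
  }
}
```

Note that in this model continued use does not reset the counter: the facility is dropped after 255 switches regardless, and re-enabled lazily on the next trap, which is consistent with the mechanism being invisible to the process.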
> Either way (before or after the change introduced in v4.6), *that mechanism
> is opaque to userspace*, particularly to the process using these facilities.
> If a given facility cannot be enabled by the kernel because the CPU does not
> support it, the kernel sends a SIGILL to the process. It's possible to
> inspect the thread member dynamics/state from userspace using tools like
> 'systemtap' (for example, this simple script can be used to inspect the
> VRSAVE register of a given thread running a program called 'vrsave_' [5])
> or using the 'perf' tool.
>
> "tsk->thread.used_vsr" [6] is actually associated with the VSX facility,
> whilst MSR_VEC is associated with the VEC/VMX/Altivec facility [7], so
> "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a
> new process, or if the load_fp and load_vec counters overflowed and became
> zero, disabling VSX, or if only FP or only VEC, not both, were used in the
> process). In that case the kernel will also enable VSX by MSR |= MSR_VSX. A
> similar mechanism drives the FP (MSR_FP) and VEC (MSR_VEC) facilities.
>
> If both FP and VEC facilities are used, the VSX facility is enabled
> automatically, since FP+VEC regsets == VSX regset [8].
>
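The register-file relationship behind "FP+VEC regsets == VSX regset", which is also what the to_vr() conversion earlier in the review relies on, can be summarized in a short C sketch: VSR0-31 overlap FPR0-31 (each FPR is doubleword 0 of the corresponding VSR), and VSR32-63 are VR0-31, so VR n is VSR n+32. The helper names are hypothetical and are not HotSpot's register API.

```c
#include <assert.h>
#include <stdbool.h>

/* VSX has 64 registers, VSR0..VSR63 (128-bit each):
     VSR0..VSR31  overlap FPR0..FPR31 (FP facility, doubleword 0)
     VSR32..VSR63 are     VR0..VR31   (VEC/VMX/Altivec facility)
   Helper names are hypothetical. */

bool vsr_is_fpr(int vsr) { return vsr >= 0 && vsr <= 31; }
bool vsr_is_vr(int vsr)  { return vsr >= 32 && vsr <= 63; }

/* Map a VSR in the VR half to its VR number. */
int vsr_to_vr(int vsr) {
  assert(vsr_is_vr(vsr));  /* only VSR32..63 have a VR alias */
  return vsr - 32;
}
```

Under this mapping, the VSR32-51 range that vecX uses in ppc.ad corresponds to VR0-VR19.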
> Thus, as this mechanism is entirely opaque to userspace, my understanding is
> that if a program has to tell the kernel it wants to use any of these
> facilities (FP/VEC/VSX) before using them, something is going wrong in
> kernelspace.
>
> Martin and Michi, if you want any help drilling into it further on the
> kernel side, please let me know; maybe I can help.
>
> I haven't had the chance to reproduce the crash yet, so if I find anything
> meaningful about it tomorrow, I'll keep you posted.
>
>
> Kind regards,
> Gustavo
>
> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP)
> [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec)
> [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX)
> [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239
> [5] http://cr.openjdk.java.net/~gromero/script.d
> [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310
> [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250
> [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437

>
>  > Best regards,
>  >
>  > Martin
>  >
>  > *From:* Michihiro Horie <HORIE at jp.ibm.com>
>  > *Sent:* Tuesday, 4 September 2018 07:32
>  > *To:* Doerr, Martin <martin.doerr at sap.com>
>  > *Cc:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>  > *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > Hi Goetz, Martin, and Gustavo,
>  >
>  >
>  >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>  >>a compiler change. http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>  >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
>  >>while hotspot-compiler-dev is for
>  >>"Technical discussion about the development of the HotSpot bytecode compilers"
>  > Understood. I will use hotspot-compiler-dev for future RFRs, thanks.
>  >
>  >
>  >> Why do you rename vnoreg to vnoregi?
>  > I followed the naming used for vsnoreg and vsnoregi, but the renaming does not look necessary. I will revert this part. Should I also rename vsnoregi to vsnoreg?
>  >
>  >
>  >>we noticed jtreg test failures when using this change:
>  >>compiler/runtime/safepoints/TestRegisterRestoring.java
>  >>compiler/runtime/Test7196199.java
>  >>
>  >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>  >>
>  >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>  >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>  > Thank you for letting me know about the issue. I will try to reproduce it on a SUSE machine.
>  >
>  >
>  >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>  > Thank you for pointing out another issue. Currently I do not hit this problem, but preventing the vector nodes from being matched makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so in any case it might be better to prepare them similarly to the Replicate* rules.
>  >
>  >
>  > Gustavo, thanks for the wrap-up!
>  >
>  >
>  > Best regards,
>  > --
>  > Michihiro,
>  > IBM Research - Tokyo
>  >
>  >
>  > From: "Doerr, Martin" <martin.doerr at sap.com>
>  > To: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Michihiro Horie <HORIE at jp.ibm.com>
>  > Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
>  > Date: 2018/09/04 02:18
>  > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  >

>  >
>  >
>  >
>  >
>  > Hi Gustavo and Michihiro,
>  >
>  > we noticed jtreg test failures when using this change:
>  > compiler/runtime/safepoints/TestRegisterRestoring.java
>  > compiler/runtime/Test7196199.java
>  >
>  > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>  >
>  > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>  > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>  >
>  > That's what I found out so far. Maybe you have an idea?
>  >
>  > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>  >
>  > Best regards,
>  > Martin
>  >
>  >
>  > -----Original Message-----
>  > From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
>  > Sent: Monday, 3 September 2018 14:57
>  > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
>  > Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>  > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >
>  > Hi Goetz,
>  >
>  > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>  >> Also, I can not find all of the mail traffic in
>  >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>  >> Is this a problem of the pipermail server?
>  >>
>  >> For some reason this webrev lacks the links to browse the diffs.
>  >> Do you need to use a more recent webrev?  You can obtain it with
>  >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
>  >
>  > Yes, it was probably a problem with pipermail or with some relay.
>  > I noticed the same thing, i.e. at least one of Michi's replies reached
>  > me but did not reach the mailing list.
>  >
>  > The initial discussion is here:
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
>  >
>  > I understand Martin reviewed the last webrev in that thread, which is
>  > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
>  >
>  > Martin's review of webrev.01:
>  > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
>  >
>  > and Michi's reply to Martin's review of webrev.01:
>  > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
>  > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
>  >
>  > and your last review:
>  > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
>  >
>  >
>  > HTH.
>  >
>  > Best regards,
>  > Gustavo
>  >
>  >> Why do you rename vnoreg to vnoregi?
>  >>
>  >> Besides that, the change is fine. Thanks for implementing this!
>  >>
>  >> Best regards,
>  >>    Goetz.
>  >>
>  >>
>  >>> -----Original Message-----
>  >>> From: Doerr, Martin
>  >>> Sent: Tuesday, 28 August 2018 19:35
>  >>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie <HORIE at jp.ibm.com>
>  >>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net;
>  >>> ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>  >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>  >>>
>  >>> Hi Michihiro,
>  >>>
>  >>> thank you for implementing it. I have just taken a first look at your
>  >>> webrev.01.
>  >>>
>  >>> It looks basically good. Only the Power version check seems to be incorrect.
>  >>> VM_Version::has_popcntb() checks for Power5.
>  >>> I believe most instructions are available with Power7.
>  >>> Some (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>  >>> Power8?
>  >>> We should check this carefully.
>  >>>
>  >>> Also, indentation in register_ppc.hpp could be improved.
>  >>>
>  >>> Thanks and best regards,
>  >>> Martin
>  >>>
>  >>>
>  >>> -----Original Message-----
>  >>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>  >>> Sent: Thursday, 26 July 2018 16:02
>  >>> To: Michihiro Horie <HORIE at jp.ibm.com>
>  >>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-dev at openjdk.java.net;
>  >>> Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net;
>  >>> Simonis, Volker <volker.simonis at sap.com>
>  >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >>>
>  >>> Hi Michi,
>  >>>
>  >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>  >>>> I updated webrev:
>  >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>  >>>
>  >>> Thanks for providing an updated webrev and for fixing indentation and
>  >>> function order in assembler_ppc.inline.hpp as well. I have no further
>  >>> comments :)
>  >>>
>  >>>
>  >>> Best Regards,
>  >>> Gustavo
>  >>>
>  >>>>
>  >>>> Best regards,
>  >>>> --
>  >>>> Michihiro,
>  >>>> IBM Research - Tokyo
>  >>>>
>  >>>>
>  >>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>  >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>  >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" <martin.doerr at sap.com>
>  >>>> Date: 2018/07/25 23:05
>  >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>  >>>>
>  >>>>
>  >>>>
>  >>>>
>  >>>>
>  >>>> Hi Michi,
>  >>>>
>  >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>  >>>>   > Dear all,
>  >>>>   >
>  >>>>   > Would you review the following change?
>  >>>>   > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>  >>>>   > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>  >>>>   >
>  >>>>   > This change adds support for vectorized arithmetic calculation with SLP.
>  >>>>   >
>  >>>>   > The to_vr function is added to convert a VSR to a VR. Currently, vecX is
>  >>>>   > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>  >>>>   > which exactly overlap with VRs. Instruction APIs receiving VRs use
>  >>>>   > to_vr via vecX. The other change is in sqrtF_reg, to enable matching
>  >>>>   > with SqrtVF. I think the change in sqrtF_reg is fine due to
>  >>>>   > ConvD2FNode::Value in convertnode.cpp.
>  >>>>
>  >>>> Looks good. Just a few comments:
>  >>>>
>  >>>> - In vmul4F_reg(), would it be reasonable to use xvmulsp instead of
>  >>>>     vmaddfp in order to avoid the splat?
>  >>>>
>  >>>> - All instructions added by your change were introduced in ISA 2.06,
>  >>>>     so POWER7 and above are OK. However, since I see probes for
>  >>>>     PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), I'm
>  >>>>     wondering whether there is any control point guaranteeing that these
>  >>>>     instructions won't be emitted on a CPU that does not support them.
>  >>>>
>  >>>> - I think that, in general, strings in format %{} are in upper case. For instance,
>  >>>>     this is the current OptoAssembly output for vmul4F:
>  >>>>
>  >>>> 2941835 5b4     ADDI    R24, R24, #64
>  >>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>  >>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>  >>>>
>  >>>>     I think it would be better in upper case instead. I also think that if
>  >>>>     the node match emits more than one instruction, all instructions must be
>  >>>>     listed in format %{}, since it's meant for detailed debugging. Finally, I
>  >>>>     think it would be better to replace \t! with \t// in that string (unless I'm
>  >>>>     missing some special meaning of that char). So for vmul4F it would be
>  >>>>     something like:
>  >>>>
>  >>>> 2941835 5b4     ADDI      R24, R24, #64
>  >>>>                   VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>  >>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>  >>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>  >>>>
>  >>>>
>  >>>> But feel free to change anything just after you get additional reviews :)
>  >>>>
>  >>>>
>  >>>>   > I confirmed this change with JTREG. In addition, I used the attached micro
>  >>>>   > benchmarks.
>  >>>>   > /(See attached file: slp_microbench.zip)/
>  >>>>
>  >>>> Thanks for sharing it.
>  >>>> Btw, another option would be to host it on the CR
>  >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>  >>>>
>  >>>>
>  >>>> Best regards,
>  >>>> Gustavo
>  >>>>
>  >>>>   >
>  >>>>   > Best regards,
>  >>>>   > --
>  >>>>   > Michihiro,
>  >>>>   > IBM Research - Tokyo
>  >>>>   >
>  >>>>
>  >>>>
>  >>>>
>  >>
>  >
>  >
>  >
>  >
>
>
>
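The kind of loop that the 8208171 change is meant to let SLP vectorize on PPC64 can be sketched in Java. This is a minimal illustration, not the attached slp_microbench, and the class and method names are hypothetical. Whether C2's SuperWord pass actually turns these loops into packed float multiply (MulVF) or float square root (SqrtVF) nodes depends on the JDK build, the CPU and options such as UseSuperWord.

```java
/**
 * Hypothetical SLP-friendly kernels (illustration only, not the attached
 * slp_microbench). Simple counted loops over float arrays like these are
 * candidates for C2's SuperWord vectorization.
 */
final class SlpKernels {
    private SlpKernels() {}

    /** c[i] = a[i] * b[i]: a candidate for a packed float multiply (MulVF). */
    static void mulFloats(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] * b[i];
        }
    }

    /**
     * c[i] = (float) Math.sqrt(a[i]): C2 may reduce the float->double->float
     * round trip to a float square root, a candidate for SqrtVF.
     */
    static void sqrtFloats(float[] a, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (float) Math.sqrt(a[i]);
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        float[] a = new float[n];
        float[] b = new float[n];
        float[] c = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2.0f;
        }
        // Repeat so that the JIT has a chance to compile (and maybe vectorize) the loops.
        for (int iter = 0; iter < 10_000; iter++) {
            mulFloats(a, b, c);
        }
        System.out.println(c[10]); // 10 * 2 = 20.0
        for (int iter = 0; iter < 10_000; iter++) {
            sqrtFloats(a, c);
        }
        System.out.println(c[16]); // sqrt(16) = 4.0
    }
}
```

Printing the results after many iterations is only a sanity check. To see whether vector instructions were actually emitted, one would inspect the compiled code (for example with PrintOptoAssembly on a debug build).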


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180907/c679c3c2/attachment-0001.html>

From aph at redhat.com  Fri Sep  7 15:26:36 2018
From: aph at redhat.com (Andrew Haley)
Date: Fri, 7 Sep 2018 16:26:36 +0100
Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic
 gives incorrect results
In-Reply-To: <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com>
References: <d4dc48cc-d12e-9b62-2481-1e2d9c6b5eed@bell-sw.com>
 <ea8cb840-1696-a141-eade-e0247da166a7@redhat.com>
 <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com>
Message-ID: <5ebc307e-e1ce-06e8-d42c-36206685c679@redhat.com>

On 09/07/2018 03:03 PM, Dmitrij Pochepko wrote:
> I remember debugging this branch while running JCK tests.
> 
> Haven't checked precisely, but probably fw was 0 in those cases, so
> z - two24B*fw and z + tmp24B*fw. It would explain such behavior.

I see.

I wrote some simple code to stress test argument reduction, and it
immediately failed.  The range reduction code is so horribly
complicated that the *first thing* to have done should have been to
stress test it, and evidently that was not done.

The code, as it stands, is so complicated and tangled that it is
almost impossible for anybody to debug and analyse. Its documentation
is inadequate, for the same reasons that Andrew Dinn explained with
respect to pow(). I can't have any confidence that there aren't more
lurking bugs, and this method is too important to risk breakage. It
needs some major reworking. In hindsight, I should not have accepted
it.

It's too late to get this fixed in the JDK 11 release, so it's going
to go out broken on AArch64. I'll disable the intrinsic in JDK devel
and tell the distro packagers to patch their packages. Then we can
rewrite this intrinsic with a view to fixing its maintainability and
documentation, and perhaps including it in JDK 12.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
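A stress test of argument reduction, of the kind Andrew describes, can be sketched as below. This is an illustration under stated assumptions, not his actual test, and the class and method names are hypothetical. It compares Math.cos (which HotSpot may intrinsify) with StrictMath.cos (the fdlibm reference) on random finite arguments drawn from the whole exponent range. Huge arguments exercise the most complicated reduction path. Both functions should be within about 1 ulp of the exact result, so a tolerance of a few ulps is generous, yet it still flags grossly wrong reduction.

```java
import java.util.SplittableRandom;

/**
 * Hypothetical stress test for Math.cos argument reduction (a sketch, not
 * the test used in this thread). StrictMath.cos serves as the reference.
 */
final class CosReductionStress {
    private CosReductionStress() {}

    /** Counts random finite arguments whose results differ by more than maxUlps. */
    static int countMismatches(int samples, long seed, double maxUlps) {
        SplittableRandom rnd = new SplittableRandom(seed);
        int mismatches = 0;
        for (int i = 0; i < samples; i++) {
            // Random bit patterns cover every exponent, including huge arguments.
            double x = Double.longBitsToDouble(rnd.nextLong());
            if (Double.isNaN(x) || Double.isInfinite(x)) {
                continue;
            }
            double expected = StrictMath.cos(x);
            double actual = Math.cos(x);
            double errorUlps = Math.abs(actual - expected) / Math.ulp(expected);
            // Written as !(a <= b) so that a NaN result also counts as a mismatch.
            if (!(errorUlps <= maxUlps)) {
                mismatches++;
                if (mismatches <= 10) {
                    System.out.printf("x=%s (%016x): Math.cos=%s StrictMath.cos=%s%n",
                            x, Double.doubleToRawLongBits(x), actual, expected);
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Several rounds, so that Math.cos is likely JIT-compiled (and intrinsified, if supported).
        int total = 0;
        for (int round = 0; round < 10; round++) {
            total += countMismatches(100_000, round, 4.0);
        }
        System.out.println("mismatches: " + total);
    }
}
```

Drawing raw bit patterns rather than uniform values in a small interval is a deliberate choice. Most of the exponent range lies far above 2^20, which is where a multi-word reduction such as fdlibm's __kernel_rem_pio2 is used.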

From martin.doerr at sap.com  Fri Sep  7 16:11:34 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 7 Sep 2018 16:11:34 +0000
Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint
Message-ID: <a86078397e244b24af72716b50f221f6@sap.com>

Hi,

we noticed that the RegisterSaver is missing code to save and restore the vector registers on PPC64. I'd like to fix that.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8210497

Webrev:
http://cr.openjdk.java.net/~mdoerr/8210497_PPC64_save_CR/webrev.00/

This webrev already fixes the following tests when JDK-8208171 webrev.03 is applied:
compiler/runtime/safepoints/TestRegisterRestoring.java
compiler/runtime/Test7196199.java

I'll try to test the OopMap part. This may be tricky.

Best regards,
Martin
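The failure mode can be pictured with a hypothetical Java loop. This is NOT compiler/runtime/safepoints/TestRegisterRestoring.java, and all names are assumptions. An outer loop, which may contain a safepoint poll, wraps an inner loop that C2 may vectorize. If any vector value stays in a register across that poll (for example a hoisted splat of 1.0f) and is not restored after the safepoint, the checked values come out wrong. Whether C2 compiles the loop that way depends on the JVM, the flags and the CPU.

```java
/**
 * Hypothetical reproducer in the spirit of the failing jtreg tests (not the
 * jtreg test itself). Values are checked after every call, so clobbered
 * vector registers would show up as wrong array elements.
 */
final class VectorSafepointCheck {
    private VectorSafepointCheck() {}

    /** Adds 1.0f to every element, `rounds` times. */
    static void increment(float[] array, int rounds) {
        for (int r = 0; r < rounds; r++) {            // outer loop: possible safepoint poll
            for (int i = 0; i < array.length; i++) {
                array[i] += 1.0f;                     // inner loop: SuperWord candidate
            }
        }
    }

    /** Returns the index of the first element that differs from expected, or -1. */
    static int firstMismatch(float[] array, float expected) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] != expected) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] array = new float[1000];
        int rounds = 10;
        // Many iterations so that increment() gets JIT-compiled. All values stay
        // far below 2^24, so every float addition here is exact.
        for (int iter = 1; iter <= 2_000; iter++) {
            increment(array, rounds);
            int bad = firstMismatch(array, (float) (iter * rounds));
            if (bad >= 0) {
                throw new AssertionError("iteration " + iter + ": array[" + bad + "] = " + array[bad]);
            }
        }
        System.out.println("OK");
    }
}
```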

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180907/d2424823/attachment.html>

From dmitrij.pochepko at bell-sw.com  Fri Sep  7 16:42:41 2018
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Fri, 7 Sep 2018 19:42:41 +0300
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
Message-ID: <f44efcd7-5051-5868-a093-1107cc5d0ddb@bell-sw.com>

Hi Andrew,

Thank you again for looking into it in such detail. It will take me
some time to review your draft with comments related to the original code.
I'm looking forward to working on improving the code and algorithm
description after that.

Small note though: since you're adding documentation to the original code,
it would probably make sense to update it in the original location as well,
at src/hotspot/share/runtime/sharedRuntimeTrans.cpp

Thanks,
Dmitrij


On 07/09/18 15:58, Andrew Dinn wrote:
> Hi Dmitrij
>
> On 22/08/18 11:04, Andrew Dinn wrote:
>> Thank you for the revised webrev and new test results. I am now working
>> through them. I will post comments as soon as I have given the new code
>> a full read and assessed the new results. I am afraid that may take a
>> day or two, for which delay advance apologies.
> This review has taken a great deal longer than expected. I am sorry but
> that is because the documentation for the code you have submitted is
> still seriously inadequate and I have had to put a lot of work into
> revising it before I can fully review the code.
>
> I am still finishing off that last task but I wanted to start providing
> you with some feedback and also to enlist your help in checking that my
> revisions are correct. I plan to provide feedback in 3 stages to match
> the 3 steps in the review that I am doing as follows:
>
> 1) Correct the original 'algorithm' you started from
>
> 2) Correct the 'modified algorithm' that is meant to describe the
> behaviour of your code
>
> 3) Propose any necessary corrections/improvements to the generated code
>
> So, let's start with step 1.
>
> The 'original' algorithm located in file macroAssembler_aarch64_pow.cpp
> is really just a fragment of C code with a few missing elements (e.g.
> the origin of values P1, P2, ... is not explained, hugeX, tiny are not
> defined). Although this code has the virtue that it is known to be
> correct (or at least has been verified by long use and the eyes of
> experts in numerical computation) it still fails to provide important
> information about what the 'algorithm' is supposed to do. That
> information is critical for anyone coming to it fresh to be able to
> understand what is happening.
>
> The first omission is several pieces of background mathematics that are
> neither explained in the code nor referenced. The mathematics includes
> the formulae on which the algorithm is based and the numerical
> approximation to these formulae that is employed to define the
> algorithm. This is needed to explain /how/ and /why/ a) the two
> different computations of log2(x) and b) the computation of exp(x) are
> performed as they are and to justify that the results are valid.
>
> The second omission is detailed descriptions of what most of the more
> complex individual steps in the algorithm do. Many of the logic,
> floating point and branching operations which compute intermediate
> results are extremely opaque. This is particularly so for the steps
> which manipulate bit patterns in the long representation of the fp
> values being used. However, some of the straight fp arithmetic is also
> highly problematic.
>
> The other thing I think needs to be made clearer is the relationship
> between the various special case return points in the code and the
> special case rules they relate to. This is not so critical for the
> original algorithm because the C code at least has a regular and
> standardised control flow. However, labelling the exit paths is still
> useful here and will be much more useful if used both here and in the
> modified algorithm (and we'll come to that later).
>
> I have rewritten the algorithm to achieve what I think is needed to
> patch these omissions. The redraft of this part of the code is available
> here:
>
>    http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt
>
> I assume you are familiar with the relevant mathematics and how it has
> been used to derive the algorithm. If so then I would like you to review
> this rewrite and ensure that there are no mathematical errors in it. I
> would also like you to check that the explanatory comments for the
> individual steps in the algorithm do not contain any errors.
>
> If you are not familiar with the mathematics then please let me know. I
> need to know whether this has been reviewed by someone competent to do so.
>
> n.b. one little detail you might easily miss. I removed lg2, lg2_h and
> lg2_l from the first table of constants as neither log(x) algorithm
> needs them (it relies on ivln2). I renamed the entries in the second
> table from LG2, etc. to /ln2/, etc. and changed the names accordingly at
> point of use. The computation of exp(x) actually does need ln2. One of
> the code changes is to remove these redundant entries from your table
> pow_coeff1.
>
> OK, as for the next 2 steps, I will post a follow-up to deal with them once
> I have completed my review. That will include a heavily revised version
> of your 'modified algorithm' (which is still in progress) plus
> suggestions for changes to the code that I have found along the way.
> Just as a preliminary I'll summarize what is wrong below.
>
> Note that I have not yet found any errors in how the generated code
> implements the mathematics but I am still not happy with it because it
> is extremely unclear. Correcting the 'modified algorithm' is a
> necessary, critical step to improving the clarity of the code.
>
> So, in overview, what is wrong with your 'modified algorithm'. Well, the
> thing that is immediately obvious is that it is /not/ actually the
> algorithm you have employed. It is simply a mangled version of the C
> code you started from that bears only a tenuous relation to the code
> structure it is supposed to summarize. Now, I'm happy for you to use C
> to model the generated code if possible and, in fact, am in the process
> of writing a proper algorithm that looks as much like C code as possible
> /but/ also actually describes what your generated code does. The problem
> is that what you have written is not only /not/ C it is also i)
> incoherent, ii) retains elements from the original code that don't exist
> at all in the generated code and iii) omits important elements of the
> generated code.
>
> So, firstly, let's deal with the problem as it relates to control flow.
> Your 'modified algorithm' includes various tags mentioning the word
> 'label' suggesting that some transfer of control is to be effected.
> However, these are tacked onto statement blocks connected via 'if
> (cond)' tests or 'else' keywords that are meant to imply some
> alternative control flow. Essentially, your generated code relies on
> gotos which do not fit a standard if/else flow model and you have tried
> to bodge some sort of goto model on top of the original valid, gotoless
> C control flow with no clear definition of how that is meant to work.
> Honestly, if your generated code uses a goto control flow then your C
> algorithm is going to have to do the same in order to clearly summarize
> what the code actually does.
>
> The second major problem is one I pointed out in my earlier note, i.e.
> that the data values described in the 'modified algorithm' do not
> correctly match the ones operated on in your generated code. Your
> algorithm lists many redundant values used in the original algorithm
> (e.g. ix, iy, ax, yisint) even though your code doesn't ever explicitly
> construct most of those values (n.b. this includes, but is not limited to, the 32
> bit half-word quantities). Instead the code frequently pulls the
> relevant value, as needed, out of other data that it does construct and
> holds in registers -- sometimes across control branches. At other times
> it performs an equivalent operation on a different, related data value.
> Your response to my request was to add comments which labelled some of
> these on-the-fly created values or alternative values with the original
> names but that ignores the fact that the names and the values referenced
> in the comments do not actually match.
>
> Contrariwise, a lot of the values the code does actually operate on are
> not mentioned in the algorithm. Indeed, it is worse than that because
> they are not coherently identified even in the generated code. Data
> items stored in registers are referred to using the utterly redundant
> symbolic aliases tmp0, tmp2, etc for registers r0, r1 etc. What is worse
> the same meaningless symbolic names get reused for completely different
> data items.
>
> For example, at one point tmp2 identifies the exponent of y stored in r2
> and later it identifies the absolute value of y also stored in r2,
> overwriting the exponent. Your algorithm really ought to mention values
> like exp_y or ay (or even |y|) for these cases and the code should
> correspondingly define exp_y and ay as aliases for register r2. These
> meaningful names should then be used when loading the constructed value
> into a register and at every subsequent point of use where that
> constructed value is valid.
>
> This is not all that is wrong with the 'modified algorithm' but it is
> enough to make it not just useless but worse than useless. What you have
> written provides a hand-wave towards what the code does that fails to
> summarize it with any accuracy or clarity and equally fails to clarify
> the difference between it and the C code you started from. That only
> makes the whole picture less clear not more so.
>
> As I said, I will provide a better version of the 'modified algorithm'
> in a follow-up and then discuss possible code changes. Please review the
> linked file above while I prepare that.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
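For readers following the "background mathematics" point in the review above, the outline below is the method described in the comments of fdlibm's e_pow.c. The shared pow code in sharedRuntimeTrans.cpp derives from that file. This is a summary of that standard outline, not Andrew Dinn's rewritten algorithm. It covers the case x > 0. Zeros, infinities, NaNs and negative x (meaningful only for integral y) are handled as special cases.

```latex
\begin{aligned}
x^{y} &= 2^{\,y \log_2 x}, \qquad x > 0,\\
\log_2 x &= w_1 + w_2, \qquad \text{$w_1$ has $53 - 24 = 29$ trailing zero bits},\\
y \log_2 x &= n + y', \qquad n \in \mathbb{Z},\ |y'| \le \tfrac{1}{2},\\
x^{y} &= 2^{n} \cdot \exp\!\left(y' \ln 2\right).
\end{aligned}
```

Splitting log2(x) into w1 + w2, together with a matching split of y, is what lets the product y log2(x) be formed with extra precision. This extra precision matters because any absolute error in y log2(x) becomes a relative error in the final result. The last step is also why ln2 is still needed for the exp part, while the log part relies on ivln2.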


From dmitrij.pochepko at bell-sw.com  Fri Sep  7 16:45:36 2018
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Fri, 7 Sep 2018 19:45:36 +0300
Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic
 gives incorrect results
In-Reply-To: <5ebc307e-e1ce-06e8-d42c-36206685c679@redhat.com>
References: <d4dc48cc-d12e-9b62-2481-1e2d9c6b5eed@bell-sw.com>
 <ea8cb840-1696-a141-eade-e0247da166a7@redhat.com>
 <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com>
 <5ebc307e-e1ce-06e8-d42c-36206685c679@redhat.com>
Message-ID: <e4ec413f-3ace-e8df-44e3-9f993dc5ea6a@bell-sw.com>

Hi Andrew,

Ok. I'm really sorry to have introduced such a bug and I agree that the 
best strategy is to disable the intrinsic temporarily for sin and cos.

I aim to work with Andrew Dinn on pow first, to calibrate and enhance
the documentation and algorithm there. Then I'll get back to sin/cos
and revise them in the same manner.

Meanwhile, do we have to abandon this particular patch? It still resolves
this particular problem, and it would be a waste to re-debug and fix this
problem later.

Thanks,
Dmitrij


On 07/09/18 18:26, Andrew Haley wrote:
> On 09/07/2018 03:03 PM, Dmitrij Pochepko wrote:
>> I remember debugging this branch while running JCK tests.
>>
>> Haven't checked precisely, but probably fw was 0 in those cases, so
>> z - two24B*fw and z + tmp24B*fw. It would explain such behavior.
> I see.
>
> I wrote some simple code to stress test argument reduction, and it
> immediately failed.  The range reduction code is so horribly
> complicated that the *first thing* to have done should have been to
> stress test it, and evidently that was not done.
>
> The code, as it stands, is so complicated and tangled that it is
> almost impossible for anybody to debug and analyse. Its documentation
> is inadequate, for the same reasons that Andrew Dinn explained with
> respect to pow(). I can't have any confidence that there aren't more
> lurking bugs, and this method is too important to risk breakage. It
> needs some major reworking. In hindsight, I should not have accepted
> it.
>
> It's too late to get this fixed in the JDK 11 release, so it's going
> to go out broken on AArch64. I'll disable the intrinsic in JDK devel
> and tell the distro packagers to patch their packages. Then we can
> rewrite this intrinsic with a view to fixing its maintainability and
> documentation, and perhaps including it in JDK 12.
>


From aph at redhat.com  Sat Sep  8 09:14:07 2018
From: aph at redhat.com (Andrew Haley)
Date: Sat, 8 Sep 2018 10:14:07 +0100
Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic
 gives incorrect results
In-Reply-To: <e4ec413f-3ace-e8df-44e3-9f993dc5ea6a@bell-sw.com>
References: <d4dc48cc-d12e-9b62-2481-1e2d9c6b5eed@bell-sw.com>
 <ea8cb840-1696-a141-eade-e0247da166a7@redhat.com>
 <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com>
 <5ebc307e-e1ce-06e8-d42c-36206685c679@redhat.com>
 <e4ec413f-3ace-e8df-44e3-9f993dc5ea6a@bell-sw.com>
Message-ID: <7e4549e8-232e-c329-b92e-a368f3b7b7ba@redhat.com>

On 09/07/2018 05:45 PM, Dmitrij Pochepko wrote:
> Meanwhile, do we have to abandon this particular patch? It still resolves
> this particular problem, and it would be a waste to re-debug and fix this
> problem later.

No, we don't have to abandon this patch. Please push it to JDK head,
thanks.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From adinn at redhat.com  Sun Sep  9 08:08:12 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Sun, 9 Sep 2018 09:08:12 +0100
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <f44efcd7-5051-5868-a093-1107cc5d0ddb@bell-sw.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
 <f44efcd7-5051-5868-a093-1107cc5d0ddb@bell-sw.com>
Message-ID: <caf4778c-3ce8-a319-9312-6bceea250457@redhat.com>

Hi Dmitrij

On 07/09/18 17:42, Dmitrij Pochepko wrote:
> Thank you again for looking into it in such detail. It will take me
> some time to review your draft with comments related to the original code.
> I'm looking forward to working on improving the code and algorithm
> description after that.

You are welcome. However, thanks are not needed. This is simply what I
am required to do as a reviewer.

> Small note though: since you're adding documentation to the original code,
> it would probably make sense to update it in the original location as well,
> at src/hotspot/share/runtime/sharedRuntimeTrans.cpp

I agree that it would be better if comments in that shared code were
also updated. However, I recommend we pursue that task as a follow-up
once we have fixed the intrinsic.

Also, it's important to note that the omissions in the above file are, to
a degree, mitigated by the /slightly/ more complete documentation in
file src/java.base/share/classes/java/lang/FdLibm.java. Comments in the
methods for computing log(x) and exp(x) in the latter file include some
of the same details of the maths/algorithms that I described (I only found
these comments after deriving the relevant maths myself :-).

So, we might consider upgrading the comments in the Java source and
adding a cross-reference to that file from the C source. The code itself
is almost identical so one commented version should work for both.

I'd still like my comments to remain in your generator code. This is the
most complex implementation version and has the greatest divergence from
the original. So, it will be the focus of any nasty bugs that arise.
Having an explanation of the maths and algorithm right there in the
generator source is going to ensure whoever has to fix any such bug is
best prepared to do so. Also, it will pin down the version of the shared
code from which the generator was derived. The shared code ought not to
be updated without changing the generator code but keeping the C
template in with the generator is safer.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
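The review's request that bit-derived quantities get meaningful names (the fdlibm half-word values such as ix and iy, the exponent of y, |y|) can be made concrete with a small Java sketch. The helper names below are hypothetical. They are neither FdLibm.java's nor HotSpot's API. They only show how each named value is extracted from the IEEE 754 bit pattern, which is the work the generated code does in registers.

```java
/**
 * Hypothetical helpers (not FdLibm or HotSpot code) that name the
 * bit-derived quantities a pow() implementation works with.
 */
final class PowBitFields {
    private PowBitFields() {}

    /** High 32 bits of the IEEE 754 representation (fdlibm's __HI(d)). */
    static int hi(double d) {
        return (int) (Double.doubleToRawLongBits(d) >>> 32);
    }

    /** fdlibm-style ix/iy: the high word with the sign bit cleared. */
    static int ix(double d) {
        return hi(d) & 0x7fffffff;
    }

    /** Biased 11-bit exponent field (0..2047), i.e. an "exp_y" for d = y. */
    static int biasedExponent(double d) {
        return (int) ((Double.doubleToRawLongBits(d) >>> 52) & 0x7ff);
    }

    /** |d| obtained by clearing the sign bit, i.e. an "ay" for d = y. */
    static double abs(double d) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(d) & 0x7fffffffffffffffL);
    }

    public static void main(String[] args) {
        double y = -3.0; // bits 0xC008000000000000
        System.out.printf("hi=%08x ix=%08x exp=%d |y|=%s%n",
                hi(y), ix(y), biasedExponent(y), abs(y));
    }
}
```

Giving each quantity its own named function mirrors the suggestion that register aliases such as exp_y and ay should describe the value they currently hold, rather than reusing tmp names.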

From jayaprabhakar at gmail.com  Mon Sep 10 03:58:02 2018
From: jayaprabhakar at gmail.com (jayaprabhakar k)
Date: Sun, 9 Sep 2018 20:58:02 -0700
Subject: Any way to avoid JIT overhead for small programs when using AOT?
Message-ID: <CA+t=Si+mtt7YVS2tAtMfD9MWhxOJkHky9T_i68EMHWvgHgEioQ@mail.gmail.com>

Hi,
I understand that at present AOT and -Xint are not compatible. I see that the
code explicitly disables AOT when -Xint is set:
<http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>

For extremely short programs, typically used by beginners learning Java, I
see that CDS, AOT and -Xint all help reduce the startup time. While CDS
works with both AOT and -Xint, multiplying the benefits, AOT and -Xint do
not work together.

Is there a way to use both AOT and -Xint, so that precompiled
classes/methods run AOT code and everything else is just interpreted?
If not now, would it be possible in the future?

Thanks,
JP
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180909/86bb6624/attachment.html>

From Pengfei.Li at arm.com  Mon Sep 10 04:24:16 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 10 Sep 2018 04:24:16 +0000
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2
 check
Message-ID: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Dean / Vladimir / JDK experts,

Do you have any further questions or comments on this patch? Or should I make some modifications to it, such as adding some limitations to the matching condition?
I appreciate your help.

--
Thanks,
Pengfei
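The arithmetic behind this thread can be checked with a small Java sketch. The class and helper names are hypothetical, and this is not the patch's own test. Java's % takes the sign of the dividend, so -5 % 2 is -1. Even so, whether x % 2^k is zero depends only on the low k bits of x. A conditional negation such as `if (cond) x = -x;` also never changes whether x is zero, even for Integer.MIN_VALUE, whose negation is itself.

```java
/**
 * Sketch checking the facts behind the 8210152 discussion: the zero test
 * on x % 2^k equals a mask test, and negation preserves zero-ness.
 */
final class DivisibleCheck {
    private DivisibleCheck() {}

    /** Assumes divisor == 2^k for some k in [0, 30]. */
    static boolean divisibleByMask(int x, int divisor) {
        return (x & (divisor - 1)) == 0;
    }

    public static void main(String[] args) {
        int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000, -5 };
        for (int x : dividends) {
            // The remainder keeps the dividend's sign, e.g. -5 % 2 == -1 ...
            int r = x % 2;
            // ... but "remainder == 0" matches the mask test for every power of two.
            for (int k = 0; k <= 30; k++) {
                int d = 1 << k;
                if ((x % d == 0) != divisibleByMask(x, d)) {
                    throw new AssertionError("mismatch for x=" + x + ", d=" + d);
                }
            }
            // Negation preserves zero-ness, even for MIN_VALUE (-MIN_VALUE == MIN_VALUE).
            if ((-r == 0) != (r == 0) || (-x == 0) != (x == 0)) {
                throw new AssertionError("negation changed zero-ness for x=" + x);
            }
            System.out.println(x + " % 2 = " + r);
        }
    }
}
```

This is why `x % 2 == 0` can in principle compile down to `(x & 1) == 0`. The negation that fixes up the sign of a negative remainder cannot affect a comparison against zero.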


> -----Original Message-----
> From: Pengfei Li (Arm Technology China)
> Sent: Monday, September 3, 2018 13:50
> To: 'dean.long at oracle.com' <dean.long at oracle.com>; 'Vladimir Kozlov'
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
> hotspot-dev at openjdk.java.net
> Cc: nd <nd at arm.com>
> Subject: RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check
> 
> Hi Vladimir, Dean,
> 
> Thanks for your review.
> 
> > I don't see where negation is coming from for 'X % 2 == 0' expression.
> > It should be only 2 instructions: 'cmp (X and 1), 0'
> The 'cmp (X and 1), 0' is just what we expected. But there's a redundant
> conditional negation coming from the handling of a possibly negative X in "X % 2".
> For instance, for X = -5, "X % 2" should be -1, so the "(X and 1)" operation alone
> is not enough: we have to negate the result.
> 
> > I will look on it next week. But it would be nice if you can provide small test
> to show this issue.
> I've already provided a case of "if (a%2 == 0) { ... }" in the JBS description. The
> generated code and what can be optimized are listed there.
> You could see https://bugs.openjdk.java.net/browse/JDK-8210152 for details.
> You could also see the test case for this optimization I attached below.
> 
> > It looks like your matching may allow more patterns than expected. I was
> expecting it to look for < 0 or >= 0 for the conditional negation, but I don't see
> it.
> Yes. I didn't limit the if condition to < 0 or >= 0, so it will match more patterns.
> But nothing goes wrong if this ideal transformation applies to more cases.
> In pseudo code, if someone writes:
> if ( some_condition ) { x = -x; }
> if ( x == 0 ) { do_something(); }
> The negation in 1st if-clause could always be eliminated whatever the
> condition is.
> 
> --
> Thanks,
> Pengfei
> 
> 
> -- my test case attached below --
> public class Foo {
> 
>     public static void main(String[] args) {
>         int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
>         for (int i = 0; i < dividends.length; i++) {
>             int x = dividends[i];
>             System.out.println(testDivisible(x));
>             System.out.println(testModulo(x));
>             testCondNeg(x);
>         }
>         return;
>     }
> 
>     public static int testDivisible(int x) {
>         // Modulo result is only for zero check
>         if (x % 4 == 0) {
>             return 444;
>         }
>         return 555;
>     }
> 
>     public static int testModulo(int x) {
>         int y = x % 4;
>         if (y == 0) {
>             return 222;
>         }
>         // Modulo result is used elsewhere
>         System.out.println(y);
>         return 333;
>     }
> 
>     public static void testCondNeg(int x) {
>         // Pure conditional negation
>         if (printAndIfNeg(x)) {
>             x = -x;
>         }
>         if (x == 0) {
>             System.out.println("zero!");
>         }
>     }
> 
>     static boolean printAndIfNeg(int x) {
>         System.out.println(x);
>         return x <= 0;
>     }
> }
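A minimal Java sketch of the arithmetic discussed above, assuming a divisor of 1 << k with 0 <= k <= 30. It is not the code C2 generates; remPow2 and divisiblePow2 are hypothetical helpers. Because Java's remainder takes the sign of the dividend, a negative dividend needs a negation around the mask, but that negation can never change whether the result is zero, which is why a pure zero test can drop it.

```java
// Hypothetical helpers, not HotSpot code: a power-of-2 remainder written as a
// mask plus a conditional negation, and the cheaper zero-only test.
class PowerOfTwoRemainder {

    // Same result as x % (1 << k) for 0 <= k <= 30.
    static int remPow2(int x, int k) {
        int mask = (1 << k) - 1;
        if (x < 0) {
            // Negate, mask, negate back. For Integer.MIN_VALUE, -x overflows
            // to MIN_VALUE, whose low k bits are zero, so the result is 0.
            return -((-x) & mask);
        }
        return x & mask;
    }

    // Zero test only: -r == 0 exactly when r == 0, so the negation is dropped.
    static boolean divisiblePow2(int x, int k) {
        return (x & ((1 << k) - 1)) == 0;
    }

    public static void main(String[] args) {
        int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000, -5 };
        for (int x : dividends) {
            System.out.println(x + " % 4 = " + remPow2(x, 2)
                    + " (Java: " + (x % 4) + "), divisible: " + divisiblePow2(x, 2));
        }
    }
}
```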

From dean.long at oracle.com  Mon Sep 10 08:00:29 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 10 Sep 2018 01:00:29 -0700
Subject: Any way to avoid JIT overhead for small programs when using AOT?
In-Reply-To: <CA+t=Si+mtt7YVS2tAtMfD9MWhxOJkHky9T_i68EMHWvgHgEioQ@mail.gmail.com>
References: <CA+t=Si+mtt7YVS2tAtMfD9MWhxOJkHky9T_i68EMHWvgHgEioQ@mail.gmail.com>
Message-ID: <fa2711c0-e73e-b7b0-9ca6-5d0fb52cb330@oracle.com>

On 9/9/18 8:58 PM, jayaprabhakar k wrote:
> Hi,
> I understand that at present AOT and -Xint are not compatible. I see 
> the code explicitly disables AOT when -Xint is set 
> <http://cr.openjdk.java.net/%7Ekvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>.
>
> For extremely short programs, typically used by beginners learning 
> Java, I see that CDS, AOT and Xint all help reduce the startup time. 
> While CDS works with both AOT and Xint, multiplying the benefits, AOT 
> and Xint do not.
>
> Is there a way to keep both AOT + Xint, For classes/methods that are 
> precompiled, use AOT code, and for others just interpret? If not now, 
> would it be possible in the future?
>
> Thanks,
> JP

Hi JP.  Yes, it could be possible in the future.  One problem is 
MethodHandle intrinsics.  With -Xint, there's no code heap, so no place 
to generate native adapters for those intrinsics.

dl

From aph at redhat.com  Mon Sep 10 08:17:59 2018
From: aph at redhat.com (Andrew Haley)
Date: Mon, 10 Sep 2018 09:17:59 +0100
Subject: Any way to avoid JIT overhead for small programs when using AOT?
In-Reply-To: <CA+t=Si+mtt7YVS2tAtMfD9MWhxOJkHky9T_i68EMHWvgHgEioQ@mail.gmail.com>
References: <CA+t=Si+mtt7YVS2tAtMfD9MWhxOJkHky9T_i68EMHWvgHgEioQ@mail.gmail.com>
Message-ID: <2753b70f-67c7-ef7a-ca40-49266f502401@redhat.com>

On 09/10/2018 04:58 AM, jayaprabhakar k wrote:

> I understand that at present AOT and -Xint are not compatible. I see the
> code explicitly disables AOT when -Xint is set
> <http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>
> .
> 
> For extremely short programs, typically used by beginners learning Java, I
> see that CDS, AOT and Xint all help reduce the startup time. While CDS
> works with both AOT and Xint, multiplying the benefits, AOT and Xint do
> not.
> 
> Is there a way to keep both AOT + Xint, For classes/methods that are
> precompiled, use AOT code, and for others just interpret? If not now, would
> it be possible in the future?

Does it significantly help? If you precompile the Java library and your programs
are extremely short, you'll see very little compilation activity.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
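One way to check how much compilation activity a short program actually sees is to ask the JVM through the standard java.lang.management API. The sketch below is illustrative: the class name is made up, and the reported number is only the VM's approximate accumulated compilation time, which some VMs do not support. Under -Xint there is no compilation system, and getCompilationMXBean() is expected to return null.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Sketch: report how much time the JIT spent compiling during a run.
class JitTimeReport {

    static String report() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // No compilation system, e.g. when running with -Xint.
            return "no JIT compiler";
        }
        if (!jit.isCompilationTimeMonitoringSupported()) {
            return jit.getName() + ": compilation time not available";
        }
        // Approximate accumulated elapsed time spent compiling, in ms.
        return jit.getName() + ": " + jit.getTotalCompilationTime() + " ms";
    }

    public static void main(String[] args) {
        System.out.println("Hello, world!"); // stands in for an "extremely short" program
        System.out.println(report());
    }
}
```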

From kuaiwei.kw at alibaba-inc.com  Mon Sep 10 08:39:42 2018
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Mon, 10 Sep 2018 16:39:42 +0800
Subject: JIT: C2 doesn't skip post barrier for new allocated objects
Message-ID: <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw@alibaba-inc.com>


Hi,

  Recently I checked the optimization that reduces G1 post barriers for newly allocated objects, but I found it doesn't work as expected.
I wrote a simple test case that stores an oop in the constructor, or just after the constructor returns:
public class StoreTest {
    static String val = "x";

    public static Foo testMethod() {
        Foo newfoo = new Foo(val);
        newfoo.b = val; // the store barrier could be reduced
        return newfoo;
    }

    public static void main(String[] args) {
        Foo obj = new Foo(val);  // init Foo class
        testMethod();
    }

    static class Foo {
        Object a;
        Object b;
        public Foo(Object val) {
            this.a = val; // the store barrier could be reduced
        }
    }
}
I inline Foo::<init> and Object::<init> when compiling testMethod with C2, so I think the two commented stores don't need a post barrier. But I still found post barriers in the generated assembly code.
The test command: java -Xcomp -Xbatch -XX:+UseG1GC -XX:CompileCommandFile=compile_command -Xbatch -XX:+PrintCompilation -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining StoreTest
compile_command:
compileonly, StoreTest::testMethod
compileonly, StoreTest$Foo::<init>
inline, StoreTest$Foo::<init>
compileonly, java.lang.Object::<init>
inline, java.lang.Object::<init>
print, StoreTest::testMethod

I checked the node graph in the parsing phase. The optimization depends on GraphKit::just_allocated_object to detect a newly allocated object; the idea is to check whether the control of the store is the control projection of the allocation. But in the parse phase, there's a Region node between the control projection and the control of the store. The Region has just one input edge, so it could be optimized away later. The Region node is generated when C2 inlines the init method of the super class; I think it's used in the exit map to merge all exit paths.

The change is simple: in just_allocated_object, I check whether the control is a Region node with only one input and, if so, skip over it. With this change, we see a good performance improvement in a stress test.

Could you check the change and give comments about it?

graphKit.cpp
 // We use this to determine if an object is so "fresh" that
 // it does not require card marks.
 Node* GraphKit::just_allocated_object(Node* current_control) {
-  if (C->recent_alloc_ctl() == current_control)
+  Node * ctrl = current_control;
+  if (CheckJustAllocatedAggressive) {
+    // Object::<init> is invoked after allocation, most of invoke nodes
+    // will be reduced, but a region node is kept in parse time, we check
+    // the pattern and skip the region node
+    if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
+      ctrl = ctrl->in(1);
+    }
+  }
+  if (C->recent_alloc_ctl() == ctrl)
     return C->recent_alloc_obj();
   return NULL;
 }
Thanks,
Kevin
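The check described above can be modeled in a few lines. This is a toy Java sketch with a hypothetical ToyNode class, not HotSpot's C++ Node. It keeps only the real control inputs, so a single-input Region here corresponds to req() == 2 in C2, where a Region's in(0) points to the Region itself. The aggressive flag stands in for the proposed CheckJustAllocatedAggressive option.

```java
import java.util.Collections;
import java.util.List;

// Toy model of the parse-time control pattern; ToyNode is hypothetical.
class JustAllocatedModel {

    static final class ToyNode {
        final boolean isRegion;
        final List<ToyNode> inputs; // real control inputs only

        ToyNode(boolean isRegion, List<ToyNode> inputs) {
            this.isRegion = isRegion;
            this.inputs = inputs;
        }
    }

    static ToyNode controlProj() {
        return new ToyNode(false, Collections.<ToyNode>emptyList());
    }

    static ToyNode region(ToyNode in) {
        return new ToyNode(true, Collections.singletonList(in));
    }

    // Optionally step over one single-input Region, then compare against the
    // allocation's control projection.
    static boolean isJustAllocated(ToyNode storeCtl, ToyNode allocCtl, boolean aggressive) {
        ToyNode ctl = storeCtl;
        if (aggressive && ctl != null && ctl.isRegion && ctl.inputs.size() == 1) {
            ctl = ctl.inputs.get(0);
        }
        return ctl == allocCtl;
    }

    public static void main(String[] args) {
        ToyNode allocCtl = controlProj();
        ToyNode exitRegion = region(allocCtl); // left behind by inlining <init>
        System.out.println(isJustAllocated(exitRegion, allocCtl, false));
        System.out.println(isJustAllocated(exitRegion, allocCtl, true));
    }
}
```

With the flag off, the store behind the leftover Region is not recognized as a store into a fresh object; with it on, it is.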

From 944797358 at qq.com  Mon Sep 10 11:10:51 2018
From: 944797358 at qq.com (Andy Law)
Date: Mon, 10 Sep 2018 19:10:51 +0800
Subject: [PATCH] 8202414: Unsafe crash in C2
Message-ID: <F17BF47D-F247-45DB-BAB6-AB81ACEC51F2@qq.com>

This change is only about disabling the C2 `clear_memory()` optimization when Unsafe is used to write to an unaligned address.

```
java -version
openjdk version "1.8.0-internal-debug"
OpenJDK Runtime Environment (build 1.8.0-internal-debug-***_2018_09_03_19_31-b00)
OpenJDK 64-Bit Server VM (build 25.71-b00-debug, mixed mode)
```

This issue (8202414) is about the following:
With -XX:+UseCompressedOops on 64-bit, array objects have a 12-bit header and a 4-bit length, so the length occupies the 12th to 16th bytes.
If we use Unsafe.putInt() to write at the 17th bit, C2's `clear_memory()` mistakenly does `done_offset -= BytesPerInt;`, so `done_offset` becomes 13. It then clears memory starting from the 13th bit, which changes the array length to a different value. When a GC happens, the VM crashes.

I didn't find any support for unaligned memory in `clear_memory()`, so I made only a small fix to keep the impact minimal:
when Unsafe.put*() writes to unaligned memory as above, the assert fails. So in that case the patch skips the optimization instead, and the problem is solved.

I don't know whether it is a good solution. It is only 3 lines of code, so please have a look :) Thank you!


Andy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: openjdk-patch-8202414.diff
Type: application/octet-stream
Size: 463 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180910/4f45e72e/openjdk-patch-8202414.diff>
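The offset arithmetic in the report can be checked with a small Java model that uses byte offsets throughout. It assumes the 64-bit compressed-oops array layout, with the length field in bytes 12..15, and takes the `done_offset -= BytesPerInt` step from the report. It further assumes that a trailing 4-byte zero store is then emitted at done_offset. This is a hypothetical model with made-up names, not the HotSpot source.

```java
// Hypothetical model of the clear_memory() offset arithmetic (byte offsets).
class ClearMemoryOffsets {
    static final int BYTES_PER_INT = 4;
    static final int BYTES_PER_LONG = 8;
    static final int LENGTH_OFFSET = 12; // assumed: array length in bytes 12..15
    static final int LENGTH_SIZE = 4;

    // Offset of the trailing 4-byte zero store when the end offset is not
    // 8-byte aligned (the step quoted in the report).
    static int trailingIntStoreOffset(int endOffset) {
        int doneOffset = endOffset;
        if (doneOffset % BYTES_PER_LONG != 0) {
            doneOffset -= BYTES_PER_INT;
        }
        return doneOffset;
    }

    // True if a store of 'size' bytes at 'offset' overlaps the length field.
    static boolean overlapsLength(int offset, int size) {
        return offset < LENGTH_OFFSET + LENGTH_SIZE && LENGTH_OFFSET < offset + size;
    }

    public static void main(String[] args) {
        for (int end : new int[] { 20, 17 }) {
            int off = trailingIntStoreOffset(end);
            System.out.println("end=" + end + " -> int zero store at " + off
                    + ", clobbers length: " + overlapsLength(off, BYTES_PER_INT));
        }
    }
}
```

With an int-aligned end offset of 20, the zero store lands at 16, just past the length field. With the odd end offset 17, it lands at 13 and overwrites bytes 13..15 of the length.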

From goetz.lindenmaier at sap.com  Mon Sep 10 14:15:34 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 10 Sep 2018 14:15:34 +0000
Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by
 cpp standard
In-Reply-To: <B42550A0-F7E7-42B6-9FB7-88A73EB775FB@sap.com>
References: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com>
 <bd3b71bbce404b7eadcf808c4ec68949@sap.com>
 <B42550A0-F7E7-42B6-9FB7-88A73EB775FB@sap.com>
Message-ID: <0cc673af4a354ddd81cd9cf639c281a6@sap.com>

Hi Lutz,

looks good to me, too.

Thanks,
  Goetz.

> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz
> Sent: Tuesday, 4 September 2018 14:59
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-
> dev at openjdk.java.net
> Subject: [CAUTION] Re: RFR(S): 8210319: [s390]: Use of shift operators not
> covered by cpp standard
> 
> Hi Martin,
> 
> thanks for the review!
> 
> Regards,
> 
> Lutz
> 
> From: "Doerr, Martin (martin.doerr at sap.com)" <martin.doerr at sap.com>
> Date: Tuesday, 4 September 2018 at 11:28
> To: Lutz Schmidt <lutz.schmidt at sap.com>, "hotspot-compiler-
> dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
> Subject: RE: RFR(S): 8210319: [s390]: Use of shift operators not covered by
> cpp standard
> 
> Hi Lutz,
> 
> looks good. Thanks for improving.
> 
> Best regards,
> 
> Martin
> 
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz
> Sent: Tuesday, 4 September 2018 10:29
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp
> standard
> 
> Dear All,
> 
> may I please request reviews for this small, s390-only patch. It fixes some
> shift operations which relied on behavior not covered by the language
> standard.
> 
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8210319
> 
> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/
> 
> Thank you!
> 
> Lutz


From lutz.schmidt at sap.com  Mon Sep 10 14:20:01 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 10 Sep 2018 14:20:01 +0000
Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by
 cpp standard
In-Reply-To: <0cc673af4a354ddd81cd9cf639c281a6@sap.com>
References: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com>
 <bd3b71bbce404b7eadcf808c4ec68949@sap.com>
 <B42550A0-F7E7-42B6-9FB7-88A73EB775FB@sap.com>
 <0cc673af4a354ddd81cd9cf639c281a6@sap.com>
Message-ID: <1E85BE87-07BC-4A86-A72C-512F4F297F01@sap.com>

Thank you, Goetz!
With two positive reviews, and with the patch having been active in our nightly tests for several days, I'll go ahead and push.
Regards, Lutz

From lutz.schmidt at sap.com  Mon Sep 10 15:59:51 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 10 Sep 2018 15:59:51 +0000
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <F17BF47D-F247-45DB-BAB6-AB81ACEC51F2@qq.com>
References: <F17BF47D-F247-45DB-BAB6-AB81ACEC51F2@qq.com>
Message-ID: <BE14B804-FABC-492E-A6AD-F0BE942D760F@sap.com>

Hi Andy,

to avoid misunderstandings, please be precise when talking about bits and bytes. ArrayObjects (with -XX:+UseCompressedOops) have a 16-byte header, of which the last four bytes (bytes 12..15) hold the array length (number of elements). 

And now, just checking if I got your intention right:
I read your text below as well as the description and comments in https://bugs.openjdk.java.net/browse/JDK-8202414. In essence, you are trying to perform a 4-byte store into a byte array by means of an unaligned putInt() call.

To my understanding, putInt() is not designed for unaligned accesses. Even "worse", it relies on the store address being at least 4-byte aligned. That's what I learn e.g. from http://www.docjar.com/docs/api/sun/misc/Unsafe.html. And that's the reason why your code (sometimes) destroys the length field of the ArrayObject header.

Your fix would just ignore (copy nothing) calls with unaligned end_offset. Why would you then call the unsafe function at all?

Yes, your patch would probably help in your situation. It just puts a blanket of silence over a call with unsupported parameters. 

That's as far as I can comment. I am neither a reviewer nor in a position to decide whether such an interface violation should be handled gracefully (e.g. by throwing an exception) or whether the status quo is OK. 

Thank you
Lutz
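If the goal is only to write an int at an arbitrary, possibly unaligned, index of a byte[], a supported alternative to Unsafe.putInt() is a heap ByteBuffer view, whose absolute putInt(index, value) accepts any in-bounds index. The sketch below uses only Java 8 APIs; the helper name and the little-endian byte order are illustrative choices.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Sketch: write an int into a byte[] at any index, aligned or not.
class UnalignedIntWrite {

    static void putIntAt(byte[] bytes, int index, int value, ByteOrder order) {
        // Bounds-checked: throws IndexOutOfBoundsException if index + 4 > length.
        ByteBuffer.wrap(bytes).order(order).putInt(index, value);
    }

    public static void main(String[] args) {
        byte[] data = new byte[8];
        putIntAt(data, 1, 0x01020304, ByteOrder.LITTLE_ENDIAN); // odd, unaligned index
        System.out.println(Arrays.toString(data)); // [0, 4, 3, 2, 1, 0, 0, 0]
    }
}
```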


From vladimir.kozlov at oracle.com  Mon Sep 10 17:57:17 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 10:57:17 -0700
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>

I finally had time to look at it, and I agree with your changes.

My only comment is to add a check for SubI on the other branch too (not only on the True branch). The negation 
may occur on either branch, since you accept any condition for the negation.

Thanks,
Vladimir
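The reason a negation on either branch is harmless can be checked with a short Java sketch over sample values. The helper names are hypothetical and the sample list is illustrative, not exhaustive. The underlying fact is that -x == 0 exactly when x == 0, even for Integer.MIN_VALUE, which negates to itself and stays nonzero.

```java
// Sketch: a conditional negation on either branch cannot change a later
// "== 0" test.
class EitherBranchNegation {

    // Negation on the true branch: if (c) x = -x;
    static boolean zeroAfterNegOnTrue(int x, boolean c) {
        return (c ? -x : x) == 0;
    }

    // Negation on the false branch: if (!c) x = -x;
    static boolean zeroAfterNegOnFalse(int x, boolean c) {
        return (c ? x : -x) == 0;
    }

    // True if both shapes agree with the plain test x == 0 for either condition.
    static boolean zeroTestUnaffected(int x) {
        for (boolean c : new boolean[] { false, true }) {
            if (zeroAfterNegOnTrue(x, c) != (x == 0) || zeroAfterNegOnFalse(x, c) != (x == 0)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = { 0, 1, -1, 17, -90, Integer.MAX_VALUE, Integer.MIN_VALUE };
        for (int x : samples) {
            System.out.println(x + ": " + zeroTestUnaffected(x));
        }
    }
}
```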


From Alan.Bateman at oracle.com  Mon Sep 10 18:05:18 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 10 Sep 2018 19:05:18 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
Message-ID: <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>

On 20/08/2018 16:18, Andrew Dinn wrote:
> Hi Alan,
>
> Round 4:
>
> I have redrafted the JEP and updated the implementation in the light of
> your last feedback:
>
>    JEP JIRA: https://bugs.openjdk.java.net/browse/JDK-8207851
>
>    Formatted JEP: http://openjdk.java.net/jeps/8207851
>
>    New webrev: http://cr.openjdk.java.net/~adinn/pmem/webrev.04/
>
>
The updated JEP looks much better.

I realize we've been through several iterations on this, but I'm now 
wondering whether MappedByteBuffer is the right API. As you've shown, 
it's straightforward to map a region of NVM and use the existing API; 
I'm just not sure it's the right API. I think I'd like to see a few 
examples of how the API might be used. ByteBuffers aren't intended for 
use by concurrent threads, and I wonder whether the examples might need 
that. I also wonder whether there is a possible connection with the work in 
Project Panama, and whether it's worth exploring if its scopes and 
pointers could be backed by NVM. The Risks and Assumptions 
section mentions the 2GB limit, which is another reminder that the MBB 
API may not be the right API.

The 2-arg force method to msync a region makes sense, although it might 
be more consistent for the second parameter to be the length rather than the 
end offset.

A detail for later is whether UOE (UnsupportedOperationException) might be 
more appropriate for implementations that do not support the XXX_PERSISTENT modes.

-Alan.
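For comparison with the proposal, here is a sketch of the existing API path the thread refers to: map a region with FileChannel.map and flush it with the no-argument MappedByteBuffer.force(). It uses an ordinary temporary file rather than NVM, and it uses neither the proposed XXX_PERSISTENT modes nor the proposed 2-arg force. The class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: map a file region, write to it, and flush the whole mapping.
class MappedRegionSketch {

    // Maps the first 'size' bytes of 'file', writes 'value' at 'offset',
    // flushes the mapping and reads the value back.
    static long writeAndForce(Path file, int size, int offset, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            if (ch.size() < size) {
                ch.write(ByteBuffer.allocate(1), size - 1); // grow the file to 'size' bytes
            }
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.putLong(offset, value);
            buf.force(); // existing API: flushes the entire mapped buffer
            return buf.getLong(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mbb-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        System.out.println(writeAndForce(tmp, 4096, 64, 42L));
    }
}
```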

From aph at redhat.com  Mon Sep 10 18:29:47 2018
From: aph at redhat.com (Andrew Haley)
Date: Mon, 10 Sep 2018 19:29:47 +0100
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <BE14B804-FABC-492E-A6AD-F0BE942D760F@sap.com>
References: <F17BF47D-F247-45DB-BAB6-AB81ACEC51F2@qq.com>
 <BE14B804-FABC-492E-A6AD-F0BE942D760F@sap.com>
Message-ID: <ffbaca17-c3e3-4261-c35e-5a5bf75670ee@redhat.com>

On 09/10/2018 04:59 PM, Schmidt, Lutz wrote:
> To my understanding, putInt() is not designed for unaligned accesses. Even "worse", it relies on the store address to be at least 4-byte aligned. That's what I learn e.g. from http://www.docjar.com/docs/api/sun/misc/Unsafe.html. And that's the reason why your code (sometimes) destroys the length field of the ArrayObject header.

Exactly: user error, don't do that. The doc is clear, I think.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com  Mon Sep 10 19:34:55 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 12:34:55 -0700
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <F17BF47D-F247-45DB-BAB6-AB81ACEC51F2@qq.com>
References: <F17BF47D-F247-45DB-BAB6-AB81ACEC51F2@qq.com>
Message-ID: <3347f875-74a2-47a9-9108-3e1685107423@oracle.com>

Thank you, Andy.

Unfortunately, your change may leave uninitialized (not zeroed) bytes in the object.
Instead, unaligned stores should be treated as subword stores:

diff -r b9f6a4427da9 src/hotspot/share/opto/memnode.cpp
--- a/src/hotspot/share/opto/memnode.cpp
+++ b/src/hotspot/share/opto/memnode.cpp
@@ -4095,10 +4095,11 @@
        // See if this store needs a zero before it or under it.
        intptr_t zeroes_needed = st_off;

-      if (st_size < BytesPerInt) {
+      if (st_size < BytesPerInt || (zeroes_needed % BytesPerInt) != 0) {
          // Look for subword stores which only partially initialize words.
          // If we find some, we must lay down some word-level zeroes first,
          // underneath the subword stores.
+        // Do the same for unaligned stores.
          //
          // Examples:
          //   byte[] a = { p,q,r,s }  =>  a[0]=p,a[1]=q,a[2]=r,a[3]=s

Rahul, the bug is assigned to you. Please test this solution.

Thanks,
Vladimir
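Here is a Java transcription of the old and the proposed condition from the diff above, for illustration only. C2 is C++, and the class and method names are hypothetical. At that point in the code zeroes_needed equals st_off, so the new condition additionally sends an int store at an odd offset such as 17 down the subword-store path, which lays down word-level zeroes first.

```java
// Sketch of the memnode.cpp predicate before and after the proposed change.
class SubwordStoreCheck {
    static final int BYTES_PER_INT = 4;

    // Old condition: only stores narrower than an int.
    static boolean takesSubwordPathBefore(long stOff, int stSize) {
        return stSize < BYTES_PER_INT;
    }

    // Proposed condition: also stores that are not int-aligned.
    static boolean takesSubwordPath(long stOff, int stSize) {
        long zeroesNeeded = stOff; // as in the diff: zeroes_needed = st_off
        return stSize < BYTES_PER_INT || (zeroesNeeded % BYTES_PER_INT) != 0;
    }

    public static void main(String[] args) {
        long[][] stores = { { 16, 4 }, { 17, 4 }, { 13, 1 } }; // {offset, size}
        for (long[] s : stores) {
            System.out.println("off=" + s[0] + " size=" + s[1]
                    + " before=" + takesSubwordPathBefore(s[0], (int) s[1])
                    + " after=" + takesSubwordPath(s[0], (int) s[1]));
        }
    }
}
```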


From dean.long at oracle.com  Mon Sep 10 20:08:02 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 10 Sep 2018 13:08:02 -0700
Subject: RFR(XS) 8210434: [Graal] 8209301 prevents GitHub Graal from compiling
 with latest JDK
Message-ID: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com>

http://cr.openjdk.java.net/~dlong/8210434/webrev/
https://bugs.openjdk.java.net/browse/JDK-8210434

This change reverts the 8209301 rename in AOTCompiledClass and adds back 
HotSpotResolvedObjectType.isAnonymous to preserve compatibility.

dl

From cthalinger at twitter.com  Mon Sep 10 20:17:44 2018
From: cthalinger at twitter.com (Christian Thalinger)
Date: Mon, 10 Sep 2018 22:17:44 +0200
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <c10ed3d2-b76a-4b52-c7ae-25bddb9ab721@linux.vnet.ibm.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com>
 <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
 <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>
 <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com>
 <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com>
 <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com>
 <c10ed3d2-b76a-4b52-c7ae-25bddb9ab721@linux.vnet.ibm.com>
Message-ID: <4F954DE5-DD8C-4395-8E40-6D341C42649C@twitter.com>


> On Sep 6, 2018, at 1:53 AM, Gustavo Romero <gromero at linux.vnet.ibm.com> wrote:
> 
> On 09/05/2018 07:54 PM, Vladimir Kozlov wrote:
>> v3 looks good.
> 
> Thanks a lot Vladimir.
> 
> @Goetz, would you mind reviewing v3, please?

Is he on vacation? :-)

> It touches code meant for AIX but
> I don't expect any change in the end.
> 
> http://cr.openjdk.java.net/~gromero/8209972/v3/
> 
> Thank you.
> 
> 
> Best regards,
> Gustavo
> 
>> Thanks,
>> Vladimir
>> On 9/5/18 3:18 PM, Gustavo Romero wrote:
>>> Hi Vladimir,
>>> 
>>> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote:
>>>> Thank you, Gustavo, for the detailed answer.
>>>> 
>>>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now.
>>> 
>>> Thanks for reviewing it!
>>> 
>>> 
>>>> About the property check (flavor == "server" & !emulatedClient) in tests: if C2 (or Graal in the future, once it supports RTM) is used, this check is always true, because C2 (and Graal) is only available in the Server VM and is disabled in the emulatedClient setting. This check was added before we had the C2 usage check, which can now be done with vm.rtm.compiler.
>>> 
>>> Thanks, I was not aware of it. I've updated the webrev removing
>>> "flavor == "server" & !emulatedClient":
>>> 
>>> http://cr.openjdk.java.net/~gromero/8209972/v3/
>>> 
>>> "hg diff --patience":
>>> 
>>> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff
>>> 
>>> Testing (on Linux):
>>> 
>>> ** X86_64 w/ CPU+OS RTM support + Graal VM **
>>> Test results: no tests selected (all RTM tests skipped)
>>> 
>>> ** POWER8 w/ CPU+OS support **
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>> Test results: passed: 30
>>> 
>>> ** X86_64 w/ CPU+OS support **
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>> Test results: passed: 30
>>> 
>>> ** POWER7 wo/ CPU+OS RTM support **
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Test results: passed: 10
>>> 
>>> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support **
>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>> Test results: passed: 10
>>> 
>>> 
>>> Best regards,
>>> Gustavo
>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>> On 9/3/18 3:15 PM, Gustavo Romero wrote:
>>>>> Hi Vladimir,
>>>>> 
>>>>> Thanks a lot for reviewing it and for your comments.
>>>>> 
>>>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>>>>>> Hi Gustavo,
>>>>>> 
>>>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with the TieredStopAtLevel < 4 flag.
>>>>> 
>>>>> Yes, although currently, afaics, all tests explicitly enable C2 (for
>>>>> instance, by passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2
>>>>> through a warm-up before testing, I agree that nothing prevents one from
>>>>> switching off C2 with TieredStopAtLevel = level < 4 for a test case. It also
>>>>> looks better to list explicitly which compilers do support RTM instead of
>>>>> the ones that don't support it.
>>>>> 
>>>>> I've updated the webrev accordingly:
>>>>> 
>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/
>>>>> 
>>>>> The diff there looks odd, so I generated another one with --patience for a
>>>>> better (IMO) diff format:
>>>>> 
>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
>>>>> 
>>>>> 
>>>>>> Also, can the platform checks be replaced with one vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
>>>>> 
>>>>> For example, on Linux the following cases are possible regarding CPU / OS
>>>>> RTM support:
>>>>> 
>>>>> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
>>>>> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>>>> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>>>> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>>>>> 
>>>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
>>>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it
>>>>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies
>>>>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise
>>>>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for
>>>>> Linux and for AIX.
>>>>> 
>>>>> That said I don't think that the platforms check can be replaced with one
>>>>> vmRTMCPU(), because in some cases it's necessary to run a test for
>>>>> cpu = false and compiler = true, i.e. it's necessary to run a test on an
>>>>> unsupported CPU for a given platform _only if_ the compiler in use supports
>>>>> RTM (like C2). So if, for instance, we do:
>>>>> 
>>>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires
>>>>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation
>>>>> returns 'false' for cpu = false and compiler = true, skipping the test
>>>>> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler'
>>>>> as 'true' and run the test in that case one could match for
>>>>> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will
>>>>> be evaluated as 'true' and the test will run even though the Graal
>>>>> compiler is selected, which is wrong.
>>>>> 
>>>>> Hence 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must
>>>>> contain its own list of supported compilers with RTM support for each
>>>>> platform IMO. Basically we can't ask the JVM about the compiler's support
>>>>> for RTM, since the JVM can only tell us about the CPU+OS support for RTM
>>>>> regarding the CPU and OS on which the JVM is running.
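A minimal sketch of the whitelist approach argued for above, in which the compiler property is derived only from the selected compiler and the platform, never from vmRTMCPU(). The class name, the method name `vmRTMCompiler`, its parameters, and the architecture set are hypothetical stand-ins for what a VMProps-style helper would query; this is not the code in the webrev.

```java
import java.util.Set;

public class RtmCompilerProp {
    // Hypothetical whitelist: os.arch values on which C2 is assumed to support RTM.
    private static final Set<String> C2_RTM_ARCHS =
            Set.of("amd64", "x86_64", "ppc64", "ppc64le");

    /**
     * Hypothetical VMProps-style property. It deliberately ignores CPU/OS RTM
     * support (vm.rtm.cpu), so *Unsupported* tests can still run with
     * cpu = false as long as the selected compiler itself supports RTM.
     */
    static String vmRTMCompiler(boolean c2Enabled, String osArch) {
        // Whitelist C2 explicitly instead of blacklisting Graal: C2 can also be
        // unavailable because of -XX:TieredStopAtLevel < 4 with Graal not selected.
        boolean supported = c2Enabled && C2_RTM_ARCHS.contains(osArch);
        return String.valueOf(supported);
    }

    public static void main(String[] args) {
        System.out.println(vmRTMCompiler(true, "amd64"));   // true
        System.out.println(vmRTMCompiler(false, "amd64"));  // false: Graal selected or C2 off
        System.out.println(vmRTMCompiler(true, "aarch64")); // false: not in the assumed list
    }
}
```

Keeping the property independent of CPU/OS support is what lets a test combine `!vm.rtm.cpu` with `vm.rtm.compiler` without the two conditions contradicting each other.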
>>>>> 
>>>>> 
>>>>>> And since you check the CPU here, why not replace all vm.rtm.* with one vm.rtm.supported? In that case you would need only one @requires check in tests instead of:
>>>>>> 
>>>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>>>> 
>>>>> I think it's not possible either. Currently there are 5 match cases in
>>>>> RTM tests:
>>>>> 
>>>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
>>>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
>>>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
>>>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
>>>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>>>>> 
>>>>> which can be summarized as 5 cases:
>>>>> 
>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
>>>>> 2:            flavor == "server" & !emulatedClient  & cpu & os
>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>> 5: no @requires
>>>>> 
>>>>> I understand that cases 1 and 2 can be simplified (since CPU implies OS) as:
>>>>> 
>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>>>> 2:            flavor == "server" & !emulatedClient  & cpu
>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>> 5: no @requires
>>>>> 
>>>>> and cases 1 and 2 are mere opposites, so we have 4 cases:
>>>>> 
>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>> 5: no @requires
>>>>> 
>>>>> We could simplify further making P = (flavor == "server" & !emulatedClient),
>>>>> and make:
>>>>> 
>>>>> 1:          !(P & cpu)
>>>>> 3: (!cpu) &  (P)
>>>>> 4:   cpu  & !(P)
>>>>> 5: no @requires
>>>>> 
>>>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
>>>>> order to control running the tests only if the selected compiler on a
>>>>> given platform has RTM support (skipping Graal, for instance):
>>>>> 
>>>>> 1:          !(P & cpu) & compiler
>>>>> 3: (!cpu) &  (P)       & compiler
>>>>> 4:   cpu  & !(P)       & compiler
>>>>> 5: no @requires        & compiler
>>>>> 
>>>>> So it looks like we would need at least 3 properties, but IMO it's
>>>>> not worth adding another property P = (flavor == "server" & !emulatedClient)
>>>>> just to further simplify the @requires line.
>>>>> 
>>>>> In summary, I think it's only possible to replace 'cpu & os' by 'cpu',
>>>>> so I updated the webrev removing the vm.rtm.os property and keeping only
>>>>> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks).
>>>>> 
>>>>> I've tested the following scenarios and observed no regression [1]:
>>>>> 
>>>>> 1. X86_64 w/ RTM
>>>>> 2. X86_64 w/ RTM + Graal enabled
>>>>> 3. POWER7: no CPU+OS support for RTM
>>>>> 4. POWER8: CPU+OS support for RTM
>>>>> 
>>>>> But I think we need a confirmation from SAP about AIX.
>>>>> 
>>>>> 
>>>>> Best regards,
>>>>> Gustavo
>>>>> 
>>>>> [1]
>>>>> 
>>>>> ** X86_64 w/ RTM **
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>> Test results: passed: 30
>>>>> 
>>>>> 
>>>>> ** X86_64 w/ RTM + Graal enabled **
>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>> 
>>>>> 
>>>>> ** POWER7: no CPU+OS support for RTM **
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Test results: passed: 10
>>>>> 
>>>>> 
>>>>> ** POWER8: CPU+OS support for RTM **
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>> Test results: passed: 30
>>>>> 
>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Could the following small change be reviewed please?
>>>>>>> 
>>>>>>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972
>>>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/
>>>>>>> 
>>>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal)
>>>>>>> is selected on platforms that can have CPU/OS with RTM support.
>>>>>>> 
>>>>>>> It also disables all RTM tests for any other platform that does not have
>>>>>>> a single compiler supporting RTM.
>>>>>>> 
>>>>>>> RTM support was first added to the C2 compiler, and once the RTM
>>>>>>> checkers (notably vm.rtm.cpu) find the "rtm" feature advertised by the
>>>>>>> JVM, they assume that a compiler supporting RTM is surely available
>>>>>>> ("rtm" is advertised only if RTM is supported by both CPU and OS). Later
>>>>>>> the JVM began to allow the selection of a compiler other than C2, like
>>>>>>> Graal, and it became possible to select a compiler without RTM support
>>>>>>> even though both the CPU and the OS support RTM. Thus, for platforms
>>>>>>> supporting Graal or any other specific compiler, the compiler availability
>>>>>>> for the RTM tests must be adjusted, and if the selected compiler does not
>>>>>>> support RTM then all RTM tests must be skipped, including the ones meant
>>>>>>> for platforms without CPU or OS RTM support (e.g. *Unsupported*.java).
>>>>>>> In some cases, like TestUseRTMLockingOptionOnUnsupportedCPU.java, the
>>>>>>> test expects JVM initialization errors that will never occur, because
>>>>>>> the problem is not that CPU or OS RTM support is missing, but rather
>>>>>>> that the selected compiler does not support RTM.
>>>>>>> 
>>>>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to
>>>>>>> filter out compilers without RTM support for specific platforms and adapts
>>>>>>> the current RTM tests to use that new property.
>>>>>>> 
>>>>>>> Nothing changes regarding the number of passing/selected tests for the
>>>>>>> various cpu/os/compiler combinations on platforms that currently might
>>>>>>> support RTM [1], except when Graal is in use.
>>>>>>> 
>>>>>>> Thank you.
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Gustavo
>>>>>>> 
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>>>> ** X64 w/ CPU and OS supporting RTM **
>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>>> Test results: passed: 30
>>>>>>> 
>>>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>>>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>>>> 
>>>>>>> ** POWER8 w/ CPU and OS supporting RTM **
>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>>> Test results: passed: 30
>>>>>>> 
>>>>>>> ** POWER7 wo/ CPU and OS supporting RTM **
>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>> Test results: passed: 10
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
> 


From doug.simon at oracle.com  Mon Sep 10 20:33:17 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 10 Sep 2018 22:33:17 +0200
Subject: RFR(XS) 8210434: [Graal] 8209301 prevents GitHub Graal from
 compiling with latest JDK
In-Reply-To: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com>
References: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com>
Message-ID: <7E57CA1D-4B3F-46A2-94AD-6CE169093E94@oracle.com>

Looks good to me.

> On 10 Sep 2018, at 22:08, dean.long at oracle.com wrote:
> 
> http://cr.openjdk.java.net/~dlong/8210434/webrev/
> https://bugs.openjdk.java.net/browse/JDK-8210434
> 
> This change reverts the 8209301 rename in AOTCompiledClass and adds back HotSpotResolvedObjectType.isAnonymous to preserve compatibility.
> 
> dl


From vladimir.kozlov at oracle.com  Mon Sep 10 21:23:50 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 14:23:50 -0700
Subject: RFR(XS) 8210434: [Graal] 8209301 prevents GitHub Graal from
 compiling with latest JDK
In-Reply-To: <7E57CA1D-4B3F-46A2-94AD-6CE169093E94@oracle.com>
References: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com>
 <7E57CA1D-4B3F-46A2-94AD-6CE169093E94@oracle.com>
Message-ID: <492e8d6b-1897-8dd4-5d79-4ee2a7c1f60f@oracle.com>

+1

Thanks,
Vladimir

On 9/10/18 1:33 PM, Doug Simon wrote:
> Looks good to me.
> 
>> On 10 Sep 2018, at 22:08, dean.long at oracle.com wrote:
>>
>> http://cr.openjdk.java.net/~dlong/8210434/webrev/
>> https://bugs.openjdk.java.net/browse/JDK-8210434
>>
>> This change reverts the 8209301 rename in AOTCompiledClass and adds back HotSpotResolvedObjectType.isAnonymous to preserve compatibility.
>>
>> dl
> 

From dean.long at oracle.com  Mon Sep 10 23:18:07 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 10 Sep 2018 16:18:07 -0700
Subject: RFR(XS) 8210434: [Graal] 8209301 prevents GitHub Graal from
 compiling with latest JDK
In-Reply-To: <492e8d6b-1897-8dd4-5d79-4ee2a7c1f60f@oracle.com>
References: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com>
 <7E57CA1D-4B3F-46A2-94AD-6CE169093E94@oracle.com>
 <492e8d6b-1897-8dd4-5d79-4ee2a7c1f60f@oracle.com>
Message-ID: <ab7022d8-bd88-236a-55ce-193d5119b7ee@oracle.com>

Thanks Doug and Vladimir.

dl

On 9/10/18 2:23 PM, Vladimir Kozlov wrote:
> +1
>
> Thanks,
> Vladimir
>
> On 9/10/18 1:33 PM, Doug Simon wrote:
>> Looks good to me.
>>
>>> On 10 Sep 2018, at 22:08, dean.long at oracle.com wrote:
>>>
>>> http://cr.openjdk.java.net/~dlong/8210434/webrev/
>>> https://bugs.openjdk.java.net/browse/JDK-8210434
>>>
>>> This change reverts the 8209301 rename in AOTCompiledClass and adds 
>>> back HotSpotResolvedObjectType.isAnonymous to preserve compatibility.
>>>
>>> dl
>>


From 944797358 at qq.com  Tue Sep 11 00:36:57 2018
From: 944797358 at qq.com (Andy Law)
Date: Tue, 11 Sep 2018 08:36:57 +0800
Subject: [PATCH] 8202414: Unsafe crash in C2
Message-ID: <F246E560-332C-4638-812E-B2306CCC1135@qq.com>

Hi Lutz and Andrew,

Thank you for your reply and sorry for my typos :)

TL;DR
I think it is the optimization in `clear_memory()` that causes this problem; in my understanding it may not be a user fault :)

Running the example from the bug report with `-XX:DisableIntrinsic=_putInt`, or using only the interpreter/C1, makes this problem go away, because the program then takes a different branch.

In `clear_memory()`, there is an optimization that zeroes the memory in two steps:

    if (done_offset > start_offset) {  // [1]
        // clears the memory from start_offset to done_offset
    }

    if (done_offset < end_offset) {  // [2]
        // clears the remaining bytes with a single int (0) store at done_offset
    }

|<------------ 16-byte header ------------>|<--- 4-byte arr length (new byte[397]) --->|
                                           |     0000     0001     1000     1101      |
If it is aligned, there is no problem, but if it is not aligned, as in the example, this optimization mistakenly clears `0000 0001` to `0000 0000`, so the array length becomes 141. The JVM then crashes when a GC happens.

It is this optimization that causes the problem, so skipping it when the address is not aligned may solve the problem.
By the way, I didn't find anything about alignment in the documentation :( but I think you're right that accesses should be aligned when using Unsafe, though it is a deprecated API :) The problem cannot be reproduced with the template interpreter or the C1 compiler, because they don't perform this optimization.
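To make the byte arithmetic concrete, here is a hedged Java simulation, not HotSpot code. Following the diagram, it assumes a little-endian layout with the 4-byte length of `new byte[397]` at offset 16 and a hypothetical unaligned end offset of 21. The `done_offset` computation mirrors the lines visible in the diff in the follow-up message (subtract BytesPerInt when not long-aligned, then finish with one int store); the bulk long-clearing step [1] is omitted, and `guardUnaligned` models the proposed early return.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ClearMemorySim {
    static final int BYTES_PER_INT = 4;
    static final int BYTES_PER_LONG = 8;
    static final int LENGTH_OFFSET = 16; // assumed: length right after a 16-byte header

    /** Simulates only the trailing int store [2] and returns the resulting array length. */
    static int lengthAfterClear(int endOffset, boolean guardUnaligned) {
        ByteBuffer obj = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        obj.putInt(LENGTH_OFFSET, 397); // bytes 16..19 = 8D 01 00 00
        if (guardUnaligned && endOffset % BYTES_PER_INT != 0) {
            return obj.getInt(LENGTH_OFFSET); // proposed patch: skip the clearing entirely
        }
        int doneOffset = endOffset;
        if (doneOffset % BYTES_PER_LONG != 0) {
            doneOffset -= BYTES_PER_INT; // 21 -> 17: no longer int-aligned
        }
        if (doneOffset < endOffset) {
            obj.putInt(doneOffset, 0); // zeroes bytes 17..20, including the 0x01 of the length
        }
        return obj.getInt(LENGTH_OFFSET);
    }

    public static void main(String[] args) {
        System.out.println(lengthAfterClear(21, false)); // 141
        System.out.println(lengthAfterClear(21, true));  // 397
        System.out.println(lengthAfterClear(24, false)); // 397: aligned end, no trailing store
    }
}
```

Note that the guarded variant avoids the corruption only by skipping the zeroing altogether, which is the point Vladimir raises later in the thread.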

Thank you:),
Andy

From 944797358 at qq.com  Tue Sep 11 00:42:27 2018
From: 944797358 at qq.com (Andy Law)
Date: Tue, 11 Sep 2018 08:42:27 +0800
Subject: [PATCH] 8202414: Unsafe crash in C2
Message-ID: <1BBBEB69-6301-41EE-AD50-A02AA152D757@qq.com>

Hi Vladimir,

Thank you for your reply:)

However, I think my patch is as follows:

diff --git a/src/share/vm/opto/memnode.cpp b/src/share/vm/opto/memnode.cpp
--- a/src/share/vm/opto/memnode.cpp
+++ b/src/share/vm/opto/memnode.cpp
@@ -2923,8 +2923,11 @@
     return mem;
   }
 
+  if ((end_offset % BytesPerInt) != 0) {
+    return mem;
+  }
+
   Compile* C = phase->C;
-  assert((end_offset % BytesPerInt) == 0, "odd end offset");
   intptr_t done_offset = end_offset;
   if ((done_offset % BytesPerLong) != 0) {
     done_offset -= BytesPerInt;

Maybe I submitted the wrong code...?
Sorry for the bother :(

Thanks,
Andy


From vladimir.kozlov at oracle.com  Tue Sep 11 01:08:38 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 18:08:38 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
Message-ID: <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>

Very nice. Thank you, Sandhya.

I would like to see more meaningful names in the .ad files: instead of rreg* and ovec*, 
something like vlReg* and legVec*.

The new load_from_* and load_to_* instructions in the .ad files should be renamed as follows and colocated with 
the other Move*_reg_reg* instructions:

instruct MoveF2VL(vlRegF dst, regF src)
instruct MoveVL2F(regF dst, vlRegF src)

You did not add instructions to load these registers from memory (and the stack). What happens 
when you need to load or store them?

Also, please explain why these registers are used when UseAVX == 0:

+instruct absD_reg(rregD dst) %{
    predicate((UseSSE>=2) && (UseAVX == 0));

We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
  661   if (UseAVX < 3) {
  662     _features &= ~CPU_AVX512F;

The following checks could be combined into a new function in vm_version_x86.hpp (you already have some):

+reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );

I suggest testing these changes on different machines (non-AVX512 and AVX512) and with
different UseAVX values.

Thanks,
Vladimir

On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
> Recently there have been a couple of high-priority issues regarding C2's usage of the upper bank of XMM
> registers (XMM16-XMM31):
> 
> https://bugs.openjdk.java.net/browse/JDK-8207746
> 
> https://bugs.openjdk.java.net/browse/JDK-8209735
> 
> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
> 
> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ 
> 
> The patch provides a restricted set of registers to the match rules in the ad file based on the 
> underlying architecture.
> 
> The aim is to remove special handling/workaround from macro assembler and assembler.
> 
> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
> 
> Your review and feedback is very welcome.
> 
> Best Regards,
> 
> Sandhya
> 

From vladimir.kozlov at oracle.com  Tue Sep 11 01:20:17 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 18:20:17 -0700
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <1BBBEB69-6301-41EE-AD50-A02AA152D757@qq.com>
References: <1BBBEB69-6301-41EE-AD50-A02AA152D757@qq.com>
Message-ID: <94b45235-1493-bb55-2d7d-ea90350a91ab@oracle.com>

Hi Andy,

What I sent is *my* suggested fix because I think your fix (below) is not correct.

InitializeNode::complete_stores() assumes that the call to ClearArrayNode::clear_memory() will generate
code to zero that part of the object, and your change does not generate such code.

Thanks,
Vladimir
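A minimal Java model of this point (hypothetical; it uses no HotSpot code or API): `mem` stands in for a freshly allocated object that the caller expects to be zeroed over [start, end). `clearEarlyReturn` mimics the early return in Andy's patch for an end offset that is not a multiple of BytesPerInt, which leaves stale bytes behind, while `clearWithSubwordTail` zeroes the whole range, using byte stores for an unaligned head or tail. The 0x5A filler and the 16..22 range are arbitrary choices.

```java
import java.util.Arrays;

// Hypothetical model, not HotSpot code: each method is asked to zero mem[start, end).
final class ClearModel {
    static final int BYTES_PER_INT = 4;

    // Mimics the early return: an unaligned end offset makes it give up,
    // yet the caller still assumes the range has been zeroed.
    static void clearEarlyReturn(byte[] mem, int start, int end) {
        if (end % BYTES_PER_INT != 0) {
            return;
        }
        Arrays.fill(mem, start, end, (byte) 0);
    }

    // Zeroes the whole range: int-sized chunks for the aligned middle,
    // single-byte ("subword") stores for an unaligned head or tail.
    static void clearWithSubwordTail(byte[] mem, int start, int end) {
        int alignedStart = Math.min(end,
                (start + BYTES_PER_INT - 1) / BYTES_PER_INT * BYTES_PER_INT);
        int alignedEnd = Math.max(alignedStart, end / BYTES_PER_INT * BYTES_PER_INT);
        Arrays.fill(mem, start, alignedStart, (byte) 0);              // subword head
        for (int off = alignedStart; off < alignedEnd; off += BYTES_PER_INT) {
            Arrays.fill(mem, off, off + BYTES_PER_INT, (byte) 0);     // one "int" store
        }
        Arrays.fill(mem, alignedEnd, end, (byte) 0);                  // subword tail
    }

    static boolean allZero(byte[] mem, int start, int end) {
        for (int i = start; i < end; i++) {
            if (mem[i] != 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] a = new byte[32];
        Arrays.fill(a, (byte) 0x5A);               // stale contents
        clearEarlyReturn(a, 16, 22);               // end offset 22 is not int-aligned
        System.out.println(allZero(a, 16, 22));    // false: stale bytes survive

        byte[] b = new byte[32];
        Arrays.fill(b, (byte) 0x5A);
        clearWithSubwordTail(b, 16, 22);
        System.out.println(allZero(b, 16, 22));    // true
    }
}
```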

On 9/10/18 5:42 PM, Andy Law wrote:
> Hi Vladimir,
> 
> Thank you for your reply:)
> 
> However, I think my patch is as below
> 
> diff --git a/src/share/vm/opto/memnode.cpp b/src/share/vm/opto/memnode.cpp
> --- a/src/share/vm/opto/memnode.cpp
> +++ b/src/share/vm/opto/memnode.cpp
> @@ -2923,8 +2923,11 @@
>       return mem;
>     }
>   
> +  if ((end_offset % BytesPerInt) != 0) {
> +    return mem;
> +  }
> +
>     Compile* C = phase->C;
> -  assert((end_offset % BytesPerInt) == 0, "odd end offset");
>     intptr_t done_offset = end_offset;
>     if ((done_offset % BytesPerLong) != 0) {
>       done_offset -= BytesPerInt;
> 
> Maybe I mis-submitted some code ...?
> Sorry for bothering :(
> 
> Thanks,
> Andy
> 
> 

From dean.long at oracle.com  Tue Sep 11 03:12:40 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 10 Sep 2018 20:12:40 -0700
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <F246E560-332C-4638-812E-B2306CCC1135@qq.com>
References: <F246E560-332C-4638-812E-B2306CCC1135@qq.com>
Message-ID: <f028df05-475c-7d32-52ed-425d78b463f5@oracle.com>

Hi Andy. Did you notice the difference between these two intrinsics:

   case vmIntrinsics::_putInt:          return inline_unsafe_access(is_store, T_INT, Relaxed, false);
[...]
   case vmIntrinsics::_putIntUnaligned: return inline_unsafe_access(is_store, T_INT, Relaxed, true);


The last argument (bool unaligned) for _putInt says that it does
not support unaligned accesses.
Looking at jdk.internal.misc.Unsafe instead of sun.misc.Unsafe should
make the difference clearer.

dl
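As a minimal illustration of what "aligned" means for an int access (a sketch; the helper name `isIntAligned` is hypothetical and not part of any Unsafe API): assuming the base address is at least 4-byte aligned (HotSpot aligns objects to 8 bytes by default), a 4-byte access at a byte offset is naturally aligned only when the offset is a multiple of `Integer.BYTES`. Offsets that fail this test are the ones that call for the unaligned variant rather than `_putInt`.

```java
// Sketch: is a 4-byte int access at `byteOffset` naturally aligned?
// Assumes the base address is itself at least 4-byte aligned.
final class IntAlignment {
    static boolean isIntAligned(long byteOffset) {
        // Integer.BYTES is 4, so this tests the two low bits of the offset.
        return (byteOffset & (Integer.BYTES - 1)) == 0;
    }

    public static void main(String[] args) {
        for (long off = 16; off <= 20; off++) {
            System.out.println(off + " -> " + (isIntAligned(off) ? "aligned" : "unaligned"));
        }
    }
}
```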

On 9/10/18 5:36 PM, Andy Law wrote:
> Hi Lutz and Andrew,
>
> Thank you for your reply and sorry for my typos :)
>
> TL;DR
> I think it is the optimization in `clear_memory()` that causes this
> problem; in my understanding it may not be a user fault :)
>
> Running the example from the bug report with
> `-XX:DisableIntrinsic=_putInt`, or with the interpreter/C1 only, makes
> the problem go away, because the program then takes another branch.
>
> `clear_memory()` performs an optimization that clears the contents of
> the memory; in effect:
>
>     if (done_offset > start_offset) {  // [1]
>         // it clears the memory from start_offset to done_offset
>     }
>
>     if (done_offset < end_offset) {  // [2]
>         // it clears the memory at done_offset by storing an int 0
>     }
>
> |<--------------- 16-byte header --------------->| <--- 4-byte array length (new byte[397]) --->
>            |       0000     0001     1000     1101     |
> If the offset is aligned there is no problem, but if it is not aligned,
> as in the example, this optimization wrongly clears `0000 0001` to
> `0000 0000`, so the array length becomes 141. The VM then crashes when
> a GC happens.
>
> Since this optimization causes the problem, skipping it when the
> address is unaligned may solve the problem.
> By the way, I didn't find anything about unaligned access in the docs :( but I
> think you're right that accesses should be aligned when using Unsafe, though
> it is a deprecated API :) The crash does not reproduce with the
> template interpreter or the C1 compiler, because they don't perform
> this optimization.
>
> Thank you:),
> Andy
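The arithmetic behind the length of 141 can be checked directly. A minimal sketch (class and method names are made up for illustration): 397 is `1 1000 1101` in binary, and zeroing the byte that holds `0000 0001` (bits 8-15 of the value) leaves `1000 1101`, which is 141.

```java
// Worked check of the corrupted array length (illustrative, not HotSpot code).
final class LengthCorruption {
    // Clears byte `byteIndex` of `value` (0 = least significant byte).
    static int zeroByte(int value, int byteIndex) {
        return value & ~(0xFF << (8 * byteIndex));
    }

    public static void main(String[] args) {
        int length = 397;                                     // new byte[397]
        System.out.println(Integer.toBinaryString(length));   // 110001101
        System.out.println(zeroByte(length, 1));              // 141
    }
}
```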


From 944797358 at qq.com  Tue Sep 11 03:22:52 2018
From: 944797358 at qq.com (Andy Law)
Date: Tue, 11 Sep 2018 11:22:52 +0800
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <f028df05-475c-7d32-52ed-425d78b463f5@oracle.com>
References: <F246E560-332C-4638-812E-B2306CCC1135@qq.com>
 <f028df05-475c-7d32-52ed-425d78b463f5@oracle.com>
Message-ID: <4FEA91FB-DAB2-4F69-862B-383F16DA31E7@qq.com>

Hi Dean,

Thanks for pointing it out. I didn't notice it before because I mainly use OpenJDK 8. Now I get it, thank you :)

Andy


From jayaprabhakar at gmail.com  Tue Sep 11 06:22:51 2018
From: jayaprabhakar at gmail.com (jayaprabhakar k)
Date: Mon, 10 Sep 2018 23:22:51 -0700
Subject: Any way to avoid JIT overhead for small programs when using AOT?
In-Reply-To: <mailman.15191.1536568789.4678.hotspot-compiler-dev@openjdk.java.net>
References: <mailman.15191.1536568789.4678.hotspot-compiler-dev@openjdk.java.net>
Message-ID: <CA+t=SiJSXSXDuDTLwjYMt9NLjUVkjNJoxT-iV8VBvkUbb6JQXw@mail.gmail.com>

On Mon, 10 Sep 2018 at 01:40, <hotspot-compiler-dev-request at openjdk.java.net>
wrote:

>
> Today's Topics:
>
>    1. Any way to avoid JIT overhead for small programs when using
>       AOT? (jayaprabhakar k)
>    2. [PING] RE: RFR(S): 8210152: Optimize integer divisible by
>       power-of-2 check (Pengfei Li (Arm Technology China))
>    3. Re: Any way to avoid JIT overhead for small programs when
>       using AOT? (dean.long at oracle.com)
>    4. Re: Any way to avoid JIT overhead for small programs when
>       using AOT? (Andrew Haley)
>    5. JIT: C2 doesn't skip post barrier for new allocated objects
>       (Kuai Wei)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 9 Sep 2018 20:58:02 -0700
> From: jayaprabhakar k <jayaprabhakar at gmail.com>
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Any way to avoid JIT overhead for small programs when using
>         AOT?
> Message-ID:
>         <CA+t=Si+mtt7YVS2tAtMfD9MWhxOJkHky9T_i68EMHWvgHgEioQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
> I understand that at present AOT and -Xint are not compatible. I see the
> code explicitly disables AOT when -Xint is set
> <
> http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp
> >
> .
>
> For extremely short programs, typically used by beginners learning Java, I
> see that CDS, AOT and Xint all help reduce the startup time. While CDS
> works with both AOT and Xint, multiplying the benefits, AOT and Xint do
> not.
>
> Is there a way to keep both AOT + Xint, For classes/methods that are
> precompiled, use AOT code, and for others just interpret? If not now, would
> it be possible in the future?
>
> Thanks,
> JP
>
> ------------------------------
>
> Message: 2
> Date: Mon, 10 Sep 2018 04:24:16 +0000
> From: "Pengfei Li (Arm Technology China)" <Pengfei.Li at arm.com>
> To: "dean.long at oracle.com" <dean.long at oracle.com>, Vladimir Kozlov
>         <vladimir.kozlov at oracle.com>,
>         "hotspot-compiler-dev at openjdk.java.net"
>         <hotspot-compiler-dev at openjdk.java.net>,
>         "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>,
>         "Pengfei Li (Arm Technology China)" <Pengfei.Li at arm.com>
> Cc: nd <nd at arm.com>
> Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
>         power-of-2 check
> Message-ID:
>         <DB7PR08MB31150B1D6C7E547538B2B99A96050 at DB7PR08MB3115.eurprd08.prod.outlook.com>
>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Dean / Vladimir / JDK experts,
>
> Do you have any further questions or comments on this patch? Or should I
> make some modifications to it, such as adding some limitations to the
> matching condition?
> I appreciate your help.
>
> --
> Thanks,
> Pengfei
>
>
> > -----Original Message-----
> > From: Pengfei Li (Arm Technology China)
> > Sent: Monday, September 3, 2018 13:50
> > To: 'dean.long at oracle.com' <dean.long at oracle.com>; 'Vladimir Kozlov'
> > <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
> > hotspot-dev at openjdk.java.net
> > Cc: nd <nd at arm.com>
> > Subject: RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check
> >
> > Hi Vladimir, Dean,
> >
> > Thanks for your review.
> >
> > > I don't see where negation is coming from for 'X % 2 == 0' expression.
> > > It should be only 2 instructions: 'cmp (X and 1), 0'
> > The 'cmp (X and 1), 0' is just what we expected. But there's redundant
> > conditional negation coming from the possibly negative X handling in "X % 2".
> > For instance, X = -5, "X % 2" should be -1. So only "(X and 1)" operation is not
> > enough. We have to negate the result.
> >
> > > I will look on it next week. But it would be nice if you can provide small test
> > > to show this issue.
> > I've already provided a case of "if (a%2 == 0) { ... }" in JBS description. What
> > code generated and what can be optimized are listed there.
> > You could see https://bugs.openjdk.java.net/browse/JDK-8210152 for details.
> > You could also see the test case for this optimization I attached below.
> >
> > > It looks like your matching may allow more patterns than expected. I was
> > > expecting it to look for < 0 or >= 0 for the conditional negation, but I don't see
> > > it.
> > Yes. I didn't limit the if condition to <0 or >= 0 so it will match more patterns.
> > But nothing is going wrong if this ideal transformation applies on more cases.
> > In pseudo code, if someone writes:
> > if ( some_condition ) { x = -x; }
> > if ( x == 0 ) { do_something(); }
> > The negation in 1st if-clause could always be eliminated whatever the
> > condition is.
> >
> > --
> > Thanks,
> > Pengfei
> >
> >
> > -- my test case attached below --
> > public class Foo {
> >
> >     public static void main(String[] args) {
> >         int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
> >         for (int i = 0; i < dividends.length; i++) {
> >             int x = dividends[i];
> >             System.out.println(testDivisible(x));
> >             System.out.println(testModulo(x));
> >             testCondNeg(x);
> >         }
> >         return;
> >     }
> >
> >     public static int testDivisible(int x) {
> >         // Modulo result is only for zero check
> >         if (x % 4 == 0) {
> >             return 444;
> >         }
> >         return 555;
> >     }
> >
> >     public static int testModulo(int x) {
> >         int y = x % 4;
> >         if (y == 0) {
> >             return 222;
> >         }
> >         // Modulo result is used elsewhere
> >         System.out.println(y);
> >         return 333;
> >     }
> >
> >     public static void testCondNeg(int x) {
> >         // Pure conditional negation
> >         if (printAndIfNeg(x)) {
> >             x = -x;
> >         }
> >         if (x == 0) {
> >             System.out.println("zero!");
> >         }
> >     }
> >
> >     static boolean printAndIfNeg(int x) {
> >         System.out.println(x);
> >         return x <= 0;
> >     }
> > }
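A small Java check of the identity behind this optimization (a sketch; class and method names are illustrative): because Java's `%` takes the sign of the dividend, `x % 4` can be negative, yet `x % 4 == 0` still holds exactly when `(x & 3) == 0`. The loop reuses the dividends from the test case, and the last line repeats the `-5 % 2` example from the thread.

```java
// Sketch: the remainder-based and mask-based divisibility tests agree for all ints.
final class DivisibleCheck {
    static boolean viaRemainder(int x) { return x % 4 == 0; }
    static boolean viaMask(int x)      { return (x & 3) == 0; }

    public static void main(String[] args) {
        int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
        for (int x : dividends) {
            // x % 4 keeps the sign of x; x & 3 is always in 0..3.
            System.out.println(x + ": x % 4 = " + (x % 4) + ", x & 3 = " + (x & 3)
                    + ", tests agree: " + (viaRemainder(x) == viaMask(x)));
        }
        // -5 % 2 is -1, while -5 & 1 is 1: the sign must be restored by negation.
        System.out.println((-5 % 2) + " " + (-5 & 1)); // -1 1
    }
}
```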
>
> ------------------------------
>
> Message: 3
> Date: Mon, 10 Sep 2018 01:00:29 -0700
> From: dean.long at oracle.com
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: Any way to avoid JIT overhead for small programs when
>         using AOT?
> Message-ID: <fa2711c0-e73e-b7b0-9ca6-5d0fb52cb330 at oracle.com>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> On 9/9/18 8:58 PM, jayaprabhakar k wrote:
> > Hi,
> > I understand that at present AOT and -Xint are not compatible. I see
> > the code explicitly disables AOT when -Xint is set
> > <
> http://cr.openjdk.java.net/%7Ekvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp
> >.
> >
> > For extremely short programs, typically used by beginners learning
> > Java, I see that CDS, AOT and Xint all help reduce the startup time.
> > While CDS works with both AOT and Xint, multiplying the benefits, AOT
> > and Xint do not.
> >
> > Is there a way to keep both AOT + Xint, For classes/methods that are
> > precompiled, use AOT code, and for others just interpret? If not now,
> > would it be possible in the future?
> >
> > Thanks,
> > JP
>
> Hi JP. Yes, it could be possible in the future. One problem is
> MethodHandle intrinsics. With -Xint, there's no code heap, so no place
> to generate native adapters for those intrinsics.
>
> dl
>
> ------------------------------
>
> Message: 4
> Date: Mon, 10 Sep 2018 09:17:59 +0100
> From: Andrew Haley <aph at redhat.com>
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: Any way to avoid JIT overhead for small programs when
>         using AOT?
> Message-ID: <2753b70f-67c7-ef7a-ca40-49266f502401 at redhat.com>
> Content-Type: text/plain; charset=utf-8
>
> On 09/10/2018 04:58 AM, jayaprabhakar k wrote:
>
> > I understand that at present AOT and -Xint are not compatible. I see the
> > code explicitly disables AOT when -Xint is set
> > <http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>.
> >
> > For extremely short programs, typically used by beginners learning Java, I
> > see that CDS, AOT and Xint all help reduce the startup time. While CDS
> > works with both AOT and Xint, multiplying the benefits, AOT and Xint do
> > not.
> >
> > Is there a way to keep both AOT + Xint, For classes/methods that are
> > precompiled, use AOT code, and for others just interpret? If not now, would
> > it be possible in the future?
>
> Does it significantly help? If you precompile the Java library and your programs
> are extremely short, you'll see very little compilation activity.
>
Thanks Andrew.
I don't see any compilation (the default -XX:CompileThreshold is quite
large), but the overhead still seems large. I ran a small test on
AWS T2 instances.
The test class just has an empty main method, but I could reproduce exactly
the same behavior when running with the *--dry-run* command-line option.

So most of the delay happens on startup.

-- Default --

$ perf stat -e  cpu-clock -r50 java   -XX:+UseG1GC EmptyMainMethod

 Performance counter stats for 'java -XX:+UseG1GC EmptyMainMethod' (50 runs):

        104.039398 cpu-clock (msec)                ( +-  0.39% )

       0.093801870 seconds time elapsed            ( +-  2.66% )


-- Xint --

perf stat -e  cpu-clock -r50 java   -XX:+UseG1GC  -Xint     EmptyMainMethod

 Performance counter stats for 'java -XX:+UseG1GC -Xint EmptyMainMethod' (50 runs):

         76.203249 cpu-clock (msec)                ( +-  0.33% )

       0.083464038 seconds time elapsed            ( +-  2.03% )

-- AOT --

$ perf stat -e  cpu-clock -r50 java   -XX:+UseG1GC  -XX:AOTLibrary=jaot/touched_methods.so     EmptyMainMethod

 Performance counter stats for 'java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod' (50 runs):

        102.416037 cpu-clock (msec)                ( +-  0.22% )

       0.083394143 seconds time elapsed            ( +-  0.92% )

--

--
The source code for the test is

public class EmptyMainMethod {
  public static void main(String[] args) {

  }
}


--
This delay seems about the same for most programs written by school students
learning Java.

Context for the request: I am the developer of the Codiva.io online Java IDE
<https://www.codiva.io>. Many teachers recommend it to their students for
learning Java. To support spiky load, I run the programs on the server in a
container with reduced resource limits for each run. At a 10% CPU limit, the
difference grows to around 200 ms.


>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 10 Sep 2018 16:39:42 +0800
> From: "Kuai Wei" <kuaiwei.kw at alibaba-inc.com>
> To: "hotspot compiler" <hotspot-compiler-dev at openjdk.java.net>
> Subject: JIT: C2 doesn't skip post barrier for new allocated objects
> Message-ID:
>         <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw at alibaba-inc.com>
> Content-Type: text/plain; charset="utf-8"
>
>
> Hi,
>
>   Recently I checked the optimization that removes G1 post barriers for newly
> allocated objects, but I found it doesn't work as expected.
> I wrote a simple test case that stores an oop in the constructor or just
> after the constructor call.
> public class StoreTest {
>     static String val="x";
>
>     public static Foo testMethod() {
>         Foo newfoo = new Foo(val);
>         newfoo.b=val; // the store barrier could be reduced
>         return newfoo;
>     }
>
>     public static void main(String []args) {
>         Foo obj = new Foo(val);  // init Foo class
>         testMethod();
>     }
>
>     static class Foo {
>         Object a;
>         Object b;
>         public Foo(Object val) {
>             this.a=val; // the store barrier could be reduced
>         }
>     }
> }
> I inline Foo::<init> and Object::<init> when compiling testMethod with C2, so I
> think the two stores marked "the store barrier could be reduced" don't need a
> post barrier. But I still found post barriers in the generated assembly code.
> The test command: java -Xcomp -Xbatch -XX:+UseG1GC
> -XX:CompileCommandFile=compile_command -Xbatch -XX:+PrintCompilation
> -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
> StoreTest
> compile_command:
> compileonly, StoreTest::testMethod
> compileonly, StoreTest$Foo::<init>
> inline, StoreTest$Foo::<init>
> compileonly, java.lang.Object::<init>
> inline, java.lang.Object::<init>
> print, StoreTest::testMethod
>
> I checked the node graph in the parsing phase. The optimization depends on
> GraphKit::just_allocated_object to detect a newly allocated object. The idea is
> to check whether the control of the store is the control projection of the
> allocation. But in the parse phase there's a Region node between the control
> projection and the control of the store. The region has just one input edge,
> so it could be optimized away later. The region node is generated when C2
> inlines the init method of the super class; I think it's used in the exit map
> to merge all exit paths.
>
> The change is simple: in just_allocated_object, I check whether there's a
> region node with only one input. With the change, we see a good performance
> improvement in a stress test.
>
> Could you check the change and give comments about it?
>
> graphKit.cpp
>  // We use this to determine if an object is so "fresh" that
>  // it does not require card marks.
>  Node* GraphKit::just_allocated_object(Node* current_control) {
> -  if (C->recent_alloc_ctl() == current_control)
> +  Node * ctrl = current_control;
> +  if (CheckJustAllocatedAggressive) {
> +    // Object::<init> is invoked after allocation, most of invoke nodes
> +    // will be reduced, but a region node is kept in parse time, we check
> +    // the pattern and skip the region node
> +    if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
> +      ctrl = ctrl->in(1);
> +    }
> +  }
> +  if (C->recent_alloc_ctl() == ctrl)
>      return C->recent_alloc_obj();
>    return NULL;
>  }
> Thanks,
> Kevin
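A hypothetical Java model of the control-edge check in the proposed change (the `Node` class is a stand-in, not the C2 API): the store's control counts as "just allocated" if it is the allocation's control projection, or, with the new flag enabled, a Region whose only real input is that projection. In C2, input 0 of a Region points back at the region itself, which is why `req() == 2` means a single incoming control edge; the model simply leaves that slot null. It does not model whether skipping the Region is safe in general.

```java
// Hypothetical stand-in for C2 nodes; only what the check needs.
final class JustAllocatedModel {
    static final class Node {
        final boolean isRegion;
        final Node[] in;
        Node(boolean isRegion, Node... in) {
            this.isRegion = isRegion;
            this.in = in;
        }
        int req() { return in.length; }
    }

    // Mirrors the patched logic: optionally look through a single-input Region.
    static boolean isJustAllocated(Node allocCtl, Node currentCtl, boolean aggressive) {
        Node ctl = currentCtl;
        if (aggressive && ctl != null && ctl.isRegion && ctl.req() == 2) {
            ctl = ctl.in[1];
        }
        return ctl == allocCtl;
    }

    public static void main(String[] args) {
        Node allocCtl = new Node(false);                // allocation's control projection
        Node region = new Node(true, null, allocCtl);   // one-input Region left by inlining
        System.out.println(isJustAllocated(allocCtl, allocCtl, false)); // true
        System.out.println(isJustAllocated(allocCtl, region, false));   // false
        System.out.println(isJustAllocated(allocCtl, region, true));    // true
    }
}
```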
>
> End of hotspot-compiler-dev Digest, Vol 136, Issue 30
> *****************************************************
>

From rwestrel at redhat.com  Tue Sep 11 08:23:40 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 11 Sep 2018 10:23:40 +0200
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
Message-ID: <dk61sa0wigj.fsf@rwestrel.remote.csb>


> The only comment I have is to add check for SubI on other branch (not
> only on True branch). Negation may occur on either branch since you
> accept all conditions for negation.

Can't we make this more general and support a phi with any number of
inputs (not only 2 data inputs) as long as it's a mix of X and -X?

Roland.
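A quick Java check of the property this generalization relies on (a sketch; `phi` is an illustrative stand-in for a merge whose inputs are each `x` or `-x`): in two's complement, `-x == 0` exactly when `x == 0`, including for `Integer.MIN_VALUE`, whose negation overflows back to itself and so stays nonzero. This holds only for tests against zero such as `== 0` and `!= 0`; a signed comparison like `< 0` would be changed by the negation.

```java
// Sketch: a zero test cannot tell x from -x, whichever input reaches it.
final class NegationInvariance {
    // Stand-in for a phi whose inputs are each x or -x;
    // `takenPath` selects which input flows through at run time.
    static int phi(int x, boolean[] inputIsNegated, int takenPath) {
        return inputIsNegated[takenPath] ? -x : x;
    }

    public static void main(String[] args) {
        boolean[] inputs = { false, true, true };   // x, -x, -x
        int[] samples = { 0, 1, -7, Integer.MAX_VALUE, Integer.MIN_VALUE };
        for (int x : samples) {
            for (int path = 0; path < inputs.length; path++) {
                boolean direct = (x == 0);
                boolean viaPhi = (phi(x, inputs, path) == 0);
                if (direct != viaPhi) {
                    throw new AssertionError("zero test changed for x = " + x);
                }
            }
        }
        System.out.println("zero test unchanged on every path");
        // Negating MIN_VALUE overflows back to MIN_VALUE, which is still nonzero.
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE); // true
    }
}
```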

From tobias.hartmann at oracle.com  Tue Sep 11 08:31:27 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 11 Sep 2018 10:31:27 +0200
Subject: [12] RFR(S): 8210387: C2 compilation fails with
 "assert(node->_last_del == _last) failed: must have deleted the edge just
 produced"
Message-ID: <bfb5a29e-4611-1154-e921-118ba42ad617@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8210387
http://cr.openjdk.java.net/~thartmann/8210387/webrev.00/

During CCP, before removing unreachable regions, we first replace all dead phi users with TOP by
calling PhaseIterGVN::replace_node. Replacing a phi node can trigger removal of other nodes some of
which might also be phi users of the same region. This breaks verification of the DUIterator because
only one node is expected to be removed. Very similar to the code in RegionNode::Ideal [1], we need
to refresh the iterator and start iterating from the beginning for as long as there is progress.

Thanks,
Tobias

[1] http://hg.openjdk.java.net/jdk/jdk/file/bbc7157ad9c5/src/hotspot/share/opto/cfgnode.cpp#l538
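The restart-on-progress pattern can be sketched in Java as a hypothetical model (the `Phi` class and its `alsoKills` list are invented for illustration and are not C2's `DUIterator` API): removing one dead user may remove other entries from the same user list, so after every removal the scan starts over from the beginning, and the loop ends once a full pass makes no progress.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not C2 code: the users of a region whose dead phis
// must be removed. Removing one phi may cascade and remove others.
final class RestartOnProgress {
    static final class Phi {
        final String name;
        final boolean dead;
        final List<Phi> alsoKills = new ArrayList<>(); // removed along with this phi
        Phi(String name, boolean dead) {
            this.name = name;
            this.dead = dead;
        }
    }

    static void removeDeadPhis(List<Phi> users) {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Phi p : users) {
                if (p.dead) {
                    users.remove(p);
                    users.removeAll(p.alsoKills); // may drop entries the loop has not reached
                    progress = true;
                    break; // the iterator is stale now, so start over
                }
            }
        }
    }

    public static void main(String[] args) {
        Phi a = new Phi("a", true);
        Phi b = new Phi("b", false);
        Phi c = new Phi("c", true);
        a.alsoKills.add(c);
        List<Phi> users = new ArrayList<>(List.of(a, b, c));
        removeDeadPhis(users);
        for (Phi p : users) {
            System.out.println(p.name); // b
        }
    }
}
```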

From adinn at redhat.com  Tue Sep 11 09:06:35 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 11 Sep 2018 10:06:35 +0100
Subject: RFR: 8210578: AArch64: Invalid encoding for fmlsvs instruction
Message-ID: <80d98039-c412-b638-c764-0794da1ba26f@redhat.com>

Can I please have a review for this trivial patch to correct the
encoding for fmlsvs.

JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8210578

Patch:

diff -r bbc7157ad9c5 src/hotspot/cpu/aarch64/assembler_aarch64.hpp
--- a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp	Tue Sep 11 09:14:36 2018 +0200
+++ b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp	Tue Sep 11 09:42:41 2018 +0100
@@ -2356,7 +2356,7 @@

   // FMLA/FMLS - Vector - Scalar
   INSN(fmlavs, 0, 0b0001);
-  INSN(fmlsvs, 0, 0b0001);
+  INSN(fmlsvs, 0, 0b0101);
   // FMULX - Vector - Scalar
   INSN(fmulxvs, 1, 0b1001);

The corrected bit identifies the sub_op that distinguishes a fused
multiply vector by scalar and add (fmlavs) from a fused multiply vector
by scalar and subtract (fmlsvs).
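A quick arithmetic check on the two values in the patch (a sketch that works only on the 4-bit values passed to INSN, not on full instruction encodings): 0b0001 and 0b0101 differ in a single bit, bit 2 of that field, which is the bit the fix sets for fmlsvs.

```java
// Sketch: which bits differ between the fmlavs and corrected fmlsvs values?
final class FmlaFmlsBits {
    static int differingBits(int a, int b) {
        return a ^ b; // 1 wherever the two values disagree
    }

    public static void main(String[] args) {
        int fmlavs = 0b0001;   // value passed to INSN for fmlavs
        int fmlsvs = 0b0101;   // corrected value for fmlsvs
        int diff = differingBits(fmlavs, fmlsvs);
        System.out.println(Integer.toBinaryString(diff));          // 100
        System.out.println(Integer.bitCount(diff));                // 1
        System.out.println(Integer.numberOfTrailingZeros(diff));   // 2
    }
}
```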

Testing:
It appears that this instruction has never been exercised (by contrast,
fmlavs has -- by the power intrinsic I am currently reviewing). All I
have done to check this patch is ensure I can rebuild the JVM (there
isn't really any opportunity to test it until it is needed in an intrinsic).

Can I assume this is trivial enough to be pushed without running a
submit job?

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From claes.redestad at oracle.com  Tue Sep 11 10:39:57 2018
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 11 Sep 2018 12:39:57 +0200
Subject: Any way to avoid JIT overhead for small programs when using AOT?
In-Reply-To: <CA+t=SiJSXSXDuDTLwjYMt9NLjUVkjNJoxT-iV8VBvkUbb6JQXw@mail.gmail.com>
References: <mailman.15191.1536568789.4678.hotspot-compiler-dev@openjdk.java.net>
 <CA+t=SiJSXSXDuDTLwjYMt9NLjUVkjNJoxT-iV8VBvkUbb6JQXw@mail.gmail.com>
Message-ID: <447a323d-58bb-2917-3599-d0515c0eef3f@oracle.com>

Hi,

On 2018-09-11 08:22, jayaprabhakar k wrote:
>
>     > I understand that at present AOT and -Xint are not compatible. I
>     see the
>     > code explicitly disables AOT when -Xint is set
>     >
>     <http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp
>     <http://cr.openjdk.java.net/%7Ekvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>>
>     > .
>     >
>     > For extremely short programs, typically used by beginners
>     learning Java, I
>     > see that CDS, AOT and Xint all help reduce the startup time.
>     While CDS
>     > works with both AOT and Xint, multiplying the benefits, AOT and
>     Xint do
>     > not.
>     >
>     > Is there a way to keep both AOT + Xint, For classes/methods that are
>     > precompiled, use AOT code, and for others just interpret? If not
>     now, would
>     > it be possible in the future?
>
>     Does it significantly help? If you precompile the Java library and
>     your programs
>     are extremely short, you'll see very little compilation activity.
>
> Thanks Andrew.
> I don't see any compilation (The default -XX:CompileThreshold is quite 
> large), but the overhead still seems to be large. I ran a small test 
> on AWS T2 instances.
> The test class just has empty main method. But I could reproduce the 
> exact same behavior when run with *--dry-run* command line option.
>
> So most of the delay happens on startup.
>
> -- Default --
> $ perf stat -e  cpu-clock -r50 java   -XX:+UseG1GC EmptyMainMethod
>
>   Performance counter stats for 'java -XX:+UseG1GC EmptyMainMethod' (50 runs):
>
>          104.039398 cpu-clock (msec)                                              ( +-  0.39% )
>
>         0.093801870 seconds time elapsed                                          ( +-  2.66% )
>
> -- Xint --
> perf stat -e  cpu-clock -r50 java   -XX:+UseG1GC  -Xint     EmptyMainMethod
>
>   Performance counter stats for 'java -XX:+UseG1GC -Xint EmptyMainMethod' (50 runs):
>
>           76.203249 cpu-clock (msec)                                              ( +-  0.33% )
>
>         0.083464038 seconds time elapsed                                          ( +-  2.03% )
> -- AOT --
> $ perf stat -e  cpu-clock -r50 java   -XX:+UseG1GC  -XX:AOTLibrary=jaot/touched_methods.so     EmptyMainMethod
>
>   Performance counter stats for 'java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod' (50 runs):
>
>          102.416037 cpu-clock (msec)                                              ( +-  0.22% )
>
>         0.083394143 seconds time elapsed                                          ( +-  0.92% )
> --

There might always be some things executed by the interpreter, some of 
which might get hot enough to trigger compilations. And if you've 
compiled your AOT library with support for tiered compilation you might 
also see C2 jobs fired off early.

You can indirectly avoid some of this by stopping the JIT from trying to 
go beyond C1 level optimization:

-XX:TieredStopAtLevel=1

In your constrained environment you might also want to limit the number 
of compiler threads the system could be spinning up to a minimum:

-XX:CICompilerCount=1

With this I see a significant reduction in cpu-clock time on my local 
machine (recent build from jdk/jdk):

AOT:

          81.064838      cpu-clock (msec)                  ( +-  1.13% )
        0.073530160 seconds time elapsed                   ( +-  1.05% )

AOT -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1

          54.584255      cpu-clock (msec)                  ( +-  1.16% )
        0.054806668 seconds time elapsed                   ( +-  1.35% )


There's some I/O and extra linking overhead of starting up with an AOT 
archive, so -Xint might still outperform on a hello world:

          52.138182      cpu-clock (msec)                  ( +-  1.60% )
        0.053423763 seconds time elapsed                   ( +-  1.67% )

Generally the static startup overhead of AOT should be amortized rather 
quickly, say, once you have something that runs for more than a couple 
of hundred milliseconds.
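For a rough sense of scale, the relative cpu-clock savings against the plain AOT run can be computed from the three figures above (a sketch; the numbers are copied from the measurements on one local machine, and `Locale.ROOT` keeps the decimal point independent of the user's locale).

```java
import java.util.Locale;

// Sketch: percentage of CPU time saved relative to the default AOT run.
final class StartupSavings {
    static double percentSaved(double baseline, double measured) {
        return 100.0 * (baseline - measured) / baseline;
    }

    public static void main(String[] args) {
        double aot = 81.064838;        // AOT, default flags (cpu-clock, msec)
        double aotTuned = 54.584255;   // AOT + TieredStopAtLevel=1, CICompilerCount=1
        double xint = 52.138182;       // -Xint
        System.out.println(String.format(Locale.ROOT, "AOT tuned: %.1f%% less CPU",
                percentSaved(aot, aotTuned)));   // 32.7% less CPU
        System.out.println(String.format(Locale.ROOT, "-Xint: %.1f%% less CPU",
                percentSaved(aot, xint)));       // 35.7% less CPU
    }
}
```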

HTH

/Claes

>
> --
> The source code for the test is
>
> public class EmptyMainMethod {
>   public static void main(String[] args) {
>
>   }
> }
>
>
> --
> This delay seems consistent across most programs written by school
> students learning Java.
>
> Context for the request: I am the developer of the Codiva.io online Java
> IDE <https://www.codiva.io>. Many teachers recommend it to their
> students for learning Java. To support spiky load, I run each program
> on the server in a container with reduced resource limits.
> At a 10% CPU limit, the difference grows to around 200 ms.
>

From lutz.schmidt at sap.com  Tue Sep 11 10:43:37 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 11 Sep 2018 10:43:37 +0000
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <F246E560-332C-4638-812E-B2306CCC1135@qq.com>
References: <F246E560-332C-4638-812E-B2306CCC1135@qq.com>
Message-ID: <DC26AA72-A155-4BDD-8E14-8AEA21DC964B@sap.com>

Hi Andy, 

Unfortunately, we have two mail threads on the same topic, so I will try to give a very brief summary:

  -  I agree, it's not your fault. The "user" is InitializeNode::complete_stores().
  -  clear_memory() is not prepared to handle offsets that are not BytesPerInt-aligned.
  -  Your patch just leaves the memory uninitialized in case of unaligned offsets.
  -  Vladimir's patch fixes the root cause, i.e. the caller of clear_memory().
  -  Your patch removes the safety net from clear_memory(), which is another reason why I don't like it.

In essence, I suggest going with Vladimir's patch, provided the tests Vladimir requested work out OK:

---BEGIN Vladimir's patch ---
diff -r b9f6a4427da9 src/hotspot/share/opto/memnode.cpp
--- a/src/hotspot/share/opto/memnode.cpp
+++ b/src/hotspot/share/opto/memnode.cpp
@@ -4095,10 +4095,11 @@
        // See if this store needs a zero before it or under it.
        intptr_t zeroes_needed = st_off;

-      if (st_size < BytesPerInt) {
+      if (st_size < BytesPerInt || (zeroes_needed % BytesPerInt) != 0) {
          // Look for subword stores which only partially initialize words.
          // If we find some, we must lay down some word-level zeroes first,
          // underneath the subword stores.
+        // Do the same for unaligned stores.
          //
          // Examples:
          //   byte[] a = { p,q,r,s }  =>  a[0]=p,a[1]=q,a[2]=r,a[3]=s
---END Vladimir's patch ---
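As a rough illustration of what the one-line change tests, here is a standalone C++ sketch of the patched condition in memnode.cpp. It is not HotSpot code: `takes_subword_zeroing_path` is a hypothetical name, and `BytesPerInt` is defined locally with HotSpot's value of 4.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for HotSpot's BytesPerInt (4 bytes).
const int BytesPerInt = 4;

// Hypothetical helper mirroring the patched condition: a captured store is
// routed into the "lay down word-level zeroes first" branch if it is narrower
// than an int (subword store) or if its offset is not a multiple of
// BytesPerInt (unaligned store, the case added by the patch).
bool takes_subword_zeroing_path(intptr_t st_off, int st_size) {
  assert(st_off >= 0 && st_size > 0);
  intptr_t zeroes_needed = st_off;
  return st_size < BytesPerInt || (zeroes_needed % BytesPerInt) != 0;
}
```

For a 4-byte store at offset 13, the old test `st_size < BytesPerInt` alone is false, so the store skipped that branch; the patched condition is true.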

BTW, since I asked for precision, I have to correct myself: the length field of ArrayOop is at offset 12 (i.e. at klass_gap_offset_in_bytes()) only if UseCompressedClassPointers is true. It does not depend on UseCompressedOops.
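To make the offset arithmetic concrete, here is a small sketch. It assumes a 64-bit VM with the JDK 11-era array header: an 8-byte mark word followed by a 4-byte narrow klass when UseCompressedClassPointers is on, or an 8-byte klass pointer when it is off. The function name is hypothetical; note that UseCompressedOops is not an input at all.

```cpp
#include <cassert>

// Hypothetical helper modeling a 64-bit VM's array header layout.
int array_length_offset_in_bytes(bool use_compressed_class_pointers) {
  const int mark_word_bytes = 8;
  // With compressed class pointers the klass field is 4 bytes and the array
  // length fills the 4-byte "klass gap" right after it (offset 12).
  // Without them the klass field is 8 bytes and the length follows at 16.
  const int klass_bytes = use_compressed_class_pointers ? 4 : 8;
  const int offset = mark_word_bytes + klass_bytes;
  assert(offset % 4 == 0);  // the int-sized length field is int-aligned
  return offset;
}
```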

Thanks,
Lutz

From: Andy Law <944797358 at qq.com>
Date: Tuesday, 11. September 2018 at 02:36
To: "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>, Lutz Schmidt <lutz.schmidt at sap.com>, "aph at redhat.com" <aph at redhat.com>
Subject: Re: [PATCH] 8202414: Unsafe crash in C2

Hi Lutz and Andrew,

Thank you for your reply and sorry for my typos :)

TL;DR
I think it is the optimization in `clear_memory()` that causes this problem; as I understand it, this may not be the user's fault :)

Running the example from the bug report with `-XX:DisableIntrinsic=_putInt`, or with only the interpreter/C1, makes the problem go away, because the program then takes a different branch.

In `clear_memory()` there is an optimization that clears the contents of the memory. In fact:

    if (done_offset > start_offset) {  // [1]
        // it clears the memory from start_offset to done_offset
    }

    if (done_offset < end_offset) {  // [2]
        // it clears the memory at done_offset by storing an int 0
    }

|<--------------- 16-byte header ------->| <---- 4-byte arr length (new byte[397]) ---->
                                          |   0000   0001   1000   1101   |
If the offset is aligned, there is no problem. But if it is not aligned, as in the example, this optimization wrongly clears `0000 0001` to `0000 0000`, so the array length becomes 141. The VM then crashes when a GC happens.

Since this optimization causes the problem, not doing it when the address is unaligned may solve the problem.
By the way, I didn't find anything about alignment in the documentation :( but I think you're right that accesses should be aligned when using Unsafe, though it is a deprecated API :) The problem can't be reproduced with the template interpreter or the C1 compiler, because they don't perform this optimization.
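The arithmetic behind "the array length becomes 141" can be checked directly: 397 is 0x0000018D, and zeroing the byte that holds 0x01 leaves 0x8D = 141. A minimal sketch follows; the helper name is hypothetical. Assuming little-endian x86 and the length field at offset 12, that byte would sit at offset 13.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: zero one byte of a 32-bit value, where byte_index 0 is
// the least significant byte (the lowest address on little-endian x86).
uint32_t zero_byte(uint32_t value, int byte_index) {
  assert(byte_index >= 0 && byte_index < 4);
  return value & ~(UINT32_C(0xFF) << (8 * byte_index));
}

// zero_byte(397, 1) == 141: 0x0000018D with byte 1 (0x01) cleared is 0x0000008D.
```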

Thank you:),
Andy


From 944797358 at qq.com  Tue Sep 11 11:16:10 2018
From: 944797358 at qq.com (Andy Law)
Date: Tue, 11 Sep 2018 19:16:10 +0800
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <DC26AA72-A155-4BDD-8E14-8AEA21DC964B@sap.com>
References: <F246E560-332C-4638-812E-B2306CCC1135@qq.com>
 <DC26AA72-A155-4BDD-8E14-8AEA21DC964B@sap.com>
Message-ID: <3CF1BA06-6C2B-4DFC-ACA9-2F082766922B@qq.com>

Hi Lutz,

Nice summary and I totally agree with your points.

Thanks,
Andy



From vladimir.kozlov at oracle.com  Tue Sep 11 16:01:44 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 11 Sep 2018 09:01:44 -0700
Subject: [12] RFR(S): 8210387: C2 compilation fails with
 "assert(node->_last_del == _last) failed: must have deleted the edge just
 produced"
In-Reply-To: <bfb5a29e-4611-1154-e921-118ba42ad617@oracle.com>
References: <bfb5a29e-4611-1154-e921-118ba42ad617@oracle.com>
Message-ID: <0d4748d9-6d7d-b76f-a950-8040b5911151@oracle.com>

Looks good.

Thanks,
Vladimir

On 9/11/18 1:31 AM, Tobias Hartmann wrote:
> Hi,
> 
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8210387
> http://cr.openjdk.java.net/~thartmann/8210387/webrev.00/
> 
> During CCP, before removing unreachable regions, we first replace all dead phi users with TOP by
> calling PhaseIterGVN::replace_node. Replacing a phi node can trigger removal of other nodes some of
> which might also be phi users of the same region. This breaks verification of the DUIterator because
> only one node is expected to be removed. Very similar to the code in RegionNode::Ideal [1], we need
> to refresh the iterator and start iterating from the beginning for as long as there is progress.
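The "refresh the iterator and restart while there is progress" pattern can be sketched in plain C++. This is a toy model, not HotSpot code: the string "users", the `cascade` map (standing in for nodes that die as a side effect of replace_node), and `remove_dead_users` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model: removing one user may also remove other users from the same
// list (the way replacing one phi can kill other phis of the same region),
// so iteration state over the list cannot be trusted afterwards.
void remove_dead_users(std::vector<std::string>& users,
                       const std::vector<std::string>& dead,
                       const std::map<std::string, std::vector<std::string>>& cascade) {
  auto is_dead = [&dead](const std::string& u) {
    return std::find(dead.begin(), dead.end(), u) != dead.end();
  };
  auto erase_value = [&users](const std::string& v) {
    users.erase(std::remove(users.begin(), users.end(), v), users.end());
  };

  bool progress = true;
  while (progress) {              // repeat for as long as something was removed
    progress = false;
    for (const std::string& u : users) {
      if (is_dead(u)) {
        const std::string victim = u;  // copy: `u` dangles once users changes
        erase_value(victim);
        auto it = cascade.find(victim);
        if (it != cascade.end()) {
          for (const std::string& other : it->second) erase_value(other);
        }
        progress = true;
        break;                    // iteration state is stale: start over
      }
    }
  }

  for (const std::string& u : users) {
    assert(!is_dead(u));          // nothing dead is left behind
  }
}
```

Restarting from the beginning costs extra passes, but it never touches a stale iterator, which is the property the DUIterator verification checks.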
> 
> Thanks,
> Tobias
> 
> [1] http://hg.openjdk.java.net/jdk/jdk/file/bbc7157ad9c5/src/hotspot/share/opto/cfgnode.cpp#l538
> 

From tobias.hartmann at oracle.com  Tue Sep 11 16:03:36 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 11 Sep 2018 18:03:36 +0200
Subject: [12] RFR(S): 8210387: C2 compilation fails with
 "assert(node->_last_del == _last) failed: must have deleted the edge just
 produced"
In-Reply-To: <0d4748d9-6d7d-b76f-a950-8040b5911151@oracle.com>
References: <bfb5a29e-4611-1154-e921-118ba42ad617@oracle.com>
 <0d4748d9-6d7d-b76f-a950-8040b5911151@oracle.com>
Message-ID: <6596277b-b5f8-5fe5-217a-89997a523f23@oracle.com>

Thanks Vladimir.

Best regards,
Tobias


From vladimir.kozlov at oracle.com  Tue Sep 11 17:55:03 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 11 Sep 2018 10:55:03 -0700
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <3CF1BA06-6C2B-4DFC-ACA9-2F082766922B@qq.com>
References: <F246E560-332C-4638-812E-B2306CCC1135@qq.com>
 <DC26AA72-A155-4BDD-8E14-8AEA21DC964B@sap.com>
 <3CF1BA06-6C2B-4DFC-ACA9-2F082766922B@qq.com>
Message-ID: <a097b72b-605c-b14e-5b74-bc285f704fb9@oracle.com>

Dean has additional comments in the bug report regarding unaligned stores in general (C2_UNALIGNED needs to be set).
This makes me nervous because stores collected by the Initialize node are converted to raw stores [1], which could be combined
and may change such properties.

I think we need to make sure such unaligned stores are not collected by the Initialize node. In fact, we do have such a check [2],
but in this case the store is not marked as unaligned because, as Dean pointed out before, the intrinsic for putInt() does not mark
the store as unaligned [3].

We can argue that putIntUnaligned() should be used, but we can't guarantee that the user will use the correct one, or that it is even
available, as this bug shows. That is why we need to check whether a store/load is unaligned regardless of which Unsafe method
is used, at least when the offset is constant.

I think we need to fix LibraryCallKit::inline_unsafe_access() and also InitializeNode::can_capture_store(), because
during the parse phase the offset may not be constant.

I am retracting my suggested fix and will let someone else work on this.

Thanks,
Vladimir

[1] http://hg.openjdk.java.net/jdk/jdk/file/74dde8b66b7f/src/hotspot/share/opto/memnode.cpp#l3721
[2] http://hg.openjdk.java.net/jdk/jdk/file/74dde8b66b7f/src/hotspot/share/opto/memnode.cpp#l3478
[3] http://hg.openjdk.java.net/jdk/jdk/file/74dde8b66b7f/src/hotspot/share/opto/library_call.cpp#l2315


From sandhya.viswanathan at intel.com  Tue Sep 11 21:58:43 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 11 Sep 2018 21:58:43 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Monday, September 10, 2018 6:09 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Very nice. Thank you, Sandhya.

I would like to see more meaningful names in the .ad files: instead of rreg* and ovec*, something like vlReg* and legVec*.

>>> Yes, accepted.

The new load_from_* and load_to_* instructions in the .ad files should be renamed as follows and colocated with the other Move*_reg_reg* instructions:

instruct MoveF2VL(vlRegF dst, regF src)
instruct MoveVL2F(regF dst, vlRegF src)
>>> Yes, accepted.

You did not add instructions to load these registers from memory (and stack). What happens in such cases, when you need to load or store?
>>> Let us take an example, e.g. loading into rregF: first it gets loaded from memory into regF, and then a register-to-register move puts it into rregF, and vice versa.

Also, please explain why these registers are used when UseAVX == 0:

+instruct absD_reg(rregD dst) %{
    predicate((UseSSE>=2) && (UseAVX == 0));

We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
  661   if (UseAVX < 3) {
  662     _features &= ~CPU_AVX512F;

>>> Yes, accepted. It could be regD here.

The following checks could be combined into a new function in vm_version_x86.hpp (you already have some):

+reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{
VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() &&
VM_Version::supports_avx512vl() %} );

>>> Yes, accepted. 
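Folding the four checks into one helper could look roughly like the sketch below. The feature-bit names and values are hypothetical stand-ins for the real CPU feature flags in vm_version_x86.hpp, and `supports_evex_bw_dq_vl` is an illustrative name, not necessarily what the patch will use.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits (HotSpot keeps its own CPU_* flags).
const uint64_t FEAT_EVEX     = uint64_t{1} << 0;
const uint64_t FEAT_AVX512BW = uint64_t{1} << 1;
const uint64_t FEAT_AVX512DQ = uint64_t{1} << 2;
const uint64_t FEAT_AVX512VL = uint64_t{1} << 3;

// True only if every bit in `required` is present in `features`.
bool has_all_features(uint64_t features, uint64_t required) {
  assert(required != 0);  // an empty requirement would be vacuously true
  return (features & required) == required;
}

// Single predicate replacing the four-way && in the reg_class_dynamic
// definition: true selects vectors_reg_evex, false vectors_reg_legacy.
bool supports_evex_bw_dq_vl(uint64_t features) {
  return has_all_features(features,
                          FEAT_EVEX | FEAT_AVX512BW | FEAT_AVX512DQ | FEAT_AVX512VL);
}
```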

I would suggest testing these changes on different machines (non-AVX512 and AVX512) and with different UseAVX values.

>>> Will do. 

Thanks,
Vladimir

On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
> Recently there have been a couple of high-priority issues regarding
> C2's usage of the upper bank of XMM registers
> (XMM16-XMM31):
> 
> https://bugs.openjdk.java.net/browse/JDK-8207746
> 
> https://bugs.openjdk.java.net/browse/JDK-8209735
> 
> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
> 
> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
> 
> The patch provides a restricted set of registers to the match rules in 
> the ad file based on the underlying architecture.
> 
> The aim is to remove the special handling/workarounds from the macro assembler and assembler.
> 
> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
> 
> Your review and feedback is very welcome.
> 
> Best Regards,
> 
> Sandhya
> 

From vladimir.kozlov at oracle.com  Wed Sep 12 00:11:25 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 11 Sep 2018 17:11:25 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
Message-ID: <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>

Thank you.

I want to discuss next issue:

 > You did not add instructions to load these registers from memory (and stack). What happens in such cases, when you need to load or store?
 >>>> Let us take an example, e.g. loading into rregF: first it gets loaded from memory into regF, and then a register-to-register move puts it into rregF, and vice versa.

This is what I thought. You increase register pressure this way, which may cause spills to the stack.
Also, we don't check whether the registers could be the same; as a result, you may get unneeded moves.

I would advise adding memory moves at least.

Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions:
http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
Do these instructions work when avx512vl is not available? I see that for vectors you use a
vpxor+vinserti* combination.

Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
if ((UseAVX < 2) || VM_Version::supports_avx512vl())

Should it be (UseAVX < 3)?

Thanks,
Vladimir


From sandhya.viswanathan at intel.com  Wed Sep 12 01:13:32 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Wed, 12 Sep 2018 01:13:32 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Thanks a lot for the detailed review. I really appreciate your feedback. 
Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Tuesday, September 11, 2018 5:11 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Thank you.

I want to discuss next issue:

 > You did not add instructions to load these registers from memory (and stack). What happens in such cases, when you need to load or store?
 >>>> Let us take an example, e.g. loading into rregF: first it gets loaded from memory into regF, and then a register-to-register move puts it into rregF, and vice versa.

This is what I thought. You increase register pressure this way, which may cause spills to the stack.
Also, we don't check whether the registers could be the same; as a result, you may get unneeded moves.

I would advise adding memory moves at least.

Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2regmask (matcher.cpp), and I would like the register allocator to get all the possible registers on an architecture for the idealreg2regmask. I was concerned that multiple instruct rules in the .ad file for LoadF from memory might cause problems: I would have to give a higher cost to loading into a restricted register set like vlReg. So I decided that the register allocator can handle this much better than rules I add for loading from memory. The background is that regF always covers all the available registers and vlRegF is the restricted register set; likewise for VecS and legVecS. Let me know your thoughts on this, and whether I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
  MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
#endif
  MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
  MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
  MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
  MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
  MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
  ....
  idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
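To illustrate the concern, here is a toy C++ model, not the real matcher or ADLC output. It assumes that among the rules matching the spill LoadF, the cheapest one wins and its output register mask becomes idealreg2regmask[Op_RegF]; the costs and 32-bit masks are made-up values. Under that assumption, a cheaper load rule into a restricted class such as vlRegF would shrink the spill mask, which is why such a rule would need a higher cost.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One candidate instruct rule for a LoadF: its cost and the register mask of
// its result (bit i set => xmm i may hold the value). Hypothetical model.
struct LoadRule {
  int cost;
  uint32_t out_mask;
};

// Pick the cheapest rule (ties keep the earlier one) and return its output
// mask, playing the role of spillF->out_RegMask() in the excerpt above.
uint32_t spill_mask_for(const std::vector<LoadRule>& rules) {
  assert(!rules.empty());
  const LoadRule* best = &rules[0];
  for (const LoadRule& r : rules) {
    if (r.cost < best->cost) best = &r;
  }
  return best->out_mask;
}
```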

Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions:
http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
Do these instructions work when avx512vl is not available? I see that for vectors you use a
vpxor+vinserti* combination.

Sandhya >>> Yes, the scalar floating-point instructions are available with AVX512 (EVEX) encoding even when avx512vl is not available. That is why you see not just movflt and movdbl but all the other scalar operations, like addss, addsd, etc., using the entire XMM range (xmm0-31). In other words, they are AVX512F instructions.
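A rough model of this distinction, assuming 64-bit mode and comparing only scalar float/double forms with 128/256-bit packed float/double forms (integer byte/word forms also depend on AVX512BW, which is ignored here). The function is hypothetical; it only encodes that xmm16-xmm31 are reachable solely through EVEX encoding, which scalar forms get with AVX512F alone while 128/256-bit vector forms also need AVX512VL.

```cpp
#include <cassert>

// Number of XMM registers an instruction form can address in 64-bit mode.
// is_scalar: ss/sd forms; otherwise a 128/256-bit packed float/double form.
int addressable_xmm_registers(bool is_scalar, bool avx512f, bool avx512vl) {
  assert(avx512f || !avx512vl);  // AVX512VL only exists on top of AVX512F
  if (!avx512f) return 16;       // no EVEX: xmm0-xmm15 only
  if (is_scalar) return 32;      // EVEX scalar forms are plain AVX512F
  return avx512vl ? 32 : 16;     // narrower vectors need VL to use EVEX
}
```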

Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
if ((UseAVX < 2) || VM_Version::supports_avx512vl())

Should it be (UseAVX < 3)?

Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
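The effect of the corrected threshold can be sketched as a predicate. This is a simplified, hypothetical stand-in for the check in x86.ad's vector spill code. It only models which branch is taken, given that EVEX is switched off when UseAVX < 3 (as in the vm_version excerpt quoted earlier).

```cpp
#include <cassert>

// true  => ordinary vector move path
// false => path intended for AVX-512 hardware without AVX512VL
// Original guard:  (UseAVX < 2) || supports_avx512vl
// Corrected guard: (UseAVX < 3) || supports_avx512vl
bool takes_plain_spill_path(int use_avx, bool supports_avx512vl) {
  assert(use_avx >= 0 && use_avx <= 3);
  return use_avx < 3 || supports_avx512vl;
}
```

With UseAVX=2 and supports_avx512vl() false, the original guard evaluated to false and selected the AVX-512-without-VL branch even though EVEX is off; the corrected guard returns true.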

Thanks,
Vladimir

On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Monday, September 10, 2018 6:09 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; 
> hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> Very nice. Thank you, Sandhya.
> 
> I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*.
> 
>>>> Yes, accepted.
> 
> New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate with other Move*_reg_reg* instructions:
> 
> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, 
> vlRegF src)
>>>> Yes, accepted.
> 
> You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
> 
> Also please explain why these registers are used when UseAVX == 0?:
> 
> +instruct absD_reg(rregD dst) %{
>      predicate((UseSSE>=2) && (UseAVX == 0));
> 
> we switch off evex so regular regD (only legacy register in this case) should work too:
>    661   if (UseAVX < 3) {
>    662     _features &= ~CPU_AVX512F;
> 
>>>> Yes, accepted. It could be regD here.
> 
> The following checks could be combined by using a new function in vm_version_x86.hpp (you already have some):
> 
> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{
>     VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>     VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
> 
>>>> Yes, accepted.
> 
> I would suggest testing these changes on different machines (non-AVX512 and AVX512) and with different UseAVX values.
> 
>>>> Will do.
> 
> Thanks,
> Vladimir
> 
> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>> Recently there have been a couple of high-priority issues with regard
>> to C2's usage of the high bank of XMM registers
>> (XMM16-XMM31):
>>
>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>
>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>
>> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>>
>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.00/>
>>
>> The patch provides a restricted set of registers to the match rules 
>> in the ad file based on the underlying architecture.
>>
>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>
>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
>>
>> Your review and feedback is very welcome.
>>
>> Best Regards,
>>
>> Sandhya
>>
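The register classes discussed in this thread select either the full EVEX bank (XMM0-XMM31) or the legacy bank (XMM0-XMM15) depending on CPU features and UseAVX, and Vladimir suggests folding the repeated feature checks into one helper. The Java sketch below is a simplified model of that selection, not HotSpot code: `CpuFeature`, `RegClassSketch`, `supportsAvx512BwDqVl()` and `vectorRegisterCount()` are hypothetical names, and clearing only an EVEX bit for `UseAVX < 3` is a simplification of the `_features` masking quoted above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical CPU feature flags consulted by the .ad predicates (not HotSpot's enum).
enum CpuFeature { EVEX, AVX512BW, AVX512DQ, AVX512VL }

final class RegClassSketch {
    private final EnumSet<CpuFeature> features = EnumSet.noneOf(CpuFeature.class);

    private RegClassSketch(Set<CpuFeature> enabled) {
        features.addAll(enabled);
    }

    // Models `if (UseAVX < 3) _features &= ~CPU_AVX512F;`: without AVX-512F
    // there is no EVEX encoding, so only the legacy bank remains usable.
    static RegClassSketch forUseAvx(int useAvx, Set<CpuFeature> hardware) {
        EnumSet<CpuFeature> enabled = EnumSet.noneOf(CpuFeature.class);
        enabled.addAll(hardware);
        if (useAvx < 3) {
            enabled.remove(CpuFeature.EVEX);
        }
        return new RegClassSketch(enabled);
    }

    // One helper in place of the repeated chain
    // supports_evex() && supports_avx512bw() && supports_avx512dq() && supports_avx512vl()
    // (hypothetical name).
    boolean supportsAvx512BwDqVl() {
        return features.containsAll(EnumSet.allOf(CpuFeature.class));
    }

    // Mirrors reg_class_dynamic vectors_reg_bwdqvl: the EVEX class (XMM0-XMM31)
    // when the predicate holds, otherwise the legacy class (XMM0-XMM15).
    int vectorRegisterCount() {
        return supportsAvx512BwDqVl() ? 32 : 16;
    }
}
```

In this model, hardware with all four features but `UseAVX=2` falls back to the 16 legacy registers, and so does hardware that lacks AVX512VL.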

From vladimir.kozlov at oracle.com  Wed Sep 12 03:53:46 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 11 Sep 2018 20:53:46 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
Message-ID: <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>

Thank you, Sandhya

I am satisfied with your detailed answer on the memory load issue. Okay, let's not add them.

Vladimir

On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Thanks a lot for the detailed review. I really appreciate your feedback.
> Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, September 11, 2018 5:11 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> Thank you.
> 
> I want to discuss the following issue:
> 
>   > You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>   >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
> 
> This is what I thought. You increase register pressure this way, which may cause spills to the stack.
> Also, we don't check whether the registers could be the same; as a result, you may get unneeded moves.
> 
> I would advise adding memory moves at least.
> 
> Sandhya >>>  I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2regmask (matcher.cpp). I would like the register allocator to get all the possible registers on an architecture for the idealreg2regmask. I was concerned that multiple instruct rules in the .ad file for LoadF from memory might cause problems: I would have to give a higher cost to loading into a restricted register set like vlReg. Then I decided that the register allocator can handle this much better than I could by adding rules to load from memory. This is with the background that regF always covers all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>    MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
> #endif
>    MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>    MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>    MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>    MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>    MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>    ....
>    idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
> 
> Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions:
> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
> Do these instructions work when avx512vl is not available? I see that for vectors you use a
> vpxor+vinserti* combination.
> 
> Sandhya >>> Yes, the scalar floating-point instructions are available with the AVX512 (EVEX) encoding even when avx512vl is not available. That is why you see not just movflt and movdbl but all the other scalar operations, like addss, addsd, etc., using the entire XMM range (xmm0-31). In other words, they are AVX512F instructions.
> 
> Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
> 
> Should it be (UseAVX < 3)?
> 
> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
> 
> Thanks,
> Vladimir
> 
> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Monday, September 10, 2018 6:09 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>> hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Very nice. Thank you, Sandhya.
>>
>> I would like to see more meaningful names in the .ad files: instead of rreg* and ovec*, something like vlReg* and legVec*.
>>
>>>>> Yes, accepted.
>>
>> The new load_from_* and load_to_* instructions in the .ad files should be renamed as follows and colocated with the other Move*_reg_reg* instructions:
>>
>> instruct MoveF2VL(vlRegF dst, regF src)
>> instruct MoveVL2F(regF dst, vlRegF src)
>>>>> Yes, accepted.
>>
>> You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>
>> Also please explain why these registers are used when UseAVX == 0?:
>>
>> +instruct absD_reg(rregD dst) %{
>>       predicate((UseSSE>=2) && (UseAVX == 0));
>>
>> We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
>>     661   if (UseAVX < 3) {
>>     662     _features &= ~CPU_AVX512F;
>>
>>>>> Yes, accepted. It could be regD here.
>>
>> The following checks could be combined by using a new function in vm_version_x86.hpp (you already have some):
>>
>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{
>>     VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>>     VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>
>>>>> Yes, accepted.
>>
>> I would suggest testing these changes on different machines (non-AVX512 and AVX512) and with different UseAVX values.
>>
>>>>> Will do.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>> Recently there have been a couple of high-priority issues with regard
>>> to C2's usage of the high bank of XMM registers
>>> (XMM16-XMM31):
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.00/>
>>>
>>> The patch provides a restricted set of registers to the match rules
>>> in the ad file based on the underlying architecture.
>>>
>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>
>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
>>>
>>> Your review and feedback is very welcome.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>
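Sandhya's reasoning about `idealreg2regmask` can be illustrated with a simplified model: the spill mask for a register type comes from the output mask of whichever rule matches a synthetic load such as `LoadF`. Assuming, as a simplification, that the matcher picks the cheapest applicable rule, an extra load rule into a restricted class like `vlRegF` would have to cost more than the unrestricted `regF` rule, or the spill mask would silently shrink. `LoadRule`, `spillMaskSize()` and the cost values used below are hypothetical and are not the `Matcher` API.

```java
import java.util.List;

final class SpillMaskSketch {
    // A candidate match rule for an ideal load (hypothetical simplification).
    static final class LoadRule {
        final String name;
        final String idealOp;
        final int cost;
        final int maskSize; // number of registers in the rule's output mask

        LoadRule(String name, String idealOp, int cost, int maskSize) {
            this.name = name;
            this.idealOp = idealOp;
            this.cost = cost;
            this.maskSize = maskSize;
        }
    }

    // Size of the spill mask taken from the cheapest rule matching idealOp
    // (on a tie, the earlier rule in the list is kept).
    static int spillMaskSize(String idealOp, List<LoadRule> rules) {
        LoadRule best = null;
        for (LoadRule r : rules) {
            if (r.idealOp.equals(idealOp) && (best == null || r.cost < best.cost)) {
                best = r;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no rule matches " + idealOp);
        }
        return best.maskSize;
    }
}
```

With hypothetical costs of 125 for a `regF` load (32 registers) and 150 for a `vlRegF` load (16 registers), the model keeps the full mask; make the restricted rule cheaper and the `Op_RegF` spill mask drops to 16 registers.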

From rwestrel at redhat.com  Wed Sep 12 07:56:58 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 12 Sep 2018 09:56:58 +0200
Subject: RFR: 8210578: AArch64: Invalid encoding for fmlsvs instruction
In-Reply-To: <80d98039-c412-b638-c764-0794da1ba26f@redhat.com>
References: <80d98039-c412-b638-c764-0794da1ba26f@redhat.com>
Message-ID: <dk68t47rvw5.fsf@rwestrel.remote.csb>


Patch looks good to me.

Roland.

From adinn at redhat.com  Wed Sep 12 08:07:39 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 12 Sep 2018 09:07:39 +0100
Subject: RFR: 8210578: AArch64: Invalid encoding for fmlsvs instruction
In-Reply-To: <dk68t47rvw5.fsf@rwestrel.remote.csb>
References: <80d98039-c412-b638-c764-0794da1ba26f@redhat.com>
 <dk68t47rvw5.fsf@rwestrel.remote.csb>
Message-ID: <eddc6c67-0249-1970-62c4-ae0f894c928a@redhat.com>

On 12/09/18 08:56, Roland Westrelin wrote:
> 
> Patch looks good to me.
Thanks for the review Roland.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From Pengfei.Li at arm.com  Wed Sep 12 09:50:44 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Wed, 12 Sep 2018 09:50:44 +0000
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <dk61sa0wigj.fsf@rwestrel.remote.csb>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
Message-ID: <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi,

I've updated the patch based on Vladimir's comment. I added checks for SubI on both branches of the diamond phi.
Also, thanks Roland for the suggestion to support a Phi with 3 or more inputs. But I think the matching rule would be much more complex if we added this, and I'm not sure there are any real-world scenarios that would benefit from it, so I didn't add it.

New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/
I've run the full jtreg tests with the new patch and found no new issues.

Please let me know if you have other comments or suggestions. If no further issues, I need your help to sponsor and push the patch.

--
Thanks,
Pengfei


> -----Original Message-----
> From: Roland Westrelin <rwestrel at redhat.com>
> Sent: Tuesday, September 11, 2018 16:24
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Pengfei Li (Arm
> Technology China) <Pengfei.Li at arm.com>; dean.long at oracle.com; hotspot-
> compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net
> Cc: nd <nd at arm.com>
> Subject: Re: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-
> of-2 check
> 
> 
> > The only comment I have is to add check for SubI on other branch (not
> > only on True branch). Negation may occur on either branch since you
> > accept all conditions for negation.
> 
> Can't we make this more general and support a phi with any number of
> inputs (not only 2 data inputs) as long as it's a mix of X and -X?
> 
> Roland.
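The pattern under discussion can be sketched as a check on the phi's data inputs: if every input is either X or its negation `SubI(0, X)`, a comparison of the phi with zero can compare X with zero instead, because X is zero exactly when -X is (this also holds for `Integer.MIN_VALUE`, whose negation is itself). The sketch accepts the negation on either branch, as Vladimir requested, and any number of inputs, as Roland suggested. `Node`, `neg()` and `commonUnnegated()` are hypothetical stand-ins, not C2's `PhiNode` or `SubINode` classes, and this is not the code in the webrev.

```java
import java.util.List;

final class PhiNegSketch {
    // Minimal stand-in for an ideal-graph node (hypothetical, not C2's Node class).
    static final class Node {
        final String op;          // e.g. "Parm", "ConI", "SubI"
        final int con;            // constant value, meaningful only for "ConI"
        final List<Node> inputs;

        Node(String op, int con, Node... inputs) {
            this.op = op;
            this.con = con;
            this.inputs = List.of(inputs);
        }
    }

    static Node conI(int v) { return new Node("ConI", v); }

    // Builds SubI(0, x), the shape the conditional negation produces.
    static Node neg(Node x) { return new Node("SubI", 0, conI(0), x); }

    static boolean isZeroMinus(Node n) {
        return n.op.equals("SubI")
                && n.inputs.get(0).op.equals("ConI")
                && n.inputs.get(0).con == 0;
    }

    // True if every phi data input is x itself or SubI(0, x).
    static boolean allXOrNegX(List<Node> phiInputs, Node x) {
        for (Node in : phiInputs) {
            boolean isX = in == x;
            boolean isNegX = isZeroMinus(in) && in.inputs.get(1) == x;
            if (!isX && !isNegX) {
                return false;
            }
        }
        return true;
    }

    // Returns X if the phi merges only X and -X (negation on any branch,
    // any number of inputs), otherwise null. CmpI(phi, 0) could then be
    // rewritten as CmpI(X, 0).
    static Node commonUnnegated(List<Node> phiInputs) {
        if (phiInputs.isEmpty()) {
            return null;
        }
        Node first = phiInputs.get(0);
        if (allXOrNegX(phiInputs, first)) {
            return first;
        }
        if (isZeroMinus(first) && allXOrNegX(phiInputs, first.inputs.get(1))) {
            return first.inputs.get(1);
        }
        return null;
    }
}
```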

From goetz.lindenmaier at sap.com  Wed Sep 12 10:12:53 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Wed, 12 Sep 2018 10:12:53 +0000
Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across
 safepoint
In-Reply-To: <a86078397e244b24af72716b50f221f6@sap.com>
References: <a86078397e244b24af72716b50f221f6@sap.com>
Message-ID: <41135339cfc6421d86cd3b7147eee525@sap.com>

Hi Martin,

I had a look at your fix and it looks good.
I currently can't judge though whether even more
vector support is needed in some other place.
But this is not the subject of this fix, I guess.

Reviewed.

Best regards,
  Goetz.

PS: I would appreciate it if you put the 'save_registers'
argument on a line of its own wherever this is done for
all other arguments.  No new webrev needed.


From: Doerr, Martin
Sent: Freitag, 7. September 2018 18:12
To: 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>; Michihiro Horie (HORIE at jp.ibm.com) <HORIE at jp.ibm.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint

Hi,

We noticed that the RegisterSaver lacks code to save and restore the vector registers on PPC64. I'd like to fix that.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8210497

Webrev:
http://cr.openjdk.java.net/~mdoerr/8210497_PPC64_save_CR/webrev.00/

This webrev already fixes the following tests when JDK-8208171 webrev.03 is applied:
compiler/runtime/safepoints/TestRegisterRestoring.java
compiler/runtime/Test7196199.java

I'll try to test the OopMap part. This may be tricky.

Best regards,
Martin


From lutz.schmidt at sap.com  Wed Sep 12 10:23:51 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Wed, 12 Sep 2018 10:23:51 +0000
Subject: [CAUTION] RE: RFR(M): 8210497: [PPC64] Vector registers not saved
 across safepoint
Message-ID: <762D04D8-0739-4212-93EE-4B901D3D9031@sap.com>

Hi Martin, 

your changes look good overall. I'm not a reviewer, so that judgement doesn't help you much.

I have found a few details you may want to consider:

src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp
@line 272: the new frame is pushed, but @line 275 comment says frame is not yet pushed.
Is there a reason why you need both, R30 and R31, as scratch in push_frame_reg_args_and_save_live_registers()? 

Regards,
Lutz




From martin.doerr at sap.com  Wed Sep 12 10:44:32 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 12 Sep 2018 10:44:32 +0000
Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across
 safepoint
Message-ID: <1246d39ddb974707aaed8b9f66791f43@sap.com>

Hi Götz and Lutz,

thank you for the reviews. I'll update what you requested before pushing:

I'll add line breaks for the new arguments.
And I'll update the comment. Thanks for pointing me to it, Lutz.
R31 is used to determine the return pc which is used by one of the callers (generate_handler_blob). Please note that this register usage is unrelated to this change and I didn't touch it in this changelist.

Best regards,
Martin


-----Original Message-----
From: Schmidt, Lutz 
Sent: Mittwoch, 12. September 2018 12:24
To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>; Michihiro Horie (HORIE at jp.ibm.com) <HORIE at jp.ibm.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>
Subject: Re: RE: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint

Hi Martin, 

your changes look good overall. I'm not a reviewer, so that judgement doesn't help you much.

I have found a few details you may want to consider:

src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp
@line 272: the new frame is pushed, but @line 275 comment says frame is not yet pushed.
Is there a reason why you need both, R30 and R31, as scratch in push_frame_reg_args_and_save_live_registers()? 

Regards,
Lutz




From aph at redhat.com  Wed Sep 12 14:36:40 2018
From: aph at redhat.com (Andrew Haley)
Date: Wed, 12 Sep 2018 15:36:40 +0100
Subject: JIT: C2 doesn't skip post barrier for new allocated objects
In-Reply-To: <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw@alibaba-inc.com>
References: <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw@alibaba-inc.com>
Message-ID: <eede4c68-c55d-0ccb-a4d6-5ad783e38204@redhat.com>

On 09/10/2018 09:39 AM, Kuai Wei wrote:
>   Recently I checked the optimization of reducing G1 post barrier for new allocated object. But I found it doesn't work as expected.
> I wrote a simple test case to store oop in initialize function or just after init function .

I believe you are correct. We need a bug report created for this.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From HORIE at jp.ibm.com  Wed Sep 12 16:10:51 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Thu, 13 Sep 2018 01:10:51 +0900
Subject: RFR: PPC64: Mapping floating point registers to vsx registers in
 ppc.ad
Message-ID: <OF9584E067.98466151-ON00258306.0057D2FB-49258306.0058E2AD@notes.na.collabserv.com>


Dear all,

Would you please review the following change?

Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/

In the code currently emitted in ppc.ad for replicating a floating-point
value, the value is first stored to memory so that it can be loaded back
as an integer value. However, when SuperwordUseVSX is enabled, this is
redundant because the floating-point registers overlap VSX registers
0-31. We can use VSX instructions to replicate the floating-point value
by mapping the floating-point register to the corresponding VSX register.
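The overlap relied on here can be sketched as an index mapping. On Power ISA processors with VSX, floating-point register FPR n shares storage with doubleword 0 of VSX register VSR n for n in 0-31, and VSR 32-63 overlap the vector registers VR 0-31. The helper names below are hypothetical; the sketch only illustrates why an FPR can be used directly as a VSX operand without a store/load round trip.

```java
final class VsxAliasSketch {
    // FPR n occupies doubleword 0 of VSR n (0 <= n <= 31).
    static int vsrForFpr(int fpr) {
        if (fpr < 0 || fpr > 31) {
            throw new IllegalArgumentException("FPR out of range: " + fpr);
        }
        return fpr;
    }

    // VR n is the same storage as VSR n + 32 (0 <= n <= 31).
    static int vsrForVr(int vr) {
        if (vr < 0 || vr > 31) {
            throw new IllegalArgumentException("VR out of range: " + vr);
        }
        return vr + 32;
    }

    // True if the given VSR shares storage with some FPR.
    static boolean overlapsFpr(int vsr) {
        return vsr >= 0 && vsr <= 31;
    }
}
```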


Best regards,
--
Michihiro,
IBM Research - Tokyo

From rkennke at redhat.com  Wed Sep 12 20:11:51 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 12 Sep 2018 22:11:51 +0200
Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2
Message-ID: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com>

This introduces an abstraction to deal with object equality in
BarrierSetC2. This is needed by GCs that can have different copies of
the same object alive, like Shenandoah.

The approach chosen here is slightly different from what we did in, e.g.,
BarrierSetAssembler and the runtime Access API: instead of owning the
whole equality check, it only provides a resolve-like method that resolves
the operands to stable values. The reason is that this is easier to do
in intrinsics if those barriers are detached from the actual CmpP: the
barriers create new memory states, and we'd otherwise have to create
memory Phis around them, which is considerably more complex.

I chose to add a new resolve_for_obj_equals(a, b) method instead of
using two calls, resolve(a); resolve(b);, because this allows for an
optimization: if either a or b is known to be NULL, we can elide the
barriers for both. This is not possible with two independent
resolve() calls.
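Why a combined method enables this optimization can be shown with a toy model in which each operand carries one compile-time fact, "known null". A pair-wise method sees both facts: if either side is known null, `a == b` reduces to a null check of the other side, and every copy of a non-null object is non-null, so neither operand needs a barrier. A single-operand `resolve()` sees only its own operand and cannot make that decision for the other one. `Operand`, `resolve()` and `resolveForObjEquals()` here are hypothetical and are not the `BarrierSetC2` interface.

```java
final class ObjEqualsSketch {
    // An operand of a pointer comparison plus one compile-time fact (hypothetical).
    static final class Operand {
        final String name;
        final boolean knownNull;
        final boolean resolved; // true once a resolve barrier has been applied

        Operand(String name, boolean knownNull, boolean resolved) {
            this.name = name;
            this.knownNull = knownNull;
            this.resolved = resolved;
        }

        Operand withBarrier() {
            return new Operand(name, knownNull, true);
        }
    }

    // Single-operand resolve: it sees only x, so it can skip the barrier
    // only when x itself is known to be null.
    static Operand resolve(Operand x) {
        return x.knownNull ? x : x.withBarrier();
    }

    // Pair-wise resolve: if either side is known null, the comparison only
    // asks whether the other side is null, which every copy of an object
    // answers the same way, so both barriers are elided.
    static Operand[] resolveForObjEquals(Operand a, Operand b) {
        if (a.knownNull || b.knownNull) {
            return new Operand[] { a, b };
        }
        return new Operand[] { resolve(a), resolve(b) };
    }
}
```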

Bug:
https://bugs.openjdk.java.net/browse/JDK-8210656
Webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/

Testing: passes hotspot/jtreg:tier1

What do you think about this?

Thanks,
Roman


From vladimir.kozlov at oracle.com  Wed Sep 12 20:52:59 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 12 Sep 2018 13:52:59 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error #
 ERROR: TEST FAILED: Cought IOException while receiving event packet
Message-ID: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>

http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8210220

Don't register an AOT method if the corresponding Java method has breakpoints (for debugging); otherwise the AOT
method will be executed, and it does not stop at the breakpoint. The JIT has a similar check [1].

I also removed unused AOT code that we had forgotten to remove.

Tested hs-tier1-3.

thanks,
Vladimir

[1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845

From vladimir.kozlov at oracle.com  Wed Sep 12 21:15:42 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 12 Sep 2018 14:15:42 -0700
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
 <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>

Looks good.

Thanks,
Vladimir

On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote:
> Hi,
> 
> I've updated the patch based on Vladimir's comment. I added checks for SubI on both branches of the diamond phi.
> Also, thanks Roland for the suggestion to support a Phi with 3 or more inputs. But I think the matching rule would be much more complex if we added this, and I'm not sure there are any real-world scenarios that would benefit from it, so I didn't add it.
> 
> New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/
> I've run the full jtreg tests with the new patch and found no new issues.
> 
> Please let me know if you have other comments or suggestions. If no further issues, I need your help to sponsor and push the patch.
> 
> --
> Thanks,
> Pengfei
> 
> 
>> -----Original Message-----
>> From: Roland Westrelin <rwestrel at redhat.com>
>> Sent: Tuesday, September 11, 2018 16:24
>> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Pengfei Li (Arm
>> Technology China) <Pengfei.Li at arm.com>; dean.long at oracle.com; hotspot-
>> compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net
>> Cc: nd <nd at arm.com>
>> Subject: Re: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-
>> of-2 check
>>
>>
>>> The only comment I have is to add check for SubI on other branch (not
>>> only on True branch). Negation may occur on either branch since you
>>> accept all conditions for negation.
>>
>> Can't we make this more general and support a phi with any number of
>> inputs (not only 2 data inputs) as long as it's a mix of X and -X?
>>
>> Roland.

From dean.long at oracle.com  Thu Sep 13 00:45:48 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 12 Sep 2018 17:45:48 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
Message-ID: <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>

Hi Vladimir.  C1 and C2 use ciEnv which also grabs locks and checks 
JvmtiExport::can_hotswap_or_post_breakpoint() and 
Dependencies::check_evol_method().  But if the breakpoint count can only 
be changed by the VM thread at a safepoint, then your fix looks good as 
long as we don't enter a safepoint before the code is registered.  How 
about adding a NoSafepointVerifier to publish_aot()?

dl


On 9/12/18 1:52 PM, Vladimir Kozlov wrote:
> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8210220
>
> Don't register AOT method if corresponding java method has breakpoints 
> (for debugging) otherwise AOT method will be executed which do not 
> stop at breakpoint. JIT has similar check [1].
>
> I also removed AOT code which is not used and we forgot to remove.
>
> Tested hs-tier1-3.
>
> thanks,
> Vladimir
>
> [1] 
> http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845


From vladimir.kozlov at oracle.com  Thu Sep 13 01:25:30 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 12 Sep 2018 18:25:30 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
 <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
Message-ID: <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>

Thank you, Dean

Breakpoint is set at safepoint:

http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411

But why is it important not to be at a safepoint in publish_aot()? If the AOT method is registered first and the breakpoint is set afterwards,
AOT methods will be deoptimized by CodeCache::flush_dependents_on_method(), which is called from BreakpointInfo::set().

Vladimir

On 9/12/18 5:45 PM, dean.long at oracle.com wrote:
> Hi Vladimir.  C1 and C2 use ciEnv which also grabs locks and checks JvmtiExport::can_hotswap_or_post_breakpoint() and 
> Dependencies::check_evol_method().  But if the breakpoint count can only be changed by the VM thread at a safepoint, 
> then your fix looks good as long as we don't enter a safepoint before the code is registered.  How about adding a 
> NoSafepointVerifier to publish_aot()?
> 
> dl
> 
> 
> On 9/12/18 1:52 PM, Vladimir Kozlov wrote:
>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8210220
>>
>> Don't register AOT method if corresponding java method has breakpoints (for debugging) otherwise AOT method will be 
>> executed which do not stop at breakpoint. JIT has similar check [1].
>>
>> I also removed AOT code which is not used and we forgot to remove.
>>
>> Tested hs-tier1-3.
>>
>> thanks,
>> Vladimir
>>
>> [1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845
> 

From dean.long at oracle.com  Thu Sep 13 01:51:32 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 12 Sep 2018 18:51:32 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
 <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
 <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>
Message-ID: <f57effec-c0ec-9f38-91cb-b647991bcd89@oracle.com>

On 9/12/18 6:25 PM, Vladimir Kozlov wrote:
> Thank you, Dean
>
> Breakpoint is set at safepoint:
>
> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411 
>
>
> But why it is important to not be at safepoint in publish_aot(). If 
> AOT is registered first and then breakpoint is set AOT methods will be 
> deoptimized by CodeCache::flush_dependents_on_method() which is called 
> from BreakpointInfo::set().

I mean you can't do this:

1) check breakpoint count
2) safepoint
3) register code

The AOT code is not visible to CodeCache::flush_dependents_on_method() 
until the cmpxchg().
NoSafepointVerifier would catch any changes in the future that introduce 
a safepoint.
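The ordering problem can be sketched as a check-then-publish sequence that must not contain a safepoint, since breakpoints are set only at safepoints and the AOT code is invisible to `CodeCache::flush_dependents_on_method()` until it is published. The Java model below uses a hypothetical global safepoint counter; `NoSafepointGuard` plays roughly the role proposed for `NoSafepointVerifier` here, and `compareAndSet` stands in for the `cmpxchg()`. All names are illustrative, and the `betweenCheckAndPublish` hook exists only so a test can simulate a safepoint inside the window.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

final class PublishAotSketch {
    // Incremented each time the modelled VM reaches a safepoint (hypothetical).
    static final AtomicInteger SAFEPOINTS = new AtomicInteger();

    static final class MethodModel {
        volatile int breakpointCount;   // in the VM, changed only at a safepoint
        final AtomicReference<String> code = new AtomicReference<>();
    }

    // Remembers the safepoint count at construction and fails if it changed.
    static final class NoSafepointGuard {
        private final int start = SAFEPOINTS.get();

        void check() {
            if (SAFEPOINTS.get() != start) {
                throw new IllegalStateException("safepoint between breakpoint check and publish");
            }
        }
    }

    // Returns true if the AOT code was installed for m.
    static boolean publishAot(MethodModel m, String aotCode, Runnable betweenCheckAndPublish) {
        NoSafepointGuard guard = new NoSafepointGuard();
        if (m.breakpointCount > 0) {
            return false;                            // 1) breakpoints set: do not register
        }
        betweenCheckAndPublish.run();                // work that must not reach a safepoint
        guard.check();                               // 2) a safepoint here would be a bug
        return m.code.compareAndSet(null, aotCode);  // 3) publish, like the cmpxchg()
    }
}
```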

dl

>
> Vladimir
>
> On 9/12/18 5:45 PM, dean.long at oracle.com wrote:
>> Hi Vladimir.  C1 and C2 use ciEnv which also grabs locks and checks 
>> JvmtiExport::can_hotswap_or_post_breakpoint() and 
>> Dependencies::check_evol_method().  But if the breakpoint count can 
>> only be changed by the VM thread at a safepoint, then your fix looks 
>> good as long as we don't enter a safepoint before the code is 
>> registered.  How about adding a NoSafepointVerifier to publish_aot()?
>>
>> dl
>>
>>
>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8210220
>>>
>>> Don't register an AOT method if the corresponding Java method has 
>>> breakpoints (for debugging); otherwise the AOT method will be executed 
>>> and will not stop at the breakpoint. The JIT has a similar check [1].
>>>
>>> I also removed unused AOT code that we had forgotten to remove.
>>>
>>> Tested hs-tier1-3.
>>>
>>> thanks,
>>> Vladimir
>>>
>>> [1] 
>>> http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845
>>


From igor.ignatyev at oracle.com  Thu Sep 13 03:46:50 2018
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 12 Sep 2018 20:46:50 -0700
Subject: RFR(XS) : 8210699 : Problem list tests which times out in Xcomp mode
Message-ID: <CFB673E4-09C9-4D4D-A8F2-15C6F2FA0BEA@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html
> 62 lines changed: 62 ins; 0 del; 0 mod; 

Hi all,

could you please review this small fix which introduces an Xcomp-specific problem list?

the analysis of the failures from the last two weeks showed that most of the timeouts w/ -Xcomp happen in a handful of tests. the patch problem-lists the tests which time out only w/ -Xcomp.

for the record, here are the statistics on how many timeouts we have observed:
java/lang/invoke/MethodHandles/CatchExceptionTest : 33 times, on different platforms
vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH : 32 times, only on solaris-sparc 
runtime/appcds/cacheObject/DifferentHeapSizes : 17 times, only on solaris-sparc 
java/lang/Class/forName/modules/TestDriver : 10 times, only on solaris-sparc

webrev: http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html
JBS: https://bugs.openjdk.java.net/browse/JDK-8210699

Thanks,
-- Igor


From vladimir.kozlov at oracle.com  Thu Sep 13 04:00:33 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 12 Sep 2018 21:00:33 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <f57effec-c0ec-9f38-91cb-b647991bcd89@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
 <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
 <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>
 <f57effec-c0ec-9f38-91cb-b647991bcd89@oracle.com>
Message-ID: <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com>

Yes, you are right. I will add a NoSafepointVerifier and rerun testing.

Thanks,
Vladimir

On 9/12/18 6:51 PM, dean.long at oracle.com wrote:
> On 9/12/18 6:25 PM, Vladimir Kozlov wrote:
>> Thank you, Dean
>>
>> Breakpoint is set at safepoint:
>>
>> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411
>>
>> But why is it important not to be at a safepoint in publish_aot()? If AOT code is registered first and a breakpoint is set 
>> afterwards, the AOT methods will be deoptimized by CodeCache::flush_dependents_on_method(), which is called from BreakpointInfo::set().
> 
> I mean you can't do this:
> 
> 1) check breakpoint count
> 2) safepoint
> 3) register code
> 
> The AOT code is not visible to CodeCache::flush_dependents_on_method() until the cmpxchg().
> NoSafepointVerifier would catch any changes in the future that introduce a safepoint.
> 
> dl
> 
>>
>> Vladimir
>>
>> On 9/12/18 5:45 PM, dean.long at oracle.com wrote:
>>> Hi Vladimir. C1 and C2 use ciEnv which also grabs locks and checks JvmtiExport::can_hotswap_or_post_breakpoint() and 
>>> Dependencies::check_evol_method(). But if the breakpoint count can only be changed by the VM thread at a safepoint, 
>>> then your fix looks good as long as we don't enter a safepoint before the code is registered. How about adding a 
>>> NoSafepointVerifier to publish_aot()?
>>>
>>> dl
>>>
>>>
>>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote:
>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8210220
>>>>
>>>> Don't register an AOT method if the corresponding Java method has breakpoints (for debugging); otherwise the AOT method will be 
>>>> executed and will not stop at the breakpoint. The JIT has a similar check [1].
>>>>
>>>> I also removed unused AOT code that we had forgotten to remove.
>>>>
>>>> Tested hs-tier1-3.
>>>>
>>>> thanks,
>>>> Vladimir
>>>>
>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845
>>>
> 

From vladimir.kozlov at oracle.com  Thu Sep 13 04:02:06 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 12 Sep 2018 21:02:06 -0700
Subject: RFR(XS) : 8210699 : Problem list tests which times out in Xcomp
 mode
In-Reply-To: <CFB673E4-09C9-4D4D-A8F2-15C6F2FA0BEA@oracle.com>
References: <CFB673E4-09C9-4D4D-A8F2-15C6F2FA0BEA@oracle.com>
Message-ID: <67fdf5cd-4642-fad1-88ca-1d113fb5ecc4@oracle.com>

Good.

Thanks,
Vladimir

On 9/12/18 8:46 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html
>> 62 lines changed: 62 ins; 0 del; 0 mod;
> 
> Hi all,
> 
> could you please review this small fix which introduces an Xcomp-specific problem list?
> 
> the analysis of the failures from the last two weeks showed that most of the timeouts w/ -Xcomp happen in a handful of tests. the patch problem-lists the tests which time out only w/ -Xcomp.
> 
> for the record, here are the statistics on how many timeouts we have observed:
> java/lang/invoke/MethodHandles/CatchExceptionTest : 33 times, on different platforms
> vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH : 32 times, only on solaris-sparc
> runtime/appcds/cacheObject/DifferentHeapSizes : 17 times, only on solaris-sparc
> java/lang/Class/forName/modules/TestDriver : 10 times, only on solaris-sparc
> 
> webrev: http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html
> JBS: https://bugs.openjdk.java.net/browse/JDK-8210699
> 
> Thanks,
> -- Igor
> 

From igor.ignatyev at oracle.com  Thu Sep 13 04:55:31 2018
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 12 Sep 2018 21:55:31 -0700
Subject: RFR(XS) : 8210699 : Problem list tests which times out in Xcomp
 mode
In-Reply-To: <67fdf5cd-4642-fad1-88ca-1d113fb5ecc4@oracle.com>
References: <CFB673E4-09C9-4D4D-A8F2-15C6F2FA0BEA@oracle.com>
 <67fdf5cd-4642-fad1-88ca-1d113fb5ecc4@oracle.com>
Message-ID: <F2D5CE92-375C-4D46-AFF0-097F9A3C6EA7@oracle.com>

Vladimir,

thank you for review.

-- Igor 

> On Sep 12, 2018, at 9:02 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Good.
> 
> Thanks,
> Vladimir
> 
> On 9/12/18 8:46 PM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html
>>> 62 lines changed: 62 ins; 0 del; 0 mod;
>> Hi all,
>> could you please review this small fix which introduces an Xcomp-specific problem list?
>> the analysis of the failures from the last two weeks showed that most of the timeouts w/ -Xcomp happen in a handful of tests. the patch problem-lists the tests which time out only w/ -Xcomp.
>> for the record, here are the statistics on how many timeouts we have observed:
>> java/lang/invoke/MethodHandles/CatchExceptionTest : 33 times, on different platforms
>> vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH : 32 times, only on solaris-sparc
>> runtime/appcds/cacheObject/DifferentHeapSizes : 17 times, only on solaris-sparc
>> java/lang/Class/forName/modules/TestDriver : 10 times, only on solaris-sparc
>> webrev: http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8210699
>> Thanks,
>> -- Igor


From martin.doerr at sap.com  Thu Sep 13 07:25:28 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 13 Sep 2018 07:25:28 +0000
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx
 registers in ppc.ad
Message-ID: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>

Hi Michihiro,

I have added "RFR(S): 8210660" to the subject.

I don't think we need 2 nodes for ReplicateF with vector length 4. Both of the ones in your webrev are for Power8, so only one of them will be used.
Besides this, your change looks good to me.

Would you like to improve ReplicateD with vector length 2, too?

Thanks and best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Wednesday, September 12, 2018 18:11
To: hotspot-compiler-dev at openjdk.java.net
Cc: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad


Dear all,

Would you please review the following change?

Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/

In the current emit code for replicating a floating-point value in ppc.ad, the floating-point value is first stored to memory so that it can be loaded back as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating-point registers overlap with VSX registers 0-31. We can use VSX instructions to replicate the floating-point value by mapping the floating-point register to the corresponding VSX register.


Best regards,
--
Michihiro,
IBM Research - Tokyo

From Pengfei.Li at arm.com  Thu Sep 13 08:31:01 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Thu, 13 Sep 2018 08:31:01 +0000
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
 <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
Message-ID: <DB7PR08MB3115A4C3DB86C659354E5062961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Thanks Vladimir.
So I still need another reviewer's feedback.

--
Thanks,
Pengfei

> -----Original Message-----
> 
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote:
> > Hi,
> >
> > I've updated the patch based on Vladimir's comment: I added checks for
> > SubI on both branches of the diamond Phi.
> > Also, thanks to Roland for the suggestion of supporting a Phi with 3 or
> > more inputs. But I think the matching rule would become much more complex
> > if we added this, and I'm not sure there is any real-world scenario that
> > would benefit from it, so I didn't add it.
> >
> > New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/
> > I've run jtreg full test with the new patch and no new issues found.
> >
> > Please let me know if you have other comments or suggestions. If there
> > are no further issues, I will need your help to sponsor and push the patch.
> >
> > --
> > Thanks,
> > Pengfei
> >
> >
> >> -----Original Message-----
> >> From: Roland Westrelin <rwestrel at redhat.com>
> >> Sent: Tuesday, September 11, 2018 16:24
> >> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Pengfei Li (Arm
> >> Technology China) <Pengfei.Li at arm.com>; dean.long at oracle.com;
> >> hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net
> >> Cc: nd <nd at arm.com>
> >> Subject: Re: [PING] RE: RFR(S): 8210152: Optimize integer divisible
> >> by power-of-2 check
> >>
> >>
> >>> The only comment I have is to add a check for SubI on the other branch
> >>> (not only on the True branch). Negation may occur on either branch since
> >>> you accept all conditions for negation.
> >>
> >> Can't we make this more general and support a phi with any number of
> >> inputs (not only 2 data inputs) as long as it's a mix of X and -X?
> >>
> >> Roland.
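The arithmetic behind dropping the conditional negation can be checked with a short Java model. remPow2 and divisibleByPow2 are hypothetical helpers that mirror the arithmetic only, not the ideal-graph transformation in the webrev: a signed x % 2^k masks the magnitude and negates the result for negative x, but once only "== 0" is observed, the negation cannot matter, and x and -x have the same low-order zero bits. That is also why it does not matter which branch of the diamond carries the SubI.

```java
// Illustrative model only, not C2 code.
final class PowerOfTwoRem {
    // Signed x % 2^k (0 <= k <= 30): mask the magnitude, then
    // conditionally negate when x is negative.
    static int remPow2(int x, int k) {
        int mask = (1 << k) - 1;
        if (x < 0) {
            // For Integer.MIN_VALUE, -x overflows back to MIN_VALUE, whose low
            // k bits are all zero, so the result is still 0 as required.
            return -((-x) & mask);
        }
        return x & mask;
    }

    // The divisibility test after the transformation: negation never changes
    // whether a value is zero, and x and -x share their low-order zero bits,
    // so both the sign test and the negation drop out.
    static boolean divisibleByPow2(int x, int k) {
        return (x & ((1 << k) - 1)) == 0;
    }
}
```

For example, remPow2(-5, 1) is -1 (matching -5 % 2 in Java), while divisibleByPow2(-5, 1) only needs the single AND and compare.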

From Pengfei.Li at arm.com  Thu Sep 13 09:04:36 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Thu, 13 Sep 2018 09:04:36 +0000
Subject: RFR(S): 8210413: AArch64: Optimize div/rem by constant in C1
Message-ID: <DB7PR08MB3115445A18A786BAFD1F7B08961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi,

Could you please help review this optimization in C1 AArch64?

Currently, there are two LIR_Assembler::arithmetic_idiv() methods in c1_LIRAssembler_aarch64.cpp. One is left unimplemented; the other checks whether the divisor is a power-of-2 constant but then does not generate any optimized code for it. In this patch, I combined these two methods and added the following two optimizations for integer div/rem:
1) Remove the div-by-zero check if the divisor is known to be a non-zero constant.
2) Use cheaper instructions instead of "sdiv" to do div/rem by a power-of-2 constant (including 1, 2, 4, 8, 16, ...)
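For reference, a common branch-free way to divide a signed int by 2^k with Java's round-toward-zero semantics is to add a bias of 2^k - 1 to negative dividends before the arithmetic shift; the remainder follows from the same bias. The Java sketch below shows only that arithmetic (DivPow2 is a hypothetical name); it is not the instruction sequence from the webrev, which may be organized differently. A divisor of 1 (k = 0) needs its own case here because Java masks a shift count of 32 to 0.

```java
// Arithmetic sketch only; not the C1 AArch64 code from the patch.
final class DivPow2 {
    // x / 2^k for 0 <= k <= 30, rounding toward zero like Java's '/'.
    static int div(int x, int k) {
        if (k == 0) {
            return x;                          // divisor 1; also avoids a shift by 32,
                                               // which Java would mask to a shift by 0
        }
        int bias = (x >> 31) >>> (32 - k);     // 2^k - 1 when x < 0, otherwise 0
        return (x + bias) >> k;                // biased arithmetic shift truncates toward zero
    }

    // x % 2^k for 0 <= k <= 30, with the sign of the dividend like Java's '%'.
    static int rem(int x, int k) {
        if (k == 0) {
            return 0;
        }
        int bias = (x >> 31) >>> (32 - k);
        int mask = (1 << k) - 1;
        return ((x + bias) & mask) - bias;
    }
}
```

For example, div(-5, 1) is -2 and rem(-5, 1) is -1, matching -5 / 2 and -5 % 2; a plain shift would give -3 for the quotient.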

JBS: https://bugs.openjdk.java.net/browse/JDK-8210413
webrev: http://cr.openjdk.java.net/~njian/8210413/webrev.00/

As Roman Kennke's original code comment said, using the temp register passed into arithmetic_idiv() is problematic, so I use rscratch1 directly for intermediate results in the div/rem calculations instead. You can refer to the thread http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-September/006315.html for details of this issue.

I've run jtreg full test with this patch and JVM option "-XX:TieredStopAtLevel=1" on an AArch64 server and no new issues found.

--
Thanks,
Pengfei

From erik.osterlund at oracle.com  Thu Sep 13 10:43:21 2018
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Thu, 13 Sep 2018 12:43:21 +0200
Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2
In-Reply-To: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com>
References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com>
Message-ID: <5B9A3F49.4090905@oracle.com>

Hi Roman,

Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but 
you have circumstances in which the barriers are unnecessary and can be 
elided. Any of them having null in their type is one reason, but I 
suppose there are surely other reasons as well (such as finding 
dominating write barriers).

I see two different approaches for this barrier elision:
1) Elide it during parsing (as you propose)
2) Elide it during Optimize (which I think conceptually looks like a 
natural fit)

I originally proposed a function on BarrierSetC2 that I think I called 
optimize_barriers() or something like that. The idea was to use this 
hook to let GC barrier code shave off pointless (not to be confused with 
useless) barriers that can be removed. Roland thought that this seemed 
too specific to ZGC to warrant a general API, and I agreed, because 
indeed only ZGC used this hook at the time. This is today 
ZBarrierSetC2::find_dominating_barriers which is called straight from 
Optimize.

I wonder if it would make sense to re-instate that hook. Then you could 
use the existing resolve() barriers during parsing, and leave barrier 
elision tricks (null checks included, plus other tricks you might have 
up your sleeve) to Optimize. For example, you might be able to walk your 
list of barriers and disconnect these pointless barriers. What do you think?

Thanks,
/Erik

On 2018-09-12 22:11, Roman Kennke wrote:
> This introduces an abstraction to deal with object equality in
> BarrierSetC2. This is needed by GCs like Shenandoah that can have
> different copies of the same object alive.
>
> The approach chosen here is slightly different than we did in e.g.
> BarrierSetAssembler and the runtime Access API: instead of owning the
> whole equality, it only provides a resolve-like method to resolve the
> operands to stable values. The reason for doing this is that it's easier
> to do this way in intrinsics if those barriers are detached from the
> actual CmpP. This is because the barriers create new memory states, and
> we'd have to create memory Phis around those things, which is considerably
> more complex.
>
> I chose to add a new resolve_for_obj_equals(a, b) method instead of
> using two calls to resolve(a); resolve(b); because this allows for
> optimization: if any of a or b is known to be NULL, we can elide
> barriers for both. This is not possible to do with two independent
> resolve() calls.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8210656
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/
>
> Testing: passes hotspot/jtreg:tier1
>
> What do you think about this?
>
> Thanks,
> Roman
>
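A toy Java model of why a combined resolve_for_obj_equals(a, b) can do more than two independent resolve() calls. Obj, Operand, typeIsNull and the barrier counter are hypothetical stand-ins for C2 nodes, type information and emitted barriers, and a single forwarding pointer stands in for a GC keeping two copies of an object; this is not the BarrierSetC2 API. If either operand is known to be null, a == b only asks whether the other side is null, and resolving never changes null-ness, so both barriers can go.

```java
// Toy model, not the BarrierSetC2 API.
final class ObjEqualsSketch {
    // A heap object that may have been copied; forwardee points at the copy.
    static final class Obj {
        Obj forwardee;
    }

    // An operand as the compiler sees it: a value plus a static "type is null" fact.
    static final class Operand {
        final Obj value;
        final boolean typeIsNull;

        Operand(Obj value, boolean typeIsNull) {
            this.value = value;
            this.typeIsNull = typeIsNull;
        }
    }

    static int barriers = 0;   // number of resolve barriers "emitted"

    // Plain resolve: one barrier per operand.
    static Obj resolve(Obj o) {
        barriers++;
        return (o == null || o.forwardee == null) ? o : o.forwardee;
    }

    // Combined resolve for a == b: a fact about one operand lets us
    // skip the barriers on both operands.
    static boolean objEquals(Operand a, Operand b) {
        if (a.typeIsNull || b.typeIsNull) {
            return a.value == b.value;
        }
        return resolve(a.value) == resolve(b.value);
    }
}
```

Comparing an object with its own copy returns true only after resolving (two barriers), while comparing against a known-null operand needs no barrier at all.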


From rkennke at redhat.com  Thu Sep 13 10:51:29 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 13 Sep 2018 12:51:29 +0200
Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2
In-Reply-To: <5B9A3F49.4090905@oracle.com>
References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com>
 <5B9A3F49.4090905@oracle.com>
Message-ID: <1cd9d4d1-5f18-15da-2c30-44effc1b8bf2@redhat.com>


Hi Erik,

> Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but
> you have circumstances in which the barriers are unnecessary and can be
> elided. Any of them having null in their type is one reason, but I
> suppose there are surely other reasons as well (such as finding
> dominating write barriers).

Yes. We can already handle reasons that relate to 'stand-alone' barriers
(like dominating write-barriers and others). However, this one is
different because it relates to the *combination* of the two operands.
I.e. a property of operand A or B would affect barriers for both A *and*
B. This seems tricky to do after parsing. I guess we could look at CmpP,
check their operands for known-null, and elide the write-barriers then,
but this also means we need to check if the write-barriers haven't found
other uses in the meantime, etc). Overall, this seemed *much* more
hassle, whereas during parsing it comes quite naturally. See our impl:

https://paste.fedoraproject.org/paste/Hr~nKkm4HnZo3hmcw3Snnw

Roland: how hard/feasible would it be to do something like what Erik
proposed? I.e. use the usual resolve() for obj-eq and elide barriers
later? It might have the additional advantage (not sure) of catching cases
where the type of an object becomes known-null later in the optimization
process.

Roman


From aph at redhat.com  Thu Sep 13 11:05:52 2018
From: aph at redhat.com (Andrew Haley)
Date: Thu, 13 Sep 2018 12:05:52 +0100
Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by
 constant in C1
In-Reply-To: <DB7PR08MB3115445A18A786BAFD1F7B08961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115445A18A786BAFD1F7B08961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com>

Hi,

On 09/13/2018 10:04 AM, Pengfei Li (Arm Technology China) wrote:

> Could you please help review this optimization in C1 AArch64?
> JBS: https://bugs.openjdk.java.net/browse/JDK-8210413
> webrev: http://cr.openjdk.java.net/~njian/8210413/webrev.00/

It looks fine, but it's really odd that this is only implemented for ints and not
longs. Can you do longs too?
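For the long case, the usual branch-free bias-and-shift technique carries over with 64-bit shifts: add 2^k - 1 to negative dividends, then shift arithmetically. The Java sketch below (DivPow2Long is a hypothetical name) shows only the arithmetic, not code from the webrev; k = 0 is handled separately because Java masks a long shift count of 64 to 0.

```java
// Arithmetic sketch only; not code generated by the patch.
final class DivPow2Long {
    // x / 2^k for 0 <= k <= 62, rounding toward zero like Java's '/'.
    static long div(long x, int k) {
        if (k == 0) {
            return x;                           // a shift by 64 would be masked to 0
        }
        long bias = (x >> 63) >>> (64 - k);     // 2^k - 1 when x < 0, otherwise 0
        return (x + bias) >> k;
    }

    // x % 2^k for 0 <= k <= 62, with the sign of the dividend like Java's '%'.
    static long rem(long x, int k) {
        if (k == 0) {
            return 0L;
        }
        long bias = (x >> 63) >>> (64 - k);
        long mask = (1L << k) - 1;
        return ((x + bias) & mask) - bias;
    }
}
```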

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From rkennke at redhat.com  Thu Sep 13 12:15:21 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 13 Sep 2018 14:15:21 +0200
Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2
In-Reply-To: <5B9A3F49.4090905@oracle.com>
References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com>
 <5B9A3F49.4090905@oracle.com>
Message-ID: <8b48dd82-7162-1d5a-527a-fc068a72c009@redhat.com>

Hi Erik,

I talked to Roland about this. It turns out that we already have this
optimization pass, and could just as well live with cmp(resolve(a),
resolve(b)). We need a little (Shenandoah-specific) hook in
CmpPNode::Ideal() for that, though (but we'd need that anyway, I suppose).
If you agree with that, I'll withdraw this RFR. Ok?

Roman

> Hi Roman,
> 
> Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but
> you have circumstances in which the barriers are unnecessary and can be
> elided. Any of them having null in their type is one reason, but I
> suppose there are surely other reasons as well (such as finding
> dominating write barriers).
> 
> I see two different approaches for this barrier elision:
> 1) Elide it during parsing (as you propose)
> 2) Elide it during Optimize (which I think conceptually looks like a
> natural fit)
> 
> I originally proposed a function on BarrierSetC2 that I think I called
> optimize_barriers() or something like that. The idea was to use this
> hook to let GC barrier code shave off pointless (not to be confused with
> useless) barriers that can be removed. Roland thought that this seemed
> too specific to ZGC to warrant a general API, and I agreed, because
> indeed only ZGC used this hook at the time. This is today
> ZBarrierSetC2::find_dominating_barriers which is called straight from
> Optimize.
> 
> I wonder if it would make sense to re-instate that hook. Then you could
> use the existing resolve() barriers during parsing, and leave barrier
> elision tricks (null checks included, plus other tricks you might have
> up your sleeve) to Optimize. For example, you might be able to walk your
> list of barriers and disconnect these pointless barriers. What do you
> think?
> 
> Thanks,
> /Erik
> 
> On 2018-09-12 22:11, Roman Kennke wrote:
>> This introduces an abstraction to deal with object equality in
>> BarrierSetC2. This is needed by GCs like Shenandoah that can have
>> different copies of the same object alive.
>>
>> The approach chosen here is slightly different than we did in e.g.
>> BarrierSetAssembler and the runtime Access API: instead of owning the
>> whole equality, it only provides a resolve-like method to resolve the
>> operands to stable values. The reason for doing this is that it's easier
>> to do this way in intrinsics if those barriers are detached from the
>> actual CmpP. This is because the barriers create new memory states, and
>> we'd have to create memory Phis around those things, which is considerably
>> more complex.
>>
>> I chose to add a new resolve_for_obj_equals(a, b) method instead of
>> using two calls to resolve(a); resolve(b); because this allows for
>> optimization: if any of a or b is known to be NULL, we can elide
>> barriers for both. This is not possible to do with two independent
>> resolve() calls.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8210656
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/
>>
>> Testing: passes hotspot/jtreg:tier1
>>
>> What do you think about this?
>>
>> Thanks,
>> Roman
>>
> 



From erik.osterlund at oracle.com  Thu Sep 13 12:51:16 2018
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Thu, 13 Sep 2018 14:51:16 +0200
Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2
In-Reply-To: <8b48dd82-7162-1d5a-527a-fc068a72c009@redhat.com>
References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com>
 <5B9A3F49.4090905@oracle.com>
 <8b48dd82-7162-1d5a-527a-fc068a72c009@redhat.com>
Message-ID: <5B9A5D44.7040409@oracle.com>

Hi Roman,

I'm glad this idea works well for you. If you need an Ideal hook for 
CmpPNode anyway for barrier optimizations, then I suppose we should sort 
something out there. In that API though, it would be great if it was not 
specific to CmpPNode. I'm thinking something along the lines of Node* 
BarrierSetC2::ideal_node(Node* n), and then figure out if something 
should or should not be done for a given node in the backend. That way, 
if we need ideal hooks for other node types, we could reuse the same API 
to ask the BarrierSetC2 if it has more ideal nodes. What do you think?
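One possible shape for the generic hook suggested here, sketched in Java. Node, BarrierSetC2Hooks, idealNode, ToyGcHooks and the opcode strings are all hypothetical; the point is that shared optimizer code calls a single method for every node and the GC backend decides per node type, returning null when it has nothing to do (mirroring how C2's Ideal() methods signal no progress).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a node-type-agnostic GC hook; not HotSpot code.
final class IdealHookSketch {
    // Minimal stand-in for a C2 node: an opcode name plus inputs.
    static final class Node {
        final String op;
        final List<Node> in;

        Node(String op, Node... in) {
            this.op = op;
            this.in = Arrays.asList(in);
        }
    }

    // One entry point for every node type. Returns a replacement node,
    // or null when the barrier set has nothing to do for this node.
    interface BarrierSetC2Hooks {
        default Node idealNode(Node n) {
            return null;
        }
    }

    // Example backend: it alone decides which node types it cares about.
    // Here it strips Resolve barriers under a CmpP that compares against null.
    static final class ToyGcHooks implements BarrierSetC2Hooks {
        @Override
        public Node idealNode(Node n) {
            if (!n.op.equals("CmpP")) {
                return null;
            }
            Node a = n.in.get(0);
            Node b = n.in.get(1);
            if (!a.op.equals("ConNull") && !b.op.equals("ConNull")) {
                return null;
            }
            Node a2 = strip(a);
            Node b2 = strip(b);
            if (a2 == a && b2 == b) {
                return null;                    // no progress
            }
            return new Node("CmpP", a2, b2);
        }

        private static Node strip(Node x) {
            return x.op.equals("Resolve") ? x.in.get(0) : x;
        }
    }

    // Shared optimizer side: ask the barrier set first, keep the node otherwise.
    static Node idealize(Node n, BarrierSetC2Hooks hooks) {
        Node r = hooks.idealNode(n);
        return r != null ? r : n;
    }
}
```

A GC that needs no such optimization simply inherits the default and never changes the graph.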

Thanks,
/Erik

On 2018-09-13 14:15, Roman Kennke wrote:
> Hi Erik,
>
> I talked to Roland about this. It turns out that we already have this
> optimization pass, and could just as well live with cmp(resolve(a),
> resolve(b)). We need a little (Shenandoah-specific) hook in
> CmpPNode::Ideal() for that, though (but we'd need that anyway, I suppose).
> If you agree with that, I'll withdraw this RFR. Ok?
>
> Roman
>
>> Hi Roman,
>>
>> Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but
>> you have circumstances in which the barriers are unnecessary and can be
>> elided. Any of them having null in their type is one reason, but I
>> suppose there are surely other reasons as well (such as finding
>> dominating write barriers).
>>
>> I see two different approaches for this barrier elision:
>> 1) Elide it during parsing (as you propose)
>> 2) Elide it during Optimize (which I think conceptually looks like a
>> natural fit)
>>
>> I originally proposed a function on BarrierSetC2 that I think I called
>> optimize_barriers() or something like that. The idea was to use this
>> hook to let GC barrier code shave off pointless (not to be confused with
>> useless) barriers that can be removed. Roland thought that this seemed
>> too specific to ZGC to warrant a general API, and I agreed, because
>> indeed only ZGC used this hook at the time. This is today
>> ZBarrierSetC2::find_dominating_barriers which is called straight from
>> Optimize.
>>
>> I wonder if it would make sense to re-instate that hook. Then you could
>> use the existing resolve() barriers during parsing, and leave barrier
>> elision tricks (null checks included, plus other tricks you might have
>> up your sleeve) to Optimize. For example, you might be able to walk your
>> list of barriers and disconnect these pointless barriers. What do you
>> think?
>>
>> Thanks,
>> /Erik
>>
>> On 2018-09-12 22:11, Roman Kennke wrote:
>>> This introduces an abstraction to deal with object equality in
>>> BarrierSetC2. This is needed by GCs like Shenandoah that can have
>>> different copies of the same object alive.
>>>
>>> The approach chosen here is slightly different than we did in e.g.
>>> BarrierSetAssembler and the runtime Access API: instead of owning the
>>> whole equality, it only provides a resolve-like method to resolve the
>>> operands to stable values. The reason for doing this is that it's easier
>>> to do this way in intrinsics if those barriers are detached from the
>>> actual CmpP. This is because the barriers create new memory states, and
>>> we'd have to create memory Phis around those things, which is considerably
>>> more complex.
>>>
>>> I chose to add a new resolve_for_obj_equals(a, b) method instead of
>>> using two calls to resolve(a); resolve(b); because this allows for
>>> optimization: if any of a or b is known to be NULL, we can elide
>>> barriers for both. This is not possible to do with two independent
>>> resolve() calls.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8210656
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/
>>>
>>> Testing: passes hotspot/jtreg:tier1
>>>
>>> What do you think about this?
>>>
>>> Thanks,
>>> Roman
>>>
>


From sandhya.viswanathan at intel.com  Thu Sep 13 13:05:49 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Thu, 13 Sep 2018 13:05:49 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Please find below the updated webrev with all your comments incorporated:

http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/

I have run the jtreg compiler tests on SKX and KNL, which have two different flavors of AVX512, and on Haswell, a non-AVX512 system. I also tested SPECjvm2008 on the three platforms.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Tuesday, September 11, 2018 8:54 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Thank you, Sandhya

I am satisfied with your detailed answer regarding the memory load issues. Okay, let's not add them.

Vladimir

On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Thanks a lot for the detailed review. I really appreciate your feedback.
> Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, September 11, 2018 5:11 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> Thank you.
> 
> I want to discuss next issue:
> 
>   > You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>   >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
> 
> This is what I thought. You increase register pressure this way, which may cause spills to the stack.
> Also, we don't check whether the registers could be the same, so you may get unneeded moves.
> 
> I would advise adding memory moves at least.
> 
> Sandhya >>>  I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize idealreg2regmask (matcher.cpp). I would like the register allocator to get all the possible registers on an architecture for idealreg2regmask. I was concerned that multiple instruct rules in the .ad file for LoadF from memory might cause problems: I would have to assign a higher cost to loading into a restricted register set like vlReg. So I decided that the register allocator can handle this much better than I could by adding rules to load from memory. This is with the background that regF always covers all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>    MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
> #endif
>    MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>    MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>    MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>    MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>    MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>    ....
>    idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
> 
> Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions:
> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
> Do these instructions work when avx512vl is not available? I see that for vectors you use
> the vpxor+vinserti* combination.
> 
> Sandhya >>> Yes, the scalar floating-point instructions are available with AVX512 encoding even when avx512vl is not available. That is why you see not just movflt and movdbl but all the other scalar operations, like addss and addsd, using the entire XMM range (xmm0-31). In other words, they are AVX512F instructions.
> 
> Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
> 
> Should it be (UseAVX < 3)?
> 
> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
> 
> Thanks,
> Vladimir
> 
> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Monday, September 10, 2018 6:09 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>> hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Very nice. Thank you, Sandhya.
>>
>> I would like to see more meaningful names in the .ad files: something like vlReg* and legVec* instead of rreg* and ovec*.
>>
>>>>> Yes, accepted.
>>
>> The new load_from_* and load_to_* instructions in the .ad files should be renamed as follows and placed together with the other Move*_reg_reg* instructions:
>>
>> instruct MoveF2VL(vlRegF dst, regF src)
>> instruct MoveVL2F(regF dst, vlRegF src)
>>>>> Yes, accepted.
>>
>> You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>>>> Let us take an example, e.g. loading into rregF. First the value is loaded from memory into regF, and then it is moved register-to-register into rregF, and vice versa.
>>
>> Also, please explain why these registers are used when UseAVX == 0:
>>
>> +instruct absD_reg(rregD dst) %{
>>       predicate((UseSSE>=2) && (UseAVX == 0));
>>
>> We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
>>     if (UseAVX < 3) {
>>       _features &= ~CPU_AVX512F;
>>
>>>>> Yes, accepted. It could be regD here.
>>
>> The following checks could be combined into a new function in vm_version_x86.hpp (you already have some):
>>
>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex,
>> +vectors_reg_legacy, %{
>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>> VM_Version::supports_avx512dq() &&
>> VM_Version::supports_avx512vl() %} );
>>
>>>>> Yes, accepted.
>>
>> I would suggest testing these changes on different machines (non-avx512 and avx512) and with different UseAVX values.
>>
>>>>> Will do.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>> Recently there have been a couple of high-priority issues with regard
>>> to C2's usage of the high bank of XMM registers
>>> (XMM16-XMM31):
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>
>>> The patch provides a restricted set of registers to the match rules
>>> in the ad file based on the underlying architecture.
>>>
>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>
>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
>>>
>>> Your review and feedback is very welcome.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>
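The register-group approach in this thread (a dynamic register class that exposes the full XMM0-XMM31 bank only when the required AVX-512 features are present, with the four feature checks folded into one helper as suggested above) can be modeled roughly as below. This is a hypothetical Java illustration, not ADLC or HotSpot code: the Features class, supportsAvx512VlBwDq() and the bit masks are invented names, and a set bit i simply stands for XMMi.

```java
// Hypothetical model of a reg_class_dynamic selection; not HotSpot code.
public class RegClassSketch {

    /** CPU feature flags, standing in for VM_Version::supports_*() queries. */
    static final class Features {
        final boolean evex, avx512bw, avx512dq, avx512vl;

        Features(boolean evex, boolean avx512bw, boolean avx512dq, boolean avx512vl) {
            this.evex = evex;
            this.avx512bw = avx512bw;
            this.avx512dq = avx512dq;
            this.avx512vl = avx512vl;
        }

        /** One combined predicate instead of four separate checks in the .ad file. */
        boolean supportsAvx512VlBwDq() {
            return evex && avx512bw && avx512dq && avx512vl;
        }
    }

    static final long LEGACY_XMM = 0xFFFFL;     // bits 0-15: XMM0-XMM15
    static final long EVEX_XMM = 0xFFFF_FFFFL;  // bits 0-31: XMM0-XMM31

    /** First class if the predicate holds, otherwise the legacy class. */
    static long vectorsRegBwDqVl(Features f) {
        return f.supportsAvx512VlBwDq() ? EVEX_XMM : LEGACY_XMM;
    }

    public static void main(String[] args) {
        Features all = new Features(true, true, true, true);
        Features noVl = new Features(true, true, true, false);
        System.out.println(Long.bitCount(vectorsRegBwDqVl(all)));  // 32
        System.out.println(Long.bitCount(vectorsRegBwDqVl(noVl))); // 16
    }
}
```

Keeping the predicate in one place means every match rule that uses the class sees the same decision, which is the point of replacing ad-hoc workarounds in the assemblers.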

From dmitrij.pochepko at bell-sw.com  Thu Sep 13 14:35:53 2018
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Thu, 13 Sep 2018 17:35:53 +0300
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
Message-ID: <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>

Hi,

I found 3 items to fix in your comments in 
http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt


1)

//   [1, sqrt(3/2)), [sqrt(3/2), sqrt(3)/2), [sqrt(3)/2, 2)
//   i.e. [1, ~1.225],   [~1.225,   ~1.732),   [~1.732, 2)

this one should be:

[1, sqrt(3/2)), [sqrt(3/2), sqrt(3)), [sqrt(3), 2)
i.e. [1, ~1.225),   [~1.225, ~1.732),   [~1.732, 2)


2)

"4) Filter out overflows (z > 1023) or underflows (z < -1077)"

should be:

"4) Filter out overflows (z > 1023) or underflows (z < -1076)"


3) "5) Let |z| = n + r where n is int, 0 <= n < 10, and 0 <= r < 1"

should be:

"5) Let |z| = n + r where n is int, 0 <= n < 1076, and 0 <= r < 1"


The other comments seem fine.
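To make corrections 1 and 3 concrete, here is a minimal Java sketch that classifies a mantissa m in [1, 2) into the three corrected intervals and splits |z| into n + r. PowReductionSketch and its methods are hypothetical names for illustration only; this is not the AArch64 intrinsic or the fdlibm source.

```java
// Hypothetical illustration of the corrected ranges; not the actual intrinsic.
public class PowReductionSketch {

    static final double SQRT_3_2 = Math.sqrt(1.5); // ~1.2247
    static final double SQRT_3 = Math.sqrt(3.0);   // ~1.7321

    /** Index of the reduction interval containing m, assuming 1 <= m < 2. */
    static int intervalIndex(double m) {
        if (m < 1.0 || m >= 2.0) {
            throw new IllegalArgumentException("m must be in [1, 2): " + m);
        }
        if (m < SQRT_3_2) {
            return 0; // [1, sqrt(3/2))
        }
        if (m < SQRT_3) {
            return 1; // [sqrt(3/2), sqrt(3))
        }
        return 2;     // [sqrt(3), 2)
    }

    /** Integer part n of |z|. */
    static long integerPart(double z) {
        return (long) Math.floor(Math.abs(z));
    }

    /** Fractional part r of |z|, with 0 <= r < 1. */
    static double fractionPart(double z) {
        double a = Math.abs(z);
        return a - Math.floor(a);
    }

    public static void main(String[] args) {
        System.out.println(intervalIndex(1.0)); // 0
        System.out.println(intervalIndex(1.5)); // 1
        System.out.println(intervalIndex(1.9)); // 2
        System.out.println(integerPart(-3.25) + " " + fractionPart(-3.25)); // 3 0.25
    }
}
```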

Thanks,

Dmitrij


On 07/09/18 15:58, Andrew Dinn wrote:
>
> I have rewritten the algorithm to achieve what I think is needed to
> patch these omissions. The redraft of this part of the code is available
> here:
>
>    http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt
>
>


From rkennke at redhat.com  Thu Sep 13 15:37:48 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 13 Sep 2018 17:37:48 +0200
Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2
In-Reply-To: <5B9A5D44.7040409@oracle.com>
References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com>
 <5B9A3F49.4090905@oracle.com>
 <8b48dd82-7162-1d5a-527a-fc068a72c009@redhat.com>
 <5B9A5D44.7040409@oracle.com>
Message-ID: <87d66c05-9f5c-8e18-a589-b47d21d72681@redhat.com>

Hi Erik,

> I'm glad this idea works well for you. If you need an Ideal hook for
> CmpPNode anyway for barrier optimizations, then I suppose we should sort
> something out there. In that API though, it would be great if it was not
> specific to CmpPNode. I'm thinking something along the lines of Node*
> BarrierSetC2::ideal_node(Node* n), and then figure out if something
> should or should not be done for a given node in the backend. That way,
> if we need ideal hooks for other node types, we could reuse the same API
> to ask the BarrierSetC2 if it has more ideal nodes. What do you think?


Yeah, that would actually be good. We have at least one more place that
I know of where we hook into Ideal(). The API would be something like this:

Node* ideal_node(PhaseGVN* phase, Node* n, bool can_reshape) const;

And a sample usage from CallLeafNode would look like this:

Node* CallLeafNode::Ideal(PhaseGVN* phase, bool can_reshape) {
  Node* ideal = BarrierSet::barrier_set()->barrier_set_c2()->ideal_node(phase, this, can_reshape);
  if (ideal != NULL) {
    return ideal;
  }
  return CallNode::Ideal(phase, can_reshape);
}

Unfortunately (or maybe fortunately) this can't be inserted generically
into Node::Ideal(..) because subclasses can't be expected to always call
the super implementation.
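The dispatch pattern proposed above can be modeled in Java roughly as follows. This is a hypothetical sketch: Node, CallNode, CallLeafNode and BarrierHook here are stand-ins, not the C2 classes, and ideal() returning null models "no transformation". It shows why each subclass has to consult the GC hook explicitly before falling back to its superclass.

```java
// Hypothetical Java model of the proposed hook pattern; not C2's Node classes.
public class IdealHookSketch {

    /** Stand-in for a GC hook such as the proposed ideal_node(). */
    interface BarrierHook {
        /** Returns a replacement node, or null if the GC has nothing to do. */
        Node idealNode(Node n);
    }

    static class Node {
        final String name;

        Node(String name) { this.name = name; }

        /** Returns a transformed node, or null for "no change". */
        Node ideal() { return null; }
    }

    static class CallNode extends Node {
        CallNode(String name) { super(name); }

        @Override
        Node ideal() { return null; } // generic call transformations would go here
    }

    static class CallLeafNode extends CallNode {
        private final BarrierHook hook;

        CallLeafNode(String name, BarrierHook hook) {
            super(name);
            this.hook = hook;
        }

        @Override
        Node ideal() {
            // The GC gets the first chance to transform this node.
            Node gcResult = hook.idealNode(this);
            if (gcResult != null) {
                return gcResult;
            }
            // Otherwise fall back to the superclass, as in the C++ proposal.
            return super.ideal();
        }
    }

    public static void main(String[] args) {
        Node plain = new CallLeafNode("leaf", n -> null);
        System.out.println(plain.ideal()); // null: the hook declines
        Node gc = new CallLeafNode("leaf", n -> new Node("replaced-" + n.name));
        System.out.println(gc.ideal().name); // replaced-leaf
    }
}
```

Because an override is free to skip super.ideal(), placing the hook call only in the base class would not guarantee it runs; each participating subclass opts in explicitly.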

Thanks for reviewing! I'll withdraw this RFR and push the additional
resolve() hooks via another RFE.

Cheers,
Roman


> Thanks,
> /Erik
> 
> On 2018-09-13 14:15, Roman Kennke wrote:
>> Hi Erik,
>>
>> I talked to Roland about this. It turns out that we already have this
>> optimization pass, and could just as well live with cmp(resolve(a),
>> resolve(b)). We need a little (Shenandoah-specific) hook in
>> CmpPNode::Ideal() for that though (but we'd need that anyway I suppose).
>> If you agree with that, I'll withdraw this RFR. Ok?
>>
>> Roman
>>
>>> Hi Roman,
>>>
>>> Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but
>>> you have circumstances in which the barriers are unnecessary and can be
>>> elided. Any of them having null in their type is one reason, but I
>>> suppose there are surely other reasons as well (such as finding
>>> dominating write barriers).
>>>
>>> I see two different approaches for this barrier elision:
>>> 1) Elide it during parsing (as you propose)
>>> 2) Elide it during Optimize (which I think conceptually looks like a
>>> natural fit)
>>>
>>> I originally proposed a function on BarrierSetC2 that I think I called
>>> optimize_barriers() or something like that. The idea was to use this
>>> hook to let GC barrier code shave off pointless (not to be confused with
>>> useless) barriers that can be removed. Roland thought that this seemed
>>> too specific to ZGC to warrant a general API, and I agreed, because
>>> indeed only ZGC used this hook at the time. This is today
>>> ZBarrierSetC2::find_dominating_barriers which is called straight from
>>> Optimize.
>>>
>>> I wonder if it would make sense to re-instate that hook. Then you could
>>> use the existing resolve() barriers during parsing, and leave barrier
>>> elision tricks (null checks included, plus other tricks you might have
>>> up your sleeve) to Optimize. For example, you might be able to walk your
>>> list of barriers and disconnect these pointless barriers. What do you
>>> think?
>>>
>>> Thanks,
>>> /Erik
>>>
>>> On 2018-09-12 22:11, Roman Kennke wrote:
>>>> This introduces an abstraction to deal with object equality in
>>>> BarrierSetC2. This is needed by GCs that can have different copies of
>>>> the same object alive, like Shenandoah.
>>>>
>>>> The approach chosen here is slightly different than we did in e.g.
>>>> BarrierSetAssembler and the runtime Access API: instead of owning the
>>>> whole equality, it only provides a resolve-like method to resolve the
>>>> operands to stable values. The reason for doing this is that it's
>>>> easier to do it this way in intrinsics if those barriers are detached
>>>> from the actual CmpP. This is because the barriers create new memory
>>>> states, and we'd have to create memory Phis around those things, which
>>>> is considerably more complex.
>>>>
>>>> I chose to add a new resolve_for_obj_equals(a, b) method instead of
>>>> using two calls to resolve(a); resolve(b); because this allows for
>>>> optimization: if either a or b is known to be NULL, we can elide
>>>> barriers for both. This is not possible to do with two independent
>>>> resolve() calls.
>>>>
>>>> Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8210656
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/
>>>>
>>>> Testing: passes hotspot/jtreg:tier1
>>>>
>>>> What do you think about this?
>>>>
>>>> Thanks,
>>>> Roman
>>>>
>>
> 
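The reason resolve_for_obj_equals(a, b) can do better than two independent resolve() calls can be illustrated with a small Java model. Everything here is hypothetical: resolve() stands in for a GC read barrier that maps an object to its current copy via a forwarding table, and a runtime null check stands in for C2 knowing from an operand's type that it is NULL. Because resolving never turns a non-null reference into null, a comparison against null needs no barrier on either side, and only a method that sees both operands can make that decision.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical model of eliding barriers around a reference comparison.
public class ObjEqualsSketch {

    /** Stale copy -> current copy, standing in for forwarding pointers. */
    final Map<Object, Object> forwarding = new IdentityHashMap<>();
    int barriersEmitted = 0;

    /** Stand-in for a read barrier: returns the current copy of o. */
    Object resolve(Object o) {
        barriersEmitted++;
        if (o == null) {
            return null;
        }
        Object current = forwarding.get(o);
        return current != null ? current : o;
    }

    /** Models resolve_for_obj_equals(a, b) followed by the comparison. */
    boolean objEquals(Object a, Object b) {
        if (a == null || b == null) {
            // Resolving never changes nullness, so no barrier is needed on either side.
            return a == b;
        }
        return resolve(a) == resolve(b);
    }

    public static void main(String[] args) {
        ObjEqualsSketch s = new ObjEqualsSketch();
        Object oldCopy = new Object();
        Object newCopy = new Object();
        s.forwarding.put(oldCopy, newCopy);
        System.out.println(s.objEquals(oldCopy, newCopy)); // true: same logical object
        System.out.println(s.objEquals(oldCopy, null));    // false, with no barriers
        System.out.println(s.barriersEmitted);             // 2
    }
}
```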



From vladimir.kozlov at oracle.com  Thu Sep 13 16:01:18 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 13 Sep 2018 09:01:18 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
 <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
 <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>
 <f57effec-c0ec-9f38-91cb-b647991bcd89@oracle.com>
 <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com>
Message-ID: <a0fd0ec8-ccfd-c6c0-ddd0-c8a164e2eab7@oracle.com>

Updated changes with NoSafepointVerifier:

http://cr.openjdk.java.net/~kvn/8210220/webrev.01/

Vladimir

On 9/12/18 9:00 PM, Vladimir Kozlov wrote:
> Yes, you are right. I will add a NoSafepointVerifier and rerun testing.
> 
> Thanks,
> Vladimir
> 
> On 9/12/18 6:51 PM, dean.long at oracle.com wrote:
>> On 9/12/18 6:25 PM, Vladimir Kozlov wrote:
>>> Thank you, Dean
>>>
>>> Breakpoint is set at safepoint:
>>>
>>> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411
>>>
>>> But why is it important not to be at a safepoint in publish_aot()? If the AOT method is registered first and
>>> then a breakpoint is set, AOT methods will be deoptimized by CodeCache::flush_dependents_on_method(),
>>> which is called from BreakpointInfo::set().
>>
>> I mean you can't do this:
>>
>> 1) check breakpoint count
>> 2) safepoint
>> 3) register code
>>
>> The AOT code is not visible to CodeCache::flush_dependents_on_method() until the cmpxchg().
>> NoSafepointVerifier would catch any changes in the future that introduce a safepoint.
>>
>> dl
>>
>>>
>>> Vladimir
>>>
>>> On 9/12/18 5:45 PM, dean.long at oracle.com wrote:
>>>> Hi Vladimir. C1 and C2 use ciEnv which also grabs locks and checks 
>>>> JvmtiExport::can_hotswap_or_post_breakpoint() and Dependencies::check_evol_method(). But if the 
>>>> breakpoint count can only be changed by the VM thread at a safepoint, then your fix looks good 
>>>> as long as we don't enter a safepoint before the code is registered. How about adding a 
>>>> NoSafepointVerifier to publish_aot()?
>>>>
>>>> dl
>>>>
>>>>
>>>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote:
>>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8210220
>>>>>
>>>>> Don't register an AOT method if the corresponding Java method has breakpoints (for debugging);
>>>>> otherwise the AOT method will be executed and will not stop at the breakpoint. The JIT has a similar check [1].
>>>>>
>>>>> I also removed unused AOT code that we had forgotten to remove.
>>>>>
>>>>> Tested hs-tier1-3.
>>>>>
>>>>> thanks,
>>>>> Vladimir
>>>>>
>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845
>>>>
>>
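Dean's point about not allowing a safepoint between the breakpoint check and the publishing cmpxchg() can be illustrated with a Java analogy. This is a hypothetical sketch, not HotSpot code: the write lock plays the role of a safepoint (the only time breakpoints change, and when published code is flushed), holding the read lock across the check and the compare-and-set plays the role of NoSafepointVerifier, and AtomicReference.compareAndSet stands in for the cmpxchg().

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical analogy for "check breakpoints, then publish, with no safepoint in between".
public class PublishSketch {

    private final ReentrantReadWriteLock safepoint = new ReentrantReadWriteLock();
    private int breakpointCount = 0; // only modified while holding the write lock
    final AtomicReference<String> publishedCode = new AtomicReference<>();

    /** Like setting a breakpoint at a safepoint: also flushes published code. */
    void setBreakpoint() {
        safepoint.writeLock().lock();
        try {
            breakpointCount++;
            publishedCode.set(null); // analogous to deoptimizing dependent code
        } finally {
            safepoint.writeLock().unlock();
        }
    }

    /** Publishes code only if there are no breakpoints; no "safepoint" can intervene. */
    boolean tryPublish(String code) {
        safepoint.readLock().lock();
        try {
            if (breakpointCount > 0) {
                return false; // do not publish; the method keeps running interpreted
            }
            return publishedCode.compareAndSet(null, code);
        } finally {
            safepoint.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        PublishSketch p = new PublishSketch();
        System.out.println(p.tryPublish("aot-v1")); // true
        p.setBreakpoint();
        System.out.println(p.publishedCode.get());  // null: flushed
        System.out.println(p.tryPublish("aot-v2")); // false: breakpoint present
    }
}
```

Without the read lock, setBreakpoint() could run between the check and the compareAndSet, find nothing to flush, and leave code published for a method that now has a breakpoint.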

From HORIE at jp.ibm.com  Thu Sep 13 17:05:08 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Fri, 14 Sep 2018 02:05:08 +0900
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx
 registers in ppc.ad
In-Reply-To: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
Message-ID: <OF38E7CA02.B1F6AF05-ON00258307.005C3C0B-49258307.005DDAA1@notes.na.collabserv.com>


Hi Martin,

Thank you so much for your review (and adding the ID in the subject :-).

>I don't think we need 2 nodes for ReplicateF with vector length 4. Both
>ones in your webrev are for Power8 so only one will be used.
You're right, thanks. I removed a redundant one.

I also refactored ReplicateD with vector length 2.

Following is the latest webrev:
http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/


Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Michihiro Horie <HORIE at jp.ibm.com>,
            "hotspot-compiler-dev at openjdk.java.net"
            <hotspot-compiler-dev at openjdk.java.net>
Cc:	Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier,
            Goetz" <goetz.lindenmaier at sap.com>
Date:	2018/09/13 16:25
Subject:	RE: RFR(S): 8210660: PPC64: Mapping floating point registers to
            vsx registers in ppc.ad


Hi Michihiro,

I have added "RFR(S): 8210660" to the subject.

I don't think we need 2 nodes for ReplicateF with vector length 4. Both
ones in your webrev are for Power8 so only one will be used.
Besides this, your change looks good to me.

Would you like to improve ReplicateD with vector length 2, too?

Thanks and best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Mittwoch, 12. September 2018 18:11
To: hotspot-compiler-dev at openjdk.java.net
Cc: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero
<gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz
<goetz.lindenmaier at sap.com>
Subject: RFR: PPC64: Mapping floating point registers to vsx registers in
ppc.ad


Dear all,

Would you please review the following change?

Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/

In the current code emitted for replicating a floating-point value in
ppc.ad, the value is first stored to memory so that it can be loaded back
as an integer value. However, when SuperwordUseVSX is enabled, this is
redundant because the floating-point registers overlap VSX registers
0-31. We can use VSX instructions to replicate the floating-point value
by mapping the floating-point register to the corresponding VSX register.


Best regards,
--
Michihiro,
IBM Research - Tokyo



From dean.long at oracle.com  Thu Sep 13 18:59:05 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 13 Sep 2018 11:59:05 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <a0fd0ec8-ccfd-c6c0-ddd0-c8a164e2eab7@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
 <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
 <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>
 <f57effec-c0ec-9f38-91cb-b647991bcd89@oracle.com>
 <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com>
 <a0fd0ec8-ccfd-c6c0-ddd0-c8a164e2eab7@oracle.com>
Message-ID: <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com>

After the first PauseNoSafepointVerifier, I think you need to check 
mh->number_of_breakpoints() again, because it could have changed.

dl



From vladimir.kozlov at oracle.com  Thu Sep 13 19:25:15 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 13 Sep 2018 12:25:15 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
 <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
 <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>
 <f57effec-c0ec-9f38-91cb-b647991bcd89@oracle.com>
 <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com>
 <a0fd0ec8-ccfd-c6c0-ddd0-c8a164e2eab7@oracle.com>
 <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com>
Message-ID: <3144010b-1cd1-cc27-2782-867326916ae1@oracle.com>

No, the first PauseNoSafepointVerifier is on the path where we exit the function without publishing the AOT method.
Maybe my comment there is not clear. Do you have a better suggestion for the comment?

Thanks,
Vladimir


From dean.long at oracle.com  Thu Sep 13 19:44:07 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 13 Sep 2018 12:44:07 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <3144010b-1cd1-cc27-2782-867326916ae1@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
 <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
 <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>
 <f57effec-c0ec-9f38-91cb-b647991bcd89@oracle.com>
 <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com>
 <a0fd0ec8-ccfd-c6c0-ddd0-c8a164e2eab7@oracle.com>
 <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com>
 <3144010b-1cd1-cc27-2782-867326916ae1@oracle.com>
Message-ID: <9200d62f-1854-a1ed-2f2e-7becbe1beea0@oracle.com>

No, it's fine. I wasn't looking at the full context. Sorry for the 
confusion. This version is good.

dl



From vladimir.kozlov at oracle.com  Thu Sep 13 20:48:20 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 13 Sep 2018 13:48:20 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error
 # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To: <9200d62f-1854-a1ed-2f2e-7becbe1beea0@oracle.com>
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com>
 <ffc441da-0c77-3f0a-e547-aa2e8aaa63b9@oracle.com>
 <a2dc6962-8001-6f51-044a-f7335c498d44@oracle.com>
 <f57effec-c0ec-9f38-91cb-b647991bcd89@oracle.com>
 <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com>
 <a0fd0ec8-ccfd-c6c0-ddd0-c8a164e2eab7@oracle.com>
 <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com>
 <3144010b-1cd1-cc27-2782-867326916ae1@oracle.com>
 <9200d62f-1854-a1ed-2f2e-7becbe1beea0@oracle.com>
Message-ID: <d60181d6-2d31-3dcc-b803-e003f3095929@oracle.com>

Thank you, Dean

Vladimir

On 9/13/18 12:44 PM, dean.long at oracle.com wrote:
> No it's fine.? I wasn't looking at the full context.? Sorry for the confusion.? This version is good.
> 
> dl
> 
> On 9/13/18 12:25 PM, Vladimir Kozlov wrote:
>> No, first PauseNoSafepointVerifier is on the path where we exit function without publishing AOT 
>> method.
>> May be my comment there is not clear. Do you have better suggestion for comment?
>>
>> Thanks,
>> Vladimir
>>
>> On 9/13/18 11:59 AM, dean.long at oracle.com wrote:
>>> After the first PauseNoSafepointVerifier, I think you need to check mh->number_of_breakpoints() 
>>> again, because it could have changed.
>>>
>>> dl
>>>
>>> On 9/13/18 9:01 AM, Vladimir Kozlov wrote:
>>>> Updated changes with NoSafepointVerifier:
>>>>
>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.01/
>>>>
>>>> Vladimir
>>>>
>>>> On 9/12/18 9:00 PM, Vladimir Kozlov wrote:
>>>>> Yes, you are right I will add NoSafepointVerifier and will rerun testing.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 9/12/18 6:51 PM, dean.long at oracle.com wrote:
>>>>>> On 9/12/18 6:25 PM, Vladimir Kozlov wrote:
>>>>>>> Thank you, Dean
>>>>>>>
>>>>>>> Breakpoint is set at safepoint:
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411
>>>>>>>
>>>>>>> But why it is important to not be at safepoint in publish_aot(). If AOT is registered first 
>>>>>>> and then breakpoint is set AOT methods will be deoptimized by 
>>>>>>> CodeCache::flush_dependents_on_method() which is called from BreakpointInfo::set().
>>>>>>
>>>>>> I mean you can't do this:
>>>>>>
>>>>>> 1) check breakpoint count
>>>>>> 2) safepoint
>>>>>> 3) register code
>>>>>>
>>>>>> The AOT code is not visible to CodeCache::flush_dependents_on_method() until the cmpxchg().
>>>>>> NoSafepointVerifier would catch any changes in the future that introduce a safepoint.
>>>>>>
>>>>>> dl
>>>>>>
>>>>>>>
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 9/12/18 5:45 PM, dean.long at oracle.com wrote:
>>>>>>>> Hi Vladimir.? C1 and C2 use ciEnv which also grabs locks and checks 
>>>>>>>> JvmtiExport::can_hotswap_or_post_breakpoint() and Dependencies::check_evol_method().? But if 
>>>>>>>> the breakpoint count can only be changed by the VM thread at a safepoint, then your fix 
>>>>>>>> looks good as long as we don't enter a safepoint before the code is registered.? How about 
>>>>>>>> adding a NoSafepointVerifier to publish_aot()?
>>>>>>>>
>>>>>>>> dl
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote:
>>>>>>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8210220
>>>>>>>>>
>>>>>>>>> Don't register an AOT method if the corresponding Java method has breakpoints (for debugging); 
>>>>>>>>> otherwise the AOT method will be executed and will not stop at the breakpoint. The JIT has a similar 
>>>>>>>>> check [1].
>>>>>>>>>
>>>>>>>>> I also removed unused AOT code that we had forgotten to remove.
>>>>>>>>>
>>>>>>>>> Tested hs-tier1-3.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> [1] 
>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845
>>>>>>>>
>>>>>>
>>>
> 
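Dean's objection is a check-then-act race: if a safepoint can occur between checking the breakpoint count and registering the code, a breakpoint set at that safepoint is missed, because the AOT code is not yet visible to CodeCache::flush_dependents_on_method(). The Java sketch below is only an analogy, not HotSpot code: the write lock of a ReentrantReadWriteLock stands in for a safepoint, and the class and method names (AotPublishModel, publishAot, setBreakpoint) are hypothetical. It shows why the check and the cmpxchg-style publish have to sit in one region that breakpoint changes cannot interleave with; in HotSpot that guarantee comes from not reaching a safepoint, which NoSafepointVerifier asserts.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Analogy only: the write lock models a safepoint (the only time breakpoints
// change); the read-locked region models the NoSafepointVerifier region.
public class AotPublishModel {
    private final ReentrantReadWriteLock safepoint = new ReentrantReadWriteLock();
    private final AtomicReference<String> installedCode = new AtomicReference<>();
    private int breakpointCount; // guarded by 'safepoint'

    // Check and publish in one region that excludes breakpoint changes, so a
    // breakpoint cannot be set between the check and the registration.
    public boolean publishAot(String code) {
        safepoint.readLock().lock();
        try {
            if (breakpointCount > 0) {
                return false; // method has breakpoints: do not install AOT code
            }
            return installedCode.compareAndSet(null, code);
        } finally {
            safepoint.readLock().unlock();
        }
    }

    // Setting a breakpoint excludes all publishers, then discards code that was
    // already published (the analogue of deoptimizing dependent methods).
    public void setBreakpoint() {
        safepoint.writeLock().lock();
        try {
            breakpointCount++;
            installedCode.set(null);
        } finally {
            safepoint.writeLock().unlock();
        }
    }

    public String installed() {
        return installedCode.get();
    }

    public static void main(String[] args) {
        AotPublishModel m = new AotPublishModel();
        System.out.println(m.publishAot("aot-code")); // true
        m.setBreakpoint();
        System.out.println(m.installed());            // null
        System.out.println(m.publishAot("aot-code")); // false
    }
}
```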

From magnus.ihse.bursie at oracle.com  Thu Sep 13 22:20:52 2018
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Fri, 14 Sep 2018 00:20:52 +0200
Subject: RFR: JDK-8210731 PropertiesParser does not produce reproducible output
Message-ID: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com>

The file make/langtools/tools/propertiesparser/PropertiesParser.java is used to 
convert .properties files into .java files as part of the gensrc step.

However, because it creates its output directly from HashMaps, the output 
is not guaranteed to be stable, and this is causing spurious differences in 
our cmp-baseline builds.

Bug: https://bugs.openjdk.java.net/browse/JDK-8210731
WebRev: 
http://cr.openjdk.java.net/~ihse/JDK-8210731-properties-parser-is-not-stable/webrev.01

/Magnus
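The message does not say which collection replaces the HashMaps (Erik's reply only calls it an "unusual" choice), so the following is a generic sketch, not the actual patch. HashMap iteration order is unspecified, so code generated by iterating one can differ between builds; copying the entries into a TreeMap before emitting them makes the order depend only on the keys. The class name and property keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class StableOutputExample {
    // Emits "key = value" lines in sorted key order. Iterating the HashMap
    // directly would give an unspecified order; the TreeMap copy sorts by key,
    // so identical input always produces identical output.
    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(props).entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("compiler.warn.bar", "Bar");
        props.put("compiler.err.foo", "Foo");
        props.put("compiler.misc.baz", "Baz");
        System.out.print(render(props));
    }
}
```

A LinkedHashMap would be the alternative if the output should follow the input file's order rather than sorted key order.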


From mandy.chung at oracle.com  Thu Sep 13 22:25:44 2018
From: mandy.chung at oracle.com (mandy chung)
Date: Thu, 13 Sep 2018 15:25:44 -0700
Subject: RFR: JDK-8210731 PropertiesParser does not produce reproducible
 output
In-Reply-To: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com>
References: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com>
Message-ID: <4d9ed0b0-770d-2d5f-2629-03d5661690fa@oracle.com>

Looks okay to me.

Mandy
P.S. I cc'ed compiler-dev since I think you meant to cc compiler-dev 
instead of hotspot-compiler-dev.

On 9/13/18 3:20 PM, Magnus Ihse Bursie wrote:
> The file make/langtools/tools/propertiesparser/PropertiesParser.java 
> is used to convert .properties files into .java files as part of the gensrc step.
>
> However, because it creates its output directly from 
> HashMaps, the output is not guaranteed to be stable, and this is causing spurious 
> differences in our cmp-baseline builds.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8210731
> WebRev: 
> http://cr.openjdk.java.net/~ihse/JDK-8210731-properties-parser-is-not-stable/webrev.01
>
> /Magnus
>

From jonathan.gibbons at oracle.com  Thu Sep 13 22:25:54 2018
From: jonathan.gibbons at oracle.com (Jonathan Gibbons)
Date: Thu, 13 Sep 2018 15:25:54 -0700
Subject: RFR: JDK-8210731 PropertiesParser does not produce reproducible
 output
In-Reply-To: <4d9ed0b0-770d-2d5f-2629-03d5661690fa@oracle.com>
References: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com>
 <4d9ed0b0-770d-2d5f-2629-03d5661690fa@oracle.com>
Message-ID: <5B9AE3F2.4020404@oracle.com>

+1

-- Jon

On 09/13/2018 03:25 PM, mandy chung wrote:
> Looks okay to me.
>
> Mandy
> P.S. I cc'ed compiler-dev since I think you meant to cc compiler-dev 
> instead of hotspot-compiler-dev.
>
> On 9/13/18 3:20 PM, Magnus Ihse Bursie wrote:
>> The file make/langtools/tools/propertiesparser/PropertiesParser.java 
>> is used to convert .properties files into .java files as part of the gensrc 
>> step.
>>
>> However, because it creates its output directly from 
>> HashMaps, the output is not guaranteed to be stable, and this is causing spurious 
>> differences in our cmp-baseline builds.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8210731
>> WebRev: 
>> http://cr.openjdk.java.net/~ihse/JDK-8210731-properties-parser-is-not-stable/webrev.01
>>
>> /Magnus
>>
>


From erik.joelsson at oracle.com  Thu Sep 13 23:20:13 2018
From: erik.joelsson at oracle.com (Erik Joelsson)
Date: Thu, 13 Sep 2018 16:20:13 -0700
Subject: RFR: JDK-8210731 PropertiesParser does not produce reproducible
 output
In-Reply-To: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com>
References: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com>
Message-ID: <2c529f83-0d72-fb47-d6f1-5be45eeac3ef@oracle.com>

Hello,

Looks good. Perhaps add a comment explaining why the otherwise unusual 
choice of collection class is used.

/Erik

On 2018-09-13 15:20, Magnus Ihse Bursie wrote:
> The file make/langtools/tools/propertiesparser/PropertiesParser.java 
> is used to convert .properties files into .java files as part of the gensrc step.
>
> However, because it creates its output directly from 
> HashMaps, the output is not guaranteed to be stable, and this is causing spurious 
> differences in our cmp-baseline builds.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8210731
> WebRev: 
> http://cr.openjdk.java.net/~ihse/JDK-8210731-properties-parser-is-not-stable/webrev.01
>
> /Magnus
>


From igor.veresov at oracle.com  Fri Sep 14 03:50:39 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 13 Sep 2018 20:50:39 -0700
Subject: RFR(L) 8210478: Update Graal
Message-ID: <F6E803BA-375A-425B-9803-33FBFF2EEE92@oracle.com>

This is a regular update. Please see the JBS issue for the list of the changes included in this update.

JBS: https://bugs.openjdk.java.net/browse/JDK-8210478
Webrev: http://cr.openjdk.java.net/~iveresov/8210478/webrev.00/


Thanks!
igor


From vladimir.kozlov at oracle.com  Fri Sep 14 04:39:58 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 13 Sep 2018 21:39:58 -0700
Subject: RFR(L) 8210478: Update Graal
In-Reply-To: <F6E803BA-375A-425B-9803-33FBFF2EEE92@oracle.com>
References: <F6E803BA-375A-425B-9803-33FBFF2EEE92@oracle.com>
Message-ID: <278cdc1f-bc09-4332-abfe-2429225c25d9@oracle.com>

Looks good.

Thanks,
Vladimir

On 9/13/18 8:50 PM, Igor Veresov wrote:
> This is a regular update. Please see the JBS issue for the list of the changes included in this update.
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8210478
> Webrev: http://cr.openjdk.java.net/~iveresov/8210478/webrev.00/
> 
> 
> Thanks!
> igor
> 
> 
> 

From igor.veresov at oracle.com  Fri Sep 14 04:56:41 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 13 Sep 2018 21:56:41 -0700
Subject: RFR(L) 8210478: Update Graal
In-Reply-To: <278cdc1f-bc09-4332-abfe-2429225c25d9@oracle.com>
References: <F6E803BA-375A-425B-9803-33FBFF2EEE92@oracle.com>
 <278cdc1f-bc09-4332-abfe-2429225c25d9@oracle.com>
Message-ID: <114529AA-8A68-495A-98E9-EBCC98D663BE@oracle.com>

Thanks, Vladimir!

igor


> On Sep 13, 2018, at 9:39 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 9/13/18 8:50 PM, Igor Veresov wrote:
>> This is a regular update. Please see the JBS issue for the list of the changes included in this update.
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8210478
>> Webrev: http://cr.openjdk.java.net/~iveresov/8210478/webrev.00/
>> Thanks!
>> igor

From rwestrel at redhat.com  Fri Sep 14 07:49:16 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 14 Sep 2018 09:49:16 +0200
Subject: RFR(S): 8210390: C2 still crashes with "assert(mode ==
 ControlAroundStripMined && use == sfpt) failed: missed a node"
Message-ID: <dk6bm90r01v.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/8210390/webrev.00/

PhaseIdealLoop::reorg_offsets() creates some data nodes on the exit path
of a counted loop, so they end up in the outer strip mined loop. Data nodes
in the outer strip mined loop are expected to be referenced from the
safepoint node, but that's not the case for these new nodes, which have
all their uses outside the outer strip mined loop. This inconsistency causes a
later attempt at cloning the loop in the same loop opts pass to break.

The fix is to give the new data nodes a control that is on the outer
strip mined loop exit path.

Roland.

From martin.doerr at sap.com  Fri Sep 14 08:29:58 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 14 Sep 2018 08:29:58 +0000
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx
 registers in ppc.ad
In-Reply-To: <OF38E7CA02.B1F6AF05-ON00258307.005C3C0B-49258307.005DDAA1@notes.na.collabserv.com>
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
 <OF38E7CA02.B1F6AF05-ON00258307.005C3C0B-49258307.005DDAA1@notes.na.collabserv.com>
Message-ID: <f67c306c714c40e4bb604e8f9dfe4515@sap.com>

Hi Michihiro,

your webrev
http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
looks good to me.

I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no "format %{ ... %}" specification, so they are missing from the PrintOptoAssembly output. But this seems to be missing in the current version already.

We can test it while waiting for a 2nd review.

Thanks and best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Donnerstag, 13. September 2018 19:05
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad


Hi Martin,

Thank you so much for your review (and adding the ID in the subject :-).

>I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.
You're right, thanks. I removed a redundant one.

I also refactored ReplicateD with vector length 2.

Following is the latest webrev:
http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/


Best regards,
--
Michihiro,
IBM Research - Tokyo

From: "Doerr, Martin" <martin.doerr at sap.com>
To: Michihiro Horie <HORIE at jp.ibm.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
Date: 2018/09/13 16:25
Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad

________________________________


Hi Michihiro,

I have added "RFR(S): 8210660" to the subject.

I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.
Besides this, your change looks good to me.

Would you like to improve ReplicateD with vector length 2, too?

Thanks and best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Mittwoch, 12. September 2018 18:11
To: hotspot-compiler-dev at openjdk.java.net
Cc: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad

Dear all,

Would you please review the following change?

Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/

In the code currently emitted in ppc.ad for replicating a floating point value, the value is first stored to memory so that it can be reloaded as an integer value. However, when SuperwordUseVSX is enabled, this round trip is redundant because the floating point registers overlap VSX registers 0-31. We can use VSX instructions to replicate the floating point value by mapping the floating point register to the corresponding VSX register.


Best regards,
--
Michihiro,
IBM Research - Tokyo
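For context, here is a Java loop of the kind this change targets. This is a hedged sketch: whether C2's SuperWord pass actually vectorizes it depends on the platform and flags (for example SuperwordUseVSX on POWER8, as named in the message), and the class name is hypothetical. When the loop is vectorized, the loop-invariant scalar `factor` has to be broadcast into every lane of a vector register, which is what a ReplicateF node represents. That broadcast is the step the webrev implements with VSX instructions instead of a store and an integer reload.

```java
import java.util.Arrays;

public class ReplicateFloatExample {
    // dst[i] = src[i] * factor: if SuperWord vectorizes this loop, the scalar
    // 'factor' is replicated into all lanes of a vector register (ReplicateF).
    static void scale(float[] dst, float[] src, float factor) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] * factor;
        }
    }

    public static void main(String[] args) {
        float[] src = new float[1024];
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
        }
        float[] dst = new float[src.length];
        // Call repeatedly so the method becomes hot enough for C2 to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            scale(dst, src, 1.5f);
        }
        System.out.println(Arrays.toString(Arrays.copyOf(dst, 4)));
    }
}
```

Running it with -XX:+PrintOptoAssembly on a debug build is one way to inspect which instructions are actually emitted for the replicate.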



From aph at redhat.com  Fri Sep 14 09:34:34 2018
From: aph at redhat.com (Andrew Haley)
Date: Fri, 14 Sep 2018 10:34:34 +0100
Subject: [aarch64-port-dev ] RFR: 8189107 - AARCH64: create intrinsic for
 pow
In-Reply-To: <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
Message-ID: <169c83ec-3e2f-a001-22c0-08528b3f189f@redhat.com>

On 09/07/2018 01:58 PM, Andrew Dinn wrote:
> I have rewritten the algorithm to achieve what I think is needed to
> patch these omissions. The redraft of this part of the code is available
> here:
> 
>   http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt

I know that you're very good at using punctuation, capitalization, and
grammar in written text. However, for some reason you omit these in
comments. In this case, it would be much easier to read your comments
if they were recast as sentences in grammatical English. Sure, some of
them could be simply noun phrases.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From adinn at redhat.com  Fri Sep 14 09:59:38 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 14 Sep 2018 10:59:38 +0100
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
 <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
Message-ID: <26281e69-0354-9abe-1ffc-36c10fd93d68@redhat.com>

Hi Dmitrij,

On 13/09/18 15:35, Dmitrij Pochepko wrote:
> I found 3 items to fix in your comments in
> http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt
> 
> 1)
> 
> //           [1, sqrt(3/2)), [sqrt(3/2), sqrt(3)/2), [sqrt(3)/2, 2)
> //      i.e. [1, ~1.225],    [~1.225,    ~1.732),    [~1.732, 2)
> 
> this one should be:
> 
> [1, sqrt(3/2)), [sqrt(3/2), sqrt(3)), [sqrt(3), 2)
> i.e. [1, ~1.225],    [~1.225,    ~1.732),    [~1.732,    2)
> 
> 
> 2)
> 
> "4) Filter out overflows (z > 1023) or underflows (z < -1077)"
> 
> should be:
> 
> "4) Filter out overflows (z > 1023) or underflows (z < -1076)"
> 
> 3) "5) Let |z| = n + r where n is int, 0 <= n < 10, and 0 <= r < 1"
> 
> should be:
> 
> "5) Let |z| = n + r where n is int, 0 <= n < 1076, and 0 <= r < 1"
> 
> The other comments seem fine.
Thank you for the corrections. I will update the file on
cr.openjdk.java.net accordingly.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
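The first correction can be sanity-checked numerically. The Java sketch below is hypothetical and not part of the intrinsic. It scales the mantissa of a positive, normal double into [1, 2) and classifies it into the corrected sub-intervals [1, sqrt(3/2)), [sqrt(3/2), sqrt(3)), [sqrt(3), 2). It also confirms that sqrt(3), not sqrt(3)/2 (about 0.866, which lies outside [1, 2)), is the boundary near 1.732. It illustrates only this range split, not the rest of the pow algorithm.

```java
public class PowIntervals {
    static final double SQRT_3_2 = Math.sqrt(1.5); // ~1.2247
    static final double SQRT_3 = Math.sqrt(3.0);   // ~1.7321

    // Mantissa of a positive, finite, normal x, scaled into [1, 2).
    static double mantissa(double x) {
        return Math.scalb(x, -Math.getExponent(x));
    }

    // 0: [1, sqrt(3/2)), 1: [sqrt(3/2), sqrt(3)), 2: [sqrt(3), 2)
    static int interval(double m) {
        if (m < SQRT_3_2) {
            return 0;
        }
        if (m < SQRT_3) {
            return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        System.out.printf("sqrt(3/2) = %.4f, sqrt(3) = %.4f%n", SQRT_3_2, SQRT_3);
        for (double x : new double[] {1.1, 3.0, 7.0}) {
            double m = mantissa(x);
            System.out.println(x + " -> m = " + m + ", interval " + interval(m));
        }
    }
}
```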

From zhaixiang at loongson.cn  Fri Sep 14 11:31:14 2018
From: zhaixiang at loongson.cn (Leslie Zhai)
Date: Fri, 14 Sep 2018 19:31:14 +0800
Subject: RFR(XS): 8024128: guarantee(codelet_size > 0 && (size_t)codelet_size
 > 2*K) failed: not enough space for interpreter generation
Message-ID: <c5a018a8-2cab-8b3b-e76a-b75494cd7f76@loongson.cn>

Hi,

I just quoted the old thread 
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012243.html

I think we should increase it more for future otherwise you will have to
always catch up with interpreter changes.

Increase it to 256 * 1024 and 224 * 1024

Vladimir

On 10/16/13 12:22 PM, Albert Noll wrote:
 > Hi,
 >
 > could I have a review for this patch?
 >
 > bug: https://bugs.openjdk.java.net/browse/JDK-8026708
 > webrev: http://cr.openjdk.java.net/~anoll/8026708/webrev.00/
 > <http://cr.openjdk.java.net/%7Eanoll/8026708/webrev.00/>
 >
 > Problem: Not enough room for interpreter. My last patch did not solve
 > the problem for solaris-amd64.
 >          A local build (solaris-amd64) of the most recent hotspot-comp
 >          version requires a template interpreter size of 211K (obtained
 >          with -XX:+PrintInterpreter). There have been some modifications
 >          to the template interpreter in the last couple of weeks which
 >          might have triggered this error.
 >
 > Solution: Increase interpreter size by 8k (32-bit and 64-bit).
 >
 > Testing: Failing test case in solaris-amd64

----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---

I found that `InterpreterCodeSize` had been changed from 200K to 208K 
[1], and then from 208K to 256K [2] by Albert. But when built 
with --with-debug-level=fastdebug/slowdebug, it is multiplied by four:

NOT_PRODUCT(code_size *= 4;)  // debug uses extra interpreter code space

This can then trigger the "Native memory allocation (malloc) failed to 
allocate xxx bytes for CodeCache: no room for Interpreter" error.

I don't want to keep catching up with interpreter changes by guessing a 
suitable number, neither too small nor too big :) Could you give me some 
suggestions about the root cause? Thanks a lot!

Leslie Zhai

[1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/6d7eba360ba4

[2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/74e00b98d5dd
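To make the scale of the debug-build multiplication in Leslie's message concrete, here is a small Java sketch of the arithmetic only. `reservedBytes` is a hypothetical helper that mirrors the quoted `NOT_PRODUCT(code_size *= 4;)` line; it is not HotSpot code. The 256K input is the `InterpreterCodeSize` value mentioned in the message.

```java
public class InterpreterSizeEstimate {
    static final long K = 1024;

    // Hypothetical helper mirroring NOT_PRODUCT(code_size *= 4;):
    // non-product (fastdebug/slowdebug) builds reserve four times the size.
    static long reservedBytes(long interpreterCodeSize, boolean productBuild) {
        long codeSize = interpreterCodeSize;
        if (!productBuild) {
            codeSize *= 4; // debug uses extra interpreter code space
        }
        return codeSize;
    }

    public static void main(String[] args) {
        long configured = 256 * K;
        System.out.println("product: " + reservedBytes(configured, true) / K + "K");
        System.out.println("debug:   " + reservedBytes(configured, false) / K + "K");
    }
}
```

So each bump of the product value is quadrupled in debug builds: the step from 208K to 256K grows the debug reservation from 832K to 1024K.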


From HORIE at jp.ibm.com  Fri Sep 14 11:42:15 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Fri, 14 Sep 2018 20:42:15 +0900
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx
 registers in ppc.ad
In-Reply-To: <f67c306c714c40e4bb604e8f9dfe4515@sap.com>
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
 <OF38E7CA02.B1F6AF05-ON00258307.005C3C0B-49258307.005DDAA1@notes.na.collabserv.com>
 <f67c306c714c40e4bb604e8f9dfe4515@sap.com>
Message-ID: <OF46ED71D3.A34CED01-ON00258308.003FBB2C-49258308.00404B58@notes.na.collabserv.com>


Hi Martin,

Thank you for your suggestion to improve this change and for testing it. I
have uploaded a new webrev with format statements:
http://cr.openjdk.java.net/~mhorie/8210660/webrev.02/


Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Michihiro Horie <HORIE at jp.ibm.com>
Cc:	"Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Gustavo
            Romero <gromero at linux.vnet.ibm.com>,
            "hotspot-compiler-dev at openjdk.java.net"
            <hotspot-compiler-dev at openjdk.java.net>
Date:	2018/09/14 17:30
Subject:	RE: RFR(S): 8210660: PPC64: Mapping floating point registers to
            vsx registers in ppc.ad


Hi Michihiro,

your webrev
http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
looks good to me.

I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no
"format %{ ... %}" specification, so they are missing from the PrintOptoAssembly
output. But this seems to be missing in the current version already.

We can test it while waiting for a 2nd review.

Thanks and best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Donnerstag, 13. September 2018 19:05
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero
<gromero at linux.vnet.ibm.com>; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to
vsx registers in ppc.ad


Hi Martin,

Thank you so much for your review (and adding the ID in the subject :-).

>I don't think we need 2 nodes for ReplicateF with vector length 4. Both
ones in your webrev are for Power8 so only one will be used.
You're right, thanks. I removed a redundant one.

I also refactored ReplicateD with vector length 2.

Following is the latest webrev:
http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/


Best regards,
--
Michihiro,
IBM Research - Tokyo

From: "Doerr, Martin" <martin.doerr at sap.com>
To: Michihiro Horie <HORIE at jp.ibm.com>, "hotspot-compiler-dev at openjdk.java.net"
<hotspot-compiler-dev at openjdk.java.net>
Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz"
<goetz.lindenmaier at sap.com>
Date: 2018/09/13 16:25
Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to
vsx registers in ppc.ad


Hi Michihiro,

I have added "RFR(S): 8210660" to the subject.

I don't think we need 2 nodes for ReplicateF with vector length 4. Both
ones in your webrev are for Power8 so only one will be used.
Besides this, your change looks good to me.

Would you like to improve ReplicateD with vector length 2, too?

Thanks and best regards,
Martin


From: Michihiro Horie <HORIE at jp.ibm.com>
Sent: Mittwoch, 12. September 2018 18:11
To: hotspot-compiler-dev at openjdk.java.net
Cc: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero
<gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
Subject: RFR: PPC64: Mapping floating point registers to vsx registers in
ppc.ad


Dear all,

Would you please review the following change?

Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/

In the code currently emitted in ppc.ad for replicating a floating point
value, the value is first stored to memory so that it can be reloaded as an
integer value. However, when SuperwordUseVSX is enabled, this round trip is
redundant because the floating point registers overlap VSX registers 0-31.
We can use VSX instructions to replicate the floating point value by
mapping the floating point register to the corresponding VSX register.


Best regards,
--
Michihiro,
IBM Research - Tokyo



From rkennke at redhat.com  Fri Sep 14 12:56:07 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 14 Sep 2018 14:56:07 +0200
Subject: RFR: JDK-8210752: Remaining explicit barriers for C2
Message-ID: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com>

Please review the following change:

JDK-8210187 introduced explicit barriers for C2. A few of them were still
missing:
- Unsafe accesses also require explicit barriers when it's unknown whether
the access is on-heap or off-heap. In this case, C2 turns the access
into a raw access, for which the access_load/store APIs cannot
determine what to do. Emitting explicit barriers solves this for
Shenandoah: for a raw (off-heap) access the base is NULL, which is handled
by a null-check, so the barrier is skipped; for an on-heap access the base
is non-null, so the null-check fails and the barrier is triggered correctly.
- One arraycopy barrier on dst was erroneously emitted with ACCESS_READ where
it should have been ACCESS_WRITE (my mistake).
- Object equality using CmpP requires stable oops, and thus barriers on
both operands.
- vectorizedMismatch() and copyMemory() also require explicit barriers
before building the addresses and feeding them into the calls.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8210752
Webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8210752/webrev.00/

Thanks,
Roman


From tobias.hartmann at oracle.com  Fri Sep 14 13:42:39 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 14 Sep 2018 15:42:39 +0200
Subject: RFR(S): 8210390: C2 still crashes with "assert(mode ==
 ControlAroundStripMined && use == sfpt) failed: missed a node"
In-Reply-To: <dk6bm90r01v.fsf@rwestrel.remote.csb>
References: <dk6bm90r01v.fsf@rwestrel.remote.csb>
Message-ID: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com>

Hi Roland,

that looks good to me.

Best regards,
Tobias

On 14.09.2018 09:49, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8210390/webrev.00/
> 
> PhaseIdealLoop::reorg_offsets() creates some data nodes on the exit path
> of a counted loop, so they end up in the outer strip mined loop. Data nodes
> in the outer strip mined loop are expected to be referenced from the
> safepoint node, but that's not the case for these new nodes, which have
> all their uses outside the outer strip mined loop. This inconsistency causes a
> later attempt at cloning the loop in the same loop opts pass to break.
> 
> The fix is to give the new data nodes a control that is on the outer
> strip mined loop exit path.
> 
> Roland.
> 

From rwestrel at redhat.com  Fri Sep 14 14:47:21 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 14 Sep 2018 16:47:21 +0200
Subject: RFR(S): 8210390: C2 still crashes with "assert(mode ==
 ControlAroundStripMined && use == sfpt) failed: missed a node"
In-Reply-To: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com>
References: <dk6bm90r01v.fsf@rwestrel.remote.csb>
 <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com>
Message-ID: <dk636ucqgp2.fsf@rwestrel.remote.csb>


> that looks good to me.

Thanks for the review, Tobias.

Roland.

From goetz.lindenmaier at sap.com  Fri Sep 14 15:18:12 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 14 Sep 2018 15:18:12 +0000
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <4F954DE5-DD8C-4395-8E40-6D341C42649C@twitter.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com>
 <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
 <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>
 <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com>
 <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com>
 <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com>
 <c10ed3d2-b76a-4b52-c7ae-25bddb9ab721@linux.vnet.ibm.com>,
 <4F954DE5-DD8C-4395-8E40-6D341C42649C@twitter.com>
Message-ID: <20A4FBE2-A091-4437-A2D8-9806C8DC1837@sap.com>

Hi,

Gustavo, thanks for the offlist explanations.

The change simplifies the matter nicely.
Looks good, reviewed.

Best regards, Götz 

> Am 10.09.2018 um 22:17 schrieb Christian Thalinger <cthalinger at twitter.com>:
> 
> 
> 
>> On Sep 6, 2018, at 1:53 AM, Gustavo Romero <gromero at linux.vnet.ibm.com> wrote:
>> 
>> On 09/05/2018 07:54 PM, Vladimir Kozlov wrote:
>>> v3 looks good.
>> 
>> Thanks a lot Vladimir.
>> 
>> @Goetz, would you mind reviewing v3, please?
> 
> Is he on vacation? :-)
> 
>> It touches code meant for AIX but
>> I don't expect any change in the end.
>> 
>> http://cr.openjdk.java.net/~gromero/8209972/v3/
>> 
>> Thank you.
>> 
>> 
>> Best regards,
>> Gustavo
>> 
>>> Thanks,
>>> Vladimir
>>>> On 9/5/18 3:18 PM, Gustavo Romero wrote:
>>>> Hi Vladimir,
>>>> 
>>>>> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote:
>>>>> Thank you Gustavo for detailed answer.
>>>>> 
>>>>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now.
>>>> 
>>>> Thanks for reviewing it!
>>>> 
>>>> 
>>>>> About the property check (flavor == "server" & !emulatedClient) in the tests: if C2 (or Graal in the future, when it supports RTM) is used, this check is always true, because C2 (and Graal) is only available in the Server VM and is disabled in the emulatedClient setting. This check was added before we had the C2 usage check, which can now be done with vm.rtm.compiler.
>>>> 
>>>> Thanks, I was not aware of it. I've updated the webrev removing
>>>> "flavor == "server" & !emulatedClient":
>>>> 
>>>> http://cr.openjdk.java.net/~gromero/8209972/v3/
>>>> 
>>>> "hg diff --patience":
>>>> 
>>>> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff
>>>> 
>>>> Testing (on Linux):
>>>> 
>>>> ** X86_64 w/ CPU+OS RTM support + Graal VM **
>>>> Test results: no tests selected (all RTM tests skipped)
>>>> 
>>>> ** POWER8 w/ CPU+OS support **
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>> Test results: passed: 30
>>>> 
>>>> ** X86_64 w/ CPU+OS support **
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>> Test results: passed: 30
>>>> 
>>>> ** POWER7 wo/ CPU+OS RTM support **
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Test results: passed: 10
>>>> 
>>>> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support **
>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>> Test results: passed: 10
>>>> 
>>>> 
>>>> Best regards,
>>>> Gustavo
>>>> 
>>>>> Thanks,
>>>>> Vladimir
>>>>> 
>>>>>> On 9/3/18 3:15 PM, Gustavo Romero wrote:
>>>>>> Hi Vladimir,
>>>>>> 
>>>>>> Thanks a lot for reviewing it and for your comments.
>>>>>> 
>>>>>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>>>>>>> Hi Gustavo,
>>>>>>> 
>>>>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled(), because C2 may be switched off with the TieredStopAtLevel < 4 flag.
>>>>>> 
>>>>>> Yes. Although afaics all current tests either enable C2 explicitly (for
>>>>>> instance, by passing -Xcomp and -XX:-TieredCompilation) or enable it
>>>>>> implicitly through a warm-up phase before testing, I agree that nothing
>>>>>> prevents one from switching off C2 with TieredStopAtLevel < 4 for a test
>>>>>> case. It also looks better to explicitly list the compilers that do
>>>>>> support RTM instead of the ones that don't.
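To illustrate Vladimir's point, here is a minimal, hypothetical model. It is not the actual VMProps or test-library code. The active top-tier compiler is passed in as an enum value instead of being queried from the running VM, and the names `TopTier`, `denylist` and `allowlist` are invented for this sketch. It shows that a denylist check in the style of `!Compiler.isGraalEnabled()` still reports an RTM-capable compiler when C2 has been switched off (for example with TieredStopAtLevel < 4). An allowlist check in the style of `Compiler.isC2Enabled()` does not make that mistake.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the active top-tier JIT. Real tests query the
// running VM; here the configuration is simply a parameter.
enum TopTier { C1_ONLY, C2, GRAAL }

public class RtmCompilerCheck {
    // Allowlist: compilers assumed to support RTM (only C2 in this thread).
    private static final Set<TopTier> RTM_COMPILERS = EnumSet.of(TopTier.C2);

    // Denylist style, analogous to !Compiler.isGraalEnabled().
    static boolean denylist(TopTier active) {
        return active != TopTier.GRAAL;
    }

    // Allowlist style, analogous to Compiler.isC2Enabled().
    static boolean allowlist(TopTier active) {
        return RTM_COMPILERS.contains(active);
    }

    public static void main(String[] args) {
        for (TopTier t : TopTier.values()) {
            System.out.printf("%-8s denylist=%-5b allowlist=%b%n",
                    t, denylist(t), allowlist(t));
        }
    }
}
```

The two styles disagree only in the C1_ONLY row. That row models C2 being switched off with TieredStopAtLevel < 4, and there the denylist wrongly claims that an RTM-capable compiler is available.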
>>>>>> 
>>>>>> I've updated the webrev accordingly:
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/
>>>>>> 
>>>>>> The diff there looks odd, so I generated another one with --patience for a
>>>>>> better (IMO) diff format:
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
>>>>>> 
>>>>>> 
>>>>>>> Also, can the platform checks be replaced with a single vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
>>>>>> 
>>>>>> For example, on Linux the following cases are possible regarding CPU / OS
>>>>>> RTM support:
>>>>>> 
>>>>>> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
>>>>>> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>>>>> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>>>>> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>>>>>> 
>>>>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
>>>>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" the
>>>>>> second term is redundant: if "vm.rtm.cpu" is 'true', then "vm.rtm.os" is
>>>>>> 'true' as well, otherwise the JVM would never advertise the "rtm" feature
>>>>>> (which is what "vm.rtm.cpu" checks). That seems to hold for both Linux
>>>>>> and AIX.
>>>>>> 
>>>>>> That said, I don't think the platform checks can be replaced with a single
>>>>>> vmRTMCPU(), because in some cases a test must run when cpu = false and
>>>>>> compiler = true, i.e. it must run on an unsupported CPU for a given
>>>>>> platform _only if_ the compiler in use supports RTM (like C2). If, for
>>>>>> instance, we define:
>>>>>> 
>>>>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires,
>>>>>> we "tie" CPU+OS RTM support to compiler RTM support, and the evaluation
>>>>>> returns 'false' for cpu = false and compiler = true, skipping the test
>>>>>> (vm.rtm.compiler = false). On the other hand, to evaluate to 'true' and
>>>>>> run the test in that case one could match on '!vm.rtm.compiler', but if
>>>>>> compiler = false (Graal), '!vm.rtm.compiler' also evaluates to 'true' and
>>>>>> the test runs even though the Graal compiler is selected, which is wrong.
>>>>>> 
>>>>>> Hence, IMO, the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and
>>>>>> must contain its own list of compilers with RTM support for each platform.
>>>>>> Basically, we can't ask the JVM about the compiler's support for RTM,
>>>>>> since the JVM can only tell us about CPU+OS support for RTM on the CPU
>>>>>> and OS it is running on.
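The argument above can be checked with a small truth-table model. This is a hypothetical sketch, not code from the webrev. `cpu` stands for vm.rtm.cpu, which already implies OS support, and `c2` stands for whether an RTM-capable compiler is active. For a test such as TestUseRTMLockingOptionOnUnsupportedCPU.java, the intended selection is `!cpu & c2`. The model compares that selection with two alternatives:

- '!vm.rtm.compiler' under the tied definition 'C2 && vmRTMCPU()';
- a property that describes only the compiler.

```java
public class RtmRequiresModel {
    // Hypothetical tied definition: vm.rtm.compiler = C2 && vmRTMCPU().
    static boolean tied(boolean cpu, boolean c2) {
        return c2 && cpu;
    }

    // Intended selection for a test meant for an unsupported CPU:
    // run only when the CPU/OS lacks RTM *and* the compiler supports RTM.
    static boolean wanted(boolean cpu, boolean c2) {
        return !cpu && c2;
    }

    // Selection via '!vm.rtm.compiler' with the tied definition.
    static boolean viaNegatedTied(boolean cpu, boolean c2) {
        return !tied(cpu, c2);
    }

    // Selection via '!vm.rtm.cpu & vm.rtm.compiler' with a compiler-only property.
    static boolean viaCompilerOnly(boolean cpu, boolean c2) {
        boolean vmRtmCompiler = c2; // describes the compiler only, not the CPU/OS
        return !cpu && vmRtmCompiler;
    }

    public static void main(String[] args) {
        boolean[] tf = {false, true};
        for (boolean cpu : tf) {
            for (boolean c2 : tf) {
                System.out.printf("cpu=%-5b c2=%-5b wanted=%-5b !tied=%-5b compilerOnly=%b%n",
                        cpu, c2, wanted(cpu, c2),
                        viaNegatedTied(cpu, c2), viaCompilerOnly(cpu, c2));
            }
        }
    }
}
```

In the rows where c2 is false (e.g. Graal selected), the tied variant still selects the test, which is the failure described above. The compiler-only property reproduces the intended selection in every row.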
>>>>>> 
>>>>>> 
>>>>>>> And since you check the cpu here, why not replace all vm.rtm.* with a single vm.rtm.supported? In that case you would need only one @requires check in tests instead of:
>>>>>>> 
>>>>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>>>>> 
>>>>>> I don't think that's possible either. Currently there are 5 match cases in
>>>>>> the RTM tests:
>>>>>> 
>>>>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
>>>>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
>>>>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
>>>>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
>>>>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>>>>>> 
>>>>>> which can be summarized as the following 5 cases:
>>>>>> 
>>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
>>>>>> 2:            flavor == "server" & !emulatedClient  & cpu & os
>>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>>> 5: no @requires
>>>>>> 
>>>>>> I understand that cases 1 and 2 (since CPU implies OS) can be simplified as:
>>>>>> 
>>>>>> 
>>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>>>>> 2:            flavor == "server" & !emulatedClient  & cpu
>>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>>> 5: no @requires
>>>>>> 
>>>>>> and cases 1 and 2 are mere opposites, so we have 4 cases:
>>>>>> 
>>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>>> 5: no @requires
>>>>>> 
>>>>>> We could simplify further by letting P = (flavor == "server" & !emulatedClient),
>>>>>> which gives:
>>>>>> 
>>>>>> 1:          !(P & cpu)
>>>>>> 3: (!cpu) &  (P)
>>>>>> 4:   cpu  & !(P)
>>>>>> 5: no @requires
>>>>>> 
>>>>>> So if we add a property compiler = C2 && (x64 | PPC) to each of them, so
>>>>>> that the tests run only if the selected compiler on a given platform has
>>>>>> RTM support (skipping Graal, for instance), we get:
>>>>>> 
>>>>>> 1:          !(P & cpu) & compiler
>>>>>> 3: (!cpu) &  (P)       & compiler
>>>>>> 4:   cpu  & !(P)       & compiler
>>>>>> 5: no @requires        & compiler
>>>>>> 
>>>>>> So it looks like at minimum we would need 3 properties, but IMO it's not
>>>>>> worth adding another property P = (flavor == "server" & !emulatedClient)
>>>>>> just to simplify the @requires line further.
>>>>>> 
>>>>>> In summary, I think it's only possible to replace 'cpu & os' with 'cpu',
>>>>>> so I updated the webrev, removing the vm.rtm.os property and keeping only
>>>>>> vm.rtm.cpu and vm.rtm.compiler (plus the flavor and emulatedClient checks).
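The step from `cpu & os` to `cpu` relies only on the fact that vm.rtm.cpu implies vm.rtm.os. The following minimal sketch checks this exhaustively over all boolean assignments, using P as defined above. The class and method names are illustrative only.

```java
public class RequiresSimplification {
    // Compares case 2, "P & cpu & os", with its simplified form "P & cpu".
    // Case 1 is the negation of case 2, so its equivalence follows.
    // P stands for (vm.flavor == "server" & !vm.emulatedClient).
    static boolean sameSelection(boolean assumeCpuImpliesOs) {
        boolean[] tf = {false, true};
        for (boolean p : tf) {
            for (boolean cpu : tf) {
                for (boolean os : tf) {
                    if (assumeCpuImpliesOs && cpu && !os) {
                        continue; // never reported: "rtm" is advertised only with OS support
                    }
                    if ((p && cpu && os) != (p && cpu)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("with cpu => os:    " + sameSelection(true));
        System.out.println("without cpu => os: " + sameSelection(false));
    }
}
```

It prints `true` for the constrained check and `false` without the constraint. The two forms differ only for cpu = true, os = false, and that is the combination the JVM never reports.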
>>>>>> 
>>>>>> I've tested the following scenarios and observed no regression [1]:
>>>>>> 
>>>>>> 1. X86_64 w/ RTM
>>>>>> 2. X86_64 w/ RTM + Graal enabled
>>>>>> 3. POWER7: no CPU+OS support for RTM
>>>>>> 4. POWER8: CPU+OS support for RTM
>>>>>> 
>>>>>> But I think we need confirmation from SAP about AIX.
>>>>>> 
>>>>>> 
>>>>>> Best regards,
>>>>>> Gustavo
>>>>>> 
>>>>>> [1]
>>>>>> 
>>>>>> ** X86_64 w/ RTM **
>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>> Test results: passed: 30
>>>>>> 
>>>>>> 
>>>>>> ** X86_64 w/ RTM + Graal enabled **
>>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>>> 
>>>>>> 
>>>>>> ** POWER7: no CPU+OS support for RTM **
>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>> Test results: passed: 10
>>>>>> 
>>>>>> 
>>>>>> ** POWER8: CPU+OS support for RTM **
>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>> Test results: passed: 30
>>>>>> 
>>>>>> 
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>> 
>>>>>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Could the following small change be reviewed please?
>>>>>>>> 
>>>>>>>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972
>>>>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/
>>>>>>>> 
>>>>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal)
>>>>>>>> is selected on platforms that can have CPU/OS with RTM support.
>>>>>>>> 
>>>>>>>> It also disables all RTM tests on any other platform that has no
>>>>>>>> compiler supporting RTM.
>>>>>>>> 
>>>>>>>> RTM support was first added to the C2 compiler, and once the RTM checkers
>>>>>>>> (notably vm.rtm.cpu) find the "rtm" feature advertised by the JVM, they
>>>>>>>> assume that a compiler supporting RTM is certainly available ("rtm" is
>>>>>>>> advertised only if RTM is supported by both CPU and OS). Later the JVM
>>>>>>>> began to allow the selection of a compiler other than C2, like Graal,
>>>>>>>> and it became possible to select a compiler without RTM support even
>>>>>>>> though both the CPU and the OS support RTM. Thus, on platforms
>>>>>>>> supporting Graal or any other specific compiler, the compiler
>>>>>>>> availability check for the RTM tests must be adjusted, and if the
>>>>>>>> selected compiler does not support RTM then all RTM tests must be
>>>>>>>> skipped, including the ones meant for platforms without CPU or OS RTM
>>>>>>>> support (e.g. *Unsupported*.java). In some cases, like
>>>>>>>> TestUseRTMLockingOptionOnUnsupportedCPU.java, the test expects JVM
>>>>>>>> initialization errors that will never occur, because the problem is not
>>>>>>>> missing CPU or OS RTM support but rather that the selected compiler does
>>>>>>>> not support RTM.
>>>>>>>> 
>>>>>>>> This change adds a new VM property, 'vm.rtm.compiler', which can be used to
>>>>>>>> filter out compilers without RTM support on specific platforms, and adapts
>>>>>>>> the current RTM tests to use it.
>>>>>>>> 
>>>>>>>> Nothing changes regarding the number of passing/selected tests for the
>>>>>>>> various cpu/os/compiler combinations on platforms that currently might
>>>>>>>> support RTM [1], except when Graal is in use.
>>>>>>>> 
>>>>>>>> Thank you.
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Gustavo
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>>>>>>> ** X64 w/ CPU and OS supporting RTM **
>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>>>> Test results: passed: 30
>>>>>>>> 
>>>>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>>>>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>>>>> 
>>>>>>>> ** POWER8 w/ CPU and OS supporting RTM **
>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>>>> Test results: passed: 30
>>>>>>>> 
>>>>>>>> ** POWER7 wo/ CPU and OS supporting RTM **
>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>>> Test results: passed: 10
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
> 

From adinn at redhat.com  Fri Sep 14 15:29:47 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 14 Sep 2018 16:29:47 +0100
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
 <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
Message-ID: <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com>

On 13/09/18 15:35, Dmitrij Pochepko wrote:
> Other comments seem fine
I am glad to hear that you did not find any errors in my analysis.
However, I also need to ask you to answer a question that was implicit
in my earlier note. I said:

"I assume you are familiar with the relevant mathematics and how it has
been used to derive the algorithm. If so then I would like you to review
this rewrite and ensure that there are no mathematical errors in it. I
would also like you to check that the explanatory comments for the
individual steps in the algorithm do not contain any errors.

If you are not familiar with the mathematics then please let me know. I
need to know whether this has been reviewed by someone competent to do so."

As you didn't respond to this, I will have to ask you explicitly this
time. Do you have a background in mathematics and numerical analysis
that means you understand how the original algorithm was arrived
at? Equally, how your algorithm may legitimately vary from that original?

I'll break this down into several steps:

Do you understand the (elementary) theory that explains how the various
polynomial expansions I described in my comments converge to the
original log and exp functions?

Do you understand the theory that explains how partial polynomial sums
(Remez polynomials) can be used to approximate these polynomial
expansions within specified ranges?

Do you know how the coefficients of these Remez polynomials can be
derived to any necessary accuracy?

Do you understand how the computation of the values of those Remez
polynomials must proceed in order to guarantee accuracy in the computed
result in the presence of rounding errors?

Can you provide a mathematical proof that the variations you have
introduced into the computational process (specifically the move from
Horner form to Estrin form) will not introduce rounding errors?

I certainly cannot lay claim to a /thorough/ understanding of most, if
not all, of those topics. If you cannot either, then I think we need to bring
in someone who does. In particular, it is the last point that matters
most of all here as this is where you have /chosen/ to make your
algorithm diverge from the code you inherited.
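The following minimal sketch illustrates what moving from Horner form to Estrin form means. It uses hypothetical coefficients from a degree-3 truncated exp series, not the coefficients of the pow intrinsic under review. The sketch evaluates the same polynomial both ways and measures each result's distance, in ulps, from an exact BigDecimal evaluation. An empirical sample like this can expose differences between the two schemes, but it cannot replace the error proof asked for above.

```java
import java.math.BigDecimal;
import java.util.Random;

public class HornerVsEstrin {
    // Hypothetical coefficients: degree-3 truncated Taylor series of exp(x).
    static final double[] C = {1.0, 1.0, 0.5, 1.0 / 6.0};

    // Horner form: c0 + x*(c1 + x*(c2 + x*c3)).
    // A strictly sequential chain: each step consumes the previous rounded result.
    static double horner(double[] c, double x) {
        double r = c[3];
        r = r * x + c[2];
        r = r * x + c[1];
        return r * x + c[0];
    }

    // Estrin form: (c0 + c1*x) + (c2 + c3*x) * x^2.
    // lo, hi and x2 are independent and can be computed in parallel, but the
    // operations are grouped differently, so rounding errors combine differently.
    static double estrin(double[] c, double x) {
        double x2 = x * x;
        double lo = c[0] + c[1] * x;
        double hi = c[2] + c[3] * x;
        return lo + hi * x2;
    }

    // Exact value of the polynomial for the given double coefficients
    // (unrounded BigDecimal arithmetic), rounded once to double at the end.
    static double exact(double[] c, double x) {
        BigDecimal bx = new BigDecimal(x);
        BigDecimal r = new BigDecimal(c[3]);
        for (int i = 2; i >= 0; i--) {
            r = r.multiply(bx).add(new BigDecimal(c[i]));
        }
        return r.doubleValue();
    }

    // Error of 'approx' measured in units in the last place of 'ref'.
    static double ulpError(double approx, double ref) {
        return Math.abs(approx - ref) / Math.ulp(ref);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double worstHorner = 0, worstEstrin = 0;
        for (int i = 0; i < 100_000; i++) {
            double x = rnd.nextDouble() * 0.5; // sample range [0, 0.5)
            double ref = exact(C, x);
            worstHorner = Math.max(worstHorner, ulpError(horner(C, x), ref));
            worstEstrin = Math.max(worstEstrin, ulpError(estrin(C, x), ref));
        }
        System.out.println("max ulp error, Horner: " + worstHorner);
        System.out.println("max ulp error, Estrin: " + worstEstrin);
    }
}
```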

As regards the rest of the background maths, we do at least know that
the other aspects of the algorithm -- in its original manifestation --
have been checked by numerical experts. Hence, if we ensure that your
algorithm implements /equivalent/ steps then it ought to inherit the
same guarantees of correctness. So, as far as most of the code is
concerned, the only task is to iron out any errors you might
inadvertently have introduced. I have several nits to pick in that
regard, which I will be posting shortly.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From gromero at linux.vnet.ibm.com  Fri Sep 14 15:58:27 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 14 Sep 2018 12:58:27 -0300
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <20A4FBE2-A091-4437-A2D8-9806C8DC1837@sap.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com>
 <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
 <ebccc7d7-aa73-9f00-7b91-379ff5cf1a84@linux.vnet.ibm.com>
 <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com>
 <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com>
 <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com>
 <c10ed3d2-b76a-4b52-c7ae-25bddb9ab721@linux.vnet.ibm.com>
 <4F954DE5-DD8C-4395-8E40-6D341C42649C@twitter.com>
 <20A4FBE2-A091-4437-A2D8-9806C8DC1837@sap.com>
Message-ID: <606d94b1-6099-4b5b-c992-99e1d1c5661d@linux.vnet.ibm.com>

Hi Götz,

On 09/14/2018 12:18 PM, Lindenmaier, Goetz wrote:
> Hi,
> 
> Gustavo, thanks for the offlist explanations.
> 
> The change simplifies the matter nicely.
> Looks good, reviewed.

Thanks a lot for reviewing it!

I'll push it today.


Best regards,
Gustavo

> Best regards, Götz
> 
>> On 10.09.2018 at 22:17, Christian Thalinger <cthalinger at twitter.com> wrote:
>>
>>
>>
>>> On Sep 6, 2018, at 1:53 AM, Gustavo Romero <gromero at linux.vnet.ibm.com> wrote:
>>>
>>> On 09/05/2018 07:54 PM, Vladimir Kozlov wrote:
>>>> v3 looks good.
>>>
>>> Thanks a lot Vladimir.
>>>
>>> @Goetz, would you mind reviewing v3, please?
>>
>> Is he on vacation? :-)
>>
>>> It touches code meant for AIX but
>>> I don't expect any change in the end.
>>>
>>> http://cr.openjdk.java.net/~gromero/8209972/v3/
>>>
>>> Thank you.
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>> Thanks,
>>>> Vladimir
>>>>> On 9/5/18 3:18 PM, Gustavo Romero wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote:
>>>>>> Thank you, Gustavo, for the detailed answer.
>>>>>>
>>>>>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now.
>>>>>
>>>>> Thanks for reviewing it!
>>>>>
>>>>>
>>>>>> About the property check (flavor == "server" & !emulatedClient) in tests: if C2 (and, in the future, Graal once it supports RTM) is used, this check is always true, because C2 (and Graal) is only available in the Server VM and is disabled in the emulatedClient setting. This check was added before we had the C2 usage check, which can now be done with vm.rtm.compiler.
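Vladimir's observation can be stated as an implication: an RTM-capable compiler in use implies P = (flavor == "server" & !emulatedClient). The following is a hypothetical exhaustive check under that assumption, with illustrative names. It verifies that dropping P from cases 1 to 3 of the @requires expressions, with the compiler term added, does not change which configurations are selected.

```java
public class FlavorCheckRedundancy {
    // Returns true if, for every assignment allowed by "compiler => P",
    // each @requires case selects the same configurations with and without P.
    // P stands for (vm.flavor == "server" & !vm.emulatedClient).
    static boolean pIsRedundant() {
        boolean[] tf = {false, true};
        for (boolean p : tf) {
            for (boolean cpu : tf) {
                for (boolean compiler : tf) {
                    if (compiler && !p) {
                        continue; // ruled out: C2 is only available in a non-emulated Server VM
                    }
                    boolean case1 = !(p && cpu) && compiler;
                    boolean case2 = p && cpu && compiler;
                    boolean case3 = !cpu && p && compiler;
                    if (case1 != (!cpu && compiler)
                            || case2 != (cpu && compiler)
                            || case3 != (!cpu && compiler)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("P redundant given compiler => P: " + pIsRedundant());
    }
}
```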
>>>>>
>>>>> Thanks, I was not aware of it. I've updated the webrev removing
>>>>> "flavor == "server" & !emulatedClient":
>>>>>
>>>>> http://cr.openjdk.java.net/~gromero/8209972/v3/
>>>>>
>>>>> "hg diff --patience":
>>>>>
>>>>> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff
>>>>>
>>>>> Testing (on Linux):
>>>>>
>>>>> ** X86_64 w/ CPU+OS RTM support + Graal VM **
>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>>
>>>>> ** POWER8 w/ CPU+OS support **
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>> Test results: passed: 30
>>>>>
>>>>> ** X86_64 w/ CPU+OS support **
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>> Test results: passed: 30
>>>>>
>>>>> ** POWER7 wo/ CPU+OS RTM support **
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Test results: passed: 10
>>>>>
>>>>> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support **
>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>> Test results: passed: 10
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Gustavo
>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>>> On 9/3/18 3:15 PM, Gustavo Romero wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> Thanks a lot for reviewing it and for your comments.
>>>>>>>
>>>>>>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>>>>>>>> Hi Gustavo,
>>>>>>>>
>>>>>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with the TieredStopAtLevel < 4 flag.
>>>>>>>
>>>>>>> Yes, although currently afaics all tests explicitly enable C2 (for
>>>>>>> instance, by passing -Xcomp and -XX:-TieredCompilation) or implicitly
>>>>>>> enable C2 through a warm-up before testing, I agree that nothing prevents
>>>>>>> one from switching off C2 with TieredStopAtLevel = level < 4 for a test
>>>>>>> case. It also looks better to list explicitly which compilers do support
>>>>>>> RTM instead of the ones that don't support it.
>>>>>>>
>>>>>>> I've updated the webrev accordingly:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/
>>>>>>>
>>>>>>> The diff in there looks odd, so I generated another one with --patience for a
>>>>>>> better (IMO) diff format:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
>>>>>>>
>>>>>>>
>>>>>>>> Also, can the platform checks be replaced with one vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
>>>>>>>
>>>>>>> For example, on Linux the following cases are possible regarding CPU / OS
>>>>>>> RTM support:
>>>>>>>
>>>>>>> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
>>>>>>> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>>>>>> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>>>>>> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>>>>>>>
>>>>>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
>>>>>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" the
>>>>>>> "vm.rtm.os" part is redundant: if "vm.rtm.cpu" is 'true', "vm.rtm.os" must
>>>>>>> be 'true' as well, otherwise the JVM would never have advertised the "rtm"
>>>>>>> feature (which is what "vm.rtm.cpu" checks). That seems to hold for both
>>>>>>> Linux and AIX.
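
A minimal Java sketch of the reasoning above, assuming (as stated for Linux and AIX) that vm.rtm.cpu is true only when both the CPU and the OS support RTM, and modelling vm.rtm.os as the plain OS-support bit. The class and method names are hypothetical, not VMProps code; the loop shows that adding "& vm.rtm.os" never changes the result.

```java
// Hypothetical model of the table above (not VMProps code): the JVM
// advertises "rtm", and so vm.rtm.cpu is true, only when both the CPU
// and the OS support RTM. vm.rtm.os is modelled as the plain OS bit.
final class RtmCpuModel {
    static boolean vmRtmCpu(boolean cpu, boolean os) {
        return cpu && os;
    }

    // What "@requires vm.rtm.cpu & vm.rtm.os" evaluates to.
    static boolean cpuAndOs(boolean cpu, boolean os) {
        return vmRtmCpu(cpu, os) && os;
    }

    public static void main(String[] args) {
        boolean[] bits = {false, true};
        for (boolean cpu : bits) {
            for (boolean os : bits) {
                // The extra "& vm.rtm.os" never changes the outcome.
                System.out.printf("cpu=%-5b os=%-5b vm.rtm.cpu=%-5b cpu&os=%b%n",
                        cpu, os, vmRtmCpu(cpu, os), cpuAndOs(cpu, os));
            }
        }
    }
}
```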
>>>>>>>
>>>>>>> That said, I don't think that the platform checks can be replaced with one
>>>>>>> vmRTMCPU(), because in some cases it's necessary to run a test for
>>>>>>> cpu = false and compiler = true, i.e. it's necessary to run a test on an
>>>>>>> unsupported CPU for a given platform _only if_ the compiler in use supports
>>>>>>> RTM (like C2). So if, for instance, we do:
>>>>>>>
>>>>>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires
>>>>>>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation
>>>>>>> returns 'false' for cpu = false and compiler = true, skipping the test
>>>>>>> (vm.rtm.compiler = false). On the other hand, to make the @requires
>>>>>>> expression evaluate to 'true' and run the test in that case, one could
>>>>>>> match for '!vm.rtm.compiler', but if compiler = false (Graal)
>>>>>>> '!vm.rtm.compiler' will be evaluated as 'true' and the test will run even
>>>>>>> though the Graal compiler is selected, which is wrong.
>>>>>>>
>>>>>>> Hence, IMO, the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and
>>>>>>> must contain its own list of compilers with RTM support for each platform.
>>>>>>> Basically, we can't ask the JVM about the compiler's support for RTM, since
>>>>>>> the JVM can only tell us about CPU+OS support for RTM on the CPU and OS it
>>>>>>> is running on.
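
To make the argument above concrete, here is a hypothetical truth-table sketch (the names are illustrative, not from the patch). For a test meant for CPUs without RTM, requiring the tied property skips the C2 case that should run, and requiring its negation also selects Graal configurations that should not run.

```java
// Hypothetical sketch, not the patch itself: a test for CPUs without RTM
// (e.g. TestUseRTMLockingOptionOnUnsupportedCPU) should run only when
// vm.rtm.cpu is false *and* the selected JIT supports RTM.
final class TiedPropertyPitfall {
    enum Jit { C2, GRAAL }

    // Compiler-only property: says nothing about CPU/OS support.
    static boolean compilerSupportsRtm(Jit jit) {
        return jit == Jit.C2;
    }

    // The rejected alternative: CPU support folded into the property.
    static boolean tiedProperty(boolean rtmCpu, Jit jit) {
        return compilerSupportsRtm(jit) && rtmCpu;
    }

    // Intended selection for an *Unsupported* test.
    static boolean shouldRun(boolean rtmCpu, Jit jit) {
        return !rtmCpu && compilerSupportsRtm(jit);
    }

    public static void main(String[] args) {
        for (boolean rtmCpu : new boolean[] {false, true}) {
            for (Jit jit : Jit.values()) {
                // Neither "tied" nor "!tied" reproduces the "want" column.
                System.out.printf("cpu=%-5b %-5s want=%-5b tied=%-5b !tied=%b%n",
                        rtmCpu, jit, shouldRun(rtmCpu, jit),
                        tiedProperty(rtmCpu, jit), !tiedProperty(rtmCpu, jit));
            }
        }
    }
}
```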
>>>>>>>
>>>>>>>
>>>>>>>> And since you check the CPU in here, why not replace all vm.rtm.* with one vm.rtm.supported? In that case you would need only one @requires check in tests instead of:
>>>>>>>>
>>>>>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>>>>>>
>>>>>>> I think it's not possible either. Currently there are 5 match cases in
>>>>>>> RTM tests:
>>>>>>>
>>>>>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
>>>>>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
>>>>>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
>>>>>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
>>>>>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>>>>>>>
>>>>>>> which can be summarized as 5 cases:
>>>>>>>
>>>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
>>>>>>> 2:            flavor == "server" & !emulatedClient  & cpu & os
>>>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>>>> 5: no @requires
>>>>>>>
>>>>>>> I understand that cases 1 and 2 can be simplified (since CPU implies OS) as:
>>>>>>>
>>>>>>>
>>>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>>>>>> 2:            flavor == "server" & !emulatedClient  & cpu
>>>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>>>> 5: no @requires
>>>>>>>
>>>>>>> and cases 1 and 2 are mere opposites, so we have 4 cases:
>>>>>>>
>>>>>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>>>>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>>>>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>>>>>> 5: no @requires
>>>>>>>
>>>>>>> We could simplify further making P = (flavor == "server" & !emulatedClient),
>>>>>>> and make:
>>>>>>>
>>>>>>> 1:          !(P & cpu)
>>>>>>> 3: (!cpu) &  (P)
>>>>>>> 4:   cpu  & !(P)
>>>>>>> 5: no @requires
>>>>>>>
>>>>>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
>>>>>>> order to control running the tests only if the selected compiler on a
>>>>>>> given platform has RTM support (skipping Graal, for instance):
>>>>>>>
>>>>>>> 1:          !(P & cpu) & compiler
>>>>>>> 3: (!cpu) &  (P)       & compiler
>>>>>>> 4:   cpu  & !(P)       & compiler
>>>>>>> 5: no @requires        & compiler
>>>>>>>
>>>>>>> So it looks like at minimum we would need 3 properties, but IMO it's
>>>>>>> not worth adding another property P = (flavor == "server" & !emulatedClient)
>>>>>>> just to further simplify the @requires line.
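
The reduction above can be checked mechanically. This sketch assumes nothing beyond what is stated in the thread (cpu implies os) and uses plain booleans in place of the jtreg properties; the class name is hypothetical. It confirms that dropping "& os" is safe and that, with a compiler lacking RTM support, none of the four remaining cases selects a test.

```java
import java.util.Arrays;

// Hypothetical check of the reduction above. p stands for
// (vm.flavor == "server" & !vm.emulatedClient); cpu, os and compiler stand
// for vm.rtm.cpu, vm.rtm.os and vm.rtm.compiler.
final class RequiresReduction {
    static final boolean[] BITS = {false, true};

    // Case 2 with and without "& os" agrees on every assignment where
    // cpu implies os; case 1 is its negation, so it agrees too.
    static boolean osIsRedundant() {
        for (boolean p : BITS) {
            for (boolean cpu : BITS) {
                for (boolean os : BITS) {
                    if (cpu && !os) {
                        continue; // excluded: vm.rtm.cpu implies vm.rtm.os
                    }
                    if ((p && cpu && os) != (p && cpu)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    // Cases 1, 3, 4 and 5 after adding "& compiler".
    static boolean[] selected(boolean p, boolean cpu, boolean compiler) {
        return new boolean[] {
            !(p && cpu) && compiler, // case 1
            !cpu && p && compiler,   // case 3
            cpu && !p && compiler,   // case 4
            compiler                 // case 5 (previously no @requires)
        };
    }

    public static void main(String[] args) {
        System.out.println("os redundant: " + osIsRedundant());
        // A compiler without RTM support (e.g. Graal) selects no case at all.
        System.out.println(Arrays.toString(selected(true, true, false)));
    }
}
```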
>>>>>>>
>>>>>>> In summary, I think it's only possible to replace 'cpu & os' with 'cpu',
>>>>>>> so I updated the webrev removing the vm.rtm.os property and keeping only
>>>>>>> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks).
>>>>>>>
>>>>>>> I've tested the following scenarios and observed no regression [1]:
>>>>>>>
>>>>>>> 1. X86_64 w/ RTM
>>>>>>> 2. X86_64 w/ RTM + Graal enabled
>>>>>>> 3. POWER7: no CPU+OS support for RTM
>>>>>>> 4. POWER8: CPU+OS support for RTM
>>>>>>>
>>>>>>> But I think we need a confirmation from SAP about AIX.
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Gustavo
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>> ** X86_64 w/ RTM **
>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>>> Test results: passed: 30
>>>>>>>
>>>>>>>
>>>>>>> ** X86_64 w/ RTM + Graal enabled **
>>>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>>>>
>>>>>>>
>>>>>>> ** POWER7: no CPU+OS support for RTM **
>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>> Test results: passed: 10
>>>>>>>
>>>>>>>
>>>>>>> ** POWER8: CPU+OS support for RTM **
>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>>> Test results: passed: 30
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Could the following small change be reviewed please?
>>>>>>>>>
>>>>>>>>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/
>>>>>>>>>
>>>>>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal)
>>>>>>>>> is selected on platforms that can have CPU/OS with RTM support.
>>>>>>>>>
>>>>>>>>> It also disables all RTM tests on any other platform that does not have a
>>>>>>>>> single compiler supporting RTM.
>>>>>>>>>
>>>>>>>>> RTM support was first added to the C2 compiler, so once the RTM checkers
>>>>>>>>> (notably vm.rtm.cpu) find the "rtm" feature advertised by the JVM, they
>>>>>>>>> assume that a compiler supporting RTM is certainly available ("rtm" is
>>>>>>>>> advertised only if RTM is supported by both the CPU and the OS). Later the
>>>>>>>>> JVM began to allow the selection of a compiler other than C2, like Graal,
>>>>>>>>> and it became possible to select a compiler without RTM support even
>>>>>>>>> though both the CPU and the OS support RTM. Thus, on platforms supporting
>>>>>>>>> Graal or any other specific compiler, the compiler availability for the
>>>>>>>>> RTM tests must be taken into account: if the selected compiler does not
>>>>>>>>> support RTM, all RTM tests must be skipped, including the ones meant for
>>>>>>>>> platforms without CPU or OS RTM support (e.g. *Unsupported*.java). In
>>>>>>>>> some cases, like TestUseRTMLockingOptionOnUnsupportedCPU.java, the test
>>>>>>>>> expects JVM initialization errors that will never occur, because the
>>>>>>>>> problem is not missing CPU or OS RTM support but rather that the selected
>>>>>>>>> compiler does not support RTM.
>>>>>>>>>
>>>>>>>>> This change adds a new VM property, 'vm.rtm.compiler', which can be used to
>>>>>>>>> filter out compilers without RTM support for specific platforms and adapts
>>>>>>>>> the current RTM tests to use that new property.
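
A hedged sketch of how such a property value might be computed, following the "compiler = C2 && (x64 | PPC)" condition discussed in this thread. The class and method names are hypothetical; c2Enabled stands in for a check like Compiler.isC2Enabled(), and the architecture is matched against the standard os.arch system property (e.g. "amd64", "ppc64le").

```java
// Hypothetical sketch of a vm.rtm.compiler-style value: RTM locking is
// assumed to be generated only by C2, and only on x86_64 and PPC64, as
// described in this thread. c2Enabled is passed in to keep the sketch
// self-contained instead of querying the running VM.
final class VmRtmCompilerSketch {
    static boolean isRtmArch(String osArch) {
        return osArch.equals("amd64") || osArch.equals("x86_64")
                || osArch.startsWith("ppc64");
    }

    static String vmRtmCompiler(boolean c2Enabled, String osArch) {
        return String.valueOf(c2Enabled && isRtmArch(osArch));
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println("C2 on " + arch + ": vm.rtm.compiler=" + vmRtmCompiler(true, arch));
        System.out.println("Graal on " + arch + ": vm.rtm.compiler=" + vmRtmCompiler(false, arch));
    }
}
```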
>>>>>>>>>
>>>>>>>>> Nothing changes regarding the number of passing/selected tests for the
>>>>>>>>> various cpu/os/compiler combinations on platforms that currently might
>>>>>>>>> support RTM [1], except when Graal is in use.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Gustavo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>> ** X64 w/ CPU and OS supporting RTM **
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>>>>> Test results: passed: 30
>>>>>>>>>
>>>>>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>>>>>>>>> Test results: no tests selected (all RTM tests skipped)
>>>>>>>>>
>>>>>>>>> ** POWER8 w/ CPU and OS supporting RTM **
>>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>>>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>>>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>>>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>>>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>>>>>>> Test results: passed: 30
>>>>>>>>>
>>>>>>>>> ** POWER7 wo/ CPU and OS supporting RTM **
>>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>>>>>>>>> Test results: passed: 10
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>


From vladimir.kozlov at oracle.com  Fri Sep 14 16:13:01 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 14 Sep 2018 09:13:01 -0700
Subject: RFR(S): 8210390: C2 still crashes with "assert(mode ==
 ControlAroundStripMined && use == sfpt) failed: missed a node"
In-Reply-To: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com>
References: <dk6bm90r01v.fsf@rwestrel.remote.csb>
 <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com>
Message-ID: <22635c3c-4c0e-ccc1-9853-46ffa56dd96c@oracle.com>

+1

Vladimir

On 9/14/18 6:42 AM, Tobias Hartmann wrote:
> Hi Roland,
> 
> that looks good to me.
> 
> Best regards,
> Tobias
> 
> On 14.09.2018 09:49, Roland Westrelin wrote:
>>
>> http://cr.openjdk.java.net/~roland/8210390/webrev.00/
>>
>> PhaseIdealLoop::reorg_offsets() creates some data nodes on the exit path
>> of a counted loop so they are in the outer strip mined loop. Data nodes
>> in the outer strip mined loop are expected to be referenced from the
>> safepoint node. But that's not the case for these new nodes, which have
>> all their uses outside the outer strip mined loop. This inconsistency
>> breaks a later attempt at cloning the loop in the same loop opts pass.
>>
>> The fix is to assign the new data nodes a control that is on the outer
>> strip mined loop exit path.
>>
>> Roland.
>>

From vladimir.kozlov at oracle.com  Fri Sep 14 18:33:17 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 14 Sep 2018 11:33:17 -0700
Subject: RFR(XS): 8024128: guarantee(codelet_size > 0 &&
 (size_t)codelet_size > 2*K) failed: not enough space for interpreter
 generation
In-Reply-To: <c5a018a8-2cab-8b3b-e76a-b75494cd7f76@loongson.cn>
References: <c5a018a8-2cab-8b3b-e76a-b75494cd7f76@loongson.cn>
Message-ID: <156208aa-a44c-fe34-c56f-24ffdb619714@oracle.com>

Hi Leslie

More context is needed. Is it Client or Server VM? Did you change ReservedCodeCacheSize?
Even with *4 it is about 1MB, while the CodeCache size is 48MB, and even bigger in the Tiered case.
Also we need the call stack when you hit guarantee().
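
The "about 1MB" estimate follows from simple arithmetic. This sketch assumes a 256K product interpreter size (the value quoted in this thread), the 4x factor applied by NOT_PRODUCT(code_size *= 4;), and a 48M code cache; the names are illustrative only.

```java
import java.util.Locale;

// Rough arithmetic for the figures in this thread (assumed, not read from
// HotSpot): a 256K product InterpreterCodeSize, the 4x non-product factor
// from NOT_PRODUCT(code_size *= 4;), and a 48M code cache.
final class InterpreterSizeEstimate {
    static final int K = 1024;
    static final int M = K * K;

    static int interpreterCodeSize(int productSize, boolean nonProductBuild) {
        // Mirrors NOT_PRODUCT(code_size *= 4;): only non-product builds grow.
        return nonProductBuild ? productSize * 4 : productSize;
    }

    public static void main(String[] args) {
        int debugSize = interpreterCodeSize(256 * K, true);
        double percent = 100.0 * debugSize / (48 * M);
        // prints: debug interpreter: 1024K (2.08% of a 48M code cache)
        System.out.printf(Locale.ROOT, "debug interpreter: %dK (%.2f%% of a 48M code cache)%n",
                debugSize / K, percent);
    }
}
```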

Regards,
Vladimir

On 9/14/18 4:31 AM, Leslie Zhai wrote:
> Hi,
> 
> I just quoted the old thread 
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012243.html
> 
> I think we should increase it more for the future, otherwise you will have to
> always catch up with interpreter changes.
> 
> Increase it to 256 * 1024 and 224 * 1024
> 
> Vladimir
> 
> On 10/16/13 12:22 PM, Albert Noll wrote:
>  > Hi,
>  >
>  > could I have a review for this patch?
>  >
>  > bug: https://bugs.openjdk.java.net/browse/JDK-8026708
>  > webrev: http://cr.openjdk.java.net/~anoll/8026708/webrev.00/
>  >
>  > Problem: Not enough room for interpreter. My last patch did not solve
>  > the problem for solaris-amd64. A local build (solaris-amd64) of the
>  > most recent hotspot-comp version requires a template interpreter
>  > size of 211K (obtained with -XX:+PrintInterpreter). There have been
>  > some modifications to the template interpreter in the last couple of
>  > weeks which might have triggered this error.
>  >
>  > Solution: Increase interpreter size by 8k (32-bit and 64-bit).
>  >
>  > Testing: Failing test case in solaris-amd64
> 
> ----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---
> 
> I found that `InterpreterCodeSize` had been changed from 200K to 208K [1], then from 208K
> to 256K [2] by Albert. But if built with --with-debug-level=fastdebug/slowdebug, it is multiplied by
> four:
> 
> NOT_PRODUCT(code_size *= 4;)  // debug uses extra interpreter code space
> 
> Then it might trigger the "Native memory allocation (malloc) failed to allocate xxx bytes for CodeCache:
> no room for Interpreter" issue.
> 
> I don't want to always catch up with interpreter changes by guessing a suitable number, not too
> small, not too big :) Please give me some suggestions about the root cause. Thanks a lot!
> 
> Leslie Zhai
> 
> [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/6d7eba360ba4
> 
> [2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/74e00b98d5dd
> 
> 

From vladimir.kozlov at oracle.com  Fri Sep 14 18:39:28 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 14 Sep 2018 11:39:28 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
Message-ID: <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>

Looks good to me. I will start testing and let you know results.

Thanks,
Vladimir

On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Please find below the updated webrev with all your comments incorporated:
> 
> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
> 
> I have run the jtreg compiler tests on SKX and KNL, which have two different flavors of AVX512, and on Haswell, a non-AVX512 system. I also tested SPECjvm2008 on the three platforms.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, September 11, 2018 8:54 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> Thank you, Sandhya
> 
> I am satisfied with your detailed answer on the memory load issues. Okay, let's not add them.
> 
> Vladimir
> 
> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Thanks a lot for the detailed review. I really appreciate your feedback.
>> Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Tuesday, September 11, 2018 5:11 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Thank you.
>>
>> I want to discuss next issue:
>>
>>    > You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>    >>>> Take, for example, loading into rregF. First it gets loaded from memory into regF and then moved register-to-register into rregF, and vice versa.
>>
>> This is what I thought. You increase register pressure this way, which may cause spills to the stack.
>> Also, we don't check whether the registers could be the same, so as a result you may get unneeded moves.
>>
>> I would advise adding memory moves at least.
>>
>> Sandhya >>>  I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize idealreg2regmask (matcher.cpp). I would like the register allocator to get all the possible registers on an architecture for idealreg2regmask. I was concerned that multiple instruct rules in the .ad file for LoadF from memory might cause problems, and I would have to give a higher cost to loading into a restricted register set like vlReg. So I decided that the register allocator can handle this much better than me adding rules to load from memory. This is with the background that regF always covers all the available registers and vlRegF is the restricted register set; likewise for VecS and legVecS. Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>> #endif
>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>     MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>     MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>     MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>     ....
>>     idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>
>> Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions:
>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>> Do these instructions work when avx512vl is not available? I see that for vectors you use a
>> vpxor+vinserti* combination.
>>
>> Sandhya >>> Yes, the scalar floating-point instructions are available with AVX512 encoding when avx512vl is not available. That is why you see not just movflt and movdbl but all the other scalar operations, like addss, addsd, etc., using the entire XMM range (xmm0-31). In other words, they are AVX512F instructions.
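To make that distinction concrete, here is a minimal C++ sketch of which XMM registers an instruction may name on a 64-bit target. It is not HotSpot code: `CpuFeatures`, `OpKind`, and `usable_xmm_regs` are hypothetical names. The model assumes EVEX-encoded scalar and 512-bit operations need only AVX512F, while 128/256-bit vector operations on xmm16-31 additionally need AVX512VL (Knights Landing, for example, has AVX512F but not AVX512VL).

```cpp
#include <cassert>

// Sketch only: hypothetical names, not the VM_Version API.
struct CpuFeatures {
  bool avx512f;   // EVEX encoding: 512-bit ops and EVEX scalar ops
  bool avx512vl;  // EVEX encoding for 128/256-bit vector ops
};

enum class OpKind { ScalarFP, Vector128, Vector256, Vector512 };

// Number of XMM registers (xmm0..xmmN-1) an instruction of the given kind
// may name, assuming a 64-bit target.
int usable_xmm_regs(const CpuFeatures& cpu, OpKind kind) {
  if (!cpu.avx512f) {
    return 16;  // legacy SSE / VEX encodings only reach xmm0-xmm15
  }
  switch (kind) {
    case OpKind::ScalarFP:   // e.g. movss, addss, addsd
    case OpKind::Vector512:
      return 32;             // plain AVX512F is enough
    case OpKind::Vector128:
    case OpKind::Vector256:
      return cpu.avx512vl ? 32 : 16;  // xmm16-31 need AVX512VL here
  }
  return 16;
}
```

Under this model, a CPU without AVX512VL can still allocate xmm16-31 for scalar float/double values, but 128/256-bit vector values must stay in the legacy bank.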
>>
>> Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>
>> Should it be (UseAVX < 3)?
>>
>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
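A small sketch of why the threshold matters. It assumes (our reading, not stated in the thread) that this branch chooses between an ordinary 128/256-bit move and an EVEX 512-bit workaround sequence for vector spills. With UseAVX=2 on an AVX2-only CPU, the AVX-512 feature bits are cleared, so the `UseAVX < 2` form would wrongly take the EVEX path. Both function names are hypothetical.

```cpp
#include <cassert>

// Sketch only: hypothetical helpers modelling the x86.ad spill check.
// Returns true when a 128/256-bit vector spill can use a plain move;
// false when the EVEX 512-bit workaround sequence would be used.
bool can_use_plain_vector_move(int use_avx, bool supports_avx512vl) {
  // With UseAVX < 3 no EVEX encoding is emitted, so only xmm0-xmm15 are
  // allocated and a legacy/VEX move always works.
  return use_avx < 3 || supports_avx512vl;
}

// The check as originally written, kept for comparison.
bool original_check(int use_avx, bool supports_avx512vl) {
  return use_avx < 2 || supports_avx512vl;
}
```

For UseAVX=2 without AVX512VL the corrected check returns true and the original returns false, which is the case the review caught.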
>>
>> Thanks,
>> Vladimir
>>
>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Monday, September 10, 2018 6:09 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> Very nice. Thank you, Sandhya.
>>>
>>> I would like to see more meaningful names in the .ad files: instead of rreg* and ovec*, something like vlReg* and legVec*.
>>>
>>>>>> Yes, accepted.
>>>
>>> The new load_from_* and load_to_* instructions in the .ad files should be renamed as follows and placed alongside the other Move*_reg_reg* instructions:
>>>
>>> instruct MoveF2VL(vlRegF dst, regF src)
>>> instruct MoveVL2F(regF dst, vlRegF src)
>>>>>> Yes, accepted.
>>>
>>> You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>
>>> Also, please explain why these registers are used when UseAVX == 0:
>>>
>>> +instruct absD_reg(rregD dst) %{
>>>        predicate((UseSSE>=2) && (UseAVX == 0));
>>>
>>> We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
>>>     if (UseAVX < 3) {
>>>       _features &= ~CPU_AVX512F;
>>>
>>>>>> Yes, accepted. It could be regD here.
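The effect of that feature masking can be sketched as follows. The bit values, `ALL_AVX512`, `apply_use_avx`, and `available_xmm_regs` are hypothetical; the real vm_version_x86.cpp clears its own set of CPU_* bits. The point is that once AVX512F is cleared, any register class that depends on it resolves to the 16 legacy registers, which is why plain regD suffices when UseAVX == 0.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only: hypothetical feature bits, loosely modelled on CPU_* masks.
constexpr uint64_t CPU_AVX      = 1ull << 0;
constexpr uint64_t CPU_AVX2     = 1ull << 1;
constexpr uint64_t CPU_AVX512F  = 1ull << 2;
constexpr uint64_t CPU_AVX512VL = 1ull << 3;
constexpr uint64_t CPU_AVX512BW = 1ull << 4;
constexpr uint64_t CPU_AVX512DQ = 1ull << 5;
constexpr uint64_t ALL_AVX512 =
    CPU_AVX512F | CPU_AVX512VL | CPU_AVX512BW | CPU_AVX512DQ;

// Drop feature bits above the requested UseAVX level so that later
// feature queries report them as unavailable.
uint64_t apply_use_avx(uint64_t features, int use_avx) {
  if (use_avx < 3) features &= ~ALL_AVX512;
  if (use_avx < 2) features &= ~CPU_AVX2;
  if (use_avx < 1) features &= ~CPU_AVX;
  return features;
}

// A register class that depends on EVEX support sees 32 XMM registers
// only when AVX512F survived the masking.
int available_xmm_regs(uint64_t features) {
  return (features & CPU_AVX512F) ? 32 : 16;
}
```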
>>>
>>> The following checks could be combined into a new function in vm_version_x86.hpp (you already have some):
>>>
>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{
>>> +  VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>>> +  VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>>
>>>>>> Yes, accepted.
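A rough C++ model of what the reg_class_dynamic above does, with the four feature checks folded into one helper as suggested. `VmVersionSketch`, `supports_avx512vlbwdq`, `RegClass`, and `num_regs` are illustrative names, not the VM_Version API. When the combined predicate holds, the rule can use the EVEX class (xmm0-31); otherwise it falls back to the legacy class (xmm0-15).

```cpp
#include <cassert>

// Sketch only: hypothetical stand-in for the VM_Version feature queries.
struct VmVersionSketch {
  bool evex, avx512bw, avx512dq, avx512vl;

  // Combined helper of the kind suggested in the review; name is made up.
  bool supports_avx512vlbwdq() const {
    return evex && avx512vl && avx512bw && avx512dq;
  }
};

enum class RegClass { VectorsEvex, VectorsLegacy };

// Models reg_class_dynamic: choose the first class when the predicate
// holds at VM startup, otherwise the second.
RegClass vectors_reg_bwdqvl(const VmVersionSketch& vm) {
  return vm.supports_avx512vlbwdq() ? RegClass::VectorsEvex
                                    : RegClass::VectorsLegacy;
}

int num_regs(RegClass rc) {
  return rc == RegClass::VectorsEvex ? 32 : 16;
}
```

A KNL-like CPU (EVEX but no BW/DQ/VL) lands in the legacy class, while an SKX-like CPU gets all 32 registers.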
>>>
>>> I would suggest testing these changes on different machines (non-avx512 and avx512) and with different UseAVX values.
>>>
>>>>>> Will do.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>> Recently there have been a couple of high-priority issues regarding C2's use of the upper bank of
>>>> XMM registers (XMM16-XMM31):
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>
>>>> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>>>>
>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>>
>>>> The patch provides a restricted set of registers to the match rules
>>>> in the .ad file, based on the underlying architecture.
>>>>
>>>> The aim is to remove the special handling and workarounds from the macro assembler and assembler.
>>>>
>>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines.
>>>>
>>>> Your review and feedback is very welcome.
>>>>
>>>> Best Regards,
>>>>
>>>> Sandhya
>>>>

From vladimir.kozlov at oracle.com  Fri Sep 14 19:12:45 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 14 Sep 2018 12:12:45 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
Message-ID: <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>

I got build failure:

workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
jib >   _xmm_regs[16]  = xmm16;
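The warning fires because the array was declared with 16 entries while the initializer also wrote indexes 16 through 31. One way to keep such a table and its initializer in sync is to drive both from a single count, as in this sketch. `XmmTable`, `XmmReg`, and the two constants are hypothetical and do not mirror C1's FrameMap declarations.

```cpp
#include <cassert>

// Sketch only: hypothetical register counts for a 64-bit build.
constexpr int kNofXmmRegsLegacy = 16;  // xmm0-xmm15
constexpr int kNofXmmRegsEvex   = 32;  // xmm0-xmm31 with AVX-512

struct XmmReg { int encoding; };

// The array size and the initializer loop share one bound N, so an
// out-of-range write such as index 16 into a 16-entry table cannot occur.
template <int N>
struct XmmTable {
  XmmReg regs[N];

  XmmTable() {
    for (int i = 0; i < N; i++) {
      regs[i] = XmmReg{i};
    }
  }

  static constexpr int size() { return N; }
};
```

The design choice is simply to avoid hand-unrolled assignments whose upper index can drift from the declared array length.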

I also noticed that we don't have an RFE for this work. I filed:

https://bugs.openjdk.java.net/browse/JDK-8209735

You did not enable avx512 by default (the 8209735 change was synced from JDK 11 into 12 two weeks ago).
I added the following change to yours in src/hotspot/cpu/x86/globals_x86.hpp:

- product(intx, UseAVX, 2, \
+ product(intx, UseAVX, 3, \
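Raising the default to 3 should not change behaviour on machines without AVX-512, assuming the VM lowers the flag to what the hardware supports (HotSpot's x86 startup code clamps UseAVX to a CPU-dependent limit; the logic below is a simplification). A minimal sketch with hypothetical function names:

```cpp
#include <algorithm>
#include <cassert>

// Sketch only: hypothetical helpers, not the vm_version_x86.cpp code.
// Highest UseAVX level the hardware can honour.
int hardware_avx_limit(bool avx, bool avx2, bool avx512f) {
  if (avx512f) return 3;
  if (avx2) return 2;
  if (avx) return 1;
  return 0;
}

// A default or requested level above the hardware limit is lowered to the
// limit, so a default of 3 only takes effect on AVX-512 capable CPUs.
int effective_use_avx(int requested, bool avx, bool avx2, bool avx512f) {
  return std::min(requested, hardware_avx_limit(avx, avx2, avx512f));
}
```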

Thanks,
Vladimir

On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
> Looks good to me. I will start testing and let you know results.
> 
> Thanks,
> Vladimir
> 
> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find below the updated webrev with all your comments incorporated:
>>
>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>
>> I have run the jtreg compiler tests on SKX and KNL, which have two different flavors of AVX512, and on
>> Haswell, a non-avx512 system. I also tested SPECjvm2008 on the three platforms.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Tuesday, September 11, 2018 8:54 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Thank you, Sandhya
>>
>> I am satisfied with your detailed answer about the memory load issues. Okay, let's not add them.
>>
>> Vladimir
>>
>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> Thanks a lot for the detailed review. I really appreciate your feedback.

From sandhya.viswanathan at intel.com  Fri Sep 14 20:27:29 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 14 Sep 2018 20:27:29 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>

 
Thanks, Vladimir. The change below should fix this issue:

------------------------------
--- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
+++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
@@ -233,22 +233,6 @@
   _xmm_regs[13]  = xmm13;
   _xmm_regs[14]  = xmm14;
   _xmm_regs[15]  = xmm15;
-  _xmm_regs[16]  = xmm16;
-  _xmm_regs[17]  = xmm17;
-  _xmm_regs[18]  = xmm18;
-  _xmm_regs[19]  = xmm19;
-  _xmm_regs[20]  = xmm20;
-  _xmm_regs[21]  = xmm21;
-  _xmm_regs[22]  = xmm22;
-  _xmm_regs[23]  = xmm23;
-  _xmm_regs[24]  = xmm24;
-  _xmm_regs[25]  = xmm25;
-  _xmm_regs[26]  = xmm26;
-  _xmm_regs[27]  = xmm27;
-  _xmm_regs[28]  = xmm28;
-  _xmm_regs[29]  = xmm29;
-  _xmm_regs[30]  = xmm30;
-  _xmm_regs[31]  = xmm31;
 #endif // _LP64

   for (int i = 0; i < 8; i++) {
---------------------------------

I think the gcc version on my desktop is older, so it didn't catch this.

The updated patch, with the above change and UseAVX set to 3, is uploaded to:
Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
RFE: https://bugs.openjdk.java.net/browse/JDK-8209735

FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before changing it back to 3.

Best Regards,
Sandhya



From vladimir.kozlov at oracle.com  Fri Sep 14 22:49:49 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 14 Sep 2018 15:49:49 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
Message-ID: <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>

I got the build failure on macOS and Windows. Linux passed for some reason, so I am not surprised you did
not notice it.

Anyway, I tested with these changes and got the following failures on AVX1 machines. I am planning to run on
AVX-512 machines too.

1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU with AVX1 only

#  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
# Problematic frame:
# V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20

Current CompileTask:
C2:    154    5             java.lang.String::equals (65 bytes)

Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42

------------------------------------------------------------------------------------------------
2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'
#  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
#  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)

Current CompileTask:
C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)

Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409

Vladimir

On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>   
> 
> Thanks Vladimir, the below should fix this issue:
> 
> ------------------------------
> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
> @@ -233,22 +233,6 @@
>     _xmm_regs[13]  = xmm13;
>     _xmm_regs[14]  = xmm14;
>     _xmm_regs[15]  = xmm15;
> -  _xmm_regs[16]  = xmm16;
> -  _xmm_regs[17]  = xmm17;
> -  _xmm_regs[18]  = xmm18;
> -  _xmm_regs[19]  = xmm19;
> -  _xmm_regs[20]  = xmm20;
> -  _xmm_regs[21]  = xmm21;
> -  _xmm_regs[22]  = xmm22;
> -  _xmm_regs[23]  = xmm23;
> -  _xmm_regs[24]  = xmm24;
> -  _xmm_regs[25]  = xmm25;
> -  _xmm_regs[26]  = xmm26;
> -  _xmm_regs[27]  = xmm27;
> -  _xmm_regs[28]  = xmm28;
> -  _xmm_regs[29]  = xmm29;
> -  _xmm_regs[30]  = xmm30;
> -  _xmm_regs[31]  = xmm31;
>   #endif // _LP64
> 
>     for (int i = 0; i < 8; i++) {
> ---------------------------------
> 
> I think the gcc version on my desktop is older so didn?t catch this.
> 
> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
> 
> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before changing it back to 3.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, September 14, 2018 12:13 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> I got build failure:
> 
> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
> jib >   _xmm_regs[16]  = xmm16;
> 
> I also noticed that we don't have an RFE for this work. I filed:
> 
> https://bugs.openjdk.java.net/browse/JDK-8209735
> 
> You did not enable avx512 by default (the 8209735 change was synced from JDK 11 into JDK 12 two weeks ago). I added the following change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
> 
> - product(intx, UseAVX, 2, \
> + product(intx, UseAVX, 3, \
> 
> Thanks,
> Vladimir
> 
> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>> Looks good to me. I will start testing and let you know results.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> Please find below the updated webrev with all your comments incorporated:
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>
>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>>> different flavors of AVX512, and on Haswell, a non-AVX512 system. I also tested SPECjvm2008 on all three platforms.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> Thank you, Sandhya
>>>
>>> I am satisfied with your detailed answer about the memory load issues. Okay, let's not add them.
>>>
>>> Vladimir
>>>
>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>>> Hi Vladimir,
>>>>
>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>>> Please see my responses in your email below, marked with (Sandhya >>>).
>>>> Looking forward to your advice.
>>>>
>>>> Best Regards,
>>>> Sandhya
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>> hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>> instruction
>>>>
>>>> Thank you.
>>>>
>>>> I want to discuss the following issue:
>>>>
>>>>  > You did not add instructions to load these registers from
>>>>  > memory (and stack). What happens in such cases when you need to load or store?
>>>>  >>>> Let us take an example, e.g. for loading into rregF. First
>>>>  >>>> it gets loaded from memory into regF, and then a register-to-register move to rregF is done, and vice versa.
>>>>
>>>> This is what I thought. You increase register pressure this way, which may cause spills to the stack.
>>>> Also, we don't check whether the two registers could be the same; as a result, you may get unneeded moves.
>>>>
>>>> I would advise adding memory moves at least.
>>>>
>>>> Sandhya >>> I had added those rules initially and removed them in
>>>> the final patch. I noticed that the register allocator uses the
>>>> memory rules (e.g. LoadF) to initialize the idealreg2regmask
>>>> (matcher.cpp). I would like the register allocator to get all the
>>>> possible registers on an architecture for idealreg2regmask. I was
>>>> concerned that multiple instruct rules in the .ad file for LoadF from
>>>> memory might cause problems, and I would have had to assign a higher cost
>>>> to loading into a restricted register set like vlReg. Then I decided that
>>>> the register allocator can handle this much better than I could by
>>>> adding rules to load from memory. This is with the background that regF always covers all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>>>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>> #endif
>>>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>>>     MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>>     MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>>     MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>     ....
>>>>     idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>>>
>>>> Another question. You use movflt() and movdbl(), which use either
>>>> movap[s|d] or movs[s|d] instructions:
>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>>>> Do these instructions work when avx512vl is not available? I see that for vectors you use a
>>>> vpxor+vinserti* combination.
>>>>
>>>> Sandhya >>> Yes, the scalar floating-point instructions are available
>>>> with the AVX512 encoding even when avx512vl is not available. That is why you
>>>> will see not just movflt and movdbl but all the other scalar
>>>> operations, like addss, addsd, etc., using the entire XMM range (xmm0-31). In other words, they are AVX512F instructions.
>>>>
>>>> Last question. I notice the following UseAVX check in the vector spill code in x86.ad:
>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>>
>>>> Should it be (UseAVX < 3)?
>>>>
>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Thanks a lot for your review and feedback. Please see my response
>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>>>>
>>>>> Best Regards,
>>>>> Sandhya
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>> instruction
>>>>>
>>>>> Very nice. Thank you, Sandhya.
>>>>>
>>>>> I would like to see more meaningful naming in the .ad files: instead
>>>>> of rreg* and ovec*, something like vlReg* and legVec*.
>>>>>
>>>>>>>> Yes, accepted.
>>>>>
>>>>> The new load_from_* and load_to_* instructions in the .ad files should be
>>>>> renamed as follows and placed together with the other Move*_reg_reg* instructions:
>>>>>
>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>>>>>>>> Yes, accepted.
>>>>>
>>>>> You did not add instructions to load these registers from memory
>>>>> (and stack). What happens in such cases when you need to load or store?
>>>>>>>> Let us take an example, e.g. for loading into rregF. First it
>>>>>>>> gets loaded from memory into regF, and then a register-to-register move to rregF is done, and vice versa.
>>>>>
>>>>> Also, please explain why these registers are used when UseAVX == 0:
>>>>>
>>>>> +instruct absD_reg(rregD dst) %{
>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>>>>>
>>>>> We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
>>>>>     if (UseAVX < 3) {
>>>>>       _features &= ~CPU_AVX512F;
>>>>>
>>>>>>>> Yes, accepted. It could be regD here.
>>>>>
>>>>> The following checks could be combined by using a new function in vm_version_x86.hpp (you already have some):
>>>>>
>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>>>>
>>>>>>>> Yes, accepted.
>>>>>
>>>>> I would suggest testing these changes on different machines
>>>>> (non-avx512 and avx512) and with different UseAVX values.
>>>>>
>>>>>>>> Will do.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>>>> Recently there have been a couple of high-priority issues with
>>>>>> regard to C2's usage of the upper bank of XMM registers
>>>>>> (XMM16-XMM31):
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>>>
>>>>>> Please find below a patch which attempts to clean up the XMM
>>>>>> register handling by using register groups.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>>>>
>>>>>> The patch provides a restricted set of registers to the match
>>>>>> rules in the ad file based on the underlying architecture.
>>>>>>
>>>>>> The aim is to remove the special handling/workarounds from the macro assembler and assembler.
>>>>>>
>>>>>> By removing the special handling, the patch reduces the overall
>>>>>> code size by about 1800 lines of code.
>>>>>>
>>>>>> Your review and feedback is very welcome.
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Sandhya
>>>>>>
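The `-Warray-bounds` error quoted above comes from assigning `_xmm_regs[16]` through `_xmm_regs[31]` into an array that holds only 16 elements in that configuration. The sketch below reproduces the idea in miniature; `kNumXmmRegs`, `XmmReg`, `XmmTable` and `checked_index` are hypothetical names, not the real `FrameMap` code. It sizes the table from a single constant, fills it with a loop bounded by that size, and turns an out-of-range constant index into a compile-time error instead of a warning that only some compilers emit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical: how many XMM registers this simplified frame map tracks.
// In the failing build the real array held 16 entries.
constexpr std::size_t kNumXmmRegs = 16;

// Hypothetical stand-in for a register identifier.
struct XmmReg {
  int encoding;
};

using XmmTable = std::array<XmmReg, kNumXmmRegs>;

// Populate every slot up to the table's declared size. Nothing beyond
// kNumXmmRegs - 1 is written, which mirrors dropping the
// _xmm_regs[16] .. _xmm_regs[31] assignments in the diff.
XmmTable make_xmm_table() {
  XmmTable regs{};
  for (std::size_t i = 0; i < regs.size(); ++i) {
    regs[i] = XmmReg{static_cast<int>(i)};
  }
  return regs;
}

// Compile-time guard for constant indices: instantiating this with an
// index >= kNumXmmRegs fails to compile instead of overrunning the array.
template <std::size_t I>
constexpr std::size_t checked_index() {
  static_assert(I < kNumXmmRegs, "XMM register index past the end of the table");
  return I;
}
```

For example, `make_xmm_table()[checked_index<15>()]` compiles, while `checked_index<16>()` is rejected by the `static_assert`.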

From vladimir.kozlov at oracle.com  Sat Sep 15 00:22:41 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 14 Sep 2018 17:22:41 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
Message-ID: <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>

I gave an incorrect link to the RFE. Here is the correct one:

https://bugs.openjdk.java.net/browse/JDK-8210764

Vladimir

On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
> I got the build failure on MacOS and Windows. Linux passed for some reason, so I am not surprised you did
> not notice it.
> 
> Anyway, I tested with these changes and got the following failures on AVX1 machines. I am planning to run on
> AVX512 machines too.
> 
> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on a CPU with AVX1 only
> 
> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
> # Problematic frame:
> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
> 
> Current CompileTask:
> C2:    154    5             java.lang.String::equals (65 bytes)
> 
> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
> 
> ------------------------------------------------------------------------------------------------
> 2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'
> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
> 
> Current CompileTask:
> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
> 
> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
> 
> Vladimir
> 
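Two review points from this thread, clearing AVX512F when `UseAVX < 3` and folding the `supports_evex() && supports_avx512bw() && supports_avx512dq() && supports_avx512vl()` chain into one helper, fit together in a small model of how a dynamic register class picks its registers. This is a minimal sketch under stated assumptions: `CpuFeatures`, `make_features`, `supports_avx512_bw_dq_vl` and the mask helpers are hypothetical stand-ins, not the `VM_Version` or ADLC API, and the only rule modelled is that the full BW/DQ/VL set selects all 32 XMM registers while anything less falls back to the 16 legacy ones.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical CPU feature flags (a stand-in for what VM_Version detects).
struct CpuFeatures {
  bool avx512f = false;  // treated here as "EVEX encoding available"
  bool avx512bw = false;
  bool avx512dq = false;
  bool avx512vl = false;
};

CpuFeatures make_features(bool f, bool bw, bool dq, bool vl) {
  CpuFeatures c;
  c.avx512f = f;
  c.avx512bw = bw;
  c.avx512dq = dq;
  c.avx512vl = vl;
  return c;
}

// Bit i set means xmm<i> may be allocated.
using XmmMask = std::bitset<32>;

// Models the quoted vm_version code: with UseAVX < 3 the AVX512F feature
// is cleared, so EVEX encodings (and therefore xmm16-31) are not used.
CpuFeatures apply_use_avx(CpuFeatures f, int use_avx) {
  if (use_avx < 3) {
    f.avx512f = false;
  }
  return f;
}

// Hypothetical single helper replacing the four-way && chain.
bool supports_avx512_bw_dq_vl(const CpuFeatures& f) {
  return f.avx512f && f.avx512bw && f.avx512dq && f.avx512vl;
}

// xmm0-15: registers encodable without EVEX.
XmmMask legacy_mask() {
  XmmMask m;
  for (int i = 0; i < 16; ++i) {
    m.set(i);
  }
  return m;
}

// xmm0-31: the full EVEX register file.
XmmMask evex_mask() {
  XmmMask m;
  m.set();
  return m;
}

// Analogue of reg_class_dynamic vectors_reg_bwdqvl(evex, legacy, %{ ... %}):
// use the EVEX class only when the combined predicate holds.
XmmMask vectors_reg_bwdqvl(const CpuFeatures& f) {
  return supports_avx512_bw_dq_vl(f) ? evex_mask() : legacy_mask();
}
```

Under this model, a CPU with AVX512F but without BW/DQ/VL, or any CPU run with `UseAVX=2`, ends up with the 16-register legacy mask.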

From zhaixiang at loongson.cn  Sat Sep 15 02:51:27 2018
From: zhaixiang at loongson.cn (Leslie Zhai)
Date: Sat, 15 Sep 2018 10:51:27 +0800
Subject: RFR(XS): 8024128: guarantee(codelet_size > 0 &&
 (size_t)codelet_size > 2*K) failed: not enough space for interpreter
 generation
In-Reply-To: <156208aa-a44c-fe34-c56f-24ffdb619714@oracle.com>
References: <c5a018a8-2cab-8b3b-e76a-b75494cd7f76@loongson.cn>
 <156208aa-a44c-fe34-c56f-24ffdb619714@oracle.com>
Message-ID: <12aa8902-a767-607d-f20c-462e9d0bc306@loongson.cn>

Hi Vladimir,

Thanks for your kind response!

It is the Server VM; I am just debugging HotSpot C2. Yes, I changed
ReservedCodeCacheSize to 3m.

It can be reproduced with jtreg on a jdk8u fastdebug build when
InterpreterCodeSize is too big, for example 640K:

$ jtreg -dir:/home/xiangzhai/project/jdk8u/hotspot/test -verbose:all -a \
    -ignore:quiet -timeoutFactor:5 -agentvm \
    -testjdk:/home/xiangzhai/project/jdk8u/build/linux-x86_64-normal-server-fastdebug/images/j2sdk-image \
    compiler/startup/SmallCodeCacheStartup.java

Native memory allocation (malloc) failed to allocate 2621440 bytes for CodeCache: no room for Interpreter

So the BufferBlob (CodeBlob) allocation clearly failed.
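The size in the malloc failure is consistent with the debug-build multiplier quoted later in this thread, `NOT_PRODUCT(code_size *= 4;)`: 640K times 4 is exactly 2621440 bytes, and 256K times 4 is the "about 1Mb" figure in the reply quoted below. A small sketch of that arithmetic, assuming hypothetical helper names (`interpreter_reservation`, `remaining_after_interpreter`, not HotSpot functions) and the 3m ReservedCodeCacheSize mentioned above:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t K = 1024;
constexpr std::size_t M = 1024 * K;

// Hypothetical helper: bytes requested for the template interpreter.
// Non-product (fastdebug/slowdebug) builds apply the 4x factor from
// NOT_PRODUCT(code_size *= 4;).
constexpr std::size_t interpreter_reservation(std::size_t interpreter_code_size,
                                              bool product_build) {
  return product_build ? interpreter_code_size : interpreter_code_size * 4;
}

// Hypothetical helper: what would be left of the reserved code cache after
// the interpreter takes its share (0 if it does not fit at all). Other
// blobs allocated in the code cache are not modelled.
constexpr std::size_t remaining_after_interpreter(std::size_t reserved_code_cache,
                                                  std::size_t interpreter_bytes) {
  return interpreter_bytes >= reserved_code_cache
             ? 0
             : reserved_code_cache - interpreter_bytes;
}
```

Under these assumptions only 512K of a 3m code cache would be left for everything else, so a "no room" failure is plausible; how much is actually consumed before the interpreter is generated is not modelled here.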

It can also be reproduced on a release build when InterpreterCodeSize is too small, for example
200K, *without* changing ReservedCodeCacheSize:

$ jtreg -dir:/home/xiangzhai/project/jdk8u/jdk/test -verbose:all \
    -exclude:/home/xiangzhai/project/jdk8u/jdk/test/ProblemList.txt -conc:2 \
    -Xmx512m -a -ignore:quiet -timeoutFactor:5 -agentvm \
    -testjdk:/home/xiangzhai/project/jdk8u/build/linux-x86_64-normal-server-release/images/j2sdk-image \
    com/sun/jdi/AccessSpecifierTest.java

It makes sense that codelet_size =
AbstractInterpreter::code()->available_space() - 2*K ends up too small.

And CodeBuffer might fail to verify the allocation for each section
due to guarantee(sect->end() <= tend, "sanity");
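Both checks mentioned above can be modelled on their own. A minimal sketch, using hypothetical names (`codelet_size_ok`, `Section`, `sections_fit`) rather than HotSpot types: the first function mirrors `codelet_size = available_space - 2*K` followed by `guarantee(codelet_size > 0 && (size_t)codelet_size > 2*K)` from the subject line, so it passes only when more than 4K is available; the second mirrors the per-section `guarantee(sect->end() <= tend, "sanity")` on a CodeBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr long K = 1024;

// Mirrors:  codelet_size = available_space - 2*K;
//           guarantee(codelet_size > 0 && (size_t)codelet_size > 2*K, ...);
// Returns true when the guarantee would hold.
bool codelet_size_ok(long available_space) {
  long codelet_size = available_space - 2 * K;
  return codelet_size > 0 &&
         static_cast<std::size_t>(codelet_size) > static_cast<std::size_t>(2 * K);
}

// Hypothetical simplified CodeBuffer section, as [start, end) offsets.
struct Section {
  std::size_t start;
  std::size_t end;
};

// Mirrors the per-section guarantee(sect->end() <= tend, "sanity"):
// every section must end at or before the end of the whole buffer.
bool sections_fit(const std::vector<Section>& sections, std::size_t tend) {
  for (const Section& s : sections) {
    if (s.end > tend) {
      return false;
    }
  }
  return true;
}
```

In this model, a too-small buffer shows up as a section whose end passes `tend`, while the first check trips only when almost no space is left at all.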

So for x86 the suitable range of InterpreterCodeSize might be [250K,
600K], but for AArch64 it is 200K [1]. What is the root cause behind this?
It looks like a magic number, found by running with -XX:+PrintInterpreter to get
the VM to print out the size. I need to dig deeper to find the root cause.

[1] 
http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/file/6bc3e4922a8b/src/cpu/aarch64/vm/templateInterpreter_aarch64.hpp#l38


On 2018-09-15 02:33, Vladimir Kozlov wrote:
> Hi Leslie
>
> More context is needed. Is it the Client or Server VM? Did you change
> ReservedCodeCacheSize?
> Even with *4 it is about 1MB, while the CodeCache size is 48MB (and even
> bigger in the Tiered case).
> Also, we need the call stack from when you hit the guarantee().
>
> Regards,
> Vladimir
>
> On 9/14/18 4:31 AM, Leslie Zhai wrote:
>> Hi,
>>
>> I just quoted the old thread 
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012243.html
>>
>> I think we should increase it more for the future; otherwise you will have to
>> always catch up with interpreter changes.
>>
>> Increase it to 256 * 1024 and 224 * 1024
>>
>> Vladimir
>>
>> On 10/16/13 12:22 PM, Albert Noll wrote:
>>  > Hi,
>>  >
>>  > could I have a review for this patch?
>>  >
>>  > bug: https://bugs.openjdk.java.net/browse/JDK-8026708
>>  > webrev: http://cr.openjdk.java.net/~anoll/8026708/webrev.00/
>>  >
>>  > Problem: Not enough room for interpreter. My last patch did not solve
>>  > the problem for solaris-amd64. A local build (solaris-amd64) of the
>>  > most recent hotspot-comp version requires a template interpreter
>>  > size of 211K (obtained with -XX:+PrintInterpreter). There have been
>>  > some modifications to the template interpreter in the last couple of
>>  > weeks which might have triggered this error.
>>  >
>>  > Solution: Increase interpreter size by 8k (32-bit and 64-bit).
>>  >
>>  > Testing: Failing test case in solaris-amd64
>>
>> ----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---
>>
>> I found that `InterpreterCodeSize` was changed from 200K to 208K
>> [1], then from 208K to 256K [2] by Albert. But if built
>> with --with-debug-level=fastdebug/slowdebug, it is multiplied by four:
>>
>> NOT_PRODUCT(code_size *= 4;)  // debug uses extra interpreter code space
>>
>> Then it might trigger the "Native memory allocation (malloc) failed to
>> allocate xxx bytes for CodeCache: no room for Interpreter" error.
>>
>> I don't want to always have to catch up with interpreter changes by guessing
>> a suitable number, not too small and not too big :) Please give me
>> some suggestions about the root cause. Thanks a lot!
>>
>> Leslie Zhai
>>
>> [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/6d7eba360ba4
>>
>> [2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/74e00b98d5dd
>>
>>

-- 
Regards,
Leslie Zhai


From vladimir.kozlov at oracle.com  Mon Sep 17 17:14:13 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 17 Sep 2018 10:14:13 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
Message-ID: <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>

I finished testing on an AVX512 machine.
All tests passed except the known (TestNaNVector.java) failures.

Thanks,
Vladimir

On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
> I gave an incorrect link to the RFE. Here is the correct one:
> 
> https://bugs.openjdk.java.net/browse/JDK-8210764
> 
> Vladimir
> 
> On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>> I got the build failure on MacOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.
>>
>> Anyway, I tested with these changes and got the following failures on avx1 machines. I am planning to run on avx512 too.
>>
>> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on a CPU with AVX1 only
>>
>> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>> # Problematic frame:
>> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>
>> Current CompileTask:
>> C2:    154    5             java.lang.String::equals (65 bytes)
>>
>> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>
>> ------------------------------------------------------------------------------------------------
>> 2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'
>> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>
>> Current CompileTask:
>> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>>
>> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>
>> Vladimir
>>
>> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>>
>>> Thanks Vladimir, the change below should fix this issue:
>>>
>>> ------------------------------
>>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>> @@ -233,22 +233,6 @@
>>>    _xmm_regs[13]  = xmm13;
>>>    _xmm_regs[14]  = xmm14;
>>>    _xmm_regs[15]  = xmm15;
>>> -  _xmm_regs[16]  = xmm16;
>>> -  _xmm_regs[17]  = xmm17;
>>> -  _xmm_regs[18]  = xmm18;
>>> -  _xmm_regs[19]  = xmm19;
>>> -  _xmm_regs[20]  = xmm20;
>>> -  _xmm_regs[21]  = xmm21;
>>> -  _xmm_regs[22]  = xmm22;
>>> -  _xmm_regs[23]  = xmm23;
>>> -  _xmm_regs[24]  = xmm24;
>>> -  _xmm_regs[25]  = xmm25;
>>> -  _xmm_regs[26]  = xmm26;
>>> -  _xmm_regs[27]  = xmm27;
>>> -  _xmm_regs[28]  = xmm28;
>>> -  _xmm_regs[29]  = xmm29;
>>> -  _xmm_regs[30]  = xmm30;
>>> -  _xmm_regs[31]  = xmm31;
>>>  #endif // _LP64
>>>
>>>    for (int i = 0; i < 8; i++) {
>>> ---------------------------------
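The build error comes from assigning xmm16..xmm31 into C1's `_xmm_regs` table, which, per the error message, holds only 16 elements once C1 is limited to XMM0-15. A minimal sketch of one way to rule out this class of mistake, assuming a simplified table: derive the fill loop from the table's own size instead of writing one assignment per register. `XmmReg`, `kC1XmmRegs`, `populate_xmm_table` and `xmm_at` are hypothetical names, not HotSpot's FrameMap code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an XMM register handle: only its encoding.
struct XmmReg {
  int encoding;
};

// Hypothetical table size: C1 only sees XMM0-15 after this patch.
constexpr std::size_t kC1XmmRegs = 16;

using XmmTable = std::array<XmmReg, kC1XmmRegs>;

// Fill the table using its own size as the loop bound, so there is no
// hand-written list of assignments that can outgrow the array the way
// the _xmm_regs[16..31] lines did.
XmmTable populate_xmm_table() {
  XmmTable table{};
  for (std::size_t i = 0; i < table.size(); ++i) {
    table[i] = XmmReg{static_cast<int>(i)};
  }
  return table;
}

// Checked accessor: asserts in debug builds instead of reading past the
// end of the table.
XmmReg xmm_at(const XmmTable& table, std::size_t i) {
  assert(i < table.size() && "XMM index outside C1's register table");
  return table[i];
}
```

With this shape, changing the table size automatically changes how many entries are written, so a mismatch like index 16 on a 16-element array cannot be introduced by hand.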
>>>
>>> I think the gcc version on my desktop is older, so it didn't catch this.
>>>
>>> The updated patch, with the above change and UseAVX set to 3, is uploaded to:
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> FYI, I did notice that the default for UseAVX had been rolled back, and I wanted to get confirmation from you before changing it back to 3.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, September 14, 2018 12:13 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> I got build failure:
>>>
>>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
>>> jib >   _xmm_regs[16]  = xmm16;
>>>
>>> I also noticed that we don't have an RFE for this work. I filed:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> You did not enable avx512 by default (the 8209735 change was synced from JDK 11 into JDK 12 two weeks ago). I added the following change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>>
>>> - product(intx, UseAVX, 2, \
>>> + product(intx, UseAVX, 3, \
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>>> Looks good to me. I will start testing and let you know results.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Please find below the updated webrev with all your comments incorporated:
>>>>>
>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>>>
>>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>>>>> different flavors of AVX512, and on Haswell, a non-avx512 system. I also tested SPECjvm2008 on the three platforms.
>>>>>
>>>>> Best Regards,
>>>>> Sandhya
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>>>
>>>>> Thank you, Sandhya
>>>>>
>>>>> I am satisfied with your detailed answer about the memory load issues. Okay, let's not add them.
>>>>>
>>>>> Vladimir
>>>>>
>>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>>>>> Please see my responses in your email below, marked with (Sandhya >>>). Looking forward to your advice.
>>>>>>
>>>>>> Best Regards,
>>>>>> Sandhya
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>>> instruction
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> I want to discuss next issue:
>>>>>>
>>>>>>     > You did not add instructions to load these registers from
>>>>>> memory (and stack). What happens when you need to load or store?
>>>>>>     >>>> Let us take an example, e.g. loading into rregF. First
>>>>>> it gets loaded from memory into regF, and then a register-to-register move copies it to rregF, and vice versa.
>>>>>>
>>>>>> This is what I thought. You increase register pressure this way, which may cause spills on the stack.
>>>>>> Also, we don't check whether the registers could be the same, so as a result you may get unneeded moves.
>>>>>>
>>>>>> I would advise adding memory moves at least.
>>>>>>
>>>>>> Sandhya >>> I had added those rules initially and removed them in
>>>>>> the final patch. I noticed that the register allocator uses the
>>>>>> memory rules (e.g. LoadF) to initialize the idealreg2regmask
>>>>>> (matcher.cpp). I would like the register allocator to get all the
>>>>>> possible registers on an architecture for the idealreg2regmask. I
>>>>>> was concerned that multiple instruct rules in the .ad file for LoadF from
>>>>>> memory might cause problems: I would have to give a higher cost to
>>>>>> loading into a restricted register set like vlReg. Then I decided that
>>>>>> the register allocator can handle this much better than I could by
>>>>>> adding rules to load from memory. The background is that regF always covers all the available registers
>>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and whether
>>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I
>>>>>> am referring to is:
>>>>>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>> #endif
>>>>>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>>>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>>>>>     MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>>>>     MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>>>>     MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>>     ....
>>>>>>     idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
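Sandhya's point is that the spill mask for Op_RegF is taken from the register class the LoadF rule matches into, so that class should be the full set. A minimal sketch of the intended relationship between the two classes, modelling register classes as 32-bit masks (bit i means XMMi is allocatable): `regF` spans every register the CPU exposes, while `vlRegF` falls back to the legacy bank unless AVX512VL is present. The `Cpu` struct and function names are hypothetical simplifications of what a dynamic register class selects, not the .ad file's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

using RegMask32 = std::uint32_t;  // bit i set => XMMi is allocatable

constexpr RegMask32 kLegacyBank = 0x0000FFFFu;  // XMM0-15
constexpr RegMask32 kFullBank   = 0xFFFFFFFFu;  // XMM0-31 (EVEX)

// Hypothetical CPU/flag summary.
struct Cpu {
  bool evex;      // AVX512F usable (UseAVX > 2)
  bool avx512vl;  // AVX512VL present
};

// regF: every XMM register the CPU can encode for scalar FP ops.
RegMask32 regF_mask(const Cpu& cpu) {
  return cpu.evex ? kFullBank : kLegacyBank;
}

// vlRegF: restricted class for instructions that can only reach the
// upper bank when AVX512VL is available.
RegMask32 vlRegF_mask(const Cpu& cpu) {
  return (cpu.evex && cpu.avx512vl) ? kFullBank : kLegacyBank;
}

// Invariant the discussion relies on: the restricted class is a subset
// of the full class, so a value can always travel between the two with
// a plain register-to-register move (MoveF2VL / MoveVL2F style).
bool restricted_is_subset(const Cpu& cpu) {
  RegMask32 full = regF_mask(cpu);
  RegMask32 restricted = vlRegF_mask(cpu);
  return (restricted & ~full) == 0;
}
```

On a KNL-like CPU (EVEX without AVX512VL) this gives regF all 32 registers and vlRegF only 16, which is the case where the extra moves Vladimir mentions can appear.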
>>>>>>
>>>>>> Another question. You use movflt() and movdbl(), which use either
>>>>>> movap[s|d] or movs[s|d]
>>>>>> instructions:
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>>>>>> Do these instructions work when avx512vl is not available? I see that for vectors you use a
>>>>>> vpxor+vinserti* combination.
>>>>>>
>>>>>> Sandhya >>> Yes, the scalar floating point instructions are available
>>>>>> with AVX512 encoding even when avx512vl is not available. That is why you
>>>>>> would see not just movflt and movdbl but all the other scalar
>>>>>> operations, like addss, addsd, etc., using the entire xmm range (xmm0-31). In other words, they are AVX512F instructions.
>>>>>>
>>>>>> Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
>>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>>>>
>>>>>> Should it be (UseAVX < 3)?
>>>>>>
>>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
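Why `UseAVX < 3` is the better bound: with UseAVX at 2 or below the VM turns the AVX512 features off, so the branch that relies on AVX512-only instructions must never be taken. A minimal sketch comparing the two predicates as pure functions, assuming `avx512vl` is the feature bit as reported after that masking; `SpillPath`, `spill_path_original` and `spill_path_fixed` are hypothetical names, and the two paths are only labelled, not implemented.

```cpp
#include <cassert>

// Which code path a vector spill would take (labels only).
enum class SpillPath { kPlainMove, kAvx512FOnlyMove };

// Original check: (UseAVX < 2) || supports_avx512vl()
SpillPath spill_path_original(int use_avx, bool avx512vl) {
  return (use_avx < 2 || avx512vl) ? SpillPath::kPlainMove
                                   : SpillPath::kAvx512FOnlyMove;
}

// Corrected check: (UseAVX < 3) || supports_avx512vl()
// With UseAVX <= 2 EVEX is switched off, so only the plain move path is
// valid no matter what the hardware could do.
SpillPath spill_path_fixed(int use_avx, bool avx512vl) {
  return (use_avx < 3 || avx512vl) ? SpillPath::kPlainMove
                                   : SpillPath::kAvx512FOnlyMove;
}
```

The two predicates differ only for UseAVX == 2 without AVX512VL, which is exactly the configuration where the original check would pick the AVX512-only path.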
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> Thanks a lot for your review and feedback. Please see my response
>>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Sandhya
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>>>> instruction
>>>>>>>
>>>>>>> Very nice. Thank you, Sandhya.
>>>>>>>
>>>>>>> I would like to see more meaningful naming in the .ad files: instead
>>>>>>> of rreg* and ovec*, something like vlReg* and legVec*.
>>>>>>>
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> The new load_from_* and load_to_* instructions in the .ad files should be
>>>>>>> renamed as follows and placed together with the other Move*_reg_reg* instructions:
>>>>>>>
>>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> You did not add instructions to load these registers from memory
>>>>>>> (and stack). What happens when you need to load or store?
>>>>>>>>>> Let us take an example, e.g. loading into rregF. First it
>>>>>>>>>> gets loaded from memory into regF, and then a register-to-register move copies it to rregF, and vice versa.
>>>>>>>
>>>>>>> Also, please explain why these registers are used when UseAVX == 0:
>>>>>>>
>>>>>>> +instruct absD_reg(rregD dst) %{
>>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>>>>>>>
>>>>>>> We switch off evex, so the regular regD (only legacy registers in this case) should work too:
>>>>>>>       661   if (UseAVX < 3) {
>>>>>>>       662     _features &= ~CPU_AVX512F;
>>>>>>>>>> Yes, accepted. It could be regD here.
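Vladimir's argument can be checked with a tiny model: once UseAVX < 3 clears the AVX512F bit, anything keyed on EVEX support collapses to the legacy bank, so a restricted class buys nothing when UseAVX == 0. A minimal sketch, assuming a 64-bit VM and modelling only the one quoted line of feature masking; the bit values and the names `mask_features`, `supports_evex` and `regD_register_count` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; HotSpot's real CPU_* constants differ.
constexpr std::uint64_t kAvx2    = 1ull << 0;
constexpr std::uint64_t kAvx512F = 1ull << 1;

// Mirrors the quoted lines 661-662: with UseAVX below 3 the AVX512F bit
// is cleared, so nothing downstream treats the CPU as EVEX-capable.
std::uint64_t mask_features(std::uint64_t features, int use_avx) {
  if (use_avx < 3) {
    features &= ~kAvx512F;
  }
  return features;
}

bool supports_evex(std::uint64_t features) {
  return (features & kAvx512F) != 0;
}

// Number of XMM registers a plain regD operand may use (64-bit VM).
int regD_register_count(std::uint64_t features) {
  return supports_evex(features) ? 32 : 16;
}
```

On an AVX-512 machine run with UseAVX=0, regD already shrinks to XMM0-15, which is why plain regD is sufficient in the absD_reg rule.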
>>>>>>>
>>>>>>> The following checks could be combined into a new function in vm_version_x86.hpp (you already have some):
>>>>>>>
>>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>>>>>>
>>>>>>>>>> Yes, accepted.
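The suggested cleanup amounts to naming the four-way conjunction once and reusing it in each dynamic register class condition. A minimal sketch, assuming a plain feature bitmask; `VmVersion`, the bit constants and `supports_avx512vlbwdq` are hypothetical names for illustration, not a claim about what vm_version_x86.hpp ends up containing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; not HotSpot's actual CPU_* encoding.
enum : std::uint32_t {
  kEvex     = 1u << 0,
  kAvx512BW = 1u << 1,
  kAvx512DQ = 1u << 2,
  kAvx512VL = 1u << 3,
};

struct VmVersion {
  std::uint32_t features;

  bool supports_evex() const     { return (features & kEvex) != 0; }
  bool supports_avx512bw() const { return (features & kAvx512BW) != 0; }
  bool supports_avx512dq() const { return (features & kAvx512DQ) != 0; }
  bool supports_avx512vl() const { return (features & kAvx512VL) != 0; }

  // One named predicate replacing the conjunction repeated in the
  // reg_class_dynamic conditions.
  bool supports_avx512vlbwdq() const {
    return supports_evex() && supports_avx512bw() &&
           supports_avx512dq() && supports_avx512vl();
  }
};
```

A CPU with AVX512F but without BW, DQ and VL (KNL-like) fails the combined check and gets the legacy register class.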
>>>>>>>
>>>>>>> I would suggest testing these changes on different machines
>>>>>>> (non-avx512 and avx512) and with different UseAVX values.
>>>>>>>
>>>>>>>>>> Will do.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>>>>>> Recently there have been a couple of high-priority issues
>>>>>>>> regarding C2's usage of the high bank of XMM registers
>>>>>>>> (XMM16-XMM31):
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>>>>>
>>>>>>>> Please find below a patch which attempts to clean up the XMM
>>>>>>>> register handling by using register groups.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>>>>>>
>>>>>>>> The patch provides a restricted set of registers to the match
>>>>>>>> rules in the ad file based on the underlying architecture.
>>>>>>>>
>>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>>>>>>
>>>>>>>> By removing the special handling, the patch reduces the overall
>>>>>>>> code size by about 1800 lines of code.
>>>>>>>>
>>>>>>>> Your review and feedback is very welcome.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Sandhya
>>>>>>>>

From sandhya.viswanathan at intel.com  Mon Sep 17 18:18:57 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Mon, 17 Sep 2018 18:18:57 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

I have the fix for TestNaNVector.java. It was a corner case that occurs when -XX:MaxVectorSize=4 is given on the command line.

I need your advice on the second problem, with NativeCallTest.java. I am not able to reproduce it, but from my analysis this seems to happen when C1-compiled code calls Graal-compiled code.
The method being called (org.sunflow.math.Matrix4::multiply) has more than 16 floating point arguments. Per the Graal calling convention, all of these arguments need to be passed in XMM registers.
Since there are more than 16 arguments, some of them need to be passed in XMM registers above 15.
As you know, I had restricted the C1 register allocator to XMM0-15 only, so it has no way of copying the arguments to the appropriate XMM register (say XMM16) before calling the Graal-compiled method.
The solution then seems to be to remove the restriction and allow C1 to use all the registers.
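A minimal sketch of the counting argument, under the simplifying assumption (taken from the description above, not from the JVMCI sources) that each floating point argument goes into the next XMM register in order and none are passed on the stack. With more than 16 such arguments, the 17th lands in XMM16, which an allocator limited to XMM0-15 cannot target. `assign_fp_args`, `unreachable_args` and the constants are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kTotalXmm = 32;      // XMM0-31 with AVX-512
constexpr int kC1VisibleXmm = 16;  // C1 restricted to XMM0-15 in the patch

// Returns the XMM register number for each floating point argument,
// assuming arguments are assigned to consecutive XMM registers.
std::vector<int> assign_fp_args(std::size_t num_fp_args) {
  assert(num_fp_args <= static_cast<std::size_t>(kTotalXmm) &&
         "sketch does not model stack-passed arguments");
  std::vector<int> regs;
  for (std::size_t i = 0; i < num_fp_args; ++i) {
    regs.push_back(static_cast<int>(i));
  }
  return regs;
}

// Number of arguments a caller that can only allocate XMM0-15 cannot
// place in their required registers.
std::size_t unreachable_args(const std::vector<int>& regs) {
  std::size_t count = 0;
  for (int r : regs) {
    if (r >= kC1VisibleXmm) ++count;
  }
  return count;
}
```

Any nonzero result from `unreachable_args` corresponds to the situation where the linear scan allocator has no legal register to use and reports the assert seen in the crash log.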

But if I go with this solution, there is one case that needs special handling: floating point negation in C1. In C1, floating point negation is done using xorps/xorpd.
On AVX512F-only hardware (KNL), the xorps and xorpd instructions are not available for the high bank (XMM > 15).
The only assembler-level alternative for this seems to be the ugly workaround for KNL (push_zmm(xmm0), .... pop_zmm(xmm0)).
Any other solution is not straightforward. Solutions like subtracting from 0.0 with subss/subsd don't work, because src and dst can be the same register in the call to LIRAssembler, so I cannot load 0.0 into dst and then do a subtraction.
For C2, I could provide a restricted register set to the instruction and let the register allocator do the work. I am not familiar with the C1 register allocator or codegen.
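The subtraction alternative fails for the reason given: if dst and src are the same register, loading 0.0 into dst destroys the input before the subtraction reads it. A minimal sketch contrasting that with sign-bit XOR (what xorps/xorpd compute when used with a sign-flip mask), on a hypothetical 32-entry register model; `XmmFile`, `flip_sign_bit`, `negate_via_xor` and `negate_via_sub` are illustrative names, not HotSpot functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Model: 32 XMM registers, each holding one double in its low lane.
struct XmmFile {
  double r[32];
};

// What xorpd with a sign-flip constant computes: flip bit 63 only.
double flip_sign_bit(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits ^= 0x8000000000000000ull;
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}

void negate_via_xor(XmmFile& f, int dst, int src) {
  f.r[dst] = flip_sign_bit(f.r[src]);
}

// The rejected alternative: load 0.0 into dst, then dst = dst - src.
// If dst and src name the same register, the first step destroys the
// input, so the result is 0.0 - 0.0 instead of -src.
void negate_via_sub(XmmFile& f, int dst, int src) {
  f.r[dst] = 0.0;
  f.r[dst] = f.r[dst] - f.r[src];
}
```

Besides the aliasing problem, subtraction also differs from negation for zero: 0.0 - 0.0 evaluates to +0.0 under default rounding, whereas Java requires negating +0.0 to produce -0.0, and the sign-bit XOR does produce -0.0.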

So my question is: would you be OK with a sequence like the one below for C1, on the Knights family only, when the dst/src register is > 15, for negatesd (xorpd) and negatess (xorps)?
   push_zmm(xmm0);
   movdbl(xmm0, nds);
   vxorpd(xmm0, xmm0, src, Assembler::AVX_128bit);
   movdbl(dst, xmm0);
   pop_zmm(xmm0);
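A minimal sketch of what the proposed sequence does, reading it as: save XMM0, copy the value into XMM0, XOR it there with the sign mask, copy the result out, restore XMM0. It uses a 32-entry register model in which the XOR step is only allowed on registers below 16, standing in for the KNL encoding restriction. The model tracks only the low 64 bits of each register (the real push_zmm/pop_zmm preserve the full ZMM register), and `Machine`, `negate_via_scratch` and `bits_of` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint64_t kSignMask = 0x8000000000000000ull;

struct Machine {
  std::uint64_t xmm[32] = {};        // low 64 bits of each register
  std::vector<std::uint64_t> stack;  // save area used by push/pop

  // In this model the XOR is only encodable on the low bank.
  void xor_low(int reg, std::uint64_t mask) {
    assert(reg < 16 && "xorpd cannot reach XMM16-31 in this model");
    xmm[reg] ^= mask;
  }
};

// push xmm0; xmm0 = value; xmm0 ^= mask; dst = xmm0; pop xmm0
void negate_via_scratch(Machine& m, int dst, int src) {
  assert(dst != 0 && "restoring xmm0 would overwrite the result");
  m.stack.push_back(m.xmm[0]);  // push_zmm(xmm0)
  m.xmm[0] = m.xmm[src];        // movdbl(xmm0, value)
  m.xor_low(0, kSignMask);      // vxorpd(xmm0, xmm0, sign mask)
  m.xmm[dst] = m.xmm[0];        // movdbl(dst, xmm0)
  m.xmm[0] = m.stack.back();    // pop_zmm(xmm0)
  m.stack.pop_back();
}

std::uint64_t bits_of(double d) {
  std::uint64_t b;
  std::memcpy(&b, &d, sizeof b);
  return b;
}
```

Because the XOR happens in the scratch register, the sequence works even when dst and src are the same high-bank register, at the cost of a save and restore around every negation.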

Please let me know your thoughts, and whether a better solution is possible. In the meantime, I will prepare a patch with the above solutions for the two problems you reported, do some testing, and send it to you.

Note: For NativeCallTest.java and many other tests, I get the following message in the corresponding .jtr file:
    "test result: Error. Use -nativepath to specify the location of native code"
Do I need to give jtreg any additional information to get past this problem?

Thanks a lot!             
Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Monday, September 17, 2018 10:14 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

I finished testing on avx512 machine.
All passed except known (TestNaNVector.java) failures.

Thanks,
Vladimir

On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
> I gave incorrect link to RFE. Here is correct:
> 
> https://bugs.openjdk.java.net/browse/JDK-8210764
> 
> Vladimir
> 
> On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed.
>>
>> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too.
>>
>> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU 
>> with AVX1 only
>>
>> #? SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>> # Problematic frame:
>> # V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20
>>
>> Current CompileTask:
>> C2:??? 154??? 5???????????? java.lang.String::equals (65 bytes)
>>
>> Stack: [0x00007f3b10044000,0x00007f3b10145000],? sp=0x00007f3b1013fe70,? free space=1007k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20
>> V? [libjvm.so+0x882a72]? PhaseChaitin::gather_lrg_masks(bool)+0x872
>> V? [libjvm.so+0xd82235]? PhaseCFG::global_code_motion()+0xfc5
>> V? [libjvm.so+0xd824b1]? PhaseCFG::do_global_code_motion()+0x51
>> V? [libjvm.so+0xa2c26c]? Compile::Code_Gen()+0x24c
>> V? [libjvm.so+0xa2ff82]? Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>
>> ------------------------------------------------------------------------------------------------
>> 2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'
>> #? Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>> #? assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>
>> Current CompileTask:
>> C1: 854767 13391?????? 3?????? org.sunflow.math.Matrix4::multiply (692 bytes)
>>
>> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],? sp=0x00007f23b9e7f9d0,? free space=1014k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V? [libjvm.so+0x1882202]? VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned 
>> char*, void*, void*, char const*, int, unsigned long)+0x562
>> V? [libjvm.so+0x1882d2f]? VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, 
>> __va_list_tag*)+0x2f
>> V? [libjvm.so+0xb0bea0]? report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>> V? [libjvm.so+0x7e0410]? LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>> V? [libjvm.so+0x7e0a20]? LinearScanWalker::activate_current()+0x280
>> V? [libjvm.so+0x7e0c7d]? IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>> V? [libjvm.so+0x7e1078]? LinearScan::allocate_registers()+0x338
>> V? [libjvm.so+0x7e2135]? LinearScan::do_linear_scan()+0x155
>> V? [libjvm.so+0x70a6bb]? Compilation::emit_lir()+0x99b
>> V? [libjvm.so+0x70caff]? Compilation::compile_java_method()+0x42f
>> V? [libjvm.so+0x70d974]? Compilation::compile_method()+0x1d4
>> V? [libjvm.so+0x70e547]? Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, 
>> DirectiveSet*)+0x357
>> V? [libjvm.so+0x71073c]? Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>> V? [libjvm.so+0xa3cf89]? CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>
>> Vladimir
>>
>> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>>
>>> Thanks Vladimir, the below should fix this issue:
>>>
>>> ------------------------------
>>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>> @@ -233,22 +233,6 @@
>>> ??? _xmm_regs[13]? = xmm13;
>>> ??? _xmm_regs[14]? = xmm14;
>>> ??? _xmm_regs[15]? = xmm15;
>>> -? _xmm_regs[16]? = xmm16;
>>> -? _xmm_regs[17]? = xmm17;
>>> -? _xmm_regs[18]? = xmm18;
>>> -? _xmm_regs[19]? = xmm19;
>>> -? _xmm_regs[20]? = xmm20;
>>> -? _xmm_regs[21]? = xmm21;
>>> -? _xmm_regs[22]? = xmm22;
>>> -? _xmm_regs[23]? = xmm23;
>>> -? _xmm_regs[24]? = xmm24;
>>> -? _xmm_regs[25]? = xmm25;
>>> -? _xmm_regs[26]? = xmm26;
>>> -? _xmm_regs[27]? = xmm27;
>>> -? _xmm_regs[28]? = xmm28;
>>> -? _xmm_regs[29]? = xmm29;
>>> -? _xmm_regs[30]? = xmm30;
>>> -? _xmm_regs[31]? = xmm31;
>>> ? #endif // _LP64
>>>
>>> ??? for (int i = 0; i < 8; i++) {
>>> ---------------------------------
>>>
>>> I think the gcc version on my desktop is older so didn?t catch this.
>>>
>>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before 
>>> changing it back to 3.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, September 14, 2018 12:13 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> I got build failure:
>>>
>>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array 
>>> (which contains 16 elements) [-Werror,-Warray-bounds]
>>> jib >?? _xmm_regs[16]? = xmm16;
>>>
>>> I also noticed that we don't have RFE for this work. I filed:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next 
>>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>>
>>> - product(intx, UseAVX, 2, \
>>> + product(intx, UseAVX, 3, \
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>>> Looks good to me. I will start testing and let you know results.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Please find below the updated webrev with all your comments incorporated:
>>>>>
>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>>>
>>>>> I have run the jtreg compiler tests on SKX and KNL which have two
>>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms.
>>>>>
>>>>> Best Regards,
>>>>> Sandhya
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>>>
>>>>> Thank you, Sandhya
>>>>>
>>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them.
>>>>>
>>>>> Vladimir
>>>>>
>>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>>>>> Please see my response in your email below marked with (Sandhya
>>>>>>>>> ). Looking forward to your advice.
>>>>>>
>>>>>> Best Regards,
>>>>>> Sandhya
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>>> instruction
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> I want to discuss next issue:
>>>>>>
>>>>>> ??? > You did not added instructions to load these registers from
>>>>>> memory (and stack). What happens in such cases when you need to load or store?
>>>>>> ??? >>>> Let us take an example, e.g. for loading into rregF. First
>>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>>>>
>>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack.
>>>>>> Also we don't check that register could be the same as result you may get unneeded moves.
>>>>>>
>>>>>> I would advice add memory moves at least.
>>>>>>
>>>>>> Sandhya >>>? I had added those rules initially and removed them in
>>>>>> the final patch. I noticed that the register allocator uses the
>>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask
>>>>>> (matcher.cpp). I would like the register allocator to get all the
>>>>>> possible register on an architecture for idealreg2reg mask. I
>>>>>> wondered that multiple instruct rules in .ad file for LoadF from
>>>>>> memory might cause problems.? I would have to have higher cost for
>>>>>> loading into restricted register set like vlReg. Then I decided that
>>>>>> the register allocator can handle this in much better way than me
>>>>>> adding rules to load from memory. This is with the background that the regF is always all the available registers 
>>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if 
>>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I 
>>>>>> am referring to is:
>>>>>> ???? MachNode *spillCP = match_tree(new
>>>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>> #endif
>>>>>> ???? MachNode *spillI? = match_tree(new
>>>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>>>> ???? MachNode *spillL? = match_tree(new
>>>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered,
>>>>>> LoadNode::DependsO nlyOnTest, false));
>>>>>> ???? MachNode *spillF? = match_tree(new
>>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>>>> ???? MachNode *spillD? = match_tree(new
>>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>>>> ???? MachNode *spillP? = match_tree(new
>>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>> ???? ....
>>>>>> ???? idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>>>>>
>>>>>> An other question. You use movflt() and movdbl() which use either
>>>>>> movap[s|d] and movs[s|d]
>>>>>> instructions:
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu
>>>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when
>>>>>> avx512vl is not available? I see for vectors you use
>>>>>> vpxor+vinserti* combination.
>>>>>>
>>>>>> Sandhya >>> Yes the scalar floating point instructions are available
>>>>>> with AVX512 encoding when avx512vl is not available. That is why you
>>>>>> would see not just movflt, movdbl but all the other scalar
>>>>>> operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions.
>>>>>>
>>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad:
>>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>>>>
>>>>>> Should it be (UseAVX < 3)?
>>>>>>
>>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> Thanks a lot for your review and feedback. Please see my response
>>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Sandhya
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>>>> instruction
>>>>>>>
>>>>>>> Very nice. Thank you, Sandhya.
>>>>>>>
>>>>>>> I would like to see more meaningful naming in .ad files - instead
>>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
>>>>>>>
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> New load_from_* and load_to_* instructions in .ad files should be
>>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions:
>>>>>>>
>>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst,
>>>>>>> vlRegF src)
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> You did not add instructions to load these registers from memory
>>>>>>> (and the stack). What happens when you need to load or store them?
>>>>>>>>>> Take loading into rregF as an example: the value is first loaded from
>>>>>>>>>> memory into regF and then moved register-to-register into rregF; stores go the other way.
>>>>>>>
>>>>>>> Also please explain why these registers are used when UseAVX == 0?:
>>>>>>>
>>>>>>> +instruct absD_reg(rregD dst) %{
>>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>>>>>>>
>>>>>>> We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
>>>>>>>       if (UseAVX < 3) {
>>>>>>>         _features &= ~CPU_AVX512F;
>>>>>>>
>>>>>>>>>> Yes, accepted. It could be regD here.
>>>>>>>
>>>>>>> The following checks could be combined into a new function in vm_version_x86.hpp (you already have some):
>>>>>>>
>>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy,
>>>>>>> +    %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>>>>>>> +       VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>>>>>>
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> I would suggest testing these changes on different machines
>>>>>>> (non-AVX512 and AVX512) and with different UseAVX values.
>>>>>>>
>>>>>>>>>> Will do.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>>>>>> Recently there have been a couple of high-priority issues with
>>>>>>>> regard to C2's usage of the high bank of XMM registers
>>>>>>>> (XMM16-XMM31):
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>>>>>
>>>>>>>> Please find below a patch which attempts to clean up the XMM
>>>>>>>> register handling by using register groups.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>>>>>>
>>>>>>>> The patch provides a restricted set of registers to the match
>>>>>>>> rules in the ad file based on the underlying architecture.
>>>>>>>>
>>>>>>>> The aim is to remove the special handling/workarounds from the macro assembler and assembler.
>>>>>>>>
>>>>>>>> By removing the special handling, the patch reduces the overall
>>>>>>>> code size by about 1800 lines of code.
>>>>>>>>
>>>>>>>> Your review and feedback is very welcome.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Sandhya
>>>>>>>>
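The combined check suggested above for vm_version_x86.hpp could look roughly like the following self-contained C++ sketch. VM_Version_Sketch is a hypothetical stand-in for HotSpot's VM_Version (the real class derives these flags from CPUID), and the helper name supports_avx512vlbwdq() is an assumption for illustration, not necessarily what the updated webrev uses.

```cpp
// Hypothetical stand-in for HotSpot's VM_Version: the feature bits are plain
// fields here so the sketch compiles on its own.
struct VM_Version_Sketch {
  bool evex;
  bool avx512bw;
  bool avx512dq;
  bool avx512vl;

  bool supports_evex() const     { return evex; }
  bool supports_avx512bw() const { return avx512bw; }
  bool supports_avx512dq() const { return avx512dq; }
  bool supports_avx512vl() const { return avx512vl; }

  // Hypothetical combined helper: one call replaces the four-way conjunction
  // spelled out in the vectors_reg_bwdqvl reg_class_dynamic predicate.
  bool supports_avx512vlbwdq() const {
    return supports_evex() && supports_avx512bw() &&
           supports_avx512dq() && supports_avx512vl();
  }
};
```

With such a helper, the predicate in the .ad file would shrink to a single call like `%{ VM_Version::supports_avx512vlbwdq() %}` (hypothetical name), keeping the feature logic in one place in C++.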

From vladimir.kozlov at oracle.com  Mon Sep 17 18:32:59 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 17 Sep 2018 11:32:59 -0700
Subject: [12] RFR(S) 8209574: [AOT] breakpoint events are generated in
 different threads does not meet expected count
Message-ID: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com>

http://cr.openjdk.java.net/~kvn/8209574/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8209574

Disable AOT when debugger is attached.

-- 
Thanks,
Vladimir

From vladimir.kozlov at oracle.com  Mon Sep 17 18:37:18 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 17 Sep 2018 11:37:18 -0700
Subject: [12] RFR(S) 8209574: [AOT] breakpoint events are generated in
 different threads does not meet expected count
In-Reply-To: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com>
References: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com>
Message-ID: <0b383f5d-75eb-c3a9-5e13-e8d11e0b4c58@oracle.com>

Pressed 'Send' too soon.

I also removed unused AOT functions and added the '--info' flag to the AOT test driver class AotCompiler.java to investigate
cases where jaotc times out (we have several such bugs, for example 8209769).

Thanks,
Vladimir

On 9/17/18 11:32 AM, Vladimir Kozlov wrote:
> http://cr.openjdk.java.net/~kvn/8209574/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8209574
> 
> Disable AOT when debugger is attached.
> 

From dean.long at oracle.com  Mon Sep 17 18:48:11 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 17 Sep 2018 11:48:11 -0700
Subject: [12] RFR(S) 8209574: [AOT] breakpoint events are generated in
 different threads does not meet expected count
In-Reply-To: <0b383f5d-75eb-c3a9-5e13-e8d11e0b4c58@oracle.com>
References: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com>
 <0b383f5d-75eb-c3a9-5e13-e8d11e0b4c58@oracle.com>
Message-ID: <f2083dcc-6b05-a709-ede6-54a0ef2bd1ec@oracle.com>

 >  assert(UseAOT, "called only only when AOT is enabled");

typo "only only". The rest looks good.

dl

On 9/17/18 11:37 AM, Vladimir Kozlov wrote:
> Pressed 'Send' too soon.
>
> I also removed unused AOT functions and added '--info' flag to AOT 
> test driver class AotCompiler.java to investigate cases when jaotc is 
> timed-out (we have several bugs, for example, 8209769).
>
> Thanks,
> Vladimir
>
> On 9/17/18 11:32 AM, Vladimir Kozlov wrote:
>> http://cr.openjdk.java.net/~kvn/8209574/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8209574
>>
>> Disable AOT when debugger is attached.
>>


From vladimir.kozlov at oracle.com  Mon Sep 17 19:03:09 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 17 Sep 2018 12:03:09 -0700
Subject: [12] RFR(S) 8209574: [AOT] breakpoint events are generated in
 different threads does not meet expected count
In-Reply-To: <f2083dcc-6b05-a709-ede6-54a0ef2bd1ec@oracle.com>
References: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com>
 <0b383f5d-75eb-c3a9-5e13-e8d11e0b4c58@oracle.com>
 <f2083dcc-6b05-a709-ede6-54a0ef2bd1ec@oracle.com>
Message-ID: <7bb8254c-8d91-3667-6cf1-f463d49b9ea8@oracle.com>

Thank you, Dean

On 9/17/18 11:48 AM, dean.long at oracle.com wrote:
>  >  assert(UseAOT, "called only only when AOT is enabled");
> 
> typo "only only". The rest looks good.

Fixed.

Vladimir

> 
> dl
> 
> On 9/17/18 11:37 AM, Vladimir Kozlov wrote:
>> Pressed 'Send' too soon.
>>
>> I also removed unused AOT functions and added '--info' flag to AOT test driver class AotCompiler.java to investigate 
>> cases when jaotc is timed-out (we have several bugs, for example, 8209769).
>>
>> Thanks,
>> Vladimir
>>
>> On 9/17/18 11:32 AM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/8209574/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8209574
>>>
>>> Disable AOT when debugger is attached.
>>>
> 

From gromero at linux.vnet.ibm.com  Mon Sep 17 21:48:38 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 17 Sep 2018 18:48:38 -0300
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx
 registers in ppc.ad
In-Reply-To: <OF46ED71D3.A34CED01-ON00258308.003FBB2C-49258308.00404B58@notes.na.collabserv.com>
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
 <OF38E7CA02.B1F6AF05-ON00258307.005C3C0B-49258307.005DDAA1@notes.na.collabserv.com>
 <f67c306c714c40e4bb604e8f9dfe4515@sap.com>
 <OF46ED71D3.A34CED01-ON00258308.003FBB2C-49258308.00404B58@notes.na.collabserv.com>
Message-ID: <e9ba6fa4-c2d2-b1d4-3f91-ee248ebae09d@linux.vnet.ibm.com>

Hi Michi,

On 09/14/2018 08:42 AM, Michihiro Horie wrote:
> Hi Martin,
> 
> Thank you for your comments on improving this change and for testing it. I uploaded a new webrev with format statements.
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.02/

Thanks for the updated webrev.

Only some nits:

Besides the missing closing quotes (") in MTVSRWZ and XXSPLTW format
strings I see trailing spaces in the following lines:

-    immI8  zero %{ (int)  0 %}
+    immI8  zero %{ (int)  0 %}
  
-    xscvdpspn_regF(tmpV, src);
+    xscvdpspn_regF(tmpV, src);

Curiously enough, jcheck [1] does not complain about them. I found them because I
set the color extension [2] in .hgrc:

[extensions]
color =

which marks trailing whitespaces in red.

It looks like some trailing spaces also slipped into the related change
"8188139: PPC64: Superword Level Parallelization with VSX", in case you want to
fix them in a later change.

Finally, I think it would be better to replace "Permute 16-byte register" in the XXPERMDI format string
with something like "Splat doubleword", to match the description in XXSPLTW,
which says "Splat word".

Otherwise, LGTM. Reviewed.

I'll sponsor that change.


Best regards,
Gustavo

[1] http://openjdk.java.net/projects/code-tools/jcheck/
[2] https://www.mercurial-scm.org/wiki/ColorExtension
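Because jcheck did not catch these, a small pre-review check can help. The following is a minimal C++ sketch, not part of jcheck or Mercurial (the function name added_lines_with_trailing_ws is hypothetical), that scans unified-diff text such as the output of `hg diff` and reports added lines ending in a space or tab:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based line numbers (within the diff text)
// of added lines whose content ends in a space or tab. Lines starting with
// "+++ " are file headers and are skipped; removed and context lines are ignored.
std::vector<int> added_lines_with_trailing_ws(const std::string& diff) {
  std::vector<int> hits;
  std::istringstream in(diff);
  std::string line;
  int lineno = 0;
  while (std::getline(in, line)) {
    ++lineno;
    if (!line.empty() && line.back() == '\r') {
      line.pop_back();  // tolerate CRLF line endings
    }
    if (line.empty() || line[0] != '+' || line.compare(0, 4, "+++ ") == 0) {
      continue;
    }
    const char last = line.back();
    // A bare "+" is an added empty line, which has no trailing whitespace.
    if (line.size() > 1 && (last == ' ' || last == '\t')) {
      hits.push_back(lineno);
    }
  }
  return hits;
}
```

Unlike the color extension, which only highlights the whitespace on screen, a check like this can be run from a script before sending a webrev.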

> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
> 
> 
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Michihiro Horie <HORIE at jp.ibm.com>
> Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Gustavo Romero <gromero at linux.vnet.ibm.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
> Date: 2018/09/14 17:30
> Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
> 
> 
> 
> 
> Hi Michihiro,
> 
> your webrev
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
> looks good to me.
> 
> I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no "format %{ ... %}" specification, so they are missing from the PrintOptoAssembly output. But this seems to be missing in the current version already.
> 
> We can test it while waiting for a 2nd review.
> 
> Thanks and best regards,
> Martin
> 
> 
> From: Michihiro Horie <HORIE at jp.ibm.com>
> Sent: Thursday, 13 September 2018 19:05
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
> 
> Hi Martin,
> 
> Thank you so much for your review (and adding the ID in the subject :-).
> 
>  > I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.
> You're right, thanks. I removed a redundant one.
> 
> I also refactored ReplicateD with vector length 2.
> 
> Following is the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
> 
> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
> 
> 
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Michihiro Horie <HORIE at jp.ibm.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
> Date: 2018/09/13 16:25
> Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
> 
> 
> 
> 
> 
> Hi Michihiro,
> 
> I have added "RFR(S): 8210660" to the subject.
> 
> I don't think we need 2 nodes for ReplicateF with vector length 4. Both of the ones in your webrev are for Power8, so only one will be used.
> Besides this, your change looks good to me.
> 
> Would you like to improve ReplicateD with vector length 2, too?
> 
> Thanks and best regards,
> Martin
> 
> From: Michihiro Horie <HORIE at jp.ibm.com>
> Sent: Wednesday, 12 September 2018 18:11
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad
> 
> Dear all,
> 
> Would you please review the following change?
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
> Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/
> 
> In the code currently emitted for replicating a floating-point value in ppc.ad, the value is first stored to memory so that it can be loaded back as an integer value. However, when SuperwordUseVSX is enabled, this round trip is redundant because the floating-point registers overlap with VSX registers 0-31. We can use VSX instructions to replicate the floating-point value by mapping the floating-point register to the corresponding VSX register.
> 
> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
> 
> 
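For readers less familiar with these nodes, here is a scalar C++ model of what ReplicateF (vector length 4) and ReplicateD (vector length 2) compute: the scalar's bit pattern copied into every lane. This is an illustrative sketch only, not the ppc.ad implementation; the function names are hypothetical, and std::memcpy merely stands in for the old store-float/load-int round trip through memory, which the patch replaces with in-register VSX instructions (the thread mentions instructs such as xscvdpspn and xxspltw).

```cpp
#include <array>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "model assumes 32-bit float and 64-bit double");

// Hypothetical model of ReplicateF with vector length 4: the float's raw bit
// pattern is copied into each of four 32-bit lanes of a 128-bit vector.
std::array<std::uint32_t, 4> replicate_f4(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);  // reinterpret the float's bits as an integer
  std::array<std::uint32_t, 4> lanes;
  lanes.fill(bits);
  return lanes;
}

// Hypothetical model of ReplicateD with vector length 2: two 64-bit lanes.
std::array<std::uint64_t, 2> replicate_d2(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  std::array<std::uint64_t, 2> lanes;
  lanes.fill(bits);
  return lanes;
}
```

Because the lanes hold raw bits, the result is independent of how the value reached the vector register; the optimization only changes the route, not the outcome.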

From HORIE at jp.ibm.com  Tue Sep 18 02:35:46 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 18 Sep 2018 11:35:46 +0900
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx
 registers in ppc.ad
In-Reply-To: <e9ba6fa4-c2d2-b1d4-3f91-ee248ebae09d@linux.vnet.ibm.com>
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
 <OF38E7CA02.B1F6AF05-ON00258307.005C3C0B-49258307.005DDAA1@notes.na.collabserv.com>
 <f67c306c714c40e4bb604e8f9dfe4515@sap.com>
 <OF46ED71D3.A34CED01-ON00258308.003FBB2C-49258308.00404B58@notes.na.collabserv.com>
 <e9ba6fa4-c2d2-b1d4-3f91-ee248ebae09d@linux.vnet.ibm.com>
Message-ID: <OFB1252508.95B7C2C4-ON0025830C.000DA7A9-4925830C.000E42CD@notes.na.collabserv.com>


Hi Gustavo,

Thanks a lot for your comments and review! I uploaded the latest webrev, which adds
the closing quotes and removes the trailing spaces.

http://cr.openjdk.java.net/~mhorie/8210660/webrev.03/


Best regards,
--
Michihiro,
IBM Research - Tokyo



From gromero at linux.vnet.ibm.com  Tue Sep 18 03:46:39 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 18 Sep 2018 00:46:39 -0300
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx
 registers in ppc.ad
In-Reply-To: <OFB1252508.95B7C2C4-ON0025830C.000DA7A9-4925830C.000E42CD@notes.na.collabserv.com>
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
 <OF38E7CA02.B1F6AF05-ON00258307.005C3C0B-49258307.005DDAA1@notes.na.collabserv.com>
 <f67c306c714c40e4bb604e8f9dfe4515@sap.com>
 <OF46ED71D3.A34CED01-ON00258308.003FBB2C-49258308.00404B58@notes.na.collabserv.com>
 <e9ba6fa4-c2d2-b1d4-3f91-ee248ebae09d@linux.vnet.ibm.com>
 <OFB1252508.95B7C2C4-ON0025830C.000DA7A9-4925830C.000E42CD@notes.na.collabserv.com>
Message-ID: <5532522c-e1d5-f003-8630-a663978d5f21@linux.vnet.ibm.com>

Hi Michi,

Thanks for the updated webrev.

Pushed.


Best regards,
Gustavo

On 09/17/2018 11:35 PM, Michihiro Horie wrote:
> Hi Gustavo,
> 
> Thanks a lot for your comments and review! I uploaded the latest webrev, which adds the closing quotes and removes the trailing spaces.
> 
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.03/
> 
> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
> 
> 
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> To: Michihiro Horie/Japan/IBM at IBMJP, "Doerr, Martin" <martin.doerr at sap.com>
> Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
> Date: 2018/09/18 06:48
> Subject: Re: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
> 
> 
> 
> 
> Hi Michi,
> 
> On 09/14/2018 08:42 AM, Michihiro Horie wrote:
>  > Hi Martin,
>  >
>  > Thank you for your comments on improving this change and for testing it. I uploaded a new webrev with format statements.
>  > http://cr.openjdk.java.net/~mhorie/8210660/webrev.02/
> 
> Thanks for the updated webrev.
> 
> Only some nits:
> 
> Besides the missing closing quotes (") in MTVSRWZ and XXSPLTW format
> strings I see trailing spaces in the following lines:
> 
> -    immI8  zero %{ (int)  0 %}
> +    immI8  zero %{ (int)  0 %}
>
> -    xscvdpspn_regF(tmpV, src);
> +    xscvdpspn_regF(tmpV, src);
> 
> Curiously enough, jcheck [1] does not complain about them. I found them because I
> set the color extension [2] in .hgrc:
> 
> [extensions]
> color =
> 
> which marks trailing whitespaces in red.
> 
> It looks like some trailing spaces also slipped into the related change
> "8188139: PPC64: Superword Level Parallelization with VSX", in case you want to
> fix them in a later change.
> 
> Finally, I think it would be better to replace "Permute 16-byte register" in the XXPERMDI format string
> with something like "Splat doubleword", to match the description in XXSPLTW,
> which says "Splat word".
> 
> Otherwise, LGTM. Reviewed.
> 
> I'll sponsor that change.
> 
> 
> Best regards,
> Gustavo
> 
> [1] http://openjdk.java.net/projects/code-tools/jcheck/
> [2] https://www.mercurial-scm.org/wiki/ColorExtension
> 
>  >
>  > Best regards,
>  > --
>  > Michihiro,
>  > IBM Research - Tokyo
>  >
>  >
>  > From: "Doerr, Martin" <martin.doerr at sap.com>
>  > To: Michihiro Horie <HORIE at jp.ibm.com>
>  > Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, Gustavo Romero <gromero at linux.vnet.ibm.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
>  > Date: 2018/09/14 17:30
>  > Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>  >
>  >
>  >
>  >
>  > Hi Michihiro,
>  >
>  > your webrev
>  > http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
>  > looks good to me.
>  >
>  > I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no "format %{ ... %}" specification, so they are missing from the PrintOptoAssembly output. But this seems to be missing in the current version already.
>  >
>  > We can test it while waiting for a 2nd review.
>  >
>  > Thanks and best regards,
>  > Martin
>  >
>  >
>  > From: Michihiro Horie <HORIE at jp.ibm.com>
>  > Sent: Thursday, 13 September 2018 19:05
>  > To: Doerr, Martin <martin.doerr at sap.com>
>  > Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot-compiler-dev at openjdk.java.net
>  > Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>  >
>  > Hi Martin,
>  >
>  > Thank you so much for your review (and adding the ID in the subject :-).
>  >
>  > > I don't think we need 2 nodes for ReplicateF with vector length 4. Both of the ones in your webrev are for Power8, so only one will be used.
>  > You're right, thanks. I removed a redundant one.
>  >
>  > I also refactored ReplicateD with vector length 2.
>  >
>  > Following is the latest webrev:
>  > http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
>  >
>  >
>  > Best regards,
>  > --
>  > Michihiro,
>  > IBM Research - Tokyo
>  >
>  >
>  > From: "Doerr, Martin" <martin.doerr at sap.com>
>  > To: Michihiro Horie <HORIE at jp.ibm.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
>  > Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
>  > Date: 2018/09/13 16:25
>  > Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>  >
>  >
>  > Hi Michihiro,
>  >
>  > I have added "RFR(S): 8210660" to the subject.
>  >
>  > I don't think we need 2 nodes for ReplicateF with vector length 4. Both of the ones in your webrev are for Power8, so only one will be used.
>  > Besides this, your change looks good to me.
>  >
>  > Would you like to improve ReplicateD with vector length 2, too?
>  >
>  > Thanks and best regards,
>  > Martin
>  >
>  > From: Michihiro Horie <HORIE at jp.ibm.com>
>  > Sent: Wednesday, 12 September 2018 18:11
>  > To: hotspot-compiler-dev at openjdk.java.net
>  > Cc: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>  > Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>  >
>  > Dear all,
>  >
>  > Would you please review the following change?
>  >
>  > Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
>  > Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/
>  >
>  > In the code currently emitted for replicating a floating point value in ppc.ad, the floating point value is first stored to memory so that it can be loaded back as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating point registers overlap VSX registers 0-31. We can use VSX instructions to replicate the floating point value by mapping the floating point register to the corresponding VSX register.
>  >
>  >
>  > Best regards,
>  > --
>  > Michihiro,
>  > IBM Research - Tokyo
>  >
>  >
> 
> 
> 
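The 8210660 change quoted above relies on floating point register i and VSX register i being the same physical register on Power, so a scalar float can be converted and splatted in registers instead of taking a round trip through memory. The C++ below is only a model of that data flow, not ppc.ad code: a 128-bit VSX register is represented as four 32-bit words, the memory round trip stands in for the old store/reload path, and the xscvdpspn and xxspltw steps are modeled as "single-precision bits land in word 0, then word 0 is copied to every lane". All type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of one 128-bit VSX register as four 32-bit words.
using Vsr128 = std::array<uint32_t, 4>;

// Old path (modeled): spill the float to a stack slot, reload its raw bits
// into a general-purpose register, then splat that integer to all lanes.
Vsr128 replicate_via_memory(double fpr_value) {
  float f = static_cast<float>(fpr_value);    // FPRs hold floats in double format
  unsigned char stack_slot[sizeof(float)];
  std::memcpy(stack_slot, &f, sizeof f);      // store the float
  uint32_t gpr;
  std::memcpy(&gpr, stack_slot, sizeof gpr);  // reload the bits as an int
  return Vsr128{gpr, gpr, gpr, gpr};
}

// New path (modeled): the FPR already is a VSX register, so convert in place
// and splat word 0 without touching memory.
Vsr128 replicate_via_vsx(double fpr_value) {
  Vsr128 vsr{};
  float f = static_cast<float>(fpr_value);
  std::memcpy(&vsr[0], &f, sizeof f);  // like xscvdpspn: single-precision bits in word 0
  const uint32_t w0 = vsr[0];
  vsr.fill(w0);                        // like xxspltw ..., 0: copy word 0 to every lane
  return vsr;
}
```

Both paths yield the same four lanes; the register-only path just skips the store and reload.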

From jcbeyler at google.com  Tue Sep 18 04:28:50 2018
From: jcbeyler at google.com (JC Beyler)
Date: Mon, 17 Sep 2018 21:28:50 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
Message-ID: <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>

Hi Sandhya,

How are you invoking the test for NativeCallTest?

The way I would do it using jtreg would be something like this:

$ export BUILD_TYPE=release
$ export JDK_PATH=<wherever you have your JDK>

Then, from the test subfolder:
$ wherever-your-jtreg-is/bin/jtreg \
    -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib \
    -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk \
    hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java

Seems to pass for me.

But much easier is:
$ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"

That seems to pass for me as well and is easier to use :)

For information, the make run-test documentation is here:
http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html

Let me know if that helps,
Jc

> Note: For NativeCallTest.java and many others I get the following message
> in the corresponding .jtr file:
>     "test result: Error. Use -nativepath to specify the location of native code"
> Do I need to give any additional info to jtreg to get over this problem?
>
> Thanks a lot!
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Monday, September 17, 2018 10:14 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>
> I finished testing on avx512 machine.
> All passed except known (TestNaNVector.java) failures.
>
> Thanks,
> Vladimir
>
> On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
> > I gave incorrect link to RFE. Here is correct:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8210764
> >
> > Vladimir
> >
> > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
> >> I got the build failure on MacOS and Windows. Linux passed for some reason,
> >> so I am not surprised you did not notice it.
> >>
> >> Anyway, I tested with these changes and got the following failures on avx1
> >> machines. I am planning to run on avx512 too.
> >>
> >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation'
> >> on CPU with AVX1 only
> >>
> >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
> >> # Problematic frame:
> >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
> >>
> >> Current CompileTask:
> >> C2:    154    5             java.lang.String::equals (65 bytes)
> >>
> >> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
> >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> >> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
> >> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
> >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
> >> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
> >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
> >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
> >>
> >>
> >> ------------------------------------------------------------------------------------------------
> >> 2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'
> >> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
> >> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
> >>
> >> Current CompileTask:
> >> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
> >>
> >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
> >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
> >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
> >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
> >> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
> >> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
> >> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
> >> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
> >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
> >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
> >> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
> >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
> >> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
> >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
> >> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
> >>
> >> Vladimir
> >>
> >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
> >>>
> >>> Thanks Vladimir, the below should fix this issue:
> >>>
> >>> ------------------------------
> >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
> >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
> >>> @@ -233,22 +233,6 @@
> >>>     _xmm_regs[13]  = xmm13;
> >>>     _xmm_regs[14]  = xmm14;
> >>>     _xmm_regs[15]  = xmm15;
> >>> -  _xmm_regs[16]  = xmm16;
> >>> -  _xmm_regs[17]  = xmm17;
> >>> -  _xmm_regs[18]  = xmm18;
> >>> -  _xmm_regs[19]  = xmm19;
> >>> -  _xmm_regs[20]  = xmm20;
> >>> -  _xmm_regs[21]  = xmm21;
> >>> -  _xmm_regs[22]  = xmm22;
> >>> -  _xmm_regs[23]  = xmm23;
> >>> -  _xmm_regs[24]  = xmm24;
> >>> -  _xmm_regs[25]  = xmm25;
> >>> -  _xmm_regs[26]  = xmm26;
> >>> -  _xmm_regs[27]  = xmm27;
> >>> -  _xmm_regs[28]  = xmm28;
> >>> -  _xmm_regs[29]  = xmm29;
> >>> -  _xmm_regs[30]  = xmm30;
> >>> -  _xmm_regs[31]  = xmm31;
> >>>   #endif // _LP64
> >>>
> >>>     for (int i = 0; i < 8; i++) {
> >>> ---------------------------------
> >>>
> >>> I think the gcc version on my desktop is older, so it didn't catch this.
> >>>
> >>> The updated patch, with the above change and UseAVX set to 3, is uploaded to:
> >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
> >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
> >>>
> >>> FYI, I did notice that the default for UseAVX had been rolled back, and I
> >>> wanted to get confirmation from you before changing it back to 3.
> >>>
> >>> Best Regards,
> >>> Sandhya
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>> Sent: Friday, September 14, 2018 12:13 PM
> >>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> hotspot-compiler-dev at openjdk.java.net
> >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> >>>
> >>> I got build failure:
> >>>
> >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array
> >>> (which contains 16 elements) [-Werror,-Warray-bounds]
> >>> jib >   _xmm_regs[16]  = xmm16;
> >>>
> >>> I also noticed that we don't have RFE for this work. I filed:
> >>>
> >>> https://bugs.openjdk.java.net/browse/JDK-8209735
> >>>
> >>> You did not enable avx512 by default (the 8209735 change was synced from
> >>> jdk 11 into 12 two weeks ago). I added the following change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
> >>>
> >>> - product(intx, UseAVX, 2, \
> >>> + product(intx, UseAVX, 3, \
> >>>
> >>> Thanks,
> >>> Vladimir
> >>>
> >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
> >>>> Looks good to me. I will start testing and let you know results.
> >>>>
> >>>> Thanks,
> >>>> Vladimir
> >>>>
> >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
> >>>>> Hi Vladimir,
> >>>>>
> >>>>> Please find below the updated webrev with all your comments incorporated:
> >>>>>
> >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
> >>>>>
> >>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
> >>>>> different flavors of AVX512, and on Haswell, a non-avx512 system. I also
> >>>>> tested SPECjvm2008 on the three platforms.
> >>>>>
> >>>>> Best Regards,
> >>>>> Sandhya
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
> >>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> >>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> >>>>>
> >>>>> Thank you, Sandhya
> >>>>>
> >>>>> I am satisfied with your detailed answer on the memory load issues.
> >>>>> Okay, let's not add them.
> >>>>>
> >>>>> Vladimir
> >>>>>
> >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
> >>>>>> Hi Vladimir,
> >>>>>>
> >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
> >>>>>> Please see my responses in your email below, marked with (Sandhya >>>). Looking forward to your advice.
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Sandhya
> >>>>>>
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
> >>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> >>>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
> >>>>>> instruction
> >>>>>>
> >>>>>> Thank you.
> >>>>>>
> >>>>>> I want to discuss next issue:
> >>>>>>
> >>>>>>     > You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
> >>>>>>     >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF, and then a
> >>>>>>     >>>> register-to-register move to rregF is used, and vice versa.
> >>>>>>
> >>>>>> This is what I thought. You increase register pressure this way, which may cause spills on the stack.
> >>>>>> Also, we don't check whether the register could be the same as the result, so you may get unneeded moves.
> >>>>>>
> >>>>>> I would advise adding memory moves, at least.
> >>>>>>
> >>>>>> Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register
> >>>>>> allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2regmask (matcher.cpp). I would like the
> >>>>>> register allocator to get all the possible registers on an architecture for the idealreg2regmask. I was concerned
> >>>>>> that multiple instruct rules in the .ad file for LoadF from memory might cause problems: I would have to give a
> >>>>>> higher cost to loading into a restricted register set like vlReg. Then I decided that the register allocator can
> >>>>>> handle this much better than I could by adding rules to load from memory. The background is that regF always
> >>>>>> covers all the available registers and vlRegF is the restricted register set; likewise for VecS and legVecS.
> >>>>>> Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg and
> >>>>>> legVec. The specific code from matcher.cpp that I am referring to is:
> >>>>>>      MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
> >>>>>> #endif
> >>>>>>      MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
> >>>>>>      MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
> >>>>>>      MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
> >>>>>>      MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
> >>>>>>      MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
> >>>>>>      ....
> >>>>>>      idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
> >>>>>>
> >>>>>> Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions:
> >>>>>>
> >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
> >>>>>> Do these instructions work when avx512vl is not available? I see that for vectors you use a
> >>>>>> vpxor+vinserti* combination.
> >>>>>>
> >>>>>> Sandhya >>> Yes, the scalar floating point instructions are available with AVX512 encoding when avx512vl is
> >>>>>> not available. That is why you would see not just movflt and movdbl but all the other scalar operations like
> >>>>>> addss, addsd, etc. using the entire xmm range (xmm0-31). In other words, they are AVX512F instructions.
> >>>>>>
> >>>>>> Last question. I notice the following UseAVX check in the vector spill code in x86.ad:
> >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
> >>>>>>
> >>>>>> Should it be (UseAVX < 3)?
> >>>>>>
> >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vladimir
> >>>>>>
> >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
> >>>>>>> Hi Vladimir,
> >>>>>>>
> >>>>>>> Thanks a lot for your review and feedback. Please see my response
> >>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Sandhya
> >>>>>>>
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
> >>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> >>>>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
> >>>>>>> instruction
> >>>>>>>
> >>>>>>> Very nice. Thank you, Sandhya.
> >>>>>>>
> >>>>>>> I would like to see more meaningful naming in .ad files - instead
> >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
> >>>>>>>
> >>>>>>>>>> Yes, accepted.
> >>>>>>>
> >>>>>>> The new load_from_* and load_to_* instructions in the .ad files should be renamed as follows
> >>>>>>> and placed next to the other Move*_reg_reg* instructions:
> >>>>>>>
> >>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
> >>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
> >>>>>>>>>> Yes, accepted.
> >>>>>>>
> >>>>>>> You did not add instructions to load these registers from memory
> >>>>>>> (and stack). What happens in such cases when you need to load or store?
> >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it
> >>>>>>>>>> gets loaded from memory into regF, and then a register-to-register
> >>>>>>>>>> move to rregF is used, and vice versa.
> >>>>>>>
> >>>>>>> Also please explain why these registers are used when UseAVX == 0?:
> >>>>>>>
> >>>>>>> +instruct absD_reg(rregD dst) %{
> >>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
> >>>>>>>
> >>>>>>> we switch off evex, so regular regD (only legacy registers in this case) should work too:
> >>>>>>>       661   if (UseAVX < 3) {
> >>>>>>>       662     _features &= ~CPU_AVX512F;
> >>>>>>>
> >>>>>>>>>> Yes, accepted. It could be regD here.
> >>>>>>>
> >>>>>>> The following checks could be combined into a new function in
> >>>>>>> vm_version_x86.hpp (you already have some):
> >>>>>>>
> >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy,
> >>>>>>> +    %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
> >>>>>>> +       VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
> >>>>>>>
> >>>>>>>>>> Yes, accepted.
> >>>>>>>
> >>>>>>> I would suggest to test these changes on different machines
> >>>>>>> (non-avx512 and avx512) and with different UseAVX values.
> >>>>>>>
> >>>>>>>>>> Will do.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Vladimir
> >>>>>>>
> >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
> >>>>>>>> Recently there have been a couple of high-priority issues with regard to
> >>>>>>>> the usage of the high bank of XMM registers (XMM16-XMM31) by C2:
> >>>>>>>>
> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
> >>>>>>>>
> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
> >>>>>>>>
> >>>>>>>> Please find below a patch which attempts to clean up the XMM
> >>>>>>>> register handling by using register groups.
> >>>>>>>>
> >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
> >>>>>>>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.00/>
> >>>>>>>>
> >>>>>>>> The patch provides a restricted set of registers to the match
> >>>>>>>> rules in the ad file based on the underlying architecture.
> >>>>>>>>
> >>>>>>>> The aim is to remove special handling/workarounds from the macro assembler and assembler.
> >>>>>>>>
> >>>>>>>> By removing the special handling, the patch reduces the overall
> >>>>>>>> code size by about 1800 lines of code.
> >>>>>>>>
> >>>>>>>> Your review and feedback is very welcome.
> >>>>>>>>
> >>>>>>>> Best Regards,
> >>>>>>>>
> >>>>>>>> Sandhya
> >>>>>>>>
>
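Two feature-check points in the review quoted above lend themselves to a small model: folding the four VM_Version checks of vectors_reg_bwdqvl into one helper, and the spill condition that was changed from (UseAVX < 2) to (UseAVX < 3). The C++ below is a hypothetical sketch, not HotSpot code: CpuFeatures stands in for VM_Version, supports_avx512vlbwdq() is an illustrative helper name, and the register-bank sizes (32 with EVEX, 16 without) follow the XMM0-XMM31 versus XMM0-XMM15 split discussed in the thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the VM_Version feature queries used in the .ad files.
struct CpuFeatures {
  bool evex;
  bool avx512bw;
  bool avx512dq;
  bool avx512vl;
};

// One helper for the four checks in the vectors_reg_bwdqvl predicate
// (the name is illustrative, not taken from the webrev).
bool supports_avx512vlbwdq(const CpuFeatures& f) {
  return f.evex && f.avx512bw && f.avx512dq && f.avx512vl;
}

// A dynamic register class picks the EVEX bank (xmm0-xmm31) only when the
// predicate holds, and otherwise falls back to the legacy bank (xmm0-xmm15).
int vectors_reg_bwdqvl_size(const CpuFeatures& f) {
  return supports_avx512vlbwdq(f) ? 32 : 16;
}

// The vector spill condition before and after the fix agreed in review.
bool spill_check_old(int use_avx, const CpuFeatures& f) {
  return (use_avx < 2) || f.avx512vl;
}
bool spill_check_fixed(int use_avx, const CpuFeatures& f) {
  return (use_avx < 3) || f.avx512vl;
}
```

With UseAVX=2 on a machine without AVX512VL (e.g. Haswell), the old condition is false while the fixed one is true, which is exactly the case Vladimir's question targets.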


-- 

Thanks,
Jc

From Pengfei.Li at arm.com  Tue Sep 18 07:13:00 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Tue, 18 Sep 2018 07:13:00 +0000
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
 <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
Message-ID: <DB7PR08MB31154E9C6A3B07BF27F4D6B2961D0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Reviewers,

Are there any other comments, objections or suggestions on the new webrev?
If no problems, could anyone help to push this commit?

I look forward to your response.

--
Thanks,
Pengfei

> -----Original Message-----
> 
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote:
> > Hi,
> >
> > I've updated the patch based on Vladimir's comment. I added checks for SubI on both branches of the diamond phi.
> > Also, thanks to Roland for the suggestion of supporting a Phi with 3 or more inputs. But I think the matching rule
> > would be much more complex if we added this, and I'm not sure there are any real-world scenarios that would benefit
> > from this support, so I didn't add it.
> >
> > New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/
> > I've run jtreg full test with the new patch and no new issues found.
> >
> > Please let me know if you have other comments or suggestions. If there are no
> > further issues, I need your help to sponsor and push the patch.
> >
> > --
> > Thanks,
> > Pengfei
> >
> >
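The 8210152 discussion turns on one fact: the conditional negation that C2 emits for a possibly negative dividend in "x % 2^k" never changes whether the result is zero, so "x % 2^k == 0" only needs the mask test. The C++ below is a sketch of that argument, not the C2 transformation itself; rem_pow2 models the diamond described earlier in the thread (mask the magnitude, negate on the negative branch), and both function names are hypothetical. Unsigned arithmetic keeps it well defined for x == INT32_MIN (the 0x80000000 case from the original test).

```cpp
#include <cassert>
#include <cstdint>

// Truncating remainder by 2^k (Java and C++ semantics), written as a diamond:
// mask the magnitude, then negate on the negative branch. Valid for 1 <= k <= 30.
int32_t rem_pow2(int32_t x, int k) {
  const uint32_t mask = (uint32_t{1} << k) - 1;
  if (x < 0) {
    const uint32_t magnitude = 0u - static_cast<uint32_t>(x);  // |x| without overflow
    return -static_cast<int32_t>(magnitude & mask);             // conditional negation
  }
  return static_cast<int32_t>(static_cast<uint32_t>(x) & mask);
}

// Negation preserves zero, and -x is a multiple of 2^k exactly when x is,
// so the divisibility test needs neither the branch nor the negation.
bool divisible_pow2(int32_t x, int k) {
  const uint32_t mask = (uint32_t{1} << k) - 1;
  return (static_cast<uint32_t>(x) & mask) == 0;
}
```

The same reasoning holds whatever condition guards the negation, which is why the matcher does not need to restrict the If to "< 0" or ">= 0".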

From Pengfei.Li at arm.com  Tue Sep 18 07:40:53 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Tue, 18 Sep 2018 07:40:53 +0000
Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by
 constant in C1
In-Reply-To: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com>
References: <DB7PR08MB3115445A18A786BAFD1F7B08961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <9645c210-3d87-52fa-8051-54dc60629866@redhat.com>
Message-ID: <DB7PR08MB31150364BEBF370C3B4F8B33961D0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew,

Thanks for your reminder. I'm adding this for longs and testing it.
I will send out a new webrev later.

--
Thanks,
Pengfei

> -----Original Message-----
> 
> Hi,
> 
> On 09/13/2018 10:04 AM, Pengfei Li (Arm Technology China) wrote:
> 
> > Could you please help review this optimization in C1 AArch64?
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8210413
> > webrev: http://cr.openjdk.java.net/~njian/8210413/webrev.00/
> 
> It looks fine, but it's really odd that this is only implemented for ints and not
> longs. Can you do longs too?
> 
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
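The 8210413 webrev is not reproduced in this thread, so the following is only a sketch of a common shift-based expansion for dividing by a constant that is a power of two; whether the webrev is limited to power-of-two constants, and which AArch64 instructions it emits, are assumptions here. Writing it as a C++ template shows why extending it from ints to longs, as Andrew asks, is mostly a matter of bit width. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Signed division by 2^k that rounds toward zero, as Java's idiv/ldiv do.
// An arithmetic shift alone rounds toward negative infinity, so negative
// dividends first get a bias of 2^k - 1. Valid for 1 <= k < bit width.
// Assumes arithmetic right shift of negative values (guaranteed since C++20).
template <typename T>
T div_pow2(T x, int k) {
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "signed integers only");
  using U = typename std::make_unsigned<T>::type;
  constexpr int kBits = static_cast<int>(sizeof(T)) * 8;
  const U sign = static_cast<U>(x >> (kBits - 1));  // all ones if x < 0, else 0
  const U bias = sign >> (kBits - k);               // 2^k - 1 if x < 0, else 0
  return static_cast<T>(static_cast<U>(x) + bias) >> k;
}

// Remainder from the quotient: x - (q << k), computed in unsigned arithmetic
// so that shifting a negative quotient is well defined.
template <typename T>
T rem_pow2_shift(T x, int k) {
  using U = typename std::make_unsigned<T>::type;
  const T q = div_pow2(x, k);
  return static_cast<T>(static_cast<U>(x) - (static_cast<U>(q) << k));
}
```

The only width-dependent value is the shift that extracts the sign (31 for int, 63 for long).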

From rkennke at redhat.com  Tue Sep 18 07:58:42 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 18 Sep 2018 09:58:42 +0200
Subject: RFR: JDK-8210829: Modularize allocations in C2
Message-ID: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>

Similar to what we've done before to runtime, interpreter and C1,
allocations should be owned and implemented by GC, and possible to
override by specific collectors. For example, Shenandoah lays out
objects differently in heap, and needs one extra store to initialize
objects.

This proposed change factors out the interesting part of object
allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a
move-and-rename-job. I had to move some little things around, that is:
- for the need-gc-check, I'm passing back the needgc_ctrl to plug into
slow-path
- for prefetching, instead of passing around the 'length' node, only to
determine the number of prefetch lines, I determine this early, and pass
through the lines arg.
- i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out
or out-args to stitch together into the regions and phis as appropriate.
I see no easy way around that.
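The shape of the hook described above (a GC-owned virtual allocation method with i_o as an in/out argument, needgc_ctrl, fast_oop_ctrl and fast_oop_rawmem as out-arguments, and the prefetch line count computed up front) can be sketched in plain C++. Everything here is hypothetical and heavily simplified: the IR is a list of strings, node identity is a list index, and ExtraStoreBarrierSetC2Sketch only mimics a collector that needs one extra initializing store. It is not the BarrierSetC2 signature from the webrev.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified IR: each emitted node is a string in a
// list and is identified by its index. Real C2 nodes are C++ objects.
using NodeId = int;

struct GraphSketch {
  std::vector<std::string> ops;
  NodeId emit(const std::string& op) {
    ops.push_back(op);
    return static_cast<NodeId>(ops.size()) - 1;
  }
};

// Hypothetical GC-owned allocation hook. i_o is in/out; needgc_ctrl,
// fast_oop_ctrl and fast_oop_rawmem are out-args that the caller stitches
// into its regions and phis. prefetch_lines is computed by the caller.
class BarrierSetC2Sketch {
 public:
  virtual ~BarrierSetC2Sketch() = default;

  virtual NodeId obj_allocate(GraphSketch& g, NodeId& i_o, NodeId& needgc_ctrl,
                              NodeId& fast_oop_ctrl, NodeId& fast_oop_rawmem,
                              int prefetch_lines) const {
    NodeId old_top = g.emit("LoadP top");
    g.emit("AddP top+size");
    needgc_ctrl = g.emit("If new_top > end: slow path");
    fast_oop_ctrl = g.emit("IfFalse: fast path");
    fast_oop_rawmem = g.emit("StoreP top=new_top");
    for (int i = 0; i < prefetch_lines; ++i) {
      i_o = g.emit("PrefetchAllocation");  // prefetches are threaded through i_o
    }
    return old_top;  // the new object starts at the old top
  }
};

// A collector with a different object layout extends the fast path,
// here with one extra initializing store.
class ExtraStoreBarrierSetC2Sketch : public BarrierSetC2Sketch {
 public:
  NodeId obj_allocate(GraphSketch& g, NodeId& i_o, NodeId& needgc_ctrl,
                      NodeId& fast_oop_ctrl, NodeId& fast_oop_rawmem,
                      int prefetch_lines) const override {
    NodeId obj = BarrierSetC2Sketch::obj_allocate(g, i_o, needgc_ctrl, fast_oop_ctrl,
                                                  fast_oop_rawmem, prefetch_lines);
    fast_oop_rawmem = g.emit("StoreP extra init word");
    return obj;
  }
};
```

Because the out-arguments are plain references, the caller's region and phi wiring stays the same no matter which collector overrides the hook.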

I tested this using hotspot/jtreg:tier1, and also verified that this
fills Shenandoah's needs by running the tier3_gc_shenandoah test suite.

http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/

Can I please get reviews?
Thanks,
Roman


From doug.simon at oracle.com  Tue Sep 18 08:00:09 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Tue, 18 Sep 2018 10:00:09 +0200
Subject: RFR: 8210793: [JVMCI] AllocateCompileIdTest.java failed to find
 DiagnosticCommand.class
Message-ID: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com>

Please review this tiny change to ensure that the DiagnosticCommand test library class is compiled as part of AllocateCompileIdTest.

https://bugs.openjdk.java.net/browse/JDK-8210793
http://cr.openjdk.java.net/~dnsimon/8210793/

-Doug

From tobias.hartmann at oracle.com  Tue Sep 18 12:18:51 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 18 Sep 2018 08:18:51 -0400
Subject: RFR: 8210793: [JVMCI] AllocateCompileIdTest.java failed to find
 DiagnosticCommand.class
In-Reply-To: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com>
References: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com>
Message-ID: <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com>

Hi Doug,

looks good to me.

Best regards,
Tobias

On 18.09.2018 04:00, Doug Simon wrote:
> Please review this tiny change to ensure that the DiagnosticCommand test library class is compiled as part of AllocateCompileIdTest.
> 
> https://bugs.openjdk.java.net/browse/JDK-8210793
> http://cr.openjdk.java.net/~dnsimon/8210793/
> 
> -Doug
> 

From doug.simon at oracle.com  Tue Sep 18 13:08:03 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Tue, 18 Sep 2018 15:08:03 +0200
Subject: RFR: 8210793: [JVMCI] AllocateCompileIdTest.java failed to find
 DiagnosticCommand.class
In-Reply-To: <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com>
References: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com>
 <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com>
Message-ID: <746A2C0D-6AD9-4904-BD99-A1BFC57FBC85@oracle.com>

Thanks Tobias.

-Doug

> On 18 Sep 2018, at 14:18, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Doug,
> 
> looks good to me.
> 
> Best regards,
> Tobias
> 
> On 18.09.2018 04:00, Doug Simon wrote:
>> Please review this tiny change to ensure that the DiagnosticCommand test library class is compiled as part of AllocateCompileIdTest.
>> 
>> https://bugs.openjdk.java.net/browse/JDK-8210793
>> http://cr.openjdk.java.net/~dnsimon/8210793/
>> 
>> -Doug
>> 


From kuaiwei.kw at alibaba-inc.com  Tue Sep 18 13:33:50 2018
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Tue, 18 Sep 2018 21:33:50 +0800
Subject: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
Message-ID: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com>


Hi,

  I made a patch for https://bugs.openjdk.java.net/browse/JDK-8210853. Could you help review my change?

Background:
  C2 can remove the G1 post barrier for a store to a newly allocated object. However, the just_allocated_object check is defeated by a Region node that is created when the initializer of the superclass is inlined. The change checks for this pattern and skips the Region node.

src/hotspot/share/opto/graphKit.cpp

 // We use this to determine if an object is so "fresh" that
 // it does not require card marks.
 Node* GraphKit::just_allocated_object(Node* current_control) {
-  if (C->recent_alloc_ctl() == current_control)
+  Node * ctrl = current_control;
+  // Object::<init> is invoked after allocation, most of invoke nodes
+  // will be reduced, but a region node is kept in parse time, we check
+  // the pattern and skip the region node
+  if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
+    ctrl = ctrl->in(1);
+  }
+  if (C->recent_alloc_ctl() == ctrl)
     return C->recent_alloc_obj();
   return NULL;
 }
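The check added to GraphKit::just_allocated_object() above can be exercised outside HotSpot with a miniature control node. This is a hypothetical model, not C2's Node class: it mirrors C2's convention that a RegionNode keeps itself in in(0), so a region with exactly one real predecessor has req() == 2 and that predecessor is in(1). Only one such region is looked through, as in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of a C2 control node. As in C2, a region keeps
// itself in in(0), so a region with one real predecessor has req() == 2.
struct CtrlNode {
  bool region = false;
  std::vector<CtrlNode*> inputs;
  bool is_Region() const { return region; }
  std::size_t req() const { return inputs.size(); }
  CtrlNode* in(std::size_t i) const { return inputs[i]; }
};

// Same shape as the proposed check: look through a single-predecessor region
// before comparing with the control recorded at the most recent allocation.
bool is_just_allocated(const CtrlNode* recent_alloc_ctl, CtrlNode* current_control) {
  CtrlNode* ctrl = current_control;
  if (ctrl != nullptr && ctrl->is_Region() && ctrl->req() == 2) {
    ctrl = ctrl->in(1);
  }
  return recent_alloc_ctl == ctrl;
}
```

A region that merges two or more paths is still rejected, since the store could then be reached from control that did not come from the allocation.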

From vladimir.kozlov at oracle.com  Tue Sep 18 16:40:32 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 18 Sep 2018 09:40:32 -0700
Subject: RFR: 8210793: [JVMCI] AllocateCompileIdTest.java failed to find
 DiagnosticCommand.class
In-Reply-To: <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com>
References: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com>
 <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com>
Message-ID: <2226eb54-622f-5cc6-c6e6-4a302a94d3c5@oracle.com>

+1

Thanks,
Vladimir

On 9/18/18 5:18 AM, Tobias Hartmann wrote:
> Hi Doug,
> 
> looks good to me.
> 
> Best regards,
> Tobias
> 
> On 18.09.2018 04:00, Doug Simon wrote:
>> Please review this tiny change to ensure that the DiagnosticCommand test library class is compiled as part of AllocateCompileIdTest.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8210793
>> http://cr.openjdk.java.net/~dnsimon/8210793/
>>
>> -Doug
>>

From vladimir.kozlov at oracle.com  Tue Sep 18 17:41:51 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 18 Sep 2018 10:41:51 -0700
Subject: RFR: JDK-8210829: Modularize allocations in C2
In-Reply-To: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>
References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>
Message-ID: <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com>

Hi Roman,

This looks good. I looked through the changes and it generates the same ideal graph as before.
It seems you unintentionally changed the indentation of the comment in barrierSetC2.hpp.

Thanks,
Vladimir

On 9/18/18 12:58 AM, Roman Kennke wrote:
> Similar to what we've done before to runtime, interpreter and C1,
> allocations should be owned and implemented by GC, and possible to
> override by specific collectors. For example, Shenandoah lays out
> objects differently in heap, and needs one extra store to initialize
> objects.
> 
> This proposed change factors out the interesting part of object
> allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a
> move-and-rename job. I had to move a few small things around:
> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into
> slow-path
> - for prefetching, instead of passing around the 'length' node, only to
> determine the number of prefetch lines, I determine this early, and pass
> through the lines arg.
> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out
> or out-args to stitch together into the regions and phis as appropriate.
> I see no easy way around that.
> 
> I tested this using hotspot/jtreg:tier1 and also verified that this
> fills Shenandoah's needs and ran the tier3_gc_shenandoah test suite.
> 
> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/
> 
> Can I please get reviews?
> Thanks,
> Roman
> 

From rwestrel at redhat.com  Tue Sep 18 19:47:46 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 21:47:46 +0200
Subject: RFR(S): 8210389: C2: assert(n->outcnt() != 0 || C->top() == n ||
 n->is_Proj()) failed: No dead instructions after post-alloc
Message-ID: <dk68t3y8u59.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/8210389/webrev.00/

With volatile loads, the trailing membar has an edge to the load. After
optimizations, that edge can point to a chain of Phis and the membar can
be the only use that keeps the Phis alive. After matching, that required
edge is converted to a precedence edge. Liveness analysis ignores
precedence edges, so the chain of Phis is killed and register allocation
finds a node with no use.

As a fix, I propose that, at the end of optimizations, the edge between
the volatile load's membar and the Phis be removed and all dead Phis be
killed. As I understand it, that edge is not required for correctness
because the anti-dependence detection code adds a precedence edge between
a volatile load and its membar if needed. I ran the full jcstress suite on
x86 and aarch64 with this patch successfully.
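The failure mode can be reproduced in miniature with the hypothetical C++ model below (GraphNode, mark_live and dangling_prec_targets are illustrative names, not HotSpot APIs). Liveness follows required inputs only, so a Phi reachable solely through the membar's precedence edge is treated as dead while the membar still points at it; clearing that edge, as proposed, removes the inconsistency. This is a sketch under those simplifying assumptions, not a description of how C2's liveness or register allocator are structured.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy graph node, not HotSpot's Node class.
struct GraphNode {
  std::string name;
  std::vector<GraphNode*> req;   // required inputs: these count as uses
  std::vector<GraphNode*> prec;  // precedence edges: ordering only
};

// Liveness that follows required inputs only and ignores precedence
// edges, like the pass described above.
void mark_live(GraphNode* n, std::set<GraphNode*>& live) {
  if (n == nullptr || !live.insert(n).second) return;
  for (GraphNode* in : n->req) mark_live(in, live);
}

// Names of nodes that a live node still references through a precedence
// edge even though liveness considered them dead.
std::vector<std::string> dangling_prec_targets(GraphNode* root) {
  std::set<GraphNode*> live;
  mark_live(root, live);
  std::vector<std::string> dangling;
  for (GraphNode* n : live) {
    for (GraphNode* p : n->prec) {
      if (live.count(p) == 0) dangling.push_back(p->name);
    }
  }
  return dangling;
}
```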

Roland.

From rwestrel at redhat.com  Tue Sep 18 19:57:53 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 21:57:53 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
Message-ID: <dk65zz28toe.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/8210885/webrev.00/

This converts some remaining loads and stores to the access API (in
preparation for Shenandoah). It also cleans up the C2 access API: some
entry points take a control argument that is useless in practice because
the current control() is always used.

Roland.

From rwestrel at redhat.com  Tue Sep 18 20:09:46 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 22:09:46 +0200
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
Message-ID: <dk636u68t4l.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/8210887/webrev.00/

This extends the C2 access API entry point for arraycopy (in
preparation for Shenandoah). It also fixes some incorrect _adr_types
and modifies the ArrayCopyNode::Ideal() transform that produces a series
of loads/stores so that, in a subsequent change, barriers can be added to
those loads and stores: they need to produce and consume memory state
other than the slice of the array being copied.

Roland.

From rwestrel at redhat.com  Tue Sep 18 20:17:05 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 22:17:05 +0200
Subject: RFR(S): 8210390: C2 still crashes with "assert(mode ==
 ControlAroundStripMined && use == sfpt) failed: missed a node"
In-Reply-To: <22635c3c-4c0e-ccc1-9853-46ffa56dd96c@oracle.com>
References: <dk6bm90r01v.fsf@rwestrel.remote.csb>
 <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com>
 <22635c3c-4c0e-ccc1-9853-46ffa56dd96c@oracle.com>
Message-ID: <dk6zhwe7e7y.fsf@rwestrel.remote.csb>


Thanks for the review, Vladimir.

Roland.

From sandhya.viswanathan at intel.com  Tue Sep 18 20:47:18 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 18 Sep 2018 20:47:18 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4A75@FMSMSX126.amr.corp.intel.com>

Hi Jc,

Thanks a lot for the steps. I am now able to verify my fix for NativeCallTest.java.

Best Regards,
Sandhya

From: JC Beyler [mailto:jcbeyler at google.com]
Sent: Monday, September 17, 2018 9:29 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Cc: vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Hi Sandhya,

How are you invoking the test for NativeCallTest?

The way I would do it using jtreg would be something like this:

$ export BUILD_TYPE=release
$ export JDK_PATH=wherever you have your JDK

From the test subfolder:
$ wherever-your-jtreg-is/bin/jtreg -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java

Seems to pass for me.

But much easier is:
$ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"

That seems to pass for me as well and is easier to use :)

For information, the make run-test documentation is here:
http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html

Let me know if that helps,
Jc

Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
                    "test result: Error. Use -nativepath to specify the location of native code"
            Do I need to give any additional info to jtreg to get over this problem?

Thanks a lot!
Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Monday, September 17, 2018 10:14 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

I finished testing on avx512 machine.
All passed except known (TestNaNVector.java) failures.

Thanks,
Vladimir

On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
> I gave an incorrect link to the RFE. Here is the correct one:
>
> https://bugs.openjdk.java.net/browse/JDK-8210764
>
> Vladimir
>
> On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>> I got the build failure on macOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.
>>
>> I tested with these changes and got the following failures on AVX1 machines. I am planning to run on AVX-512 machines too.
>>
>> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU
>> with AVX1 only
>>
>> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>> # Problematic frame:
>> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>
>> Current CompileTask:
>> C2:    154    5             java.lang.String::equals (65 bytes)
>>
>> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>
>> ------------------------------------------------------------------------------------------------
>> 2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'
>> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>
>> Current CompileTask:
>> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>>
>> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>
>> Vladimir
>>
>> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>>
>>> Thanks Vladimir, the below should fix this issue:
>>>
>>> ------------------------------
>>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>> @@ -233,22 +233,6 @@
>>>     _xmm_regs[13]  = xmm13;
>>>     _xmm_regs[14]  = xmm14;
>>>     _xmm_regs[15]  = xmm15;
>>> -  _xmm_regs[16]  = xmm16;
>>> -  _xmm_regs[17]  = xmm17;
>>> -  _xmm_regs[18]  = xmm18;
>>> -  _xmm_regs[19]  = xmm19;
>>> -  _xmm_regs[20]  = xmm20;
>>> -  _xmm_regs[21]  = xmm21;
>>> -  _xmm_regs[22]  = xmm22;
>>> -  _xmm_regs[23]  = xmm23;
>>> -  _xmm_regs[24]  = xmm24;
>>> -  _xmm_regs[25]  = xmm25;
>>> -  _xmm_regs[26]  = xmm26;
>>> -  _xmm_regs[27]  = xmm27;
>>> -  _xmm_regs[28]  = xmm28;
>>> -  _xmm_regs[29]  = xmm29;
>>> -  _xmm_regs[30]  = xmm30;
>>> -  _xmm_regs[31]  = xmm31;
>>>   #endif // _LP64
>>>
>>>     for (int i = 0; i < 8; i++) {
>>> ---------------------------------
>>>
>>> I think the gcc version on my desktop is older, so it didn't catch this.
>>>
>>> The updated patch, including the above change and setting UseAVX to 3, is uploaded to:
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before
>>> changing it back to 3.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, September 14, 2018 12:13 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> I got build failure:
>>>
>>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array
>>> (which contains 16 elements) [-Werror,-Warray-bounds]
>>> jib >   _xmm_regs[16]  = xmm16;
>>>
>>> I also noticed that we don't have an RFE for this work. I filed:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> You did not enable AVX-512 by default (the 8209735 change was synced from JDK 11 into JDK 12 two weeks ago). I added the following
>>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>>
>>> - product(intx, UseAVX, 2, \
>>> + product(intx, UseAVX, 3, \
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>>> Looks good to me. I will start testing and let you know results.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Please find below the updated webrev with all your comments incorporated:
>>>>>
>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>>>
>>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>>>>> different flavors of AVX-512, and on Haswell, a non-AVX-512 system. I also tested SPECjvm2008 on the three platforms.
>>>>>
>>>>> Best Regards,
>>>>> Sandhya
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>>>
>>>>> Thank you, Sandhya
>>>>>
>>>>> I am satisfied with your detailed answer regarding the memory load issues. Okay, let's not add them.
>>>>>
>>>>> Vladimir
>>>>>
>>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>>>>> Please see my responses inline below, marked with (Sandhya >>>). Looking forward to your advice.
>>>>>>
>>>>>> Best Regards,
>>>>>> Sandhya
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> I want to discuss the following issue:
>>>>>>
>>>>>>     > You did not add instructions to load these registers from
>>>>>> memory (and stack). What happens in such cases when you need to load or store?
>>>>>>     >>>> Let us take an example, e.g. loading into rregF. First
>>>>>> it gets loaded from memory into regF, and then a register-to-register move copies it to rregF, and vice versa.
>>>>>>
>>>>>> This is what I thought. You increase register pressure this way, which may cause spills to the stack.
>>>>>> Also, since we don't check whether the register could be the same as the result, you may get unneeded moves.
>>>>>>
>>>>>> I would advise adding memory moves at least.
>>>>>>
>>>>>> Sandhya >>>  I had added those rules initially and removed them in
>>>>>> the final patch. I noticed that the register allocator uses the
>>>>>> memory rules (e.g. LoadF) to initialize the idealreg2regmask
>>>>>> (matcher.cpp). I would like the register allocator to get all the
>>>>>> possible registers on an architecture for idealreg2regmask. I
>>>>>> was concerned that multiple instruct rules in the .ad file for LoadF from
>>>>>> memory might cause problems: I would have to give a higher cost to
>>>>>> loading into a restricted register set like vlReg. Then I decided that
>>>>>> the register allocator can handle this much better than I could by
>>>>>> adding rules to load from memory. This is with the background that regF always covers all the available registers
>>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and whether
>>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I
>>>>>> am referring to is:
>>>>>>      MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>> #endif
>>>>>>      MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>>>>      MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>>>>>      MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>>>>      MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>>>>      MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>>      ....
>>>>>>      idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>>>>>
>>>>>> Another question. You use movflt() and movdbl(), which use either
>>>>>> movap[s|d] or movs[s|d] instructions:
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>>>>>> Do these instructions work when avx512vl is not available? I see that for vectors you use a
>>>>>> vpxor+vinserti* combination.
>>>>>>
>>>>>> Sandhya >>> Yes, the scalar floating-point instructions are available
>>>>>> with AVX-512 encoding even when avx512vl is not available. That is why you
>>>>>> would see not just movflt and movdbl but all the other scalar
>>>>>> operations, like addss, addsd, etc., using the entire XMM range (xmm0-31). In other words, they are AVX512F instructions.
>>>>>>
>>>>>> Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
>>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>>>>
>>>>>> Should it be (UseAVX < 3)?
>>>>>>
>>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
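A quick way to see why the question matters is to enumerate both guards. In this hypothetical C++ sketch, the int and bool parameters stand in for the UseAVX flag and VM_Version::supports_avx512vl(); it makes no claim about what each branch of the x86.ad spill code emits. The two predicates disagree only for UseAVX == 2 without avx512vl, where the current guard evaluates to false and the suggested one to true.

```cpp
#include <cassert>

// Hypothetical helpers: use_avx models the UseAVX flag and avx512vl
// models VM_Version::supports_avx512vl().
bool spill_guard_current(int use_avx, bool avx512vl) {
  return (use_avx < 2) || avx512vl;
}

bool spill_guard_suggested(int use_avx, bool avx512vl) {
  return (use_avx < 3) || avx512vl;
}
```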
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> Thanks a lot for your review and feedback. Please see my response
>>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Sandhya
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>>>>>
>>>>>>> Very nice. Thank you, Sandhya.
>>>>>>>
>>>>>>> I would like to see more meaningful naming in .ad files - instead
>>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
>>>>>>>
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> The new load_from_* and load_to_* instructions in .ad files should be
>>>>>>> renamed as follows and placed together with the other Move*_reg_reg* instructions:
>>>>>>>
>>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> You did not add instructions to load these registers from memory
>>>>>>> (and stack). What happens in such cases when you need to load or store?
>>>>>>>>>> Let us take an example, e.g. loading into rregF. First it
>>>>>>>>>> gets loaded from memory into regF, and then a register-to-register move copies it to rregF, and vice versa.
>>>>>>>
>>>>>>> Also please explain why these registers are used when UseAVX == 0?:
>>>>>>>
>>>>>>> +instruct absD_reg(rregD dst) %{
>>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>>>>>>>
>>>>>>> We switch off EVEX, so regular regD (only legacy registers in this case) should work too:
>>>>>>>       661   if (UseAVX < 3) {
>>>>>>>       662     _features &= ~CPU_AVX512F;
>>>>>>>
>>>>>>>>>> Yes, accepted. It could be regD here.
>>>>>>>
>>>>>>> The following checks could be combined into a new function in vm_version_x86.hpp (you already have some):
>>>>>>>
>>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>>>>>>
>>>>>>>>>> Yes, accepted.
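The reg_class_dynamic idea, together with the suggestion to fold the feature checks into one helper, can be modelled as a mask selection. In this hypothetical C++ sketch, bit i of the mask means xmm<i> is allocatable, CpuFeatures stands in for the VM_Version::supports_*() queries, and supports_avx512vlbwdq is an illustrative helper name; the real selection happens in code generated from the .ad file, not in a function like this.

```cpp
#include <cassert>
#include <cstdint>

// Bit i set means xmm<i> may be allocated.
constexpr std::uint32_t kEvexXmmMask   = 0xFFFFFFFFu;  // xmm0-xmm31
constexpr std::uint32_t kLegacyXmmMask = 0x0000FFFFu;  // xmm0-xmm15

// Hypothetical stand-in for the VM_Version::supports_*() queries.
struct CpuFeatures {
  bool evex, avx512bw, avx512dq, avx512vl;
};

// One combined predicate instead of repeating four checks in every
// dynamic register class.
bool supports_avx512vlbwdq(const CpuFeatures& f) {
  return f.evex && f.avx512bw && f.avx512dq && f.avx512vl;
}

// Models vectors_reg_bwdqvl: the EVEX set if the predicate holds,
// otherwise the legacy set.
std::uint32_t vectors_reg_bwdqvl_mask(const CpuFeatures& f) {
  return supports_avx512vlbwdq(f) ? kEvexXmmMask : kLegacyXmmMask;
}
```

On a part with AVX512F but without BW/DQ/VL (KNL, one of the machines mentioned in this thread, is such a part), the mask falls back to xmm0-xmm15 even though EVEX encoding is available.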
>>>>>>>
>>>>>>> I would suggest testing these changes on different machines
>>>>>>> (non-AVX-512 and AVX-512) and with different UseAVX values.
>>>>>>>
>>>>>>>>>> Will do.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>>>>>> Recently there have been a couple of high-priority issues with
>>>>>>>> regard to C2's usage of the upper bank of XMM registers
>>>>>>>> (XMM16-XMM31):
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>>>>>
>>>>>>>> Please find below a patch which attempts to clean up the XMM
>>>>>>>> register handling by using register groups.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>>>>>>
>>>>>>>> The patch provides a restricted set of registers to the match
>>>>>>>> rules in the ad file based on the underlying architecture.
>>>>>>>>
>>>>>>>> The aim is to remove the special handling/workarounds from the macro assembler and assembler.
>>>>>>>>
>>>>>>>> By removing the special handling, the patch reduces the overall
>>>>>>>> code size by about 1800 lines of code.
>>>>>>>>
>>>>>>>> Your review and feedback is very welcome.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Sandhya
>>>>>>>>


--

Thanks,
Jc

From rwestrel at redhat.com  Tue Sep 18 21:32:58 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 23:32:58 +0200
Subject: RFR: JDK-8210752: Remaining explicit barriers for C2
In-Reply-To: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com>
References: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com>
Message-ID: <dk6wori7aph.fsf@rwestrel.remote.csb>


> http://cr.openjdk.java.net/~rkennke/JDK-8210752/webrev.00/

That looks good to me.

Roland.

From rwestrel at redhat.com  Tue Sep 18 21:39:00 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 23:39:00 +0200
Subject: RFR: JDK-8210829: Modularize allocations in C2
In-Reply-To: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>
References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>
Message-ID: <dk6tvmm7aff.fsf@rwestrel.remote.csb>


> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/

That looks good to me.

Roland.

From sandhya.viswanathan at intel.com  Tue Sep 18 23:52:43 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 18 Sep 2018 23:52:43 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com> 
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Please find below the updated webrev with fixes for the two issues:
Patch:  http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
RFE:    https://bugs.openjdk.java.net/browse/JDK-8210764

Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS as the temporary register type for intrinsics instead of legVecD.
This test was only failing with -XX:MaxVectorSize=4.
The file modified is x86_64.ad.

Fix for compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java was to allow all XMM registers (xmm0-xmm31) for C1 and to handle floating-point abs and negate appropriately by providing a temp register.
The C1 files are modified for this fix.

I reran the compiler jtreg tests, with nativepath set appropriately, on Haswell, SKX and KNL.

Best Regards,
Sandhya


From: Viswanathan, Sandhya
Sent: Tuesday, September 18, 2018 1:47 PM
To: 'JC Beyler' <jcbeyler at google.com>
Cc: vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
Subject: RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Hi Jc,

Thanks a lot for the steps. I am now able to verify my fix for NativeCallTest.java.

Best Regards,
Sandhya

From: JC Beyler [mailto:jcbeyler at google.com]
Sent: Monday, September 17, 2018 9:29 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Cc: vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Hi Sandhya,

How are you invoking the test for NativeCallTest?

The way I would do it using jtreg would be something like this:

$ export BUILD_TYPE=release
$ export JDK_PATH=wherever you have your JDK

From the test subfolder:
$ wherever-your-jtreg-is/bin/jtreg -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java

Seems to pass for me.

But much easier is:
$ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"

That seems to pass for me as well and is easier to use :)

For information, the make run-test documentation is here:
http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html

Let me know if that helps,
Jc

Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
                    "test result: Error. Use -nativepath to specify the location of native code"
            Do I need to give any additional info to jtreg to get over this problem?

Thanks a lot!
Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Monday, September 17, 2018 10:14 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

I finished testing on avx512 machine.
All passed except known (TestNaNVector.java) failures.

Thanks,
Vladimir

On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
> I gave an incorrect link to the RFE. Here is the correct one:
>
> https://bugs.openjdk.java.net/browse/JDK-8210764
>
> Vladimir
>
> On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>> I got the build failure on macOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.
>>
>> I tested with these changes and got the following failures on AVX1 machines. I am planning to run on AVX-512 machines too.
>>
>> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU
>> with AVX1 only
>>
>> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>> # Problematic frame:
>> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>
>> Current CompileTask:
>> C2:    154    5             java.lang.String::equals (65 bytes)
>>
>> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>
>> ------------------------------------------------------------------------------------------------
>> 2. With '-Xcomp':
>> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>
>> Current CompileTask:
>> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>>
>> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>
>> Vladimir
>>
>> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>>
>>> Thanks Vladimir, the below should fix this issue:
>>>
>>> ------------------------------
>>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>> @@ -233,22 +233,6 @@
>>>     _xmm_regs[13]  = xmm13;
>>>     _xmm_regs[14]  = xmm14;
>>>     _xmm_regs[15]  = xmm15;
>>> -  _xmm_regs[16]  = xmm16;
>>> -  _xmm_regs[17]  = xmm17;
>>> -  _xmm_regs[18]  = xmm18;
>>> -  _xmm_regs[19]  = xmm19;
>>> -  _xmm_regs[20]  = xmm20;
>>> -  _xmm_regs[21]  = xmm21;
>>> -  _xmm_regs[22]  = xmm22;
>>> -  _xmm_regs[23]  = xmm23;
>>> -  _xmm_regs[24]  = xmm24;
>>> -  _xmm_regs[25]  = xmm25;
>>> -  _xmm_regs[26]  = xmm26;
>>> -  _xmm_regs[27]  = xmm27;
>>> -  _xmm_regs[28]  = xmm28;
>>> -  _xmm_regs[29]  = xmm29;
>>> -  _xmm_regs[30]  = xmm30;
>>> -  _xmm_regs[31]  = xmm31;
>>>   #endif // _LP64
>>>
>>>     for (int i = 0; i < 8; i++) {
>>> ---------------------------------
>>>
>>> I think the gcc version on my desktop is older, so it didn't catch this.
>>>
>>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before
>>> changing it back to 3.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, September 14, 2018 12:13 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> I got build failure:
>>>
>>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
>>> jib >   _xmm_regs[16]  = xmm16;
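The diagnostic is the classic fixed-size-table mismatch: the C1 table had 16 slots and the patch wrote slots 16 through 31. A minimal C++ sketch, assuming a table shaped like FrameMap's `_xmm_regs` (the names `kNofXmmRegs`, `XmmReg`, and `init_xmm_regs` are hypothetical, not HotSpot code), of an initializer whose bound comes from the array itself, so it cannot run past the end regardless of compiler version or warning level:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical slot count for a C1-style XMM register table. In the failing
// build the table held 16 entries while the patch assigned entries 16..31.
constexpr std::size_t kNofXmmRegs = 16;

// Hypothetical stand-in for an XMM register handle.
struct XmmReg {
  int encoding;  // hardware register number, e.g. 0 for xmm0
};

using XmmRegTable = std::array<XmmReg, kNofXmmRegs>;

// Fill the table with xmm0..xmm(N-1). The loop bound is the array's own
// size, so the initializer cannot write past the last slot.
XmmRegTable init_xmm_regs() {
  XmmRegTable regs{};
  for (std::size_t i = 0; i < regs.size(); i++) {
    regs[i] = XmmReg{static_cast<int>(i)};
  }
  return regs;
}
```

With `std::array`, an explicit `regs.at(16)` would throw `std::out_of_range` at run time, whereas `regs[16]` stays undefined behavior that only some compilers diagnose.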
>>>
>>> I also noticed that we don't have RFE for this work. I filed:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> You did not enable AVX512 by default (the 8209735 change was synced from JDK 11 into 12 two weeks ago). I added the
>>> following change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>>
>>> - product(intx, UseAVX, 2, \
>>> + product(intx, UseAVX, 3, \
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>>> Looks good to me. I will start testing and let you know results.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Please find below the updated webrev with all your comments incorporated:
>>>>>
>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>>>
>>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>>>>> different flavors of AVX512, and on Haswell, a non-AVX512 system. I also tested SPECjvm2008 on the three platforms.
>>>>>
>>>>> Best Regards,
>>>>> Sandhya
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>>>
>>>>> Thank you, Sandhya
>>>>>
>>>>> I am satisfied with your detailed answer on the memory load issues. Okay, let's not add them.
>>>>>
>>>>> Vladimir
>>>>>
>>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>>>>> Please see my responses in your email below, marked with "Sandhya >>>". Looking forward to your advice.
>>>>>>
>>>>>> Best Regards,
>>>>>> Sandhya
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>>> instruction
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> I want to discuss the following issue:
>>>>>>
>>>>>>     > You did not add instructions to load these registers from memory (and stack).
>>>>>>     > What happens in such cases when you need to load or store?
>>>>>>     >>>> Let us take an example, e.g. loading into rregF. First it gets loaded from memory
>>>>>>     >>>> into regF, and then a register-to-register move copies it to rregF, and vice versa.
>>>>>>
>>>>>> This is what I thought. You increase register pressure this way, which may cause spills on the stack.
>>>>>> Also, we don't check whether the register could be the same as the result, so you may get unneeded moves.
>>>>>>
>>>>>> I would advise adding memory moves at least.
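The trade-off being discussed can be pictured with register masks. This is a toy C++ sketch, not HotSpot's `RegMask`: it assumes one class covering xmm0-xmm31 (like regF) and a restricted class covering xmm0-xmm15 (like vlRegF), and the helper names are hypothetical. A value produced in the full class may land in the upper bank, and a consumer restricted to the legacy bank then needs a register-to-register copy, which also keeps a second register live.

```cpp
#include <bitset>
#include <cassert>

// Toy model of register classes on a 32-register XMM file.
// Bit i set means xmm<i> is allowed. These masks are illustrative only.
using RegMask = std::bitset<32>;

// Like regF: every XMM register, xmm0..xmm31.
RegMask full_mask() {
  RegMask m;
  m.set();
  return m;
}

// Like the restricted class: legacy bank only, xmm0..xmm15.
RegMask legacy_mask() {
  RegMask m;
  for (int i = 0; i < 16; i++) {
    m.set(i);
  }
  return m;
}

// A value sits in xmm<reg>; the consuming instruction accepts only the
// registers in `allowed`. Returns the number of copies needed first.
int moves_needed(int reg, const RegMask& allowed) {
  return allowed.test(reg) ? 0 : 1;
}
```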
>>>>>>
>>>>>> Sandhya >>> I had added those rules initially and removed them in
>>>>>> the final patch. I noticed that the register allocator uses the
>>>>>> memory rules (e.g. LoadF) to initialize idealreg2regmask
>>>>>> (matcher.cpp). I would like the register allocator to get all the
>>>>>> possible registers on an architecture in idealreg2regmask. I was
>>>>>> concerned that multiple instruct rules in the .ad file for LoadF from
>>>>>> memory might cause problems: I would have to give a higher cost to
>>>>>> loading into a restricted register set like vlReg. So I decided that
>>>>>> the register allocator can handle this much better than I could by
>>>>>> adding rules to load from memory. The background is that regF always
>>>>>> covers all the available registers and vlRegF is the restricted register set; likewise for VecS and legVecS.
>>>>>> Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg
>>>>>> and legVec. The specific code from matcher.cpp that I am referring to is:
>>>>>>      MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>> #endif
>>>>>>      MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>>>>      MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>>>>>      MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>>>>      MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>>>>      MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>>      ....
>>>>>>      idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
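The concern about extra LoadF rules can be modeled in a few lines. This is a toy sketch under stated assumptions, not C2's DFA matcher: if the spill mask is taken from the out mask of whichever LoadF rule wins on cost, as the quoted `idealreg2regmask[Op_RegF] = &spillF->out_RegMask();` line suggests, then a cheaper rule into a restricted class would shrink that mask. The rule names, costs, and helpers below are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <string>
#include <vector>

using RegMask = std::bitset<32>;  // bit i set: xmm<i> allowed

// Hypothetical match rule for a LoadF-like node: name, cost, and the
// registers its result may be placed in.
struct Rule {
  std::string name;
  int cost;
  RegMask out_mask;
};

// Return the cheapest rule; on a tie the earlier rule wins.
// `rules` must not be empty.
const Rule& cheapest(const std::vector<Rule>& rules) {
  const Rule* best = &rules.front();
  for (const Rule& r : rules) {
    if (r.cost < best->cost) {
      best = &r;
    }
  }
  return *best;
}

// Toy spill mask: read off the winning rule's out mask.
RegMask spill_mask(const std::vector<Rule>& rules) {
  return cheapest(rules).out_mask;
}
```

Pricing the restricted rule above the general one keeps the full mask; leaving such rules out, as Sandhya chose, sidesteps the question.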
>>>>>>
>>>>>> Another question. You use movflt() and movdbl(), which use either
>>>>>> movap[s|d] or movs[s|d] instructions:
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>>>>>> Do these instructions work when avx512vl is not available? I see for vectors you use a
>>>>>> vpxor+vinserti* combination.
>>>>>>
>>>>>> Sandhya >>> Yes, the scalar floating-point instructions are available
>>>>>> with AVX512 encoding when avx512vl is not available. That is why you
>>>>>> would see not just movflt and movdbl but all the other scalar
>>>>>> operations like addss, addsd, etc. using the entire xmm range (xmm0-31). In other words, they are AVX512F instructions.
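The encoding rule behind this answer can be written down as a predicate. A simplified C++ sketch, assuming only AVX512F-class floating-point instructions (variants that need AVX512BW or AVX512DQ are ignored) and that the instruction itself is available on the CPU: xmm16-xmm31 are reachable only through EVEX; scalar and 512-bit EVEX forms belong to AVX512F, while 128/256-bit EVEX forms also need AVX512VL. `CpuFeatures`, `OpKind`, and `can_use_register` are hypothetical names.

```cpp
#include <cassert>

// Simplified feature set; only the bits relevant to register reach.
struct CpuFeatures {
  bool avx512f;
  bool avx512vl;
};

enum class OpKind {
  Scalar,     // e.g. addss/addsd, movss/movsd
  Vector128,  // xmm-width packed ops
  Vector256,  // ymm-width packed ops
  Vector512   // zmm-width packed ops
};

// Can an instruction of this kind name register number `reg` (0..31)?
// Assumes the instruction itself exists on the CPU; only reach is modeled.
bool can_use_register(int reg, OpKind kind, const CpuFeatures& f) {
  if (reg < 16) {
    return true;  // legacy/VEX encodings reach registers 0..15
  }
  if (!f.avx512f) {
    return false;  // upper bank needs EVEX, i.e. AVX512F
  }
  switch (kind) {
    case OpKind::Scalar:
    case OpKind::Vector512:
      return true;  // AVX512F forms
    case OpKind::Vector128:
    case OpKind::Vector256:
      return f.avx512vl;  // short-vector EVEX forms need AVX512VL
  }
  return false;
}
```

A KNL-like part (AVX512F without VL) therefore accepts `addss` on xmm20 but not a 128-bit packed op there, which matches the answer above.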
>>>>>>
>>>>>> Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
>>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>>>>
>>>>>> Should it be (UseAVX < 3)?
>>>>>>
>>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
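Why `UseAVX < 3` is the right bound: below 3, AVX512F is masked off, so only xmm0-xmm15 can ever be allocated and the upper-bank handling is unnecessary. A minimal sketch contrasting the two conditions; the function names are hypothetical and the sketch only models which branch is taken, not the spill code itself.

```cpp
#include <cassert>

// Hypothetical model of the branch condition in the vector spill code.
// Returns true when the plain spill path is taken.
//   use_avx      : value of -XX:UseAVX
//   has_avx512vl : whether the CPU reports AVX512VL
bool plain_spill_path(int use_avx, bool has_avx512vl) {
  // Below UseAVX=3 AVX512F is masked off, so xmm16..xmm31 are never
  // allocated and no upper-bank handling is needed.
  return use_avx < 3 || has_avx512vl;
}

// The condition as originally written: UseAVX=2 falls through to the
// path intended for AVX512 hardware without AVX512VL.
bool plain_spill_path_original(int use_avx, bool has_avx512vl) {
  return use_avx < 2 || has_avx512vl;
}
```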
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> Thanks a lot for your review and feedback. Please see my response
>>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Sandhya
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>>>> instruction
>>>>>>>
>>>>>>> Very nice. Thank you, Sandhya.
>>>>>>>
>>>>>>> I would like to see more meaningful naming in .ad files - instead
>>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
>>>>>>>
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> New load_from_* and load_to_* instructions in .ad files should be
>>>>>>> renamed as follows and placed next to the other Move*_reg_reg* instructions:
>>>>>>>
>>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>>>>>>>>>> Yes, accepted.
>>>>>>>
>>>>>>> You did not add instructions to load these registers from memory
>>>>>>> (and stack). What happens in such cases when you need to load or store?
>>>>>>>>>> Let us take an example, e.g. loading into rregF. First it gets loaded from
>>>>>>>>>> memory into regF, and then a register-to-register move copies it to rregF, and vice versa.
>>>>>>>
>>>>>>> Also please explain why these registers are used when UseAVX == 0?:
>>>>>>>
>>>>>>> +instruct absD_reg(rregD dst) %{
>>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>>>>>>>
>>>>>>> We switch off EVEX, so regular regD (only legacy registers in this case) should work too:
>>>>>>>       661   if (UseAVX < 3) {
>>>>>>>       662     _features &= ~CPU_AVX512F;
>>>>>>>
>>>>>>>>>> Yes, accepted. It could be regD here.
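The effect of the quoted feature masking can be sketched as follows. The bit values are made up, only the AVX512F bit from the quoted lines is modeled, and `effective_features` and `allocatable_xmm_regs` are hypothetical helpers. The point is that once AVX512F is cleared, nothing above xmm15 is available, so plain regD already contains only legacy registers.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative feature bits; the values are invented for this sketch.
constexpr uint64_t CPU_AVX512F  = uint64_t{1} << 0;
constexpr uint64_t CPU_AVX512VL = uint64_t{1} << 1;

// Mirrors the quoted snippet: with UseAVX < 3 the AVX512F bit is cleared
// from the detected features, disabling EVEX-dependent code.
uint64_t effective_features(uint64_t detected, int use_avx) {
  uint64_t features = detected;
  if (use_avx < 3) {
    features &= ~CPU_AVX512F;
  }
  return features;
}

// How many XMM registers a register class could hand out: the upper
// bank (xmm16..xmm31) exists only with AVX512F.
int allocatable_xmm_regs(uint64_t features) {
  return (features & CPU_AVX512F) ? 32 : 16;
}
```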
>>>>>>>
>>>>>>> The following checks could be combined into a new function in vm_version_x86.hpp (you already have some):
>>>>>>>
>>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{ VM_Version::supports_evex() &&
>>>>>>> +  VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>>>>>>
>>>>>>>>>> Yes, accepted.
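A minimal sketch of what folding the four checks into one query might look like, and what the dynamic register class then picks. `VMVersion`, its fields, `supports_avx512vlbwdq`, and `vectors_reg_bwdqvl_size` are hypothetical stand-ins; in the .ad file the selection itself is expressed by `reg_class_dynamic`.

```cpp
#include <cassert>

// Hypothetical snapshot of the CPU features queried above.
struct VMVersion {
  bool evex;
  bool avx512bw;
  bool avx512dq;
  bool avx512vl;

  // Combined query replacing the four separate checks.
  bool supports_avx512vlbwdq() const {
    return evex && avx512vl && avx512bw && avx512dq;
  }
};

// Models the dynamic class: EVEX set (xmm0..xmm31) when the predicate
// holds, otherwise the legacy set (xmm0..xmm15). Returns the set's size.
int vectors_reg_bwdqvl_size(const VMVersion& v) {
  return v.supports_avx512vlbwdq() ? 32 : 16;
}
```

A KNL-like configuration (EVEX without BW/DQ/VL) falls back to the legacy set, while an SKX-like one gets all 32 registers.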
>>>>>>>
>>>>>>> I would suggest testing these changes on different machines
>>>>>>> (non-AVX512 and AVX512) and with different UseAVX values.
>>>>>>>
>>>>>>>>>> Will do.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>>>>>> Recently there have been a couple of high-priority issues with
>>>>>>>> regard to C2's use of the upper bank of XMM registers
>>>>>>>> (XMM16-XMM31):
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>>>>>
>>>>>>>> Please find below a patch which attempts to clean up the XMM
>>>>>>>> register handling by using register groups.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>>>>>>
>>>>>>>> The patch provides a restricted set of registers to the match
>>>>>>>> rules in the ad file based on the underlying architecture.
>>>>>>>>
>>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>>>>>>
>>>>>>>> By removing the special handling, the patch reduces the overall
>>>>>>>> code size by about 1800 lines of code.
>>>>>>>>
>>>>>>>> Your review and feedback is very welcome.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Sandhya
>>>>>>>>


--

Thanks,
Jc

From patric.hedlin at oracle.com  Wed Sep 19 09:25:19 2018
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Wed, 19 Sep 2018 11:25:19 +0200
Subject: RFR(XS): 8210284: "assert((av & 0x00000001) == 0) failed: unsupported
 V8" on Solaris 11.4
Message-ID: <e4aae25b-d1ca-9e75-f841-326222784dc8@oracle.com>

Dear all,

I would like to ask for help to review the following change/update:

Issue:   https://bugs.openjdk.java.net/browse/JDK-8210284

Webrev: http://cr.openjdk.java.net/~phedlin/tr8210284


Testing: Verified that the JVM (in debug build) will not assert on
         start-up when running Solaris 11.4, after applying the update.


Best regards,
Patric

From rkennke at redhat.com  Wed Sep 19 09:40:31 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 19 Sep 2018 11:40:31 +0200
Subject: RFR: JDK-8210829: Modularize allocations in C2
In-Reply-To: <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com>
References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>
 <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com>
Message-ID: <f69d7eed-6213-a5c8-6c75-e47dc5ac0f5f@redhat.com>

Thanks, Vladimir!
I'll fix the comment and push it through jdk/submit before pushing to
jdk/jdk.

Roman

> Hi Roman,
> 
> This looks good. I looked through changes and it generates the same
> ideal graph as before.
> It seems you unintentionally changed the indentation of the comment in
> barrierSetC2.hpp
> 
> Thanks,
> Vladimir
> 
> On 9/18/18 12:58 AM, Roman Kennke wrote:
>> Similar to what we've done before to runtime, interpreter and C1,
>> allocations should be owned and implemented by GC, and possible to
>> override by specific collectors. For example, Shenandoah lays out
>> objects differently in heap, and needs one extra store to initialize
>> objects.
>>
>> This proposed change factors out the interesting part of object
>> allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a
>> move-and-rename job. I had to move a few small things around:
>> - for the need-gc check, I'm passing back the needgc_ctrl to plug into
>> the slow path
>> - for prefetching, instead of passing around the 'length' node, only to
>> determine the number of prefetch lines, I determine this early, and pass
>> through the lines arg.
>> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out
>> or out-args to stitch together into the regions and phis as appropriate.
>> I see no easy way around that.
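The shape of this refactoring is an overridable hook: the generic expansion keeps stitching regions and phis together, while the barrier set decides what the fast-path allocation emits. A toy C++ sketch of that pattern only; `BarrierSetC2Model`, `ShenandoahLikeModel`, `obj_allocate`, and `AllocResult` are hypothetical and do not reflect the webrev's actual signatures, and stores are recorded as strings rather than IR nodes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of the toy allocation hook.
struct AllocResult {
  std::vector<std::string> stores;  // initializing stores on the fast path
  bool needs_gc_slow_path = false;  // stands in for the needgc_ctrl out-arg
};

class BarrierSetC2Model {
 public:
  virtual ~BarrierSetC2Model() = default;

  // Default fast path: if the buffer has room, emit the header stores;
  // otherwise report that control must go to the slow path.
  virtual AllocResult obj_allocate(bool tlab_has_room) const {
    AllocResult r;
    r.needs_gc_slow_path = !tlab_has_room;
    if (tlab_has_room) {
      r.stores = {"mark word", "klass"};
    }
    return r;
  }
};

// A collector with a different object layout overrides the hook and adds
// one extra initializing store, as described for Shenandoah above.
class ShenandoahLikeModel : public BarrierSetC2Model {
 public:
  AllocResult obj_allocate(bool tlab_has_room) const override {
    AllocResult r = BarrierSetC2Model::obj_allocate(tlab_has_room);
    if (tlab_has_room) {
      r.stores.push_back("extra GC field");
    }
    return r;
  }
};
```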
>>
>> I tested this using hotspot/jtreg:tier1, and also verified that this
>> fills Shenandoah's needs and ran the tier3_gc_shenandoah test suite.
>>
>> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/
>>
>> Can I please get reviews?
>> Thanks,
>> Roman
>>



From erik.osterlund at oracle.com  Wed Sep 19 12:19:23 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Wed, 19 Sep 2018 14:19:23 +0200
Subject: RFR(XS): 8210284: "assert((av & 0x00000001) == 0) failed:
 unsupported V8" on Solaris 11.4
In-Reply-To: <e4aae25b-d1ca-9e75-f841-326222784dc8@oracle.com>
References: <e4aae25b-d1ca-9e75-f841-326222784dc8@oracle.com>
Message-ID: <18901554-24c4-64b0-3f06-9a1d029f8d85@oracle.com>

Hi Patric,

Looks good.

Thanks,
/Erik

On 2018-09-19 11:25, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:   https://bugs.openjdk.java.net/browse/JDK-8210284
>
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8210284
>
>
>
> Testing: Verified that the JVM (in debug build) will not assert on
>          start-up when running Solaris 11.4, after applying the update.
>
>
> Best regards,
> Patric


From vladimir.kozlov at oracle.com  Wed Sep 19 16:22:56 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 19 Sep 2018 09:22:56 -0700
Subject: RFR(XS): 8210284: "assert((av & 0x00000001) == 0) failed:
 unsupported V8" on Solaris 11.4
In-Reply-To: <e4aae25b-d1ca-9e75-f841-326222784dc8@oracle.com>
References: <e4aae25b-d1ca-9e75-f841-326222784dc8@oracle.com>
Message-ID: <88e85a78-d9a9-9b05-a895-9e1349aaecba@oracle.com>

Looks good.

Thanks,
Vladimir

On 9/19/18 2:25 AM, Patric Hedlin wrote:
> Dear all,
> 
> I would like to ask for help to review the following change/update:
> 
> Issue:   https://bugs.openjdk.java.net/browse/JDK-8210284
> 
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8210284
> 
> 
> 
> Testing: Verified that the JVM (in debug build) will not assert on
>          start-up when running Solaris 11.4, after applying the update.
> 
> 
> Best regards,
> Patric

From vladimir.kozlov at oracle.com  Wed Sep 19 16:53:54 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 19 Sep 2018 09:53:54 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <d57561f0-94d2-a720-b54c-f48b51051fad@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
Message-ID: <f45b8b3e-182b-c1d3-e8c7-a33f29e6bd43@oracle.com>

Thank you, Sandhya

I submitted new testing.

Vladimir

On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Please find below the updated webrev with fixes for the two issues:
> 
> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/ 
> 
> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
> 
> The fix for compiler/vectorization/TestNaNVector.java was to pass legVecS instead of legVecD as the temporary register
> type for intrinsics.
> 
> This test was only failing with -XX:MaxVectorSize=4.
> 
> The file modified is x86_64.ad.
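For reference when reading this fix: C2 distinguishes vector register kinds by width, so legVecS and legVecD describe different vector sizes. A small sketch of the width-to-kind mapping, assuming the standard HotSpot ideal register sizes (VecS 4, VecD 8, VecX 16, VecY 32, VecZ 64 bytes); `vector_reg_kind` is a hypothetical helper, and the sketch does not model why the mismatch crashed.

```cpp
#include <cassert>
#include <string>

// Map a vector length in bytes to C2's ideal vector register kind.
// Returns an empty string for sizes C2 does not use.
std::string vector_reg_kind(int size_in_bytes) {
  switch (size_in_bytes) {
    case 4:  return "VecS";  // 32-bit
    case 8:  return "VecD";  // 64-bit
    case 16: return "VecX";  // 128-bit (xmm)
    case 32: return "VecY";  // 256-bit (ymm)
    case 64: return "VecZ";  // 512-bit (zmm)
    default: return "";
  }
}
```

With `-XX:MaxVectorSize=4`, the widest vector is 4 bytes, i.e. the VecS kind.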
> 
> The fix for compiler/vectorization/TestNaNVector.java was to allow all XMM registers (xmm0-xmm31) in C1 and to handle
> floating-point abs and negate appropriately by providing a temp register.
> 
> The C1 files are modified for this fix.
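For context on the C1 change: on x86, floating-point abs and negate are bitwise operations on the sign bit (`andps`/`andpd` with a mask that clears it, `xorps`/`xorpd` with a mask that flips it). A portable C++ sketch of the same bit manipulation for `float`; it does not model which registers C1 assigns or why a temp register is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// abs: AND with a mask that clears the sign bit (like andps).
float float_abs(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0x7FFFFFFFu;
  float r;
  std::memcpy(&r, &bits, sizeof r);
  return r;
}

// negate: XOR with a mask that flips the sign bit (like xorps).
float float_neg(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits ^= 0x80000000u;
  float r;
  std::memcpy(&r, &bits, sizeof r);
  return r;
}
```

Unlike `x < 0 ? -x : x`, the bitwise form also turns `-0.0f` into `+0.0f` and leaves NaN payloads untouched.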
> 
> I reran the compiler jtreg tests on Haswell, SKX and KNL, with nativepath set appropriately.
> 
> Best Regards,
> 
> Sandhya
> 
> *From:* Viswanathan, Sandhya
> *Sent:* Tuesday, September 18, 2018 1:47 PM
> *To:* 'JC Beyler' <jcbeyler at google.com>
> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> Hi Jc,
> 
> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
> 
> Best Regards,
> 
> Sandhya
> 
> *From:* JC Beyler [mailto:jcbeyler at google.com]
> *Sent:* Monday, September 17, 2018 9:29 PM
> *To:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> Hi Sandhya,
> 
> How are you invoking the test for NativeCallTest?
> 
> The way I would do it using jtreg would be something like this:
> 
> $ export BUILD_TYPE=release
> 
> $ export JDK_PATH=wherever you have your JDK
> 
>  From the test subfolder:
> 
> $ wherever-your-jtreg-is/bin/jtreg \
>     -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib \
>     -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk \
>     hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
> 
> Seems to pass for me.
> 
> But much easier is:
> 
> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"
> 
> That seems to pass for me as well and is easier to use :)
> 
> For information, the make run-test documentation is here:
> 
> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html
> 
> Let me know if that helps,
> 
> Jc
> 
>     Note: For NativeCallTest.java and many others, I get the following message in the corresponding .jtr file:
>         "test result: Error. Use -nativepath to specify the location of native code"
>     Do I need to give jtreg any additional information to get past this problem?
> 
>     Thanks a lot!
>     Best Regards,
>     Sandhya
> 
>     -----Original Message-----
>     From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>     Sent: Monday, September 17, 2018 10:14 AM
>     To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>     hotspot-compiler-dev at openjdk.java.net
>     Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
>     I finished testing on avx512 machine.
>     All passed except known (TestNaNVector.java) failures.
> 
>     Thanks,
>     Vladimir
> 
>     On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
>      > I gave incorrect link to RFE. Here is correct:
>      >
>      > https://bugs.openjdk.java.net/browse/JDK-8210764
>      >
>      > Vladimir
>      >
>      > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>      >> I got the build failure on MacOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.
>      >>
>      >> Anyway, I tested with these changes and got the following failures on AVX1 machines. I am planning to run on AVX512 too.
>      >>
>      >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on a CPU with AVX1 only
>      >>
>      >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>      >> # Problematic frame:
>      >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>      >>
>      >> Current CompileTask:
>      >> C2:    154    5             java.lang.String::equals (65 bytes)
>      >>
>      >> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>      >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>      >> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>      >> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>      >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>      >> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>      >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>      >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>      >>
>      >> ------------------------------------------------------------------------------------------------
>      >> 2. With '-Xcomp':
>      >> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>      >> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>      >>
>      >> Current CompileTask:
>      >> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>      >>
>      >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>      >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>      >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>      >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>      >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>      >> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>      >> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>      >> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>      >> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>      >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>      >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>      >> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>      >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>      >> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>      >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>      >> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>      >>
>      >> Vladimir
>      >>
>      >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>      >>>
>      >>> Thanks Vladimir, the below should fix this issue:
>      >>>
>      >>> ------------------------------
>      >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>      >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>      >>> @@ -233,22 +233,6 @@
>      >>>     _xmm_regs[13]  = xmm13;
>      >>>     _xmm_regs[14]  = xmm14;
>      >>>     _xmm_regs[15]  = xmm15;
>      >>> -  _xmm_regs[16]  = xmm16;
>      >>> -  _xmm_regs[17]  = xmm17;
>      >>> -  _xmm_regs[18]  = xmm18;
>      >>> -  _xmm_regs[19]  = xmm19;
>      >>> -  _xmm_regs[20]  = xmm20;
>      >>> -  _xmm_regs[21]  = xmm21;
>      >>> -  _xmm_regs[22]  = xmm22;
>      >>> -  _xmm_regs[23]  = xmm23;
>      >>> -  _xmm_regs[24]  = xmm24;
>      >>> -  _xmm_regs[25]  = xmm25;
>      >>> -  _xmm_regs[26]  = xmm26;
>      >>> -  _xmm_regs[27]  = xmm27;
>      >>> -  _xmm_regs[28]  = xmm28;
>      >>> -  _xmm_regs[29]  = xmm29;
>      >>> -  _xmm_regs[30]  = xmm30;
>      >>> -  _xmm_regs[31]  = xmm31;
>      >>>   #endif // _LP64
>      >>>
>      >>>     for (int i = 0; i < 8; i++) {
>      >>> ---------------------------------
>      >>>
>      >>> I think the gcc version on my desktop is older, so it didn't catch this.
>      >>>
>      >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
>      >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>      >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>      >>>
>      >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before
>      >>> changing it back to 3.
>      >>>
>      >>> Best Regards,
>      >>> Sandhya
>      >>>
>      >>>
>      >>> -----Original Message-----
>      >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>      >>> Sent: Friday, September 14, 2018 12:13 PM
>      >>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>      >>> hotspot-compiler-dev at openjdk.java.net
>      >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>      >>>
>      >>> I got build failure:
>      >>>
>      >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
>      >>> jib >   _xmm_regs[16]  = xmm16;
>      >>>
>      >>> I also noticed that we don't have RFE for this work. I filed:
>      >>>
>      >>> https://bugs.openjdk.java.net/browse/JDK-8209735
>      >>>
>      >>> You did not enable AVX512 by default (the 8209735 change was synced from JDK 11 into 12 two weeks ago). I added the
>      >>> following change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>      >>>
>      >>> - product(intx, UseAVX, 2, \
>      >>> + product(intx, UseAVX, 3, \
>      >>>
>      >>> Thanks,
>      >>> Vladimir
>      >>>
>      >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>      >>>> Looks good to me. I will start testing and let you know results.
>      >>>>
>      >>>> Thanks,
>      >>>> Vladimir
>      >>>>
>      >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>      >>>>> Hi Vladimir,
>      >>>>>
>      >>>>> Please find below the updated webrev with all your comments incorporated:
>      >>>>>
>      >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>      >>>>>
>      >>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>      >>>>> different flavors of AVX512, and on Haswell, a non-AVX512 system. I also tested SPECjvm2008 on the three platforms.
>      >>>>>
>      >>>>> Best Regards,
>      >>>>> Sandhya
>      >>>>>
>      >>>>>
>      >>>>> -----Original Message-----
>      >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>      >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>      >>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>      >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>      >>>>>
>      >>>>> Thank you, Sandhya
>      >>>>>
>      >>>>> I am satisfied with your detailed answer on the memory load issues. Okay, let's not add them.
>      >>>>>
>      >>>>> Vladimir
>      >>>>>
>      >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>      >>>>>> Hi Vladimir,
>      >>>>>>
>      >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>      >>>>>> Please see my responses in your email below, marked with (Sandhya >>>). Looking forward to your advice.
>      >>>>>>
>      >>>>>> Best Regards,
>      >>>>>> Sandhya
>      >>>>>>
>      >>>>>>
>      >>>>>> -----Original Message-----
>      >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>      >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>      >>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>      >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>      >>>>>> instruction
>      >>>>>>
>      >>>>>> Thank you.
>      >>>>>>
>      >>>>>> I want to discuss the next issue:
>      >>>>>>
>      >>>>>>     > You did not add instructions to load these registers from
>      >>>>>>     > memory (and stack). What happens in such cases when you need to load or store?
>      >>>>>>     >>>> Let us take an example, e.g. for loading into rregF. First
>      >>>>>>     >>>> it gets loaded from memory into regF and then a register-to-register move to rregF, and vice versa.
>      >>>>>>
>      >>>>>> This is what I thought. You increase register pressure this way, which may cause spills to the stack.
>      >>>>>> Also, we don't check whether the register could be the same as the result, so you may get unneeded moves.
>      >>>>>>
>      >>>>>> I would advise adding memory moves at least.
>      >>>>>>
>      >>>>>> Sandhya >>> I had added those rules initially and removed them in
>      >>>>>> the final patch. I noticed that the register allocator uses the
>      >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask
>      >>>>>> (matcher.cpp). I would like the register allocator to get all the
>      >>>>>> possible registers on an architecture for the idealreg2reg mask. I
>      >>>>>> was concerned that multiple instruct rules in the .ad file for LoadF from
>      >>>>>> memory might cause problems. I would have to assign a higher cost to
>      >>>>>> loading into a restricted register set like vlReg. Then I decided that
>      >>>>>> the register allocator can handle this much better than I could by
>      >>>>>> adding rules to load from memory. This is with the background that regF is always all the available
>      >>>>>> registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your
>      >>>>>> thoughts on this and whether I should still add the rules to load from memory into vlReg and legVec.
>      >>>>>> The specific code from matcher.cpp that I am referring to is:
>      >>>>>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>      >>>>>> #endif
>      >>>>>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>      >>>>>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered,
>      >>>>>>                                                  LoadNode::DependsOnlyOnTest, false));
>      >>>>>>     MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>      >>>>>>     MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>      >>>>>>     MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>      >>>>>>     ....
>      >>>>>> ???? idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>      >>>>>>
>      >>>>>> Another question. You use movflt() and movdbl(), which use either
>      >>>>>> movap[s|d] or movs[s|d] instructions:
>      >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>      >>>>>> Do these instructions work when avx512vl is not available? I see that for vectors you use
>      >>>>>> a vpxor+vinserti* combination.
>      >>>>>>
>      >>>>>> Sandhya >>> Yes, the scalar floating-point instructions are available
>      >>>>>> with AVX512 encoding when avx512vl is not available. That is why you
>      >>>>>> would see not just movflt and movdbl but all the other scalar
>      >>>>>> operations, like addss, addsd etc., using the entire xmm range (xmm0-31). In other words, they are AVX512F
>      >>>>>> instructions.
>      >>>>>>
>      >>>>>> Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
>      >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>      >>>>>>
>      >>>>>> Should it be (UseAVX < 3)?
>      >>>>>>
>      >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>      >>>>>>
>      >>>>>> Thanks,
>      >>>>>> Vladimir
>      >>>>>>
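Why `(UseAVX < 3)` is the right guard follows from a point in the earlier review quoted below: EVEX support (CPU_AVX512F) is switched off whenever UseAVX < 3, leaving only XMM0-XMM15 allocatable, so the plain spill path is safe; with AVX512VL, EVEX can also encode 128/256-bit moves on XMM16-XMM31. A minimal model contrasting the two conditions, with hypothetical names standing in for the UseAVX flag and `VM_Version::supports_avx512vl()`:

```cpp
#include <cassert>

// Hypothetical model of the CPU/flag state the x86.ad check depends on.
struct CpuModel {
  int  use_avx;   // value of -XX:UseAVX
  bool avx512vl;  // stands in for VM_Version::supports_avx512vl()
};

// Check as originally written in the vector spill code.
bool spill_check_original(const CpuModel& c) {
  return c.use_avx < 2 || c.avx512vl;
}

// Corrected check: with UseAVX < 3 the EVEX features are turned off, so
// only XMM0-XMM15 are in use and legacy/VEX encodings reach them all.
// With AVX512VL, EVEX can also encode 128/256-bit moves on XMM16-XMM31.
bool spill_check_fixed(const CpuModel& c) {
  return c.use_avx < 3 || c.avx512vl;
}
```

The two checks disagree only for UseAVX == 2 on hardware without AVX512VL (for example a KNL-class CPU run with -XX:UseAVX=2), which is exactly the case the original condition got wrong.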
>      >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>      >>>>>>> Hi Vladimir,
>      >>>>>>>
>      >>>>>>> Thanks a lot for your review and feedback. Please see my response
>      >>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>      >>>>>>>
>      >>>>>>> Best Regards,
>      >>>>>>> Sandhya
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> -----Original Message-----
>      >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>      >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>      >>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>      >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>      >>>>>>> instruction
>      >>>>>>>
>      >>>>>>> Very nice. Thank you, Sandhya.
>      >>>>>>>
>      >>>>>>> I would like to see more meaningful names in the .ad files: instead
>      >>>>>>> of rreg* and ovec*, something like vlReg* and legVec*.
>      >>>>>>>
>      >>>>>>>>>> Yes, accepted.
>      >>>>>>>
>      >>>>>>> The new load_from_* and load_to_* instructions in the .ad files should be
>      >>>>>>> renamed as follows and colocated with the other Move*_reg_reg* instructions:
>      >>>>>>>
>      >>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>      >>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>      >>>>>>>>>> Yes, accepted.
>      >>>>>>>
>      >>>>>>> You did not add instructions to load these registers from memory
>      >>>>>>> (and stack). What happens in such cases when you need to load or store?
>      >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it
>      >>>>>>>>>> gets loaded from memory into regF and then a register-to-register move to rregF, and vice versa.
>      >>>>>>>
>      >>>>>>> Also, please explain why these registers are used when UseAVX == 0:
>      >>>>>>>
>      >>>>>>> +instruct absD_reg(rregD dst) %{
>      >>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>      >>>>>>>
>      >>>>>>> We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
>      >>>>>>>       if (UseAVX < 3) {
>      >>>>>>>         _features &= ~CPU_AVX512F;
>      >>>>>>>
>      >>>>>>>>>> Yes, accepted. It could be regD here.
>      >>>>>>>
>      >>>>>>> The following checks could be combined using a new function in vm_version_x86.hpp (you already have some):
>      >>>>>>>
>      >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{
>      >>>>>>>      VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>      >>>>>>>      VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>      >>>>>>>
>      >>>>>>>>>> Yes, accepted.
>      >>>>>>>
>      >>>>>>> I would suggest testing these changes on different machines
>      >>>>>>> (non-avx512 and avx512) and with different UseAVX values.
>      >>>>>>>
>      >>>>>>>>>> Will do.
>      >>>>>>>
>      >>>>>>> Thanks,
>      >>>>>>> Vladimir
>      >>>>>>>
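Folding the four feature checks of the `reg_class_dynamic` predicate into one query could look like the sketch below. `VmVersionModel` and `supports_avx512vlbwdq()` are hypothetical stand-ins written for illustration, not a claim about the exact helper added to vm_version_x86.hpp; the point is that the .ad predicate then reduces to a single call.

```cpp
#include <cassert>

// Hypothetical model of the x86 feature queries used in the .ad predicate.
struct VmVersionModel {
  bool evex, avx512bw, avx512dq, avx512vl;

  bool supports_evex() const     { return evex; }
  bool supports_avx512bw() const { return avx512bw; }
  bool supports_avx512dq() const { return avx512dq; }
  bool supports_avx512vl() const { return avx512vl; }

  // Combined query: true only when EVEX plus the BW, DQ and VL subsets are
  // all present, i.e. when a reg_class_dynamic like vectors_reg_bwdqvl
  // would select the EVEX register class instead of the legacy one.
  bool supports_avx512vlbwdq() const {
    return supports_evex() && supports_avx512bw() &&
           supports_avx512dq() && supports_avx512vl();
  }
};
```

A Skylake-server-class CPU (all four features) takes the EVEX class; a KNL-class CPU (AVX512F without BW/DQ/VL) and a pre-AVX512 CPU fall back to the legacy class.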
>      >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>      >>>>>>>> Recently there have been a couple of high-priority issues with
>      >>>>>>>> regard to the high bank of XMM registers
>      >>>>>>>> (XMM16-XMM31) used by C2:
>      >>>>>>>>
>      >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>      >>>>>>>>
>      >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>      >>>>>>>>
>      >>>>>>>> Please find below a patch which attempts to clean up the XMM
>      >>>>>>>> register handling by using register groups.
>      >>>>>>>>
>      >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>      >>>>>>>>
>      >>>>>>>> The patch provides a restricted set of registers to the match
>      >>>>>>>> rules in the ad file based on the underlying architecture.
>      >>>>>>>>
>      >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>      >>>>>>>>
>      >>>>>>>> By removing the special handling, the patch reduces the overall
>      >>>>>>>> code size by about 1800 lines of code.
>      >>>>>>>>
>      >>>>>>>> Your review and feedback is very welcome.
>      >>>>>>>>
>      >>>>>>>> Best Regards,
>      >>>>>>>>
>      >>>>>>>> Sandhya
>      >>>>>>>>
> 
> 
> -- 
> 
> Thanks,
> 
> Jc
> 

From rkennke at redhat.com  Wed Sep 19 17:08:11 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 19 Sep 2018 19:08:11 +0200
Subject: RFR: JDK-8210829: Modularize allocations in C2
In-Reply-To: <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com>
References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>
 <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com>
Message-ID: <af0f0925-5b18-8510-a96c-f7d4c8e739ba@redhat.com>

Alright, submit repo came back with UNSTABLE. Can somebody here check it
and get back to me?

Build Details: 2018-09-19-1536076.roman.source
0 Failed Tests
Mach5 Tasks Results Summary

    KILLED: 0
    PASSED: 70
    UNABLE_TO_RUN: 3
    NA: 0
    FAILED: 0
    EXECUTED_WITH_FAILURE: 2
    Test

        2 Not run

tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-57
Dependency task failed:
mach5...-1909-solaris-sparcv9-solaris-sparcv9-build-8

tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-debug-58
Dependency task failed:
mach5...solaris-sparcv9-debug-solaris-sparcv9-build-9


> Hi Roman,
> 
> This looks good. I looked through changes and it generates the same
> ideal graph as before.
> It seems you unintentionally changed indent of the comment in
> barrierSetC2.hpp
> 
> Thanks,
> Vladimir
> 
> On 9/18/18 12:58 AM, Roman Kennke wrote:
>> Similar to what we've done before to runtime, interpreter and C1,
>> allocations should be owned and implemented by GC, and possible to
>> override by specific collectors. For example, Shenandoah lays out
>> objects differently in heap, and needs one extra store to initialize
>> objects.
>>
>> This proposed change factors out the interesting part of object
>> allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a
>> move-and-rename-job. I had to move some little things around, that is:
>> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into
>> slow-path
>> - for prefetching, instead of passing around the 'length' node, only to
>> determine the number of prefetch lines, I determine this early, and pass
>> through the lines arg.
>> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out
>> or out-args to stitch together into the regions and phis as appropriate.
>> I see no easy way around that.
>>
>> I tested this using hotspot/jtreg:tier1 and also verified that this
>> fills Shenandoah's needs and run tier3_gc_shenandoah testsuite.
>>
>> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/
>>
>> Can I please get reviews?
>> Thanks,
>> Roman
>>
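The shape of the refactoring described above (allocation owned by the GC's C2 barrier set and overridable per collector, with control and memory results handed back through out-parameters for the caller to stitch into its regions and phis) can be illustrated with a toy model. Everything here (`BarrierSetC2Model`, `obj_allocate`, the string "ops") is hypothetical and far simpler than the real BarrierSetC2 interface in the webrev; the extra word and extra store in the subclass only mirror the Shenandoah requirement mentioned in the proposal.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the IR a compiler would emit.
struct EmittedOps {
  std::vector<std::string> ops;
};

// Hypothetical base class: default fast-path allocation owned by the GC layer.
class BarrierSetC2Model {
 public:
  virtual ~BarrierSetC2Model() = default;

  // Emits a bump-pointer allocation and returns a token for the new object.
  // needgc_ctrl is an out-parameter: the caller wires it to the slow path.
  virtual std::string obj_allocate(EmittedOps& out, int size_in_words,
                                   std::string& needgc_ctrl) const {
    out.ops.push_back("bump top by " + std::to_string(size_in_words));
    needgc_ctrl = "top > end -> slow path";
    return "new_obj";
  }
};

// Hypothetical collector needing one extra word and one extra store per object.
class ExtraWordBarrierSetC2Model : public BarrierSetC2Model {
 public:
  std::string obj_allocate(EmittedOps& out, int size_in_words,
                           std::string& needgc_ctrl) const override {
    std::string obj =
        BarrierSetC2Model::obj_allocate(out, size_in_words + 1, needgc_ctrl);
    out.ops.push_back("store extra header word");
    return obj;
  }
};
```

The caller only sees the base-class interface, so a collector-specific layout change stays inside the override while the slow-path wiring (through `needgc_ctrl`) remains the caller's job.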


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180919/f0e5191c/signature.asc>

From vladimir.kozlov at oracle.com  Wed Sep 19 17:58:53 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 19 Sep 2018 10:58:53 -0700
Subject: RFR: JDK-8210829: Modularize allocations in C2
In-Reply-To: <af0f0925-5b18-8510-a96c-f7d4c8e739ba@redhat.com>
References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>
 <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com>
 <af0f0925-5b18-8510-a96c-f7d4c8e739ba@redhat.com>
Message-ID: <81b25b8b-c18f-28b5-1e25-3ecab35d6dc6@oracle.com>

Crypto library build failed - 8210912.
Mikael just pushed the fix - update your copy:

http://hg.openjdk.java.net/jdk/jdk/rev/15094d12a632

Vladimir

On 9/19/18 10:08 AM, Roman Kennke wrote:
> Alright, submit repo came back with UNSTABLE. Can somebody here check it
> and get back to me?
> 
> Build Details: 2018-09-19-1536076.roman.source
> 0 Failed Tests
> Mach5 Tasks Results Summary
> 
>      KILLED: 0
>      PASSED: 70
>      UNABLE_TO_RUN: 3
>      NA: 0
>      FAILED: 0
>      EXECUTED_WITH_FAILURE: 2
>      Test
> 
>          2 Not run
> 
> tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-57
> Dependency task failed:
> mach5...-1909-solaris-sparcv9-solaris-sparcv9-build-8
> 
> tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-debug-58
> Dependency task failed:
> mach5...solaris-sparcv9-debug-solaris-sparcv9-build-9
> 
> 
>> Hi Roman,
>>
>> This looks good. I looked through changes and it generates the same
>> ideal graph as before.
>> It seems you unintentionally changed indent of the comment in
>> barrierSetC2.hpp
>>
>> Thanks,
>> Vladimir
>>
>> On 9/18/18 12:58 AM, Roman Kennke wrote:
>>> Similar to what we've done before to runtime, interpreter and C1,
>>> allocations should be owned and implemented by GC, and possible to
>>> override by specific collectors. For example, Shenandoah lays out
>>> objects differently in heap, and needs one extra store to initialize
>>> objects.
>>>
>>> This proposed change factors out the interesting part of object
>>> allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a
>>> move-and-rename-job. I had to move some little things around, that is:
>>> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into
>>> slow-path
>>> - for prefetching, instead of passing around the 'length' node, only to
>>> determine the number of prefetch lines, I determine this early, and pass
>>> through the lines arg.
>>> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out
>>> or out-args to stitch together into the regions and phis as appropriate.
>>> I see no easy way around that.
>>>
>>> I tested this using hotspot/jtreg:tier1 and also verified that this
>>> fills Shenandoah's needs and run tier3_gc_shenandoah testsuite.
>>>
>>> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/
>>>
>>> Can I please get reviews?
>>> Thanks,
>>> Roman
>>>
> 
> 

From rkennke at redhat.com  Wed Sep 19 20:05:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 19 Sep 2018 22:05:23 +0200
Subject: RFR: JDK-8210829: Modularize allocations in C2
In-Reply-To: <81b25b8b-c18f-28b5-1e25-3ecab35d6dc6@oracle.com>
References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com>
 <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com>
 <af0f0925-5b18-8510-a96c-f7d4c8e739ba@redhat.com>
 <81b25b8b-c18f-28b5-1e25-3ecab35d6dc6@oracle.com>
Message-ID: <b6e949da-eabe-c58a-c877-46cf9e6d9600@redhat.com>

Thanks, Vladimir. That fixed it, build came out clean and I pushed the
change.

Thanks,
Roman


> Crypto library build failed - 8210912.
> Mikael just pushed the fix - update your copy:
> 
> http://hg.openjdk.java.net/jdk/jdk/rev/15094d12a632
> 
> Vladimir
> 
> On 9/19/18 10:08 AM, Roman Kennke wrote:
>> Alright, submit repo came back with UNSTABLE. Can somebody here check it
>> and get back to me?
>>
>> Build Details: 2018-09-19-1536076.roman.source
>> 0 Failed Tests
>> Mach5 Tasks Results Summary
>>
>>     KILLED: 0
>>     PASSED: 70
>>     UNABLE_TO_RUN: 3
>>     NA: 0
>>     FAILED: 0
>>     EXECUTED_WITH_FAILURE: 2
>>     Test
>>
>>         2 Not run
>>
>> tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-57
>>
>> Dependency task failed:
>> mach5...-1909-solaris-sparcv9-solaris-sparcv9-build-8
>>
>> tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-debug-58
>>
>> Dependency task failed:
>> mach5...solaris-sparcv9-debug-solaris-sparcv9-build-9
>>
>>
>>> Hi Roman,
>>>
>>> This looks good. I looked through changes and it generates the same
>>> ideal graph as before.
>>> It seems you unintentionally changed indent of the comment in
>>> barrierSetC2.hpp
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/18/18 12:58 AM, Roman Kennke wrote:
>>>> Similar to what we've done before to runtime, interpreter and C1,
>>>> allocations should be owned and implemented by GC, and possible to
>>>> override by specific collectors. For example, Shenandoah lays out
>>>> objects differently in heap, and needs one extra store to initialize
>>>> objects.
>>>>
>>>> This proposed change factors out the interesting part of object
>>>> allocation (i.e. the actual allocation) into BarrierSetC2. It's
>>>> mostly a
>>>> move-and-rename-job. I had to move some little things around, that is:
>>>> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into
>>>> slow-path
>>>> - for prefetching, instead of passing around the 'length' node, only to
>>>> determine the number of prefetch lines, I determine this early, and
>>>> pass
>>>> through the lines arg.
>>>> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out
>>>> or out-args to stitch together into the regions and phis as
>>>> appropriate.
>>>> I see no easy way around that.
>>>>
>>>> I tested this using hotspot/jtreg:tier1 and also verified that this
>>>> fills Shenandoah's needs and run tier3_gc_shenandoah testsuite.
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/
>>>>
>>>> Can I please get reviews?
>>>> Thanks,
>>>> Roman
>>>>
>>
>>



From Pengfei.Li at arm.com  Thu Sep 20 04:15:28 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Thu, 20 Sep 2018 04:15:28 +0000
Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by
 constant in C1
In-Reply-To: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com>
References: <DB7PR08MB3115445A18A786BAFD1F7B08961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <9645c210-3d87-52fa-8051-54dc60629866@redhat.com>
Message-ID: <DB7PR08MB3115D98CC20DAB17F160991496130@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew,

Please find below a new patch that adds the same optimization for longs as well as ints, and also fixes an issue.
http://cr.openjdk.java.net/~yzhang/8210413/webrev.01/

Could you help look at it again?

--
Thanks,
Pengfei


> -----Original Message-----
> 
> Hi,
> 
> On 09/13/2018 10:04 AM, Pengfei Li (Arm Technology China) wrote:
> 
> > Could you please help review this optimization in C1 AArch64?
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8210413
> > webrev: http://cr.openjdk.java.net/~njian/8210413/webrev.00/
> 
> It looks fine, but it's really odd that this is only implemented for ints and not
> longs. Can you do longs too?
> 
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
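For reference, assuming the constants in question are powers of two (2^k), the usual strength reduction for signed division must round toward zero, as Java's / does on int and long: add a bias of 2^k - 1 to negative dividends, then shift arithmetically; the remainder follows as x - (q << k) and keeps the sign of the dividend. The sketch shows the technique generically for both widths. It is an illustration, not the AArch64 C1 code in the webrev, and `div_pow2`/`rem_pow2` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Signed division by 2^k rounding toward zero (Java semantics), using shifts.
// Assumes >> on negative signed values is arithmetic, as on GCC/Clang
// (and guaranteed since C++20).
template <typename T>
T div_pow2(T x, int k) {
  static_assert(std::is_signed<T>::value, "signed types only");
  using U = typename std::make_unsigned<T>::type;
  constexpr int bits = sizeof(T) * 8;
  assert(k > 0 && k < bits);
  T sign = x >> (bits - 1);                             // 0 or -1 (all ones)
  U bias = static_cast<U>(sign) >> (bits - k);          // 0 or 2^k - 1
  T biased = static_cast<T>(static_cast<U>(x) + bias);  // add without signed overflow
  return biased >> k;
}

// Remainder with the sign of the dividend: x - (x / 2^k) * 2^k.
template <typename T>
T rem_pow2(T x, int k) {
  using U = typename std::make_unsigned<T>::type;
  T q = div_pow2(x, k);
  return static_cast<T>(static_cast<U>(x) - (static_cast<U>(q) << k));
}
```

For example, -5 / 2 gives -2 with remainder -1, matching Java, whereas a bare arithmetic shift (-5 >> 1) would give -3. The same template covers int32_t and int64_t, which is the int/long symmetry asked for above.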

From patric.hedlin at oracle.com  Thu Sep 20 09:53:12 2018
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Thu, 20 Sep 2018 11:53:12 +0200
Subject: RFR(S): JDK-8191339: [JVMCI] BigInteger compiler intrinsics on
 Graal.
In-Reply-To: <02f34a26-2a97-6a30-384f-115327781aac@oracle.com>
References: <28011331-bd43-2c32-dba4-e41879ffe28a@oracle.com>
 <02f34a26-2a97-6a30-384f-115327781aac@oracle.com>
Message-ID: <661f70d5-7a09-d181-5669-9841b590c7a3@oracle.com>

Hi Vladimir, Andrew,

Sorry for dropping this after vacation. The testing is a simplistic
benchmark (soon to be... I hope) added to Graal, plus some directed, a
bit too ad hoc, testing not meant for upstreaming to Graal. I also used
a simplified version of a more general JVMCI/VM test case for these
options only, but it really only exercises JVMCI (not the option
propagation in Graal or some other JVMCI "client"), making it less useful.

But in essence, Graal is the test-case.


On 2018-06-22 18:04, Vladimir Kozlov wrote:
> Hi Patric,
>
> Do you need Graal changes for this? Or it already has these intrinsics 
> and the only problem is these flags were not set in vm_version_x86.cpp?

No further changes have been made to Graal.

>
> Small note. In vm_version_x86.cpp the previous code already has a
> COMPILER2_OR_JVMCI check. You can remove the previous #endif and the new
> #ifdef. Also change the comment on the closing #endif at line 1080 to
> // COMPILER2_OR_JVMCI
>
> 1080 #endif // COMPILER2

You are right (actually the intended webrev) and it should look correct 
now (just a tad old).
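The suggested restructuring amounts to keeping the new BigInteger intrinsic settings inside the existing compiler guard instead of closing it and reopening an identical one. A minimal sketch of the resulting shape; the flag fields are hypothetical stand-ins for the BigInteger intrinsic flags, and the macro is defined locally only so the fragment compiles on its own, assuming (as in HotSpot) that it expands to 0 or 1 and is tested with `#if`.

```cpp
#include <cassert>

// Defined here only so the fragment is self-contained; assumed to be 0 or 1.
#ifndef COMPILER2_OR_JVMCI
#define COMPILER2_OR_JVMCI 1
#endif

// Hypothetical stand-ins for the intrinsic flags set in vm_version_x86.cpp.
struct IntrinsicFlags {
  bool multiply_to_len = false;
  bool square_to_len = false;
  bool mul_add = false;
};

IntrinsicFlags set_intrinsic_flags() {
  IntrinsicFlags f;
#if COMPILER2_OR_JVMCI
  // Settings that were already under the guard.
  f.multiply_to_len = true;
  // New BigInteger settings: same block, no "#endif" / "#if" pair in between.
  f.square_to_len = true;
  f.mul_add = true;
#endif // COMPILER2_OR_JVMCI
  return f;
}
```

The closing comment now names the macro that actually opened the block, which is the point of the requested change at line 1080.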

Best regards,
Patric
>
> What testing you did?
>
> Thanks,
> Vladimir
>
> On 6/21/18 8:26 AM, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8191339
>>
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8191339/
>>
>>
>> 8191339: [JVMCI] BigInteger compiler intrinsics on Graal.
>>
>>     Enabling BigInteger intrinsics via JVMCI.
>>
>>
>>
>> Best regards,
>> Patric


From aph at redhat.com  Thu Sep 20 10:53:23 2018
From: aph at redhat.com (Andrew Haley)
Date: Thu, 20 Sep 2018 11:53:23 +0100
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
 <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
 <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>
 <dk6efe7xfh2.fsf@rwestrel.remote.csb>
 <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com>
 <dk68t4eydnw.fsf@rwestrel.remote.csb>
 <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com>
Message-ID: <ff5468fd-75b2-5568-c295-322c17fb4de3@redhat.com>

On 09/06/2018 02:20 PM, Dmitry Chuyko wrote:
> On 09/06/2018 04:10 PM, Roland Westrelin wrote:
>>> Yes. Here is how it looks like:
>>> ...................................
>> That does seem like a pretty minimal difference and not a reason not to
>> push that change. What do you think?
> I agree, it looks like something we should investigate in aarch64 port.

mkay, but how, exactly? Is it simply the case that Intel is improved
so the patch is good, even if AArch64 regresses?

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From shade at redhat.com  Thu Sep 20 14:18:11 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 20 Sep 2018 16:18:11 +0200
Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize
 allocations in C2"
Message-ID: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8210963

Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx
inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems
easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0"
comparisons in them.
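Why the mismatch only breaks 32-bit builds: HotSpot's `intx` is a pointer-sized signed type. On LP64, a 32-bit `uint` compared with a 64-bit signed `intx` is promoted to the wider signed type and nothing is reported; on 32-bit targets `intx` is 32 bits wide, the usual arithmetic conversions turn `lines` into an unsigned value, and the resulting signed/unsigned comparison warning is presumably promoted to an error by the build flags. The sketch below, with a locally defined `intx` and hypothetical helper names, shows both the conversion hazard and the loop shape after the fix.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for HotSpot's pointer-sized signed integer type.
typedef intptr_t intx;

// What "uint i < intx lines" amounts to when intx is 32 bits wide:
// lines is converted to unsigned, so a negative value compares as huge.
bool uint_less_than_32bit_intx(uint32_t i, int32_t lines) {
  return i < static_cast<uint32_t>(lines);
}

// Loop shape after the fix: the induction variable has the same signed type
// as the bound, so the comparison is well-typed on every target.
// Returns how many iterations (one per prefetch line) the loop performs.
int count_prefetch_steps(intx first, intx lines) {
  int steps = 0;
  for (intx i = first; i < lines; i++) {
    steps++;  // stands in for the per-line node construction in macro.cpp
  }
  return steps;
}
```

Since `i` is only compared against `lines` and tested for zero in these loops, widening it to `intx` changes no behavior for valid (non-negative) line counts.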

Fix:

diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
--- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 08:11:21 2018 -0400
+++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 15:49:02 2018 +0200
@@ -27,2 +27,3 @@
 #include "opto/arraycopynode.hpp"
+#include "opto/convertnode.hpp"
 #include "opto/graphKit.hpp"
diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
--- a/src/hotspot/share/opto/macro.cpp	Thu Sep 20 08:11:21 2018 -0400
+++ b/src/hotspot/share/opto/macro.cpp	Thu Sep 20 15:49:02 2018 +0200
@@ -1729,3 +1729,3 @@

-      for ( uint i = 0; i < lines; i++ ) {
+      for ( intx i = 0; i < lines; i++ ) {
         prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
@@ -1782,3 +1782,3 @@
       distance = step_size;
-      for ( uint i = 1; i < lines; i++ ) {
+      for ( intx i = 1; i < lines; i++ ) {
         prefetch_adr = new AddPNode( cache_adr, cache_adr,
@@ -1798,3 +1798,3 @@
       uint distance = AllocatePrefetchDistance;
-      for ( uint i = 0; i < lines; i++ ) {
+      for ( intx i = 0; i < lines; i++ ) {
         prefetch_adr = new AddPNode( old_eden_top, new_eden_top,


Testing: x86_64, x86_32, armhf builds

Thanks,
-Aleksey



From adinn at redhat.com  Thu Sep 20 14:20:02 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 20 Sep 2018 15:20:02 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
Message-ID: <84fa6c2e-6a6e-a59a-8dff-175f7e50240f@redhat.com>

Ping!

Could I please get a review of this latest version of the JEP?

This includes responses to all previous comments with changes made both
to the JEP and the draft implementation.

I would like to get this into JDK 12 if at all possible.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

On 10/09/18 19:05, Alan Bateman wrote:
> On 20/08/2018 16:18, Andrew Dinn wrote:
>> Hi Alan,
>>
>> Round 4:
>>
>> I have redrafted the JEP and updated the implementation in the light of
>> your last feedback:
>>
>>    JEP JIRA: https://bugs.openjdk.java.net/browse/JDK-8207851
>>
>>    Formatted JEP: http://openjdk.java.net/jeps/8207851
>>
>>    New webrev: http://cr.openjdk.java.net/~adinn/pmem/webrev.04/
>>
>>
> The updated JEP looks much better.
> 
> I realize we've been through several iterations on this but I'm now
> wondering if the MappedByteBuffer is the right API. As you've shown,
> it's straightforward to map a region of NVM and use the existing API,
> I'm just not sure if it's the right API. I think I'd like to see a few
> examples of how the API might be used. ByteBuffers aren't intended for
> use by concurrent threads and I just wonder if the examples might need
> that. I also wonder if there is a possible connection with work in
> Project Panama and whether it's worth exploring if its scopes and
> pointers could be backed by NVM. The Risks and Assumptions
> section mentions the 2GB limit, which is another reminder that the MBB
> API may not be the right API.
> 
> The 2-arg force method to msync a region makes sense, although it might
> be more consistent for the second parameter to be the length rather than the
> end offset.
> 
> A detail for later is whether UOE might be more appropriate for
> implementations that do not support the XXX_PERSISTENT modes.
> 
> -Alan.
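To make the (from, to) versus (offset, length) point concrete: whichever form the Java API takes, the implementation ultimately has to hand msync(2) a page-aligned start address and a length. A minimal sketch of that conversion, assuming the region is given as an offset and length within a mapping at `base`; `FlushRange` and `flush_range` are hypothetical names, and the sketch only computes the arguments rather than calling msync, so it runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result: arguments suitable for msync(addr, len, MS_SYNC).
struct FlushRange {
  uintptr_t addr;  // start rounded down to a page boundary
  size_t    len;   // covers every requested byte from addr onward
};

// Flush [offset, offset + length) of a mapping starting at base.
// An end-offset API would call this with length = to - from.
FlushRange flush_range(uintptr_t base, size_t offset, size_t length, size_t page) {
  assert(page != 0 && (page & (page - 1)) == 0);  // page size is a power of two
  uintptr_t start = base + offset;
  uintptr_t aligned = start & ~(static_cast<uintptr_t>(page) - 1);
  return FlushRange{aligned, static_cast<size_t>(start + length - aligned)};
}
```

With a length parameter the caller never has to compute an end offset, and the page rounding stays an implementation detail either way.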
> 

From rkennke at redhat.com  Thu Sep 20 14:20:49 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 20 Sep 2018 16:20:49 +0200
Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize
 allocations in C2"
In-Reply-To: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com>
References: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com>
Message-ID: <1a282baa-680d-d3b5-7701-d20571b9da77@redhat.com>

> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8210963
> 
> Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx
> inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems
> easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0"
> comparisons in them.
> 
> Fix:
> 
> diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 08:11:21 2018 -0400
> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 15:49:02 2018 +0200
> @@ -27,2 +27,3 @@
>  #include "opto/arraycopynode.hpp"
> +#include "opto/convertnode.hpp"
>  #include "opto/graphKit.hpp"
> diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
> --- a/src/hotspot/share/opto/macro.cpp	Thu Sep 20 08:11:21 2018 -0400
> +++ b/src/hotspot/share/opto/macro.cpp	Thu Sep 20 15:49:02 2018 +0200
> @@ -1729,3 +1729,3 @@
> 
> -      for ( uint i = 0; i < lines; i++ ) {
> +      for ( intx i = 0; i < lines; i++ ) {
>          prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
> @@ -1782,3 +1782,3 @@
>        distance = step_size;
> -      for ( uint i = 1; i < lines; i++ ) {
> +      for ( intx i = 1; i < lines; i++ ) {
>          prefetch_adr = new AddPNode( cache_adr, cache_adr,
> @@ -1798,3 +1798,3 @@
>        uint distance = AllocatePrefetchDistance;
> -      for ( uint i = 0; i < lines; i++ ) {
> +      for ( intx i = 0; i < lines; i++ ) {
>          prefetch_adr = new AddPNode( old_eden_top, new_eden_top,
> 
> 
> Testing: x86_64, x86_32, armhf builds
> 

Looks good to me. Thanks for fixing this.
Roman
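For reference, here is a minimal C++ sketch of why the uint/intx mix only breaks 32-bit targets. The thread does not name the diagnostic that failed the build. This sketch assumes it is a signed/unsigned comparison warning (such as GCC's -Wsign-compare) promoted to an error.

- On 64-bit targets intx is 64 bits wide, so a 32-bit uint counter is converted to the wider signed type without loss, and there is nothing to warn about.
- On x86_32 and arm32, intx is 32 bits, so `lines` is converted to unsigned instead.

The typedefs and function names below are hypothetical stand-ins, not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for HotSpot's uint/intx on a 32-bit target,
// where intx (a pointer-sized signed integer) is only 32 bits wide.
typedef uint32_t uint32_sketch;
typedef int32_t  intx32_sketch;

// What `uint i < intx lines` means on such a target: both operands have the
// same rank, so `lines` is converted to unsigned. The explicit cast spells out
// the implicit conversion a sign-compare warning would flag. `cap` only
// bounds the loop so the sketch terminates when `lines` is negative.
int iterations_with_uint_counter(intx32_sketch lines, int cap) {
  int n = 0;
  for (uint32_sketch i = 0; i < static_cast<uint32_sketch>(lines) && n < cap; i++) {
    n++;
  }
  return n;
}

// The shape of the fix in the patch: give the counter the same signed type as `lines`.
int iterations_with_intx_counter(intx32_sketch lines) {
  int n = 0;
  for (intx32_sketch i = 0; i < lines; i++) {
    n++;
  }
  return n;
}
```

In the signed version, a negative `lines` simply yields zero iterations. In the unsigned version, -1 becomes 4294967295, which is exactly the kind of surprise the warning exists to catch.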



From rwestrel at redhat.com  Thu Sep 20 14:54:13 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 20 Sep 2018 16:54:13 +0200
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <ff5468fd-75b2-5568-c295-322c17fb4de3@redhat.com>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
 <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
 <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>
 <dk6efe7xfh2.fsf@rwestrel.remote.csb>
 <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com>
 <dk68t4eydnw.fsf@rwestrel.remote.csb>
 <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com>
 <ff5468fd-75b2-5568-c295-322c17fb4de3@redhat.com>
Message-ID: <dk6fty46wyy.fsf@rwestrel.remote.csb>


> mkay, but how, exactly? Is it simply the case that Intel is improved
> so the patch is good, even if AArch64 regresses?

Well, no, I don't think that's an accurate description of what this
is. Dmitry reported a performance regression, but the generated code is
almost identical with or without the patch (the only difference being
that one version uses b.cc and the other b.eq). Dmitry also
hypothesized that branch prediction may not perform as well with the
patch. That seems less a direct consequence of the patch than an
unfortunate side effect. The patch simplifies the IR so that fewer
instructions may need to be emitted, and that is not x86 specific. It
just happens that aarch64 doesn't seem able to take advantage of it,
but the patch doesn't increase the number of instructions aarch64 needs,
nor does it force aarch64 to use less efficient instructions. So overall,
I saw no good reason not to push this patch.

Roland.

From tobias.hartmann at oracle.com  Thu Sep 20 15:22:07 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 20 Sep 2018 11:22:07 -0400
Subject: [Patch] 8210853: C2 doesn't skip post barrier for new allocated
 objects
In-Reply-To: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com>
References: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com>
Message-ID: <a5f9f824-efdc-c04f-3325-999c8ab2d43a@oracle.com>

Hi,

isn't this code executed during parsing and therefore it could happen that more inputs are added to
the region? For example, by Parse::Block::add_new_path():
http://hg.openjdk.java.net/jdk/jdk/file/75e4ce0fa1ba/src/hotspot/share/opto/parse1.cpp#l1917

Best regards,
Tobias

On 18.09.2018 09:33, Kuai Wei wrote:
> 
> Hi,
> 
>   I made a patch for https://bugs.openjdk.java.net/browse/JDK-8210853. Could you help review my change?
>
> Background:
>   C2 can remove the G1 post barrier for a store to a newly allocated object. But the
> just_allocated_object check is defeated by a Region node that is created when the
> superclass initializer is inlined. The change checks for this pattern and skips the Region node.
> 
> src/hotspot/share/opto/graphKit.cpp
> 
>  // We use this to determine if an object is so "fresh" that
>  // it does not require card marks.
>  Node* GraphKit::just_allocated_object(Node* current_control) {
> -  if (C->recent_alloc_ctl() == current_control)
> +  Node * ctrl = current_control;
> +  // Object::<init> is invoked after allocation, most of invoke nodes
> +  // will be reduced, but a region node is kept in parse time, we check
> +  // the pattern and skip the region node
> +  if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
> +    ctrl = ctrl->in(1);
> +  }
> +  if (C->recent_alloc_ctl() == ctrl)
>      return C->recent_alloc_obj();
>    return NULL;
>  }
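A toy C++ model of the proposed check may make Tobias's question concrete. ToyNode, is_just_allocated and the node wiring are hypothetical simplifications, not C2's classes. The only detail carried over is that slot 0 of a Region refers to the Region itself, so req() == 2 means a single real control input.

The check looks at the Region as it stands at that moment during parsing. If another path is added to the same Region later (Tobias points at Parse::Block::add_new_path()), the earlier answer may no longer hold.

```cpp
#include <cassert>
#include <vector>

// Toy model of C2 control nodes (hypothetical; not HotSpot's Node class).
// As in C2, slot 0 of a Region points back at the Region itself, so a
// Region with req() == 2 has exactly one real control input: in(1).
struct ToyNode {
  bool is_region = false;
  std::vector<ToyNode*> inputs;

  unsigned req() const { return static_cast<unsigned>(inputs.size()); }
  ToyNode* in(unsigned i) const { return inputs[i]; }
};

// Mirrors the proposed check: look through a single-input Region before
// comparing with the control recorded for the most recent allocation.
bool is_just_allocated(ToyNode* current_control, ToyNode* recent_alloc_ctl) {
  ToyNode* ctrl = current_control;
  if (ctrl != nullptr && ctrl->is_region && ctrl->req() == 2) {
    ctrl = ctrl->in(1);
  }
  return ctrl == recent_alloc_ctl;
}
```

Wiring a Region to a single input coming from the allocation's control makes the check succeed. Pushing a second input onto the same Region makes it fail, and a barrier decision taken before that second input appeared would not see the change.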

From tobias.hartmann at oracle.com  Thu Sep 20 15:29:26 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 20 Sep 2018 11:29:26 -0400
Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize
 allocations in C2"
In-Reply-To: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com>
References: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com>
Message-ID: <12b325bd-f00e-38ff-b357-f9f4c82d3f69@oracle.com>

Hi Aleksey,

looks good to me too.

Best regards,
Tobias

On 20.09.2018 10:18, Aleksey Shipilev wrote:
> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8210963
> 
> Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx
> inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems
> easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0"
> comparisons in them.
> 
> Fix:
> 
> diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 08:11:21 2018 -0400
> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 15:49:02 2018 +0200
> @@ -27,2 +27,3 @@
>  #include "opto/arraycopynode.hpp"
> +#include "opto/convertnode.hpp"
>  #include "opto/graphKit.hpp"
> diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
> --- a/src/hotspot/share/opto/macro.cpp	Thu Sep 20 08:11:21 2018 -0400
> +++ b/src/hotspot/share/opto/macro.cpp	Thu Sep 20 15:49:02 2018 +0200
> @@ -1729,3 +1729,3 @@
> 
> -      for ( uint i = 0; i < lines; i++ ) {
> +      for ( intx i = 0; i < lines; i++ ) {
>          prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
> @@ -1782,3 +1782,3 @@
>        distance = step_size;
> -      for ( uint i = 1; i < lines; i++ ) {
> +      for ( intx i = 1; i < lines; i++ ) {
>          prefetch_adr = new AddPNode( cache_adr, cache_adr,
> @@ -1798,3 +1798,3 @@
>        uint distance = AllocatePrefetchDistance;
> -      for ( uint i = 0; i < lines; i++ ) {
> +      for ( intx i = 0; i < lines; i++ ) {
>          prefetch_adr = new AddPNode( old_eden_top, new_eden_top,
> 
> 
> Testing: x86_64, x86_32, armhf builds
> 
> Thanks,
> -Aleksey
> 
> 
> 

From shade at redhat.com  Thu Sep 20 15:30:50 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 20 Sep 2018 17:30:50 +0200
Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize
 allocations in C2"
In-Reply-To: <12b325bd-f00e-38ff-b357-f9f4c82d3f69@oracle.com>
References: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com>
 <12b325bd-f00e-38ff-b357-f9f4c82d3f69@oracle.com>
Message-ID: <412e47a7-174c-94e0-4dd8-29aedcfd8fbc@redhat.com>

Thanks. Trivial? Can I push without jdk-submit?

-Aleksey

On 09/20/2018 05:29 PM, Tobias Hartmann wrote:
> Hi Aleksey,
> 
> looks good to me too.
> 
> Best regards,
> Tobias
> 
> On 20.09.2018 10:18, Aleksey Shipilev wrote:
>> Bug:
>>   https://bugs.openjdk.java.net/browse/JDK-8210963
>>
>> Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx
>> inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems
>> easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0"
>> comparisons in them.
>>
>> Fix:
>>
>> diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
>> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 08:11:21 2018 -0400
>> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 15:49:02 2018 +0200
>> @@ -27,2 +27,3 @@
>>  #include "opto/arraycopynode.hpp"
>> +#include "opto/convertnode.hpp"
>>  #include "opto/graphKit.hpp"
>> diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
>> --- a/src/hotspot/share/opto/macro.cpp	Thu Sep 20 08:11:21 2018 -0400
>> +++ b/src/hotspot/share/opto/macro.cpp	Thu Sep 20 15:49:02 2018 +0200
>> @@ -1729,3 +1729,3 @@
>>
>> -      for ( uint i = 0; i < lines; i++ ) {
>> +      for ( intx i = 0; i < lines; i++ ) {
>>          prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
>> @@ -1782,3 +1782,3 @@
>>        distance = step_size;
>> -      for ( uint i = 1; i < lines; i++ ) {
>> +      for ( intx i = 1; i < lines; i++ ) {
>>          prefetch_adr = new AddPNode( cache_adr, cache_adr,
>> @@ -1798,3 +1798,3 @@
>>        uint distance = AllocatePrefetchDistance;
>> -      for ( uint i = 0; i < lines; i++ ) {
>> +      for ( intx i = 0; i < lines; i++ ) {
>>          prefetch_adr = new AddPNode( old_eden_top, new_eden_top,
>>
>>
>> Testing: x86_64, x86_32, armhf builds
>>
>> Thanks,
>> -Aleksey
>>
>>
>>



From tobias.hartmann at oracle.com  Thu Sep 20 16:16:43 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 20 Sep 2018 12:16:43 -0400
Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize
 allocations in C2"
In-Reply-To: <412e47a7-174c-94e0-4dd8-29aedcfd8fbc@redhat.com>
References: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com>
 <12b325bd-f00e-38ff-b357-f9f4c82d3f69@oracle.com>
 <412e47a7-174c-94e0-4dd8-29aedcfd8fbc@redhat.com>
Message-ID: <ec0ee80b-9a09-14d7-48e9-2200f2560049@oracle.com>

Yes.

Best regards,
Tobias

On 20.09.2018 11:30, Aleksey Shipilev wrote:
> Thanks. Trivial? Can I push without jdk-submit?
> 
> -Aleksey
> 
> On 09/20/2018 05:29 PM, Tobias Hartmann wrote:
>> Hi Aleksey,
>>
>> looks good to me too.
>>
>> Best regards,
>> Tobias
>>
>> On 20.09.2018 10:18, Aleksey Shipilev wrote:
>>> Bug:
>>>   https://bugs.openjdk.java.net/browse/JDK-8210963
>>>
>>> Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx
>>> inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems
>>> easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0"
>>> comparisons in them.
>>>
>>> Fix:
>>>
>>> diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
>>> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 08:11:21 2018 -0400
>>> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp	Thu Sep 20 15:49:02 2018 +0200
>>> @@ -27,2 +27,3 @@
>>>  #include "opto/arraycopynode.hpp"
>>> +#include "opto/convertnode.hpp"
>>>  #include "opto/graphKit.hpp"
>>> diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
>>> --- a/src/hotspot/share/opto/macro.cpp	Thu Sep 20 08:11:21 2018 -0400
>>> +++ b/src/hotspot/share/opto/macro.cpp	Thu Sep 20 15:49:02 2018 +0200
>>> @@ -1729,3 +1729,3 @@
>>>
>>> -      for ( uint i = 0; i < lines; i++ ) {
>>> +      for ( intx i = 0; i < lines; i++ ) {
>>>          prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
>>> @@ -1782,3 +1782,3 @@
>>>        distance = step_size;
>>> -      for ( uint i = 1; i < lines; i++ ) {
>>> +      for ( intx i = 1; i < lines; i++ ) {
>>>          prefetch_adr = new AddPNode( cache_adr, cache_adr,
>>> @@ -1798,3 +1798,3 @@
>>>        uint distance = AllocatePrefetchDistance;
>>> -      for ( uint i = 0; i < lines; i++ ) {
>>> +      for ( intx i = 0; i < lines; i++ ) {
>>>          prefetch_adr = new AddPNode( old_eden_top, new_eden_top,
>>>
>>>
>>> Testing: x86_64, x86_32, armhf builds
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>
>>>
> 
> 

From jonathan.halliday at redhat.com  Thu Sep 20 16:17:33 2018
From: jonathan.halliday at redhat.com (Jonathan Halliday)
Date: Thu, 20 Sep 2018 17:17:33 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
Message-ID: <e503c1df-c68c-b7a6-77c2-ea9ad36c6b30@redhat.com>


Hi Alan

I'm a middleware engineer (transaction engine, message queues, etc) and 
I evolved the current API design whilst making some of Red Hat's Jakarta 
EE stack work with persistent memory. It's a good fit for our needs 
because it pretty much matches the way we currently do off-heap and 
persistent storage, so porting existing code is a breeze. For any 
'make this bunch of bytes persistent' use case there isn't really a need 
for a complex API. We're not trying to pass data structures to and 
fro as we would when calling a richer C library. The serialization layer 
takes care of flattening all the structures to an opaque byte[] or 
ByteBuffer already. We just need to be able to reason about the 
persistence guarantees the same way we can with the existing sync() 
call. We already take care of the threading, since existing storage 
solutions wouldn't work without those safeguards anyhow. So, there are 
certainly some use cases for which the current API is a good fit, 
because those are the ones I designed it for, based on code that already 
uses and copes with the limitations of MappedByteBuffer.

However... There are cases where we may want to get further 
optimizations by eliding the serialization to byte[]/ByteBuffer and 
instead be able to access persistent memory *as objects*. That's a 
harder problem and may involve language integration rather than just API 
changes, for example being able to allocate an object whose state 
(primitive fields, perhaps also object pointers) is backed by an 
(optionally explicitly specified) area of pmem. It's definitely a more 
powerful model, but also a much bigger problem to chew on.

Some halfway solution in which we can use Java objects to point into 
specific areas of memory in a typesafe way (e.g. 'that pmem address 
should be considered an int') would seem to be something that Panama 
could overlap with, but it's a convenience layer that could also be 
modelled by putting higher level abstractions over the proposed low 
level API. Over time we may have e.g. PersistentLong in the same way 
that today we have AtomicLong, but it's something that could be tested 
out in a 3rd party library initially and then migrated into the standard 
library if it's shown to be useful.

Is the proposed API sufficient for all use cases? Probably not. But it's 
useful for some and, so far as I can tell, non-harmful to others. Under 
the new release model what we have now is useful in its own right and 
should ship sooner rather than later, with additional functionality 
following later in a modular, agile fashion? I don't really see 
sufficient advantage in holding this pending e.g. investigation of 
integration with Panama, though that's definitely an interesting avenue 
for future work.

Regards

Jonathan

On 10/09/2018 19:05, Alan Bateman wrote:
...
> I realize we've been through several iterations on this but I'm now 
> wondering if the MappedByteBuffer is the right API. As you've shown, 
> it's straightforward to map a region of NVM and use the existing API, 
> I'm just not sure if it's the right API. I think I'd like to see a few 
> examples of how the API might be used. ByteBuffers aren't intended for 
> use by concurrent threads and I just wonder if the examples might need 
> that. I also wonder if there is a possible connection with work in 
> Project Panama and whether it's worth exploring if its scopes and 
> pointers could be backed by NVM. The Risks and Assumptions 
> section mentions the 2GB limit which is another reminder that the MBB 
> API may not be the right API.


-- 
Registered in England and Wales under Company Registration No. 03798903 
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From vladimir.kozlov at oracle.com  Thu Sep 20 17:09:02 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 20 Sep 2018 10:09:02 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <f45b8b3e-182b-c1d3-e8c7-a33f29e6bd43@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
 <f45b8b3e-182b-c1d3-e8c7-a33f29e6bd43@oracle.com>
Message-ID: <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>

I hit build failure on SPARC due to shared changes in C1:

workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*, 
LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)".
jib > 1 Error(s) detected.

I assume other platforms are also affected.

Vladimir
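The SPARC error is the usual symptom of a shared member-function declaration gaining a parameter while one platform's out-of-line definition keeps the old signature. A minimal C++ sketch of the pattern follows. Opr, kIllegalOpr and LirAssemblerSketch are hypothetical stand-ins for C1's LIR types. Giving the new temp operand a default value is one assumed way to keep existing call sites unchanged, not necessarily what the final patch does.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for C1's operand and assembler types;
// the names are illustrative and do not match HotSpot's real classes.
struct Opr {
  int  value;
  bool valid;
};

const Opr kIllegalOpr = {0, false};

class LirAssemblerSketch {
 public:
  // Shared declaration after the change: a temp operand was added.
  // The default argument keeps existing two-operand call sites compiling.
  int negate(Opr left, Opr dest, Opr tmp = kIllegalOpr);
};

// Every platform file must define the full three-parameter signature.
// A stale definition such as `int LirAssemblerSketch::negate(Opr, Opr)`
// matches no declaration and is rejected, as in the SPARC build.
int LirAssemblerSketch::negate(Opr left, Opr /*dest*/, Opr tmp) {
  (void)tmp;  // this "port" needs no temp register, so it ignores it
  return -left.value;
}
```

With the default argument, callers that do not need a temp can keep passing two operands. The definitions in each cpu/*/c1_LIRAssembler_*.cpp file would still all have to be updated together.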

On 9/19/18 9:53 AM, Vladimir Kozlov wrote:
> Thank you, Sandhya
> 
> I submitted new testing.
> 
> Vladimir
> 
> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find below the updated webrev with fixes for the two issues:
>>
>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/ 
>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.03/>
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS as the temporary register type for intrinsics 
>> instead of legVecD.
>>
>> This test was only failing with -XX:MaxVectorSize=4.
>>
>> The file modified is x86_64.ad.
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to allow all xmm registers (xmm0-xmm31) for C1 and handle 
>> floating point abs and negate appropriately by providing a temp register.
>>
>> The C1 files are modified for this fix.
>>
>> I reran the compiler jtreg tests with nativepath set appropriately on Haswell, SKX and KNL.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:*Viswanathan, Sandhya
>> *Sent:* Tuesday, September 18, 2018 1:47 PM
>> *To:* 'JC Beyler' <jcbeyler at google.com>
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Jc,
>>
>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:*JC Beyler [mailto:jcbeyler at google.com]
>> *Sent:* Monday, September 17, 2018 9:29 PM
>> *To:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>
>> *Cc:* vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>; hotspot-compiler-dev 
>> <hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>>
>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Sandhya,
>>
>> How are you invoking the test for NativeCallTest?
>>
>> The way I would do it using jtreg would be something like this:
>>
>> $ export BUILD_TYPE=release
>>
>> $ export JDK_PATH=wherever you have your JDK
>>
>> From the test subfolder:
>>
>> $ wherever-your-jtreg-is/bin/jtreg \
>>     -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib \
>>     -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk \
>>     hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
>>
>> Seems to pass for me.
>>
>> But much easier is:
>>
>> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"
>>
>> That seems to pass for me as well and is easier to use :)
>>
>> For information, the make run-test documentation is here:
>>
>> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html
>>
>> Let me know if that helps,
>>
>> Jc
>>
>>     Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
>>         "test result: Error. Use -nativepath to specify the location of native code"
>>     Do I need to give any additional info to jtreg to get over this problem?
>>
>>     Thanks a lot!
>>     Best Regards,
>>     Sandhya
>>
>>     -----Original Message-----
>>     From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>]
>>     Sent: Monday, September 17, 2018 10:14 AM
>>     To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>;
>>     hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>     Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>>     I finished testing on the avx512 machine.
>>     All passed except the known (TestNaNVector.java) failures.
>>
>>     Thanks,
>>     Vladimir
>>
>>     On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
>>     > I gave an incorrect link to the RFE. Here is the correct one:
>>     >
>>     > https://bugs.openjdk.java.net/browse/JDK-8210764
>>     >
>>     > Vladimir
>>     >
>>     > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>>     >> I got the build failure on MacOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.
>>     >>
>>     >> Anyway, I tested with these changes and got the following failures on avx1 machines. I am planning to run on avx512 too.
>>     >>
>>     >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU with AVX1 only
>>     >>
>>     >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>>     >> # Problematic frame:
>>     >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>     >>
>>     >> Current CompileTask:
>>     >> C2:    154    5             java.lang.String::equals (65 bytes)
>>     >>
>>     >> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>     >> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>     >> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>>     >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>>     >> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>>     >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>>     >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>     >>
>>     >> ------------------------------------------------------------------------------------------------
>>     >> 2. with '-Xcomp'
>>     >> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>>     >> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>     >>
>>     >> Current CompileTask:
>>     >> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>>     >>
>>     >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>     >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>>     >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>>     >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>>     >> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>>     >> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>>     >> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>>     >> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>>     >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>>     >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>>     >> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>>     >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>>     >> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>>     >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>>     >> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>     >>
>>     >> Vladimir
>>     >>
>>     >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>     >>>
>>     >>> Thanks Vladimir, the change below should fix this issue:
>>     >>>
>>     >>> ------------------------------
>>     >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>     >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>     >>> @@ -233,22 +233,6 @@
>>     >>>    _xmm_regs[13]  = xmm13;
>>     >>>    _xmm_regs[14]  = xmm14;
>>     >>>    _xmm_regs[15]  = xmm15;
>>     >>> -  _xmm_regs[16]  = xmm16;
>>     >>> -  _xmm_regs[17]  = xmm17;
>>     >>> -  _xmm_regs[18]  = xmm18;
>>     >>> -  _xmm_regs[19]  = xmm19;
>>     >>> -  _xmm_regs[20]  = xmm20;
>>     >>> -  _xmm_regs[21]  = xmm21;
>>     >>> -  _xmm_regs[22]  = xmm22;
>>     >>> -  _xmm_regs[23]  = xmm23;
>>     >>> -  _xmm_regs[24]  = xmm24;
>>     >>> -  _xmm_regs[25]  = xmm25;
>>     >>> -  _xmm_regs[26]  = xmm26;
>>     >>> -  _xmm_regs[27]  = xmm27;
>>     >>> -  _xmm_regs[28]  = xmm28;
>>     >>> -  _xmm_regs[29]  = xmm29;
>>     >>> -  _xmm_regs[30]  = xmm30;
>>     >>> -  _xmm_regs[31]  = xmm31;
>>     >>>  #endif // _LP64
>>     >>>
>>     >>>    for (int i = 0; i < 8; i++) {
>>     >>> ---------------------------------
>>     >>>
>>     >>> I think the gcc version on my desktop is older, so it didn't catch this.
>>     >>>
>>     >>> The updated patch, with the above change and UseAVX set to 3, is uploaded to:
>>     >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/ <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.02/>
>>     >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>     >>>
>>     >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before
>>     >>> changing it back to 3.
>>     >>>
>>     >>> Best Regards,
>>     >>> Sandhya
>>     >>>
>>     >>>
>>     >>> -----Original Message-----
>>     >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>]
>>     >>> Sent: Friday, September 14, 2018 12:13 PM
>>     >>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>; hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>     >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>
>>     >>> I got a build failure:
>>     >>>
>>     >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array
>>     >>> (which contains 16 elements) [-Werror,-Warray-bounds]
>>     >>> jib >   _xmm_regs[16]  = xmm16;
>>     >>>
>>     >>> I also noticed that we don't have an RFE for this work. I filed:
>>     >>>
>>     >>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>     >>>
>>     >>> You did not enable avx512 by default (the 8209735 change was synced from jdk 11 into 12 two weeks ago). I added the following
>>     >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>     >>>
>>     >>> - product(intx, UseAVX, 2, \
>>     >>> + product(intx, UseAVX, 3, \
>>     >>>
>>     >>> Thanks,
>>     >>> Vladimir
>>     >>>
>>     >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>     >>>> Looks good to me. I will start testing and let you know the results.
>>     >>>>
>>     >>>> Thanks,
>>     >>>> Vladimir
>>     >>>>
>>     >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>     >>>>> Hi Vladimir,
>>     >>>>>
>>     >>>>> Please find below the updated webrev with all your comments incorporated:
>>     >>>>>
>>     >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.01/>
>>     >>>>>
>>     >>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>>     >>>>> different flavors of AVX512, and on Haswell, a non-AVX512 system. I also tested SPECjvm2008 on the three platforms.
>>     >>>>>
>>     >>>>> Best Regards,
>>     >>>>> Sandhya
>>     >>>>>
>>     >>>>>
>>     >>>>> -----Original Message-----
>>     >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>]
>>     >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>     >>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>;
>>     >>>>> hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>     >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>>>
>>     >>>>> Thank you, Sandhya
>>     >>>>>
>>     >>>>> I am satisfied with your detailed answer about the memory load issues. Okay, let's not add them.
>>     >>>>>
>>     >>>>> Vladimir
>>     >>>>>
>>     >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>     >>>>>> Hi Vladimir,
>>     >>>>>>
>>     >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>     >>>>>> Please see my responses in your email below, marked with (Sandhya >>>). Looking forward to your advice.
>>     >>>>>>
>>     >>>>>> Best Regards,
>>     >>>>>> Sandhya
>>     >>>>>>
>>     >>>>>>
>>     >>>>>> -----Original Message-----
>>     >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>]
>>     >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>     >>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>;
>>     >>>>>> hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>     >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>>>>
>>     >>>>>> Thank you.
>>     >>>>>>
>>     >>>>>> I want to discuss the next issue:
>>     >>>>>>
>>     >>>>>>     > You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>     >>>>>>     >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>     >>>>>>
>>     >>>>>> This is what I thought. You increase register pressure this way, which may cause spills to the stack.
>>     >>>>>> Also, since we don't check whether the register could be the same as the result, you may get unneeded moves.
>>     >>>>>>
>>     >>>>>> I would advise adding memory moves at least.
>>     >>>>>>
>>     >>>>>> Sandhya >>>  I had added those rules initially and removed them in the final patch. I noticed that the
>>     >>>>>> register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask
>>     >>>>>> (matcher.cpp). I would like the register allocator to get all the possible registers on an
>>     >>>>>> architecture for the idealreg2reg mask. I was concerned that multiple instruct rules in the .ad
>>     >>>>>> file for LoadF from memory might cause problems. I would have to assign a higher cost to loading
>>     >>>>>> into a restricted register set like vlReg. Then I decided that the register allocator can handle
>>     >>>>>> this in a much better way than me adding rules to load from memory. This is with the background
>>     >>>>>> that regF is always all the available registers and vlRegF is the restricted register set.
>>     >>>>>> Likewise for VecS and legVecS. Let me know your thoughts on this and if I should still add the
>>     >>>>>> rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am
>>     >>>>>> referring to is:
>>     >>>>>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>     >>>>>> #endif
>>     >>>>>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>     >>>>>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>     >>>>>>     MachNode *spillF  = match_tree(new
>> ???? >>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>> ???? >>>>>> ???? MachNode *spillD? = match_tree(new
>> ???? >>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>> ???? >>>>>> ???? MachNode *spillP? = match_tree(new
>> ???? >>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>> ???? >>>>>> ???? ....
>> ???? >>>>>> ???? idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>> ???? >>>>>>
> >>     >>>>>> Another question. You use movflt() and movdbl(), which use either
> >>     >>>>>> movap[s|d] or movs[s|d] instructions:
> >>     >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
> >>     >>>>>> Do these instructions work when avx512vl is not available? I see for vectors you use
> >>     >>>>>> the vpxor+vinserti* combination.
> >>     >>>>>>
> >>     >>>>>> Sandhya >>> Yes, the scalar floating point instructions are available
> >>     >>>>>> with AVX512 encoding when avx512vl is not available. That is why you
> >>     >>>>>> would see not just movflt and movdbl but all the other scalar
> >>     >>>>>> operations like addss, addsd etc. using the entire xmm range (xmm0-31). In other words, they are AVX512F instructions.
> >>     >>>>>>
> >>     >>>>>> Last question. I notice the following UseAVX check in the vector spill code in x86.ad:
> >>     >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
> >>     >>>>>>
> >>     >>>>>> Should it be (UseAVX < 3)?
> >>     >>>>>>
> >>     >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
> >>     >>>>>>
> >>     >>>>>> Thanks,
> >>     >>>>>> Vladimir
> >>     >>>>>>
> >>     >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
> >>     >>>>>>> Hi Vladimir,
> >>     >>>>>>>
> >>     >>>>>>> Thanks a lot for your review and feedback. Please see my response
> >>     >>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
> >>     >>>>>>>
> >>     >>>>>>> Best Regards,
> >>     >>>>>>> Sandhya
> >>     >>>>>>>
> >>     >>>>>>>
> >>     >>>>>>> -----Original Message-----
> >>     >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>]
> >>     >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
> >>     >>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>;
> >>     >>>>>>> hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
> >>     >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> >>     >>>>>>>
> >>     >>>>>>> Very nice. Thank you, Sandhya.
> >>     >>>>>>>
> >>     >>>>>>> I would like to see more meaningful naming in .ad files: instead
> >>     >>>>>>> of rreg* and ovec*, something like vlReg* and legVec*.
> >>     >>>>>>>
> >>     >>>>>>>>>> Yes, accepted.
> >>     >>>>>>>
> >>     >>>>>>> New load_from_* and load_to_* instructions in .ad files should be
> >>     >>>>>>> renamed as follows and placed together with the other Move*_reg_reg* instructions:
> >>     >>>>>>>
> >>     >>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
> >>     >>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
> >>     >>>>>>>>>> Yes, accepted.
> >>     >>>>>>>
> >>     >>>>>>> You did not add instructions to load these registers from memory
> >>     >>>>>>> (and stack). What happens in such cases when you need to load or store?
> >>     >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it
> >>     >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa.
> >>     >>>>>>>
> >>     >>>>>>> Also please explain why these registers are used when UseAVX == 0:
> >>     >>>>>>>
> >>     >>>>>>> +instruct absD_reg(rregD dst) %{
> >>     >>>>>>>     predicate((UseSSE>=2) && (UseAVX == 0));
> >>     >>>>>>>
> >>     >>>>>>> we switch off evex, so regular regD (only legacy registers in this case) should work too:
> >>     >>>>>>>   if (UseAVX < 3) {
> >>     >>>>>>>     _features &= ~CPU_AVX512F;
> >>     >>>>>>>
> >>     >>>>>>>>>> Yes, accepted. It could be regD here.
> >>     >>>>>>>
> >>     >>>>>>> The next checks could be combined by using a new function in vm_version_x86.hpp (you already have some):
> >>     >>>>>>>
> >>     >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{
> >>     >>>>>>>   VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
> >>     >>>>>>>   VM_Version::supports_avx512dq() &&
> >>     >>>>>>>   VM_Version::supports_avx512vl() %} );
> >>     >>>>>>>
> >>     >>>>>>>>>> Yes, accepted.
> >>     >>>>>>>
> >>     >>>>>>> I would suggest testing these changes on different machines
> >>     >>>>>>> (non-avx512 and avx512) and with different UseAVX values.
> >>     >>>>>>>
> >>     >>>>>>>>>> Will do.
> >>     >>>>>>>
> >>     >>>>>>> Thanks,
> >>     >>>>>>> Vladimir
> >>     >>>>>>>
> >>     >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
> >>     >>>>>>>> Recently there have been a couple of high priority issues with
> >>     >>>>>>>> regard to C2's usage of the high bank of XMM registers
> >>     >>>>>>>> (XMM16-XMM31):
> >>     >>>>>>>>
> >>     >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
> >>     >>>>>>>>
> >>     >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
> >>     >>>>>>>>
> >>     >>>>>>>> Please find below a patch which attempts to clean up the XMM
> >>     >>>>>>>> register handling by using register groups.
> >>     >>>>>>>>
> >>     >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
> >>     >>>>>>>>
> >>     >>>>>>>> The patch provides a restricted set of registers to the match
> >>     >>>>>>>> rules in the ad file based on the underlying architecture.
> >>     >>>>>>>>
> >>     >>>>>>>> The aim is to remove special handling/workarounds from the macro assembler and assembler.
> >>     >>>>>>>>
> >>     >>>>>>>> By removing the special handling, the patch reduces the overall
> >>     >>>>>>>> code size by about 1800 lines of code.
> >>     >>>>>>>>
> >>     >>>>>>>> Your review and feedback is very welcome.
> >>     >>>>>>>>
> >>     >>>>>>>> Best Regards,
> >>     >>>>>>>>
> >>     >>>>>>>> Sandhya
> >>     >>>>>>>>
>>
>>
>> -- 
>>
>> Thanks,
>>
>> Jc
>>

From vladimir.kozlov at oracle.com  Thu Sep 20 17:12:46 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 20 Sep 2018 10:12:46 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
 <f45b8b3e-182b-c1d3-e8c7-a33f29e6bd43@oracle.com>
 <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>
Message-ID: <8facd4a1-f93a-4f63-daa7-34ad993a556b@oracle.com>

Sandhya, you can use the jdk submit repo to test the build on other Oracle platforms (x64 and SPARC only, no 32-bit):

https://wiki.openjdk.java.net/display/Build/Submit+Repo

Vladimir

On 9/20/18 10:09 AM, Vladimir Kozlov wrote:
> I hit a build failure on SPARC due to the shared changes in C1:
> 
> workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)".
> jib > 1 Error(s) detected.
> 
> I assume other platforms are also affected.
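The error is a plain C++ declaration mismatch: the shared C1 header now declares negate() with three operands, but c1_LIRAssembler_sparc.cpp still defines the two-operand version. A minimal sketch, assuming a defaulted trailing temp parameter: the default helps call sites, but it does not help the platform definitions, each of which must still be written with all three parameters. Opr, kIllegalOpr and AssemblerSketch are hypothetical stand-ins, and whether the final patch uses a default argument is not stated in this thread.

```cpp
#include <cassert>

// Hypothetical operand type standing in for LIR_Opr.
struct Opr {
  int id;
  bool is_valid() const { return id >= 0; }
};

// Hypothetical "no operand" sentinel, in the spirit of an illegal operand.
constexpr Opr kIllegalOpr{-1};

class AssemblerSketch {
 public:
  // Shared declaration: the temp defaults to the sentinel, so callers that
  // do not need a temp register can still pass two operands.
  bool negate(Opr left, Opr dest, Opr tmp = kIllegalOpr);
};

// Every platform's definition must use the full three-parameter signature;
// the default argument appears only on the declaration. A two-parameter
// definition here would be rejected just like the SPARC one above.
// Returns whether a temp register was supplied (e.g. to hold a sign mask).
bool AssemblerSketch::negate(Opr left, Opr dest, Opr tmp) {
  (void)left;
  (void)dest;
  return tmp.is_valid();
}
```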
> 
> Vladimir
> 
> On 9/19/18 9:53 AM, Vladimir Kozlov wrote:
>> Thank you, Sandhya
>>
>> I submitted new testing.
>>
>> Vladimir
>>
>> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> Please find below the updated webrev with fixes for the two issues:
>>>
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
>>>
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>>>
>>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS as the temporary register type for intrinsics instead of legVecD.
>>>
>>> This test was only failing with -XX:MaxVectorSize=4.
>>>
>>> The file modified is x86_64.ad.
>>>
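The MaxVectorSize=4 detail fits a simple size-class picture: C2's vector register kinds are tied to vector width (VecS = 4 bytes, VecD = 8, VecX = 16, VecY = 32, VecZ = 64), so with MaxVectorSize=4 only the 4-byte kind is in play and a temp typed as the 8-byte kind does not correspond to any enabled vector size. The sketch below models only that mapping, under assumptions; VecKind, vec_kind_for_bytes and temp_kind_available are hypothetical names, and how C2 actually reacts to the unsupported kind (the thread only shows a crash in MachNode::ideal_reg()) is not modeled.

```cpp
#include <cassert>

// Hypothetical model of C2's vector ideal register kinds.
enum class VecKind { None, VecS, VecD, VecX, VecY, VecZ };

// Map a vector size in bytes to its register kind.
VecKind vec_kind_for_bytes(int bytes) {
  switch (bytes) {
    case 4:  return VecKind::VecS;
    case 8:  return VecKind::VecD;
    case 16: return VecKind::VecX;
    case 32: return VecKind::VecY;
    case 64: return VecKind::VecZ;
    default: return VecKind::None;
  }
}

// A temp kind is only meaningful if vectors of that size are enabled,
// i.e. its width does not exceed MaxVectorSize.
bool temp_kind_available(VecKind kind, int max_vector_size) {
  for (int bytes = 4; bytes <= max_vector_size; bytes *= 2) {
    if (vec_kind_for_bytes(bytes) == kind) {
      return true;
    }
  }
  return false;
}
```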
>>> Fix for the second issue (the C1 failure) was to allow all xmm registers (xmm0-xmm31) for C1 and to handle floating point abs and negate appropriately by providing a temp register.
>>>
>>> The C1 files are modified for this fix.
>>>
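Why a temp register helps here: SSE/AVX has no dedicated floating-point negate or abs instruction, so these are usually done by XORing with a sign-bit mask (negate) or ANDing with its complement (abs), and that mask has to live in an XMM register or a memory operand. A portable sketch of the same bit manipulation, assuming IEEE-754 binary32 floats; negate_bits and abs_bits are illustrative names, not C1 functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// IEEE-754 binary32: bit 31 is the sign bit.
constexpr std::uint32_t kFloatSignMask = 0x80000000u;

std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

float bits_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Negate: flip the sign bit (the effect of XORing with a sign-flip mask).
float negate_bits(float f) {
  return bits_float(float_bits(f) ^ kFloatSignMask);
}

// Abs: clear the sign bit (the effect of ANDing with the inverted mask).
float abs_bits(float f) {
  return bits_float(float_bits(f) & ~kFloatSignMask);
}
```

Note that negating 0.0f this way yields -0.0f, which is the Java-correct result and one reason the sign-bit approach is preferred over computing 0 - x.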
>>> I reran the compiler jtreg tests, with nativepath set appropriately, on Haswell, SKX and KNL.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>
>>> *From:* Viswanathan, Sandhya
>>> *Sent:* Tuesday, September 18, 2018 1:47 PM
>>> *To:* 'JC Beyler' <jcbeyler at google.com>
>>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
>>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> Hi Jc,
>>>
>>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>
>>> *From:* JC Beyler [mailto:jcbeyler at google.com]
>>> *Sent:* Monday, September 17, 2018 9:29 PM
>>> *To:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>
>>> *Cc:* vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>>
>>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> Hi Sandhya,
>>>
>>> How are you invoking the test for NativeCallTest?
>>>
>>> The way I would do it using jtreg would be something like this:
>>>
>>> $ export BUILD_TYPE=release
>>>
>>> $ export JDK_PATH=/path/to/your/jdk/repository
>>>
>>> From the test subfolder:
>>>
>>> $ wherever-your-jtreg-is/bin/jtreg \
>>>     -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib \
>>>     -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk \
>>>     hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
>>>
>>> Seems to pass for me.
>>>
>>> But much easier is:
>>>
>>> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"
>>>
>>> That seems to pass for me as well and is easier to use :)
>>>
>>> For information, the make run-test documentation is here:
>>>
>>> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html
>>>
>>> Let me know if that helps,
>>>
>>> Jc
>>>
>>>     Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
>>>         "test result: Error. Use -nativepath to specify the location of native code"
>>>     Do I need to give any additional info to jtreg to get over this problem?
>>>
>>>     Thanks a lot!
>>>     Best Regards,
>>>     Sandhya
>>>
>>>     -----Original Message-----
>>>     From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>]
>>>     Sent: Monday, September 17, 2018 10:14 AM
>>>     To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>;
>>>     hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>     Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>>     I finished testing on an avx512 machine.
>>>     All passed except the known (TestNaNVector.java) failures.
>>>
>>>     Thanks,
>>>     Vladimir
>>>
>>>     On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
>>>     > I gave an incorrect link to the RFE. Here is the correct one:
>>>     >
>>>     > https://bugs.openjdk.java.net/browse/JDK-8210764
>>>     >
>>>     > Vladimir
>>>     >
>>>     > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>>>     >> I got the build failure on MacOS and Windows. Linux passed for some reason, so I am not surprised you did not notice.
>>>     >>
>>>     >> Anyway, I tested with these changes and got the following failures on avx1 machines. I am planning to run on avx512 too.
>>>     >>
>>>     >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on a CPU with AVX1 only
>>>     >>
>>>     >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>>>     >> # Problematic frame:
>>>     >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>>     >>
>>>     >> Current CompileTask:
>>>     >> C2:    154    5             java.lang.String::equals (65 bytes)
>>>     >>
>>>     >> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>>     >> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>>     >> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>>>     >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>>>     >> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>>>     >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>>>     >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>>     >>
>>>     >> ------------------------------------------------------------------------------------------------
>>>     >> 2.
>>>
>>>     with '-Xcomp'
>>>     >> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>>>     >> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>>     >>
>>>     >> Current CompileTask:
>>>     >> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>>>     >>
>>>     >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>>     >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>>>     >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>>>     >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>>>     >> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>>>     >> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>>>     >> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>>>     >> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>>>     >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>>>     >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>>>     >> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>>>     >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>>>     >> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>>>     >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>>>     >> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>>     >>
>>>     >> Vladimir
>>>     >>
>>>     >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>>     >>>
>>>     >>> Thanks Vladimir, the change below should fix this issue:
>>>     >>>
>>>     >>> ------------------------------
>>>     >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>>     >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>>     >>> @@ -233,22 +233,6 @@
>>>     >>>    _xmm_regs[13]  = xmm13;
>>>     >>>    _xmm_regs[14]  = xmm14;
>>>     >>>    _xmm_regs[15]  = xmm15;
>>>     >>> -  _xmm_regs[16]  = xmm16;
>>>     >>> -  _xmm_regs[17]  = xmm17;
>>>     >>> -  _xmm_regs[18]  = xmm18;
>>>     >>> -  _xmm_regs[19]  = xmm19;
>>>     >>> -  _xmm_regs[20]  = xmm20;
>>>     >>> -  _xmm_regs[21]  = xmm21;
>>>     >>> -  _xmm_regs[22]  = xmm22;
>>>     >>> -  _xmm_regs[23]  = xmm23;
>>>     >>> -  _xmm_regs[24]  = xmm24;
>>>     >>> -  _xmm_regs[25]  = xmm25;
>>>     >>> -  _xmm_regs[26]  = xmm26;
>>>     >>> -  _xmm_regs[27]  = xmm27;
>>>     >>> -  _xmm_regs[28]  = xmm28;
>>>     >>> -  _xmm_regs[29]  = xmm29;
>>>     >>> -  _xmm_regs[30]  = xmm30;
>>>     >>> -  _xmm_regs[31]  = xmm31;
>>>     >>>  #endif // _LP64
>>>     >>>
>>>     >>>    for (int i = 0; i < 8; i++) {
>>>     >>> ---------------------------------
>>>     >>>
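The -Warray-bounds error arises because the assignments to _xmm_regs[16] through _xmm_regs[31] write past an array declared with 16 elements. One defensive pattern, shown here as a hypothetical sketch rather than the real FrameMap code, is to derive every loop bound from the array itself so the register count and the stores cannot drift apart; XmmReg, kNumXmmRegs and FrameMapSketch are assumed names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical register handle; the real code stores XMMRegister values.
struct XmmReg {
  int encoding;
};

// Single source of truth for how many XMM registers this table tracks.
constexpr std::size_t kNumXmmRegs = 16;

class FrameMapSketch {
 public:
  FrameMapSketch() {
    // Bounded by the array's own size, so the loop can never write
    // past the end the way a hand-written _xmm_regs[16] = xmm16 did.
    for (std::size_t i = 0; i < xmm_regs_.size(); ++i) {
      xmm_regs_[i] = XmmReg{static_cast<int>(i)};
    }
  }

  // Bounds-checked access: throws std::out_of_range for a bad index.
  XmmReg xmm(std::size_t i) const { return xmm_regs_.at(i); }
  std::size_t count() const { return xmm_regs_.size(); }

 private:
  std::array<XmmReg, kNumXmmRegs> xmm_regs_{};
};
```

With this shape, enlarging the table later (for example to cover xmm0-xmm31) is a one-constant change rather than an edit that must be kept in sync with a separately declared array size.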
>>>     >>> I think the gcc version on my desktop is older, so it didn't catch this.
>>>     >>>
>>>     >>> The updated patch, with the above change and UseAVX set to 3, is uploaded to:
>>>     >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>>     >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>>     >>>
>>>     >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before changing it back to 3.
>>>     >>>
>>>     >>> Best Regards,
>>>     >>> Sandhya
>>>     >>>
>>>     >>>
>>>     >>> -----Original Message-----
>>>     >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>]
>>>     >>> Sent: Friday, September 14, 2018 12:13 PM
>>>     >>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com <mailto:sandhya.viswanathan at intel.com>>;
>>>     >>> hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>     >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>     >>>
>>>     >>> I got a build failure:
>>>     >>>
>>>     >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
>>>     >>> jib >   _xmm_regs[16]  = xmm16;
>>>     >>>
>>>     >>> I also noticed that we don't have an RFE for this work. I filed:
>>>     >>>
>>>     >>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>     >>>
>>>     >>> You did not enable avx512 by default (the 8209735 change was synced from jdk 11 into 12 two weeks ago). I added the following
>>>     >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>>     >>>
>>>     >>> - product(intx, UseAVX, 2, \
>>>     >>> + product(intx, UseAVX, 3, \
>>>     >>>
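Raising the default from 2 to 3 is reasonable on non-AVX-512 machines because the VM caps UseAVX at the highest level the CPU supports. A simplified sketch of that capping; CpuCaps, max_supported_avx and effective_use_avx are hypothetical names, and the real logic in vm_version_x86.cpp depends on further detection details that are not modeled here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical CPU capability summary.
struct CpuCaps {
  bool avx;
  bool avx2;
  bool avx512f;
};

// Highest AVX level the hardware supports (0 = none).
int max_supported_avx(const CpuCaps& c) {
  if (c.avx512f) return 3;
  if (c.avx2) return 2;
  if (c.avx) return 1;
  return 0;
}

// The flag value acts as an upper bound, never as a promise of support.
int effective_use_avx(int flag_value, const CpuCaps& c) {
  return std::min(flag_value, max_supported_avx(c));
}
```

For example, a Haswell-like CPU with the new default of 3 ends up at level 2, while an SKX-like CPU gets 3.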
>>>     >>> Thanks,
>>>     >>> Vladimir
>>>     >>>
>>>     >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>>     >>>> Looks good to me. I will start testing and let you know the results.
>>>     >>>>
>>>     >>>> Thanks,
>>>     >>>> Vladimir
>>>     >>>>
>>>     >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>>     >>>>> Hi Vladimir,
>>>     >>>>>
>>>     >>>>> Please find below the updated webrev with all your comments incorporated:
>>>     >>>>>
>>>     >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>     >>>>>
>>>     >>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>>>     >>>>> different flavors of AVX512, and on Haswell, a non-avx512 system. I also tested SPECjvm2008 on the three platforms.
>>>     >>>>>
>>>     >>>>> Best Regards,
>>>     >>>>> Sandhya
>>>     >>>>>
>>>
>>>
>>> -- 
>>>
>>> Thanks,
>>>
>>> Jc
>>>

From dmitrij.pochepko at bell-sw.com  Thu Sep 20 17:27:28 2018
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Thu, 20 Sep 2018 20:27:28 +0300
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
 <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
 <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com>
Message-ID: <2aff2d56-4126-5fdb-ece5-68324f974f8c@bell-sw.com>


On 14/09/18 18:29, Andrew Dinn wrote:
> On 13/09/18 15:35, Dmitrij Pochepko wrote:
>> Other comments seem fine
> I am glad to hear that you did not find any errors in my analysis.
> However, I also need to ask you to answer a question that was implicit
> in my earlier note. I said:
>
> "I assume you are familiar with the relevant mathematics and how it has
> been used to derive the algorithm. If so then I would like you to review
> this rewrite and ensure that there are no mathematical errors in it. I
> would also like you to check that the explanatory comments for each of the
> individual steps in the algorithm do not contain any errors.
>
> If you are not familiar with the mathematics then please let me know. I
> need to know whether this has been reviewed by someone competent to do so."
>
> As you didn't respond to this I will have to ask you explicitly this
> time. Do you have a background in mathematics and numerical analysis
> that means you understand how the original algorithm has been arrived
> at? Equally, how your algorithm may legitimately vary from that original?
Yes, I do have a relevant background in mathematics. And yes to the 
questions below, except for the last one. That said, it's always good to have 
another pair of eyes looking at the review. To be honest, I had to 
refresh my memory regarding Remez polynomials.
>
> I'll break this down into several steps:
>
> Do you understand the (elementary) theory that explains how the various
> polynomial expansions I described in my comments converge to the
> original log and exp functions?
>
> Do you understand the theory that explains how partial polynomial sums
> (Remez polynomials) can be used to approximate these polynomial
> expansions within specified ranges?
>
> Do you know how the coefficients of these Remez polynomials can be
> derived to any necessary accuracy?
>
> Do you understand how the computation of the values of those Remez
> polynomials must proceed in order to guarantee accuracy in the computed
> result in the presence of rounding errors?
>
> Can you provide a mathematical proof that the variations you have
> introduced into the computational process (specifically the move from
> Horner form to Estrin form) will not introduce rounding errors?
I have formal verification for some argument ranges that I considered 
the most problematic, but the complete proof is too complicated. Looking 
at the situation from the reviewer's side, I understand why it will be safer and 
easier to maintain if the assembly version duplicates the original 
fdlibm code. Because of that, I suggest reverting the questionable places 
to the original schemes, as the performance improvement is not that big.
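The sketch below makes the Horner-versus-Estrin question concrete. It evaluates a degree-7 polynomial both ways, using placeholder coefficients rather than the fdlibm pow constants.

The two forms are algebraically identical. They perform the multiplications and additions in a different order, however, so in floating point they can round differently. That difference is what a correctness proof for the rewritten code would have to bound.

```cpp
#include <array>
#include <cassert>

// Horner form: p(x) = c0 + x*(c1 + x*(c2 + ... + x*c7)).
// One multiply-add per coefficient, strictly sequential.
double horner(const std::array<double, 8>& c, double x) {
    double r = c[7];
    for (int i = 6; i >= 0; --i) {
        r = r * x + c[i];
    }
    return r;
}

// Estrin form for degree 7: evaluate independent linear pairs, then
// combine them with x^2 and x^4. The pairs can execute in parallel,
// but the rounding steps happen in a different order than in Horner form.
double estrin(const std::array<double, 8>& c, double x) {
    const double x2 = x * x;
    const double x4 = x2 * x2;
    const double p01 = c[0] + c[1] * x;
    const double p23 = c[2] + c[3] * x;
    const double p45 = c[4] + c[5] * x;
    const double p67 = c[6] + c[7] * x;
    const double q0 = p01 + p23 * x2;  // terms of degree 0..3
    const double q1 = p45 + p67 * x2;  // terms of degree 4..7, divided by x^4
    return q0 + q1 * x4;
}
```

With small integer inputs every intermediate value is exactly representable. For example, c = {1, ..., 8} and x = 2 give exactly 1793 from both functions. For general arguments, the last bits of the two results may differ.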

New webrev with the polynomial calculations changed back to the original scheme. 
I also changed the scalbn implementation to be the same as the original: 
http://cr.openjdk.java.net/~dpochepk/8189107/webrev.03/

As expected, it's about 2% slower.

Thanks,
Dmitrij

>
> I certainly cannot lay claim to a /thorough/ understanding of most, if
> not all, those topics. If you also cannot then I think we need to bring
> in someone who does. In particular, it is the last point that matters
> most of all here as this is where you have /chosen/ to make your
> algorithm diverge from the code you inherited.
>
> As regards the rest of the background maths, we do at least know that
> the other aspects of the algorithm -- in its original manifestation --
> have been checked by numerical experts. Hence, if we ensure that your
> algorithm implements /equivalent/ steps then it ought to inherit the
> same guarantees of correctness. So, the only task as far as most of the
> code is concerned is to iron out any errors you might inadvertently have
> introduced. I have several nits to pick in that regard, which I will
> be posting shortly.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander


From sandhya.viswanathan at intel.com  Thu Sep 20 17:53:16 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Thu, 20 Sep 2018 17:53:16 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
 <f45b8b3e-182b-c1d3-e8c7-a33f29e6bd43@oracle.com>
 <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5771@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

In c1_LIRAssembler.hpp, when I added an additional parameter to negate, I did make sure to give it a default value:

src/hotspot/share/c1/c1_LIRAssembler.hpp, line 282:
  void negate(LIR_Opr left, LIR_Opr dest, LIR_Opr tmp = LIR_OprFact::illegalOpr);

But since the function is not just called but also defined separately for each of the other architectures, I need to add an unused LIR_Opr parameter to their negate definitions as well.
This is along the same lines as what is already done for some other LIR_Assembler methods.
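The C++ rule behind the SPARC failure can be shown with a small, hypothetical stand-in for LIR_Assembler. Opr, illegal_opr and Assembler are made-up names.

The default argument lives only in the shared declaration, so existing two-argument call sites keep compiling. Every out-of-class definition, however, must list all three parameters. Here dest is passed by reference only so the sketch has an observable result; the real LIR_Opr is a pointer passed by value.

```cpp
#include <cassert>

// Hypothetical stand-in for LIR_Opr; a negative id plays the role of
// LIR_OprFact::illegalOpr.
struct Opr {
    int id;
    bool is_illegal() const { return id < 0; }
};

inline Opr illegal_opr() { return Opr{-1}; }

class Assembler {
 public:
    // Shared header: the default argument appears only in the declaration.
    void negate(Opr left, Opr& dest, Opr tmp = illegal_opr());
};

// Per-platform definition: it must spell out all three parameters and must
// not repeat the default. A definition of Assembler::negate(Opr, Opr&)
// would match no declaration, which is the kind of error the SPARC build hit.
void Assembler::negate(Opr left, Opr& dest, Opr tmp) {
    dest.id = -left.id;  // placeholder for emitting the real negate
    (void)tmp;           // unused on a platform that needs no temp register
}
```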

I will make this change and work with Vivek to use the submit repo for testing it on Sparc.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, September 20, 2018 10:09 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

I hit a build failure on SPARC due to the shared changes in C1:

workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*,
LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)".
jib > 1 Error(s) detected.

I assume other platforms are also affected.

Vladimir

On 9/19/18 9:53 AM, Vladimir Kozlov wrote:
> Thank you, Sandhya
> 
> I submitted new testing.
> 
> Vladimir
> 
> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find below the updated webrev with fixes for the two issues:
>>
>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.03/>
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS 
>> as the temporary register type for intrinsics instead of legVecD.
>>
>> This test was only failing with -XX:MaxVectorSize=4.
>>
>> The file modified is x86_64.ad.
>>
>> The fix for the second failure (the C1 LinearScan "cannot spill interval" assert with -Xcomp) was to allow all
>> xmm registers (xmm0-xmm31) for C1 and to handle floating-point abs and negate appropriately by providing a temp register.
>>
>> The C1 files are modified for this fix.
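On x86, floating-point negate and abs are usually implemented as bit operations on the sign bit: xorps with a sign mask, or andps with its complement. The scalar model below shows just that bit manipulation.

The thread does not say why C1 needs a temp register for these operations. One plausible reading, stated here only as an assumption, is that the EVEX forms of the packed logical instructions that could address xmm16-xmm31 require AVX512DQ. Without that extension, the value would have to pass through a legacy register.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Flip the IEEE-754 sign bit, the scalar equivalent of xorps with a sign mask.
float negate_via_mask(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits ^= 0x80000000u;
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

// Clear the sign bit, the scalar equivalent of andps with the inverted mask.
float abs_via_mask(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```

Unlike computing 0.0f - x, the mask form also gives -0.0f for x = 0.0f, which is what Java's unary minus requires.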
>>
>> I reran the compiler jtreg tests, with nativepath set appropriately, on Haswell, SKX and KNL.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:* Viswanathan, Sandhya
>> *Sent:* Tuesday, September 18, 2018 1:47 PM
>> *To:* 'JC Beyler' <jcbeyler at google.com>
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Jc,
>>
>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:* JC Beyler [mailto:jcbeyler at google.com]
>> *Sent:* Monday, September 17, 2018 9:29 PM
>> *To:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Sandhya,
>>
>> How are you invoking the test for NativeCallTest?
>>
>> The way I would do it using jtreg would be something like this:
>>
>> $ export BUILD_TYPE=release
>>
>> $ export JDK_PATH=wherever you have your JDK
>>
>> From the test subfolder:
>>
>> $ wherever-your-jtreg-is/bin/jtreg \
>>     -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib \
>>     -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk \
>>     hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
>>
>> Seems to pass for me.
>>
>> But much easier is:
>>
>> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"
>>
>> That seems to pass for me as well and is easier to use :)
>>
>> For information, the make run-test documentation is here:
>>
>> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html
>>
>> Let me know if that helps,
>>
>> Jc
>>
>>     Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
>>         "test result: Error. Use -nativepath to specify the location of native code"
>>     Do I need to give any additional info to jtreg to get past this problem?
>>
>>     Thanks a lot!
>>     Best Regards,
>>     Sandhya
>>
>>     -----Original Message-----
>>     From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     Sent: Monday, September 17, 2018 10:14 AM
>>     To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>     Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>>     I finished testing on the avx512 machine.
>>     All passed except the known (TestNaNVector.java) failures.
>>
>>     Thanks,
>>     Vladimir
>>
>>     On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
>>     > I gave the incorrect link to the RFE. Here is the correct one:
>>     >
>>     > https://bugs.openjdk.java.net/browse/JDK-8210764
>>     >
>>     > Vladimir
>>     >
>>     > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>>     >> I got the build failure on MacOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.
>>     >>
>>     >> Anyway, I tested with these changes and got the following failures on avx1 machines. I am planning to run on avx512 too.
>>     >>
>>     >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on a CPU
>>     >> with AVX1 only
>>     >>
>>     >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>>     >> # Problematic frame:
>>     >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>     >>
>>     >> Current CompileTask:
>>     >> C2:    154    5             java.lang.String::equals (65 bytes)
>>     >>
>>     >> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>     >> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>     >> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>>     >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>>     >> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>>     >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>>     >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>     >>
>>     >> ------------------------------------------------------------------------------------------------
>>     >> 2.
>>     >>
>>     >> with '-Xcomp'
>>     >> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>>     >> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>     >>
>>     >> Current CompileTask:
>>     >> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>>     >>
>>     >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>     >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>>     >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>>     >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>>     >> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>>     >> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>>     >> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>>     >> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>>     >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>>     >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>>     >> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>>     >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>>     >> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>>     >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>>     >> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>     >>
>>     >> Vladimir
>>     >>
>>     >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>     >>>
>>     >>> Thanks Vladimir, the change below should fix this issue:
>>     >>>
>>     >>> ------------------------------
>>     >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>     >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>     >>> @@ -233,22 +233,6 @@
>>     >>>    _xmm_regs[13]  = xmm13;
>>     >>>    _xmm_regs[14]  = xmm14;
>>     >>>    _xmm_regs[15]  = xmm15;
>>     >>> -  _xmm_regs[16]  = xmm16;
>>     >>> -  _xmm_regs[17]  = xmm17;
>>     >>> -  _xmm_regs[18]  = xmm18;
>>     >>> -  _xmm_regs[19]  = xmm19;
>>     >>> -  _xmm_regs[20]  = xmm20;
>>     >>> -  _xmm_regs[21]  = xmm21;
>>     >>> -  _xmm_regs[22]  = xmm22;
>>     >>> -  _xmm_regs[23]  = xmm23;
>>     >>> -  _xmm_regs[24]  = xmm24;
>>     >>> -  _xmm_regs[25]  = xmm25;
>>     >>> -  _xmm_regs[26]  = xmm26;
>>     >>> -  _xmm_regs[27]  = xmm27;
>>     >>> -  _xmm_regs[28]  = xmm28;
>>     >>> -  _xmm_regs[29]  = xmm29;
>>     >>> -  _xmm_regs[30]  = xmm30;
>>     >>> -  _xmm_regs[31]  = xmm31;
>>     >>>   #endif // _LP64
>>     >>>
>>     >>>    for (int i = 0; i < 8; i++) {
>>     >>> ---------------------------------
>>     >>>
>>     >>> I think the gcc version on my desktop is older, so it didn't catch this.
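The -Warray-bounds error quoted in Vladimir's earlier message is a classic fixed-size table overrun. The array has 16 slots, but the code stores into indices 16 through 31.

The hypothetical sketch below is not HotSpot's FrameMap. It fills such a table with a loop bounded by the table's own size, which rules out that class of bug.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical register descriptor; in HotSpot this would be XMMRegister.
struct XmmReg {
    int encoding;
};

// The quoted diagnostic says the array "contains 16 elements", so stores to
// indices 16..31 run past its end.
constexpr std::size_t kNumXmmRegsInFrameMap = 16;

std::array<XmmReg, kNumXmmRegsInFrameMap> make_xmm_table() {
    std::array<XmmReg, kNumXmmRegsInFrameMap> regs{};
    // Bounding the loop by regs.size() keeps every store in range, unlike
    // hand-written assignments to fixed indices.
    for (std::size_t i = 0; i < regs.size(); ++i) {
        regs[i] = XmmReg{static_cast<int>(i)};
    }
    return regs;
}
```

Using regs.at(16) instead of regs[16] would throw std::out_of_range at run time rather than writing past the end.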
>>     >>>
>>     >>> The updated patch, along with the above change and UseAVX set to 3, is uploaded to:
>>     >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>     >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>     >>>
>>     >>> FYI, I did notice that the default for UseAVX had been rolled back, and I wanted to get confirmation from you before
>>     >>> changing it back to 3.
>>     >>>
>>     >>> Best Regards,
>>     >>> Sandhya
>>     >>>
>>     >>>
>>     >>> -----Original Message-----
>>     >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     >>> Sent: Friday, September 14, 2018 12:13 PM
>>     >>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>     >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>
>>     >>> I got a build failure:
>>     >>>
>>     >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array
>>     >>> (which contains 16 elements) [-Werror,-Warray-bounds]
>>     >>> jib >   _xmm_regs[16]  = xmm16;
>>     >>>
>>     >>> I also noticed that we don't have an RFE for this work. I filed:
>>     >>>
>>     >>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>     >>>
>>     >>> You did not enable avx512 by default (the 8209735 change was synced from jdk 11 into 12 two weeks ago). I added the following
>>     >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>     >>>
>>     >>> - product(intx, UseAVX, 2, \
>>     >>> + product(intx, UseAVX, 3, \
>>     >>>
>>     >>> Thanks,
>>     >>> Vladimir
>>     >>>
>>     >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>     >>>> Looks good to me. I will start testing and let you know the results.
>>     >>>>
>>     >>>> Thanks,
>>     >>>> Vladimir
>>     >>>>
>>     >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>     >>>>> Hi Vladimir,
>>     >>>>>
>>     >>>>> Please find below the updated webrev with all your comments incorporated:
>>     >>>>>
>>     >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>     >>>>>
>>     >>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>>     >>>>> different flavors of AVX512, and on Haswell, a non-avx512 system. I also tested SPECjvm2008 on the three platforms.
>>     >>>>>
>>     >>>>> Best Regards,
>>     >>>>> Sandhya
>>     >>>>>
>>     >>>>>
>>     >>>>> -----Original Message-----
>>     >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>     >>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>     >>>>> hotspot-compiler-dev at openjdk.java.net
>>     >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>>>
>>     >>>>> Thank you, Sandhya
>>     >>>>>
>>     >>>>> I am satisfied with your detailed answer on the memory loads issue. Okay, let's not add them.
>>     >>>>>
>>     >>>>> Vladimir
>>     >>>>>
>>     >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>     >>>>>> Hi Vladimir,
>>     >>>>>>
>>     >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>     >>>>>> Please see my responses in your email below, marked with (Sandhya >>>). Looking forward to your advice.
>>     >>>>>>
>>     >>>>>> Best Regards,
>>     >>>>>> Sandhya
>>     >>>>>>
>>     >>>>>>
>>     >>>>>> -----Original Message-----
>>     >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>     >>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>     >>>>>> hotspot-compiler-dev at openjdk.java.net
>>     >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>     >>>>>> instruction
>>     >>>>>>
>>     >>>>>> Thank you.
>>     >>>>>>
>>     >>>>>> I want to discuss the following issue:
>>     >>>>>>
>>     >>>>>>     > You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>     >>>>>>     >>>> Let us take an example, e.g. loading into rregF. First it gets loaded from memory into regF, and then a register-to-register move copies it to rregF, and vice versa.
>>     >>>>>>
>>     >>>>>> This is what I thought. You increase register pressure this way, which may cause spills on the stack.
>>     >>>>>> Also, since we don't check whether the register could be the same as the result, you may get unneeded moves.
>>     >>>>>>
>>     >>>>>> I would advise adding memory moves at least.
>>     >>>>>>
>>     >>>>>> Sandhya >>> I had added those rules initially and removed them in
>>     >>>>>> the final patch. I noticed that the register allocator uses the
>>     >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2regmask
>>     >>>>>> (matcher.cpp). I would like the register allocator to get all the
>>     >>>>>> possible registers on an architecture for the idealreg2regmask. I
>>     >>>>>> was concerned that multiple instruct rules in the .ad file for LoadF from
>>     >>>>>> memory might cause problems. I would have to give a higher cost to
>>     >>>>>> loading into a restricted register set like vlReg. Then I decided that
>>     >>>>>> the register allocator can handle this much better than I could by
>>     >>>>>> adding rules to load from memory. This is with the background that regF
>>     >>>>>> always covers all the available registers and vlRegF is the restricted
>>     >>>>>> register set. Likewise for VecS and legVecS. Let me know your thoughts on
>>     >>>>>> this, and whether I should still add the rules to load from memory into
>>     >>>>>> vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>>     >>>>>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>     >>>>>> #endif
>>     >>>>>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>     >>>>>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>     >>>>>>     MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>     >>>>>>     MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>     >>>>>>     MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>     >>>>>>     ....
>>     >>>>>>     idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>     >>>>>>
>>     >>>>>> Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d]
>>     >>>>>> instructions:
>>     >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>>     >>>>>> Do these instructions work when avx512vl is not available? I see that for vectors you use the
>>     >>>>>> vpxor+vinserti* combination.
>>     >>>>>>
>>     >>>>>> Sandhya >>> Yes, the scalar floating-point instructions are available
>>     >>>>>> with AVX512 encoding when avx512vl is not available. That is why you
>>     >>>>>> would see not just movflt and movdbl but all the other scalar
>>     >>>>>> operations like addss, addsd etc. using the entire xmm range (xmm0-31). In other words, they are AVX512F instructions.
>>     >>>>>>
>>     >>>>>> Last question. I notice the following UseAVX check in the vector spill code in x86.ad:
>>     >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>     >>>>>>
>>     >>>>>> Should it be (UseAVX < 3)?
>>     >>>>>>
>>     >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>     >>>>>>
>>     >>>>>> Thanks,
>>     >>>>>> Vladimir
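The UseAVX < 3 correction can be checked with a tiny truth table. EVEX, and with it xmm16-xmm31, is only enabled at UseAVX >= 3, so a configuration with UseAVX == 2 never allocates the upper registers. The original UseAVX < 2 test would nevertheless send that configuration down the AVX-512 handling path.

The function names are hypothetical. This models only the boolean condition, not the spill code in x86.ad.

```cpp
#include <cassert>

// Corrected condition: below UseAVX=3 EVEX is disabled, so only xmm0-xmm15
// exist and a plain vector move is always encodable.
bool plain_vector_move_ok(int use_avx, bool supports_avx512vl) {
    return (use_avx < 3) || supports_avx512vl;
}

// The condition as first written in the patch, kept for comparison.
bool plain_vector_move_ok_original(int use_avx, bool supports_avx512vl) {
    return (use_avx < 2) || supports_avx512vl;
}
```

The two versions differ only when UseAVX == 2 on a CPU without avx512vl.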
>>     >>>>>>
>>     >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>     >>>>>>> Hi Vladimir,
>>     >>>>>>>
>>     >>>>>>> Thanks a lot for your review and feedback. Please see my responses
>>     >>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>     >>>>>>>
>>     >>>>>>> Best Regards,
>>     >>>>>>> Sandhya
>>     >>>>>>>
>>     >>>>>>>
>>     >>>>>>> -----Original Message-----
>>     >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>     >>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>     >>>>>>> hotspot-compiler-dev at openjdk.java.net
>>     >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>     >>>>>>> instruction
>>     >>>>>>>
>>     >>>>>>> Very nice. Thank you, Sandhya.
>>     >>>>>>>
>>     >>>>>>> I would like to see more meaningful naming in .ad files: instead
>>     >>>>>>> of rreg* and ovec*, something like vlReg* and legVec*.
>>     >>>>>>>
>>     >>>>>>>>>> Yes, accepted.
>>     >>>>>>>
>>     >>>>>>> New load_from_* and load_to_* instructions in .ad files should be
>>     >>>>>>> renamed as follows and collocated with the other Move*_reg_reg* instructions:
>>     >>>>>>>
>>     >>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>>     >>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>>     >>>>>>>>>> Yes, accepted.
>>     >>>>>>>
>>     >>>>>>> You did not add instructions to load these registers from memory
>>     >>>>>>> (and stack). What happens in such cases when you need to load or store?
>>     >>>>>>>>>> Let us take an example, e.g. loading into rregF. First it
>>     >>>>>>>>>> gets loaded from memory into regF, and then a register-to-register move copies it to rregF, and vice versa.
>>     >>>>>>>
>>     >>>>>>> Also please explain why these registers are used when UseAVX == 0:
>>     >>>>>>>
>>     >>>>>>> +instruct absD_reg(rregD dst) %{
>>     >>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>>     >>>>>>>
>>     >>>>>>> We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
>>     >>>>>>>       661   if (UseAVX < 3) {
>>     >>>>>>>       662     _features &= ~CPU_AVX512F;
>>     >>>>>>>
>>     >>>>>>>>>> Yes, accepted. It could be regD here.
>>     >>>>>>>
>>     >>>>>>> The following checks could be combined by using a new function in vm_version_x86.hpp (you already have some):
>>     >>>>>>>
>>     >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{
>>     >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>>     >>>>>>> VM_Version::supports_avx512dq() &&
>>     >>>>>>> VM_Version::supports_avx512vl() %} );
>>     >>>>>>>
>>     >>>>>>>>>> Yes, accepted.
>>     >>>>>>>
>>     >>>>>>> I would suggest testing these changes on different machines
>>     >>>>>>> (non-avx512 and avx512) and with different UseAVX values.
>>     >>>>>>>
>>     >>>>>>>>>> Will do.
>>     >>>>>>>
>>     >>>>>>> Thanks,
>>     >>>>>>> Vladimir
>>     >>>>>>>
>>     >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>     >>>>>>>> Recently there have been a couple of high-priority issues with
>>     >>>>>>>> regard to C2's usage of the high bank of XMM registers
>>     >>>>>>>> (XMM16-XMM31):
>>     >>>>>>>>
>>     >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>     >>>>>>>>
>>     >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>     >>>>>>>>
>>     >>>>>>>> Please find below a patch which attempts to clean up the XMM
>>     >>>>>>>> register handling by using register groups.
>>     >>>>>>>>
>>     >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>     >>>>>>>>
>>     >>>>>>>> The patch provides a restricted set of registers to the match
>>     >>>>>>>> rules in the ad file based on the underlying architecture.
>>     >>>>>>>>
>>     >>>>>>>> The aim is to remove special handling/workarounds from the macro assembler and assembler.
>>     >>>>>>>>
>>     >>>>>>>> By removing the special handling, the patch reduces the overall
>>     >>>>>>>> code size by about 1800 lines of code.
>>     >>>>>>>>
>>     >>>>>>>> Your review and feedback is very welcome.
>>     >>>>>>>>
>>     >>>>>>>> Best Regards,
>>     >>>>>>>>
>>     >>>>>>>> Sandhya
>>     >>>>>>>>
>>
>>
>> --
>>
>> Thanks,
>>
>> Jc
>>

From aph at redhat.com  Thu Sep 20 17:58:34 2018
From: aph at redhat.com (Andrew Haley)
Date: Thu, 20 Sep 2018 18:58:34 +0100
Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by
 constant in C1
In-Reply-To: <DB7PR08MB3115D98CC20DAB17F160991496130@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115445A18A786BAFD1F7B08961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <9645c210-3d87-52fa-8051-54dc60629866@redhat.com>
 <DB7PR08MB3115D98CC20DAB17F160991496130@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <9d454f5b-475b-5713-7155-c6946f378c3e@redhat.com>

On 09/20/2018 05:15 AM, Pengfei Li (Arm Technology China) wrote:
> Please find below new patch that added the same optimization for longs as well as ints and also fixed an issue.
> http://cr.openjdk.java.net/~yzhang/8210413/webrev.01/
> 
> Could you help look at it again?

That's fine. I'm not exactly delighted by the amount of duplicated
code for long and int, but it's very hard to avoid in this case.
The patch is good for JDK/JDK.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From Pengfei.Li at arm.com  Fri Sep 21 06:53:05 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Fri, 21 Sep 2018 06:53:05 +0000
Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by
 constant in C1
In-Reply-To: <9d454f5b-475b-5713-7155-c6946f378c3e@redhat.com>
References: <DB7PR08MB3115445A18A786BAFD1F7B08961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <9645c210-3d87-52fa-8051-54dc60629866@redhat.com>
 <DB7PR08MB3115D98CC20DAB17F160991496130@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <9d454f5b-475b-5713-7155-c6946f378c3e@redhat.com>
Message-ID: <DB7PR08MB3115A33F5A78FFA06D04319296120@DB7PR08MB3115.eurprd08.prod.outlook.com>

Thanks for your code review. Could you help push this patch?

--
Thanks,
Pengfei



From adinn at redhat.com  Fri Sep 21 08:55:32 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 21 Sep 2018 09:55:32 +0100
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <2aff2d56-4126-5fdb-ece5-68324f974f8c@bell-sw.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
 <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
 <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com>
 <2aff2d56-4126-5fdb-ece5-68324f974f8c@bell-sw.com>
Message-ID: <7ca0112f-76f9-538d-f263-8199cbd380ab@redhat.com>

On 20/09/18 18:27, Dmitrij Pochepko wrote:
> On 14/09/18 18:29, Andrew Dinn wrote:
> Yes, I do have relevant background in mathematics. And yes to the
> questions below, except for the last one. That said, it's always good to
> have another pair of eyes looking at the review. To be honest, I had to
> refresh my memory regarding Remez polynomials.

Thank you for the confirmation.

   . . .

>> Can you provide a mathematical proof that the variations you have
>> introduced into the computational process (specifically the move from
>> Horner form to Estrin form) will not introduce rounding errors?
> I have formal verification for some argument ranges that I considered
> the most problematic, but the complete proof is too complicated. Looking
> at the situation from the reviewer's side I understand why it'll be safer
> and easier to maintain to have the assembly version duplicate the original
> fdlibm code, and because of that I suggest reverting questionable places
> to the original schemes as the performance improvement is not that big.

Ok, use of Horner form was one of the things I was going to ask you to
restore. I did actually ask Joe Darcy about the use of Estrin form. If
he can provide an argument that it is ok to employ it then we can think
about reinstating the vector computation as an upgrade at a later date.
I am not surprised its removal makes only a small difference, given how
little of the computation it represents.

> new webrev with polynomial calculations changed back to original schema.
> Also changed scalbn implementation to be the same as original:
> http://cr.openjdk.java.net/~dpochepk/8189107/webrev.03/

So, I guess that means you have now actually tested the underflow case?
The previous scalbn implementation was one of two places in which the
code was seriously broken. In consequence, all computations meant to
generate underflowing results were not computed correctly.

One of the things wrong in the previous scalbn implementation was the
use of cmp rather than cmpw, an error which I see you have now fixed
(there were 3 further mistakes in this section of the code). Your
rewrite of the case handling looks complex and unnecessarily slow to me.
I'll suggest a better fix in a review I am currently writing up.

The other place where there was an error was in the Y_IS_HUGE branch.
The vector maths code expects to load the first 4 table values in a
permuted order (i.e. it assumes they are 0.25, -1/ln2, -0.333333, 0.5)
but the corresponding constants in _pow_coef1 have not been permuted.
Because of this all computations where 2^31 < y < 2^64 were not computed
correctly. This error appears still to be present in the latest version
of the code. So, I assume you have also never tested that range of values?

For example, try x = 1.0000000001, y = (double)0x1_1000_0000L and
compare the result with that obtained from the StrictMath routine.

I did explicitly ask you to ensure that all paths through the code were
tested in an earlier posting. Once I had read through and understood the
code it took me about 2 minutes to produce a test program that exercised
each of these 2 broken paths (and about 1/2 hour in gdb to detect and
fix each problem). I'm very under-impressed that you did not bother to
produce such tests as part of your test regime.

These errors are not bizarre or unexpected corner cases. They are paths
that can be expected to be taken during normal computations. Testing the
code requires at the very least driving the code through such paths with
suitable inputs and then ensuring the results are valid so you should
really have looked for and found these bugs.

Of course, testing also requires identifying and checking bizarre or
unexpected corner cases. However, while it might be understandable if
some of those latter cases were missed there is really no excuse for not
checking the known, expected paths with at least some inputs. It's
extremely unhelpful to those who have to maintain this code when a
contributor takes the job of testing this lightly. Nor is such behaviour
going to encourage anyone to accept further contributions.

You can expect a full review of the code some time today (it will be
based on the previous version so you will have to make allowance for the
changes you have made in the latest version). There are only a few
things I would like you to tweak in the sequence of generated
instructions. However, you will need to do a lot of rework to make the
generator code more systematic and more readable. That includes 1)
introducing some block structure and local declarations to the generator
code and 2) adopting a more coherent allocation of values to registers
which reflects the naming and local usage of variables in the original
algorithm. To simplify that task I will provide a revised algorithm
which faithfully reflects the structure of your generated code and
clarifies its relation to the original.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From aph at redhat.com  Fri Sep 21 09:31:28 2018
From: aph at redhat.com (Andrew Haley)
Date: Fri, 21 Sep 2018 10:31:28 +0100
Subject: [aarch64-port-dev ] RFR: 8189107 - AARCH64: create intrinsic for
 pow
In-Reply-To: <7ca0112f-76f9-538d-f263-8199cbd380ab@redhat.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
 <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
 <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com>
 <2aff2d56-4126-5fdb-ece5-68324f974f8c@bell-sw.com>
 <7ca0112f-76f9-538d-f263-8199cbd380ab@redhat.com>
Message-ID: <f3e27032-c334-1ab4-37e0-4c2cb61a7050@redhat.com>

On 09/21/2018 09:55 AM, Andrew Dinn wrote:

> Ok, use of Horner form was one of the things I was going to ask you to
> restore. I did actually ask Joe Darcy about the use of Estrin form. If
> he can provide an argument that it is ok to employ it then we can think
> about reinstating the vector computation as an upgrade at a later date.
> I am not surprised its removal makes only a small difference, given how
> little of the computation it represents.

I've run the crlibm difficult rounding tests on the patch, with no
regressions, so I'm pretty happy about using the Estrin form. Of
course it's possible that there will be problems elsewhere, but it's
not likely IMO.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From rkennke at redhat.com  Fri Sep 21 12:48:39 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 21 Sep 2018 14:48:39 +0200
Subject: RFR: JDK-8210752: Remaining explicit barriers for C2
In-Reply-To: <dk6wori7aph.fsf@rwestrel.remote.csb>
References: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com>
 <dk6wori7aph.fsf@rwestrel.remote.csb>
Message-ID: <a9274225-95e2-53d7-3451-6a4f82763288@redhat.com>

Hi Roland,

thanks for reviewing!

Any other reviews? Can I push that stuff?

Roman


>> http://cr.openjdk.java.net/~rkennke/JDK-8210752/webrev.00/
> 
> That looks good to me.
> 
> Roland.
> 



From shade at redhat.com  Fri Sep 21 13:13:07 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 21 Sep 2018 15:13:07 +0200
Subject: RFR: JDK-8210752: Remaining explicit barriers for C2
In-Reply-To: <a9274225-95e2-53d7-3451-6a4f82763288@redhat.com>
References: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com>
 <dk6wori7aph.fsf@rwestrel.remote.csb>
 <a9274225-95e2-53d7-3451-6a4f82763288@redhat.com>
Message-ID: <0b832f84-2555-5251-8164-b05732ae8b4a@redhat.com>

On 09/21/2018 02:48 PM, Roman Kennke wrote:
> Any other reviews? Can I push that stuff?
> 
>>> http://cr.openjdk.java.net/~rkennke/JDK-8210752/webrev.00/

Looks good to me too.

Thanks,
-Aleksey


From adinn at redhat.com  Fri Sep 21 15:01:57 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 21 Sep 2018 16:01:57 +0100
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com>
References: <d5554366-9851-a63c-b8d2-c1300219f487@bell-sw.com>
 <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com>
 <c9153905-1f74-fb34-4938-b48b34d5a12c@redhat.com>
 <a7707848-3075-9fa4-f81d-89c128fd6dca@bell-sw.com>
 <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
 <ae6b81dd-795d-e23f-099a-761d776f3466@redhat.com>
 <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
 <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com>
Message-ID: <1b770596-8da2-74be-bad2-832bf4a6622a@redhat.com>

Hi Dmitrij,

As promised here is the review (after sig) based on webrev.02. The
review first describes the problems I have identified. It then continues
with recommendations for (extensive) rework. Since it is based on
webrev.02 you will have to make some allowance for the changes you
introduced in your webrev.03.

I have revised your webrev to include fixes for the two errors I
identified and a new version is available here

  http://cr.openjdk.java.net/~adinn/8189107/webrev.03/

The webrev includes my recommended fix for the scalbn code in
macroAssembler_pow.cpp and a correction to the declaration of table
_pow_coef1 in stubRoutines_aarch64.cpp. I explain these changes below.

I have also uploaded a commented version of the original algorithm and a
revised algorithm based on your code here

  http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt
  http://cr.openjdk.java.net/~adinn/8189107/revised_algorithm.txt

You have seen the original algorithm modulo a few tweaks that emerged in
creating the revised version. The revised algorithm /faithfully/ models
your webrev.02 code (with a couple of notable exceptions that relate to
problems described below). That faithful modelling of the code includes
retaining the order of computation of all values. In particular, it
models early computation of some data that you appear to have introduced
in order to pipeline certain operations.

At the same time, the algorithm also introduces a much more coherent
control structure (inserting 'if then else' in place of GOTO everywhere
it is possible) and a nested /block structure/ (none of this required
reordering btw). It profits from this to introduce block local
declarations which scope the values computed and used at successive
steps. As far as possible the revised algorithm employs the same naming
convention as the original algorithm (I'll explain more on that in the
detailed feedback below).

Why does this matter? Well, once the errors get fixed, by far the
biggest remaining problems with the existing code are 1) its unclear
control structure and 2) its incoherent allocation of data to registers.
The intention of providing block structure and block local use of
variables in the revised algorithm is to enable introduction of similar
block structuring and block local declarations for registers and branch
labels in the generator code. In particular, that will allow the
generator code to be rewritten to use symbolic names for registers
consistently throughout the function.

So, a small number of register mappings for variables that are global to
the algorithm will need to be retained at the start of the function.
However, they will use meaningful names like x, exp_mask, one_d, result
etc instead of the completely meaningless aliases, tmp1, tmp2 etc, that
you provided. Similarly, some of the label declarations (mostly for exit
cases) will need to be retained at the top level.

However, most register mappings will be for variables that are local to
a block. So, they will need to be declared at the start of that block
making it clear where they are used and where they are no longer in use.
Similarly most label declarations will only need to be declared at
the start of the immediately enclosing block that generates the code
identified by the label. This will ensure declarations are close to
their point of use and are not in scope after they become redundant (or
at least not for very long).

Register mappings for variables declared in an outer block that are live
across nested blocks will then be able to be used consistently in those
inner blocks while clearly identifying precisely what values are being
generated, used or updated. The same applies for label declarations.
They can be used as branch targets in nested blocks but will not be
visible in outer scopes or sibling scopes.

Where possible the revised algorithm employs the same naming convention
as the original algorithm for the values it operates on -- with one
important modification. A suffix convention has been adopted to clarify
some ambiguities present in the original. For example, this allows us to
distinguish the double value x from its long bits representation x_r,
its 32 bit high word x_h or its 32 bit positive high word ix_h. The
algorithm also employs block local declarations to scope intermediate
values, employing names starting with 'tmp'. These are often introduced
in small, local blocks, allowing the same names tmp1, tmp2 etc to be
reused without ambiguity.

So, I hope it is clear how you can use this algorithm to rewrite the
generator code to be much more readable and maintainable -- that being
the ultimate goal here. I'm not willing to let the code be committed
without this restructuring taking place.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

Problems
--------

1) Error in Y_IS_HUGE case

The vector maths that computes the polynomial for this case expects to
load the first 6 coefficients in table _pow_coef1 in the permuted order
(0.25, -ivln2, -0.333333333333, 0.5, ivln2_l, ivln2_h). However, the
table declaration in stubRoutines_aarch64.cpp declares them in the order
(-0.333333333333, 0.25, 0.5, -ivln2, ivln2_l, ivln2_h).

2) scalbn implementation is wrong

The original code was using this algorithm:

1  if (0 >= (j_h >> 20))
2    tmp1 = n' - 1023               // bias n as an exponent
3    tmp2 = (tmp1 & 0x3ff) <<  52   // install as high bits
4    two_n_d = LONG_TO_DOUBLE(tmp2)
5    z = z * two_n_d              // rescale
6    b(CHECK_RESULT_NEGATION);

The problems are:

line 1: as you spotted the test was wrongly implemented using cmp not
cmpw (j_h is computed using an int add so sign extending as a long fails
to retain the overflow bit).

line 2: n is the unbiased exponent -- rebiasing it requires adding 1023,
not subtracting it

line 3: when we hit underflow the unbiased exponent is in the range
[-1075, -1024]. So, after correcting the sub to an add the exponent in
tmp1 will be negative (that is precisely the case the if test looks
for). Installing this negative value as the exponent field in the high
bits of a double gives a double with unbiased exponent in the range
[970, 1023], i.e. a very high positive power of 2 rather than a very low
negative one. The result is off by about 300 orders of magnitude!

line 5: As you spotted, the multiply here updates the register storing z
rather than installing the result in v0.
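To make the line 1 problem concrete, here is a minimal Java model of the two comparisons. It relies only on the AArch64 rule that writing a 32-bit W register zeroes the upper 32 bits of the X register, so a 64-bit cmp sees an int result zero-extended while cmpw sees it as a signed int. The class and method names are hypothetical and the input is an arbitrary negative word, not a value taken from the pow code.

```java
/**
 * Model of the "0 >= (j_h >> 20)" test evaluated at two widths.
 * Assumption (AArch64 rule): writing a 32-bit W register zeroes the upper
 * 32 bits of the X register, so a 64-bit cmp sees the int zero-extended.
 */
final class CmpWidthSketch {

    // The 64-bit view of a 32-bit result held in a W register.
    static long asXRegister(int wResult) {
        return Integer.toUnsignedLong(wResult);
    }

    // cmpw semantics: the shift and compare see a signed 32-bit value.
    static boolean underflowTest32(int jH) {
        return 0 >= (jH >> 20);
    }

    // cmp semantics: the shift and compare see the zero-extended 64-bit value.
    static boolean underflowTest64(int jH) {
        return 0 >= (asXRegister(jH) >> 20);
    }

    public static void main(String[] args) {
        int jH = 0xfff00000;                     // an arbitrary negative 32-bit word
        System.out.println(underflowTest32(jH)); // true
        System.out.println(underflowTest64(jH)); // false: the sign bit was lost
    }
}
```

For non-negative words both tests agree; they diverge exactly when bit 31 is set, which is the underflow case the branch is meant to catch.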

I explain all this not to point out why it is wrong but to show how your
original version can be salvaged with a few small changes. Basically we
want to multiply z by 2^n to get the result where n lies between -1023
and -1075. Depending on the values of z and n the result will be either
a subnormal double or +0. So, the simplest solution is to do the
multiply in two stages. Here is a revised algorithm:

  if (0 >= (j_h >> 20)) {
    double two_n_r  // power of 2 as long bits mapped to r2
    long biased_n   // n + 1023 mapped to r2
    double two_n_d  // used to rescale z to underflow value
                    // mapped to v17
    // split the rescale into two steps: 2^-512 then the rest
    n = n + 512
    two_n_r = 0x1ff << 52 // high bits for 2^-512
    two_n_d = LONG_TO_DOUBLE(two_n_r)
    z' = z' * two_n_d
    biased_n = n + 1023   // bias remainder -- will now be positive!
    two_n_r = (biased_n & 0x3ff) << 52 // high bits for 2^n
    two_n_d = LONG_TO_DOUBLE(two_n_r)
    result = z' * two_n_d
   } else {
     ...

The code for this is:

    cmpw(zr, tmp10, ASR, 20);
    br(LT, NO_SCALBN);
    // n = tmp1
    // rescale first by 2^-512 and then by the rest
    addw(tmp1, tmp1, 512);        // n -> n + 512
    movz(tmp2, 0x1FF0, 48);
    fmovd(v17, tmp2);             // 2^-512
    fmuld(v18, v18, v17);         // z = z * 2^-512
    addw(tmp2, tmp1, 1023);       // biased exponent
    ubfm(tmp2, tmp2, 12, 10);
    fmovd(v17, tmp2);             // 2^n
    fmuld(v0, v18, v17);          // result = z * 2^n
    b(CHECK_RESULT_NEGATION);
  bind(NO_SCALBN);
        . . .

I think this is simpler than your alternative. I checked it on several
test cases and it agrees with the existing StrictMath code.
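The same two-stage rescale can be checked outside the VM. The Java sketch below (hypothetical class and method names) builds powers of two by writing a biased exponent into bits 52..62, much as the movz/ubfm pair does, and compares the result with Math.scalb, which the Java SE documentation specifies as rounding like a single correctly rounded multiply. Because z * 2^-512 stays normal for z near 1, the first multiply is exact and only the second one rounds.

```java
/**
 * Sketch of the two-stage underflow rescale (hypothetical names).
 * Powers of two are built by writing a biased exponent into bits 52..62,
 * which only works for normal exponents.
 */
final class ScalbnSketch {

    // Returns 2^e for -1022 <= e <= 1023; anything else has no normal encoding.
    static double powerOfTwo(int e) {
        long biased = e + 1023L;
        if (biased < 1 || biased > 2046) {
            throw new IllegalArgumentException("2^" + e + " is not a normal double");
        }
        return Double.longBitsToDouble(biased << 52);
    }

    // Rescales z by 2^n for an underflowing n (roughly -1075 <= n <= -1023).
    static double scaleDownTwoStage(double z, int n) {
        double zPrime = z * powerOfTwo(-512); // exact while z * 2^-512 stays normal
        return zPrime * powerOfTwo(n + 512);  // n + 512 is a normal exponent; the only rounding
    }

    public static void main(String[] args) {
        double z = 1.5;
        int n = -1050;
        System.out.println(scaleDownTwoStage(z, n) == Math.scalb(z, n)); // true
    }
}
```

A single-stage version would need powerOfTwo(n) with n below -1022, which this helper rejects: no normal double has that exponent, which is why the original one-step encoding cannot work.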

3) Use of Estrin form in place of Horner form

I would prefer not to use this without a proof that the re-ordered
computation does not introduce rounding errors. I doubt that it does, and
I suspect that even if it did the error would be very small, certainly
small enough that the leeway between what is expected of StrictMath.pow
and what is allowed for in Math.pow will not cause anyone's expectations
to be challenged. Even so, I'd prefer not to surprise users.
Anyway, if Andrew Haley really wants this to be adopted then I'll accept
his override and you can leave it in Estrin form.
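For readers less familiar with the two forms, the Java sketch below evaluates the same polynomial both ways. Horner's scheme is a chain of dependent multiply-adds, while Estrin's scheme pairs adjacent terms with powers x, x^2, x^4, ... so that the pairs at each level are independent and can be computed in parallel, e.g. in the two lanes of a vector register. The coefficients are generic placeholders, not the pow polynomial constants, and the class name is hypothetical.

```java
/**
 * Horner vs Estrin evaluation of c[0] + c[1]*x + ... + c[n-1]*x^(n-1).
 * Coefficients are generic; these are not the pow polynomial constants.
 */
final class PolySketch {

    // Horner: one dependent multiply-add per coefficient, highest degree first.
    static double horner(double[] c, double x) {
        double r = 0.0;
        for (int i = c.length - 1; i >= 0; i--) {
            r = r * x + c[i];
        }
        return r;
    }

    // Estrin: pair adjacent terms (c[2i] + c[2i+1]*p) with p = x, x^2, x^4, ...
    // The pairs at each level are independent, so they can be evaluated in parallel.
    static double estrin(double[] c, double x) {
        if (c.length == 0) return 0.0;
        double[] cur = c.clone();
        int n = cur.length;
        double p = x;
        while (n > 1) {
            int m = (n + 1) / 2;
            double[] next = new double[m];
            for (int i = 0; i < m; i++) {
                int lo = 2 * i;
                next[i] = (lo + 1 < n) ? cur[lo] + cur[lo + 1] * p : cur[lo];
            }
            cur = next;
            n = m;
            p = p * p;
        }
        return cur[0];
    }

    public static void main(String[] args) {
        double[] c = {1, 2, 3, 4, 5, 6};
        System.out.println(horner(c, 2.0) + " " + estrin(c, 2.0)); // 321.0 321.0
    }
}
```

With exactly representable inputs like these both orders agree. With general inputs the intermediate roundings differ, which is why the reordering needs either an error argument or extensive testing such as the crlibm difficult rounding runs.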

4) Repetition of code in K_IS_0/K_IS_1 branches

In the Y_IS_NOT_HUGE block 15/17 instructions are generated in the if
and else branches for k == 0 and k == 1, implementing what is almost
exactly the same computation. The two generated sequences differ very
slightly. In the k == 1 case dp_h and dp_l need to be folded into the
computation to subtract ln2(1.5) from the result while in the k == 0
case dp_h and dp_l are both 0.0 and can be ignored.

The most important difference is the need to load dp_l/dp_h from the
coefficient table in one branch while the other merely moves forward the
cursor which points at the table. The other differences consist of two
extra operations in the k == 1 branch, an extra fmlavs and an extra
faddd, which fold in the dp_l and dp_h values.

An alternative would be to use common code for the computation which
always performs the extra fmlavs and faddd. The revised algorithm
describes precisely this alternative. To make it work the k == 0 branch
needs to install 0.0 in the registers used for dp_h and dp_l (not
necessarily by loading from the table). This shortens the branches,
relocating 15 common instructions after the branch.
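Adding +0.0 to any nonzero finite double returns it unchanged, so installing 0.0 for dp_h and dp_l on the k == 0 path cannot change a nonzero result (only a -0.0 intermediate would be turned into +0.0). The Java sketch below, with hypothetical names, shows a specialised k == 0 tail next to a common tail that always folds in dp_h and dp_l. Plain double adds stand in for the generated fmlavs/faddd sequence, and the k == 1 constants are deliberately not reproduced. For what it's worth, fdlibm's e_pow.c already works this way, indexing two-entry dp_h/dp_l tables whose k == 0 entries are 0.0.

```java
/**
 * Sketch of merging the K_IS_0 / K_IS_1 tails (hypothetical names).
 * Plain double adds stand in for the generated fmlavs/faddd sequence.
 */
final class DpFoldSketch {

    // Specialised k == 0 tail: dp_h and dp_l are known to be 0.0 and are skipped.
    static double tailK0(double zH, double zL, double t) {
        return (zH + zL) + t;
    }

    // Common tail for both k: always folds in dp_l and dp_h (0.0 when k == 0).
    static double tailCommon(double zH, double zL, double t, double dpH, double dpL) {
        double zl = zL + dpL;         // extra operation 1
        return ((zH + zl) + dpH) + t; // extra operation 2
    }

    public static void main(String[] args) {
        double a = tailK0(0.75, 1e-17, 3.0);
        double b = tailCommon(0.75, 1e-17, 3.0, 0.0, 0.0);
        System.out.println(Double.compare(a, b) == 0); // true
    }
}
```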

As far as code clarity is concerned it is easier to understand and
maintain if the common code is generated only once. As for performance,
I believe this trade off of a few more cycles against code size is also
a better choice. Shaving instructions may give a small improvement in a
benchmark, especially if the benchmark repeatedly runs with values that
exercise only one of the paths. However, in real use the extra code size
from the duplicated code is likely to push more code out of cache, and
since this is main path code that outcome is actually quite likely.

So, I would like to see the duplication removed unless you can make a
really strong case for keeping it. If you can provide such a reason then
an explanation of why the duplication is required needs to be provided in
the revised algorithm, and both the code and the algorithm need to be
updated to specify both paths.

4) Repetition of odd/even tests in exit cases.

The original algorithm takes the hit of computing the even/odd/fraction
status of y inline, i.e. in the main path, during special case checks.
That happens even if the result is not used later. You have chosen to do
it at the relevant exits, resulting in more repeated code.

These cases will likely be a rare path so the issue of extra code size
is not going to be very important relative to the saved cycles. However,
the replication of inline code is a maintenance issue.

It would be better to use a separate function to generate the common
code that computes the test values (lowest non-zero bit idx and exponent
of y) avoiding any need to read, check and fix the generator code in
different places. Please update the code accordingly.
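As a rough model of what such a shared helper computes, the Java sketch below classifies y as an odd integer, an even integer or a non-integer using only its exponent and the index of its lowest non-zero mantissa bit. The return codes follow fdlibm's yisint convention (0 non-integer, 1 odd, 2 even); the class and method names are hypothetical, and infinities and NaN are assumed to have been handled by the earlier special case checks.

```java
/**
 * Sketch of a shared helper classifying y from its exponent and the index of
 * its lowest set mantissa bit. Return codes follow fdlibm's yisint convention.
 */
final class YIsIntSketch {
    static final int NOT_INTEGER = 0;
    static final int ODD = 1;
    static final int EVEN = 2;

    static int classify(double y) {
        long bits = Double.doubleToRawLongBits(y);
        int biasedExp = (int) ((bits >>> 52) & 0x7ff);
        long mant = bits & 0x000F_FFFF_FFFF_FFFFL;
        if (biasedExp == 0x7ff) {
            return NOT_INTEGER;                      // Inf/NaN: assumed handled earlier (arbitrary here)
        }
        if (biasedExp == 0) {
            return mant == 0 ? EVEN : NOT_INTEGER;   // zero is even; subnormals are fractions
        }
        mant |= 1L << 52;                              // restore the implicit leading bit
        int lowBit = Long.numberOfTrailingZeros(mant); // index of the lowest non-zero bit
        int weight = lowBit + (biasedExp - 1075);      // that bit is worth 2^weight
        if (weight < 0) return NOT_INTEGER;
        return weight == 0 ? ODD : EVEN;
    }

    public static void main(String[] args) {
        System.out.println(classify(3.0) + " " + classify(2.0) + " " + classify(0.5)); // 1 2 0
    }
}
```

Keeping this in one generator function means the exponent and lowest-bit extraction is written and checked once, even if it is emitted at several exits.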

5) Test code

You need to include a test as part of this patch which checks the
outputs are correct for a few selected inputs that exercise the
underflow and Y_IS_HUGE branches. I adapted the existing Math tests to
include these extra checks:

    public static int testUnderflow() {
        int failures = 0;
        failures += testPowCase(1.5E-2D, 170D, 8.6201344461973E-311D);
        failures += testPowCase(1.55E-2D, 171.3D, 1.00883443217485E-310D);
        // failures += testPowCase(150D, 141.6D, 1.3630829399139085E308);
        return failures;
    }

    public static int testHugeY() {
        int failures = 0;
        long yl = 0x1_1000_0000L;
        failures += testPowCase(1.0000000001, (double)yl, 1.5782873649891997D);
        failures += testPowCase(1.0000000001, (double)yl + 0.3, 1.5782873650365483D);
        return failures;
    }

You don't have to add this to the existing math test suite. A simple
standalone test which checks that the StrictMath and Math outputs agree
on these inputs is all that is needed.
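A standalone check along those lines might look like the Java sketch below (class and method names are hypothetical). It defaults to bitwise agreement with StrictMath.pow, which is what a faithful port of the fdlibm algorithm should give; a tolerance in ulps can be passed on the command line, since Math.pow itself is only required to be within 1 ulp of the exact result.

```java
/**
 * Standalone Math.pow vs StrictMath.pow check for the underflow and
 * Y_IS_HUGE inputs quoted above (hypothetical names).
 */
final class PowAgreementSketch {

    // Distance in ulps between two doubles of the same sign; MAX_VALUE otherwise.
    static long ulpDistance(double a, double b) {
        if (Double.compare(a, b) == 0) return 0;
        if (Double.isNaN(a) || Double.isNaN(b)) return Long.MAX_VALUE;
        long ra = Double.doubleToRawLongBits(a);
        long rb = Double.doubleToRawLongBits(b);
        if ((ra ^ rb) < 0) return Long.MAX_VALUE; // signs differ (includes +0.0 vs -0.0)
        return Math.abs(ra - rb);                 // same sign: bit patterns ordered by magnitude
    }

    // Two underflow cases and two Y_IS_HUGE cases (2^31 < y < 2^64).
    static double[][] cases() {
        long yl = 0x1_1000_0000L;
        return new double[][] {
            {1.5E-2D, 170D},
            {1.55E-2D, 171.3D},
            {1.0000000001, (double) yl},
            {1.0000000001, (double) yl + 0.3},
        };
    }

    static int countFailures(long maxUlps) {
        int failures = 0;
        for (double[] c : cases()) {
            double fast = Math.pow(c[0], c[1]);
            double strict = StrictMath.pow(c[0], c[1]);
            if (ulpDistance(fast, strict) > maxUlps) {
                System.err.println("pow(" + c[0] + ", " + c[1] + "): Math=" + fast
                                   + " StrictMath=" + strict);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        long maxUlps = args.length > 0 ? Long.parseLong(args[0]) : 0;
        int failures = countFailures(maxUlps);
        if (failures != 0) {
            throw new RuntimeException(failures + " pow mismatches");
        }
    }
}
```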


Rework
------

1) Please check that the revised algorithm I have provided accurately
reflects the computations performed by your code. That will require
changing the code to deal with the error cases 1, 2, 4 and 5 above. If
you stick with the Estrin form in case 3 then ensure the algorithm is
correct. If you stick with Horner form then update the algorithm so it
is consistent.

The algorithm currently details all mappings of variable to registers.
That is provided as a heuristic which i) allowed me to track usages when
writing up the algorithm and ii) will allow you to analyze the current
register mappings and rationalize them. Once you have a more coherent
strategy for allocating variables to registers details of the mapping
can and should be removed.

As mentioned, the algorithm uses a suffix notation to distinguish
different values where there is some scope for ambiguity. The suffixes
are used as follows:

  1) '_d' double values (mapped to d registers)

  2) '_hd' and '_ld' hi/lo double pairs used to retain accuracy (mapped
to independent d registers)

  3) '_d2' double vectors (mapped to 2d v registers)

  4) '_r' long values that represent the long version of a double value
(mapped to general x registers)

  5) '_h' int values that represent the high word of a double value
(mapped to general w registers)

In many unambiguous cases a suffix is omitted e.g. x, y, k, n.
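As a quick illustration of the suffix convention, the Java sketch below derives x_r, x_h, ix_h and mant_x_h from a double x in the way the names describe, with ix_h taken as the high word of |x|. DOUBLE_TO_LONG is modelled with Double.doubleToRawLongBits; the method names are hypothetical. In the generated code these correspond to fmovd, lsr and ubfx.

```java
/**
 * Illustration of the '_r' and '_h' suffixes for a double x (hypothetical names).
 */
final class SuffixSketch {

    // x_r: the long bits of x, i.e. DOUBLE_TO_LONG(x)
    static long xR(double x) {
        return Double.doubleToRawLongBits(x);
    }

    // x_h: the 32 bit high word of x (sign, exponent, top 20 fraction bits)
    static int xH(double x) {
        return (int) (xR(x) >>> 32);
    }

    // ix_h: the positive high word, i.e. x_h with the sign bit cleared
    static int ixH(double x) {
        return xH(x) & 0x7fffffff;
    }

    // mant_x_h: the top 20 fraction bits, as ubfx(mant_x_h, x_r, 32, 20) extracts them
    static int mantXH(double x) {
        return (int) ((xR(x) >>> 32) & 0x000FFFFF);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(xH(-2.0)) + " " + Integer.toHexString(ixH(-2.0)));
        // c0000000 40000000
    }
}
```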

2) Sort out inconsistent mappings and unnecessary remappings

One of the problems I faced in writing up the algorithm is that some of
your register use appears to be inconsistent -- the same 'logical'
variable is mapped to different registers at different points in the code.

In some cases, this reflects different use of the same name for
different quantities calculated at different stages in the algorithm
(for example, z is used to name both log2(x) and then reused later for
the fractional part of log2(x)). Most of those reuses are actually
catered for by declaring the var in one block and then redeclaring it in
a different sibling block. If this block structure is replicated in the
code then it will enable z to be mapped differently for each different
scope. However, that's not the preferred outcome.  It would make the
code easier to follow if every variable named in the algorithm was
always mapped to the same register unless there was a clear need to
relocate it.

There are also cases where a remapping is performed (without any
discernible reason) within the same scope or within nested scopes! For
example, after the sign of x_r has been noted (and, possibly, its sign
bit removed) the resulting absolute value of x_r is moved from r0 to r1
(using an explicit mov). There doesn't seem to be any need to do this.
Likewise, in the COMPUTE_POW_PROCEED block the value for z stored in v0
is reassigned to v18 in the same block!

I have flagged these remappings with a !!! comment and avoided the
ambiguity that arises by adding a ' to the remapped value (x_r', z') and
using it thereafter. This ensures that uses of the different versions of
the value located in different registers can be distinguished.

An example of remapping in nested scope is provided by p_hd and p_ld. At
the outermost scope these values are mapped to v5 and b6. However, in
the K_IS_SET block they are stored in v17 and v16 (values for v and u,
local to the same block, are placed in v5 and v6 so there is no reason
why the outer scope mapping could not have been retained).

I'd like to see a much more obvious and obviously consistent plan
adopted for mappings before the code is committed.

3) Insert my original and revised algorithms into your code in place of
the current ones.

4) Change the code to introduce local blocks as per the revised algorithm

This should mostly be a simple matter of introducing braces into the
code at strategic locations (although please re-indent).

5) Change the code to use symbolic names for register arguments and
declare those names as Register or FloatRegister in the appropriate
local scope

The main point of employing consistent, logical names for values defined
in the algorithm is to allow registers employed in the code to be
identified using meaningful names rather than using r0, r1, v0, v1,
tmp1, tmp2 etc.

So, for example at the top level of the routine you need to declare
register mappings for the function global variables and then use them in
all the generated instructions e.g. the code should look like

  // double x      // input arg
  // double y      // input arg
  FloatRegister x = v0;
  FloatRegister y = v1;
  // long y_r      // y as long bits
  Register y_r = rscratch2;
  // long x_r      // x as long bits
  Register x_r = r0;
  // double one_d  // 1.0d
  FloatRegister one_d = v2;

  . . .

  // y_r = DOUBLE_TO_LONG(y)
  fmovd(y_r, y);
  // x_r = DOUBLE_TO_LONG(x)
  fmovd(x_r, x);

  . . .

Similarly, in nested blocks you need to introduce block local names for
registers and then use them in the code. For example in the
SUBNORMAL_HANDLED block

  bind(SUBNORMAL_HANDLED);
  // block introduced to scope vars/labels in this region
  {
    Label K_IS_SET;
    // int ix_h      // |x| with exponent rescaled so 1 =< ix <
    Register ix_h = r2;
    // int k         // range 0 or 1 for poly expansion
    Register k = r7;
    // block introduced to scope vars/labels in this subregion
    {
      // int x_h        // high word of x
      Register x_h = r2;
      // int mant_x_h   // mantissa bits of high word of
      Register mant_x_h = r10;

      . . .

      // x_h = (x_r' >> 32)
      lsr(x_h, x_r, 32);
      // mant_x_h = (x_r >> 32) & 0x000FFFFF
      ubfx(mant_x_h, x_r, 32, 20); // i.e. top 20 fractions bits of x

      . . .

    }
    bind(K_IS_SET);

  . . .

You should be able to hang the code directly off the algorithm as shown
above, making it clear that it matches the revised algorithm and
allowing meaningful comparison with the original.

If you have changed the code in your latest revisions then please update
the algorithm accordingly to ensure they continue to correspond with
each other.

From adinn at redhat.com  Fri Sep 21 15:44:14 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 21 Sep 2018 16:44:14 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
Message-ID: <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>

Hi Alan,

Thanks for the response and apologies for failing to notice you had
posted it some days ago (doh!).

Jonathan Halliday has already explained how Red Hat might want to use
this API. Well, what he said, essentially! In particular, this model
provides a way of ensuring that raw byte data is able to be persisted
coherently from Java with the minimal possible overhead. It would be up
to client code above this layer to implement structuring mechanisms for
how those raw bytes get populated with data and to manage any associated
issues regarding atomicity, consistency and isolation (i.e. to provide
the A, C and I of ACID to this API's D).

The main point of the JEP is to ensure that such a performant base
capability is available for known important cases where that is needed
such as, for example, a transaction manager or a distributed cache. If
equivalent middleware written in C can use persistent memory to bring
the persistent storage tier nearer to the CPU and, hence, lower data
durability overheads then we really need an equivalently performant
option in Java or risk Java dropping out as a player in those middleware
markets.

I am glad to hear that other alternatives might be available and would
be happy to consider them. However, I'm not sure that this means this
option is not still desirable, especially if it is orthogonal to those
other alternatives. Most importantly, this one has the advantage that we
know it is ready to use and will provide benefits (we have already
implemented a journaled transaction log over it with promising results
and someone from our messaging team has already been looking into using
it to persist log messages). Indeed, we also know we can use it to
provide a base for supporting all the use cases addressed by Intel's
libpmem and available to C programmers, e.g. a block store, simply by
implementing Java client libraries that provide managed access to the
persistent buffer along the same lines as the Intel C libraries.

I'm afraid I am not familiar with Panama 'scopes' and 'pointers' so I
can't really compare options here. Can you point me at any info that
explains what those terms mean and how it might be possible to use them
to access off-heap, persistent data?

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From sandhya.viswanathan at intel.com  Fri Sep 21 21:30:01 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 21 Sep 2018 21:30:01 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5771@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
 <f45b8b3e-182b-c1d3-e8c7-a33f29e6bd43@oracle.com>
 <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5771@FMSMSX126.amr.corp.intel.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5FCD@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Please find the updated webrev with fix for build failure on SPARC and other architectures at:
Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.04/ 
RFE: https://bugs.openjdk.java.net/browse/JDK-8210764 

Vivek submitted this webrev for testing to the submit repo yesterday at around noon. We haven't received any email back so far. This is our first time using the submit repo.
http://mail.openjdk.java.net/pipermail/jdk-submit-changes/2018-September/003164.html

Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Viswanathan, Sandhya
Sent: Thursday, September 20, 2018 10:53 AM
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
Subject: RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Hi Vladimir,

In C1_LIRAssembler.hpp, when I added an additional parameter to negate, I did make sure to add it as a default parameter:

src/hotspot/share/c1/c1_LIRAssembler.hpp, line 282:
  void negate(LIR_Opr left, LIR_Opr dest, LIR_Opr tmp = LIR_OprFact::illegalOpr);

But I guess since the function is not just called but declared/defined in all the other architectures, I need to add an unused LIR_Opr to the negate function for them.
This would be on similar lines as done in some other C1_LIRAssembler methods. 
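A minimal, self-contained C++ sketch of why the default argument alone does not keep the other platforms building: the default value lives only on the declaration in the shared header, but the member still has three parameters, so each platform's out-of-line definition must accept (and may ignore) the third one. `Opr` and `Assembler` are hypothetical stand-ins for `LIR_Opr` and `LIR_Assembler`, and `Opr{-1}` plays the role of `LIR_OprFact::illegalOpr`.

```cpp
#include <cassert>

// Hypothetical stand-in for LIR_Opr: a negative id means "illegal/none".
struct Opr {
  int id;
  bool is_illegal() const { return id < 0; }
};

// Shared header: the default argument appears only here, on the declaration,
// yet the member function still takes three parameters.
class Assembler {
 public:
  bool negate(Opr left, Opr dest, Opr tmp = Opr{-1});
};

// Platform file: the definition must list all three parameters and must not
// repeat the default. A platform that needs no scratch register simply
// leaves 'tmp' unused.
bool Assembler::negate(Opr left, Opr dest, Opr tmp) {
  assert(!left.is_illegal() && !dest.is_illegal());
  // Report whether the caller supplied a scratch register.
  return !tmp.is_illegal();
}
```

Existing two-argument call sites keep compiling thanks to the default. A two-parameter definition such as `bool Assembler::negate(Opr, Opr)` would match no member declaration and be rejected, which is the kind of error reported for the SPARC build in this thread.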

I will make this change and work with Vivek to use the submit repo for testing it on Sparc.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Thursday, September 20, 2018 10:09 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

I hit a build failure on SPARC due to shared changes in C1:

workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*,
LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)".
jib > 1 Error(s) detected.

I assume other platforms are also affected.

Vladimir

On 9/19/18 9:53 AM, Vladimir Kozlov wrote:
> Thank you, Sandhya
> 
> I submitted new testing.
> 
> Vladimir
> 
> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find below the updated webrev with fixes for the two issues:
>>
>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS 
>> as the temporary register type for intrinsics instead of legVecD.
>>
>> This test was only failing with -XX:MaxVectorSize=4.
>>
>> The file modified is x86_64.ad.
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to allow all 
>> xmm registers (xmm0-xmm31) for C1 and handle floating point abs and negate appropriately by providing a temp register.
>>
>> The C1 files are modified for this fix.
>>
>> I reran the compiler jtreg tests with nativepath set appropriately on Haswell, SKX and KNL.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:*Viswanathan, Sandhya
>> *Sent:* Tuesday, September 18, 2018 1:47 PM
>> *To:* 'JC Beyler' <jcbeyler at google.com>
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Jc,
>>
>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:*JC Beyler [mailto:jcbeyler at google.com]
>> *Sent:* Monday, September 17, 2018 9:29 PM
>> *To:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Sandhya,
>>
>> How are you invoking the test for NativeCallTest?
>>
>> The way I would do it using jtreg would be something like this:
>>
>> $ export BUILD_TYPE=release
>>
>> $ export JDK_PATH=wherever you have your JDK
>>
>> From the test subfolder:
>>
>> $ wherever-your-jtreg-is/bin/jtreg \
>>     -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib \
>>     -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk \
>>     hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
>>
>> Seems to pass for me.
>>
>> But much easier is:
>>
>> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"
>>
>> That seems to pass for me as well and is easier to use :)
>>
>> For information, the make run-test documentation is here:
>>
>> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html
>>
>> Let me know if that helps,
>>
>> Jc
>>
>>     Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
>>         "test result: Error. Use -nativepath to specify the location of native code"
>>     Do I need to give any additional info to jtreg to get over this problem?
>>
>>     Thanks a lot!
>>     Best Regards,
>>     Sandhya
>>
>>     -----Original Message-----
>>     From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     Sent: Monday, September 17, 2018 10:14 AM
>>     To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>     Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>>     I finished testing on an avx512 machine.
>>     All passed except the known (TestNaNVector.java) failures.
>>
>>     Thanks,
>>     Vladimir
>>
>>     On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
>>     > I gave an incorrect link to the RFE. Here is the correct one:
>>     >
>>     > https://bugs.openjdk.java.net/browse/JDK-8210764
>>     >
>>     > Vladimir
>>     >
>>     > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>>     >> I got the build failure on MacOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.
>>     >>
>>     >> Anyway, I tested with these changes and got the following failures on avx1 machines. I am planning to run on avx512 too.
>>     >>
>>     >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on a CPU with AVX1 only
>>     >>
>>     >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>>     >> # Problematic frame:
>>     >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>     >>
>>     >> Current CompileTask:
>>     >> C2:    154    5             java.lang.String::equals (65 bytes)
>>     >>
>>     >> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>     >> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>     >> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>>     >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>>     >> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>>     >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>>     >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>     >>
>>     >>
>>     >> ------------------------------------------------------------------------------------------------
>>     >> 2. with '-Xcomp'
>>     >> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>>     >> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>     >>
>>     >> Current CompileTask:
>>     >> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>>     >>
>>     >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>     >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>>     >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>>     >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>>     >> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>>     >> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>>     >> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>>     >> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>>     >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>>     >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>>     >> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>>     >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>>     >> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>>     >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>>     >> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>     >>
>>     >> Vladimir
>>     >>
>>     >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>     >>>
>>     >>> Thanks Vladimir, the change below should fix this issue:
>>     >>>
>>     >>> ------------------------------
>>     >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>     >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>     >>> @@ -233,22 +233,6 @@
>>     >>>    _xmm_regs[13]  = xmm13;
>>     >>>    _xmm_regs[14]  = xmm14;
>>     >>>    _xmm_regs[15]  = xmm15;
>>     >>> -  _xmm_regs[16]  = xmm16;
>>     >>> -  _xmm_regs[17]  = xmm17;
>>     >>> -  _xmm_regs[18]  = xmm18;
>>     >>> -  _xmm_regs[19]  = xmm19;
>>     >>> -  _xmm_regs[20]  = xmm20;
>>     >>> -  _xmm_regs[21]  = xmm21;
>>     >>> -  _xmm_regs[22]  = xmm22;
>>     >>> -  _xmm_regs[23]  = xmm23;
>>     >>> -  _xmm_regs[24]  = xmm24;
>>     >>> -  _xmm_regs[25]  = xmm25;
>>     >>> -  _xmm_regs[26]  = xmm26;
>>     >>> -  _xmm_regs[27]  = xmm27;
>>     >>> -  _xmm_regs[28]  = xmm28;
>>     >>> -  _xmm_regs[29]  = xmm29;
>>     >>> -  _xmm_regs[30]  = xmm30;
>>     >>> -  _xmm_regs[31]  = xmm31;
>>     >>>  #endif // _LP64
>>     >>>
>>     >>>    for (int i = 0; i < 8; i++) {
>>     >>> ---------------------------------
>>     >>>
>>     >>> I think the gcc version on my desktop is older, so it didn't catch this.
>>     >>>
>>     >>> The updated patch, with the above change and with UseAVX set to 3, is uploaded to:
>>     >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>     >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>     >>>
>>     >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before changing it back to 3.
>>     >>>
>>     >>> Best Regards,
>>     >>> Sandhya
>>     >>>
>>     >>>
>>     >>> -----Original Message-----
>>     >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     >>> Sent: Friday, September 14, 2018 12:13 PM
>>     >>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>     >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>
>>     >>> I got a build failure:
>>     >>>
>>     >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
>>     >>> jib >   _xmm_regs[16]  = xmm16;
>>     >>>
>>     >>> I also noticed that we don't have an RFE for this work. I filed:
>>     >>>
>>     >>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>     >>>
>>     >>> You did not enable avx512 by default (the 8209735 change was synced from jdk 11 into 12 two weeks ago). I added the following change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>     >>>
>>     >>> - product(intx, UseAVX, 2, \
>>     >>> + product(intx, UseAVX, 3, \
>>     >>>
>>     >>> Thanks,
>>     >>> Vladimir
>>     >>>
>>     >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>     >>>> Looks good to me. I will start testing and let you know the results.
>>     >>>>
>>     >>>> Thanks,
>>     >>>> Vladimir
>>     >>>>
>>     >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>     >>>>> Hi Vladimir,
>>     >>>>>
>>     >>>>> Please find below the updated webrev with all your comments incorporated:
>>     >>>>>
>>     >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>     >>>>>
>>     >>>>> I have run the jtreg compiler tests on SKX and KNL, which have two different flavors of AVX512, and on Haswell, a non-avx512 system. I also tested SPECjvm2008 on the three platforms.
>>     >>>>>
>>     >>>>> Best Regards,
>>     >>>>> Sandhya
>>     >>>>>
>>     >>>>>
>>     >>>>> -----Original Message-----
>>     >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>     >>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>     >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>>>
>>     >>>>> Thank you, Sandhya
>>     >>>>>
>>     >>>>> I am satisfied with your detailed answer on the memory load issues. Okay, let's not add them.
>>     >>>>>
>>     >>>>> Vladimir
>>     >>>>>
>>     >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>     >>>>>> Hi Vladimir,
>>     >>>>>>
>>     >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>     >>>>>> Please see my responses in your email below, marked with (Sandhya >>>). Looking forward to your advice.
>>     >>>>>>
>>     >>>>>> Best Regards,
>>     >>>>>> Sandhya
>>     >>>>>>
>>     >>>>>>
>>     >>>>>> -----Original Message-----
>>     >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>     >>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>     >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>>>>
>>     >>>>>> Thank you.
>>     >>>>>>
>>     >>>>>> I want to discuss the next issue:
>>     >>>>>>
>>     >>>>>>    > You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>     >>>>>>    >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>     >>>>>>
>>     >>>>>> This is what I thought. You increase register pressure this way, which may cause spills on the stack.
>>     >>>>>> Also, we don't check whether the register could be the same as the result, so you may get unneeded moves.
>>     >>>>>>
>>     >>>>>> I would advise adding memory moves at least.
>>     >>>>>>
>>     >>>>>> Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask (matcher.cpp). I would like the register allocator to get all the possible registers on an architecture for the idealreg2reg mask. I was concerned that multiple instruct rules in the .ad file for LoadF from memory might cause problems. I would have to give a higher cost to loading into a restricted register set like vlReg. Then I decided that the register allocator can handle this in a much better way than me adding rules to load from memory. This is with the background that regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>>     >>>>>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>     >>>>>> #endif
>>     >>>>>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>     >>>>>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered,LoadNode::DependsOnlyOnTest, false));
>>     >>>>>>     MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>     >>>>>>     MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>     >>>>>>     MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>     >>>>>>     ....
>>     >>>>>>     idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>     >>>>>>
>>     >>>>>> Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions:
>>     >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>>     >>>>>> Do these instructions work when avx512vl is not available? I see for vectors you use the vpxor+vinserti* combination.
>>     >>>>>>
>>     >>>>>> Sandhya >>> Yes, the scalar floating point instructions are available with AVX512 encoding when avx512vl is not available. That is why you would see not just movflt and movdbl but all the other scalar operations like adds, addsd etc. using the entire xmm range (xmm0-31). In other words, they are AVX512F instructions.
>>     >>>>>>
>>     >>>>>> Last question. I notice the following UseAVX check in the vector spill code in x86.ad:
>>     >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>     >>>>>>
>>     >>>>>> Should it be (UseAVX < 3)?
>>     >>>>>>
>>     >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>     >>>>>>
>>     >>>>>> Thanks,
>>     >>>>>> Vladimir
>>     >>>>>>
>>     >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>     >>>>>>> Hi Vladimir,
>>     >>>>>>>
>>     >>>>>>> Thanks a lot for your review and feedback. Please see my responses in your email below. I will send an updated webrev incorporating your feedback.
>>     >>>>>>>
>>     >>>>>>> Best Regards,
>>     >>>>>>> Sandhya
>>     >>>>>>>
>>     >>>>>>>
>>     >>>>>>> -----Original Message-----
>>     >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>     >>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>     >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>     >>>>>>>
>>     >>>>>>> Very nice. Thank you, Sandhya.
>>     >>>>>>>
>>     >>>>>>> I would like to see more meaningful naming in .ad files: instead of rreg* and ovec*, something like vlReg* and legVec*.
>>     >>>>>>>
>>     >>>>>>>>>> Yes, accepted.
>>     >>>>>>>
>>     >>>>>>> The new load_from_* and load_to_* instructions in .ad files should be renamed as follows and collocated with the other Move*_reg_reg* instructions:
>>     >>>>>>>
>>     >>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>>     >>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>>     >>>>>>>>>> Yes, accepted.
>>     >>>>>>>
>>     >>>>>>> You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>     >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>     >>>>>>>
>>     >>>>>>> Also please explain why these registers are used when UseAVX == 0:
>>     >>>>>>>
>>     >>>>>>> +instruct absD_reg(rregD dst) %{
>>     >>>>>>> +  predicate((UseSSE>=2) && (UseAVX == 0));
>>     >>>>>>>
>>     >>>>>>> We switch off evex, so regular regD (only legacy registers in this case) should work too:
>>     >>>>>>>   if (UseAVX < 3) {
>>     >>>>>>>     _features &= ~CPU_AVX512F;
>>     >>>>>>>
>>     >>>>>>>>>> Yes, accepted. It could be regD here.
>>     >>>>>>>
>>     >>>>>>> The next checks could be combined by using a new function in vm_version_x86.hpp (you already have some):
>>     >>>>>>>
>>     >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>     >>>>>>>
>>     >>>>>>>>>> Yes, accepted.
>>     >>>>>>>
>>     >>>>>>> I would suggest testing these changes on different machines (non-avx512 and avx512) and with different UseAVX values.
>>     >>>>>>>
>>     >>>>>>>>>> Will do.
>>     >>>>>>>
>>     >>>>>>> Thanks,
>>     >>>>>>> Vladimir
>>     >>>>>>>
>>     >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>     >>>>>>>> Recently there have been a couple of high priority issues with regard to C2's usage of the high bank of XMM registers (XMM16-XMM31):
>>     >>>>>>>>
>>     >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>     >>>>>>>>
>>     >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>     >>>>>>>>
>>     >>>>>>>> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>>     >>>>>>>>
>>     >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>     >>>>>>>>
>>     >>>>>>>> The patch provides a restricted set of registers to the match rules in the ad file based on the underlying architecture.
>>     >>>>>>>>
>>     >>>>>>>> The aim is to remove special handling/workarounds from the macro assembler and assembler.
>>     >>>>>>>>
>>     >>>>>>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
>>     >>>>>>>>
>>     >>>>>>>> Your review and feedback is very welcome.
>>     >>>>>>>>
>>     >>>>>>>> Best Regards,
>>     >>>>>>>>
>>     >>>>>>>> Sandhya
>>     >>>>>>>>
>>
>>
>> --
>>
>> Thanks,
>>
>> Jc
>>
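The quoted review suggestion to fold the four `VM_Version::supports_*()` calls in the `reg_class_dynamic` predicate into one helper can be modelled in plain C++. This is a minimal sketch only: the `FEAT_*` bit names, the `CpuFeatures` type and the helper name are hypothetical, not the real `vm_version_x86.hpp` code. The helper picks between an EVEX class of 32 XMM registers and a legacy class of 16, mirroring the `vectors_reg_evex`/`vectors_reg_legacy` choice described in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU feature bits (illustrative names and values only).
constexpr uint64_t FEAT_EVEX     = 1ull << 0;
constexpr uint64_t FEAT_AVX512BW = 1ull << 1;
constexpr uint64_t FEAT_AVX512DQ = 1ull << 2;
constexpr uint64_t FEAT_AVX512VL = 1ull << 3;

struct CpuFeatures {
  uint64_t bits;

  bool has(uint64_t mask) const { return (bits & mask) == mask; }
  bool supports_evex() const     { return has(FEAT_EVEX); }
  bool supports_avx512bw() const { return has(FEAT_AVX512BW); }
  bool supports_avx512dq() const { return has(FEAT_AVX512DQ); }
  bool supports_avx512vl() const { return has(FEAT_AVX512VL); }

  // One named helper instead of spelling out the four-way && in every
  // register-class predicate (hypothetical name).
  bool supports_avx512vlbwdq() const {
    return supports_evex() && supports_avx512vl() &&
           supports_avx512bw() && supports_avx512dq();
  }
};

// Model of the reg_class_dynamic choice: the EVEX class (xmm0-xmm31) when the
// combined predicate holds, otherwise the legacy class (xmm0-xmm15).
inline int vector_reg_class_size(const CpuFeatures& cpu) {
  return cpu.supports_avx512vlbwdq() ? 32 : 16;
}
```

Keeping the combination in one function means a later change to the required feature set touches a single place rather than every `.ad` predicate.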

From rkennke at redhat.com  Sun Sep 23 18:47:53 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Sun, 23 Sep 2018 20:47:53 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
Message-ID: <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>

Hi David,

thanks for looking at this!

> Should compiler folk be looking at this as well?

Maybe. I added them.

> I'm not familiar with the details of the NMethodSweeper but it seems to
> me that this change potentially allows multiple concurrent executions of
> NMethodSweeper::prepare_mark_active_nmethods() and that code does not
> appear to be thread-safe.

There are two scenarios now:
- TLHS enabled: NMethodSweeper::prepare_mark_active_nmethods() only gets
called from the sweeper thread.
- TLHS disabled: NMethodSweeper::prepare_mark_active_nmethods() only
gets called from VMThread/at-safepoint.

The structures used in NMethodSweeper::prepare_mark_active_nmethods()
are only ever accessed from the sweeper thread or at a safepoint, and
those two contexts are mutually exclusive, so it should be safe. And
instead of removing the assert, we can extend it to accept the sweeper
thread. I also noticed that we need to grab the CodeCache_lock before
calling into prepare_mark_active_nmethods(), so I added that and put it
into the assertion.
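The calling-context argument above can be written down as a single predicate. This is a hedged C++ model with hypothetical names throughout (`VMState`, `may_prepare_marking`, `prepare_marking`); it is not HotSpot's API. Which lock requirement applies to which path is an assumption read from the description here: either at a safepoint, or on the sweeper thread while holding the code-cache lock.

```cpp
#include <cassert>
#include <thread>

// Hypothetical model of the VM state relevant to the check; none of these
// names are HotSpot APIs.
struct VMState {
  bool at_safepoint = false;         // VM thread currently holds a safepoint
  std::thread::id sweeper_thread;    // identity of the sweeper thread
  std::thread::id code_cache_owner;  // holder of the code-cache lock, if any
};

// Allowed contexts: at a safepoint, or on the sweeper thread while holding
// the code-cache lock. The mail argues these contexts are mutually
// exclusive, so the shared marking state is never touched concurrently.
inline bool may_prepare_marking(const VMState& vm) {
  const std::thread::id self = std::this_thread::get_id();
  const bool on_sweeper_with_lock =
      self == vm.sweeper_thread && vm.code_cache_owner == self;
  return vm.at_safepoint || on_sweeper_with_lock;
}

// Hypothetical entry point carrying the widened assertion.
inline void prepare_marking(const VMState& vm) {
  assert(may_prepare_marking(vm) &&
         "must be at a safepoint, or on the sweeper thread holding the code-cache lock");
  (void)vm;  // keeps release builds (NDEBUG) free of unused-parameter warnings
  // ... set up the marking closure here ...
}
```

Expressing the condition as one named predicate keeps the assertion readable as the set of allowed callers grows.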

Incremental webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02.diff/
Full webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02/

Better now?

Thanks,
Roman


> 
> Thanks,
> David
> 
> On 21/09/2018 9:57 AM, Roman Kennke wrote:
>> Please review the following change to improve and/or eliminate the stop
>> time needed to mark stacks for NMethodSweeper.
>>
>> The proposed change is two-fold:
>> - If ThreadLocalHandshake is enabled, do the stack-nmethod-marking using
>> TLHS. This completely eliminates the full safepoint. In this scenario,
>> nmethod-marking will also be skipped during safepoint-cleanup. IOW, it
>> only happens when the sweeper loop asks for it. It is also most
>> efficient because each thread scans its own stack, without requiring to
>> synchronize with other threads. Everything remains free-running.
>> - Otherwise, try to use GC-safepoint-workers to do the marking at SP.
>> The infrastructure for this is already there since some time, and both
>> G1 and ZGC (and Shenandoah, when it arrives) support it. The
>> safepoint-cleanup-phase already uses it, so let's just do the same in
>> sweeper-loop-induced safepoints.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8132849
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.01/
>>
>> Testing: hotspot/jtreg:tier1 using +ThreadLocalHandshakes and
>> -ThreadLocalHandshakes
>>
>> One issue that I am not sure of is the:
>>
>> assert(SafepointSynchronize::is_at_safepoint(), "must be executed at a
>> safepoint");
>>
>> at the start of NMethodSweeper::prepare_mark_active_nmethods().
>>
>> I couldn't see any particular reason for it. The
>> wait_for_stack_scanning() stuff is called outside safepoints anyway, and
>> the other stuff doesn't seem critical. And besides, in the scenario
>> where we'd call this outside safepoint (+ThreadLocalHandshakes) we'd
>> only ever call it from the sweeper thread anyway.
>>
>> What do you think?
>>
>>



From david.holmes at oracle.com  Sun Sep 23 19:38:56 2018
From: david.holmes at oracle.com (David Holmes)
Date: Sun, 23 Sep 2018 15:38:56 -0400
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
Message-ID: <41505140-0b14-76c7-3c0b-6cf71486c9c7@oracle.com>

Hi Roman,

Thanks for clarifying that only two possible threads are involved, and not
concurrently. That eases my concern.

I'll leave detailed reviews to others.

David

On 23/09/2018 2:47 PM, Roman Kennke wrote:
> Hi David,
> 
> thanks for looking at this!
> 
>> Should compiler folk be looking at this as well?
> 
> Maybe. I added them.
> 
>> I'm not familiar with the details of the NMethodSweeper but it seems to
>> me that this change potentially allows multiple concurrent executions of
>> NMethodSweeper::prepare_mark_active_nmethods() and that code does not
>> appear to be thread-safe.
> 
> There are two scenarios now:
> - TLHS enabled: NMethodSweeper::prepare_mark_active_nmethods() only gets
> called from the sweeper thread.
> - TLHS disabled: NMethodSweeper::prepare_mark_active_nmethods() only
> gets called from VMThread/at-safepoint.
> 
> The structures used in NMethodSweeper::prepare_mark_active_nmethods()
> are only ever accessed from the sweeper thread or at a safepoint, and those
> are mutually exclusive, so it should be safe. And instead of removing the
> assert, we can extend it to accept the sweeper thread. I also noticed
> that we need to grab the CodeCache_lock before calling into
> prepare_mark_active_nmethods(), so I added that and included it in the
> assertion.
> 
> Incremental webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02.diff/
> Full webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02/
> 
> Better now?
> 
> Thanks,
> Roman
> 
> 
>>
>> Thanks,
>> David
>>
>> On 21/09/2018 9:57 AM, Roman Kennke wrote:
>>> Please review the following change to improve and/or eliminate the stop
>>> time needed to mark stacks for NMethodSweeper.
>>>
>>> The proposed change is two-fold:
>>> - If ThreadLocalHandshakes is enabled, do the stack-nmethod-marking using
>>> TLHS. This completely eliminates the full safepoint. In this scenario,
>>> nmethod-marking will also be skipped during safepoint-cleanup. IOW, it
>>> only happens when the sweeper loop asks for it. It is also most
>>> efficient because each thread scans its own stack, without needing to
>>> synchronize with other threads. Everything remains free-running.
>>> - Otherwise, try to use GC-safepoint-workers to do the marking at SP.
>>> The infrastructure for this has been there for some time, and both
>>> G1 and ZGC (and Shenandoah, when it arrives) support it. The
>>> safepoint-cleanup-phase already uses it, so let's just do the same in
>>> sweeper-loop-induced safepoints.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8132849
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.01/
>>>
>>> Testing: hotspot/jtreg:tier1 using +ThreadLocalHandshakes and
>>> -ThreadLocalHandshakes
>>>
>>> One issue that I am not sure of is the:
>>>
>>> assert(SafepointSynchronize::is_at_safepoint(), "must be executed at a
>>> safepoint");
>>>
>>> at the start of NMethodSweeper::prepare_mark_active_nmethods().
>>>
>>> I couldn't see any particular reason for it. The
>>> wait_for_stack_scanning() stuff is called outside safepoints anyway, and
>>> the other stuff doesn't seem critical. And besides, in the scenario
>>> where we'd call this outside safepoint (+ThreadLocalHandshakes) we'd
>>> only ever call it from the sweeper thread anyway.
>>>
>>> What do you think?
>>>
>>>
> 
> 

From erik.osterlund at oracle.com  Sun Sep 23 22:08:48 2018
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Mon, 24 Sep 2018 00:08:48 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
Message-ID: <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>

Hi Roman,

Thank you for sorting this out. It is very helpful.

Could you change the name of ThreadToCodeBlobClosure to
NMethodMarkingThreadClosure? (Motivation: the closure filters
JavaThreads that are not the sweeper, and actually only looks at
nmethods, and not other types of CodeBlobs, e.g. AoT methods, so it does
less than I would expect.)

Also, the NMethodSweeper::prepare_mark_active_nmethods() was built for 
safepoint cleaning and returns either hotness counting or nmethod 
marking closures. However, when moving nmethod marking out to be done 
concurrently with TLH, it is slightly confusing to have the same member 
function called from the concurrent context despite never ever wanting a 
hotness counter closure from there.

I'm thinking the prepare_mark_active_nmethods() member function could be 
split into two:

One member function that returns either nmethod marking closure or NULL 
(depending on whether it's needed or not).
Another member function that calls the first one, and if NULL slaps on a 
hotness counter closure.

Then from concurrent contexts we would call the first method (nmethod 
marking or NULL), and from STW contexts we would call the second member 
function (nmethod marking or hotness counter).
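A minimal C++ sketch of the proposed split, under stated assumptions: the closures are reduced to an enum, `SweeperModel` and `marking_requested` are hypothetical, and `prepare_for_safepoint_cleanup()` is an invented name, since no name was proposed for the second function. The point it illustrates is that the first function can only return a marking closure or nothing, so a concurrent caller can never receive a hotness-counting closure; only the STW wrapper falls back to hotness counting.

```cpp
#include <cassert>

// Hypothetical stand-ins for the closure objects the sweeper hands out.
enum class StackClosure { None, NMethodMarking, HotnessCounting };

// Hypothetical sweeper state: marking is only wanted when a stack scan
// has been requested (between sweep cycles).
struct SweeperModel {
  bool marking_requested = false;

  // First function: nmethod marking closure or None (i.e. NULL).
  // Suitable for the concurrent (handshake) context.
  StackClosure prepare_mark_active_nmethods() const {
    return marking_requested ? StackClosure::NMethodMarking : StackClosure::None;
  }

  // Second function (invented name): used only from STW contexts such as
  // safepoint cleanup; falls back to hotness counting when no marking is needed.
  StackClosure prepare_for_safepoint_cleanup() const {
    StackClosure c = prepare_mark_active_nmethods();
    return c == StackClosure::None ? StackClosure::HotnessCounting : c;
  }
};
```

With no scan requested, the concurrent entry point returns `None` while the STW entry point returns `HotnessCounting`; once a scan is requested, both return `NMethodMarking`.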

Another thing worth noticing is that the VM_MarkActiveNMethods VM 
operation marks the nmethods on the stack twice. First in safepoint 
cleanup, and subsequently in the operation itself 
(VM_MarkActiveNMethods::doit). I would argue that only one pass is 
enough. Therefore, I would propose to completely remove the nmethod 
marking from the safepoint cleanup, and have safepoint cleanup *only* 
fiddle around with hotness counters. If we do that, then nmethod marking 
is done in VM_MarkActiveNMethods::doit if TLH is off, and in your new 
handshake operation when TLH is on.

Then we can have zero nmethod marking in safepoint cleanup, and 
subsequently figure out how to get rid of the hotness counters.

Thanks,
/Erik

On 2018-09-23 20:47, Roman Kennke wrote:
> Hi David,
>
> thanks for looking at this!
>
>> Should compiler folk be looking at this as well?
> Maybe. I added them.
>
>> I'm not familiar with the details of the NMethodSweeper but it seems to
>> me that this change potentially allows multiple concurrent executions of
>> NMethodSweeper::prepare_mark_active_nmethods() and that code does not
>> appear to be thread-safe.
> There are two scenarios now:
> - TLHS enabled: NMethodSweeper::prepare_mark_active_nmethods() only gets
> called from the sweeper thread.
> - TLHS disabled: NMethodSweeper::prepare_mark_active_nmethods() only
> gets called from VMThread/at-safepoint.
>
> The structures used in NMethodSweeper::prepare_mark_active_nmethods()
> are only ever accessed from the sweeper thread or at a safepoint, and those
> are mutually exclusive, so it should be safe. And instead of removing the
> assert, we can extend it to accept the sweeper thread. I also noticed
> that we need to grab the CodeCache_lock before calling into
> prepare_mark_active_nmethods(), so I added that and included it in the
> assertion.
>
> Incremental webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02.diff/
> Full webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02/
>
> Better now?
>
> Thanks,
> Roman
>
>
>> Thanks,
>> David
>>
>> On 21/09/2018 9:57 AM, Roman Kennke wrote:
>>> Please review the following change to improve and/or eliminate the stop
>>> time needed to mark stacks for NMethodSweeper.
>>>
>>> The proposed change is two-fold:
>>> - If ThreadLocalHandshakes is enabled, do the stack-nmethod-marking using
>>> TLHS. This completely eliminates the full safepoint. In this scenario,
>>> nmethod-marking will also be skipped during safepoint-cleanup. IOW, it
>>> only happens when the sweeper loop asks for it. It is also most
>>> efficient because each thread scans its own stack, without needing to
>>> synchronize with other threads. Everything remains free-running.
>>> - Otherwise, try to use GC-safepoint-workers to do the marking at SP.
>>> The infrastructure for this has been there for some time, and both
>>> G1 and ZGC (and Shenandoah, when it arrives) support it. The
>>> safepoint-cleanup-phase already uses it, so let's just do the same in
>>> sweeper-loop-induced safepoints.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8132849
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.01/
>>>
>>> Testing: hotspot/jtreg:tier1 using +ThreadLocalHandshakes and
>>> -ThreadLocalHandshakes
>>>
>>> One issue that I am not sure of is the:
>>>
>>> assert(SafepointSynchronize::is_at_safepoint(), "must be executed at a
>>> safepoint");
>>>
>>> at the start of NMethodSweeper::prepare_mark_active_nmethods().
>>>
>>> I couldn't see any particular reason for it. The
>>> wait_for_stack_scanning() stuff is called outside safepoints anyway, and
>>> the other stuff doesn't seem critical. And besides, in the scenario
>>> where we'd call this outside safepoint (+ThreadLocalHandshakes) we'd
>>> only ever call it from the sweeper thread anyway.
>>>
>>> What do you think?
>>>
>>>
>


From kuaiwei.kw at alibaba-inc.com  Mon Sep 24 06:06:11 2018
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Mon, 24 Sep 2018 14:06:11 +0800
Subject: Re: [Patch] 8210853: C2 doesn't skip post barrier
 for new allocated objects
In-Reply-To: <a5f9f824-efdc-c04f-3325-999c8ab2d43a@oracle.com>
References: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com>,
 <a5f9f824-efdc-c04f-3325-999c8ab2d43a@oracle.com>
Message-ID: <c48acfbd-335a-42be-a3de-eb3bb703f06d.kuaiwei.kw@alibaba-inc.com>


Hi Tobias,

  Thanks for your suggestion. I think your point is that the region node may get new paths later in the parse phase, so we cannot be sure the region node will be optimized away.

  It's a good question, and I checked it. Now I think it may not cause trouble. In post barrier reduction, the oop store uses the allocation node as its base pointer. The data graph guarantees that the control of the allocation node dominates the control of the store. If the allocation node is on a predecessor path of the region node and a new path into the region is added, the graph is broken, because we could reach the store without the allocation. If the allocation node is in a dominating ancestor, the graph shape is more complicated, so we cannot reach the control of the allocation by skipping only one region.

  A better solution would be to know that the region node was created in exit_map and will not be changed later. Is there any way to know that at compile time?
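  A toy C++ model of the pattern matched by the proposed just_allocated_object() change (quoted below), to help reason about inputs being added during parsing. ToyNode is a hypothetical stand-in for C2's Node; the only C2 property it assumes is that a Region's in(0) is the region itself, so a Region with one incoming path has req() == 2. The sketch shows that the match depends on the region having exactly one predecessor at the moment the check runs: once another path is added, req() becomes 3 and the allocation is no longer recognized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C2 control node (not the real opto Node class).
// As in C2, a Region's in(0) is the region itself, so a Region with a single
// incoming control path has req() == 2.
struct ToyNode {
  bool is_region = false;
  std::vector<ToyNode*> inputs;
  std::size_t req() const { return inputs.size(); }
  ToyNode* in(std::size_t i) const { return inputs[i]; }
};

// Mirrors the proposed check: skip one single-path Region before comparing
// against the control of the most recent allocation.
bool is_just_allocated(ToyNode* current_control, ToyNode* recent_alloc_ctl) {
  ToyNode* ctrl = current_control;
  if (ctrl != nullptr && ctrl->is_region && ctrl->req() == 2) {
    ctrl = ctrl->in(1);
  }
  return ctrl == recent_alloc_ctl;
}
```

  The answer is only valid for the graph as it is when the check runs, which is why it matters whether the region can still gain paths afterwards.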

Thanks,
Kevin


------------------------------------------------------------------
From: Tobias Hartmann <tobias.hartmann at oracle.com>
Sent: 2018-09-20 23:22
To: Kuai Wei <kuaiwei.kw at alibaba-inc.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects

Hi,

isn't this code executed during parsing and therefore it could happen that more inputs are added to
the region? For example, by Parse::Block::add_new_path():
http://hg.openjdk.java.net/jdk/jdk/file/75e4ce0fa1ba/src/hotspot/share/opto/parse1.cpp#l1917

Best regards,
Tobias

On 18.09.2018 09:33, Kuai Wei wrote:
> 
> Hi,
> 
>   I made a patch to https://bugs.openjdk.java.net/browse/JDK-8210853 . Could you help review my change?
> 
> Background:
>   C2 could remove G1 post barrier if store to new allocated object. But the check of
> just_allocated_object will be prevent by a Region node which is created when inline initialize
> method of super class. The change is to check the pattern and skip the Region node.
> 
> src/hotspot/share/opto/graphKit.cpp
> 
>  // We use this to determine if an object is so "fresh" that
>  // it does not require card marks.
>  Node* GraphKit::just_allocated_object(Node* current_control) {
> -  if (C->recent_alloc_ctl() == current_control)
> +  Node * ctrl = current_control;
> +  // Object::<init> is invoked after allocation, most of invoke nodes
> +  // will be reduced, but a region node is kept in parse time, we check
> +  // the pattern and skip the region node
> +  if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
> +    ctrl = ctrl->in(1);
> +  }
> +  if (C->recent_alloc_ctl() == ctrl)
>      return C->recent_alloc_obj();
>    return NULL;
>  }

From Alan.Bateman at oracle.com  Mon Sep 24 08:14:44 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 24 Sep 2018 09:14:44 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
Message-ID: <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>

On 21/09/2018 16:44, Andrew Dinn wrote:
> Hi Alan,
>
> Thanks for the response and apologies for failing to notice you had
> posted it some days ago (doh!).
>
> Jonathan Halliday has already explained how Red Hat might want to use
> this API. Well, what he said, essentially! In particular, this model
> provides a way of ensuring that raw byte data is able to be persisted
> coherently from Java with the minimal possible overhead. It would be up
> to client code above this layer to implement structuring mechanisms for
> how those raw bytes get populated with data and to manage any associated
> issues regarding atomicity, consistency and isolation (i.e. to provide
> the A, C and I of ACID to this API's D).
>
> The main point of the JEP is to ensure that such a performant base
> capability is available for known important cases where that is needed
> such as, for example, a transaction manager or a distributed cache. If
> equivalent middleware written in C can use persistent memory to bring
> the persistent storage tier nearer to the CPU and, hence, lower data
> durability overheads then we really need an equivalently performant
> option in Java or risk Java dropping out as a player in those middleware
> markets.
>
> I am glad to hear that other alternatives might be available and would
> be happy to consider them. However, I'm not sure that this means this
> option is not still desirable, especially if it is orthogonal to those
> other alternatives. Most importantly, this one has the advantage that we
> know it is ready to use and will provide benefits (we have already
> implemented a journaled transaction log over it with promising results
> and someone from our messaging team has already been looking into using
> it to persist log messages). Indeed, we also know we can use it to
> provide a base for supporting all the use cases addressed by Intel's
> libpmem and available to C programmers, e.g. a block store, simply by
> implementing Java client libraries that provide managed access to the
> persistent buffer along the same lines as the Intel C libraries.
>
> I'm afraid I am not familiar with Panama 'scopes' and 'pointers' so I
> can't really compare options here. Can you point me at any info that
> explains what those terms mean and how it might be possible to use them
> to access off-heap, persistent data?
>
I'm not questioning the need to support NVM; instead I'm trying to see
whether MappedByteBuffer is the right way to expose this in the standard
API. Buffers were designed in JSR-51 with specific use-cases in mind but
they are problematic for many off-heap cases as they aren't thread safe,
are limited to 2GB, lack confinement, and only support homogeneous data (no
layout support). At the same time, Project Panama (foreign branch in
panama/dev) has the potential to provide the right API to work with
memory. I see Jonathan's mail where he seems to be using object
serialization, so the solution on the table works for his use-case, but it
may not be the right solution for more general multi-threaded access to
NVM. There is some interest in seeing whether this part of Project
Panama could be advanced to address many of the cases where developers
are resorting to using Unsafe today. There would of course need to be
some integration with buffers too. There's no concrete proposal/JEP at
this time; I'm just pointing out that many of the complaints about
buffers are really cases where it's the wrong API and the real need
is something more fundamental.

So where does this leave us? If support for persistent memory is added
to FileChannel.map as we've been discussing, then it may not be too bad
as the API surface is small: just new map modes and a
MappedByteBuffer::isPersistent method. The force method that specifies a
range is something useful to add to MBB anyway. If (and I hope when)
there is support for memory regions or pointers, then I could imagine
re-visiting this so that there are alternative ways to get a memory
region or pointer that is backed by NVM. If the timing were different,
then I think we'd skip the new map modes and would be having a
different discussion here. An alternative, of course, is to create the mapped
buffer via a JDK-specific API, as that would be easier to deprecate and
remove in the future if needed.

I'm interested to see if there is other input on this topic before it 
gets locked into extending the standard API.

-Alan.

From rkennke at redhat.com  Mon Sep 24 08:18:21 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 24 Sep 2018 10:18:21 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
Message-ID: <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>

Hi Erik,

> Thank you for sorting this out. It is very helpful.
> 
> Could you change the name of ThreadToCodeBlobClosure to
> NMethodMarkingThreadClosure? (Motivation: the closure filters
> JavaThreads that are not the sweeper, and actually only looks at
> nmethods, and not other types of CodeBlobs, e.g. AoT methods, so it does
> less than I would expect.)

Yes, will do, but let's first agree on the other issues:

> Also, the NMethodSweeper::prepare_mark_active_nmethods() was built for
> safepoint cleaning and returns either hotness counting or nmethod
> marking closures. However, when moving nmethod marking out to be done
> concurrently with TLH, it is slightly confusing to have the same member
> function called from the concurrent context despite never ever wanting a
> hotness counter closure from there.

The hotness-counting-only closure will never be used when called from
the sweeper thread, because that only happens between sweeping cycles.
For example, at a safepoint while the sweeper is active, it would only do
hotness counting; when the sweeper is idle, it would do nmethod marking,
which *also* does the hotness counting. With TLHS, we'd always do the full
thing, because it's only ever queried between sweeper cycles.

> I'm thinking the prepare_mark_active_nmethods() member function could be
> split into two:
> 
> One member function that returns either nmethod marking closure or NULL
> (depending on whether it's needed or not).
> Another member function that calls the first one, and if NULL slaps on a
> hotness counter closure.

We could do that, yes.

> Then from concurrent contexts we would call the first method (nmethod
> marking or NULL), and from STW contexts we would call the second member
> function (nmethod marking or hotness counter).

Right.

> Another thing worth noticing is that the VM_MarkActiveNMethods VM
> operation marks the nmethods on the stack twice. First in safepoint
> cleanup, and subsequently in the operation itself
> (VM_MarkActiveNMethods::doit). I would argue that only one pass is
> enough.

Right...

> Therefore, I would propose to completely remove the nmethod
> marking from the safepoint cleanup, and have safepoint cleanup *only*
> fiddle around with hotness counters. If we do that, then nmethod marking
> is done in VM_MarkActiveNMethods::doit if TLH is off, and in your new
> handshake operation when TLH is on.

Yeah, except that hotness counting is also done in the nmethod marking pass.
Would it be enough if we just kept it there? Or do we want hotness
counting to always be done in the SP cleanup phase, rather than
piggy-backing it on nmethod marking?

> Then we can have zero nmethod marking in safepoint cleanup, and
> subsequently figure out how to get rid of the hotness counters.

Is there any use in doing nmethod marking more frequently than what is
forced in do_stack_scanning()? Is there any use in frequently updating
hotness counters?

Roman



From rkennke at redhat.com  Mon Sep 24 09:36:09 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 24 Sep 2018 11:36:09 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
Message-ID: <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>


> Is there any use in doing nmethod marking more frequently than what is
> forced in do_stack_scanning()?

As far as I can tell, it is sufficient to mark nmethods right before
sweeping. It might even be counter-productive to do more marking passes:
it would result in more not-entrant nmethods being marked as 'seen on stack'
even if they are no longer on the stack.

I am not 100% sure about the hotness counter though. From what I see,
it's only used by the sweeper too, and it really looks like resetting the
counter during the nmethod walk is enough. But I'd like confirmation from
somebody who knows better than I do. If it's really good enough, we may
remove the nmethod stuff completely from SP cleanup, and also remove the
hotness-counter-closure, and always piggy-back the stuff on nmethod
walking, either in its own VM_Op, or in its handshake.

On the other hand, why are hotness counting and nmethod marking split out
in sp-cleanup in the first place then?

Roman


From rkennke at redhat.com  Mon Sep 24 10:02:25 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 24 Sep 2018 12:02:25 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
In-Reply-To: <dk65zz28toe.fsf@rwestrel.remote.csb>
References: <dk65zz28toe.fsf@rwestrel.remote.csb>
Message-ID: <4215c8ba-3d64-a506-1f48-716e30a2fa56@redhat.com>

Hi Roland,

the change looks good to me.

Thanks,
Roman

> http://cr.openjdk.java.net/~roland/8210885/webrev.00/
> 
> This converts some remaining loads and stores to the access API (as
> preparation for shenandoah). This also cleans up the C2 access API: some
> entry points get a control argument that's in practice useless because
> current control() is always used.
> 
> Roland.
> 



From tobias.hartmann at oracle.com  Mon Sep 24 10:06:43 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 24 Sep 2018 12:06:43 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
In-Reply-To: <dk65zz28toe.fsf@rwestrel.remote.csb>
References: <dk65zz28toe.fsf@rwestrel.remote.csb>
Message-ID: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com>

Hi Roland,

looks good to me.

Best regards,
Tobias

On 18.09.2018 21:57, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8210885/webrev.00/
> 
> This converts some remaining loads and stores to the access API (as
> preparation for shenandoah). This also cleans up the C2 access API: some
> entry points get a control argument that's in practice useless because
> current control() is always used.
> 
> Roland.
> 

From tobias.hartmann at oracle.com  Mon Sep 24 10:15:35 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 24 Sep 2018 12:15:35 +0200
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <dk636u68t4l.fsf@rwestrel.remote.csb>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
Message-ID: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>

Hi Roland,

Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers
will be used for Shenandoah? Why do you need to distinguish between the ArrayCopyPhase values?

Best regards,
Tobias

On 18.09.2018 22:09, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8210887/webrev.00/
> 
> This extends the entry point of the c2 access api for arraycopy (in
> preparation for shenandoah). This also fixes some incorrect
> _adr_type's. It also modifies the ArrayCopyNode::Ideal() transform that
> produces a series of loads/stores so, as a subsequent change, barriers
> can be added to loads and stores: they need to produce and consume
> memory state other than the slice of the array being copied.
> 
> Roland.
> 

From erik.osterlund at oracle.com  Mon Sep 24 11:25:16 2018
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Mon, 24 Sep 2018 13:25:16 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
Message-ID: <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>

Hi Roman,

I think that by answering the meta question "wait why are we doing this" 
in this email, I will cover the questions in the previous email too.

The nmethod marking is strictly required so that after you have selected 
your not entrant nmethods that you want to nuke, you know that at some 
snapshot in time, they were not on the stack (and can't have become so
afterwards because they are not entrant). As I mentioned earlier, doing 
this both in safepoint cleanup for every safepoint, as well as in the VM 
operation itself, is "questionable". Doing it in just the VM 
operation/handshake should be enough.
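A toy C++ model of this correctness argument: one marking pass yields a snapshot of the nmethods found on any stack, and a not-entrant nmethod missing from that snapshot cannot reappear on a stack later, so it can be flushed. ToyNMethod, StackSnapshot and flushable() are hypothetical names; the real mechanism that records "seen on stack" in HotSpot is not modeled here.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical nmethod model: just an id and the not-entrant state.
struct ToyNMethod {
  int id;
  bool not_entrant;
};

// One snapshot: ids of nmethods found on any thread stack during a single
// marking pass (VM operation or handshake).
using StackSnapshot = std::set<int>;

// A not-entrant nmethod absent from the snapshot was not on any stack at
// that moment and cannot be entered afterwards, so it is safe to flush.
std::vector<int> flushable(const std::vector<ToyNMethod>& code,
                           const StackSnapshot& on_stack) {
  std::vector<int> result;
  for (const ToyNMethod& nm : code) {
    if (nm.not_entrant && on_stack.count(nm.id) == 0) {
      result.push_back(nm.id);
    }
  }
  return result;
}
```

Since one such snapshot is enough for the argument, extra marking passes add cost without making the flush decision any safer.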

The hotness counting is not strictly necessary at all. In fact, you can 
turn it off with the JVM flag -XX:-UseCodeAging.

So the hotness counter updating is part of the code aging mechanism. 
This is more of a heuristic thing than a correctness thing. You can just 
wait until you run out of space in the code heap, and then nuke a bunch 
of stuff (using the nmethod marking mechanism), and you are good. But 
similar to how you in your GC algorithm want to avoid running into full 
GCs because they are expensive, you also want to avoid filling up the 
code heap, because the consequences of that are also very expensive. The 
code aging mechanism was therefore introduced as a way of figuring out 
if there are seemingly inactive nmethods that can be discarded before 
running out of code heap memory.

So the way that works is that you give each nmethod a counter that you 
decay every now and then, but heat up again when you see said nmethods 
on the stack. That way, the sweeper can look for nmethods that do not 
seem to have been found on the stack "for a while", and select them as 
good candidates for being inactive.
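A toy C++ model of the code-aging heuristic just described: counters decay periodically, a stack sample that finds the nmethod resets its counter, and anything that has stayed cold long enough becomes a candidate for being inactive. The struct, the reset value and the threshold are hypothetical and do not reflect HotSpot's actual hotness-counter values or API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical aged nmethod: only the hotness counter is modeled.
struct AgedNMethod {
  int hotness;
};

constexpr int kResetHotness = 10;  // hypothetical value restored on a stack hit

// A stack sample found this nmethod: heat it up again.
void seen_on_stack(AgedNMethod& nm) { nm.hotness = kResetHotness; }

// Called "every now and then": decay every counter, never below zero.
void decay_all(std::vector<AgedNMethod>& code) {
  for (AgedNMethod& nm : code) {
    if (nm.hotness > 0) {
      --nm.hotness;
    }
  }
}

// The sweeper treats nmethods that have cooled below a threshold as
// candidates for being inactive.
bool looks_inactive(const AgedNMethod& nm, int threshold) {
  return nm.hotness < threshold;
}
```

Because only `seen_on_stack` refreshes a counter, the heuristic is only as good as the frequency of stack samples: if samples were taken only on rare marking passes, busy nmethods would drift toward the threshold as well.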

So to answer the question whether you can update hotness counters only 
when you mark nmethods... you can. But by doing that, it no longer 
serves its purpose of finding inactive nmethods, and becomes more of a 
piece of logic that we run occasionally for the fun of it. So we should 
not do that.

The reason that hotness counters are in safepoint cleanup, is to provide 
fresh stack samples to the sweeper.

So my suggestion for now is:
Do nmethod marking in VM operation/handshake operation.
Do hotness counter updating when UseCodeAging in safepoint cleanup.

And now you might be wondering if it really makes sense to walk all 
stacks in the system every safepoint, to provide some heuristic about 
whether nmethods are inactive or not. Arguably not. I have an idea about 
a much better way of doing this. I will get back to you in a few days 
about that.

Hope this helps.

Thanks,
/Erik

On 2018-09-24 11:36, Roman Kennke wrote:
>> Is there any use in doing nmethod marking more frequently than what is
>> forced in do_stack_scanning()?
> As far as I can tell, it is sufficient to mark nmethods right before
> sweeping. It might even be counter-productive to do more marking passes:
> it would result in more non-entrant nmethods marked as 'seen on stack'
> even if they are no longer on stack.
>
> I am not 100% sure about the hotness counter though. From what I see,
> it's only used for sweeper too, and it really looks like resetting the
> counter on nmethod-walk is enough. But I'd like confirmation from
> somebody who knows better than I do. If it's really good enough, we may
> remove the nmethod stuff completely from SP cleanup, and also remove the
> hotness-counter-closure, and always piggy-back the stuff on nmethod
> walking, either in its own VM_Op, or in its handshake.
>
> On the other hand, why are hotness counting and nmethod marking split out
> in safepoint cleanup in the first place, then?
>
> Roman
>


From rkennke at redhat.com  Mon Sep 24 11:47:16 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 24 Sep 2018 13:47:16 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
Message-ID: <89db00b1-cadd-8f78-92b3-253e2dd1a652@redhat.com>

Hi Erik,

> I think that by answering the meta question "wait why are we doing this"
> in this email, I will cover the questions in the previous email too.
> 
> The nmethod marking is strictly required so that after you have selected
> your not entrant nmethods that you want to nuke, you know that at some
> snapshot in time, they were not on the stack (and cant have become so
> afterwards because they are not entrant). As I mentioned earlier, doing
> this both in safepoint cleanup for every safepoint, as well as in the VM
> operation itself, is "questionable". Doing it in just the VM
> operation/handshake should be enough.
> 
> The hotness counting is not strictly necessary at all. In fact, you can
> turn it off with the JVM flag -XX:-UseCodeAging.
> 
> So the hotness counter updating is part of the code aging mechanism.
> This is more of a heuristic thing than a correctness thing. You can just
> wait until you run out of space in the code heap, and then nuke a bunch
> of stuff (using the nmethod marking mechanism), and you are good. But
> similar to how you in your GC algorithm want to avoid running into full
> GCs because they are expensive, you also want to avoid filling up the
> code heap, because the consequences of that are also very expensive. The
> code aging mechanism was therefore introduced as a way of figuring out
> if there are seemingly inactive nmethods that can be discarded before
> running out of code heap memory.
> 
> So the way that works is that you give each nmethod a counter that you
> decay every now and then, but heat up again when you see said nmethods
> on the stack. That way, the sweeper can look for nmethods that do not
> seem to have been found on the stack "for a while", and select them as
> good candidates for being inactive.
> 
> So to answer the question whether you can update hotness counters only
> when you mark nmethods... you can. But by doing that, it no longer
> serves its purpose of finding inactive nmethods, and becomes more of a
> piece of logic that we run occasionally for the fun of it. So we should
> not do that.
> 
> The reason that hotness counters are updated in safepoint cleanup is to provide
> fresh stack samples to the sweeper.
> 
> So my suggestion for now is:
> Do nmethod marking in VM operation/handshake operation.
> Do hotness counter updating when UseCodeAging in safepoint cleanup.
> 
> And now you might be wondering if it really makes sense to walk all
> stacks in the system every safepoint, to provide some heuristic about
> whether nmethods are inactive or not. Arguably not. I have an idea about
> a much better way of doing this. I will get back to you in a few days
> about that.

Thanks for your explanations. That's more or less what I figured out
from studying the code too.

Couldn't we have a CodeAgeInterval (or similar) flag, so that every so many
ms we do the hotness-reset scan, by firing a thread-local handshake (TLHS)
or a VM_Op from the sweeper thread? This should give us more regular
sampling than doing it in the somewhat random safepoint prologue.
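This is a minimal sketch of interval-driven sampling. It assumes the sweeper thread wakes up periodically and asks a timer whether a scan is due. `PeriodicSampler`, `sample_times` and the millisecond values are hypothetical. CodeAgeInterval is only a proposed name, not an existing flag. A real trigger would issue a handshake or VM operation rather than return a boolean.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical timer for the proposed interval-based hotness scan.
class PeriodicSampler {
 public:
  explicit PeriodicSampler(std::int64_t interval_ms)
      : interval_ms_(interval_ms), next_due_ms_(interval_ms) {
    assert(interval_ms > 0);
  }

  // Returns true if a scan (handshake or VM operation) should be issued now.
  bool should_sample(std::int64_t now_ms) {
    if (now_ms < next_due_ms_) return false;
    // Schedule from the actual firing time so a late wake-up does not
    // trigger a burst of catch-up scans.
    next_due_ms_ = now_ms + interval_ms_;
    return true;
  }

 private:
  std::int64_t interval_ms_;
  std::int64_t next_due_ms_;
};

// Times at which samples would fire, given the sweeper's wake-up times.
std::vector<std::int64_t> sample_times(std::int64_t interval_ms,
                                       const std::vector<std::int64_t>& wakeups) {
  PeriodicSampler sampler(interval_ms);
  std::vector<std::int64_t> fired;
  for (std::int64_t t : wakeups) {
    if (sampler.should_sample(t)) fired.push_back(t);
  }
  return fired;
}
```

With a 25 ms interval and wake-ups every 10 ms, samples fire at 30, 60 and 90 ms. The spacing depends on the interval and the wake-up granularity, not on when safepoints happen to occur.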

Roman

From rkennke at redhat.com  Mon Sep 24 13:21:01 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 24 Sep 2018 15:21:01 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
Message-ID: <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>

> So my suggestion for now is:
> Do nmethod marking in VM operation/handshake operation.
> Do hotness counter updating when UseCodeAging in safepoint cleanup.

Ok, this change should do that:
Incremental:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03.diff/
Full:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03/

Note that nmethod-marking still resets hotness counters.

Good now for pushing?

Roman


> And now you might be wondering if it really makes sense to walk all
> stacks in the system every safepoint, to provide some heuristic about
> whether nmethods are inactive or not. Arguably not. I have an idea about
> a much better way of doing this. I will get back to you in a few days
> about that.
> 
> Hope this helps.
> 
> Thanks,
> /Erik
> 
> On 2018-09-24 11:36, Roman Kennke wrote:
>>> Is there a use of doing nmethod marking more frequently than what is
>>> forced in do_stack_scanning() ?
>> As far as I can tell, it is sufficient to mark nmethods right before
>> sweeping. It might even be counter-productive to do more marking passes:
>> it would result in more non-entrant nmethods marked as 'seen on stack'
>> even if they are no longer on stack.
>>
>> I am not 100% sure about the hotness counter though. From what I see,
>> it's only used for sweeper too, and it really looks like resetting the
>> counter on nmethod-walk is enough. But I'd like confirmation from
>> somebody who knows better than I do. If it's really good enough, we may
>> remove the nmethod stuff completely from SP cleanup, and also remove the
>> hotness-counter-closure, and always piggy-back the stuff on nmethod
>> walking, either in its own VM_Op, or in its handshake.
>>
>> On the other hand, why are hotness counting and nmethod marking split out
>> in safepoint cleanup in the first place, then?
>>
>> Roman
>>
> 


From tobias.hartmann at oracle.com  Mon Sep 24 13:34:28 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 24 Sep 2018 15:34:28 +0200
Subject: Re: Re: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
In-Reply-To: <c48acfbd-335a-42be-a3de-eb3bb703f06d.kuaiwei.kw@alibaba-inc.com>
References: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com>
 <a5f9f824-efdc-c04f-3325-999c8ab2d43a@oracle.com>
 <c48acfbd-335a-42be-a3de-eb3bb703f06d.kuaiwei.kw@alibaba-inc.com>
Message-ID: <be0d3e45-e565-69dc-8fa4-164a79e05d9e@oracle.com>

Hi Kevin,

On 24.09.2018 08:06, Kuai Wei wrote:
> Thanks for your suggestion. I think your point is that the region node may get a new path in a later parse
> phase, so we cannot be sure the region node will be optimized.

Yes, my point is that a new path to the region might be added after your optimization and that path
might contain stores to the newly allocated object.

> It's a good question and I checked it. I now think it may not cause trouble. In post-barrier
> reduction, the oop store uses the allocation node as its base pointer. The data graph guarantees that the control of the
> allocation node dominates the control of the store. If the allocation node is in a predecessor of the region node and
> there's a new path into the region, the graph is bad because we can reach the store without the allocation.

Yes but the new path might be a backedge from a loop that is dominated by the allocation.

> If the allocation node is in a dominating ancestor, the graph shape is more complicated, so we cannot
> reach the control of the allocation by skipping one region.

Right, that's basically the implicit assumption of your patch. I'm not sure if it always holds. But
I think you should at least use RegionNode::is_copy().

Let's see what other reviewers think.

> A better solution would be to know that the region node was created in exit_map and will not be changed
> later. Is there any way to know this at compile time?

The region node is created in Parse::build_exits(). I don't think there is a way to keep track of this.

Thanks,
Tobias

From rkennke at redhat.com  Mon Sep 24 16:04:22 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 24 Sep 2018 18:04:22 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
Message-ID: <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>

Zhengyu noted off-list that the !ThreadLocalHandshakes version requires
calling Threads::change_thread_parity() before using
Threads::possibly_parallel_threads_do(), and that we can assert
is_Java_thread() instead of explicit filtering.

This change does that:
Incremental:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.04.diff/
Full:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.04/

Let me know what you think!

Thanks,
Roman
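The parity requirement can be shown with a simplified model. Each thread records the parity value under which it was last claimed. A worker claims the thread by compare-and-swapping that value to the current global parity. `ThreadModel`, `global_parity`, `try_claim` and `claim_all_threads` are hypothetical names, not HotSpot's classes. Only the rule that the parity must be flipped before each parallel walk comes from the discussion.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Simplified, hypothetical model of parity-based claiming; not HotSpot code.
struct ThreadModel {
  std::atomic<int> claimed_parity{0};  // parity of the last pass that claimed it
  std::atomic<int> visits{0};          // how often a worker processed this thread
};

std::atomic<int> global_parity{0};

// Flip the global parity (1 <-> 2) so every thread becomes claimable again.
void change_thread_parity() {
  global_parity.store(global_parity.load() == 1 ? 2 : 1);
}

// Claim a thread for the current pass; exactly one caller can win the CAS.
bool try_claim(ThreadModel& t) {
  const int current = global_parity.load();
  int seen = t.claimed_parity.load();
  if (seen == current) return false;  // already claimed in this pass
  return t.claimed_parity.compare_exchange_strong(seen, current);
}

// Workers run one after another to keep the sketch deterministic; the CAS in
// try_claim() is what would keep concurrent workers from double-processing.
void claim_all_threads(std::vector<ThreadModel>& threads, int workers) {
  assert(workers > 0);
  for (int w = 0; w < workers; ++w) {
    for (auto& t : threads) {
      if (try_claim(t)) t.visits.fetch_add(1);
    }
  }
}
```

If `claim_all_threads` runs before `change_thread_parity()`, it claims nothing, because every thread already carries the current parity. After the flip, each thread is visited exactly once, however many workers run.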

>> So my suggestion for now is:
>> Do nmethod marking in VM operation/handshake operation.
>> Do hotness counter updating when UseCodeAging in safepoint cleanup.
> 
> Ok, this change should do that:
> Incremental:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03.diff/
> Full:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03/
> 
> Note that nmethod-marking still resets hotness counters.
> 
> Good now for pushing?
> 
> Roman
> 
> 
>> And now you might be wondering if it really makes sense to walk all
>> stacks in the system every safepoint, to provide some heuristic about
>> whether nmethods are inactive or not. Arguably not. I have an idea about
>> a much better way of doing this. I will get back to you in a few days
>> about that.
>>
>> Hope this helps.
>>
>> Thanks,
>> /Erik
>>
>> On 2018-09-24 11:36, Roman Kennke wrote:
>>>> Is there a use of doing nmethod marking more frequently than what is
>>>> forced in do_stack_scanning() ?
>>> As far as I can tell, it is sufficient to mark nmethods right before
>>> sweeping. It might even be counter-productive to do more marking passes:
>>> it would result in more non-entrant nmethods marked as 'seen on stack'
>>> even if they are no longer on stack.
>>>
>>> I am not 100% sure about the hotness counter though. From what I see,
>>> it's only used for sweeper too, and it really looks like resetting the
>>> counter on nmethod-walk is enough. But I'd like confirmation from
>>> somebody who knows better than I do. If it's really good enough, we may
>>> remove the nmethod stuff completely from SP cleanup, and also remove the
>>> hotness-counter-closure, and always piggy-back the stuff on nmethod
>>> walking, either in its own VM_Op, or in its handshake.
>>>
>>> On the other hand, why are hotness counting and nmethod marking split out
>>> in safepoint cleanup in the first place, then?
>>>
>>> Roman
>>>
>>
> 
> 


From rkennke at redhat.com  Mon Sep 24 16:46:46 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 24 Sep 2018 18:46:46 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
 <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
 <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>
Message-ID: <cd3f3259-f885-aad4-11e1-dd17a530cd5a@redhat.com>

Hi Zhengyu,

>> Zhengyu noted off-list that the !ThreadLocalHandshakes version requires
>> calling Threads::change_thread_parity() before using
>> Threads::possibly_parallel_threads_do(), and that we can assert
>> is_Java_thread() instead of explicit filtering.
> 
> My suggestion to assert is_Java_thread() was wrong:
> Threads::possibly_parallel_threads_do() also walks the VMThread, sorry!

Yes, I noticed that, and updated/reverted the webrev accordingly:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/
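The walk also hands the VM thread to the closure, so the per-thread work has to filter rather than assert. Here is a minimal sketch with a hypothetical `ThreadInfo` record and a hypothetical `scan_java_threads` helper:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for a thread visited by the walk.
struct ThreadInfo {
  std::string name;
  bool is_java_thread;
};

// Collect the threads whose stacks would actually be scanned.
std::vector<std::string> scan_java_threads(const std::vector<ThreadInfo>& all_threads) {
  std::vector<std::string> scanned;
  for (const auto& t : all_threads) {
    // The walk also yields the VM thread, so an up-front assert on
    // is_java_thread would fire; skip non-Java threads explicitly instead.
    if (!t.is_java_thread) continue;
    assert(t.is_java_thread);  // holds only because of the filter above
    scanned.push_back(t.name);
  }
  return scanned;
}
```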

> Otherwise, looks good to me.

Thanks for reviewing!
Roman

From zgu at redhat.com  Mon Sep 24 16:40:45 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 24 Sep 2018 12:40:45 -0400
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
 <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
Message-ID: <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>

Hi Roman,

On 09/24/2018 12:04 PM, Roman Kennke wrote:
> Zhengyu noted off-list that the !ThreadLocalHandshakes version requires
> calling Threads::change_thread_parity() before using
> Threads::possibly_parallel_threads_do(), and that we can assert
> is_Java_thread() instead of explicit filtering.

My suggestion to assert is_Java_thread() was wrong:
Threads::possibly_parallel_threads_do() also walks the VMThread, sorry!

Otherwise, looks good to me.

-Zhengyu

> 
> This change does that:
> Incremental:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.04.diff/
> Full:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.04/
> 
> Let me know what you think!
> 
> Thanks,
> Roman
> 
>>> So my suggestion for now is:
>>> Do nmethod marking in VM operation/handshake operation.
>>> Do hotness counter updating when UseCodeAging in safepoint cleanup.
>>
>> Ok, this change should do that:
>> Incremental:
>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03.diff/
>> Full:
>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03/
>>
>> Note that nmethod-marking still resets hotness counters.
>>
>> Good now for pushing?
>>
>> Roman
>>
>>
>>> And now you might be wondering if it really makes sense to walk all
>>> stacks in the system every safepoint, to provide some heuristic about
>>> whether nmethods are inactive or not. Arguably not. I have an idea about
>>> a much better way of doing this. I will get back to you in a few days
>>> about that.
>>>
>>> Hope this helps.
>>>
>>> Thanks,
>>> /Erik
>>>
>>> On 2018-09-24 11:36, Roman Kennke wrote:
>>>>> Is there a use of doing nmethod marking more frequently than what is
>>>>> forced in do_stack_scanning() ?
>>>> As far as I can tell, it is sufficient to mark nmethods right before
>>>> sweeping. It might even be counter-productive to do more marking passes:
>>>> it would result in more non-entrant nmethods marked as 'seen on stack'
>>>> even if they are no longer on stack.
>>>>
>>>> I am not 100% sure about the hotness counter though. From what I see,
>>>> it's only used for sweeper too, and it really looks like resetting the
>>>> counter on nmethod-walk is enough. But I'd like confirmation from
>>>> somebody who knows better than I do. If it's really good enough, we may
>>>> remove the nmethod stuff completely from SP cleanup, and also remove the
>>>> hotness-counter-closure, and always piggy-back the stuff on nmethod
>>>> walking, either in its own VM_Op, or in its handshake.
>>>>
>>>> On the other hand, why are hotness counting and nmethod marking split out
>>>> in safepoint cleanup in the first place, then?
>>>>
>>>> Roman
>>>>
>>>
>>
>>
> 
> 

From aph at redhat.com  Mon Sep 24 17:28:57 2018
From: aph at redhat.com (Andrew Haley)
Date: Mon, 24 Sep 2018 18:28:57 +0100
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <dk6fty46wyy.fsf@rwestrel.remote.csb>
References: <dk6zhx64pl5.fsf@rwestrel.remote.csb>
 <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
 <dk67ek311xv.fsf@rwestrel.remote.csb>
 <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com>
 <dk65zzkz7wz.fsf@rwestrel.remote.csb>
 <ff6c3ae6-99a9-47bc-8dea-01232eab4927@bell-sw.com>
 <dca1a96d-4120-2c7d-6175-31137161ccdd@oracle.com>
 <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
 <f14ff724-5635-fb52-d18d-154395207979@bell-sw.com>
 <dk6efe7xfh2.fsf@rwestrel.remote.csb>
 <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com>
 <dk68t4eydnw.fsf@rwestrel.remote.csb>
 <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com>
 <ff5468fd-75b2-5568-c295-322c17fb4de3@redhat.com>
 <dk6fty46wyy.fsf@rwestrel.remote.csb>
Message-ID: <41303f32-81a2-c84a-cc9b-bfe79a6c9577@redhat.com>

On 09/20/2018 03:54 PM, Roland Westrelin wrote:
> 
>> mkay, but how, exactly? Is it simply the case that Intel is improved
>> so the patch is good, even if AArch64 regresses?
> 
> Well, no, I don't think that's an accurate description of what this
> is. Dmitry reported a performance regression but the generated code is
> almost identical with or without the patch (the only difference being
> that in one case the generated code uses b.cc and in the other
> b.eq). Dmitry also hypothesized that branch prediction may not perform
> as well with the patch. That doesn't seem directly related to the patch
> but more of an unfortunate side effect. So the patch simplifies the IR
> so fewer instructions may need to be emitted. That's not x86 specific. It
> just happens that aarch64 doesn't seem to be able to take advantage of it,
> but it doesn't increase the number of instructions that aarch64 needs
> either, nor does it force aarch64 to use less efficient instructions. So overall,
> it seemed to me there was no reasonable reason to not push this patch.

OK, I see. I agree that reasoning is sound. We already know that
perfectly reasonable improvements to JITs occasionally cause
regressions in some cases, but that's not a reason to reject such
improvements.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com  Mon Sep 24 18:03:42 2018
From: aph at redhat.com (Andrew Haley)
Date: Mon, 24 Sep 2018 19:03:42 +0100
Subject: RFR: 8211064: [AArch64] Interpreter and c1 don't correctly handle
 jboolean results in native calls
Message-ID: <e2931dbd-a1d9-e5e3-9afa-c0f1c78b99e8@redhat.com>

apetushkov sent me this little patch and I approved it offlist. I
will push to jdk-jdk.

http://cr.openjdk.java.net/~aph/8211064/

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From shade at redhat.com  Mon Sep 24 18:06:25 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 24 Sep 2018 20:06:25 +0200
Subject: RFR: 8211064: [AArch64] Interpreter and c1 don't correctly handle
 jboolean results in native calls
In-Reply-To: <e2931dbd-a1d9-e5e3-9afa-c0f1c78b99e8@redhat.com>
References: <e2931dbd-a1d9-e5e3-9afa-c0f1c78b99e8@redhat.com>
Message-ID: <747ac95d-48ea-0777-bfab-1bcf200073a4@redhat.com>

On 09/24/2018 08:03 PM, Andrew Haley wrote:
> apetushkov sent me this little patch and I approved it offlist. I
> will push to jdk-jdk.
> 
> http://cr.openjdk.java.net/~aph/8211064/

Looks good to me.

-Aleksey

From ekaterina.pavlova at oracle.com  Mon Sep 24 19:37:53 2018
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Mon, 24 Sep 2018 12:37:53 -0700
Subject: RFR(XS) 8199885: [Graal]
 org.graalvm.compiler.core.test.CountedLoopTest fails with "ControlFlowAnchor
 should never be cloned in the same graph"
In-Reply-To: <a83bd802-6f6b-e3ed-4f5b-a11ffd56abab@oracle.com>
References: <18a99545-72ad-1cff-c940-25d67cda24ef@oracle.com>
 <8fbbe891-3cce-79d1-9b78-46cfad86c8e3@oracle.com>
 <badacd27-e685-53ed-d557-0af92ddd7851@oracle.com>
 <a83bd802-6f6b-e3ed-4f5b-a11ffd56abab@oracle.com>
Message-ID: <5c7d5612-dce1-4e97-8e51-4b03e58b90ef@oracle.com>

The Graal team looked at the test failure log; the state it shows looks quite weird and should be "impossible" to reach.
I also tested with the latest JDK bits and observed no failures.
I would prefer to integrate the fix and file a new bug if new failures appear.

thanks,
-katya

On 8/14/18 11:49 AM, Vladimir Kozlov wrote:
> On 8/14/18 6:40 AM, Ekaterina Pavlova wrote:
>> On 8/13/18 12:25 PM, Vladimir Kozlov wrote:
>>> Katya,
>>>
>>> Did you confirm that these tests are actually run in mach5 after these changes?
>>
>> yes, I do confirm.
>> However I started to observe intermittent failure of org.graalvm.compiler.core.test.CountedLoopTest today.
>> The failure is different. Let's postpone this change until I discuss the new failure with Doug.
>> Please see other answers below.
>>
>>> I see conflicting '@requires' in test definition:
>>>
>>>   * @requires vm.opt.final.EnableJVMCI == true
>>> + * @requires !vm.graal.enabled
>>
>> well, they are not conflicting because vm.graal.enabled requires UseJVMCICompiler
>>
>>> The only runs when EnableJVMCI is specified are runs with Graal as JIT in which case second @requires will skip tests.
>>
>> we do run graal unit tests in 2 configurations:
>>
>> 1) graal-off: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:-UseJVMCICompiler
> 
> I forgot about this mode. Yes, the @requires change is fine then.
>>
>> 2) graal-on: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal
>>
>> So, these tests will be run in first configuration and skipped in second one.
>>
>>> Maybe move these tests into a special group to make sure to run them without Graal JIT but with JVMCI on.
>>
>> we already do it, graal unit tests are run using graal-off configuration as part of tier3 testing.
> 
> Got it.
> 
> Thanks,
> Vladimir
> 
>>
>> thanks,
>> -katya
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/13/18 5:26 AM, Ekaterina Pavlova wrote:
>>>> Hi All,
>>>>
>>>> please review the change which disables org.graalvm.compiler.core.test.* tests in Graal as JIT mode.
>>>> All these tests (except org.graalvm.compiler.core.test.tutorial.GraalTutorial and org.graalvm.compiler.core.test.StaticInterfaceFieldTest)
>>>> subclass GraalCompilerTest and were not designed to run in Graal as JIT mode.
>>>>
>>>> Doug also confirmed that disabling org.graalvm.compiler.core.test.tutorial.GraalTutorial and org.graalvm.compiler.core.test.StaticInterfaceFieldTest
>>>> is also the right way.
>>>>
>>>> Note, the tests will need to be modified/redesigned once Graal becomes the default JIT compiler.
>>>>
>>>>
>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8199885
>>>>  webrev: http://cr.openjdk.java.net/~epavlova//8199885/webrev.00/index.html
>>>> testing: Run compiler/graalunit/CoreTest.java with Graal enabled and disabled. The test was skipped when Graal was enabled and
>>>>          passed when Graal was disabled.
>>>>
>>>> thanks,
>>>> -katya
>>>>
>>>>
>>


From vladimir.kozlov at oracle.com  Mon Sep 24 20:49:26 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 24 Sep 2018 13:49:26 -0700
Subject: RFR(XS) 8199885: [Graal]
 org.graalvm.compiler.core.test.CountedLoopTest fails with "ControlFlowAnchor
 should never be cloned in the same graph"
In-Reply-To: <5c7d5612-dce1-4e97-8e51-4b03e58b90ef@oracle.com>
References: <18a99545-72ad-1cff-c940-25d67cda24ef@oracle.com>
 <8fbbe891-3cce-79d1-9b78-46cfad86c8e3@oracle.com>
 <badacd27-e685-53ed-d557-0af92ddd7851@oracle.com>
 <a83bd802-6f6b-e3ed-4f5b-a11ffd56abab@oracle.com>
 <5c7d5612-dce1-4e97-8e51-4b03e58b90ef@oracle.com>
Message-ID: <ec0e26b9-cce6-ea91-9488-a18e0f054cd4@oracle.com>

Okay.

Thanks,
Vladimir

On 9/24/18 12:37 PM, Ekaterina Pavlova wrote:
> The Graal team looked at the test failure log; the state it shows looks quite weird and should be "impossible" to reach.
> I also tested with the latest JDK bits and observed no failures.
> I would prefer to integrate the fix and file a new bug if new failures appear.
> 
> thanks,
> -katya
> 
> On 8/14/18 11:49 AM, Vladimir Kozlov wrote:
>> On 8/14/18 6:40 AM, Ekaterina Pavlova wrote:
>>> On 8/13/18 12:25 PM, Vladimir Kozlov wrote:
>>>> Katya,
>>>>
>>>> Did you confirm that these tests are actually run in mach5 after these changes?
>>>
>>> yes, I do confirm.
>>> However I started to observe intermittent failure of org.graalvm.compiler.core.test.CountedLoopTest today.
>>> The failure is different. Let's postpone this change until I discuss the new failure with Doug.
>>> Please see other answers below.
>>>
>>>> I see conflicting '@requires' in test definition:
>>>>
>>>>   * @requires vm.opt.final.EnableJVMCI == true
>>>> + * @requires !vm.graal.enabled
>>>
>>> well, they are not conflicting because vm.graal.enabled requires UseJVMCICompiler
>>>
>>>> The only runs when EnableJVMCI is specified are runs with Graal as JIT in which case second @requires will skip tests.
>>>
>>> we do run graal unit tests in 2 configurations:
>>>
>>> 1) graal-off: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:-UseJVMCICompiler
>>
>> I forgot about this mode. Yes, the @requires change is fine then.
>>>
>>> 2) graal-on: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal
>>>
>>> So, these tests will be run in first configuration and skipped in second one.
>>>
>>>> Maybe move these tests into a special group to make sure to run them without Graal JIT but with JVMCI on.
>>>
>>> we already do it, graal unit tests are run using graal-off configuration as part of tier3 testing.
>>
>> Got it.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> thanks,
>>> -katya
>>>
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 8/13/18 5:26 AM, Ekaterina Pavlova wrote:
>>>>> Hi All,
>>>>>
>>>>> please review the change which disables org.graalvm.compiler.core.test.* tests in Graal as JIT mode.
>>>>> All these tests (except org.graalvm.compiler.core.test.tutorial.GraalTutorial and org.graalvm.compiler.core.test.StaticInterfaceFieldTest)
>>>>> subclass GraalCompilerTest and were not designed to run in Graal as JIT mode.
>>>>>
>>>>> Doug also confirmed that disabling org.graalvm.compiler.core.test.tutorial.GraalTutorial and org.graalvm.compiler.core.test.StaticInterfaceFieldTest
>>>>> is also the right way.
>>>>>
>>>>> Note, the tests will need to be modified/redesigned once Graal becomes the default JIT compiler.
>>>>>
>>>>>
>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8199885
>>>>>  webrev: http://cr.openjdk.java.net/~epavlova//8199885/webrev.00/index.html
>>>>> testing: Run compiler/graalunit/CoreTest.java with Graal enabled and disabled. The test was skipped when Graal was enabled and
>>>>>          passed when Graal was disabled.
>>>>>
>>>>> thanks,
>>>>> -katya
>>>>>
>>>>>
>>>
> 

From vladimir.kozlov at oracle.com  Mon Sep 24 21:51:01 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 24 Sep 2018 14:51:01 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5FCD@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <aea92384-d0f4-ee99-e0a2-fed4f0d41cf7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <be409c1c-789c-3160-f1d4-55acab52d8df@oracle.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <CAF9BGBzuA33v_aP7eJ_9hdnbv_jwH=GYRXcr-fG5Zzi-fdyfxw@mail.gmail.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
 <f45b8b3e-182b-c1d3-e8c7-a33f29e6bd43@oracle.com>
 <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5771@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5FCD@FMSMSX126.amr.corp.intel.com>
Message-ID: <5170787d-479a-22e2-8191-0fad15ddd770@oracle.com>

Looks good. I'm starting testing again.

I don't see your 'submit' job in the system. I asked people to look into what happened.

Thanks,
Vladimir
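The SPARC build failure discussed in this thread comes from a C++ rule. A default argument in the shared declaration does not let an out-of-class definition omit that parameter, because the definition must match the declared signature. Here is a minimal sketch, with hypothetical `Operand` and `Assembler` types standing in for `LIR_Opr` and `LIR_Assembler`:

```cpp
#include <cassert>

// Hypothetical stand-ins for a shared header plus a per-platform file.
struct Operand {
  int value;
};

const Operand kIllegalOperand{-1};

// Shared declaration: the default argument lives here, and only here.
class Assembler {
 public:
  int negate(Operand left, Operand dest, Operand tmp = kIllegalOperand);
};

// Platform definition: it must take all three parameters to match the
// declaration. Defining `int Assembler::negate(Operand, Operand)` instead
// would not match any declared member, and compilers reject it.
int Assembler::negate(Operand left, Operand /*dest*/, Operand /*tmp*/) {
  assert(left.value != kIllegalOperand.value && "left operand must be valid");
  // A platform that does not need a temporary simply ignores the third operand.
  return -left.value;
}
```

Callers can still omit the third argument because of the default in the declaration. Only the per-platform definitions need the extra, possibly unused, parameter.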

On 9/21/18 2:30 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Please find the updated webrev with fix for build failure on SPARC and other architectures at:
> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.04/
> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
> 
> Vivek submitted this webrev for testing to the submit repo yesterday at around noon. We haven't received any email back so far. This is our first time using the submit repo.
> http://mail.openjdk.java.net/pipermail/jdk-submit-changes/2018-September/003164.html
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Viswanathan, Sandhya
> Sent: Thursday, September 20, 2018 10:53 AM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
> Subject: RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> Hi Vladimir,
> 
> In c1_LIRAssembler.hpp, when I added an additional parameter to negate, I did make sure to add it as a default parameter:
> 
> src/hotspot/share/c1/c1_LIRAssembler.hpp, line 282:
>    void negate(LIR_Opr left, LIR_Opr dest, LIR_Opr tmp = LIR_OprFact::illegalOpr);
> 
> But since the function is not only called but also declared and defined in all the other architectures, I guess I need to add an unused LIR_Opr parameter to their negate functions.
> This would be along the same lines as what is done for some other c1_LIRAssembler methods.
> 
> I will make this change and work with Vivek to use the submit repo for testing it on Sparc.
> 
> Best Regards,
> Sandhya
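The SPARC build break in this thread follows from a C++ rule: a default argument belongs on the declaration only, while every out-of-line definition must still list all parameters, so a two-parameter definition no longer matches a three-parameter declaration. The standalone sketch below illustrates this; `Opr`, `kIllegalOpr` and `Assembler` are hypothetical stand-ins for `LIR_Opr`, `LIR_OprFact::illegalOpr` and `LIR_Assembler`, and `negate` returns a value instead of emitting code only so the sketch can be exercised.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not the HotSpot types.
using Opr = int;
constexpr Opr kIllegalOpr = -1;  // plays the role of LIR_OprFact::illegalOpr

class Assembler {
 public:
  // The default argument is given once, on the shared declaration, so
  // existing two-argument call sites keep compiling unchanged.
  Opr negate(Opr left, Opr dest, Opr tmp = kIllegalOpr);
};

// A platform that does not need the temp still defines the full
// three-parameter signature and simply leaves the extra parameter unnamed.
// Writing this definition with only two parameters would not match the
// declaration above and would fail to compile.
Opr Assembler::negate(Opr left, Opr dest, Opr /* tmp: unused here */) {
  (void)dest;    // a real assembler would write the result into dest
  return -left;  // returned only so the sketch can be tested
}
```

Keeping the default only on the declaration also matters because repeating it in the definition is itself a compile error.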
> 
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 20, 2018 10:09 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> I hit a build failure on SPARC due to the shared changes in C1:
> 
> workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)".
> jib > 1 Error(s) detected.
> 
> I assume other platforms are also affected.
> 
> Vladimir
> 
> On 9/19/18 9:53 AM, Vladimir Kozlov wrote:
>> Thank you, Sandhya
>>
>> I submitted new testing.
>>
>> Vladimir
>>
>> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> Please find below the updated webrev with fixes for the two issues:
>>>
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
>>>
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>>>
>>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS
>>> as the temporary register type for intrinsics instead of legVecD.
>>>
>>> This test was only failing with -XX:MaxVectorSize=4.
>>>
>>> The file modified is x86_64.ad.
>>>
>>> Fix for the C1 failure (the c1_LinearScan.cpp assert "cannot spill interval that is used in first instruction") was to allow all
>>> xmm registers (xmm0-xmm31) for C1 and handle floating point abs and negate appropriately by providing a temp register.
>>>
>>> The C1 files are modified for this fix.
>>>
>>> I reran the compiler jtreg tests with -nativepath set appropriately on Haswell, SKX and KNL.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>
>>> *From:* Viswanathan, Sandhya
>>> *Sent:* Tuesday, September 18, 2018 1:47 PM
>>> *To:* 'JC Beyler' <jcbeyler at google.com>
>>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
>>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> Hi Jc,
>>>
>>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>
>>> *From:* JC Beyler [mailto:jcbeyler at google.com]
>>> *Sent:* Monday, September 17, 2018 9:29 PM
>>> *To:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
>>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
>>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> Hi Sandhya,
>>>
>>> How are you invoking the test for NativeCallTest?
>>>
>>> The way I would do it using jtreg would be something like this:
>>>
>>> $ export BUILD_TYPE=release
>>>
>>> $ export JDK_PATH=wherever you have your JDK
>>>
>>> From the test subfolder:
>>>
>>> $ wherever-your-jtreg-is/bin/jtreg \
>>>     -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib \
>>>     -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk \
>>>     hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
>>>
>>> Seems to pass for me.
>>>
>>> But much easier is:
>>>
>>> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"
>>>
>>> That seems to pass for me as well and is easier to use :)
>>>
>>> For information, the make run-test documentation is here:
>>>
>>> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html
>>>
>>> Let me know if that helps,
>>>
>>> Jc
>>>
>>>     Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
>>>         "test result: Error. Use -nativepath to specify the location of native code"
>>>     Do I need to give any additional info to jtreg to get past this problem?
>>>
>>>     Thanks a lot!
>>>     Best Regards,
>>>     Sandhya
>>>
>>>     -----Original Message-----
>>>     From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>     Sent: Monday, September 17, 2018 10:14 AM
>>>     To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>>     Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>>     I finished testing on an avx512 machine.
>>>     All passed except the known (TestNaNVector.java) failures.
>>>
>>>     Thanks,
>>>     Vladimir
>>>
>>>     On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
>>>     > I gave an incorrect link to the RFE. Here is the correct one:
>>>     >
>>>     > https://bugs.openjdk.java.net/browse/JDK-8210764
>>>     >
>>>     > Vladimir
>>>     >
>>>     > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>>>     >> I got the build failure on MacOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.
>>>     >>
>>>     >> Anyway, I tested with these changes and got the following failures on AVX1 machines. I am planning to run on avx512 too.
>>>     >>
>>>     >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on a CPU
>>>     >> with AVX1 only
>>>     >>
>>>     >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
>>>     >> # Problematic frame:
>>>     >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>>     >>
>>>     >> Current CompileTask:
>>>     >> C2:    154    5             java.lang.String::equals (65 bytes)
>>>     >>
>>>     >> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
>>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>>     >> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>>     >> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
>>>     >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>>>     >> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
>>>     >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>>>     >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
>>>     >>
>>>     >>
>>>     >> ------------------------------------------------------------------------------------------------
>>>     >> 2.
>>>     >>
>>>     >> with '-Xcomp'
>>>     >> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
>>>     >> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)
>>>     >>
>>>     >> Current CompileTask:
>>>     >> C1: 854767 13391       3 org.sunflow.math.Matrix4::multiply (692 bytes)
>>>     >>
>>>     >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
>>>     >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>>     >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
>>>     >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
>>>     >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
>>>     >> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>>>     >> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
>>>     >> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
>>>     >> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
>>>     >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>>>     >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>>>     >> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
>>>     >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>>>     >> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
>>>     >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
>>>     >> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>>     >>
>>>     >> Vladimir
>>>     >>
>>>     >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>>     >>>
>>>     >>> Thanks Vladimir, the change below should fix this issue:
>>>     >>>
>>>     >>> ------------------------------
>>>     >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>>     >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>>     >>> @@ -233,22 +233,6 @@
>>>     >>>    _xmm_regs[13]  = xmm13;
>>>     >>>    _xmm_regs[14]  = xmm14;
>>>     >>>    _xmm_regs[15]  = xmm15;
>>>     >>> -  _xmm_regs[16]  = xmm16;
>>>     >>> -  _xmm_regs[17]  = xmm17;
>>>     >>> -  _xmm_regs[18]  = xmm18;
>>>     >>> -  _xmm_regs[19]  = xmm19;
>>>     >>> -  _xmm_regs[20]  = xmm20;
>>>     >>> -  _xmm_regs[21]  = xmm21;
>>>     >>> -  _xmm_regs[22]  = xmm22;
>>>     >>> -  _xmm_regs[23]  = xmm23;
>>>     >>> -  _xmm_regs[24]  = xmm24;
>>>     >>> -  _xmm_regs[25]  = xmm25;
>>>     >>> -  _xmm_regs[26]  = xmm26;
>>>     >>> -  _xmm_regs[27]  = xmm27;
>>>     >>> -  _xmm_regs[28]  = xmm28;
>>>     >>> -  _xmm_regs[29]  = xmm29;
>>>     >>> -  _xmm_regs[30]  = xmm30;
>>>     >>> -  _xmm_regs[31]  = xmm31;
>>>     >>>  #endif // _LP64
>>>     >>>
>>>     >>>    for (int i = 0; i < 8; i++) {
>>>     >>> ---------------------------------
>>>     >>>
>>>     >>> I think the gcc version on my desktop is older, so it didn't catch this.
>>>     >>>
>>>     >>> The updated patch, with the above change and UseAVX set to 3, is uploaded to:
>>>     >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>>     >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>>     >>>
>>>     >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before
>>>     >>> changing it back to 3.
>>>     >>>
>>>     >>> Best Regards,
>>>     >>> Sandhya
>>>     >>>
>>>     >>>
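The -Warray-bounds error quoted in this thread (`_xmm_regs[16]` written into an array of 16 elements) comes from hand-written assignments that outrun the array's declared size. One generic way to rule that out is to derive the loop bound from the array itself. This is a minimal sketch that assumes a hypothetical `XmmReg` type and `kNumXmmRegs` constant. It is not the C1 `FrameMap` code, which is sized elsewhere and may depend on the build.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical register count; in C1 the real size is defined elsewhere.
constexpr std::size_t kNumXmmRegs = 16;

// Hypothetical register handle: just the hardware encoding number.
struct XmmReg {
  int encoding;
};

std::array<XmmReg, kNumXmmRegs> make_xmm_table() {
  std::array<XmmReg, kNumXmmRegs> regs{};
  // The bound is the array's own size, so there is no literal regs[16]
  // that can run past the end of a 16-entry table.
  for (std::size_t i = 0; i < regs.size(); ++i) {
    regs[i] = XmmReg{static_cast<int>(i)};
  }
  return regs;
}
```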
>>>     >>> -----Original Message-----
>>>     >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>     >>> Sent: Friday, September 14, 2018 12:13 PM
>>>     >>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>>     >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>     >>>
>>>     >>> I got a build failure:
>>>     >>>
>>>     >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array
>>>     >>> (which contains 16 elements) [-Werror,-Warray-bounds]
>>>     >>> jib >   _xmm_regs[16]  = xmm16;
>>>     >>>
>>>     >>> I also noticed that we don't have an RFE for this work. I filed:
>>>     >>>
>>>     >>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>     >>>
>>>     >>> You did not enable avx512 by default (the 8209735 change was synced from JDK 11 into 12 two weeks ago). I added the
>>>     >>> following change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>>     >>>
>>>     >>> - product(intx, UseAVX, 2, \
>>>     >>> + product(intx, UseAVX, 3, \
>>>     >>>
>>>     >>> Thanks,
>>>     >>> Vladimir
>>>     >>>
>>>     >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>>     >>>> Looks good to me. I will start testing and let you know the results.
>>>     >>>>
>>>     >>>> Thanks,
>>>     >>>> Vladimir
>>>     >>>>
>>>     >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>>     >>>>> Hi Vladimir,
>>>     >>>>>
>>>     >>>>> Please find below the updated webrev with all your comments incorporated:
>>>     >>>>>
>>>     >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>     >>>>>
>>>     >>>>> I have run the jtreg compiler tests on SKX and KNL, which have two different flavors of AVX512, and on Haswell,
>>>     >>>>> a non-avx512 system. I also tested SPECjvm2008 on the three platforms.
>>>     >>>>>
>>>     >>>>> Best Regards,
>>>     >>>>> Sandhya
>>>     >>>>>
>>>     >>>>>
>>>     >>>>> -----Original Message-----
>>>     >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>     >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>>     >>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>>     >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>     >>>>>
>>>     >>>>> Thank you, Sandhya
>>>     >>>>>
>>>     >>>>> I am satisfied with your detailed answer on the memory load issues. Okay, let's not add them.
>>>     >>>>>
>>>     >>>>> Vladimir
>>>     >>>>>
>>>     >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>>     >>>>>> Hi Vladimir,
>>>     >>>>>>
>>>     >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>>     >>>>>> Please see my responses in your email below, marked with (Sandhya >>>). Looking forward to your advice.
>>>     >>>>>>
>>>     >>>>>> Best Regards,
>>>     >>>>>> Sandhya
>>>     >>>>>>
>>>     >>>>>>
>>>     >>>>>> -----Original Message-----
>>>     >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>     >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>>     >>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>>     >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>     >>>>>>
>>>     >>>>>> Thank you.
>>>     >>>>>>
>>>     >>>>>> I want to discuss the next issue:
>>>     >>>>>>
>>>     >>>>>>     > You did not add instructions to load these registers from memory (and stack). What happens in such cases
>>>     >>>>>>     > when you need to load or store?
>>>     >>>>>>     >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then
>>>     >>>>>>     >>>> a register-to-register move to rregF, and vice versa.
>>>     >>>>>>
>>>     >>>>>> This is what I thought. You increase register pressure this way, which may cause spills on the stack.
>>>     >>>>>> Also, we don't check whether the register could be the same as the result, so you may get unneeded moves.
>>>     >>>>>>
>>>     >>>>>> I would advise adding memory moves at least.
>>>     >>>>>>
>>>     >>>>>> Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register
>>>     >>>>>> allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask (matcher.cpp). I would like the
>>>     >>>>>> register allocator to get all the possible registers on an architecture for the idealreg2reg mask. I wondered
>>>     >>>>>> whether multiple instruct rules in the .ad file for LoadF from memory might cause problems; I would have to use a
>>>     >>>>>> higher cost for loading into a restricted register set like vlReg. Then I decided that the register allocator can
>>>     >>>>>> handle this in a much better way than me adding rules to load from memory. This is with the background that regF
>>>     >>>>>> is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS.
>>>     >>>>>> Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg and
>>>     >>>>>> legVec. The specific code from matcher.cpp that I am referring to is:
>>>     >>>>>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>     >>>>>> #endif
>>>     >>>>>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>     >>>>>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>>     >>>>>>     MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>     >>>>>>     MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>     >>>>>>     MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>     >>>>>>     ....
>>>     >>>>>>     idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>>     >>>>>>
>>>     >>>>>> Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions:
>>>     >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>>>     >>>>>> Do these instructions work when avx512vl is not available? I see for vectors you use the vpxor+vinserti* combination.
>>>     >>>>>>
>>>     >>>>>> Sandhya >>> Yes, the scalar floating point instructions are available with AVX512 encoding when avx512vl is not
>>>     >>>>>> available. That is why you would see not just movflt and movdbl but all the other scalar operations like addss,
>>>     >>>>>> addsd etc. using the entire xmm range (xmm0-31). In other words, they are AVX512F instructions.
>>>     >>>>>>
>>>     >>>>>> Last question. I noticed the following UseAVX check in the vector spill code in x86.ad:
>>>     >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>     >>>>>>
>>>     >>>>>> Should it be (UseAVX < 3)?
>>>     >>>>>>
>>>     >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>>     >>>>>>
>>>     >>>>>> Thanks,
>>>     >>>>>> Vladimir
>>>     >>>>>>
>>>     >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>     >>>>>>> Hi Vladimir,
>>>     >>>>>>>
>>>     >>>>>>> Thanks a lot for your review and feedback. Please see my responses in your email below. I will send an updated
>>>     >>>>>>> webrev incorporating your feedback.
>>>     >>>>>>>
>>>     >>>>>>> Best Regards,
>>>     >>>>>>> Sandhya
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>> -----Original Message-----
>>>     >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>     >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>     >>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>>>     >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>     >>>>>>>
>>>     >>>>>>> Very nice. Thank you, Sandhya.
>>>     >>>>>>>
>>>     >>>>>>> I would like to see more meaningful naming in .ad files: instead of rreg* and ovec*, something like vlReg* and legVec*.
>>>     >>>>>>>
>>>     >>>>>>>>>> Yes, accepted.
>>>     >>>>>>>
>>>     >>>>>>> The new load_from_* and load_to_* instructions in .ad files should be renamed as follows and collocated with the
>>>     >>>>>>> other Move*_reg_reg* instructions:
>>>     >>>>>>>
>>>     >>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>>>     >>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>>>     >>>>>>>>>> Yes, accepted.
>>>     >>>>>>>
>>>     >>>>>>> You did not add instructions to load these registers from memory (and stack). What happens in such cases when
>>>     >>>>>>> you need to load or store?
>>>     >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then a
>>>     >>>>>>>>>> register-to-register move to rregF, and vice versa.
>>>     >>>>>>>
>>>     >>>>>>> Also, please explain why these registers are used when UseAVX == 0:
>>>     >>>>>>>
>>>     >>>>>>> +instruct absD_reg(rregD dst) %{
>>>     >>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>>>     >>>>>>>
>>>     >>>>>>> We switch off EVEX, so the regular regD (only legacy registers in this case) should work too:
>>>     >>>>>>>   if (UseAVX < 3) {
>>>     >>>>>>>     _features &= ~CPU_AVX512F;
>>>     >>>>>>>
>>>     >>>>>>>>>> Yes, accepted. It could be regD here.
>>>     >>>>>>>
>>>     >>>>>>> The following checks could be combined by using a new function in vm_version_x86.hpp (you already have some):
>>>     >>>>>>>
>>>     >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );
>>>     >>>>>>>
>>>     >>>>>>>>>> Yes, accepted.
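The suggestion amounts to naming the four-way feature check once and reusing it wherever a `reg_class_dynamic` needs it. The standalone sketch below assumes a hypothetical `CpuFeatures` struct in place of `VM_Version`. The helper name `supports_avx512vlbwdq` is illustrative and may not match what the webrev adds.

```cpp
#include <cassert>

// Hypothetical feature snapshot standing in for the VM_Version queries.
struct CpuFeatures {
  bool evex;
  bool avx512bw;
  bool avx512dq;
  bool avx512vl;

  // One named predicate replaces the conjunction that would otherwise be
  // repeated inline in every reg_class_dynamic condition.
  bool supports_avx512vlbwdq() const {
    return evex && avx512bw && avx512dq && avx512vl;
  }
};
```

With a helper like this in `VM_Version`, the .ad condition could shrink to a single call such as `%{ VM_Version::supports_avx512vlbwdq() %}`, assuming the helper is added under that name.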
>>>     >>>>>>>
>>>     >>>>>>> I would suggest testing these changes on different machines (non-avx512 and avx512) and with different UseAVX values.
>>>     >>>>>>>
>>>     >>>>>>>>>> Will do.
>>>     >>>>>>>
>>>     >>>>>>> Thanks,
>>>     >>>>>>> Vladimir
>>>     >>>>>>>
>>>     >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>     >>>>>>>> Recently there have been a couple of high priority issues with regard to high bank XMM register
>>>     >>>>>>>> (XMM16-XMM31) usage by C2:
>>>     >>>>>>>>
>>>     >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>     >>>>>>>>
>>>     >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>     >>>>>>>>
>>>     >>>>>>>> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>>>     >>>>>>>>
>>>     >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>     >>>>>>>>
>>>     >>>>>>>> The patch provides a restricted set of registers to the match rules in the ad file based on the underlying
>>>     >>>>>>>> architecture.
>>>     >>>>>>>>
>>>     >>>>>>>> The aim is to remove special handling/workarounds from the macro assembler and assembler.
>>>     >>>>>>>>
>>>     >>>>>>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
>>>     >>>>>>>>
>>>     >>>>>>>> Your review and feedback are very welcome.
>>>     >>>>>>>>
>>>     >>>>>>>> Best Regards,
>>>     >>>>>>>>
>>>     >>>>>>>> Sandhya
>>>     >>>>>>>>
>>>
>>>
>>> --
>>>
>>> Thanks,
>>>
>>> Jc
>>>

From rkennke at redhat.com  Tue Sep 25 07:19:06 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 25 Sep 2018 09:19:06 +0200
Subject: RFR: JDK-8211061: Tests fail with
 assert(VM_Version::supports_sse4_1()) on ThreadRipper CPU
In-Reply-To: <8470d1f1-e878-7a01-ca1b-bacbb0b845e6@redhat.com>
References: <8470d1f1-e878-7a01-ca1b-bacbb0b845e6@redhat.com>
Message-ID: <d6403946-f66f-366e-8ac8-c52354cb5ee2@redhat.com>

Involving hotspot-compiler-dev...

> Some tests fail with:
> 
> # Internal Error
> (/home/rkennke/src/openjdk/jdk-jdk/src/hotspot/cpu/x86/assembler_x86.cpp:3819),
> pid=5051, tid=5055
> # Error: assert(VM_Version::supports_sse4_1()) failed
> 
> This happens when running hotspot/jtreg:tier1 on my ThreadRipper 1950X box. On my
> Intel box, this works fine. It looks like it attempts to generate
> fast_sha1 stubs, which use Assembler::pinsrd(), and then runs into the
> supports_sse4_1() assert.
> 
> The failing tier1 tests are:
> compiler/c1/Test6579789.java
> compiler/c1/Test6855215.java
> compiler/cpuflags/TestSSE4Disabled.java
> 
> The failing tests seem to disable SSE4 or SSE altogether and check that it
> still compiles fine. This does not go well for the SHA1 and SHA256 stubs
> because they use SSE4.1 instructions. It seems to work on my Intel box
> because that CPU does not support SHA (supports_sha() is false), so those
> intrinsics are disabled altogether.
>
> The proposed fix is to check that SSE4.1 is present before enabling the
> affected intrinsics.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8211061
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8211061/webrev.00/
> 
> Testing: hotspot/jtreg:tier1 failed before, now passes
> 
> Thanks,
> Roman
> 
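The proposed fix can be modelled as one extra condition. The SHA1 and SHA256 intrinsics are switched on only when the CPU reports SHA support and SSE4.1 is also available, because the stubs emit SSE4.1 instructions such as `pinsrd`. The minimal standalone sketch below assumes hypothetical `Cpu` and `ShaIntrinsics` types whose fields are taken to already reflect flags that disable SSE4. The actual change is in the webrev and may be structured differently.

```cpp
#include <cassert>

// Hypothetical feature snapshot; the fields mirror supports_sha() and
// supports_sse4_1() and are assumed to already reflect flags that
// disable SSE4.
struct Cpu {
  bool sha;
  bool sse4_1;
};

// Hypothetical pair of switches standing in for the SHA intrinsic flags.
struct ShaIntrinsics {
  bool sha1 = false;
  bool sha256 = false;
};

// SHA support alone is not sufficient: the generated stubs also use SSE4.1
// instructions, so both features are required before enabling them.
ShaIntrinsics select_sha_intrinsics(const Cpu& cpu) {
  ShaIntrinsics s;
  const bool usable = cpu.sha && cpu.sse4_1;
  s.sha1 = usable;
  s.sha256 = usable;
  return s;
}
```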


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180925/d9c7142a/signature.asc>

From tobias.hartmann at oracle.com  Tue Sep 25 07:42:05 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 09:42:05 +0200
Subject: RFR: JDK-8211061: Tests fail with
 assert(VM_Version::supports_sse4_1()) on ThreadRipper CPU
In-Reply-To: <d6403946-f66f-366e-8ac8-c52354cb5ee2@redhat.com>
References: <8470d1f1-e878-7a01-ca1b-bacbb0b845e6@redhat.com>
 <d6403946-f66f-366e-8ac8-c52354cb5ee2@redhat.com>
Message-ID: <5b44471e-0d24-4d65-71d5-0ccf589cc4e6@oracle.com>

Hi Roman,

this looks good to me.

Best regards,
Tobias

On 25.09.2018 09:19, Roman Kennke wrote:
> Involving hotspot-compiler-dev...
> 
>> Some tests fail with:
>>
>> # Internal Error
>> (/home/rkennke/src/openjdk/jdk-jdk/src/hotspot/cpu/x86/assembler_x86.cpp:3819),
>> pid=5051, tid=5055
>> # Error: assert(VM_Version::supports_sse4_1()) failed
>>
>> This happens when running hotspot/jtreg:tier1 on my ThreadRipper 1950X box. On my
>> Intel box, this works fine. It looks like it attempts to generate
>> fast_sha1 stubs, which use Assembler::pinsrd(), and then runs into the
>> supports_sse4_1() assert.
>>
>> The failing tier1 tests are:
>> compiler/c1/Test6579789.java
>> compiler/c1/Test6855215.java
>> compiler/cpuflags/TestSSE4Disabled.java
>>
>> The failing tests seem to disable SSE4 or SSE altogether and check that it
>> still compiles fine. This does not go well for the SHA1 and SHA256 stubs
>> because they use SSE4.1 instructions. It seems to work on my Intel box
>> because that CPU does not support SHA (supports_sha() is false), so those
>> intrinsics are disabled altogether.
>>
>> The proposed fix is to check that SSE4.1 is present before enabling the
>> affected intrinsics.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8211061
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8211061/webrev.00/
>>
>> Testing: hotspot/jtreg:tier1 failed before, now passes
>>
>> Thanks,
>> Roman
>>
> 
> 

From tobias.hartmann at oracle.com  Tue Sep 25 08:13:04 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 10:13:04 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <cd3f3259-f885-aad4-11e1-dd17a530cd5a@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
 <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
 <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>
 <cd3f3259-f885-aad4-11e1-dd17a530cd5a@redhat.com>
Message-ID: <b83f922d-ff25-234d-0624-9674832068ba@oracle.com>

Hi Roman,

On 24.09.2018 18:46, Roman Kennke wrote:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/

This looks good to me!

Best regards,
Tobias

From rkennke at redhat.com  Tue Sep 25 08:17:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 25 Sep 2018 10:17:23 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <b83f922d-ff25-234d-0624-9674832068ba@oracle.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
 <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
 <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>
 <cd3f3259-f885-aad4-11e1-dd17a530cd5a@redhat.com>
 <b83f922d-ff25-234d-0624-9674832068ba@oracle.com>
Message-ID: <66c0754a-e93b-48b8-4528-184c636d7254@redhat.com>

Thanks Tobias for reviewing!

Erik: Is this good for you too?

Thanks,
Roman

> Hi Roman,
> 
> On 24.09.2018 18:46, Roman Kennke wrote:
>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/
> 
> This looks good to me!
> 
> Best regards,
> Tobias
> 



From rkennke at redhat.com  Tue Sep 25 08:23:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 25 Sep 2018 10:23:05 +0200
Subject: RFR: JDK-8211061: Tests fail with
 assert(VM_Version::supports_sse4_1()) on ThreadRipper CPU
In-Reply-To: <5b44471e-0d24-4d65-71d5-0ccf589cc4e6@oracle.com>
References: <8470d1f1-e878-7a01-ca1b-bacbb0b845e6@redhat.com>
 <d6403946-f66f-366e-8ac8-c52354cb5ee2@redhat.com>
 <5b44471e-0d24-4d65-71d5-0ccf589cc4e6@oracle.com>
Message-ID: <4554cf90-c0dd-baa4-ba41-e3f46b2c8871@redhat.com>

Thanks for reviewing, Tobias!

Roman

> Hi Roman,
> 
> this looks good to me.
> 
> Best regards,
> Tobias
> 
> On 25.09.2018 09:19, Roman Kennke wrote:
>> Involving hotspot-compiler-dev...
>>
>>> Some tests fail with:
>>>
>>> # Internal Error
>>> (/home/rkennke/src/openjdk/jdk-jdk/src/hotspot/cpu/x86/assembler_x86.cpp:3819),
>>> pid=5051, tid=5055
>>> # Error: assert(VM_Version::supports_sse4_1()) failed
>>>
>>> This happens when running hotspot/jtreg:tier1 on my ThreadRipper 1950X box.
>>> On my Intel box, it works fine. It looks like the VM attempts to generate
>>> fast_sha1 stubs, which use Assembler::pinsrd(), and then runs into the
>>> supports_sse4_1() assert.
>>>
>>> The failing tier1 tests are:
>>> compiler/c1/Test6579789.java
>>> compiler/c1/Test6855215.java
>>> compiler/cpuflags/TestSSE4Disabled.java
>>>
>>> The failing tests seem to disable SSE4 or SSE altogether and check that
>>> the code still compiles. This does not go well for the SHA1 and SHA256
>>> stubs because they use SSE4.1 instructions. It seems to work on my Intel
>>> box because that CPU doesn't support SHA (supports_sha() is false), so
>>> those intrinsics are disabled altogether.
>>>
>>> The proposed fix is to check that SSE4.1 is present before enabling the
>>> affected intrinsics.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8211061
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8211061/webrev.00/
>>>
>>> Testing: hotspot/jtreg:tier1 failed before, now passes
>>>
>>> Thanks,
>>> Roman
>>>
>>
>>
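
A hedged model of the proposed gating, written in Java rather than HotSpot's C++. All names are hypothetical, and treating SSE4.1 as unavailable when -XX:UseSSE is below 4 is an assumption of this sketch, not something stated in the thread. It only shows why a CPU that reports SHA (the ThreadRipper) reached the SSE4.1 assert while a CPU without SHA (the Intel box) never generated the stubs.

```java
/** Hypothetical model of the SHA-intrinsic gating discussed for JDK-8211061. */
public final class ShaIntrinsicGate {

    /** Assumption: SSE4.1 is usable only if the CPU has it and UseSSE >= 4. */
    public static boolean sse41Usable(boolean cpuHasSse41, int useSse) {
        return cpuHasSse41 && useSse >= 4;
    }

    /** Before the fix: SHA support alone enabled the SHA1/SHA256 stubs. */
    public static boolean shaStubsBefore(boolean cpuHasSha, boolean cpuHasSse41, int useSse) {
        return cpuHasSha;
    }

    /** After the fix: the stubs emit SSE4.1 instructions such as pinsrd, so require SSE4.1 too. */
    public static boolean shaStubsAfter(boolean cpuHasSha, boolean cpuHasSse41, int useSse) {
        return cpuHasSha && sse41Usable(cpuHasSse41, useSse);
    }

    public static void main(String[] args) {
        // CPU with SHA and SSE4.1, test lowers UseSSE to disable SSE4.
        System.out.println(shaStubsBefore(true, true, 3)); // true: stubs generated, assert fires
        System.out.println(shaStubsAfter(true, true, 3));  // false: intrinsics stay off
        // CPU without SHA: stubs were never generated, before or after.
        System.out.println(shaStubsAfter(false, true, 3)); // false
    }
}
```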



From rwestrel at redhat.com  Tue Sep 25 08:27:03 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 25 Sep 2018 10:27:03 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
In-Reply-To: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com>
References: <dk65zz28toe.fsf@rwestrel.remote.csb>
 <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com>
Message-ID: <dk6d0t2j82w.fsf@rwestrel.remote.csb>


Thanks for the reviews Roman & Tobias.

Roland.

From rwestrel at redhat.com  Tue Sep 25 08:37:27 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 25 Sep 2018 10:37:27 +0200
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
 <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
Message-ID: <dk67ejaj7lk.fsf@rwestrel.remote.csb>


Hi Tobias,

> Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers
> will be used for Shenandoah? Why do you need to distinguish between ArrayCopyPhase values?

Thanks for the review.

Yes, the extra arguments are to be used by Shenandoah.

Generating barriers once parsing is over is not supported by all
GCs. The shape of the barriers is sometimes too complicated to be
emitted at IGVN time.

Roland.

From Pengfei.Li at arm.com  Tue Sep 25 08:38:07 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Tue, 25 Sep 2018 08:38:07 +0000
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <DB7PR08MB31154E9C6A3B07BF27F4D6B2961D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
 <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
 <DB7PR08MB31154E9C6A3B07BF27F4D6B2961D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <DB7PR08MB31152098BB6A5C853DA2F19F96160@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Vladimir,

I haven't received any other comments during the past week.
Do you think it is OK to push this patch?
http://cr.openjdk.java.net/~njian/8210152/webrev.01/

--
Thanks,
Pengfei

> -----Original Message-----
> 
> Hi Reviewers,
> 
> Is there any other comments, objections or suggestions on the new webrev?
> If no problems, could anyone help to push this commit?
> 
> I look forward to your response.
> 
> --
> Thanks,
> Pengfei
> 
> > -----Original Message-----
> >
> > Looks good.
> >
> > Thanks,
> > Vladimir
> >
> > On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote:
> > > Hi,
> > >
> > > I've updated the patch based on Vladimir's comment. I added checks
> > > for SubI on both branches of the diamond phi.
> > > Also, thanks to Roland for the suggestion of supporting a Phi with 3
> > > or more inputs. But I think the matching rule would be much more
> > > complex if we added this, and I'm not sure there are any real-world
> > > scenarios that would benefit from it. So I didn't add it.
> > >
> > > New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/
> > > I've run the full jtreg test suite with the new patch and found no new issues.
> > >
> > > Please let me know if you have other comments or suggestions. If there
> > > are no further issues, I need your help to sponsor and push the patch.
> > >
> > > --
> > > Thanks,
> > > Pengfei
> > >
> > >
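
A small Java sketch of why a conditional negation is redundant when the remainder is only compared against zero (class and method names are illustrative, not taken from the webrev). Java's % keeps the sign of the dividend, so for negative x the remainder by 2^k is the negated low-bit mask of -x; negation never changes whether a value is zero, so x % 2^k == 0 reduces to one mask test. remainderExpanded is only a source-level analogue of a mask-plus-conditional-negation sequence, not the nodes C2 actually builds.

```java
/** Illustrative check that x % 2^k == 0 needs only a mask, for any int x. */
public final class PowerOfTwoDivisibility {

    /** What the source expresses; intended for 1 <= k <= 30. */
    public static boolean viaRemainder(int x, int k) {
        return x % (1 << k) == 0;
    }

    /** The reduced form: test the low k bits directly. */
    public static boolean viaMask(int x, int k) {
        return (x & ((1 << k) - 1)) == 0;
    }

    /** Remainder written as a mask plus a conditional negation for negative dividends. */
    public static int remainderExpanded(int x, int k) {
        int mask = (1 << k) - 1;
        if (x >= 0) {
            return x & mask;
        }
        // For Integer.MIN_VALUE, -x overflows back to MIN_VALUE, whose low k bits are 0.
        return -((-x) & mask);
    }

    public static void main(String[] args) {
        System.out.println(remainderExpanded(-5, 1));                // -1, same as -5 % 2
        System.out.println(viaRemainder(-90, 1) == viaMask(-90, 1)); // true
    }
}
```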

From tobias.hartmann at oracle.com  Tue Sep 25 08:42:17 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 10:42:17 +0200
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <dk67ejaj7lk.fsf@rwestrel.remote.csb>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
 <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <dk67ejaj7lk.fsf@rwestrel.remote.csb>
Message-ID: <b3329405-1736-5235-b0a2-0962ae7789fa@oracle.com>

Hi Roland,

okay, thanks for the clarifications.

Best regards,
Tobias

On 25.09.2018 10:37, Roland Westrelin wrote:
> 
> Hi Tobias,
> 
>> Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers
>> will be used for Shenandoah? Why do you need to distinguish between ArrayCopyPhase values?
> 
> Thanks for the review.
> 
> Yes, the extra arguments are to be used by Shenandoah.
> 
> Generating barriers once parsing is over is not supported by all
> GCs. The shape of the barriers is sometimes too complicated to be
> emitted at IGVN time.
> 
> Roland.
> 

From aph at redhat.com  Tue Sep 25 09:01:03 2018
From: aph at redhat.com (Andrew Haley)
Date: Tue, 25 Sep 2018 10:01:03 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
Message-ID: <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>

On 09/24/2018 09:14 AM, Alan Bateman wrote:

> I'm not questioning the need to support NVM, instead I'm trying to
> see whether MappedByteBuffer is the right way to expose this in the
> standard API. Buffers were designed in JSR-51 with specific
> use-cases in mind but they are problematic for many off-heap cases
> as they aren't thread safe, are limited to 2GB, lack confinement,
> only support homogeneous data (no layout support).

I'm baffled by this assertion. It's true that the 2GB limit is turning
into a real pain, but all of the rest have nothing to do with
ByteBuffers, which are just raw memory. Adding structure is something
that can be done by third-party libraries or by some future OpenJDK
API.

> So where does this leave us? If support for persistent memory is
> added to FileChannel.map as we've been discussing then it may not be
> too bad as the API surface is small. The API surface is just new map
> modes and a MappedByteBuffer::isPersistent method. The force method
> that specifies a range is something useful to add to MBB anyway.

Yeah, that's right, it is. While something not yet planned might be an
alternative, even a better one, the purpose of our faster release
cadence is to "evolve the Java SE Platform and the JDK at a more rapid
pace, so that new features [can] be delivered in timelier
manner". This is timely; waiting for Panama to think of what might be
possible, not so much.
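
For reference, the existing FileChannel.map path that the proposal would extend looks roughly like this in Java. The range-limited force(int, int) used here did not exist when this thread was written; it was added to MappedByteBuffer later, in JDK 13. The proposed persistent-memory map modes and isPersistent are not shown, since their final form is not settled here.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch of mapping a file and flushing only the bytes that were written. */
public final class MappedForceSketch {

    /** Writes value at offset through a mapping, forces that range, and reads it back. */
    public static long writeAndForce(Path file, int offset, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // map() rejects sizes above Integer.MAX_VALUE: the 2GB limit discussed in the thread.
            // Mapping past the end of the file in READ_WRITE mode grows the file.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, offset + Long.BYTES);
            buf.putLong(offset, value);
            buf.force(offset, Long.BYTES); // ranged force(int, int): JDK 13 and later
            return buf.getLong(offset);
        }
    }
}
```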

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From tobias.hartmann at oracle.com  Tue Sep 25 09:28:35 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 11:28:35 +0200
Subject: [12] RFR(M): 8210215: C2 should optimize trichotomy calculations
Message-ID: <f28f8c5e-41e2-cc9d-d8b8-d1acf42dcdf4@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8210215
http://cr.openjdk.java.net/~thartmann/8210215/webrev.00/

While analyzing performance results for the Value Types LW1 EA we got back from Doug Lea [1], I've
found that C2 is very bad at optimizing simple "trichotomic" comparisons of the form

 int compare(int a, int b) {
   return (a < b) ? -1 : (a == b) ? 0 : 1;
 }

when being inlined into a caller comparing the result against -1, 0 or 1.

For example, "compare(a, b) == 1" should be folded to "a > b" but currently isn't. Since this is a
very common pattern for sorting algorithms and since it will be even more common with value types,
it's crucial to optimize these. Out of the 66 comparisons in the jtreg test, C2 can optimize only
16. For all the other 50 tests, C2 emits two comparisons. With my patch, all comparisons are
optimized. The optimization improves the performance of Doug's microbenchmark by 5 - 12%.

The basic idea of the optimization is to search for two ifs that compare the same values while one
projection of the first if connects to the second if and all other projections connect to the same
region (potentially through an intermediate region). If two out of three projections to the region
then map to the same result value or control, we can replace the two ifs by a single if. The
implementation does this by first checking for one of the two shapes described in the
RegionNode::optimize_trichotomy comment which ensure that the comparisons have only two result
branches. We then merge the two ifs by computing the logical AND of the two tests (might be a
constant if the result is always false).
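
The folding rule can be illustrated with a small Java model (names are illustrative, not taken from the webrev). Each test on compare(a, b) is treated as the set of relations {<, ==, >} between a and b for which it holds. The path to a given result is the AND of the branch outcomes of the two ifs, and every resulting set maps back to a single comparison, with the empty set being the always-false constant. This checks source-level identities only; it says nothing about the code C2 emits.

```java
import java.util.function.IntPredicate;

/** Illustrative model of folding tests on a trichotomic compare into one comparison. */
public final class TrichotomyFold {
    // One bit per possible relation between a and b.
    public static final int LT = 1, EQ = 2, GT = 4, ALL = LT | EQ | GT;

    public static int compare(int a, int b) {
        return (a < b) ? -1 : (a == b) ? 0 : 1;
    }

    /** Relations that reach "return 1": first test false AND second test false. */
    public static int reachesOne() {
        return (ALL & ~LT) & (ALL & ~EQ); // == GT
    }

    /** Relations of (a, b) for which test(compare(a, b)) is true. */
    public static int mask(IntPredicate test) {
        int m = 0;
        if (test.test(-1)) m |= LT;
        if (test.test(0))  m |= EQ;
        if (test.test(1))  m |= GT;
        return m;
    }

    /** Name of the single comparison equivalent to a relation set. */
    public static String folded(int mask) {
        switch (mask) {
            case 0:       return "false";
            case LT:      return "a < b";
            case EQ:      return "a == b";
            case GT:      return "a > b";
            case LT | EQ: return "a <= b";
            case EQ | GT: return "a >= b";
            case LT | GT: return "a != b";
            default:      return "true";
        }
    }

    /** Evaluates that single comparison directly on a and b. */
    public static boolean evalFolded(int mask, int a, int b) {
        switch (mask) {
            case 0:       return false;
            case LT:      return a < b;
            case EQ:      return a == b;
            case GT:      return a > b;
            case LT | EQ: return a <= b;
            case EQ | GT: return a >= b;
            case LT | GT: return a != b;
            default:      return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(folded(mask(r -> r == 1)));  // a > b
        System.out.println(folded(mask(r -> r != -1))); // a >= b
        System.out.println(folded(mask(r -> r == 2)));  // false
    }
}
```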

Thanks to John for pre-reviewing this change.

Thanks,
Tobias


[1] http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-August/004879.html

From tobias.hartmann at oracle.com  Tue Sep 25 09:38:02 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 11:38:02 +0200
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <DB7PR08MB31152098BB6A5C853DA2F19F96160@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
 <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
 <DB7PR08MB31154E9C6A3B07BF27F4D6B2961D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <DB7PR08MB31152098BB6A5C853DA2F19F96160@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com>

Hi Pengfei,

this looks good to me but please fix the whitespacing before pushing:
"if( a == b )" -> "if (a == b)"
"method( param )" -> "method(param)"

It's not consistently like that in old HotSpot code but we should at least fix it in new code.

Thanks,
Tobias

On 25.09.2018 10:38, Pengfei Li (Arm Technology China) wrote:
> Hi Vladimir,
> 
> I haven't received any other comments during the past week.
> Do you think it is OK to push this patch?
> http://cr.openjdk.java.net/~njian/8210152/webrev.01/
> 
> --
> Thanks,
> Pengfei
> 

From Pengfei.Li at arm.com  Tue Sep 25 10:13:56 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Tue, 25 Sep 2018 10:13:56 +0000
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
 <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
 <DB7PR08MB31154E9C6A3B07BF27F4D6B2961D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <DB7PR08MB31152098BB6A5C853DA2F19F96160@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com>
Message-ID: <DB7PR08MB311510B90842104F1034C51296160@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Tobias,

Thanks for your review. I've fixed the whitespace; the new webrev is below.
http://cr.openjdk.java.net/~njian/8210152/webrev.02/

Could you help push this patch, since I don't have permission to do that?

--
Thanks,
Pengfei


> -----Original Message-----
> 
> Hi Pengfei,
> 
> this looks good to me but please fix the whitespacing before pushing:
> "if( a == b )" -> "if (a == b)"
> "method( param )" -> "method(param)"
> 
> It's not consistently like that in old HotSpot code but we should at least fix it
> in new code.
> 
> Thanks,
> Tobias
> 

From tobias.hartmann at oracle.com  Tue Sep 25 12:22:25 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 14:22:25 +0200
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <DB7PR08MB311510B90842104F1034C51296160@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
 <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
 <DB7PR08MB31154E9C6A3B07BF27F4D6B2961D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <DB7PR08MB31152098BB6A5C853DA2F19F96160@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com>
 <DB7PR08MB311510B90842104F1034C51296160@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <54cd88be-ed94-9ffa-0862-f803496577fc@oracle.com>

Hi Pengfei,

sure, pushed:
http://hg.openjdk.java.net/jdk/jdk/rev/bc38c75eed57

Best regards,
Tobias

On 25.09.2018 12:13, Pengfei Li (Arm Technology China) wrote:
> Hi Tobias,
> 
> Thanks for your review. I've fixed the whitespace; the new webrev is below.
> http://cr.openjdk.java.net/~njian/8210152/webrev.02/
> 
> Could you help push this patch, since I don't have permission to do that?
> 
> --
> Thanks,
> Pengfei
> 
> 

From rwestrel at redhat.com  Tue Sep 25 13:10:40 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 25 Sep 2018 15:10:40 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
In-Reply-To: <dk6d0t2j82w.fsf@rwestrel.remote.csb>
References: <dk65zz28toe.fsf@rwestrel.remote.csb>
 <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com>
 <dk6d0t2j82w.fsf@rwestrel.remote.csb>
Message-ID: <dk6va6tiuy7.fsf@rwestrel.remote.csb>


I ran this one through the submit repo and it came back with 1 failed
test:

 runtime/XCheckJniJsig/XCheckJSig.java

that I can't reproduce. Could it be a spurious failure?

Roland.

From rkennke at redhat.com  Tue Sep 25 13:33:34 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 25 Sep 2018 15:33:34 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
In-Reply-To: <dk6va6tiuy7.fsf@rwestrel.remote.csb>
References: <dk65zz28toe.fsf@rwestrel.remote.csb>
 <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com>
 <dk6d0t2j82w.fsf@rwestrel.remote.csb> <dk6va6tiuy7.fsf@rwestrel.remote.csb>
Message-ID: <ddb74820-9788-dfe1-a539-ba9ec0c592f5@redhat.com>

I've also seen this with JDK-8211061.

Roman


> I ran this one through the submit repo and it came back with 1 failed
> test:
> 
>  runtime/XCheckJniJsig/XCheckJSig.java
> 
> that I can't reproduce. Could it be a spurious failure?
> 
> Roland.
> 



From kuaiwei.kw at alibaba-inc.com  Tue Sep 25 13:50:35 2018
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Tue, 25 Sep 2018 21:50:35 +0800
Subject: Re: Re: [Patch] 8210853: C2 doesn't skip post barrier for new
 allocated objects
In-Reply-To: <be0d3e45-e565-69dc-8fa4-164a79e05d9e@oracle.com>
References: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com>
 <a5f9f824-efdc-c04f-3325-999c8ab2d43a@oracle.com>
 <c48acfbd-335a-42be-a3de-eb3bb703f06d.kuaiwei.kw@alibaba-inc.com>,
 <be0d3e45-e565-69dc-8fa4-164a79e05d9e@oracle.com>
Message-ID: <4f7cdcf3-fa85-4011-85f9-fb2fc4f8d80e.kuaiwei.kw@alibaba-inc.com>

Hi Tobias,

  Thanks for your comments. I will check whether RegionNode::is_copy can be used to detect
unnecessary region nodes. I will send a new review after testing.

Best Regards,
Kevin


------------------------------------------------------------------
From: Tobias Hartmann <tobias.hartmann at oracle.com>
Sent: 2018-09-24 (Mon) 21:34
To: Kuai Wei <kuaiwei.kw at alibaba-inc.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Cc: <sanhong.lsh at alibaba-inc.com>
Subject: Re: Re: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects

Hi Kevin,

On 24.09.2018 08:06, Kuai Wei wrote:
>   Thanks for your suggestion. I think your point is that the region node may get a new path later in the parse
> phase, so we cannot be sure the region node will be optimized.

Yes, my point is that a new path to the region might be added after your optimization and that path
might contain stores to the newly allocated object.

>   It's a good question and I checked it. Now I think it may not cause trouble. In post barrier
> reduction, the oop store uses the allocation node as its base pointer. The data graph guarantees that the control of
> the allocation node dominates the control of the store. If the allocation node is in a predecessor of the region node and
> there's a new path into the region, the graph is broken because we could reach the store without the allocation.

Yes but the new path might be a backedge from a loop that is dominated by the allocation.

> If the allocation node is in a dominating ancestor, the graph shape is a little more complicated, so we cannot
> reach the control of the allocation by skipping one region.

Right, that's basically the implicit assumption of your patch. I'm not sure if it always holds. But
I think you should at least use RegionNode::is_copy().

Let's see what other reviewers think.

>   A better solution would be to know that the region node was created in exit_map and will not be changed
> later. Is there any way to know this at compile time?

The region node is created in Parse::build_exits(). I don't think there is a way to keep track of this.

Thanks,
Tobias

From Pengfei.Li at arm.com  Tue Sep 25 13:54:06 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Tue, 25 Sep 2018 13:54:06 +0000
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
 power-of-2 check
In-Reply-To: <54cd88be-ed94-9ffa-0862-f803496577fc@oracle.com>
References: <DB7PR08MB31150B1D6C7E547538B2B99A96050@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <bcf5438c-50f9-7d75-b4d8-d09b89783816@oracle.com>
 <dk61sa0wigj.fsf@rwestrel.remote.csb>
 <DB7PR08MB31151A5DD37FAB07FDCB9B29961B0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com>
 <DB7PR08MB31154E9C6A3B07BF27F4D6B2961D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <DB7PR08MB31152098BB6A5C853DA2F19F96160@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com>
 <DB7PR08MB311510B90842104F1034C51296160@DB7PR08MB3115.eurprd08.prod.outlook.com>,
 <54cd88be-ed94-9ffa-0862-f803496577fc@oracle.com>
Message-ID: <DB7PR08MB311502791942BECF5EB843AD96160@DB7PR08MB3115.eurprd08.prod.outlook.com>


Thanks Tobias.



From dcherepanov at azul.com  Tue Sep 25 14:00:35 2018
From: dcherepanov at azul.com (Dmitry Cherepanov)
Date: Tue, 25 Sep 2018 14:00:35 +0000
Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86
 32-bit
Message-ID: <B1F2DAF7-6A1D-456A-BEE8-24C938661124@azul.com>

Hello,

Please review a patch that resolves an issue in x86 32-bit builds. It slightly adjusts the fix for JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and using it for incrementing the backedge counter.

JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100
webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/

Thanks,

Dmitry

From tobias.hartmann at oracle.com  Tue Sep 25 14:33:27 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 16:33:27 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
In-Reply-To: <ddb74820-9788-dfe1-a539-ba9ec0c592f5@redhat.com>
References: <dk65zz28toe.fsf@rwestrel.remote.csb>
 <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com>
 <dk6d0t2j82w.fsf@rwestrel.remote.csb> <dk6va6tiuy7.fsf@rwestrel.remote.csb>
 <ddb74820-9788-dfe1-a539-ba9ec0c592f5@redhat.com>
Message-ID: <40296bcb-540f-693f-3431-f20dba984245@oracle.com>

This is:
https://bugs.openjdk.java.net/browse/JDK-8211084

Best regards,
Tobias

On 25.09.2018 15:33, Roman Kennke wrote:
> I've also seen this with JDK-8211061.
> 
> Roman
> 
> 
>> I ran this one through the submit repo and it came back with 1 failed
>> test:
>>
>>  runtime/XCheckJniJsig/XCheckJSig.java
>>
>> that I can't reproduce. Could it be a spurious failure?
>>
>> Roland.
>>
> 
> 

From rwestrel at redhat.com  Tue Sep 25 14:37:46 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 25 Sep 2018 16:37:46 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
In-Reply-To: <40296bcb-540f-693f-3431-f20dba984245@oracle.com>
References: <dk65zz28toe.fsf@rwestrel.remote.csb>
 <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com>
 <dk6d0t2j82w.fsf@rwestrel.remote.csb> <dk6va6tiuy7.fsf@rwestrel.remote.csb>
 <ddb74820-9788-dfe1-a539-ba9ec0c592f5@redhat.com>
 <40296bcb-540f-693f-3431-f20dba984245@oracle.com>
Message-ID: <dk6mus5iqx1.fsf@rwestrel.remote.csb>


> https://bugs.openjdk.java.net/browse/JDK-8211084

Thanks. I'll push this change then.

Roland.

From doug.simon at oracle.com  Tue Sep 25 14:48:12 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Tue, 25 Sep 2018 16:48:12 +0200
Subject: RFR: 8208686: [AOT] JVMTI ResourceExhausted event repeated for same
 allocation
Message-ID: <910A4A3C-3EEF-4167-84D5-9819C83D6FC1@oracle.com>

A major design point of Graal is to treat allocations as non-side-effecting to give more freedom to the optimizer by reducing the number of distinct FrameStates that need to be managed. When an allocation fails, Graal will deoptimize to the last side-effecting instruction before the allocation. This means the VM code for heap allocation will potentially be executed twice, once from Graal compiled code and then again in the interpreter. While this is perfectly fine according to the JVM specification, it can cause confusing behavior for JVMTI-based tools. They will receive 2 ResourceExhausted events for a single allocation. Furthermore, the first ResourceExhausted event (on the Graal allocation slow path) might denote a bytecode instruction that performs no allocation, making it hard to debug the memory failure.

The proposed solution is to add an extra set of JVMCI VM runtime calls for allocation. These entry points will attempt the allocation and, upon failure, skip side effects such as posting JVMTI events or handling -XX:OnOutOfMemoryError. Compiled code using these entry points is expected to deoptimize when the call returns null.

The path from these new entry points to where allocation can fail goes through quite a bit of VM code. One could modify all these paths by:
* Returning null instead of throwing an exception on failure.
* Adding a `bool null_on_fail` argument to all relevant methods.
* Adding extra null checking where necessary after each call to these methods when `null_on_fail == true`.
This represents a significant number of changes.

Instead, the proposed solution introduces a new _in_retryable_allocation thread-local. This way, only the entry points and allocation routines that raise an exception need to be modified. Failure is communicated back to the new entry points by throwing a special pre-allocated OOME object (i.e., Universe::out_of_memory_error_retry()) which must not propagate back to Java code. Use of this object is not strictly necessary; it is introduced to highlight/document the special allocation mode.
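A minimal Java sketch of the control flow described above, under loud assumptions: every name here (RetryableAllocation, allocateIntArray, allocateIntArrayOrNull, the simulated free-slot counter, the event counter) is hypothetical and only stands in for HotSpot C++ runtime code. It shows how a thread-local mode flag plus a single pre-allocated sentinel error lets one shared allocation routine fail quietly for the new entry points, so the event is posted once, by the interpreter's retry.

```java
/**
 * Hypothetical model of the "retryable allocation" pattern. Not HotSpot code:
 * the flag, sentinel and event counter are stand-ins for VM internals.
 */
public class RetryableAllocation {
    // Stand-in for the VM's per-thread _in_retryable_allocation flag.
    private static final ThreadLocal<Boolean> IN_RETRYABLE =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Stand-in for the pre-allocated Universe::out_of_memory_error_retry():
    // created once up front, and it skips stack-trace capture.
    private static final class RetryOome extends OutOfMemoryError {
        RetryOome() { super("retryable allocation failed"); }
        @Override
        public synchronized Throwable fillInStackTrace() { return this; }
    }
    private static final OutOfMemoryError RETRY_OOME = new RetryOome();

    // Counts stand-in "ResourceExhausted" events that a tool would observe.
    public static int resourceExhaustedEvents = 0;

    // Simulated free heap space, measured in array elements.
    private static long freeSlots = 0;

    public static void setFreeSlots(long slots) { freeSlots = slots; }

    // Shared slow-path allocation routine. Only its failure branch knows
    // about the retryable mode.
    public static int[] allocateIntArray(int length) {
        if (length <= freeSlots) {
            freeSlots -= length;
            return new int[length];
        }
        if (IN_RETRYABLE.get()) {
            throw RETRY_OOME;              // fail quietly: no event posted
        }
        resourceExhaustedEvents++;         // stand-in for posting the event
        throw new OutOfMemoryError("simulated heap exhausted");
    }

    // Entry point for compiled code: null means "deoptimize and let the
    // interpreter redo the allocation".
    public static int[] allocateIntArrayOrNull(int length) {
        IN_RETRYABLE.set(Boolean.TRUE);
        try {
            return allocateIntArray(length);
        } catch (OutOfMemoryError e) {
            if (e != RETRY_OOME) {
                throw e;                   // genuine errors still propagate
            }
            return null;                   // the sentinel never escapes
        } finally {
            IN_RETRYABLE.set(Boolean.FALSE);
        }
    }
}
```

With a too-small simulated heap, allocateIntArrayOrNull returns null and posts nothing; the interpreter-style retry through allocateIntArray then posts exactly one event before throwing.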

The proposed solution is at http://cr.openjdk.java.net/~dnsimon/8208686.
The JBS bug is: https://bugs.openjdk.java.net/browse/JDK-8208686

-Doug

From daniel.daugherty at oracle.com  Tue Sep 25 15:11:18 2018
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 25 Sep 2018 11:11:18 -0400
Subject: RFR: 8208686: [AOT] JVMTI ResourceExhausted event repeated for
 same allocation
In-Reply-To: <910A4A3C-3EEF-4167-84D5-9819C83D6FC1@oracle.com>
References: <910A4A3C-3EEF-4167-84D5-9819C83D6FC1@oracle.com>
Message-ID: <64cbe730-5c9a-a04d-9eee-a56abfbb8e07@oracle.com>

Adding serviceability-dev at ... since this is JVM/TI...

Dan


On 9/25/18 10:48 AM, Doug Simon wrote:
> A major design point of Graal is to treat allocations as non-side-effecting to give more freedom to the optimizer by reducing the number of distinct FrameStates that need to be managed. When failing an allocation, Graal will deoptimize to the last side-effecting instruction before the allocation. This means the VM code for heap allocation will potentially be executed twice, once from Graal compiled code and then again in the interpreter. While this is perfectly fine according to the JVM specification, it can cause confusing behavior for JVMTI-based tools. They will receive 2 ResourceExhausted events for a single allocation. Furthermore, the first ResourceExhausted event (on the Graal allocation slow path) might denote a bytecode instruction that performs no allocation, making it hard to debug the memory failure.
>
> The proposed solution is to add an extra set of JVMCI VM runtime calls for allocation. These entry points will attempt the allocation and, upon failure, skip side effects such as posting JVMTI events or handling -XX:OnOutOfMemoryError. Compiled code using these entry points is expected to deoptimize when the call returns null.
>
> The path from these new entry points to where allocation can fail goes through quite a bit of VM code. One could modify all these paths by:
> * Returning null instead of throwing an exception on failure.
> * Adding a `bool null_on_fail` argument to all relevant methods.
> * Adding extra null checking where necessary after each call to these methods when `null_on_fail == true`.
> This represents a significant number of changes.
>
> Instead, the proposed solution introduces a new _in_retryable_allocation thread-local. This way, only the entry points and allocation routines that raise an exception need to be modified. Failure is communicated back to the new entry points by throwing a special pre-allocated OOME object (i.e., Universe::out_of_memory_error_retry()) which must not propagate back to Java code. Use of this object is not strictly necessary; it is introduced to highlight/document the special allocation mode.
>
> The proposed solution is at http://cr.openjdk.java.net/~dnsimon/8208686.
> The JBS bug is: https://bugs.openjdk.java.net/browse/JDK-8208686
>
> -Doug


From tobias.hartmann at oracle.com  Tue Sep 25 15:40:19 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 17:40:19 +0200
Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86
 32-bit
In-Reply-To: <B1F2DAF7-6A1D-456A-BEE8-24C938661124@azul.com>
References: <B1F2DAF7-6A1D-456A-BEE8-24C938661124@azul.com>
Message-ID: <e0d34924-5e72-2a7a-78b3-0b36996495bd@oracle.com>

Hi Dmitry,

Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64?

Could you please add the regression test to the webrev? Or did this reproduce with other tests?

Thanks,
Tobias

On 25.09.2018 16:00, Dmitry Cherepanov wrote:
> Hello,
> 
> Please review a patch that resolves an issue in x86 32-bit builds. It slightly adjusts the fix for JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and using it for incrementing the backedge counter.
> 
> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100
> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/
> 
> Thanks,
> 
> Dmitry
> 

From gromero at linux.vnet.ibm.com  Tue Sep 25 18:29:04 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 25 Sep 2018 15:29:04 -0300
Subject: [8u] RFR for backport of JDK-8164920: ppc: enhancement of CRC32
 intrinsic to jdk8u-dev (v2)
In-Reply-To: <31e036a0-a7b7-70f2-422f-addd4049436f@linux.vnet.ibm.com>
References: <31e036a0-a7b7-70f2-422f-addd4049436f@linux.vnet.ibm.com>
Message-ID: <5d059529-6048-ea79-f661-aae05b754dcc@linux.vnet.ibm.com>

Hi,

May I please get reviews for the following changes (v2)?

http://cr.openjdk.java.net/~gromero/crc32_jdk8u/v2/8131048/
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/v2/8164920/

Change JDK-8131048 is a dependency to backport JDK-8164920 to 8u.

Goetz reviewed the first version of this backport and pointed out necessary
changes and fixes that are present in this v2. Thank you, Goetz.

There are no code changes except adapting the file paths, adding has_vpmsumb()
feature detection, and adapting the signature before the runtime call to the
CRC32 intrinsic by casting T_INTs to T_LONGs, because PPC64
c_calling_convention() requires T_LONGs on 8u; otherwise the corresponding
assert() is hit.

Change JDK-8131048 touches shared code but, since it checks the
'CCallingConventionRequiresIntsAsLongs' flag, which is currently set only on
PPC64, that change is in effect PPC64-only.

JDK-8164920 is important for PPC64 because it helps some Apache Cassandra
workloads on Power a lot.
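For context, the intrinsic accelerates Java-level code such as the sketch below, which uses the standard java.util.zip.CRC32 class. Whether HotSpot actually substitutes the intrinsic stub for CRC32.update depends on the platform, the JDK build and the -XX:UseCRC32Intrinsics setting; the class and method names here are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Example {
    // Computes the CRC-32 of a byte array. On platforms where HotSpot
    // intrinsifies CRC32.update, the update call runs as a hand-tuned stub
    // rather than as the generic Java/native implementation.
    public static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        // "123456789" is the conventional CRC-32 check input.
        System.out.printf("%08X%n", crc32(check)); // prints CBF43926
    }
}
```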

Thank you.


Best regards,
Gustavo


From sandhya.viswanathan at intel.com  Tue Sep 25 17:31:23 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 25 Sep 2018 17:31:23 +0000
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF0F805@FMSMSX126.amr.corp.intel.com>

Hi Alan,

It looks like Apache HBASE also uses FileChannel map and MappedByteBuffer mechanism. The feature proposed by Andrew Dinn and Jonathan Halliday will be very useful for Big Data frameworks as well and help them use NVM without having to resort to JNI. Copying HBASE experts Anoop and Ram to this thread.

Apache HBase has API layers that overcome the 2GB limitation through the MultiByteBuff class:
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/nio/MultiByteBuff.html
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/ByteBufferArray.html
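The general idea behind such layers can be sketched as follows. This is a hypothetical illustration, not the API of the HBase classes linked above: a long offset is split into a chunk index and an int offset within that chunk, so the total capacity is no longer bounded by a single ByteBuffer's int indexing.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not HBase's MultiByteBuff) of addressing more than
 * Integer.MAX_VALUE bytes by spreading them over fixed-size ByteBuffer chunks.
 */
public class ChunkedBuffer {
    private final ByteBuffer[] chunks;
    private final int chunkSize;

    public ChunkedBuffer(long capacity, int chunkSize) {
        if (capacity < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("bad capacity or chunk size");
        }
        this.chunkSize = chunkSize;
        int n = (int) ((capacity + chunkSize - 1) / chunkSize);
        chunks = new ByteBuffer[n];
        for (int i = 0; i < n; i++) {
            long remaining = capacity - (long) i * chunkSize;
            // The last chunk may be shorter than chunkSize.
            chunks[i] = ByteBuffer.allocateDirect((int) Math.min(chunkSize, remaining));
        }
    }

    // Absolute accessors: the long index is translated to (chunk, offset);
    // no buffer position or limit is modified.
    public byte get(long index) {
        return chunks[(int) (index / chunkSize)].get((int) (index % chunkSize));
    }

    public void put(long index, byte value) {
        chunks[(int) (index / chunkSize)].put((int) (index % chunkSize), value);
    }
}
```

Multi-byte values that straddle a chunk boundary need extra handling on top of this, which is part of the per-access overhead such layers add.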

Some example uses of ByteBuffer in HBASE today:
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/bucket/FileMmapEngine.html
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/ByteBuffInputStream.html
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/ipc/ServerRpcConnection.ByteBuffByteInput.html

Best Regards,
Sandhya

-----Original Message-----
From: Alan Bateman [mailto:Alan.Bateman at oracle.com] 
Sent: Monday, September 24, 2018 1:15 AM
To: Andrew Dinn <adinn at redhat.com>; core-libs-dev at openjdk.java.net; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; Aundhe, Shirish <shirish.aundhe at intel.com>; Dohrmann, Steve <steve.dohrmann at intel.com>; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Deshpande, Vivek R <vivek.r.deshpande at intel.com>; Jonathan Halliday <jonathan.halliday at redhat.com>
Subject: Re: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory

On 21/09/2018 16:44, Andrew Dinn wrote:
> Hi Alan,
>
> Thanks for the response and apologies for failing to notice you had
> posted it some days ago (doh!).
>
> Jonathan Halliday has already explained how Red Hat might want to use
> this API. Well, what he said, essentially! In particular, this model
> provides a way of ensuring that raw byte data is able to be persisted
> coherently from Java with the minimal possible overhead. It would be up
> to client code above this layer to implement structuring mechanisms for
> how those raw bytes get populated with data and to manage any associated
> issues regarding atomicity, consistency and isolation (i.e. to provide
> the A, C and I of ACID to this API's D).
>
> The main point of the JEP is to ensure that such a performant base
> capability is available for known important cases where it is needed
> such as, for example, a transaction manager or a distributed cache. If
> equivalent middleware written in C can use persistent memory to bring
> the persistent storage tier nearer to the CPU and, hence, lower data
> durability overheads then we really need an equivalently performant
> option in Java or risk Java dropping out as a player in those middleware
> markets.
>
> I am glad to hear that other alternatives might be available and would
> be happy to consider them. However, I'm not sure that this means this
> option is not still desirable, especially if it is orthogonal to those
> other alternatives. Most importantly, this one has the advantage that we
> know it is ready to use and will provide benefits (we have already
> implemented a journaled transaction log over it with promising results
> and someone from our messaging team has already been looking into using
> it to persist log messages). Indeed, we also know we can use it to
> provide a base for supporting all the use cases addressed by Intel's
> libpmem and available to C programmers, e.g. a block store, simply by
> implementing Java client libraries that provide managed access to the
> persistent buffer along the same lines as the Intel C libraries.
>
> I'm afraid I am not familiar with Panama 'scopes' and 'pointers' so I
> can't really compare options here. Can you point me at any info that
> explains what those terms mean and how it might be possible to use them
> to access off-heap, persistent data.
>
I'm not questioning the need to support NVM, instead I'm trying to see 
whether MappedByteBuffer is the right way to expose this in the standard 
API. Buffers were designed in JSR-51 with specific use-cases in mind but 
they are problematic for many off-heap cases as they aren't thread safe, 
are limited to 2GB, lack confinement, only support homogeneous data (no 
layout support). At the same time, Project Panama (foreign branch in 
panama/dev) has the potential to provide the right API to work with 
memory. I see Jonathan's mail where he seems to be using object 
serialization so the solution on the table works for his use-case but it 
may not be the right solution for more general multi-threaded access to 
NVM. There is some interest in seeing whether this part of Project 
Panama could be advanced to address many of the cases where developers 
are resorting to using Unsafe today. There would of course need to be 
some integration with buffers too. There's no concrete proposal/JEP at 
this time, I'm just pointing out that many of the complaints about
buffers are really cases where it's the wrong API and the real need
is something more fundamental.

So where does this leave us? If support for persistent memory is added 
to FileChannel.map as we've been discussing then it may not be too bad 
as the API surface is small. The API surface is just new map modes and a 
MappedByteBuffer::isPersistent method. The force method that specifies a
range is something useful to add to MBB anyway. If (and I hope when)
there is support for memory regions or pointers then I could imagine 
re-visiting this so that there are alternative ways to get a memory 
region or pointer that is backed by NVM. If the timing were different 
then I think we'd skip the new map modes and we would be having a 
different discussion here. An alternative is of course to create the mapped
buffer via a JDK-specific API as that would be easier to deprecate and 
remove in the future if needed.

I'm interested to see if there is other input on this topic before it 
gets locked into extending the standard API.

-Alan.

From david.holmes at oracle.com  Tue Sep 25 21:27:47 2018
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 25 Sep 2018 17:27:47 -0400
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <66c0754a-e93b-48b8-4528-184c636d7254@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
 <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
 <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>
 <cd3f3259-f885-aad4-11e1-dd17a530cd5a@redhat.com>
 <b83f922d-ff25-234d-0624-9674832068ba@oracle.com>
 <66c0754a-e93b-48b8-4528-184c636d7254@redhat.com>
Message-ID: <c60ace90-e0b7-9cf8-2625-22c71d4e413e@oracle.com>

Hi Roman,

This change seems to have broken a test:

compiler/whitebox/ForceNMethodSweepTest.java

on exception 'sweep after deoptimization should decrease usage: expected 
that 1477504 < 1477504':

I'm assuming the test is assuming (it is a whitebox test after all) that
the cleanup is happening synchronously.

No bug filed yet.

Thanks,
David

On 25/09/2018 4:17 AM, Roman Kennke wrote:
> Thanks Tobias for reviewing!
> 
> Erik: Is this good for you too?
> 
> Thanks,
> Roman
> 
>> Hi Roman,
>>
>> On 24.09.2018 18:46, Roman Kennke wrote:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/
>>
>> This looks good to me!
>>
>> Best regards,
>> Tobias
>>
> 
> 

From david.holmes at oracle.com  Tue Sep 25 21:32:57 2018
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 25 Sep 2018 17:32:57 -0400
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of
 single-threaded walk of thread stacks in
 NMethodSweeper::mark_active_nmethods()
In-Reply-To: <c60ace90-e0b7-9cf8-2625-22c71d4e413e@oracle.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <e16d7532-8c77-1380-61fa-146c49d1f926@oracle.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <cb4e1ff5-bdec-9275-c5bf-ef82edf9fe39@redhat.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
 <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
 <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>
 <cd3f3259-f885-aad4-11e1-dd17a530cd5a@redhat.com>
 <b83f922d-ff25-234d-0624-9674832068ba@oracle.com>
 <66c0754a-e93b-48b8-4528-184c636d7254@redhat.com>
 <c60ace90-e0b7-9cf8-2625-22c71d4e413e@oracle.com>
Message-ID: <2163fb80-e8de-c604-9547-2790c15558af@oracle.com>

Filed: JDK-8211129

David

On 25/09/2018 5:27 PM, David Holmes wrote:
> Hi Roman,
> 
> This change seems to have broken a test:
> 
> compiler/whitebox/ForceNMethodSweepTest.java
> 
> on exception 'sweep after deoptimization should decrease usage: expected 
> that 1477504 < 1477504':
> 
> I'm assuming the test is assuming (it is a whitebox test after all) that
> the cleanup is happening synchronously.
> 
> No bug filed yet.
> 
> Thanks,
> David
> 
> On 25/09/2018 4:17 AM, Roman Kennke wrote:
>> Thanks Tobias for reviewing!
>>
>> Erik: Is this good for you too?
>>
>> Thanks,
>> Roman
>>
>>> Hi Roman,
>>>
>>> On 24.09.2018 18:46, Roman Kennke wrote:
>>>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/
>>>
>>> This looks good to me!
>>>
>>> Best regards,
>>> Tobias
>>>
>>
>>

From john.r.rose at oracle.com  Tue Sep 25 23:57:26 2018
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 25 Sep 2018 16:57:26 -0700
Subject: [12] RFR(M): 8210215: C2 should optimize trichotomy calculations
In-Reply-To: <f28f8c5e-41e2-cc9d-d8b8-d1acf42dcdf4@oracle.com>
References: <f28f8c5e-41e2-cc9d-d8b8-d1acf42dcdf4@oracle.com>
Message-ID: <8DC6DEBA-9D31-4DFB-97D7-83474E69A5E3@oracle.com>

On Sep 25, 2018, at 2:28 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Thanks to John for pre-reviewing this change.

Based on a partial inspection, two comments:

`res[9][9]` should be `res[illegal+1][illegal+1]` and should have rows and columns for `never`
(code smell: `9` is a naked constant; makes it hard to tell your table is out of date)

In the test cases `compare1` has `(a < b) ? -1 : (a == b) ? 0 : 1`.
Shouldn't you also test `(a < b) ? -1 : (a <= b) ? 0 : 1`?
And similarly, for other cases where the second test overlaps
with the first.

-- John

From stuart.marks at oracle.com  Wed Sep 26 02:19:02 2018
From: stuart.marks at oracle.com (Stuart Marks)
Date: Tue, 25 Sep 2018 19:19:02 -0700
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
Message-ID: <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>

Hi Andrew,

I've been starting to look at some of the buffer-related issues and I've been 
discussing this issue with Alan.

On 9/25/18 2:01 AM, Andrew Haley wrote:
> On 09/24/2018 09:14 AM, Alan Bateman wrote:
> 
>> I'm not questioning the need to support NVM, instead I'm trying to
>> see whether MappedByteBuffer is the right way to expose this in the
>> standard API. Buffers were designed in JSR-51 with specific
>> use-cases in mind but they are problematic for many off-heap cases
>> as they aren't thread safe, are limited to 2GB, lack confinement,
>> only support homogeneous data (no layout support).
> 
> I'm baffled by this assertion. It's true that the 2Gb limit is turning
> into a real pain, but all of the rest are nothing to do with
> ByteBuffers, which are just raw memory. Adding structure is something
> that can be done by third-party libraries or by some future OpenJDK
> API.

If you look around Java SE for a public API to represent raw memory, then 
MappedByteBuffer is the obvious choice; there isn't any realistic alternative 
right now. By asking whether MBB is "the right way to expose" raw memory, I 
think Alan is really saying, is MBB the best API to use to expose raw memory in 
the long run? I think the answer is clearly No.

However, that's not an argument against proceeding with MBB. Rather, it's 
setting expectations that MBB has limitations that impose some pain in the short 
term, that possibly can be worked around, but which in the long term may prove 
to be insurmountable.

For an example of something that might be "insurmountable", I'll use the 2GB 
limitation. Doing something to raise the limit is certainly possible. The 
question is, after retrofitting this into the API, whether the result will be 
something that people want to program with, and whether it will perform well 
enough. It might not.

Another example would be a library layered on top that provides structured 
access. It's certainly possible to have such a library that will get the job done.
However, the Buffer API necessarily requires certain operations to be 
implemented using multiple method calls, or it might require copying in some 
cases. Either of these might impose unacceptable overhead.

There are, however, certain things that can be done with buffers in the short
term to make things work better. For example, there's JDK-5029431, absolute bulk
put/get methods. I suspect this will be quite helpful for the NVM case, and indeed
it's been something that's been asked for repeatedly for quite some time.
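To make the "multiple method calls" point concrete: with only the relative bulk get(byte[]) available, an absolute bulk read that leaves the buffer's position untouched needs a duplicate view plus a position change before the copy. The sketch below uses only long-standing java.nio.ByteBuffer methods; the helper name getAbsolute is hypothetical, and this is the kind of overhead that absolute bulk methods would remove.

```java
import java.nio.ByteBuffer;

public class AbsoluteBulkGet {
    /**
     * Copies dst.length bytes starting at absolute index `index` without
     * changing src's position. A duplicate() shares src's content but has
     * its own position and limit, so moving it does not disturb src.
     */
    public static void getAbsolute(ByteBuffer src, int index, byte[] dst) {
        ByteBuffer view = src.duplicate(); // extra object per call
        view.position(index);              // extra call to select the range
        view.get(dst);                     // the actual relative bulk copy
    }
}
```

If index + dst.length exceeds the buffer's limit, the final get throws BufferUnderflowException, just as a relative bulk get would.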

If you (collectively) are aware of this and other limitations, then sure, let's 
proceed with this JEP.

>> So where does this leave us? If support for persistent memory is
>> added to FileChannel.map as we've been discussing then it may not be
>> too bad as the API surface is small. The API surface is just new map
>> modes and a MappedByteBuffer::isPersistent method. The force method
>> that specifies a range is something useful to add to MBB anyway.
> 
> Yeah, that's right, it is. While something not yet planned might be an
> alternative, even a better one, the purpose of our faster release
> cadence is to "evolve the Java SE Platform and the JDK at a more rapid
> pace, so that new features [can] be delivered in timelier
> manner". This is timely; waiting for Panama to think of what might be
> possible, not so much.

Agree, "waiting for Panama" is certainly not something that anyone wants to do.

s'marks

From felix.yang at huawei.com  Wed Sep 26 03:01:51 2018
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Wed, 26 Sep 2018 03:01:51 +0000
Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem
 by constant in C1
In-Reply-To: <DB7PR08MB3115A33F5A78FFA06D04319296120@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115445A18A786BAFD1F7B08961A0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <9645c210-3d87-52fa-8051-54dc60629866@redhat.com>
 <DB7PR08MB3115D98CC20DAB17F160991496130@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <9d454f5b-475b-5713-7155-c6946f378c3e@redhat.com>
 <DB7PR08MB3115A33F5A78FFA06D04319296120@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <DA41BE1DDCA941489001C7FBD7A8820ED5F07A3E@dggeml527-mbx.china.huawei.com>

Just eyeballed the changes, looks good.
Pushed: http://hg.openjdk.java.net/jdk/jdk/rev/e1368526699d

Thanks,
Felix

> 
> Thanks for your code review. Could you help push this patch?
> 
> --
> Thanks,
> Pengfei
> 
> 
> > -----Original Message-----
> >
> > On 09/20/2018 05:15 AM, Pengfei Li (Arm Technology China) wrote:
> > > Please find below a new patch that adds the same optimization for longs as
> > > well as ints and also fixes an issue.
> > > http://cr.openjdk.java.net/~yzhang/8210413/webrev.01/
> > >
> > > Could you help look at it again?
> >
> > That's fine. I'm not exactly delighted by the amount of duplicated code for
> > long and int, but it's very hard to avoid in this case.
> > The patch is good for JDK/JDK.
> >
> > --
> > Andrew Haley
> > Java Platform Lead Engineer
> > Red Hat UK Ltd. <https://www.redhat.com>
> > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
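The thread does not show the generated AArch64 code. As a hedged illustration only, here is a Java rendering of one common technique for the power-of-two case of signed division and remainder by a constant; whether JDK-8210413 emits exactly this sequence is an assumption, and the class and method names are hypothetical.

```java
public class PowerOfTwoDivision {
    // Signed x / 2^k without a divide instruction. Java division truncates
    // toward zero, but an arithmetic shift rounds toward negative infinity,
    // so negative dividends get a bias of (2^k - 1) added before shifting.
    public static int div(int x, int k) {
        if (k < 1 || k > 30) {
            throw new IllegalArgumentException("k must be in [1, 30]");
        }
        int bias = (x >> 31) >>> (32 - k);  // 2^k - 1 if x < 0, else 0
        return (x + bias) >> k;
    }

    // The remainder follows from the quotient: x % 2^k == x - (x / 2^k) * 2^k.
    public static int rem(int x, int k) {
        return x - (div(x, k) << k);
    }
}
```

The same shape carries over to longs with 63 and 64 in place of 31 and 32, which is one reason int and long variants of such code tend to look nearly duplicated.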

From dcherepanov at azul.com  Wed Sep 26 07:04:18 2018
From: dcherepanov at azul.com (Dmitry Cherepanov)
Date: Wed, 26 Sep 2018 07:04:18 +0000
Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86
 32-bit
In-Reply-To: <e0d34924-5e72-2a7a-78b3-0b36996495bd@oracle.com>
References: <B1F2DAF7-6A1D-456A-BEE8-24C938661124@azul.com>
 <e0d34924-5e72-2a7a-78b3-0b36996495bd@oracle.com>
Message-ID: <C42E6132-613F-474C-B408-7313677C08B5@azul.com>

Hi Tobias,

Thanks for the review, updated patch avoids the additional move on x86_64 and includes the regression test.

http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.01/

Dmitry

> On Sep 25, 2018, at 6:40 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>
> Hi Dmitry,
>
> Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64?
>
> Could you please add the regression test to the webrev? Or did this reproduce with other tests?
>
> Thanks,
> Tobias
>
> On 25.09.2018 16:00, Dmitry Cherepanov wrote:
>> Hello,
>>
>> Please review a patch that resolves an issue in x86 32-bit builds. It slightly adjusts the fix for JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and using it for incrementing the backedge counter.
>>
>> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100
>> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/
>>
>> Thanks,
>>
>> Dmitry



From tobias.hartmann at oracle.com  Wed Sep 26 08:25:17 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 26 Sep 2018 10:25:17 +0200
Subject: [12] RFR(M): 8210215: C2 should optimize trichotomy calculations
In-Reply-To: <8DC6DEBA-9D31-4DFB-97D7-83474E69A5E3@oracle.com>
References: <f28f8c5e-41e2-cc9d-d8b8-d1acf42dcdf4@oracle.com>
 <8DC6DEBA-9D31-4DFB-97D7-83474E69A5E3@oracle.com>
Message-ID: <4991b788-ed40-ffd0-ff0c-85a4ef246df2@oracle.com>

Hi John,

thanks for looking at this again!

On 26.09.2018 01:57, John Rose wrote:
> `res[9][9]` should be `res[illegal+1][illegal+1]` and should have rows and columns for `never`
> (code smell: `9` is a naked constant; makes it hard to tell your table is out of date)

Right, I've updated the table.

> In the test cases `compare1` has `(a < b) ? -1 : (a == b) ? 0 : 1`.
> Shouldn't you also test `(a < b) ? -1 : (a <= b) ? 0 : 1`?
> And similarly, for other cases where the second test overlaps
> with the first.

I did not add tests for all the 6² operator combinations, but I think more overlapping tests won't
hurt. I've added

 (a < b)  ? -1 : (a <= b) ?  0 :  1;
 (a > b)  ?  1 : (a >= b) ?  0 : -1;
 (a == b) ?  0 : (a <= b) ? -1 :  1;
 (a == b) ?  0 : (a >= b) ?  1 : -1;

and verified that all inlined comparisons fold.
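As a quick sanity check of the four added expressions, the sketch below confirms they agree with Integer.compare (normalized with Integer.signum) over boundary values. It checks only Java semantics; it does not show whether C2 folds the comparisons, which would require inspecting compiler output. The class and method names are illustrative.

```java
public class Trichotomy {
    // The four overlapping forms: the second test partly overlaps the first,
    // yet each still yields -1, 0 or 1 exactly like a three-way compare.
    public static int ltLe(int a, int b) { return (a < b)  ? -1 : (a <= b) ?  0 :  1; }
    public static int gtGe(int a, int b) { return (a > b)  ?  1 : (a >= b) ?  0 : -1; }
    public static int eqLe(int a, int b) { return (a == b) ?  0 : (a <= b) ? -1 :  1; }
    public static int eqGe(int a, int b) { return (a == b) ?  0 : (a >= b) ?  1 : -1; }

    public static void main(String[] args) {
        int[] vals = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (int a : vals) {
            for (int b : vals) {
                int r = Integer.signum(Integer.compare(a, b));
                if (ltLe(a, b) != r || gtGe(a, b) != r
                        || eqLe(a, b) != r || eqGe(a, b) != r) {
                    throw new AssertionError(a + " vs " + b);
                }
            }
        }
        System.out.println("all forms agree with Integer.compare");
    }
}
```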

Here's the new webrev:
http://cr.openjdk.java.net/~thartmann/8210215/webrev.01/

Thanks,
Tobias

From tobias.hartmann at oracle.com  Wed Sep 26 08:29:42 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 26 Sep 2018 10:29:42 +0200
Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86
 32-bit
In-Reply-To: <C42E6132-613F-474C-B408-7313677C08B5@azul.com>
References: <B1F2DAF7-6A1D-456A-BEE8-24C938661124@azul.com>
 <e0d34924-5e72-2a7a-78b3-0b36996495bd@oracle.com>
 <C42E6132-613F-474C-B408-7313677C08B5@azul.com>
Message-ID: <b4eb664b-4aef-7abf-a9f1-39e400ca0fc3@oracle.com>

Hi Dmitry,

this looks good to me but Igor (who implemented 8201447) should have a look as well.

Best regards,
Tobias

On 26.09.2018 09:04, Dmitry Cherepanov wrote:
> Hi Tobias,
> 
> Thanks for the review, updated patch avoids the additional move on x86_64 and includes the
> regression test.
> 
> http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.01/
> 
> Dmitry
> 
>> On Sep 25, 2018, at 6:40 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>
>> Hi Dmitry,
>>
>> Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64?
>>
>> Could you please add the regression test to the webrev? Or did this reproduce with other tests?
>>
>> Thanks,
>> Tobias
>>
>> On 25.09.2018 16:00, Dmitry Cherepanov wrote:
>>> Hello,
>>>
>>> Please review a patch that resolves an issue in x86 32-bit builds. It slightly adjusts the fix for
>>> JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and
>>> using it for incrementing the backedge counter.
>>>
>>> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100
>>> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/
>>>
>>> Thanks,
>>>
>>> Dmitry
>>>
> 

From rkennke at redhat.com  Wed Sep 26 08:30:17 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 26 Sep 2018 10:30:17 +0200
Subject: RFR: JDK-8211129: [Testbug]
 compiler/whitebox/ForceNMethodSweepTest.java fails after JDK-8132849
Message-ID: <e2c48288-4559-4544-a69a-7e2a5f9bfe23@redhat.com>

Please review the following change:

Several tests fail because, after forcing an nmethod sweep via the WhiteBox API,
the sweeper doesn't actually kick in.

The reason is the changed heuristic in NMethodSweeper: before
JDK-8132849, we would scan stacks and mark nmethods at every safepoint,
during the safepoint cleanup phase. This would subsequently trigger a sweep
cycle via _should_sweep. When no stack scanning is performed, the sweeper
skips sweeping because the CompiledMethodIterator _current has not
been reset.

I propose to change the following:

- In the sweep loop, call do_stack_scanning() whenever a sweep is forced
(via the WhiteBox API) or should_sweep has been determined by other
heuristics (code cache change, time since last sweep, ...)

- Instead, do_stack_scanning() no longer sets _should_sweep.
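A hypothetical Java model of the proposed decision logic, for illustration only: the boolean fields stand in for the C++ heuristics (WhiteBox force request, code cache change, time since last sweep), and "cursorReset" stands in for resetting the CompiledMethodIterator. None of these names are HotSpot's actual fields or methods.

```java
/**
 * Hypothetical model of the proposed sweep-loop decision. Stack scanning no
 * longer decides whether to sweep; the loop decides, then scans first.
 */
public class SweeperModel {
    public boolean forceSweep;                // stand-in for a WhiteBox request
    public boolean codeCacheChanged;          // stand-in heuristic input
    public boolean timeSinceLastSweepExpired; // stand-in heuristic input
    public boolean cursorReset;               // stand-in for iterator reset
    public int stackScans;
    public int sweeps;

    // Marks active methods and resets the sweep cursor; it no longer sets
    // any "should sweep" flag itself.
    void doStackScanning() {
        stackScans++;
        cursorReset = true;
    }

    boolean shouldSweep() {
        return codeCacheChanged || timeSinceLastSweepExpired;
    }

    // One iteration of the sweeper thread's loop.
    public void sweepLoopIteration() {
        if (forceSweep || shouldSweep()) {
            doStackScanning();                // ensures a fresh cursor
        }
        if (cursorReset) {                    // without a reset, nothing to sweep
            sweeps++;
            cursorReset = false;
            forceSweep = false;
            codeCacheChanged = false;
            timeSinceLastSweepExpired = false;
        }
    }
}
```

In this model a forced sweep always leads to a stack scan followed by a sweep, which mirrors the behavior the WhiteBox tests expect.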

Bug:
https://bugs.openjdk.java.net/browse/JDK-8211129
Webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/

Testing: Fixes previously failing:
compiler/whitebox/ForceNMethodSweepTest.java
jdk/jfr/event/compiler/TestCodeSweeperStats.java

Passes: hotspot/jtreg:tier1


From Alan.Bateman at oracle.com  Wed Sep 26 10:35:52 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Wed, 26 Sep 2018 11:35:52 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
Message-ID: <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>

On 26/09/2018 09:44, Andrew Dinn wrote:
> :
>> If you (collectively) are aware of this and other limitations, then
>> sure, let's proceed with this JEP.
> Well, I'm very happy to proceed if Alan is in agreement. One thing he
> suggested in an earlier post was splitting off the functionality to
> create a persistent ByteBuffer into a separate method so as to avoid any
> issues if we have to deprecate this model at a later date. I think
> that's a quite reasonable precaution and I'd be happy to propose an
> alternative API or let Alan suggest one. Perhaps Alan can comment?
>
I'm reasonably happy with the approach that we converged on to introduce 
new map modes and use the existing FileChannel.map method. Ideally new 
map modes wouldn't need to be exposed but you've looked into that (to my 
satisfaction at least). One detail that I think may need another 
iteration or two on is whether we need one or two modes. This will 
become clearer once the javadoc is fleshed out a bit further. It may be
that one new map mode, "SYNC" for example, that works with the existing 
READ_WRITE mode may be clearer and easier to specify. I think that would 
be consistent with how copy-on-write mappings are exposed with the 
PRIVATE mode. It also provides a 1-1 mapping to the underlying MAP_SYNC 
flag too.
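For reference, the existing `FileChannel.MapMode` values already show the pattern described here: `PRIVATE` qualifies a writable mapping with copy-on-write semantics, much as a single extra mode could qualify how a `READ_WRITE` mapping behaves. The sketch below uses only the existing JDK 11 API (no NVM-specific mode); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MapModeDemo {
    // Maps the first byte of 'file' with the given mode, writes 'value' into
    // the mapping, and returns the byte subsequently read back from the file.
    static byte writeThenReadFile(Path file, FileChannel.MapMode mode, byte value)
            throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mbb = ch.map(mode, 0, 1);
            mbb.put(0, value);
            if (mode == FileChannel.MapMode.READ_WRITE) {
                mbb.force(); // flush changes made through the shared mapping
            }
        }
        return Files.readAllBytes(file)[0];
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapmode", ".bin");
        try {
            Files.write(file, new byte[] {0});
            // READ_WRITE: the write reaches the file.
            System.out.println(writeThenReadFile(file, FileChannel.MapMode.READ_WRITE, (byte) 1));
            // PRIVATE: copy-on-write, so the file keeps its previous contents.
            System.out.println(writeThenReadFile(file, FileChannel.MapMode.PRIVATE, (byte) 2));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```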

As regards the bigger topic on what the right API is for "memory" then I 
don't think ByteBuffer is the right answer. You've touched on a few of 
the issues in your mail but there are bigger issues around thread safety 
and confinement, also the issue of the buffer position/limit that get in 
the way and the reason why several libraries use Unsafe. There isn't any 
concrete proposal or discussion to point at around splitting out this 
aspect of Project Panama. Stuart and I are just pointing out that a better
solution could emerge, which could lead to an alternative API to map
a region of NVM as "memory" rather than a mapped byte buffer. If I were 
developing a file system backed by NVM then that is probably the raw API 
that I would want, not MBB.

As regards introducing an API that we could deprecate then that musing 
was about introducing a JDK-specific API. If MapMode were an interface 
then we could have introduced a JDK-specific map mode that wouldn't have
required rev'ing the standard API. Introducing a completely separate map
method in a JDK-specific module doesn't seem to be worth it as I think
we can be confident that the proposed and possible new/future approaches
will not conflict.

-Alan

From adinn at redhat.com  Wed Sep 26 08:44:04 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 26 Sep 2018 09:44:04 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
Message-ID: <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>

Hi Stuart,

On 26/09/18 03:19, Stuart Marks wrote:
> I've been starting to look at some of the buffer-related issues and I've
> been discussing this issue with Alan.

I'd be interested to hear more details if the discussion has gone far
enough for any of it to be aired online.

> On 9/25/18 2:01 AM, Andrew Haley wrote:
>> . . .
>> I'm baffled by this assertion. It's true that the 2Gb limit is turning
>> into a real pain, but all of the rest are nothing to do with
>> ByteBuffers, which are just raw memory. Adding structure is something
>> that can be done by third-party libraries or by some future OpenJDK
>> API.
> 
> If you look around Java SE for a public API to represent raw memory,
> then MappedByteBuffer is the obvious choice; there isn't any realistic
> alternative right now. By asking whether MBB is "the right way to
> expose" raw memory, I think Alan is really saying, is MBB the best API
> to use to expose raw memory in the long run? I think the answer is
> clearly No.
Sorry, it may well be my fault but it's not really clear to me. You
mention two issues below, buffer size limit and API verbosity.

I acknowledge the former is a problem but i) there is a proposal
(JDK-8180628, referred to in the JEP) to deal with this limitation by
adding extra methods that support the creation of larger buffers and use
of long indices and ii) there are existing Java libraries built over
ByteBuffer that overcome this issue (as Sandhya pointed out in a note
somewhere near this one). Sure, both of these remedies have limitations
which /might/ lead to problems but I don't see (yet, at least) that they
are manifestly unworkable.
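As a rough illustration of how libraries layered over `ByteBuffer` typically work around the `int`-indexed 2GB limit, the sketch below splits a large region into fixed-size buffers and translates a `long` offset into a (chunk, offset) pair. `LongIndexedBuffers` is hypothetical and is not taken from any particular library or from JDK-8180628; in practice the chunks would be `MappedByteBuffer`s from successive `FileChannel.map` calls rather than heap buffers.

```java
import java.nio.ByteBuffer;

// Hypothetical wrapper presenting several ByteBuffers as one long-indexed region.
class LongIndexedBuffers {
    private final ByteBuffer[] chunks;
    private final int chunkSize;

    LongIndexedBuffers(ByteBuffer[] chunks, int chunkSize) {
        this.chunks = chunks;
        this.chunkSize = chunkSize;
    }

    // Allocates heap chunks covering at least 'size' bytes.
    static LongIndexedBuffers allocate(long size, int chunkSize) {
        int n = (int) ((size + chunkSize - 1) / chunkSize);
        ByteBuffer[] chunks = new ByteBuffer[n];
        for (int i = 0; i < n; i++) {
            chunks[i] = ByteBuffer.allocate(chunkSize);
        }
        return new LongIndexedBuffers(chunks, chunkSize);
    }

    byte get(long index) {
        return chunks[(int) (index / chunkSize)].get((int) (index % chunkSize));
    }

    void put(long index, byte value) {
        chunks[(int) (index / chunkSize)].put((int) (index % chunkSize), value);
    }
}
```

Multi-byte values that straddle a chunk boundary are not handled here; that kind of gap is one of the limitations such workarounds tend to carry.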

As regards the latter issue, I am not really sure what you are
suggesting would be a better alternative to using ByteBuffer get and put
methods? Are you perhaps thinking of some way of overlaying a record (or
object?) structure over the mapped memory that might allow a compiler to
provide an equivalent to these ByteBuffer method calls as direct memory
loads and stores? Of course, a Java library built on top of this
proposal could provide a similar abstraction, although perhaps not with
as firm guarantees for compiler optimization and certainly not with the
possibility of direct language integration.
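One way a library can offer such an abstraction today is a flyweight "view" whose accessors translate field reads and writes into absolute `ByteBuffer` get/put calls at fixed offsets. The record layout below (`FooView` with an `int` field `bar` at offset 0 and a `long` field `baz` at offset 4) is purely hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical flyweight over a fixed 12-byte record layout:
//   offset 0: int  bar
//   offset 4: long baz
class FooView {
    static final int BAR_OFFSET = 0;
    static final int BAZ_OFFSET = 4;
    static final int RECORD_SIZE = 12;

    private final ByteBuffer buf;
    private int base; // byte offset of the current record within buf

    FooView(ByteBuffer buf) {
        this.buf = buf;
    }

    // Repositions the view onto record number 'index' without allocating.
    FooView at(int index) {
        this.base = index * RECORD_SIZE;
        return this;
    }

    int getBar()        { return buf.getInt(base + BAR_OFFSET); }
    void setBar(int v)  { buf.putInt(base + BAR_OFFSET, v); }
    long getBaz()       { return buf.getLong(base + BAZ_OFFSET); }
    void setBaz(long v) { buf.putLong(base + BAZ_OFFSET, v); }
}
```

Because the accessors use absolute get/put, the view never touches the buffer's position; a JIT, however, still sees only byte-offset loads and stores, so it presumably gets none of the type-based guarantees a language-level record could provide.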

Copying might indeed be an issue but surely that depends on the type of
data being written, the library design and the way the client needs to
operate in order to use it (essentially on whether it can size in
advance a data area in which to write the contents direct vs build a
separate copy as distinct pieces and then serialize them).
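The two client styles can be contrasted with a small sketch, assuming a hypothetical record made of an `int` id followed by a length-prefixed byte payload: the first method writes straight into a buffer sized in advance, the second builds the record separately and then copies it in. Both produce the same big-endian bytes; only the second pays for an intermediate copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class RecordWriters {
    // Direct style: fields go straight into 'dst', whose space was reserved up front.
    static void writeDirect(ByteBuffer dst, int id, byte[] payload) {
        dst.putInt(id);
        dst.putShort((short) payload.length);
        dst.put(payload);
    }

    // Copying style: serialize into a separate byte[] first, then copy it into 'dst'.
    static void writeViaCopy(ByteBuffer dst, int id, byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
            out.writeShort(payload.length);
            out.write(payload);
        }
        dst.put(bytes.toByteArray()); // the extra copy
    }
}
```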

Anyway, I hope the above explains why I'm not sure about your use of
the words 'clearly' or (in a later comment) 'insurmountable'.
Perhaps more details of your conversation with Alan would help.

> There are, however, certain things that can be done with buffers in the
> short term to make things work better. For example, JDK-5029431 absolute
> bulk put/get methods. I suspect this will be quite helpful for the NVM
> case, and indeed it's been something that's been asked for repeatedly
> for quite some time.

Would it be enough to add a comment to the Risks and Assumptions of the
JEP to point out this limitation and the potential need to address it,
mentioning this specific JDK issue -- much as was done with JDK-8180628?
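Until absolute bulk methods such as those requested in JDK-5029431 exist, a common workaround is to run a relative bulk operation on a `duplicate()` of the buffer, which shares the content but has its own position and limit, so the original buffer's position is left untouched. The helper class below is hypothetical.

```java
import java.nio.ByteBuffer;

class AbsoluteBulk {
    // Copies dst.length bytes starting at 'index' into 'dst' without
    // changing src's position: the duplicate shares content, not position.
    static void get(ByteBuffer src, int index, byte[] dst) {
        ByteBuffer dup = src.duplicate();
        dup.position(index);
        dup.get(dst);
    }

    // Writes 'data' starting at 'index' without changing dst's position.
    static void put(ByteBuffer dst, int index, byte[] data) {
        ByteBuffer dup = dst.duplicate();
        dup.position(index);
        dup.put(data);
    }
}
```

Each call allocates a small duplicate object, which is presumably part of why native absolute bulk methods keep being requested.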

> If you (collectively) are aware of this and other limitations, then
> sure, let's proceed with this JEP.

Well, I'm very happy to proceed if Alan is in agreement. One thing he
suggested in an earlier post was splitting off the functionality to
create a persistent ByteBuffer into a separate method so as to avoid any
issues if we have to deprecate this model at a later date. I think
that's a quite reasonable precaution and I'd be happy to propose an
alternative API or let Alan suggest one. Perhaps Alan can comment?

> Agree, "waiting for Panama" is certainly not something that anyone wants
> to do.
Yes, indeed, there are already several important use cases waiting in
the wings.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From aph at redhat.com  Wed Sep 26 09:53:23 2018
From: aph at redhat.com (Andrew Haley)
Date: Wed, 26 Sep 2018 10:53:23 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
Message-ID: <d073aa9e-0c80-35c0-7430-c6b5dbfd7ff9@redhat.com>

On 09/26/2018 09:44 AM, Andrew Dinn wrote:
> As regards the latter issue, I am not really sure what you are
> suggesting would be a better alternative to using ByteBuffer get and put
> methods? Are you perhaps thinking of some way of overlaying a record (or
> object?) structure over the mapped memory that might allow a compiler to
> provide an equivalent to these ByteBuffer method calls as direct memory
> loads and stores? Of course, a Java library built on top of this
> proposal could provide a similar abstraction, although perhaps not with
> as firm guarantees for compiler optimization and certainly not with the
> possibility of direct language integration.

Thinking about it some more, I guess that being able to say something
like

  aFoo.bar = n;

rather than

  aFoo.setBar(n);

is preferable, although common Java practice (and indeed good OOP
practice) is to provide getters and setters rather than directly
expose fields. I suppose one advantage of being able to use an object
structure is that the compiler can do better (type-based) alias
analysis, can track dead stores, etc. But from a language design
perspective, the fact that classes internally use direct field
accesses but expose a completely different get/set notation is
something of a linguistic wart.

[ The BETA language used a single notation, the pattern, for
assignment, method calls, and argument passing. Therefore, in BETA
there would be no API difference between the two examples above. They'd
both be something like

  n -> aFoo.bar

Curiously, the first commercial licences for BETA were acquired by
Bill Joy and James Gosling, so they knew about this, but I suppose a
more C-like notation for Java was the right decision. The BETA
notation would have been too frightening for the target audience. :-) ]

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From adinn at redhat.com  Wed Sep 26 13:27:31 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 26 Sep 2018 14:27:31 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
Message-ID: <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>

Hi Alan,

On 26/09/18 11:35, Alan Bateman wrote:

> I'm reasonably happy with the approach that we converged on to introduce
> new map modes and use the existing FileChannel.map method. Ideally new
> map modes wouldn't need to be exposed but you've looked into that (to my
> satisfaction at least). One detail that I think may need another
> iteration or two on is whether we need one or two modes. This will
> become clearer once the javadoc is fleshed out a bit further. It may be
> that one new map mode, "SYNC" for example, that works with the existing
> READ_WRITE mode may be clearer and easier to specify. I think that would
> be consistent with how copy-on-write mappings are exposed with the
> PRIVATE mode. It also provides a 1-1 mapping to the underlying MAP_SYNC
> flag too.

I'm not clear why we should only use one flag. The two flags I specified
reflect two independent use cases, one where data stored in an NVM
device is accessed read-only and another where it is accessed
read-write. Are you suggesting that the read-only case is redundant? I'm
not sure I agree. For example, a utility which might want to review the
state of persistent data while a service is off-line would really want
to pass flag READ_ONLY_PERSISTENT. Of course, it could employ
READ_WRITE_PERSISTENT (or equivalently, SYNC) and just not write the
data but, mutatis mutandis, that same argument would remove the case for
flag READ_ONLY.
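With the existing modes, the read-only case is already enforced by the buffer itself: a mapping created with `MapMode.READ_ONLY` yields a read-only buffer, so an inspection utility cannot modify the data by accident. The sketch uses only the existing JDK 11 API; the proposed `READ_ONLY_PERSISTENT` and `READ_WRITE_PERSISTENT` modes are not used, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadOnlyMapping {
    // Maps the whole file read-only and reports whether a write was rejected.
    static boolean writeIsRejected(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            if (!mbb.isReadOnly()) {
                return false;
            }
            try {
                mbb.put(0, (byte) 1);
                return false; // write unexpectedly succeeded
            } catch (ReadOnlyBufferException expected) {
                return true;
            }
        }
    }
}
```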

> As regards the bigger topic on what the right API is for "memory" then I
> don't think ByteBuffer is the right answer. You've touched on a few of
> the issues in your mail but there are bigger issues around thread safety
> and confinement, also the issue of the buffer position/limit that get in
> the way and the reason why several libraries use Unsafe. There isn't any
> concrete proposal or discussion to point at around splitting out this
> aspect of Project Panama. Stuart and I are just pointing out that a better
> solution could emerge, which could lead to an alternative API to map
> a region of NVM as "memory" rather than a mapped byte buffer. If I were
> developing a file system backed by NVM then that is probably the raw API
> that I would want, not MBB.

I really don't understand how thread safety comes into the argument
here. How is some other mechanism going to avoid the need for client
threads -- or, rather, the applications which create them -- to
manage concurrent updates of NVM? Are you perhaps thinking of some form
of software transactional memory? I'm really struggling to understand
why you keep raising this point without any further detail to explain
how the lack of exclusion and synchronization primitives constitutes a
problem for this API that can be bypassed by rolling some equivalent
alternative into another API.

Also, can you explain what you mean by confinement? (thread confinement?).

Also, I don't think I would label this API an attempt to develop a file
system. I think that's rather an overblown characterisation of what it
does. This is an attempt to provide an intermediate storage tier
somewhere between a file system and volatile memory to
create/access/update data across program runs, without incurring the
costs associated with implementing that sort of capability on top of
existing file system implementations.

The use of a byte array layout at the base level is indeed, as the
success of Unix/Linux/MS Windows file systems makes clear, a helpful way
of enabling a variety of application-defined data layouts to be
implemented on top of this storage tier. But I don't really see how that
makes this a file system.

> As regards introducing an API that we could deprecate then that musing
> was about introducing a JDK-specific API. If MapMode were an interface
> then we could have introduced a JDK-specific map mode that wouldn't have
> required rev'ing the standard API. Introducing a completely separate map
> method in a JDK-specific module doesn't seem to be worth it as I think
> we can be confident that the proposed and possible new/future approaches
> will not conflict.
Ok, so no need for a change there then I guess.

I'm still not quite sure where this reply leaves the JEP though. Shall I
update the Risks and Assumptions section to include mention of
JDK-5029431 as suggested to Stuart? Is there anything else I can do to
progress things?

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd

From adinn at redhat.com  Wed Sep 26 15:26:44 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 26 Sep 2018 16:26:44 +0100
Subject: RFR: JDK-8211105: AArch64: Disable cos/sin and log intrinsics in
 jdk11u pending fix
Message-ID: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com>

Can I please get a review for this trivial fix for jdk11u which is
intended to disable the broken, generated AArch64 trig and log
intrinsics. This is a stop-gap to avoid the breakage until we are ok to
backport upstream fixes.

  JIRA:   https://bugs.openjdk.java.net/browse/JDK-8211105
  webrev: http://cr.openjdk.java.net/~adinn/8211105/webrev.00/

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd

From aph at redhat.com  Wed Sep 26 15:31:14 2018
From: aph at redhat.com (Andrew Haley)
Date: Wed, 26 Sep 2018 16:31:14 +0100
Subject: RFR: JDK-8211105: AArch64: Disable cos/sin and log intrinsics in
 jdk11u pending fix
In-Reply-To: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com>
References: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com>
Message-ID: <b7de85cf-8487-823e-5469-7d1412410391@redhat.com>

On 09/26/2018 04:26 PM, Andrew Dinn wrote:
> Can I please get a review for this trivial fix for jdk11u which is
> intended to disable the broken, generated AArch64 trig and log
> intrinsics. This is a stop-gap to avoid the breakage until we are ok to
> backport upstream fixes.
> 
>   JIRA:   https://bugs.openjdk.java.net/browse/JDK-8211105
>   webrev: http://cr.openjdk.java.net/~adinn/8211105/webrev.00/

OK

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From shade at redhat.com  Wed Sep 26 15:33:44 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 26 Sep 2018 17:33:44 +0200
Subject: RFR: JDK-8211105: AArch64: Disable cos/sin and log intrinsics in
 jdk11u pending fix
In-Reply-To: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com>
References: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com>
Message-ID: <e4323696-caf9-ca1f-8b66-ef7307dd7b52@redhat.com>

On 09/26/2018 05:26 PM, Andrew Dinn wrote:
> Can I please get a review for this trivial fix for jdk11u which is
> intended to disable the broken, generated AArch64 trig and log
> intrinsics. This is a stop-gap to avoid the breakage until we are ok to
> backport upstream fixes.
> 
>   JIRA:   https://bugs.openjdk.java.net/browse/JDK-8211105
>   webrev: http://cr.openjdk.java.net/~adinn/8211105/webrev.00/

Looks good.

Please mention in the comments that we are waiting on JDK-8210858 and JDK-8210461.

-Aleksey


From dmitrij.pochepko at bell-sw.com  Wed Sep 26 15:40:33 2018
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Wed, 26 Sep 2018 18:40:33 +0300
Subject: RFR: JDK-8211105: AArch64: Disable cos/sin and log intrinsics in
 jdk11u pending fix
In-Reply-To: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com>
References: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com>
Message-ID: <cafc54a3-8fbc-8a4f-8b17-8c5043776055@bell-sw.com>

Looks good to me.

(Sorry for the trouble)


On 26/09/18 18:26, Andrew Dinn wrote:
> Can I please get a review for this trivial fix for jdk11u which is
> intended to disable the broken, generated AArch64 trig and log
> intrinsics. This is a stop-gap to avoid the breakage until we are ok to
> backport upstream fixes.
>
>    JIRA:   https://bugs.openjdk.java.net/browse/JDK-8211105
>    webrev: http://cr.openjdk.java.net/~adinn/8211105/webrev.00/
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd


From Alan.Bateman at oracle.com  Wed Sep 26 16:00:43 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Wed, 26 Sep 2018 17:00:43 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
Message-ID: <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>

On 26/09/2018 14:27, Andrew Dinn wrote:
> :
> I really don't understand how thread safety comes into the argument
> here. How is some other mechanism going to avoid the need for client
> threads -- or, rather, the applications which create them -- to
> manage concurrent updates of NVM? Are you perhaps thinking of some form
> of software transactional memory? I'm really struggling to understand
> why you keep raising this point without any further detail to explain
> how the lack of exclusion and synchronization primitives constitutes a
> problem for this API that can be bypassed by rolling some equivalent
> alternative into another API.
The reason that we've mentioned it a few times is because it's a 
significant issue. If you have a byte buffer then you can't have 
different threads accessing different parts of the buffer at the same 
time, at least not with any of the relative get/put methods as they 
depend on the buffer position. Sure you can globally synchronize all 
operations but you'll likely want much finer granularity. This bugbear 
comes up periodically, particularly when using buffers for cases that 
they weren't really designed for. Stuart pointed out the lack of 
absolute bulk get/put operations which is something that I think will 
help some of these cases.
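The position problem can be seen in a small sketch: relative `put` calls from several threads would race on the single shared position, whereas giving each thread its own `duplicate()` bounded to its region lets threads write disjoint parts of one buffer without global locking. This is only an illustration with a heap buffer and a hypothetical `fillRegions` helper; it says nothing about ordering or persistence of writes to NVM.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class DisjointWriters {
    // Each of 'threads' workers fills its own slice of 'buf' with its id.
    // Workers never touch buf's shared position; any remainder bytes past
    // threads * region are left unchanged.
    static void fillRegions(ByteBuffer buf, int threads) throws InterruptedException {
        int region = buf.capacity() / threads;
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final byte id = (byte) t;
            // Private duplicate: shares content, but has its own position/limit.
            ByteBuffer view = buf.duplicate();
            view.limit((t + 1) * region);
            view.position(t * region);
            Thread w = new Thread(() -> {
                while (view.hasRemaining()) {
                    view.put(id); // relative put advances only this view's position
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join(); // join() makes the workers' writes visible to the caller
        }
    }
}
```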

>
> Also, can you explain what you mean by confinement? (thread confinement?).
Yes, thread vs. global. I haven't been following Panama closely enough to
say how this is exposed in the API.


>
> Also, I don't think I would label this API an attempt to develop a file
> system. I think that's rather an overblown characterisation of what it
> does.
I think you may have misread my mail, as I was just picking another
example where MBB would be problematic.


> :
> I'm still not quite sure where this reply leaves the JEP though. Shall I
> update the Risks and Assumptions section to include mention of
> JDK-5029431 as suggested to Stuart? Is there anything else I can do to
> progress things?
>
It wouldn't do any harm to have this section mention that an alternative 
that exposes a more memory centric API may be possible in the future.

-Alan

From igor.veresov at oracle.com  Wed Sep 26 16:35:18 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 26 Sep 2018 09:35:18 -0700
Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86
 32-bit
In-Reply-To: <b4eb664b-4aef-7abf-a9f1-39e400ca0fc3@oracle.com>
References: <B1F2DAF7-6A1D-456A-BEE8-24C938661124@azul.com>
 <e0d34924-5e72-2a7a-78b3-0b36996495bd@oracle.com>
 <C42E6132-613F-474C-B408-7313677C08B5@azul.com>
 <b4eb664b-4aef-7abf-a9f1-39e400ca0fc3@oracle.com>
Message-ID: <659DF4FF-71B9-472D-A064-038ADF2A50FF@oracle.com>

It doesn't seem to me like the proper way to fix it. The problem is that the cmp is destroying opr1 without telling the register allocator about it.

One possible solution would be to make opr1 also a temp (see LIR_OpVisitState::visit(LIR_Op* op) in c1_LIR.cpp), only for x86 32bit and only if the operand type is T_LONG. 
Another solution is to maintain a temporary register for lir_cmp and use it to save/restore opr1 when emitting the code in LIR_Assembler::comp_op(). Again, the temporary register has to be there only for x86 32bit and T_LONG.

igor


> On Sep 26, 2018, at 1:29 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Dmitry,
> 
> this looks good to me but Igor (who implemented 8201447) should have a look as well.
> 
> Best regards,
> Tobias
> 
> On 26.09.2018 09:04, Dmitry Cherepanov wrote:
>> Hi Tobias,
>> 
>> Thanks for the review, updated patch avoids the additional move on x86_64 and includes the
>> regression test.
>> 
>> http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.01/
>> 
>> Dmitry
>> 
>>> On Sep 25, 2018, at 6:40 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>> 
>>> Hi Dmitry,
>>> 
>>> Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64?
>>> 
>>> Could you please add the regression test to the webrev? Or did this reproduce with other tests?
>>> 
>>> Thanks,
>>> Tobias
>>> 
>>> On 25.09.2018 16:00, Dmitry Cherepanov wrote:
>>>> Hello,
>>>> 
>>>> Please review a patch that resolves issue in x86 32bit builds. It slightly adjusts the fix for
>>>> JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and
>>>> using it for incrementing backedge counter.
>>>> 
>>>> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100
>>>> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/
>>>> 
>>>> Thanks,
>>>> 
>>>> Dmitry


From rkennke at redhat.com  Wed Sep 26 17:26:29 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 26 Sep 2018 19:26:29 +0200
Subject: RFR: JDK-8211129: compiler/whitebox/ForceNMethodSweepTest.java
 fails after JDK-8132849
In-Reply-To: <e2c48288-4559-4544-a69a-7e2a5f9bfe23@redhat.com>
References: <e2c48288-4559-4544-a69a-7e2a5f9bfe23@redhat.com>
Message-ID: <e8e1e8a9-eb86-1e69-f033-eda67ccc8d2e@redhat.com>

Ping! This fixes two failing tests...

(also changed subject to remove [Testbug])

Thanks,
Roman


Am 26.09.18 um 10:30 schrieb Roman Kennke:
> Please review the following change:
> 
> Several tests fail because after forcing nmethod sweep via Whitebox API,
> the sweeper doesn't actually kick in.
> 
> The reason is the changed heuristic in NMethodSweeper: before
> JDK-8132849, we would scan stacks and mark nmethods at every safepoint,
> during safepoint cleanup phase. This would subsequently trigger a sweep
> cycle via _should_sweep. If no stack-scanning is performed, the sweeper
> would skip sweeping because the CompiledMethodIterator _current has not
> been reset.
> 
> I propose to change the following:
> 
> - In the sweep-loop, call into do_stack_scanning() whenever it's forced
> (via WhiteBox API) or if should_sweep has been determined by other
> heuristics (code-cache-change, time-since-last-sweep,..)
> 
> - Instead let do_stack_scanning() not set _should_sweep anymore.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8211129
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/
> 
> Testing: Fixes previously failing:
> compiler/whitebox/ForceNMethodSweepTest.java
> jdk/jfr/event/compiler/TestCodeSweeperStats.java
> 
> Passes: hotspot/jtreg:tier1
> 


From vladimir.kozlov at oracle.com  Wed Sep 26 19:25:43 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 26 Sep 2018 12:25:43 -0700
Subject: [12] RFR(M): 8210215: C2 should optimize trichotomy calculations
In-Reply-To: <4991b788-ed40-ffd0-ff0c-85a4ef246df2@oracle.com>
References: <f28f8c5e-41e2-cc9d-d8b8-d1acf42dcdf4@oracle.com>
 <8DC6DEBA-9D31-4DFB-97D7-83474E69A5E3@oracle.com>
 <4991b788-ed40-ffd0-ff0c-85a4ef246df2@oracle.com>
Message-ID: <a3f0e902-aa81-831b-9a05-822054223c36@oracle.com>

Hi Tobias,

In the head comment of RegionNode::optimize_trichotomy(), add code examples of what you are optimizing.
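For illustration, these are Java source shapes of the kind discussed in this thread (taken from the test cases listed below): nested ternaries that each compute a three-way comparison and, for these operator pairs, agree with `Integer.compare` for all `int` inputs. The class name and `lessThan` helper are hypothetical; the sketch shows only the source patterns, not the C2 graph shapes or the optimization itself.

```java
// Source patterns of the kind the trichotomy optimization is meant to fold.
// Each compareN method is equivalent to Integer.compare(a, b).
class Trichotomy {
    static int compare1(int a, int b) { return (a < b) ? -1 : (a == b) ? 0 : 1; }
    static int compare2(int a, int b) { return (a < b) ? -1 : (a <= b) ? 0 : 1; }
    static int compare3(int a, int b) { return (a > b) ? 1 : (a >= b) ? 0 : -1; }
    static int compare4(int a, int b) { return (a == b) ? 0 : (a <= b) ? -1 : 1; }
    static int compare5(int a, int b) { return (a == b) ? 0 : (a >= b) ? 1 : -1; }

    // A caller that only tests the sign: once inlined, the nested diamonds
    // may fold back into a single comparison of a and b.
    static boolean lessThan(int a, int b) { return compare2(a, b) < 0; }
}
```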

I think you need to return a bool value indicating whether a modification happened and || it with the 'modified' value.

In the shape 1 check you check for region->outcnt() != 2. Does that mean this Region node does not have a Phi node attached? (1 is itself,
2 is this Region node.)

The next optimization seems wrong for case 1, where you don't check Phi inputs: it could be a normal diamond shape with
different Phi node values on each branch:

+  if (iff1 == iff2) {
+    igvn->replace_input_of(region, idx1, iff1->in(0));
+    igvn->replace_input_of(region, idx2, igvn->C->top());
+    return; // Remove useless if (both projections map to the same control/value)
+  }

I think you need to check control flow and Phi inputs for both cases to make sure you got only expected shapes before 
transforming graph.

Thanks,
Vladimir

On 9/26/18 1:25 AM, Tobias Hartmann wrote:
> Hi John,
> 
> thanks for looking at this again!
> 
> On 26.09.2018 01:57, John Rose wrote:
>> `res[9][9]` should be `res[illegal+1][illegal+1]` and should have rows and columns for `never`
>> (code smell: `9` is a naked constant; makes it hard to tell your table is out of date)
> 
> Right, I've updated the table.
> 
>> In the test cases `compare1` has `(a < b) ? -1 : (a == b) ? 0 : 1`.
>> Shouldn't you also test `(a < b) ? -1 : (a <= b) ? 0 : 1`?
>> And similarly, for other cases where the second test overlaps
>> with the first.
> 
> I did not add tests for all the 6² operator combinations but I think more overlapping tests won't
> hurt. I've added
> 
>   (a < b)  ? -1 : (a <= b) ?  0 :  1;
>   (a > b)  ?  1 : (a >= b) ?  0 : -1;
>   (a == b) ?  0 : (a <= b) ? -1 :  1;
>   (a == b) ?  0 : (a >= b) ?  1 : -1;
> 
> and verified that all inlined comparisons fold.
> 
> Here's the new webrev:
> http://cr.openjdk.java.net/~thartmann/8210215/webrev.01/
> 
> Thanks,
> Tobias
> 

From vladimir.kozlov at oracle.com  Wed Sep 26 19:54:20 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 26 Sep 2018 12:54:20 -0700
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <b3329405-1736-5235-b0a2-0962ae7789fa@oracle.com>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
 <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <dk67ejaj7lk.fsf@rwestrel.remote.csb>
 <b3329405-1736-5235-b0a2-0962ae7789fa@oracle.com>
Message-ID: <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>

Looks good but we need to test it hard. I submitted testing.

Thanks,
Vladimir

On 9/25/18 1:42 AM, Tobias Hartmann wrote:
> Hi Roland,
> 
> okay, thanks for the clarifications.
> 
> Best regards,
> Tobias
> 
> On 25.09.2018 10:37, Roland Westrelin wrote:
>>
>> Hi Tobias,
>>
>>> Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers
>>> will be used for Shenandoah? Why do you need to distinguish between ArrayCopyPhase values?
>>
>> Thanks for the review.
>>
>> Yes extra arguments are to be used by shenandoah.
>>
>> Generating barriers once parsing is over is not supported by all
>> gcs. The shape of the barriers is sometimes too complicated to be
>> emitted at igvn time.
>>
>> Roland.
>>

From vladimir.kozlov at oracle.com  Wed Sep 26 21:58:22 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 26 Sep 2018 14:58:22 -0700
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
 <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <dk67ejaj7lk.fsf@rwestrel.remote.csb>
 <b3329405-1736-5235-b0a2-0962ae7789fa@oracle.com>
 <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
Message-ID: <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>

A LOT of tests failed.

workspace/open/src/hotspot/share/opto/graphKit.cpp:1848), pid=24736, tid=15619
jib > #  assert(C->alias_type(call->adr_type()) == C->alias_type(hook_mem)) failed: call node must be constructed correctly

Vladimir

On 9/26/18 12:54 PM, Vladimir Kozlov wrote:
> Looks good but we need to test it hard. I submitted testing.
> 
> Thanks,
> Vladimir
> 
> On 9/25/18 1:42 AM, Tobias Hartmann wrote:
>> Hi Roland,
>>
>> okay, thanks for the clarifications.
>>
>> Best regards,
>> Tobias
>>
>> On 25.09.2018 10:37, Roland Westrelin wrote:
>>>
>>> Hi Tobias,
>>>
>>>> Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers
>>>> will be used for Shenandoah? Why do you need to distinguish between ArrayCopyPhase values?
>>>
>>> Thanks for the review.
>>>
>>> Yes, the extra arguments are to be used by Shenandoah.
>>>
>>> Generating barriers once parsing is over is not supported by all
>>> GCs. The shape of the barriers is sometimes too complicated to be
>>> emitted at IGVN time.
>>>
>>> Roland.
>>>

From igor.veresov at oracle.com  Wed Sep 26 23:18:58 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 26 Sep 2018 16:18:58 -0700
Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86
 32-bit
In-Reply-To: <659DF4FF-71B9-472D-A064-038ADF2A50FF@oracle.com>
References: <B1F2DAF7-6A1D-456A-BEE8-24C938661124@azul.com>
 <e0d34924-5e72-2a7a-78b3-0b36996495bd@oracle.com>
 <C42E6132-613F-474C-B408-7313677C08B5@azul.com>
 <b4eb664b-4aef-7abf-a9f1-39e400ca0fc3@oracle.com>
 <659DF4FF-71B9-472D-A064-038ADF2A50FF@oracle.com>
Message-ID: <CC22DE2F-EB21-42B9-8885-4D8929E8EF29@oracle.com>

Edit: it may be more consistent to check for is_double_cpu() instead of T_LONG, although the two are semantically equivalent.

> On Sep 26, 2018, at 9:35 AM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
> It doesn't seem to me like the proper way to fix it. The problem is that the cmp is destroying opr1 without telling the register allocator about it.
> 
> One possible solution would be to make opr1 also a temp (see LIR_OpVisitState::visit(LIR_Op* op) in c1_LIR.cpp), only for x86 32bit and only if the operand type is T_LONG. 
> Another solution is to maintain a temporary register for lir_cmp and use it to save/restore opr1 when emitting the code in LIR_Assembler::comp_op(). Again, the temporary register has to be there only for x86 32bit and T_LONG.
> 
> igor
> 
> 
>> On Sep 26, 2018, at 1:29 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>> 
>> Hi Dmitry,
>> 
>> this looks good to me but Igor (who implemented 8201447) should have a look as well.
>> 
>> Best regards,
>> Tobias
>> 
>> On 26.09.2018 09:04, Dmitry Cherepanov wrote:
>>> Hi Tobias,
>>> 
>>> Thanks for the review, updated patch avoids the additional move on x86_64 and includes the
>>> regression test.
>>> 
>>> http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.01/
>>> 
>>> Dmitry
>>> 
>>>> On Sep 25, 2018, at 6:40 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>>> 
>>>> Hi Dmitry,
>>>> 
>>>> Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64?
>>>> 
>>>> Could you please add the regression test to the webrev? Or did this reproduce with other tests?
>>>> 
>>>> Thanks,
>>>> Tobias
>>>> 
>>>> On 25.09.2018 16:00, Dmitry Cherepanov wrote:
>>>>> Hello,
>>>>> 
>>>>> Please review a patch that resolves an issue in x86 32-bit builds. It slightly adjusts the fix for
>>>>> JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and
>>>>> using it for incrementing backedge counter.
>>>>> 
>>>>> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100
>>>>> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Dmitry
> 
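An illustrative Java model, built on an assumption: that for T_LONG on 32-bit x86 the compare is emitted as a subtract of the low words followed by a subtract-with-borrow of the high words (something like `sub xlo, ylo; sbb xhi, yhi`), which writes its results into the left operand's register pair. That is consistent with Igor's point that the cmp destroys opr1, but LIR_Assembler::comp_op is the place to confirm the exact sequence. Each long is modeled as an int[2] {lo, hi} "register pair", and the class and method names are hypothetical.

```java
// Hypothetical model (not C1 code): signed 64-bit "x < y" computed from 32-bit
// halves the way a sub/sbb pair would, overwriting the left operand's registers.
public class LongCompareOn32Bit {
    /** Returns x < y; x is an int[2] {lo, hi} that gets clobbered like a register pair. */
    static boolean lessThan(int[] x, int[] y) {
        // "sub xlo, ylo": the carry flag becomes the unsigned borrow out of the low word.
        boolean borrow = Integer.compareUnsigned(x[0], y[0]) < 0;
        x[0] = x[0] - y[0];                       // low register overwritten
        // "sbb xhi, yhi": the sign of the exact result is what the SF != OF
        // ("less") condition observes.
        long exact = (long) x[1] - (long) y[1] - (borrow ? 1 : 0);
        x[1] = (int) exact;                       // high register overwritten
        return exact < 0;
    }

    static int[] split(long v) { return new int[] { (int) v, (int) (v >>> 32) }; }

    static long join(int[] r) { return (r[0] & 0xFFFF_FFFFL) | ((long) r[1] << 32); }

    public static void main(String[] args) {
        long a = -5L, b = 1L << 40;
        int[] ra = split(a);
        boolean less = lessThan(ra, split(b));
        System.out.println(less + " " + (a < b)); // same answer
        System.out.println(join(ra) == a);        // false: left operand clobbered
    }
}
```

Treating opr1 as a temp, or saving and restoring it around the compare, are the two ways Igor suggests to keep the register allocator's view consistent with this clobbering.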


From erik.osterlund at oracle.com  Thu Sep 27 01:36:38 2018
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Wed, 26 Sep 2018 21:36:38 -0400
Subject: RFR: JDK-8211129: compiler/whitebox/ForceNMethodSweepTest.java
 fails after JDK-8132849
In-Reply-To: <e8e1e8a9-eb86-1e69-f033-eda67ccc8d2e@redhat.com>
References: <e2c48288-4559-4544-a69a-7e2a5f9bfe23@redhat.com>
 <e8e1e8a9-eb86-1e69-f033-eda67ccc8d2e@redhat.com>
Message-ID: <88708510-05E6-494F-937D-D9B91BB70E11@oracle.com>

Hi Roman,

Looks reasonable.

Thanks,
/Erik

> On 26 Sep 2018, at 13:26, Roman Kennke <rkennke at redhat.com> wrote:
> 
> Ping! This fixes two failing tests...
> 
> (also changed subject to remove [Testbug])
> 
> Thanks,
> Roman
> 
> 
>> On 26.09.18 at 10:30, Roman Kennke wrote:
>> Please review the following change:
>> 
>> Several tests fail because, after forcing an nmethod sweep via the WhiteBox API,
>> the sweeper doesn't actually kick in.
>> 
>> The reason is the changed heuristic in NMethodSweeper: before
>> JDK-8132849, we would scan stacks and mark nmethods at every safepoint,
>> during the safepoint cleanup phase. This would subsequently trigger a sweep
>> cycle via _should_sweep. If no stack-scanning is performed, the sweeper
>> would skip sweeping because the CompiledMethodIterator _current has not
>> been reset.
>> 
>> I propose to change the following:
>> 
>> - In the sweep loop, call into do_stack_scanning() whenever a sweep is forced
>> (via the WhiteBox API) or if should_sweep has been determined by other
>> heuristics (code cache change, time since last sweep, ...).
>> 
>> - In exchange, stop setting _should_sweep in do_stack_scanning().
>> 
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8211129
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/
>> 
>> Testing: Fixes previously failing:
>> compiler/whitebox/ForceNMethodSweepTest.java
>> jdk/jfr/event/compiler/TestCodeSweeperStats.java
>> 
>> Passes: hotspot/jtreg:tier1
>> 
> 
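A toy Java model of the proposed control flow, for illustration only. The real code is C++ in HotSpot's NMethodSweeper, and the class, fields and methods below are hypothetical stand-ins for the forced-sweep request, _should_sweep, do_stack_scanning() and the CompiledMethodIterator reset described in the mail.

```java
// Hypothetical model (not HotSpot code) of the sweep loop after the proposed change.
public class SweeperModel {
    boolean forceSweep;     // stands in for a sweep forced via the WhiteBox API
    boolean shouldSweep;    // stands in for _should_sweep set by other heuristics
    boolean iteratorReset;  // stands in for resetting CompiledMethodIterator _current
    int sweeps;             // number of sweeps actually performed

    // After the change, stack scanning resets the iterator but no longer sets shouldSweep.
    void doStackScanning() {
        iteratorReset = true;
    }

    // One pass of the sweep loop.
    void sweepLoopIteration() {
        if (forceSweep || shouldSweep) {
            doStackScanning();      // scan on demand instead of at every safepoint
        }
        if (iteratorReset) {        // without a reset, the sweeper skips sweeping
            sweeps++;
            iteratorReset = false;
            forceSweep = false;
            shouldSweep = false;
        }
    }
}
```

In this model a forced sweep always leads to a stack scan and therefore to an actual sweep, which is the behavior the failing WhiteBox tests expect.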

From tobias.hartmann at oracle.com  Thu Sep 27 09:05:50 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 27 Sep 2018 11:05:50 +0200
Subject: RFR: JDK-8211129: compiler/whitebox/ForceNMethodSweepTest.java
 fails after JDK-8132849
In-Reply-To: <e8e1e8a9-eb86-1e69-f033-eda67ccc8d2e@redhat.com>
References: <e2c48288-4559-4544-a69a-7e2a5f9bfe23@redhat.com>
 <e8e1e8a9-eb86-1e69-f033-eda67ccc8d2e@redhat.com>
Message-ID: <d37bc9da-3ff3-fd33-8713-ee4bce12112a@oracle.com>

Hi Roman,

this looks reasonable to me as well. Please verify that all failing tests now pass
(TestCodeSweeperStats.java, ForceNMethodSweepTest.java).

Thanks,
Tobias

On 26.09.2018 19:26, Roman Kennke wrote:
> Ping! This fixes two failing tests...
> 
> (also changed subject to remove [Testbug])
> 
> Thanks,
> Roman
> 
> 
> On 26.09.18 at 10:30, Roman Kennke wrote:
>> Please review the following change:
>>
>> Several tests fail because, after forcing an nmethod sweep via the WhiteBox API,
>> the sweeper doesn't actually kick in.
>>
>> The reason is the changed heuristic in NMethodSweeper: before
>> JDK-8132849, we would scan stacks and mark nmethods at every safepoint,
>> during the safepoint cleanup phase. This would subsequently trigger a sweep
>> cycle via _should_sweep. If no stack-scanning is performed, the sweeper
>> would skip sweeping because the CompiledMethodIterator _current has not
>> been reset.
>>
>> I propose to change the following:
>>
>> - In the sweep loop, call into do_stack_scanning() whenever a sweep is forced
>> (via the WhiteBox API) or if should_sweep has been determined by other
>> heuristics (code cache change, time since last sweep, ...).
>>
>> - In exchange, stop setting _should_sweep in do_stack_scanning().
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8211129
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/
>>
>> Testing: Fixes previously failing:
>> compiler/whitebox/ForceNMethodSweepTest.java
>> jdk/jfr/event/compiler/TestCodeSweeperStats.java
>>
>> Passes: hotspot/jtreg:tier1
>>
> 

From rkennke at redhat.com  Thu Sep 27 09:07:08 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 27 Sep 2018 11:07:08 +0200
Subject: RFR: JDK-8211129: compiler/whitebox/ForceNMethodSweepTest.java
 fails after JDK-8132849
In-Reply-To: <d37bc9da-3ff3-fd33-8713-ee4bce12112a@oracle.com>
References: <e2c48288-4559-4544-a69a-7e2a5f9bfe23@redhat.com>
 <e8e1e8a9-eb86-1e69-f033-eda67ccc8d2e@redhat.com>
 <d37bc9da-3ff3-fd33-8713-ee4bce12112a@oracle.com>
Message-ID: <a3fa2492-861e-d2dc-79dc-a7fb568b8103@redhat.com>

Hi Tobias and Erik,

the two tests pass now.

I'm submitting it to jdk/submit and will push if it comes back clean.

Thanks,
Roman

On 27.09.18 at 11:05, Tobias Hartmann wrote:
> Hi Roman,
> 
> this looks reasonable to me as well. Please verify that all failing tests now pass
> (TestCodeSweeperStats.java, ForceNMethodSweepTest.java).
> 
> Thanks,
> Tobias
> 
> On 26.09.2018 19:26, Roman Kennke wrote:
>> Ping! This fixes two failing tests...
>>
>> (also changed subject to remove [Testbug])
>>
>> Thanks,
>> Roman
>>
>>
>> On 26.09.18 at 10:30, Roman Kennke wrote:
>>> Please review the following change:
>>>
>>> Several tests fail because, after forcing an nmethod sweep via the WhiteBox API,
>>> the sweeper doesn't actually kick in.
>>>
>>> The reason is the changed heuristic in NMethodSweeper: before
>>> JDK-8132849, we would scan stacks and mark nmethods at every safepoint,
>>> during the safepoint cleanup phase. This would subsequently trigger a sweep
>>> cycle via _should_sweep. If no stack-scanning is performed, the sweeper
>>> would skip sweeping because the CompiledMethodIterator _current has not
>>> been reset.
>>>
>>> I propose to change the following:
>>>
>>> - In the sweep loop, call into do_stack_scanning() whenever a sweep is forced
>>> (via the WhiteBox API) or if should_sweep has been determined by other
>>> heuristics (code cache change, time since last sweep, ...).
>>>
>>> - In exchange, stop setting _should_sweep in do_stack_scanning().
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8211129
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/
>>>
>>> Testing: Fixes previously failing:
>>> compiler/whitebox/ForceNMethodSweepTest.java
>>> jdk/jfr/event/compiler/TestCodeSweeperStats.java
>>>
>>> Passes: hotspot/jtreg:tier1
>>>
>>


From adinn at redhat.com  Thu Sep 27 09:23:05 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 27 Sep 2018 10:23:05 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
Message-ID: <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>

On 26/09/18 17:00, Alan Bateman wrote:
> The reason that we've mentioned it a few times is because it's a
> significant issue. If you have a byte buffer then you can't have
> different threads accessing different parts of the buffer at the same
> time, at least not with any of the relative get/put methods as they
> depend on the buffer position. Sure you can globally synchronize all
> operations but you'll likely want much finer granularity. This bugbear
> comes up periodically, particularly when using buffers for cases that
> they weren't really designed for. Stuart pointed out the lack of
> absolute bulk get/put operations which is something that I think will
> help some of these cases.

Ok, I see that there is an issue here where only byte puts at absolute
positions can be performed concurrently (assuming threads know how to
avoid overlapping writes) while, by contrast, cursor-based byte[] stores
require synchronization. Is that the problem in full? Or is there still
more that I have missed?

I certainly agree that a retro-fit to ByteBuffer which provided for
byte[] puts at absolute positions would be of benefit for this proposal.
However, such a retro-fix would be equally useful for volatile memory
buffers. I am not clear why this omission suggests to you that we should
look at a new, alternative model for managing this particular type of
mapped memory rather than just fixing the current one properly for all
buffers.
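A minimal sketch of the distinction drawn above, assuming Java 8 or later. The absolute put(int, byte) never reads or writes the buffer's position, so threads writing disjoint index ranges do not contend on it, whereas the relative put(byte) advances one shared position. Note that the Buffer documentation says buffers are not safe for use by multiple concurrent threads without synchronization, so this relies on absolute accessors not mutating buffer state, which is an implementation property rather than a documented guarantee. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical example: disjoint absolute writes into one shared buffer.
public class AbsolutePuts {
    // Each thread fills its own region using the absolute put(int, byte);
    // the shared position is never touched.
    static ByteBuffer fill(int threads, int chunk) {
        ByteBuffer shared = ByteBuffer.allocateDirect(threads * chunk);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int base = id * chunk;
                for (int i = 0; i < chunk; i++) {
                    shared.put(base + i, (byte) id);   // absolute: no cursor update
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                              // makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        ByteBuffer bb = fill(4, 1024);
        // Position is still 0; byte 3 * 1024 was written by thread 3.
        System.out.println(bb.position() + " " + bb.get(3 * 1024));
    }
}
```

Doing the same with byte[] data is where the relative bulk put(byte[]) would force either synchronization or a per-thread view of the buffer.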

>> Also, can you explain what you mean by confinement? (thread
>> confinement?).
> Yes, thread vs. global. I haven't been following Panama closely enough to
> say how this is exposed in the API.

Well, my vague stab was obviously in the right ballpark but I'm afraid I
still don't know what baseball is. Could you explain what you mean by
confinement?

>> Also, I don't think I would label this API an attempt to develop a file
>> system. I think that's rather an overblown characterisation of what it
>> does.
> I think you may have misread my mail, as I was just picking another
> example where MBB would be problematic.

Apologies for my very evident confusion here. I'd be very grateful if
you could talk down a notch or two and/or amplify a bit more to help the
hard of thinking.

>> I'm still not quite sure where this reply leaves the JEP though. Shall I
>> update the Risks and Assumptions section to include mention of
>> JDK-5029431 as suggested to Stuart? Is there anything else I can do to
>> progress things?
>>
> It wouldn't do any harm to have this section mention that an alternative
> that exposes a more memory centric API may be possible in the future.
Ok, I'll certainly add that.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From peter.levart at gmail.com  Thu Sep 27 10:28:21 2018
From: peter.levart at gmail.com (Peter Levart)
Date: Thu, 27 Sep 2018 12:28:21 +0200
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
Message-ID: <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>

Hi Andrew,

On 09/27/2018 11:23 AM, Andrew Dinn wrote:
> On 26/09/18 17:00, Alan Bateman wrote:
>> The reason that we've mentioned it a few times is because it's a
>> significant issue. If you have a byte buffer then you can't have
>> different threads accessing different parts of the buffer at the same
>> time, at least not with any of the relative get/put methods as they
>> depend on the buffer position. Sure you can globally synchronize all
>> operations but you'll likely want much finer granularity. This bugbear
>> comes up periodically, particularly when using buffers for cases that
>> they weren't really designed for. Stuart pointed out the lack of
>> absolute bulk get/put operations which is something that I think will
>> help some of these cases.
> Ok, I see that there is an issue here where only byte puts at absolute
> positions can be performed concurrently (assuming threads know how to
> avoid overlapping writes) while, by contrast, cursor-based byte[] stores
> require synchronization. Is that the problem in full? Or is there still
> more that I have missed?
>
> I certainly agree that a retro-fit to ByteBuffer which provided for
> byte[] puts at absolute positions would be of benefit for this proposal.
> However, such a retro-fix would be equally useful for volatile memory
> buffers. I am not clear why this omission suggests to you that we should
> look at a new, alternative model for managing this particular type of
> mapped memory rather than just fixing the current one properly for all
> buffers.

May I just note that multithreaded bulk operations are kind of possible 
without external synchronization (i.e. locks) if you follow a simple 
protocol:

- never use relative operations on the shared ByteBuffer instance
- never use operations that change internal 
mark/position/limit/byteOrder on the shared ByteBuffer instance
- a concurrent bulk operation on 'bb' consists of:

ByteBuffer myBb = bb.slice(0, bb.capacity());
// use myBb to perform the concurrent bulk operation (any operations are
// allowed), then throw it away or cache it in a ThreadLocal

If you combine this with explicit fences and/or atomic 16-, 32- and 64-bit
operations via VarHandles (see
MethodHandles.byteBufferViewVarHandle(Class, ByteOrder)), concurrent
programming with ByteBuffer(s) is entirely possible.
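A minimal sketch of the protocol above, assuming JDK 9 or later for VarHandles. It uses duplicate(), available since Java 1.4, instead of slice(int, int), which ByteBuffer only gained in JDK 13; both give a view over the same content with its own position, limit and mark. The class name, the 8-byte header holding an int counter, and the chunk layout are hypothetical choices for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical example: per-thread views for bulk puts, plus an atomic counter.
public class SharedBufferProtocol {
    // View of the buffer as ints in native order; atomic updates need aligned indices.
    static final VarHandle INTS =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Hypothetical layout: bytes 0..7 hold an int counter, per-thread chunks follow.
    static final int HEADER = 8;

    static ByteBuffer run(int threads, int chunk) {
        ByteBuffer shared = ByteBuffer.allocateDirect(HEADER + threads * chunk);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Private view over the same memory with its own position/limit/mark,
                // so relative operations on it cannot race with other threads.
                ByteBuffer mine = shared.duplicate();
                mine.position(HEADER + id * chunk);
                byte[] data = new byte[chunk];
                Arrays.fill(data, (byte) (id + 1));
                mine.put(data);                                   // relative bulk put, private cursor
                int unused = (int) INTS.getAndAdd(shared, 0, 1); // atomic add at aligned index 0
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                                         // join also publishes the writes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        ByteBuffer bb = run(4, 16);
        System.out.println((int) INTS.get(bb, 0) + " " + bb.position() + " " + bb.get(HEADER + 16));
    }
}
```

The shared instance is only ever read through duplicate(), absolute get() and the VarHandle, so its own position never changes, matching the first two rules of the protocol.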

Regards, Peter

>
>>> Also, can you explain what you mean by confinement? (thread
>>> confinement?).
>> Yes, thread vs. global. I haven't been following Panama closely enough to
>> say how this is exposed in the API.
> Well, my vague stab was obviously in the right ballpark but I'm afraid I
> still don't know what baseball is. Could you explain what you mean by
> confinement?
>
>>> Also, I don't think I would label this API an attempt to develop a file
>>> system. I think that's rather an overblown characterisation of what it
>>> does.
>> I think you may have misread my mail, as I was just picking another
>> example where MBB would be problematic.
> Apologies for my very evident confusion here. I'd be very grateful if
> you could talk down a notch or two and/or amplify a bit more to help the
> hard of thinking.
>
>>> I'm still not quite sure where this reply leaves the JEP though. Shall I
>>> update the Risks and Assumptions section to include mention of
>>> JDK-5029431 as suggested to Stuart? Is there anything else I can do to
>>> progress things?
>>>
>> It wouldn't do any harm to have this section mention that an alternative
>> that exposes a more memory centric API may be possible in the future.
> Ok, I'll certainly add that.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander


From adinn at redhat.com  Thu Sep 27 10:41:11 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 27 Sep 2018 11:41:11 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
Message-ID: <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>

Hi Peter,

On 27/09/18 11:28, Peter Levart wrote:

> May I just note that multithreaded bulk operations are kind of possible
> without external synchronization (i.e. locks) if you follow a simple
> protocol:
> 
> - never use relative operations on the shared ByteBuffer instance
> - never use operations that change internal
> mark/position/limit/byteOrder on the shared ByteBuffer instance
> - a concurrent bulk operation on 'bb' consists of:
> 
> ByteBuffer myBb = bb.slice(0, bb.capacity());
> // use myBb to perform the concurrent bulk operation (any operations are
> // allowed), then throw it away or cache it in a ThreadLocal
> 
> If you combine this with explicit fences and/or atomic 16-, 32- and 64-bit
> operations via VarHandles (see
> MethodHandles.byteBufferViewVarHandle(Class, ByteOrder)), concurrent
> programming with ByteBuffer(s) is entirely possible.
Thank you for the usual expert advice. I am sure it will be of great
help in implementing a persistent data management library over this
JEP's base capability.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From rkennke at redhat.com  Thu Sep 27 12:03:51 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 27 Sep 2018 14:03:51 +0200
Subject: RFR: JDK-8211219: Type inconsistency in
 LIRGenerator::atomic_cmpxchg(..)
Message-ID: <bf1d4fcf-11d1-e107-864a-81bc8135943d@redhat.com>

We spotted this in Shenandoah land. It doesn't seem to be catastrophic, but
it would be nice to fix:

In c1_LIRGenerator_x86.cpp, towards the end of
LIRGenerator::atomic_cmpxchg(..) there's this cmove:

  __ cmove(lir_cond_equal, LIR_OprFact::intConst(1), LIR_OprFact::intConst(0),
           result, type);

which should use T_INT instead of the passed-in type, because the result
is always the int constant 1 or 0.


Bug:
https://bugs.openjdk.java.net/browse/JDK-8211219
Webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8211219/webrev.00/

Testing: hotspot/jtreg:tier1 ok

Ok?

Roman


From igor.veresov at oracle.com  Thu Sep 27 13:33:54 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 27 Sep 2018 06:33:54 -0700
Subject: RFR: JDK-8211219: Type inconsistency in
 LIRGenerator::atomic_cmpxchg(..)
In-Reply-To: <bf1d4fcf-11d1-e107-864a-81bc8135943d@redhat.com>
References: <bf1d4fcf-11d1-e107-864a-81bc8135943d@redhat.com>
Message-ID: <CE26BCE2-41CB-46F8-976A-0BE1E3496266@oracle.com>

Looks good.

igor


> On Sep 27, 2018, at 5:03 AM, Roman Kennke <rkennke at redhat.com> wrote:
> 
> We spotted this in Shenandoah land. It doesn't seem to be catastrophic, but
> it would be nice to fix:
> 
> In c1_LIRGenerator_x86.cpp, towards the end of
> LIRGenerator::atomic_cmpxchg(..) there's this cmove:
> 
>  __ cmove(lir_cond_equal, LIR_OprFact::intConst(1), LIR_OprFact::intConst(0),
>           result, type);
> 
> which should use T_INT instead of the passed-in type, because the result
> is always the int constant 1 or 0.
> 
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8211219
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8211219/webrev.00/
> 
> Testing: hotspot/jtreg:tier1 ok
> 
> Ok?
> 
> Roman
> 


From rwestrel at redhat.com  Thu Sep 27 13:58:51 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 27 Sep 2018 15:58:51 +0200
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
 <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <dk67ejaj7lk.fsf@rwestrel.remote.csb>
 <b3329405-1736-5235-b0a2-0962ae7789fa@oracle.com>
 <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
 <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>
Message-ID: <dk6a7o36nz8.fsf@rwestrel.remote.csb>


Hi Vladimir,

Thanks for the review and the testing.

> A LOT of tests failed.

I did some last-minute code refactoring after running tests and managed
to break something. Sorry about that.

The fix is:

diff --git a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
--- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
+++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
@@ -598,6 +598,7 @@
   ac->set_clonebasic();
   Node* n = kit->gvn().transform(ac);
   if (n == ac) {
+    ac->_adr_type = TypeRawPtr::BOTTOM;
     kit->set_predefined_output_for_runtime_call(ac, ac->in(TypeFunc::Memory), raw_adr_type);
   } else {
     kit->set_all_memory(n);


New webrev:

http://cr.openjdk.java.net/~roland/8210887/webrev.01/

Roland.

From rwestrel at redhat.com  Thu Sep 27 14:36:29 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 27 Sep 2018 16:36:29 +0200
Subject: RFR(S): 8211231: BarrierSetC1::generate_referent_check() confuses
 register allocator
Message-ID: <dk67ej76m8i.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/8211231/webrev.00/

With Shenandoah, we had a crash in compiled code because a value was
restored from a spill in a branch that's not always executed in
BarrierSetC1::generate_referent_check(). That method generates code with
control flow within a basic block. The register allocator is not aware
of that control flow. So if a value that was spilled before is needed in
a branch, the register allocator may decide to restore it and then
assume it's live in a register from there. The fix I propose is to
assign a temp register to that value and load it before any control
flow.

Details (intermediate representation and generated code) are here:

http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-September/007605.html

Roland.


From igor.veresov at oracle.com  Thu Sep 27 15:40:13 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 27 Sep 2018 08:40:13 -0700
Subject: RFR(S): 8211231: BarrierSetC1::generate_referent_check() confuses
 register allocator
In-Reply-To: <dk67ej76m8i.fsf@rwestrel.remote.csb>
References: <dk67ej76m8i.fsf@rwestrel.remote.csb>
Message-ID: <66CC2DB1-DF8B-4098-8C58-2C4A88DB82E4@oracle.com>

Looks good to me.

igor


> On Sep 27, 2018, at 7:36 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
> 
> 
> http://cr.openjdk.java.net/~roland/8211231/webrev.00/
> 
> With Shenandoah, we had a crash in compiled code because a value was
> restored from a spill in a branch that's not always executed in
> BarrierSetC1::generate_referent_check(). That method generates code with
> control flow within a basic block. The register allocator is not aware
> of that control flow. So if a value that was spilled before is needed in
> a branch, the register allocator may decide to restore it and then
> assume it's live in a register from there. The fix I propose is to
> assign a temp register to that value and load it before any control
> flow.
> 
> Details (intermediate representation and generated code) are here:
> 
> http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-September/007605.html
> 
> Roland.
> 


From rwestrel at redhat.com  Thu Sep 27 15:52:32 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 27 Sep 2018 17:52:32 +0200
Subject: RFR(S): 8211232: GraphKit::make_runtime_call() sometimes attaches
 wrong memory state to call
Message-ID: <dk64leb6ipr.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/8211232/webrev.00/

This came up in Shenandoah testing with -XX:+ExtendedDTraceProbes.

make_runtime_call() is called through make_dtrace_method_exit() from
Parse::return_current(). Memory state at this point is:

 137 Phi === 135 _ _ 91 [[ 74 141 145 150 152 162 166 168 179 182 187 193 202 211 216 225 228 237 242 258 266 274 282 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24 
 141 MergeMem === _ 1 137 1 1 279 1 275 282 [[ 142 ]] { - - N279:java/lang/Object+-8 * - N275:narrowoop: java/lang/Object *[int:>=0]+-8 * N282:narrowoop: java/lang/Object *[int:>=0]+any * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24 

The Phi is a loop Phi, so not all of its inputs are set yet. The following
code in GraphKit::make_runtime_call():

    assert(!wide_out, "narrow in => narrow out"); 
    Node* narrow_mem = memory(adr_type); 
    prev_mem = reset_memory(); 
    map()->set_memory(narrow_mem); 

sets the entire memory state to the Phi. Next, in
GraphKit::set_predefined_input_for_runtime_call():

  Node* memory = reset_memory(); 

causes the current memory state (the Phi) to be transformed, and GVN
transforms it to:

 91 Phi === 89 _ _ 73 [[ 137 100 103 105 113 116 118 126 129 131 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24 

i.e. the out-of-loop memory state, which is the wrong state.

Roland.

From rwestrel at redhat.com  Thu Sep 27 16:05:28 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 27 Sep 2018 18:05:28 +0200
Subject: RFR(S): 8211233: MemBarNode::trailing_membar() and
 MemBarNode::leading_membar() need to handle dying subgraphs better
Message-ID: <dk61s9f6i47.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/8211233/webrev.00/

I hit a bug where MemBarNode::leading_membar() doesn't return the right
result because a dying part of the graph between a trailing and a
leading membar is not properly handled.

Roland.

From vladimir.kozlov at oracle.com  Thu Sep 27 19:15:27 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 12:15:27 -0700
Subject: RFR(S): 8211231: BarrierSetC1::generate_referent_check() confuses
 register allocator
In-Reply-To: <dk67ej76m8i.fsf@rwestrel.remote.csb>
References: <dk67ej76m8i.fsf@rwestrel.remote.csb>
Message-ID: <2d430192-9a88-d3f3-40e1-e6d57f32e5f6@oracle.com>

Good.

thanks,
Vladimir

On 9/27/18 7:36 AM, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8211231/webrev.00/
> 
> With Shenandoah, we had a crash in compiled code because a value was
> restored from a spill in a branch that's not always executed in
> BarrierSetC1::generate_referent_check(). That method generates code with
> control flow within a basic block. The register allocator is not aware
> of that control flow. So if a value that was spilled before is needed in
> a branch, the register allocator may decide to restore it and then
> assume it's live in a register from there. The fix I propose is to
> assign a temp register to that value and load it before any control
> flow.
> 
> Details (intermediate representation and generated code) are here:
> 
> http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-September/007605.html
> 
> Roland.
> 

From rkennke at redhat.com  Thu Sep 27 20:07:19 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 27 Sep 2018 22:07:19 +0200
Subject: RFR: JDK-8211241: Missing obj equals in TemplateTable::fast_aldc
Message-ID: <bb4cde61-0498-dfeb-748f-574b41fe12d2@redhat.com>

TemplateTable::fast_aldc compares the just-loaded reference with
Universe::the_null_sentinel. If the reference really is the null
sentinel, the comparison may still yield a false negative (with GCs like
Shenandoah that allow both from-space and to-space copies of an object
to be around), and we then skip NULL-ing the ref. In other words, the
null sentinel could escape into the wild as an oop, which can cause
subtle and not-so-subtle bugs.
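
The false negative can be illustrated with a toy Java model. This is a sketch under stated assumptions, not HotSpot or Shenandoah code: the class ForwardingCompareDemo, its Obj type with a single forwardee field, and all helper names are hypothetical, and real GC barriers resolve oops very differently. It only shows why a plain pointer comparison can miss a match when two copies of the same object are reachable, while a comparison that first resolves both sides to the current copy does not.

```java
// Toy model (hypothetical, not HotSpot code) of a GC that can keep a from-space
// and a to-space copy of the same object reachable at the same time.
public class ForwardingCompareDemo {
    static final class Obj {
        // Points to the object itself until the GC evacuates it.
        Obj forwardee = this;
    }

    // Simulated evacuation: allocate a to-space copy and forward the old copy to it.
    static Obj evacuate(Obj fromSpace) {
        Obj toSpace = new Obj();
        fromSpace.forwardee = toSpace;
        return toSpace;
    }

    // Plain pointer comparison, like a raw cmp of two oops.
    static boolean rawEquals(Obj a, Obj b) {
        return a == b;
    }

    // GC-aware comparison: resolve both sides to their current copy first.
    // One level of forwarding suffices here because to-space copies self-forward.
    static boolean gcAwareEquals(Obj a, Obj b) {
        return a.forwardee == b.forwardee;
    }

    public static void main(String[] args) {
        Obj stale = new Obj();            // the from-space copy one path still holds
        Obj current = evacuate(stale);    // the to-space copy another path holds
        System.out.println(rawEquals(stale, current));     // false: the match is missed
        System.out.println(gcAwareEquals(stale, current)); // true
    }
}
```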

The fix is easy: call cmpoop(), which re-routes through the GC interface
for GCs that need it:

http://cr.openjdk.java.net/~rkennke/JDK-8211241/webrev.00/

Testing: hotspot/jtreg:tier1

Ok?

Roman

From daniel.daugherty at oracle.com  Thu Sep 27 20:34:05 2018
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 27 Sep 2018 16:34:05 -0400
Subject: RFR: JDK-8211241: Missing obj equals in TemplateTable::fast_aldc
In-Reply-To: <bb4cde61-0498-dfeb-748f-574b41fe12d2@redhat.com>
References: <bb4cde61-0498-dfeb-748f-574b41fe12d2@redhat.com>
Message-ID: <5492869d-c359-0a72-ebcd-fed24072ccc9@oracle.com>

On 9/27/18 4:07 PM, Roman Kennke wrote:
> TemplateTable::fast_aldc compares the just-loaded reference with
> Universe::the_null_sentinel. If it really is that null-sentinel, we may
> get a false negative (with GCs like Shenandoah that allow both
> from-space and to-space copies of an object to be around), and thus skip
> NULL-ing the ref. In other words, it would allow to get
> the-null-sentinel out into the wild as oop which can cause subtle and
> not-so-subtle bugs.
>
> Fix is easy, call cmpoop() which re-routes through GC-interface for GCs
> that need it:
>
> http://cr.openjdk.java.net/~rkennke/JDK-8211241/webrev.00/

src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
    No comments.

src/hotspot/cpu/x86/templateTable_x86.cpp
    No comments.

Thumbs up (on the change)!

> Testing: hotspot/jtreg:tier1

Did you use jdk_submit or local testing? I don't expect build
problems, but the templateTable_x86.cpp change will affect all X86/X64
platforms, right?

Dan

>
> Ok?
>
> Roman
>


From rkennke at redhat.com  Thu Sep 27 20:37:42 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 27 Sep 2018 22:37:42 +0200
Subject: RFR: JDK-8211241: Missing obj equals in TemplateTable::fast_aldc
In-Reply-To: <5492869d-c359-0a72-ebcd-fed24072ccc9@oracle.com>
References: <bb4cde61-0498-dfeb-748f-574b41fe12d2@redhat.com>
 <5492869d-c359-0a72-ebcd-fed24072ccc9@oracle.com>
Message-ID: <4b57bab3-723a-eedb-5654-5ce893c8573f@redhat.com>


> On 9/27/18 4:07 PM, Roman Kennke wrote:
>> TemplateTable::fast_aldc compares the just-loaded reference with
>> Universe::the_null_sentinel. If it really is that null-sentinel, we may
>> get a false negative (with GCs like Shenandoah that allow both
>> from-space and to-space copies of an object to be around), and thus skip
>> NULL-ing the ref. In other words, it would allow to get
>> the-null-sentinel out into the wild as oop which can cause subtle and
>> not-so-subtle bugs.
>>
>> Fix is easy, call cmpoop() which re-routes through GC-interface for GCs
>> that need it:
>>
>> http://cr.openjdk.java.net/~rkennke/JDK-8211241/webrev.00/
> 
> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
>     No comments.
> 
> src/hotspot/cpu/x86/templateTable_x86.cpp
>     No comments.
> 
> Thumbs up (on the change)!
> 
>> Testing: hotspot/jtreg:tier1
> 
> Did you use jdk_submit or local testing? I don't expect build
> problems but the templateTable_x86.cpp will affect all X86/X64
> platforms right?

I tested locally on x86_64 and aarch64. I always push my stuff through
jdk/submit before pushing to jdk/jdk, usually after or during reviews.

Thanks for reviewing!
Roman

From daniel.daugherty at oracle.com  Thu Sep 27 20:39:01 2018
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 27 Sep 2018 16:39:01 -0400
Subject: RFR: JDK-8211241: Missing obj equals in TemplateTable::fast_aldc
In-Reply-To: <4b57bab3-723a-eedb-5654-5ce893c8573f@redhat.com>
References: <bb4cde61-0498-dfeb-748f-574b41fe12d2@redhat.com>
 <5492869d-c359-0a72-ebcd-fed24072ccc9@oracle.com>
 <4b57bab3-723a-eedb-5654-5ce893c8573f@redhat.com>
Message-ID: <ccbe2100-e4ee-33e6-4a25-f3d98fce6365@oracle.com>

On 9/27/18 4:37 PM, Roman Kennke wrote:
>> On 9/27/18 4:07 PM, Roman Kennke wrote:
>>> TemplateTable::fast_aldc compares the just-loaded reference with
>>> Universe::the_null_sentinel. If it really is that null-sentinel, we may
>>> get a false negative (with GCs like Shenandoah that allow both
>>> from-space and to-space copies of an object to be around), and thus skip
>>> NULL-ing the ref. In other words, it would allow to get
>>> the-null-sentinel out into the wild as oop which can cause subtle and
>>> not-so-subtle bugs.
>>>
>>> Fix is easy, call cmpoop() which re-routes through GC-interface for GCs
>>> that need it:
>>>
>>> http://cr.openjdk.java.net/~rkennke/JDK-8211241/webrev.00/
>> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
>>     No comments.
>>
>> src/hotspot/cpu/x86/templateTable_x86.cpp
>>     No comments.
>>
>> Thumbs up (on the change)!
>>
>>> Testing: hotspot/jtreg:tier1
>> Did you use jdk_submit or local testing? I don't expect build
>> problems but the templateTable_x86.cpp will affect all X86/X64
>> platforms right?
> I tested locally on x86_64 and aarch64. I always push my stuff through
> jdk/submit before pushing to jdk/jdk, usually after or during reviews.

Thanks for confirming the testing.

Dan


>
> Thanks for reviewing!
> Roman
>


From vladimir.kozlov at oracle.com  Thu Sep 27 21:13:30 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 14:13:30 -0700
Subject: RFR(S): 8211232: GraphKit::make_runtime_call() sometimes attaches
 wrong memory state to call
In-Reply-To: <dk64leb6ipr.fsf@rwestrel.remote.csb>
References: <dk64leb6ipr.fsf@rwestrel.remote.csb>
Message-ID: <dbe74bbe-dac1-7f3d-8e37-18474f79dda6@oracle.com>

Hi Roland,

I understand that you want to avoid the second reset_memory(), and I agree.
But I am concerned about your code for setting the input memory for the call. Why not pass narrow_mem
from memory(adr_type) to set_predefined_input_for_runtime_call() in this case, and NULL in the others,
and check for NULL to select which memory to set? memory(adr_type) will check for merge_mem.

Thanks,
Vladimir

On 9/27/18 8:52 AM, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8211232/webrev.00/
> 
> This came up in Shenandoah testing with -XX:+ExtendedDTraceProbes.
> 
> make_runtime_call() is called through make_dtrace_method_exit() from
> Parse::return_current(). Memory state at this point is:
> 
>   137 Phi === 135 _ _ 91 [[ 74 141 145 150 152 162 166 168 179 182 187 193 202 211 216 225 228 237 242 258 266 274 282 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24
>   141 MergeMem === _ 1 137 1 1 279 1 275 282 [[ 142 ]] { - - N279:java/lang/Object+-8 * - N275:narrowoop: java/lang/Object *[int:>=0]+-8 * N282:narrowoop: java/lang/Object *[int:>=0]+any * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24
> 
> The Phi is a loop phi so not all its inputs are set yet. The following
> code in GraphKit::make_runtime_call():
> 
>      assert(!wide_out, "narrow in => narrow out");
>      Node* narrow_mem = memory(adr_type);
>      prev_mem = reset_memory();
>      map()->set_memory(narrow_mem);
> 
> sets the entire memory state to the Phi. Next, in
> GraphKit::set_predefined_input_for_runtime_call():
> 
>    Node* memory = reset_memory();
> 
> causes the current memory state (the Phi) to be transformed, and GVN
> transforms it to:
> 
>   91 Phi === 89 _ _ 73 [[ 137 100 103 105 113 116 118 126 129 131 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24
> 
> which is the out-of-loop memory state, and so the wrong state.
> 
> Roland.
> 

From vladimir.kozlov at oracle.com  Thu Sep 27 21:22:31 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 14:22:31 -0700
Subject: RFR(S): 8210389: C2: assert(n->outcnt() != 0 || C->top() == n ||
 n->is_Proj()) failed: No dead instructions after post-alloc
In-Reply-To: <dk68t3y8u59.fsf@rwestrel.remote.csb>
References: <dk68t3y8u59.fsf@rwestrel.remote.csb>
Message-ID: <99bc7d08-614c-9a45-0fdd-2dac729122d9@oracle.com>

Why are you not using subsume_by()?

Thanks,
Vladimir

On 9/18/18 12:47 PM, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8210389/webrev.00/
> 
> With volatile loads, the trailing membar has an edge to the load. After
> optimizations, that edge can point to a chain of Phis and the membar can
> be the one use that keeps the phis alive. After matching, that required
> edge is converted to a precedence edge. Liveness analysis ignores
> precedence edges, the chain of phis is killed and register allocation
> finds a node with no use.
> 
> As a fix, I propose that, at the end of optimizations, the edge between
> the volatile load's membar and the phis be removed and all dead phis be
> killed. As I understand, that edge is not required for correctness
> because anti dependencies detection code adds a precedence edge between
> a volatile load and its membar if needed. I ran full jcstress on x86 and
> aarch64 with this patch successfully.
> 
> Roland.
> 

From vladimir.kozlov at oracle.com  Thu Sep 27 21:25:19 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 14:25:19 -0700
Subject: RFR(S): 8211233: MemBarNode::trailing_membar() and
 MemBarNode::leading_membar() need to handle dying subgraphs better
In-Reply-To: <dk61s9f6i47.fsf@rwestrel.remote.csb>
References: <dk61s9f6i47.fsf@rwestrel.remote.csb>
Message-ID: <b184ae36-34d4-ccdd-e468-a0b47f1f6c7a@oracle.com>

Looks good.

Thanks,
Vladimir

On 9/27/18 9:05 AM, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8211233/webrev.00/
> 
> I hit a bug where MemBarNode::leading_membar() doesn't return the right
> result because a dying part of the graph between a trailing and a
> leading membar is not properly handled.
> 
> Roland.
> 

From vladimir.kozlov at oracle.com  Thu Sep 27 21:28:28 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 14:28:28 -0700
Subject: RFR(S): JDK-8191339: [JVMCI] BigInteger compiler intrinsics on
 Graal.
In-Reply-To: <661f70d5-7a09-d181-5669-9841b590c7a3@oracle.com>
References: <28011331-bd43-2c32-dba4-e41879ffe28a@oracle.com>
 <02f34a26-2a97-6a30-384f-115327781aac@oracle.com>
 <661f70d5-7a09-d181-5669-9841b590c7a3@oracle.com>
Message-ID: <343cef9d-15f2-a01d-d3e3-0d4e2bc4cb06@oracle.com>

Good.

Thanks,
Vladimir

On 9/20/18 2:53 AM, Patric Hedlin wrote:
> Hi Vladimir, Andrew,
> 
> Sorry for dropping this after vacation. The testing is a simplistic benchmark (soon to be added to
> Graal, I hope) plus some directed, somewhat too ad hoc testing not meant for upstreaming to Graal. I
> also used a simplified version of a more general JVMCI/VM test case for these options only, but it
> really only exercises JVMCI (not the option propagation in Graal or some other JVMCI "client"),
> making it less useful.
> 
> But in essence, Graal is the test-case.
> 
> 
> On 2018-06-22 18:04, Vladimir Kozlov wrote:
>> Hi Patric,
>>
>> Do you need Graal changes for this? Or it already has these intrinsics and the only problem is 
>> these flags were not set in vm_version_x86.cpp?
> 
> No further changes have been made to Graal.
> 
>>
>> Small note: in vm_version_x86.cpp the previous code already has a COMPILER2_OR_JVMCI check. You can
>> remove the previous #endif and the new #ifdef. Also change the comment for the closing #endif at
>> line 1080 to // COMPILER2_OR_JVMCI
>>
>> 1080 #endif // COMPILER2
> 
> You are right (actually the intended webrev) and it should look correct now (just a tad old).
> 
> Best regards,
> Patric
>>
>> What testing you did?
>>
>> Thanks,
>> Vladimir
>>
>> On 6/21/18 8:26 AM, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8191339
>>>
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8191339/
>>>
>>>
>>> 8191339: [JVMCI] BigInteger compiler intrinsics on Graal.
>>>
>>>     Enabling BigInteger intrinsics via JVMCI.
>>>
>>>
>>>
>>> Best regards,
>>> Patric
> 

From sandhya.viswanathan at intel.com  Thu Sep 27 21:37:15 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Thu, 27 Sep 2018 21:37:15 +0000
Subject: RFR(S):8211251:Default mask register for avx512 instructions
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>


Please find below a patch that cleans up k1 mask register handling for AVX-512 instructions.
Currently, unmasked instructions are encoded using the k1 register, which requires k1 to be initialized properly and also re-initialized across JNI and runtime calls.
This patch encodes AVX-512 instructions as unmasked instructions, using the k0 encoding, wherever no explicit mask register is specified.
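
A toy Java simulation may help show why relying on k1 for "unmasked" instructions is fragile. It models a 16-lane 32-bit vector add two ways: with the k0 encoding, which in EVEX means no masking, and with an opmask register under merge-masking, where lanes whose mask bit is clear keep their previous destination value. The class and method names are hypothetical and this is not the HotSpot assembler or the patch itself; it assumes merge-masking (with zeroing-masking the disabled lanes would be zeroed instead, which is equally wrong).

```java
import java.util.Arrays;

// Toy simulation (hypothetical): not the HotSpot assembler.
public class MaskedAddDemo {
    static final int LANES = 16;   // 512-bit vector of 32-bit ints

    // Models an add encoded with the k0 mask field: no masking, every lane written.
    static int[] addUnmasked(int[] dst, int[] a, int[] b) {
        int[] out = Arrays.copyOf(dst, LANES);
        for (int i = 0; i < LANES; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // Models an add using an opmask register (e.g. k1) with merge-masking:
    // lanes whose mask bit is 0 keep the previous destination value.
    static int[] addMerge(int[] dst, int[] a, int[] b, int kmask) {
        int[] out = Arrays.copyOf(dst, LANES);
        for (int i = 0; i < LANES; i++) {
            if (((kmask >>> i) & 1) != 0) {
                out[i] = a[i] + b[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = new int[LANES], b = new int[LANES], dst = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            a[i] = i;
            b[i] = 100;
        }
        // k1 holding all ones: the "unmasked via k1" encoding behaves correctly.
        System.out.println(Arrays.equals(addMerge(dst, a, b, 0xFFFF), addUnmasked(dst, a, b)));
        // k1 not re-initialized after a call that changed it: upper lanes are silently skipped.
        System.out.println(Arrays.toString(addMerge(dst, a, b, 0x00FF)));
    }
}
```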

RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/

Best Regards,
Sandhya


From vladimir.kozlov at oracle.com  Thu Sep 27 22:24:03 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 15:24:03 -0700
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
Message-ID: <badfeb56-3333-4eac-6d6c-6b452ac7bd2a@oracle.com>

Looks good, except for the PostLoopMultiversioning-guarded changes in macroAssembler_x86.cpp, which
should be explained too.

Thanks,
Vladimir

On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote:
> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions.
> 
> Currently unmasked instructions are encoded using k1 register which requires k1 register to be 
> initialized properly and also reinitialized across JNI and Runtime calls.
> 
> This patch encodes AVX 512 instructions as unmasked instruction with K0 encoding where the explicit 
> mask register is not specified.
> 
> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
> 
> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/ 
> 
> Best Regards,
> 
> Sandhya
> 

From sandhya.viswanathan at intel.com  Thu Sep 27 23:40:19 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Thu, 27 Sep 2018 23:40:19 +0000
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <badfeb56-3333-4eac-6d6c-6b452ac7bd2a@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
 <badfeb56-3333-4eac-6d6c-6b452ac7bd2a@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

As you know, PostLoopMultiversioning needs to be fully redesigned and is currently disabled.
That particular code should only be exercised when PostLoopMultiversioning is on.
I could replace it with an assert if that looks OK to you.

Please let me know.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, September 27, 2018 3:24 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions

Looks good except PostLoopMultiversioning flag guarded changes in macroAssembler_x86.cpp which 
should be explained too.

Thanks,
Vladimir

On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote:
> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions.
> 
> Currently unmasked instructions are encoded using k1 register which requires k1 register to be 
> initialized properly and also reinitialized across JNI and Runtime calls.
> 
> This patch encodes AVX 512 instructions as unmasked instruction with K0 encoding where the explicit 
> mask register is not specified.
> 
> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
> 
> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/ 
> 
> Best Regards,
> 
> Sandhya
> 

From vladimir.kozlov at oracle.com  Thu Sep 27 23:49:45 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 16:49:45 -0700
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
 <badfeb56-3333-4eac-6d6c-6b452ac7bd2a@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>
Message-ID: <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>

Use guarantee() instead of assert so that if someone tries to use it with a product JDK, it will fail.

Thanks,
Vladimir

On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> As you know the PostLoopMultiversioning needs to be fully redesigned and is currently disabled.
> That particular code should only be exercised when PostLoopMultiversioning is on.
> I could change it with an assert statement if that looks ok to you.
> 
> Please let me know.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 27, 2018 3:24 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions
> 
> Looks good except PostLoopMultiversioning flag guarded changes in macroAssembler_x86.cpp which
> should be explained too.
> 
> Thanks,
> Vladimir
> 
> On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote:
>> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions.
>>
>> Currently unmasked instructions are encoded using k1 register which requires k1 register to be
>> initialized properly and also reinitialized across JNI and Runtime calls.
>>
>> This patch encodes AVX 512 instructions as unmasked instruction with K0 encoding where the explicit
>> mask register is not specified.
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
>>
>> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/
>>
>> Best Regards,
>>
>> Sandhya
>>

From sandhya.viswanathan at intel.com  Fri Sep 28 00:46:48 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 28 Sep 2018 00:46:48 +0000
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
 <badfeb56-3333-4eac-6d6c-6b452ac7bd2a@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>
 <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Please find the updated webrev with this change at:
http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.01/

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, September 27, 2018 4:50 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions

Use guarantee() instead of assert so that if someone tries to use it with a product JDK, it will fail.

Thanks,
Vladimir

On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> As you know the PostLoopMultiversioning needs to be fully redesigned and is currently disabled.
> That particular code should only be exercised when PostLoopMultiversioning is on.
> I could change it with an assert statement if that looks ok to you.
> 
> Please let me know.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 27, 2018 3:24 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot 
> compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S):8211251:Default mask register for avx512 
> instructions
> 
> Looks good except PostLoopMultiversioning flag guarded changes in 
> macroAssembler_x86.cpp which should be explained too.
> 
> Thanks,
> Vladimir
> 
> On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote:
>> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions.
>>
>> Currently unmasked instructions are encoded using k1 register which 
>> requires k1 register to be initialized properly and also reinitialized across JNI and Runtime calls.
>>
>> This patch encodes AVX 512 instructions as unmasked instruction with 
>> K0 encoding where the explicit mask register is not specified.
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
>>
>> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/
>>
>> Best Regards,
>>
>> Sandhya
>>

From erik.osterlund at oracle.com  Fri Sep 28 01:21:26 2018
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Fri, 28 Sep 2018 03:21:26 +0200
Subject: RFR: JDK-8211219: Type inconsistency in
 LIRGenerator::atomic_cmpxchg(..)
In-Reply-To: <bf1d4fcf-11d1-e107-864a-81bc8135943d@redhat.com>
References: <bf1d4fcf-11d1-e107-864a-81bc8135943d@redhat.com>
Message-ID: <a3acbd19-7152-6635-db67-676b8454fdd2@oracle.com>

Hi Roman,

Looks good.

Thanks,
/Erik

On 2018-09-27 14:03, Roman Kennke wrote:
> We spotted this in Shenandoah land. Doesn't seem to be catastrophic, but
> would be nice to fix:
>
> In c1_LIRGenerator_x86.cpp, towards the end of
> LIRGenerator::atomic_cmpxchg(..) there's this cmove:
>
>    __ cmove(lir_cond_equal, LIR_OprFact::intConst(1), LIR_OprFact::intConst(0),
>             result, type);
>
> which should use T_INT instead of the passed-in type.
>
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8211219
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8211219/webrev.00/
>
> Testing: hotspot/jtreg:tier1 ok
>
> Ok?
>
> Roman
>


From vladimir.kozlov at oracle.com  Fri Sep 28 01:37:48 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 18:37:48 -0700
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
 <badfeb56-3333-4eac-6d6c-6b452ac7bd2a@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>
 <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com>
Message-ID: <a6c24555-0271-10a0-5b45-2d566fc50e8a@oracle.com>

This looks fine. I assume you did the testing. It only affects AVX-512 machines, right?

Thanks,
Vladimir

On 9/27/18 5:46 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Please find the updated webrev with this change at:
> http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.01/
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 27, 2018 4:50 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions
> 
> Use guarantee() instead of assert so that if someone tries to use it with a product JDK, it will fail.
> 
> Thanks,
> Vladimir
> 
> On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> As you know the PostLoopMultiversioning needs to be fully redesigned and is currently disabled.
>> That particular code should only be exercised when PostLoopMultiversioning is on.
>> I could change it with an assert statement if that looks ok to you.
>>
>> Please let me know.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, September 27, 2018 3:24 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot
>> compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR(S):8211251:Default mask register for avx512
>> instructions
>>
>> Looks good except PostLoopMultiversioning flag guarded changes in
>> macroAssembler_x86.cpp which should be explained too.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote:
>>> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions.
>>>
>>> Currently unmasked instructions are encoded using k1 register which
>>> requires k1 register to be initialized properly and also reinitialized across JNI and Runtime calls.
>>>
>>> This patch encodes AVX 512 instructions as unmasked instruction with
>>> K0 encoding where the explicit mask register is not specified.
>>>
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
>>>
>>> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>

From vladimir.kozlov at oracle.com  Fri Sep 28 01:50:34 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 18:50:34 -0700
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <dk6a7o36nz8.fsf@rwestrel.remote.csb>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
 <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <dk67ejaj7lk.fsf@rwestrel.remote.csb>
 <b3329405-1736-5235-b0a2-0962ae7789fa@oracle.com>
 <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
 <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>
 <dk6a7o36nz8.fsf@rwestrel.remote.csb>
Message-ID: <39b8c3c2-d412-06fa-9162-1d80f067ed11@oracle.com>

The gc/epsilon/TestManyThreads.java test failed on SPARC.
I added information and the replay file to the bug report.

Vladimir

On 9/27/18 6:58 AM, Roland Westrelin wrote:
> 
> Hi Vladimir,
> 
> Thanks for the review and the testing.
> 
>> A LOT of tests failed.
> 
> I did some last minute code refactoring after running tests and managed
> to break something. Sorry about that.
> 
> The fix is:
> 
> diff --git a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> @@ -598,6 +598,7 @@
>     ac->set_clonebasic();
>     Node* n = kit->gvn().transform(ac);
>     if (n == ac) {
> +    ac->_adr_type = TypeRawPtr::BOTTOM;
>       kit->set_predefined_output_for_runtime_call(ac, ac->in(TypeFunc::Memory), raw_adr_type);
>     } else {
>       kit->set_all_memory(n);
> 
> 
> New webrev:
> 
> http://cr.openjdk.java.net/~roland/8210887/webrev.01/
> 
> Roland.
> 

From stuart.marks at oracle.com  Fri Sep 28 05:51:45 2018
From: stuart.marks at oracle.com (Stuart Marks)
Date: Thu, 27 Sep 2018 22:51:45 -0700
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
Message-ID: <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>

Hi Andrew,

Let me first say that this issue of "ByteBuffer might not be the right answer"
is something of a digression from the JEP discussion. I think the JEP should 
proceed forward using MBB with the API that you and Alan had discussed 
previously. At most, the discussion of the "right thing" issue might affect a 
side note in the JEP text about possible limitations and future directions of 
this effort. However, it's not a blocker to the JEP making progress as far as 
I'm concerned.

With that in mind, I'll discuss the issue of multithreaded access to ByteBuffers 
and how this bears on whether buffers are or aren't the "right answer." There 
are actually several issues that figure into the "right answer" analysis. In 
this message, though, I'll just focus on the issue of multithreaded access.

To recap (possibly for the benefit of other readers), the Buffer class doc has
the following statement:

     Buffers are not safe for use by multiple concurrent threads. If a buffer
     is to be used by more than one thread then access to the buffer should be
     controlled by appropriate synchronization.

Buffers are primarily designed for sequential operations such as I/O or codeset 
conversion. Typical buffer operations set the mark, position, and limit before 
initiating the operation. If the operation completes partially -- not uncommon 
with I/O or codeset conversion -- the position is updated so that the operation 
can be resumed easily from where it left off.

The fact that buffers not only contain the data being operated upon but also 
mutable state information such as mark/position/limit makes it difficult to have 
multiple threads operate on different parts of the same buffer. Each thread 
would have to lock around setting the position and limit and performing the 
operation, preventing any parallelism. The typical way to deal with this is to 
create multiple buffer slices, one per thread. Each slice has its own 
mark/position/limit values but shares the same backing data.
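For readers less familiar with the slicing workaround, here is a minimal Java sketch (SlicePerThread and its methods are illustrative names, not an existing API). It uses duplicate() and slice() so that each worker gets its own mark/position/limit over the same backing data, while the original buffer's position is never touched.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Sketch: give each worker its own slice over one shared backing buffer. */
final class SlicePerThread {

    /** Splits buf into n equal slices (any remainder is ignored). */
    static List<ByteBuffer> sliceEvenly(ByteBuffer buf, int n) {
        int chunk = buf.capacity() / n;
        List<ByteBuffer> slices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ByteBuffer dup = buf.duplicate(); // independent cursors, shared contents
            dup.clear();                      // position 0, limit capacity (dup only)
            dup.position(i * chunk);
            dup.limit(i * chunk + chunk);
            slices.add(dup.slice());          // slice index 0 == buf index i * chunk
        }
        return slices;
    }

    /** Each thread fills its own slice with relative puts; no locking is needed. */
    static void fillInParallel(ByteBuffer buf, int nThreads) {
        List<ByteBuffer> slices = sliceEvenly(buf, nThreads);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            ByteBuffer slice = slices.get(i);
            byte value = (byte) (i + 1);
            Thread t = new Thread(() -> {
                while (slice.hasRemaining()) {
                    slice.put(value);         // advances only this slice's position
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();                     // join() makes the workers' writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        fillInParallel(buf, 4);
        // prints [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
        System.out.println(java.util.Arrays.toString(buf.array()));
    }
}
```

The cost is one extra buffer object per worker, plus the bookkeeping of handing the right slice to the right thread.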

We can avoid the need for this by adding absolute bulk operations, right?

Let's suppose we were to add something like this (considering ByteBuffer only, 
setting the buffer views aside):

     get(int srcOff, byte[] dst, int dstOff, int length)
     put(int dstOff, byte[] src, int srcOff, int length)

Each thread can perform its operations on a different part of the buffer, in 
parallel, without interference from the others. Presumably these operations 
don't read or write the mark and position. Oh, wait. The existing absolute put 
and get overloads *do* respect the buffer's limit, so the absolute bulk 
operations ought to as well. This means they do depend on shared state. (I guess 
we could make the absolute bulk ops not respect the limit, but that seems 
inconsistent.)
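For experimentation, the proposed operations can be emulated with existing calls. In this sketch absGet and absPut are hypothetical helpers, not ByteBuffer methods. Each one works on a private duplicate(), so the caller's mark and position are untouched, but both still check against the limit, which is exactly the shared state described above.

```java
import java.nio.ByteBuffer;

/**
 * Emulates the proposed absolute bulk get/put. absGet/absPut are hypothetical
 * helpers, not ByteBuffer API. They leave buf's mark and position alone but,
 * like the single-byte absolute get/put, they honor buf's limit.
 */
final class AbsoluteBulk {

    /** Copies length bytes from buf[srcOff...] into dst[dstOff...]. */
    static void absGet(ByteBuffer buf, int srcOff, byte[] dst, int dstOff, int length) {
        ByteBuffer view = buf.duplicate();        // snapshot of cursors, shared contents
        checkRange(view.limit(), srcOff, length); // reads the shared limit
        view.position(srcOff);
        view.get(dst, dstOff, length);
    }

    /** Copies length bytes from src[srcOff...] into buf[dstOff...]. */
    static void absPut(ByteBuffer buf, int dstOff, byte[] src, int srcOff, int length) {
        ByteBuffer view = buf.duplicate();
        checkRange(view.limit(), dstOff, length);
        view.position(dstOff);
        view.put(src, srcOff, length);
    }

    /** Rejects any access that would reach the limit, as get(int)/put(int, byte) do. */
    private static void checkRange(int limit, int index, int length) {
        if (index < 0 || length < 0 || index > limit - length) {
            throw new IndexOutOfBoundsException(
                    "index=" + index + ", length=" + length + ", limit=" + limit);
        }
    }
}
```

Dropping checkRange would remove the dependency on the limit, at the price of the inconsistency mentioned above.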

OK, let's adopt an approach similar to what was described by Peter Levart a 
couple messages upthread, where a) there is an initialization step where various 
things including the limit are set properly; b) the buffer is published to the 
worker threads properly, e.g., using a lock or other suitable memory operation; 
and c) all worker threads agree only to use absolute operations and to avoid 
relative operations.

Now suppose the threads have completed their work and you want to, say, write 
the buffer's contents to a channel. You have to carefully make sure the threads 
are all finished and properly publish their results back to some central thread, 
have that central thread receive the results, set the position and limit, after 
which the central thread can initiate the I/O operation.

This can certainly be made to work.
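A minimal sketch of that phased protocol, assuming a heap ByteBuffer, an ExecutorService for publication and rendezvous, and a ByteArrayOutputStream wrapped as a channel standing in for real I/O. PhasedBuffer and fillThenWrite are illustrative names. It also relies on the common assumption that concurrent absolute puts to disjoint indices don't interfere, which the Buffer spec itself doesn't promise.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PhasedBuffer {

    /** Fills a size-byte buffer using nWorkers threads, then writes it to a channel. */
    static byte[] fillThenWrite(int size, int nWorkers)
            throws IOException, InterruptedException, ExecutionException {
        // Phase a: single-threaded setup; the limit is fixed before anyone else sees buf.
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.limit(size);

        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        List<Future<?>> done = new ArrayList<>();
        int chunk = size / nWorkers;
        try {
            for (int w = 0; w < nWorkers; w++) {
                int start = w * chunk;
                int end = (w == nWorkers - 1) ? size : start + chunk;
                // Phase b: submit() publishes buf safely to the worker.
                // Phase c: workers agree to use absolute put(int, byte) only.
                done.add(pool.submit(() -> {
                    for (int i = start; i < end; i++) {
                        buf.put(i, (byte) i);
                    }
                }));
            }
            // Rendezvous: Future.get() waits for each worker and makes its writes visible.
            for (Future<?> f : done) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }

        // Single-threaded again: only now is it safe to set position/limit for I/O.
        buf.position(0);
        buf.limit(size);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel ch = Channels.newChannel(sink)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
        return sink.toByteArray();
    }
}
```

Nothing in the code enforces phase c: a single relative put() inside a worker would compile and run without complaint.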

But note what we just did. We now have an API where:

  - there are different "phases", where in one phase all the methods work, but 
in another phase only certain methods work (otherwise it breaks silently);

  - you have to carefully control all the code to ensure that the wrong methods 
aren't called when the buffer is in the wrong phase (otherwise it breaks 
silently); and

  - you can't hand off the buffer to a library (3rd party or JDK) without 
carefully orchestrating a transition into the right phase (otherwise it breaks 
silently).
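The third point is easy to demonstrate even without threads. In this sketch, peekHeader is a hypothetical library routine that happens to use a relative get(). The caller's later write would quietly lose a byte, with no exception anywhere.

```java
import java.nio.ByteBuffer;

/** Shows how a relative call inside a "library" silently shifts the caller's view. */
final class SilentBreakage {

    /** Hypothetical library routine: peeks at a header byte with a relative get(). */
    static byte peekHeader(ByteBuffer buf) {
        return buf.get();               // relative: advances the caller's position
    }

    /** Returns how many bytes a subsequent channel write would actually see. */
    static int bytesLeftAfterLibraryCall() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        for (int i = 0; i < 8; i++) {
            buf.put(i, (byte) i);       // absolute puts: position stays at 0
        }
        peekHeader(buf);                // caller assumed this had no side effects
        return buf.remaining();         // 7, not 8: byte 0 would be skipped, no exception
    }
}
```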

Frankly, this is pretty crappy. It's certainly possible to work around it. 
People do, and it is painful, and they complain about it up and down all day 
long (and rightfully so).

Note that this discussion is based primarily on looking at the ByteBuffer API. I 
have not done extensive investigation of the impact of the various buffer views 
(IntBuffer, LongBuffer, etc.), nor have I looked thoroughly at the 
implementations. I have no doubt that we will run into additional issues when we 
do those investigations.

If we were designing an API to support multi-threaded access to memory regions, 
it would almost certainly look nothing like the buffer API. This is what Alan 
means by "buffers might not be the right answer." As things stand, it appears 
quite difficult to me to fix the multi-threaded access problem without turning 
buffers into something they aren't, or fragmenting the API in some complex and 
uncomfortable way.

Finally, note that this is not an argument against adding bulk absolute 
operations! I think we should probably go ahead and do that anyway. But let's 
not fool ourselves into thinking that bulk absolute operations solve the 
multi-threaded buffer access problem.

s'marks


From peter.levart at gmail.com  Fri Sep 28 07:21:13 2018
From: peter.levart at gmail.com (Peter Levart)
Date: Fri, 28 Sep 2018 09:21:13 +0200
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
 <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>
Message-ID: <b0ff5779-b9ba-731f-49d7-a920164c5054@gmail.com>

Hi Stuart,

I mostly agree with your assessment of how suitable the 
ByteBuffer API is for clean multithreaded use. What would such an API look 
like? I think pretty much like ByteBuffer, but without the methods that mutate 
mark/position/limit/ByteOrder: a stripped-down ByteBuffer API, in other words. 
In my opinion that would be the lowest-level API possible. If you add 
things to such an API that coordinate multithreaded access to the 
underlying memory, you are already creating a concurrent data structure 
for a particular set of use cases, which might not cover all possible 
use cases or might be sub-optimal for some of them. So I think coordination 
is better layered on top of such an API, not built into it. Low-level 
multithreaded access to memory is, in my opinion, always going to be "unsafe" 
from the standpoint of coordination. It's not only the 
mark/position/limit/ByteOrder that is not multithreading-friendly about the 
ByteBuffer API, but the underlying memory too. It would be nice if 
mark/position/limit/ByteOrder weren't in the way, though.
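To make the idea concrete, a sketch of what such a stripped-down API might look like. FixedRegion and BufferRegion are hypothetical names. The byte order is chosen once at construction and the backing buffer never escapes, so only absolute accessors are reachable. As described above, it does nothing to coordinate access to the memory itself; that is left to whatever is layered on top.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical stripped-down "ByteBuffer": absolute accessors only. */
interface FixedRegion {
    int size();
    byte get(int index);
    void put(int index, byte value);
    int getInt(int index);
    void putInt(int index, int value);
}

/** Backs FixedRegion with a private ByteBuffer whose cursor state is never exposed. */
final class BufferRegion implements FixedRegion {
    private final ByteBuffer buf;   // final field: safely published after construction

    BufferRegion(int size, ByteOrder order) {
        // Byte order is fixed here; nobody outside can call order(), limit(), flip(), ...
        this.buf = ByteBuffer.allocateDirect(size).order(order);
    }

    @Override public int size() { return buf.capacity(); }
    @Override public byte get(int index) { return buf.get(index); }
    @Override public void put(int index, byte value) { buf.put(index, value); }
    @Override public int getInt(int index) { return buf.getInt(index); }
    @Override public void putInt(int index, int value) { buf.putInt(index, value); }
}
```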

Regards, Peter

From rwestrel at redhat.com  Fri Sep 28 07:27:17 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 09:27:17 +0200
Subject: RFR(S): 8211231: BarrierSetC1::generate_referent_check() confuses
 register allocator
In-Reply-To: <2d430192-9a88-d3f3-40e1-e6d57f32e5f6@oracle.com>
References: <dk67ej76m8i.fsf@rwestrel.remote.csb>
 <2d430192-9a88-d3f3-40e1-e6d57f32e5f6@oracle.com>
Message-ID: <dk6y3bm5bfu.fsf@rwestrel.remote.csb>


Thanks Igor and Vladimir for the reviews.

Roland.

From nigro.fra at gmail.com  Fri Sep 28 07:38:48 2018
From: nigro.fra at gmail.com (Francesco Nigro)
Date: Fri, 28 Sep 2018 09:38:48 +0200
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <b0ff5779-b9ba-731f-49d7-a920164c5054@gmail.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
 <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>
 <b0ff5779-b9ba-731f-49d7-a920164c5054@gmail.com>
Message-ID: <CAKxGtTWcunzwM=7K21Sk=VKrKfwVzWbAvONBK1nZAYqP-zihsg@mail.gmail.com>

Hi guys!

I'm one of the developers mentioned (like many others) who use external
(and unsafe) APIs to access ByteBuffer contents concurrently, and I work on
a messaging broker's journal
that would benefit from this JEP :)
Regarding a concurrent-access API, how does this one look?
https://github.com/real-logic/agrona/blob/master/agrona/src/main/java/org/agrona/concurrent/AtomicBuffer.java

note:
I don't know how it's regarded to join these discussions without
introducing myself, and I hope I'm not off-topic, but both this JEP and the
comments around it are so interesting
that I couldn't resist. I apologize if I'm breaking any rule by doing so.

Thanks for the hard work,
Francesco

From rwestrel at redhat.com  Fri Sep 28 08:23:15 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 10:23:15 +0200
Subject: RFR(S): 8210389: C2: assert(n->outcnt() != 0 || C->top() == n ||
 n->is_Proj()) failed: No dead instructions after post-alloc
In-Reply-To: <99bc7d08-614c-9a45-0fdd-2dac729122d9@oracle.com>
References: <dk68t3y8u59.fsf@rwestrel.remote.csb>
 <99bc7d08-614c-9a45-0fdd-2dac729122d9@oracle.com>
Message-ID: <dk6va6q58uk.fsf@rwestrel.remote.csb>


Hi Vladimir,

Thanks for looking at this.

> Why you are not using subsume_by()?

Instead of disconnect_inputs()? Or as a replacement for the loop? I want
all nodes that become useless as a result of the edge removal to be
disconnected. subsume_by() wouldn't do that, AFAICT.

Roland.

From rwestrel at redhat.com  Fri Sep 28 08:23:30 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 10:23:30 +0200
Subject: RFR(S): 8211233: MemBarNode::trailing_membar() and
 MemBarNode::leading_membar() need to handle dying subgraphs better
In-Reply-To: <b184ae36-34d4-ccdd-e468-a0b47f1f6c7a@oracle.com>
References: <dk61s9f6i47.fsf@rwestrel.remote.csb>
 <b184ae36-34d4-ccdd-e468-a0b47f1f6c7a@oracle.com>
Message-ID: <dk6sh1u58u5.fsf@rwestrel.remote.csb>


Hi Vladimir,

Thanks for the review.

Roland.

From rwestrel at redhat.com  Fri Sep 28 09:06:17 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 11:06:17 +0200
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <39b8c3c2-d412-06fa-9162-1d80f067ed11@oracle.com>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
 <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <dk67ejaj7lk.fsf@rwestrel.remote.csb>
 <b3329405-1736-5235-b0a2-0962ae7789fa@oracle.com>
 <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
 <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>
 <dk6a7o36nz8.fsf@rwestrel.remote.csb>
 <39b8c3c2-d412-06fa-9162-1d80f067ed11@oracle.com>
Message-ID: <dk6o9ci56uu.fsf@rwestrel.remote.csb>


> gc/epsilon/TestManyThreads.java test failed on SPARC.
> I added the information and replay file to the bug report.

Thanks for the test result. The fix is:

diff --git a/src/hotspot/share/opto/arraycopynode.cpp b/src/hotspot/share/opto/arraycopynode.cpp
--- a/src/hotspot/share/opto/arraycopynode.cpp
+++ b/src/hotspot/share/opto/arraycopynode.cpp
@@ -422,7 +422,8 @@
     Node *start_mem_dest = mm->memory_at(alias_idx_dest);
     Node* mem = start_mem_dest;
 
-    assert(copy_type != T_OBJECT, "only tightly coupled allocations for object arrays");
+    BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
+    assert(copy_type != T_OBJECT || !bs->array_copy_requires_gc_barriers(false, T_OBJECT, false, BarrierSetC2::Optimization), "only tightly coupled allocations for object arrays");
     bool same_alias = (alias_idx_src == alias_idx_dest);
 
     if (count > 0) {


New webrev:

http://cr.openjdk.java.net/~roland/8210887/webrev.02/

Roland.

From rahul.v.raghavan at oracle.com  Fri Sep 28 09:10:34 2018
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Fri, 28 Sep 2018 14:40:34 +0530
Subject: RFR: 8211168: Solaris-X64 build failure with error nreg hides the
 same name in an outer scope
Message-ID: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com>

Hi,

Please review the following fix proposal contributed by Daniel Daugherty.

<webrev> - http://cr.openjdk.java.net/~rraghavan/8211168/webrev.00/

<JBS> - https://bugs.openjdk.java.net/browse/JDK-8211168

(Pre-integration testing in progress)

Thanks,
Rahul

From adinn at redhat.com  Fri Sep 28 09:16:13 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 28 Sep 2018 10:16:13 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
 <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>
Message-ID: <2cdc5a22-0217-b985-9069-0fc72058f081@redhat.com>

Hi Stuart,

On 28/09/18 06:51, Stuart Marks wrote:
> Let me first say that this issue of "ByteBuffer might not be the right
> answer" is something of a digression from the JEP discussion. I think
> the JEP should proceed forward using MBB with the API that you and Alan
> had discussed previously. At most, the discussion of the "right thing"
> issue might affect a side note in the JEP text about possible
> limitations and future directions of this effort. However, it's not a
> blocker to the JEP making progress as far as I'm concerned.

Thanks for clarifying that point. I have already added a note to that
effect to the JEP. I take your other point that these limitations make
this JEP a less useful addition than it could be. However, it's hard to
see what else might usefully be provided that does not involve a
reworking of JDK core-lib (and, potentially, JVM) functionality that has
a much larger scope than is needed to crack the specific nut the JEP
addresses.

> With that in mind, I'll discuss the issue of multithreaded access to
> ByteBuffers and how this bears on whether buffers are or aren't the
> "right answer." There are actually several issues that figure into the
> "right answer" analysis. In this message, though, I'll just focus on the
> issue of multithreaded access.

Thank you for a very clear and interesting summary of the limitations of
the Buffer API. I have cut it from this reply for the sake of brevity
but I will respond to a few points.

I think the limitations you point out regarding concurrent clients' mode
of operation are less severe in this specific case because there is not
really a need for those client threads to reach a rendezvous point in
order to execute some form of FileChannel update. The buffer content is
persistent memory. So, essentially, the data writes constitute the update.

If independent threads can arrange to coordinate over carving up
separate regions of a persistent mapped buffer for parallel update then
they can also write and flush (by which I mean force cache writeback
for) those regions independently.

Clearly there will also be a need for threads to write common index
regions of the persistent mapped buffer in order to ensure that the
associated data updates are committed. That means the writes and flushes
for those common regions need to synchronize. However, that is simply
business as usual for persistent data management code. A TX manager will
already have code in place for this purpose, for example. Certainly,
that synchronized update will not need to rely on buffer cursor
(position) management.
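A sketch of that split, under loud assumptions: a heap ByteBuffer stands in for the persistent mapped buffer, the "index region" is reduced to a 4-byte committed-slot counter, and the cache-writeback steps appear only as comments, since no particular flush API is assumed here. SlottedStore and its methods are hypothetical. Data slots are written with absolute puts and no lock; only the shared index update is serialized.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical store: lock-free writes to private slots, locked index updates. */
final class SlottedStore {
    static final int INDEX_BYTES = 4;           // assumed header: committed-slot count
    private final ByteBuffer buf;               // stand-in for a persistent mapped buffer
    private final int slotSize;
    private final ReentrantLock indexLock = new ReentrantLock();

    SlottedStore(int slots, int slotSize) {
        this.slotSize = slotSize;
        this.buf = ByteBuffer.allocate(INDEX_BYTES + slots * slotSize);
        buf.putInt(0, 0);                       // initialization, before any sharing
    }

    /** Called concurrently; each thread owns a distinct slot number. */
    void writeSlot(int slot, byte fill) {
        int base = INDEX_BYTES + slot * slotSize;
        for (int i = 0; i < slotSize; i++) {
            buf.put(base + i, fill);            // absolute puts into a private region
        }
        // (writeback of [base, base + slotSize) would go here, independently per thread)
        indexLock.lock();
        try {
            buf.putInt(0, buf.getInt(0) + 1);   // shared index update is serialized
            // (writeback of the index region would go here, still under the lock)
        } finally {
            indexLock.unlock();
        }
    }

    int committed() {
        indexLock.lock();
        try {
            return buf.getInt(0);
        } finally {
            indexLock.unlock();
        }
    }

    byte read(int slot, int offset) {
        return buf.get(INDEX_BYTES + slot * slotSize + offset);
    }
}
```

No cursor (position) is read or written anywhere after construction, which matches the point above.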

Also, I am not sure I see any problem arising from your point about
absolute puts (and gets) depending on the 'limit' property. The various
put operations do indeed /read/ the current limit but they do not update
it. So, you are right to state that a persistent store management
library built over this API would need to ensure that put operations
were reined in via some form of rendezvous if it ever wanted to adjust
the limit. However, I don't think that is going to happen with a library
that manages a mapped persistent store. I would expect that any such
code is never going to call clear(), flip(), truncate() -- nor make a
direct call to limit() -- except as part of the initialization or
reconciliation performed at startup before concurrent clients are unleashed.
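A small check of that point, using only long-standing ByteBuffer methods: an absolute put reads the limit (and rejects an index at or beyond it) but leaves both position and limit unchanged. LimitCheck is an illustrative name.

```java
import java.nio.ByteBuffer;

final class LimitCheck {

    /** Returns {position, limit} after an in-bounds absolute put. */
    static int[] afterAbsolutePut() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.limit(8);                    // set once, during initialization
        buf.put(3, (byte) 42);           // absolute: reads limit, writes neither cursor
        return new int[] { buf.position(), buf.limit() };   // {0, 8}
    }

    /** True if an absolute put at the limit is rejected even though it is within capacity. */
    static boolean putAtLimitRejected() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.limit(8);
        try {
            buf.put(8, (byte) 1);        // index 8 < capacity 16, but not < limit 8
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }
}
```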

Anyway, thank you for a clear warning as to the precise perils faced in
implementing correct client libraries over the base layer this JEP proposes.

> If we were designing an API to support multi-threaded access to memory
> regions, it would almost certainly look nothing like the buffer API.
> This is what Alan means by "buffers might not be the right answer." As
> things stand, it appears quite difficult to me to fix the multi-threaded
> access problem without turning buffers into something they aren't, or
> fragmenting the API in some complex and uncomfortable way.

Agreed.

> Finally, note that this is not an argument against adding bulk absolute
> operations! I think we should probably go ahead and do that anyway. But
> let's not fool ourselves into thinking that bulk absolute operations
> solve the multi-threaded buffer access problem.
Also agreed.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From shade at redhat.com  Fri Sep 28 10:24:58 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 28 Sep 2018 12:24:58 +0200
Subject: RFR: 8211168: Solaris-X64 build failure with error nreg hides the
 same name in an outer scope
In-Reply-To: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com>
References: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com>
Message-ID: <8b945716-b537-95f8-bbd7-7979a82c92b1@redhat.com>

On 09/28/2018 11:10 AM, Rahul Raghavan wrote:
> Please review the following fix proposal contributed by Daniel Daugherty.
> 
> <webrev> - http://cr.openjdk.java.net/~rraghavan/8211168/webrev.00/

This looks good and trivially correct. I would just move the "int nreg" declaration to the beginning
of the method, before all the blocks, but this build fix is fine too.

-Aleksey

From rwestrel at redhat.com  Fri Sep 28 11:06:56 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 13:06:56 +0200
Subject: RFR(S): 8211232: GraphKit::make_runtime_call() sometimes attaches
 wrong memory state to call
In-Reply-To: <dbe74bbe-dac1-7f3d-8e37-18474f79dda6@oracle.com>
References: <dk64leb6ipr.fsf@rwestrel.remote.csb>
 <dbe74bbe-dac1-7f3d-8e37-18474f79dda6@oracle.com>
Message-ID: <dk6k1n56fu7.fsf@rwestrel.remote.csb>


Hi Vladimir,

Thanks for looking at this.

> I understand that you want to avoid the second reset_memory() and I agree.
> But I am concerned about your code for setting the input memory for the call. Why not pass narrow_mem from 
> memory(adr_type) to set_predefined_input_for_runtime_call() in this case and NULL in others, and 
> check for NULL to select which memory to set? memory(adr_type) will check for merge_mem.

Is this what you're suggesting?

http://cr.openjdk.java.net/~roland/8211232/webrev.00/

Roland.

From shade at redhat.com  Fri Sep 28 11:16:50 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 28 Sep 2018 13:16:50 +0200
Subject: RFR 8211272: x86_32 build failures after JDK-8210764 (Update avx512
 implementation)
Message-ID: <b58816ce-8030-ee4d-d19a-50bc0c856874@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8211272

It is a trivial mistake: the braces got unbalanced when _LP64 is not defined, which is the case for
the x86_32 build. I would not bother to run it through jdk-submit.

Fix:

diff -r eb3e72f181af src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp
--- a/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp       Thu Sep 27 10:24:12 2018 +0200
+++ b/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp       Fri Sep 28 13:08:20 2018 +0200
@@ -2401,12 +2401,13 @@
         {
 #ifdef _LP64
           if (UseAVX > 2 && !VM_Version::supports_avx512vl()) {
             assert(tmp->is_valid(), "need temporary");
             __ vpandn(dest->as_xmm_double_reg(), tmp->as_xmm_double_reg(), value->as_xmm_double_reg(), 2);
-          } else {
+          } else
 #endif
+          {
             if (dest->as_xmm_double_reg() != value->as_xmm_double_reg()) {
               __ movdbl(dest->as_xmm_double_reg(), value->as_xmm_double_reg());
             }
             assert(!tmp->is_valid(), "do not need temporary");
             __ andpd(dest->as_xmm_double_reg(),

Testing: x86_32 build, x86_64 build

Thanks,
-Aleksey

From daniel.daugherty at oracle.com  Fri Sep 28 14:39:26 2018
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 28 Sep 2018 10:39:26 -0400
Subject: RFR: 8211168: Solaris-X64 build failure with error nreg hides the
 same name in an outer scope
In-Reply-To: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com>
References: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com>
Message-ID: <c29acf0f-7fe1-257e-5d19-afb78e430810@oracle.com>

Thumbs up.

Dan


On 9/28/18 5:10 AM, Rahul Raghavan wrote:
> Hi,
>
> Please review the following fix proposal contributed by Daniel Daugherty.
>
> <webrev> - http://cr.openjdk.java.net/~rraghavan/8211168/webrev.00/
>
> <JBS> - https://bugs.openjdk.java.net/browse/JDK-8211168
>
> (Pre-integration testing in progress)
>
> Thanks,
> Rahul


From rkennke at redhat.com  Fri Sep 28 15:01:50 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 28 Sep 2018 17:01:50 +0200
Subject: RFR 8211272: x86_32 build failures after JDK-8210764 (Update
 avx512 implementation)
In-Reply-To: <b58816ce-8030-ee4d-d19a-50bc0c856874@redhat.com>
References: <b58816ce-8030-ee4d-d19a-50bc0c856874@redhat.com>
Message-ID: <817d02b0-31e8-e488-b374-6021aa475190@redhat.com>

Ok fine.

> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8211272
> 
> It is a trivial mistake: the braces got unbalanced when _LP64 is not defined, which is the case for
> the x86_32 build. I would not bother to run it through jdk-submit.

jdk/submit seems to have been unresponsive since at least yesterday anyway. Dunno
what's up. Did it go up in flames because of the warnings stuff?

Thanks,
Roman

From vladimir.kozlov at oracle.com  Fri Sep 28 16:10:56 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 28 Sep 2018 09:10:56 -0700
Subject: RFR 8211272: x86_32 build failures after JDK-8210764 (Update
 avx512 implementation)
In-Reply-To: <b58816ce-8030-ee4d-d19a-50bc0c856874@redhat.com>
References: <b58816ce-8030-ee4d-d19a-50bc0c856874@redhat.com>
Message-ID: <76cca0b1-e4be-e0bc-d89c-1c8e3c389879@oracle.com>

Good. You can push it since it is trivial.

Thanks,
Vladimir

On 9/28/18 4:16 AM, Aleksey Shipilev wrote:
> Bug:
>    https://bugs.openjdk.java.net/browse/JDK-8211272
> 
> It is a trivial mistake: the braces got unbalanced when _LP64 is not defined, which is the case for
> the x86_32 build. I would not bother to run it through jdk-submit.
> 
> Fix:
> 
> diff -r eb3e72f181af src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp
> --- a/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp       Thu Sep 27 10:24:12 2018 +0200
> +++ b/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp       Fri Sep 28 13:08:20 2018 +0200
> @@ -2401,12 +2401,13 @@
>           {
>   #ifdef _LP64
>             if (UseAVX > 2 && !VM_Version::supports_avx512vl()) {
>               assert(tmp->is_valid(), "need temporary");
>               __ vpandn(dest->as_xmm_double_reg(), tmp->as_xmm_double_reg(), value->as_xmm_double_reg(), 2);
> -          } else {
> +          } else
>   #endif
> +          {
>               if (dest->as_xmm_double_reg() != value->as_xmm_double_reg()) {
>                 __ movdbl(dest->as_xmm_double_reg(), value->as_xmm_double_reg());
>               }
>               assert(!tmp->is_valid(), "do not need temporary");
>               __ andpd(dest->as_xmm_double_reg(),
> 
> Testing: x86_32 build, x86_64 build
> 
> Thanks,
> -Aleksey
> 

From vladimir.kozlov at oracle.com  Fri Sep 28 16:43:41 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 28 Sep 2018 09:43:41 -0700
Subject: RFR(S): 8211232: GraphKit::make_runtime_call() sometimes attaches
 wrong memory state to call
In-Reply-To: <dk6k1n56fu7.fsf@rwestrel.remote.csb>
References: <dk64leb6ipr.fsf@rwestrel.remote.csb>
 <dbe74bbe-dac1-7f3d-8e37-18474f79dda6@oracle.com>
 <dk6k1n56fu7.fsf@rwestrel.remote.csb>
Message-ID: <f238a86e-c64b-c43e-3768-3fb0d8629efd@oracle.com>

On 9/28/18 4:06 AM, Roland Westrelin wrote:
> 
> Hi Vladimir,
> 
> Thanks for looking at this.
> 
>> I understand that you want to avoid the second reset_memory() and I agree.
>> But I am concerned about your code for setting the input memory for the call. Why not pass narrow_mem from
>> memory(adr_type) to set_predefined_input_for_runtime_call() in this case and NULL in others, and
>> check for NULL to select which memory to set? memory(adr_type) will check for merge_mem.
> 
> Is this what you're suggesting?
> 
> http://cr.openjdk.java.net/~roland/8211232/webrev.00/

Yes. Does it work for you?

Thanks,
Vladimir

> 
> Roland.
> 

From shade at redhat.com  Fri Sep 28 16:47:59 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 28 Sep 2018 18:47:59 +0200
Subject: RFR 8211272: x86_32 build failures after JDK-8210764 (Update
 avx512 implementation)
In-Reply-To: <76cca0b1-e4be-e0bc-d89c-1c8e3c389879@oracle.com>
References: <b58816ce-8030-ee4d-d19a-50bc0c856874@redhat.com>
 <76cca0b1-e4be-e0bc-d89c-1c8e3c389879@oracle.com>
Message-ID: <c0f9767d-0f17-f711-5aa1-63e8ce364e7c@redhat.com>

Right on. Pushed.

-Aleksey

On 09/28/2018 06:10 PM, Vladimir Kozlov wrote:
> Good. You can push it since it is trivial.
> 
> Thanks,
> Vladimir
> 
> On 9/28/18 4:16 AM, Aleksey Shipilev wrote:
>> Bug:
>>    https://bugs.openjdk.java.net/browse/JDK-8211272
>>
>> It is a trivial mistake: the braces got unbalanced when _LP64 is not defined, which is the case for
>> the x86_32 build. I would not bother to run it through jdk-submit.
>>
>> Fix:
>>
>> diff -r eb3e72f181af src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp
>> --- a/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp       Thu Sep 27 10:24:12 2018 +0200
>> +++ b/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp       Fri Sep 28 13:08:20 2018 +0200
>> @@ -2401,12 +2401,13 @@
>>           {
>>   #ifdef _LP64
>>             if (UseAVX > 2 && !VM_Version::supports_avx512vl()) {
>>               assert(tmp->is_valid(), "need temporary");
>>               __ vpandn(dest->as_xmm_double_reg(), tmp->as_xmm_double_reg(), value->as_xmm_double_reg(), 2);
>> -          } else {
>> +          } else
>>   #endif
>> +          {
>>               if (dest->as_xmm_double_reg() != value->as_xmm_double_reg()) {
>>                 __ movdbl(dest->as_xmm_double_reg(), value->as_xmm_double_reg());
>>               }
>>               assert(!tmp->is_valid(), "do not need temporary");
>>               __ andpd(dest->as_xmm_double_reg(),
>>
>> Testing: x86_32 build, x86_64 build
>>
>> Thanks,
>> -Aleksey
>>


From sandhya.viswanathan at intel.com  Fri Sep 28 17:04:03 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 28 Sep 2018 17:04:03 +0000
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <a6c24555-0271-10a0-5b45-2d566fc50e8a@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
 <badfeb56-3333-4eac-6d6c-6b452ac7bd2a@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>
 <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com>
 <a6c24555-0271-10a0-5b45-2d566fc50e8a@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12FCB@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Yes, it only affects avx512 machines with UseAVX=3. I have run jtreg compiler tests on SKX, KNL and Haswell. Also ran SPECjvm2008.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, September 27, 2018 6:38 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions

This looks fine. I assume you did testing. It only affects avx512 machines - right?

Thanks,
Vladimir

On 9/27/18 5:46 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Please find the updated webrev with this change at:
> http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.01/
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 27, 2018 4:50 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions
> 
> Use guarantee() instead of assert so that if someone tries to use it with a product JDK, it will fail.
> 
> Thanks,
> Vladimir
> 
> On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> As you know, PostLoopMultiversioning needs to be fully redesigned and is currently disabled.
>> That particular code should only be exercised when PostLoopMultiversioning is on.
>> I could change it to an assert statement if that looks ok to you.
>>
>> Please let me know.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, September 27, 2018 3:24 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot
>> compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR(S):8211251:Default mask register for avx512
>> instructions
>>
>> Looks good, except for the PostLoopMultiversioning flag-guarded changes in
>> macroAssembler_x86.cpp, which should be explained too.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote:
>>> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions.
>>>
>>> Currently, unmasked instructions are encoded using the k1 register, which
>>> requires k1 to be initialized properly and reinitialized across JNI and runtime calls.
>>>
>>> This patch encodes AVX 512 instructions as unmasked instructions using the
>>> k0 encoding wherever no explicit mask register is specified.
>>>
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
>>>
>>> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>

From vladimir.kozlov at oracle.com  Fri Sep 28 17:20:25 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 28 Sep 2018 10:20:25 -0700
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12FCB@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
 <badfeb56-3333-4eac-6d6c-6b452ac7bd2a@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>
 <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com>
 <a6c24555-0271-10a0-5b45-2d566fc50e8a@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12FCB@FMSMSX126.amr.corp.intel.com>
Message-ID: <0439a3ae-1b18-f38a-6832-e44c2c2b8b05@oracle.com>

Okay. Thanks. I submitted testing on an avx512 machine too.

Vladimir

On 9/28/18 10:04 AM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Yes, it only affects avx512 machines with UseAVX=3. I have run jtreg compiler tests on SKX, KNL and Haswell. Also ran SPECjvm2008.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 27, 2018 6:38 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions
> 
> This looks fine. I assume you did testing. It only affects avx512 machines - right?
> 
> Thanks,
> Vladimir
> 
> On 9/27/18 5:46 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find the updated webrev with this change at:
>> http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.01/
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, September 27, 2018 4:50 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions
>>
>> Use guarantee() instead of assert so that if someone tries to use it with a product JDK, it will fail.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> As you know, PostLoopMultiversioning needs to be fully redesigned and is currently disabled.
>>> That particular code should only be exercised when PostLoopMultiversioning is on.
>>> I could change it to an assert statement if that looks ok to you.
>>>
>>> Please let me know.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, September 27, 2018 3:24 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot
>>> compiler <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: RFR(S):8211251:Default mask register for avx512
>>> instructions
>>>
>>> Looks good, except for the PostLoopMultiversioning flag-guarded changes in
>>> macroAssembler_x86.cpp, which should be explained too.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote:
>>>> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions.
>>>>
>>>> Currently, unmasked instructions are encoded using the k1 register, which
>>>> requires k1 to be initialized properly and reinitialized across JNI and runtime calls.
>>>>
>>>> This patch encodes AVX 512 instructions as unmasked instructions using the
>>>> k0 encoding wherever no explicit mask register is specified.
>>>>
>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/
>>>>
>>>> Best Regards,
>>>>
>>>> Sandhya
>>>>

From vladimir.kozlov at oracle.com  Fri Sep 28 17:51:35 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 28 Sep 2018 10:51:35 -0700
Subject: RFR: 8208686: [AOT] JVMTI ResourceExhausted event repeated for
 same allocation
In-Reply-To: <64cbe730-5c9a-a04d-9eee-a56abfbb8e07@oracle.com>
References: <910A4A3C-3EEF-4167-84D5-9819C83D6FC1@oracle.com>
 <64cbe730-5c9a-a04d-9eee-a56abfbb8e07@oracle.com>
Message-ID: <015e8416-a948-fdc5-46db-8b5d80ba52e8@oracle.com>

To let you know, Tom R. and I reviewed these changes and agreed that they are the least intrusive 
changes for HotSpot shared code.

Thanks,
Vladimir

On 9/25/18 8:11 AM, Daniel D. Daugherty wrote:
> Adding serviceability-dev at ... since this is JVM/TI...
> 
> Dan
> 
> 
> On 9/25/18 10:48 AM, Doug Simon wrote:
>> A major design point of Graal is to treat allocations as non-side effecting to give more freedom 
>> to the optimizer by reducing the number of distinct FrameStates that need to be managed. When 
>> failing an allocation, Graal will deoptimize to the last side effecting instruction before the 
>> allocation. This means the VM code for heap allocation will potentially be executed twice, once 
>> from Graal compiled code and then again in the interpreter. While this is perfectly fine according 
>> to the JVM specification, it can cause confusing behavior for JVMTI based tools. They will receive 
>> 2 ResourceExhausted events for a single allocation. Furthermore, the first ResourceExhausted event 
>> (on the Graal allocation slow path) might denote a bytecode instruction that performs no 
>> allocation, making it hard to debug the memory failure.
>>
>> The proposed solution is to add an extra set of JVMCI VM runtime calls for allocation. These entry 
>> points will attempt the allocation and upon failure,
>> skip side-effects such as posting JVMTI events or handling -XX:OnOutOfMemoryError. The compiled 
>> code using these entry points is expected to deoptimize on null.
>>
>> The path from these new entry points to where allocation can fail goes through quite a bit of VM 
>> code. One could modify all these paths by:
>> * Returning null instead of throwing an exception on failure.
>> * Adding a `bool null_on_fail` argument to all relevant methods.
>> * Adding extra null checking where necessary after each call to these methods when `null_on_fail 
>> == true`.
>> This represents a significant number of changes.
>>
>> Instead, the proposed solution introduces a new _in_retryable_allocation thread-local. This way, 
>> only the entry points and allocation routines that raise an exception need to be modified. Failure 
>> is communicated back to the new entry points by throwing a special pre-allocated OOME object 
>> (i.e., Universe::out_of_memory_error_retry()) which must not propagate back to Java code. Use of 
>> this object is not strictly necessary; it is introduced to highlight/document the special 
>> allocation mode.
>>
>> The proposed solution is at http://cr.openjdk.java.net/~dnsimon/8208686.
>> The JBS bug is: https://bugs.openjdk.java.net/browse/JDK-8208686
>>
>> -Doug
> 
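The retryable-allocation scheme Doug describes can be sketched as a toy model in plain Java. Everything here is hypothetical: `RetryableAllocator`, `allocateOrNull` and the event counter are illustrative names, a `ThreadLocal<Boolean>` stands in for the `_in_retryable_allocation` thread-local, and a pre-allocated, stack-trace-free `Error` subclass stands in for `Universe::out_of_memory_error_retry()`. Only the entry point and the failure branch know about the mode, which is the point of the design.

```java
/**
 * Hypothetical toy model of "retryable allocation"; none of these names
 * come from HotSpot or JVMCI.
 */
final class RetryableAllocator {
    /** Pre-allocated sentinel that must never escape allocateOrNull(). */
    private static final class RetrySentinel extends Error {
        RetrySentinel() {
            // no cause, no suppression, no stack trace: cheap to throw repeatedly
            super("retryable allocation failed", null, false, false);
        }
    }

    private static final RetrySentinel SENTINEL = new RetrySentinel();

    /** Stands in for the _in_retryable_allocation thread-local. */
    private static final ThreadLocal<Boolean> IN_RETRYABLE =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    private final long capacity;
    private long used;
    private int exhaustedEvents; // stands in for posted ResourceExhausted events

    RetryableAllocator(long capacity) {
        this.capacity = capacity;
    }

    /** Shared slow path: only the failure branch looks at the thread-local. */
    byte[] allocate(int size) {
        if (used + size > capacity) {
            if (IN_RETRYABLE.get()) {
                throw SENTINEL;      // fail silently: no event, no OOME handling
            }
            exhaustedEvents++;       // the side effect we want exactly once
            throw new OutOfMemoryError("simulated heap exhausted");
        }
        used += size;
        return new byte[size];
    }

    /** Entry point for "compiled code": returns null on failure instead of throwing. */
    byte[] allocateOrNull(int size) {
        boolean previous = IN_RETRYABLE.get();
        IN_RETRYABLE.set(Boolean.TRUE);
        try {
            return allocate(size);
        } catch (RetrySentinel e) {
            return null;             // caller deoptimizes and retries
        } finally {
            IN_RETRYABLE.set(previous);
        }
    }

    int exhaustedEvents() {
        return exhaustedEvents;
    }
}
```

In this model, a failed retryable attempt followed by a normal retry (the interpreter, after deoptimization) records one event instead of two.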

From vladimir.kozlov at oracle.com  Fri Sep 28 18:24:57 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 28 Sep 2018 11:24:57 -0700
Subject: RFR(S): 8210389: C2: assert(n->outcnt() != 0 || C->top() == n ||
 n->is_Proj()) failed: No dead instructions after post-alloc
In-Reply-To: <dk6va6q58uk.fsf@rwestrel.remote.csb>
References: <dk68t3y8u59.fsf@rwestrel.remote.csb>
 <99bc7d08-614c-9a45-0fdd-2dac729122d9@oracle.com>
 <dk6va6q58uk.fsf@rwestrel.remote.csb>
Message-ID: <dc6f3047-c768-e66b-9363-4d0ae39ff3d0@oracle.com>

subsume_by() + disconnect_inputs() are used by other code in final_graph_reshaping_impl(). If it 
does not work in your case, it may not work for other cases too, and should be solved in general.

Maybe we can modify final_graph_reshaping_walk() and final_graph_reshaping_impl() to remove dead 
code. Or do a separate pass over the graph, like PhaseRemoveUseless. One thing to point out is that the 
verify_graph_edges() call after Optimize() should have no_dead_code = true to catch all the cases we are 
missing.

Thanks,
Vladimir

On 9/28/18 1:23 AM, Roland Westrelin wrote:
> 
> Hi Vladimir,
> 
> Thanks for looking at this.
> 
>> Why you are not using subsume_by()?
> 
> Instead of disconnect_inputs()? Or as a replacement for the loop? I want
> all nodes that become useless as a result of the edge removal to be
> disconnected. subsume_by() wouldn't do that AFAICT.
> 
> Roland.
> 
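Roland's requirement, that every node made useless by an edge removal is itself disconnected, amounts to a worklist walk over inputs. The sketch below uses a hypothetical toy `IrNode` class with explicit def-use lists; it is not C2's `Node` API and does not model the semantics of `subsume_by()` or `disconnect_inputs()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy IR node with explicit def-use edges (not C2's Node). */
final class IrNode {
    final String name;
    final List<IrNode> inputs = new ArrayList<>();
    final List<IrNode> outputs = new ArrayList<>();

    IrNode(String name, IrNode... ins) {
        this.name = name;
        for (IrNode in : ins) {
            inputs.add(in);
            in.outputs.add(this);
        }
    }

    int outcnt() {
        return outputs.size();
    }
}

final class DeadNodeCleanup {
    /**
     * Removes one def -> use edge, then disconnects every node left without
     * uses, transitively. Returns how many nodes were disconnected.
     */
    static int removeEdge(IrNode def, IrNode use, IrNode root) {
        use.inputs.remove(def);
        def.outputs.remove(use);

        Set<IrNode> dead = new HashSet<>();
        Deque<IrNode> worklist = new ArrayDeque<>();
        worklist.push(def);
        while (!worklist.isEmpty()) {
            IrNode n = worklist.pop();
            // keep the root, anything still used, and anything already handled
            if (n == root || n.outcnt() != 0 || !dead.add(n)) {
                continue;
            }
            for (IrNode in : n.inputs) {
                in.outputs.remove(n); // drop one use edge per input slot
                worklist.push(in);    // the input may now be dead too
            }
            n.inputs.clear();
        }
        return dead.size();
    }
}
```

A node shared with a still-live user (like a constant feeding two nodes) survives until its last use goes away.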

From stuart.marks at oracle.com  Fri Sep 28 20:12:44 2018
From: stuart.marks at oracle.com (Stuart Marks)
Date: Fri, 28 Sep 2018 13:12:44 -0700
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <2cdc5a22-0217-b985-9069-0fc72058f081@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
 <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>
 <2cdc5a22-0217-b985-9069-0fc72058f081@redhat.com>
Message-ID: <51f1741a-7854-a8fb-3d25-b4cd7fcfd7bf@oracle.com>


On 9/28/18 2:16 AM, Andrew Dinn wrote:
> Thanks for clarifying that point. I have already added a note to that
> effect to the JEP. I take your other point that these limitations make
> this JEP a less useful addition than it could be. However, it's hard to
> see what else might usefully be provided that does not involve a
> reworking of JDK core-lib (and, potentially, JVM) functionality that has
> a much larger scope than is needed to crack the specific nut the JEP
> addresses.

I'm not sure I'd put it quite that way, "less useful than it could be."

I guess it depends on what you think the JEP is about. If the JEP is about MBB, 
and MBB is at some point superseded by something else, then yes, I suppose that 
means this JEP is less useful than it might be.

On the other hand, suppose that this JEP is primarily about NVM, including 
access, operations, API, architecture, life cycle issues, etc., and these happen 
to be surfaced through MBB today. If something supersedes MBB, then the concepts 
developed by this JEP can be retargeted to that other thing at the appropriate 
time. Or are the concepts developed by this JEP so closely intertwined with MBB 
that this idea of "retargeting" doesn't make sense? I don't know.

> Thank you for a very clear and interesting summary of the limitations of
> the Buffer API. I have cut it from this reply for the sake of brevity
> but I will respond to a few points.

Great, I'm glad this helped. I'm never quite sure whether writing these big 
essays is helpful.

(Note also that there are OTHER limitations of the buffer API that I didn't 
cover, since the message was getting too long as it was. Example: 2GB limit.)

> I think the limitations you point out regarding concurrent clients' mode
> of operation are less severe in this specific case because there is not
> really a need for those client threads to reach a rendezvous point in
> order to execute some form of FileChannel update. The buffer content is
> persistent memory. So, essentially, the data writes constitute the update.

Sure. It may be that the use cases for NVM aren't particularly affected by 
limitations of the Buffer APIs. If so, so much the better! But there are other 
systems where the limitations imposed by buffers are so onerous that they've had 
to go directly to Unsafe.
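Andrew's observation that the data writes themselves constitute the update can be illustrated with the existing `MappedByteBuffer` API on an ordinary file. `MappedWriteDemo` and `writeLongThroughMapping` are hypothetical names, and nothing here uses the NVM-specific functionality the JEP proposes. Per the `force()` documentation, write-back is only guaranteed for files on a local storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedWriteDemo {
    /** Stores value at byte offset 0 of file through a memory mapping, then requests write-back. */
    static void writeLongThroughMapping(Path file, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = ch.size();
            if (size < Long.BYTES) {
                // Pad the file so the mapped region lies entirely inside it.
                ByteBuffer pad = ByteBuffer.allocate((int) (Long.BYTES - size));
                long pos = size;
                while (pad.hasRemaining()) {
                    pos += ch.write(pad, pos);
                }
            }
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            mbb.putLong(0, value); // the store into the mapping is the update itself
            mbb.force();           // ask the OS to write the modified pages back to the device
        }
    }
}
```

There is no separate "commit" call through the channel: once `putLong` returns, other mappings of the file see the new value, and `force()` only concerns durability.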

> Anyway, thank you for a clear warning as to the precise perils faced in
> implementing correct client libraries over the base layer this JEP proposes.

Yes, this is essentially it. When you run into a problem -- as every project 
does -- think about whether it's inherent to NVM, or whether it's incidental to 
NVM and is rooted in the use of Buffers.

s'marks

From stuart.marks at oracle.com  Fri Sep 28 20:50:44 2018
From: stuart.marks at oracle.com (Stuart Marks)
Date: Fri, 28 Sep 2018 13:50:44 -0700
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <b0ff5779-b9ba-731f-49d7-a920164c5054@gmail.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
 <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>
 <b0ff5779-b9ba-731f-49d7-a920164c5054@gmail.com>
Message-ID: <e9cb6a30-8025-6c1b-88ff-e18e6778ce9d@oracle.com>


On 9/28/18 12:21 AM, Peter Levart wrote:
> I mostly agree with your assessment about the suitability of the ByteBuffer API 
> for nice multithreaded use. What would such an API look like? I think pretty much 
> like ByteBuffer but without things that mutate mark/position/limit/ByteOrder. A 
> stripped-down ByteBuffer API, therefore. That would be, in my opinion, the lowest-level 
> API possible. If you add things to such an API that coordinate 
> multithreaded access to the underlying memory, you are already creating a 
> concurrent data structure for a particular set of use cases, which might not 
> cover all possible use cases or might be sub-optimal for some of them. So I think this 
> is better layered on top of such an API, not built into it. Low-level multithreaded 
> access to memory is, in my opinion, always going to be "unsafe" from the 
> standpoint of coordination. It's not only the mark/position/limit/ByteOrder that 
> is not multithreaded-friendly about ByteBuffer API, but the underlying memory 
> too. It would be nice if mark/position/limit/ByteOrder weren't in the way though.

Right, getting mark/position/limit/ByteOrder out of the way would be a good 
first step. (I just realized that ByteOrder is mutable too!)
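A minimal sketch of the stripped-down, absolute-access-only view Peter describes, built on today's `ByteBuffer`. `FixedRegion` is a hypothetical name. Caveats: the `ByteBuffer` specification says buffers are not safe for use by multiple concurrent threads, so sharing one `FixedRegion` across threads relies on absolute accessors not mutating buffer state, and it gives no memory-visibility guarantees without external synchronization.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical "stripped-down buffer": absolute-index access only, no
 * mark/position/limit in the API, and a byte order fixed at construction.
 */
final class FixedRegion {
    private final ByteBuffer buf; // private view whose position/limit/order never change

    FixedRegion(ByteBuffer src, ByteOrder order) {
        ByteBuffer view = src.duplicate(); // shares content, independent cursor state
        view.clear();                      // cover the whole capacity, whatever src's limit is
        this.buf = view.order(order);      // later src.order(...) calls do not affect us
    }

    int capacity() {
        return buf.capacity();
    }

    int getInt(int index) {
        return buf.getInt(index);          // absolute get: does not move any position
    }

    void putInt(int index, int value) {
        buf.putInt(index, value);
    }

    /** Bulk absolute read of length bytes starting at index. */
    void get(int index, byte[] dst, int off, int length) {
        ByteBuffer cursor = buf.duplicate(); // per-call cursor; buf itself is untouched
        cursor.position(index);
        cursor.get(dst, off, length);
    }
}
```

Whoever else holds the original buffer can still move its position or flip its byte order; the region is unaffected because it never consults that state.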

I also think you're right that proceeding down a "classic" thread-safe object 
design won't be fruitful. We don't know what the right set of operations is yet, 
so it'll be difficult to know how to deal with thread safety.

One complicating factor is timely deallocation. This is an existing problem with 
direct buffers and MappedByteBuffer (see JDK-4724038). If a "buffer" were 
confined to a single thread, it could be deallocated safely when that thread is 
finished. I don't know how to guarantee thread confinement though.

On the other hand, if a "buffer" is exposed to multiple threads, deallocation 
requires that proper synchronization and checking be done so that subsequent 
operations are properly checked (so that they do something reasonable, like 
throw an exception) instead of accessing unmapped or repurposed memory. If 
checking is done, this pushes operations to be coarser-grained (bulk) so that 
the checking overhead is amortized over a more expensive operation.
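One way to get the "check, then throw instead of touching unmapped memory" behavior, and to see why bulk operations amortize it, is a read-write lock that guards the region's lifetime. This is a hypothetical sketch: a `byte[]` stands in for native memory, setting it to `null` stands in for unmapping, and the lock protects only against `close()`, not against data races on the contents. It is not how Panama or the JDK implement deallocation.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical region whose close() may race with accesses from other threads.
 * A byte[] stands in for native memory; nulling it stands in for unmapping.
 */
final class CheckedRegion {
    // Guards the region's lifetime only, not its contents: concurrent put/get on
    // the same bytes are still data races that callers must coordinate.
    private final ReentrantReadWriteLock lifetime = new ReentrantReadWriteLock();
    private byte[] memory;

    CheckedRegion(int size) {
        memory = new byte[size];
    }

    /** Caller must hold the read lock. */
    private byte[] live() {
        if (memory == null) {
            throw new IllegalStateException("region is closed");
        }
        return memory;
    }

    /** Fine-grained access: pays for the lock and the liveness check on every byte. */
    byte get(int index) {
        lifetime.readLock().lock();
        try {
            return live()[index];
        } finally {
            lifetime.readLock().unlock();
        }
    }

    void put(int index, byte value) {
        lifetime.readLock().lock();
        try {
            live()[index] = value;
        } finally {
            lifetime.readLock().unlock();
        }
    }

    /** Bulk access: one lock acquisition and one check amortized over length bytes. */
    void get(int index, byte[] dst, int off, int length) {
        lifetime.readLock().lock();
        try {
            System.arraycopy(live(), index, dst, off, length);
        } finally {
            lifetime.readLock().unlock();
        }
    }

    /** Waits for in-flight accesses, then "unmaps"; later accesses throw instead. */
    void close() {
        lifetime.writeLock().lock();
        try {
            memory = null;
        } finally {
            lifetime.writeLock().unlock();
        }
    }
}
```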

I know there has been some thought put into this in the Panama project, but I 
don't know exactly where it stands at the moment. See the MemoryRegion and Scope 
stuff.

s'marks

From stuart.marks at oracle.com  Fri Sep 28 21:14:00 2018
From: stuart.marks at oracle.com (Stuart Marks)
Date: Fri, 28 Sep 2018 14:14:00 -0700
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <CAKxGtTWcunzwM=7K21Sk=VKrKfwVzWbAvONBK1nZAYqP-zihsg@mail.gmail.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
 <bbeec978-75ea-5fb9-dfc4-aaa35e7ae510@oracle.com>
 <b0ff5779-b9ba-731f-49d7-a920164c5054@gmail.com>
 <CAKxGtTWcunzwM=7K21Sk=VKrKfwVzWbAvONBK1nZAYqP-zihsg@mail.gmail.com>
Message-ID: <3fe9410d-2d20-c26c-a62e-cf9ad47b529a@oracle.com>

Hi Francesco,

Thanks for the pointer to AtomicBuffer stuff. It's quite interesting.

I don't know how directly relevant this JEP is to your work. I guess that's really 
up to you and possibly Andrew Dinn. However, in my thinking, if you have useful 
comments and relevant questions, you're certainly welcome to participate in the 
discussion.

Looking at the AtomicBuffer interface, I see that it supports reading and 
writing of a variety of data items, with a few different memory access modes. 
That reminds me of the VarHandles API. [1] This enables quite a number of 
different operations on a data item somewhere in memory, with a variety of 
memory access modes. What would AtomicBuffer look like if it were to use 
VarHandles? Or would AtomicBuffer be necessary at all if the rest of the library 
were to use VarHandles?

Note that a VarHandle can be used to access an arbitrary item within a region of 
memory, such as an array or a ByteBuffer.[2] An obvious extension to VarHandle 
is to allow a long offset, not just an int offset.

Note also that while many VarHandle methods return Object and take a varargs 
parameter of Object..., this does not imply that primitives are boxed! This is a 
bit of VM magic called "signature polymorphism"; see JVMS 2.9.3 [3].
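To see signature polymorphism at work, here is a small sketch using `MethodHandle.invokeExact`, which is declared as `Object invokeExact(Object...)` in the same way as many VarHandle methods. The `(int)` cast makes javac record the call-site descriptor `(II)I`, so no array is built and nothing is boxed; without the cast the descriptor becomes `(II)Object`, which `invokeExact` rejects. The class name is illustrative.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

// Illustrative class name; demonstrates call-site descriptors, not VarHandle itself.
class SignaturePolymorphismDemo {
    static MethodHandle maxHandle() throws ReflectiveOperationException {
        // Handle for Math.max(int, int), whose type is (int,int)int.
        return MethodHandles.lookup().findStatic(
                Math.class, "max", MethodType.methodType(int.class, int.class, int.class));
    }

    static int maxViaHandle(int a, int b) throws Throwable {
        // Because of the (int) cast, the call site is compiled as (II)I:
        // two ints are passed directly and an int comes back.
        return (int) maxHandle().invokeExact(a, b);
    }

    static boolean mismatchRejected() throws Throwable {
        try {
            // No cast: the descriptor is (II)Object, which does not match the
            // handle's exact type, so invokeExact throws.
            Object unused = maxHandle().invokeExact(3, 7);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(maxViaHandle(3, 7) + " " + mismatchRejected()); // prints: 7 true
    }
}
```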

s'marks

[1] 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/invoke/VarHandle.html

[2] 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/invoke/MethodHandles.html#byteBufferViewVarHandle(java.lang.Class,java.nio.ByteOrder)

[3] https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-2.html#jvms-2.9.3

On 9/28/18 12:38 AM, Francesco Nigro wrote:
> Hi guys!
>
> I'm one of the developers mentioned (like many others) who use external (and 
> unsafe) APIs to access ByteBuffer contents concurrently, and I develop a 
> messaging broker's journal
> that would benefit from this JEP :)
> Regarding a concurrent access API, how does this one look?
> https://github.com/real-logic/agrona/blob/master/agrona/src/main/java/org/agrona/concurrent/AtomicBuffer.java
>
> note:
> I'm not sure how it's viewed when someone joins these discussions without 
> introducing themselves, and I hope I'm not off-topic, but both this JEP and the 
> surrounding comments are so interesting
> that I couldn't resist. I apologize if I'm breaking some rule by doing so.
>
> Thanks for the hard work,
> Francesco
>
> On Fri, 28 Sep 2018 at 09:21, Peter Levart <peter.levart at gmail.com 
> <mailto:peter.levart at gmail.com>> wrote:
>
>     Hi Stuart,
>
>     I mostly agree with your assessment about the suitability of the
>     ByteBuffer API for nice multithreaded use. What would such API look like?
>     I think pretty much like ByteBuffer but without things that mutate
>     mark/position/limit/ByteOrder. A stripped-down ByteBuffer API therefore.
>     That would be in my opinion the most low-level API possible. If you add
>     things to such an API that coordinate multithreaded access to the underlying
>     memory, you are already creating a concurrent data structure for a
>     particular set of use cases, which might not cover all possible use cases
>     or be sub-optimal for some of them. So I think this is better layered on
>     top of such an API, not built into it. Low-level multithreaded access to
>     memory is, in my opinion, always going to be "unsafe" from the standpoint
>     of coordination. It's not only the mark/position/limit/ByteOrder that is
>     not multithreaded-friendly about ByteBuffer API, but the underlying memory
>     too. It would be nice if mark/position/limit/ByteOrder weren't in the way
>     though.
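A rough sketch, under stated assumptions, of the stripped-down API described above: absolute accessors only, with no mark, position, limit, or mutable byte order. `FixedRegion` and `ByteBufferRegion` are hypothetical names, and the adapter deliberately adds no synchronization, leaving coordination to structures layered on top.

```java
import java.nio.ByteBuffer;

/** Hypothetical stripped-down API: absolute access only, no mark/position/limit/order. */
interface FixedRegion {
    int capacity();
    byte get(int index);
    void put(int index, byte value);
    int getInt(int index);
    void putInt(int index, int value);
}

/** Hypothetical adapter over a private heap ByteBuffer that is only accessed absolutely. */
final class ByteBufferRegion implements FixedRegion {
    // Never exposed, so nobody can move its position/limit or change its order;
    // multi-byte values therefore always use the default big-endian order.
    private final ByteBuffer bb;

    ByteBufferRegion(int capacity) {
        this.bb = ByteBuffer.allocate(capacity);
    }

    @Override public int capacity() { return bb.capacity(); }
    @Override public byte get(int index) { return bb.get(index); }
    @Override public void put(int index, byte value) { bb.put(index, value); }
    @Override public int getInt(int index) { return bb.getInt(index); }
    @Override public void putInt(int index, int value) { bb.putInt(index, value); }
    // No locking here: coordinating threads is left to code built on top.

    public static void main(String[] args) {
        FixedRegion r = new ByteBufferRegion(8);
        r.putInt(0, 0x01020304);
        System.out.println(r.get(0) + " " + r.get(3)); // prints: 1 4
    }
}
```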
>
>     Regards, Peter
>
>
>     On 09/28/2018 07:51 AM, Stuart Marks wrote:
>>     Hi Andrew,
>>
>>     Let me first say that this issue of "ByteBuffer might not be the right
>>     answer" is something of a digression from the JEP discussion. I think the
>>     JEP should proceed forward using MBB with the API that you and Alan had
>>     discussed previously. At most, the discussion of the "right thing" issue
>>     might affect a side note in the JEP text about possible limitations and
>>     future directions of this effort. However, it's not a blocker to the JEP
>>     making progress as far as I'm concerned.
>>
>>     With that in mind, I'll discuss the issue of multithreaded access to
>>     ByteBuffers and how this bears on whether buffers are or aren't the
>>     "right answer." There are actually several issues that figure into the
>>     "right answer" analysis. In this message, though, I'll just focus on the
>>     issue of multithreaded access.
>>
>>     To recap (possibly for the benefit of other readers) the Buffer class doc
>>     has the following statement:
>>
>>         Buffers are not safe for use by multiple concurrent threads. If a buffer
>>         is to be used by more than one thread then access to the buffer should be
>>         controlled by appropriate synchronization.
>>
>>     Buffers are primarily designed for sequential operations such as I/O or
>>     codeset conversion. Typical buffer operations set the mark, position, and
>>     limit before initiating the operation. If the operation completes
>>     partially -- not uncommon with I/O or codeset conversion -- the position
>>     is updated so that the operation can be resumed easily from where it left
>>     off.
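As an illustration of an operation that completes partially and is then resumed, the sketch below (class and method names hypothetical) encodes text into a deliberately small output buffer with `CharsetEncoder`. Each `encode` call that returns OVERFLOW leaves the input's position at the resume point, so the caller drains the output and calls again. It assumes ASCII input so that each byte can be shown as a character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ASCII-only input assumed for readable output.
class PartialEncodeDemo {
    /** Returns the encoded text, then "|", then the input position after each call. */
    static String encodeInChunks(String text, int outCapacity) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(outCapacity); // deliberately small
        StringBuilder drained = new StringBuilder();
        StringBuilder resumePoints = new StringBuilder();
        CoderResult result;
        do {
            result = encoder.encode(in, out, true);
            // The input position records where the next call will resume.
            resumePoints.append(in.position()).append(' ');
            out.flip();                           // switch to draining
            while (out.hasRemaining()) {
                drained.append((char) out.get()); // fine for ASCII bytes
            }
            out.clear();                          // room for the next chunk
        } while (result.isOverflow());
        encoder.flush(out); // final step of the encoding protocol
        return drained + "|" + resumePoints.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(encodeInChunks("hello", 3)); // prints: hello|3 5
    }
}
```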
>>
>>     The fact that buffers not only contain the data being operated upon but
>>     also mutable state information such as mark/position/limit makes it
>>     difficult to have multiple threads operate on different parts of the same
>>     buffer. Each thread would have to lock around setting the position and
>>     limit and performing the operation, preventing any parallelism. The
>>     typical way to deal with this is to create multiple buffer slices, one
>>     per thread. Each slice has its own mark/position/limit values but shares
>>     the same backing data.
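A minimal sketch of the slice-per-thread approach, assuming a heap buffer split into fixed-size regions (class and method names are hypothetical). Each worker gets its own slice, with independent position and limit over the shared backing data, and `Thread.join()` provides the happens-before edge that makes the workers' writes visible to the coordinating thread.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class for the slice-per-thread pattern.
class SlicePerThreadDemo {
    static ByteBuffer fillWithSlices(int threads, int bytesPerThread) throws InterruptedException {
        ByteBuffer shared = ByteBuffer.allocate(threads * bytesPerThread);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            // A duplicate has its own position/limit; slice() then fixes this
            // thread's window so that it starts at index 0.
            ByteBuffer view = shared.duplicate();
            view.position(t * bytesPerThread);
            view.limit((t + 1) * bytesPerThread);
            ByteBuffer slice = view.slice();
            byte tag = (byte) t;
            workers[t] = new Thread(() -> {
                while (slice.hasRemaining()) {
                    slice.put(tag); // relative put moves only this slice's position
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // the worker's writes happen-before join() returns
        }
        return shared; // its own position and limit were never touched
    }

    public static void main(String[] args) throws InterruptedException {
        ByteBuffer result = fillWithSlices(4, 3);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < result.limit(); i++) {
            sb.append(result.get(i));
        }
        System.out.println(sb); // prints: 000111222333
    }
}
```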
>>
>>     We can avoid the need for this by adding absolute bulk operations, right?
>>
>>     Let's suppose we were to add something like this (considering ByteBuffer
>>     only, setting the buffer views aside):
>>
>>         get(int srcOff, byte[] dst, int dstOff, int length)
>>         put(int dstOff, byte[] src, int srcOff, int length)
>>
>>     Each thread can perform its operations on a different part of the buffer,
>>     in parallel, without interference from the others. Presumably these
>>     operations don't read or write the mark and position. Oh, wait. The
>>     existing absolute put and get overloads *do* respect the buffer's limit,
>>     so the absolute bulk operations ought to as well. This means they do
>>     depend on shared state. (I guess we could make the absolute bulk ops not
>>     respect the limit, but that seems inconsistent.)
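The proposed operations can be approximated with the existing API, which also makes the dependence on the limit explicit. The helpers below are hypothetical (they are not JDK methods). Each works on a private `duplicate()` so the caller's mark and position are untouched, and each checks the range against `limit()` to match the existing absolute `get(int)` and `put(int, byte)`. Note that `duplicate()` itself reads the position and limit, so this is only safe while no thread mutates them.

```java
import java.nio.ByteBuffer;

/** Hypothetical helpers approximating the proposed absolute bulk get/put; not JDK methods. */
final class AbsoluteBulk {
    private AbsoluteBulk() {}

    /** Copies length bytes starting at srcOff into dst; buf's mark and position are untouched. */
    static void get(ByteBuffer buf, int srcOff, byte[] dst, int dstOff, int length) {
        checkAgainstLimit(buf, srcOff, length);
        ByteBuffer view = buf.duplicate(); // private position; limit copied from buf
        view.position(srcOff);
        view.get(dst, dstOff, length);
    }

    /** Copies length bytes from src into buf at dstOff; buf's mark and position are untouched. */
    static void put(ByteBuffer buf, int dstOff, byte[] src, int srcOff, int length) {
        checkAgainstLimit(buf, dstOff, length);
        ByteBuffer view = buf.duplicate();
        view.position(dstOff);
        view.put(src, srcOff, length);
    }

    // Mirrors the existing absolute get(int)/put(int, byte), which throw
    // IndexOutOfBoundsException at or past the limit: this is the shared state.
    private static void checkAgainstLimit(ByteBuffer buf, int off, int length) {
        if (off < 0 || length < 0 || off > buf.limit() - length) {
            throw new IndexOutOfBoundsException(
                    "off=" + off + ", length=" + length + ", limit=" + buf.limit());
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        put(buf, 2, new byte[] {1, 2, 3}, 0, 3);
        byte[] out = new byte[3];
        get(buf, 2, out, 0, 3);
        // prints: 1,2,3 position=0
        System.out.println(out[0] + "," + out[1] + "," + out[2] + " position=" + buf.position());
    }
}
```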
>>
>>     OK, let's adopt an approach similar to what was described by Peter Levart
>>     a couple messages upthread, where a) there is an initialization step
>>     where various things including the limit are set properly; b) the buffer
>>     is published to the worker threads properly, e.g., using a lock or other
>>     suitable memory operation; and c) all worker threads agree only to use
>>     absolute operations and to avoid relative operations.
>>
>>     Now suppose the threads have completed their work and you want to, say,
>>     write the buffer's contents to a channel. You have to carefully make sure
>>     the threads are all finished and properly publish their results back to
>>     some central thread, have that central thread receive the results, set
>>     the position and limit, after which the central thread can initiate the
>>     I/O operation.
>>
>>     This can certainly be made to work.
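The phases (a) to (c) plus the hand-back can be sketched as follows, assuming an `ExecutorService` for the workers and an in-memory channel standing in for real I/O (`PhasedBufferDemo` is a hypothetical name). The executor's documented happens-before guarantees for task submission and `Future.get()` provide the publication steps; nothing in the buffer itself enforces the phases, which is exactly the fragility described next.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class for the phased, absolute-only convention.
class PhasedBufferDemo {
    static byte[] run(int workers, int bytesPerWorker) throws Exception {
        int used = workers * bytesPerWorker;
        // (a) Initialization: a single thread sets up the buffer, including the limit.
        ByteBuffer buf = ByteBuffer.allocate(used + 16);
        buf.limit(used);

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                int base = w * bytesPerWorker;
                byte tag = (byte) ('a' + w);
                // (b) Publication: actions before submit() happen-before the task runs.
                // (c) Workers use absolute puts only (no position/limit/mark changes).
                results.add(pool.submit(() -> {
                    for (int i = 0; i < bytesPerWorker; i++) {
                        buf.put(base + i, tag);
                    }
                }));
            }
            // Hand-back: each task's writes happen-before its Future.get() returns.
            for (Future<?> f : results) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }

        // Central thread only: relative state may be used again.
        buf.position(0); // limit already covers the filled region, set in (a)
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel ch = Channels.newChannel(sink)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new String(run(3, 2), StandardCharsets.US_ASCII)); // prints: aabbcc
    }
}
```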
>>
>>     But note what we just did. We now have an API where:
>>
>>      - there are different "phases", where in one phase all the methods work,
>>     but in another phase only certain methods work (otherwise it breaks
>>     silently);
>>
>>      - you have to carefully control all the code to ensure that the wrong
>>     methods aren't called when the buffer is in the wrong phase (otherwise it
>>     breaks silently); and
>>
>>      - you can't hand off the buffer to a library (3rd party or JDK) without
>>     carefully orchestrating a transition into the right phase (otherwise it
>>     breaks silently).
>>
>>     Frankly, this is pretty crappy. It's certainly possible to work around
>>     it. People do, and it is painful, and they complain about it up and down
>>     all day long (and rightfully so).
>>
>>     Note that this discussion is based primarily on looking at the ByteBuffer
>>     API. I have not done extensive investigation of the impact of the various
>>     buffer views (IntBuffer, LongBuffer, etc.), nor have I looked thoroughly
>>     at the implementations. I have no doubt that we will run into additional
>>     issues when we do those investigations.
>>
>>     If we were designing an API to support multi-threaded access to memory
>>     regions, it would almost certainly look nothing like the buffer API. This
>>     is what Alan means by "buffers might not be the right answer." As things
>>     stand, it appears quite difficult to me to fix the multi-threaded access
>>     problem without turning buffers into something they aren't, or
>>     fragmenting the API in some complex and uncomfortable way.
>>
>>     Finally, note that this is not an argument against adding bulk absolute
>>     operations! I think we should probably go ahead and do that anyway. But
>>     let's not fool ourselves into thinking that bulk absolute operations
>>     solve the multi-threaded buffer access problem.
>>
>>     s'marks
>>
>

From vladimir.kozlov at oracle.com  Fri Sep 28 23:22:08 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 28 Sep 2018 16:22:08 -0700
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <dk6o9ci56uu.fsf@rwestrel.remote.csb>
References: <dk636u68t4l.fsf@rwestrel.remote.csb>
 <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <dk67ejaj7lk.fsf@rwestrel.remote.csb>
 <b3329405-1736-5235-b0a2-0962ae7789fa@oracle.com>
 <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
 <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>
 <dk6a7o36nz8.fsf@rwestrel.remote.csb>
 <39b8c3c2-d412-06fa-9162-1d80f067ed11@oracle.com>
 <dk6o9ci56uu.fsf@rwestrel.remote.csb>
Message-ID: <fa0bcfad-eaa2-2b46-217a-2da8cb163086@oracle.com>

This version passed clean!

Thanks,
Vladimir

On 9/28/18 2:06 AM, Roland Westrelin wrote:
> 
>> The gc/epsilon/TestManyThreads.java test failed on SPARC.
>> I added the information and the replay file to the bug report.
> 
> Thanks for the test result. The fix is:
> 
> diff --git a/src/hotspot/share/opto/arraycopynode.cpp b/src/hotspot/share/opto/arraycopynode.cpp
> --- a/src/hotspot/share/opto/arraycopynode.cpp
> +++ b/src/hotspot/share/opto/arraycopynode.cpp
> @@ -422,7 +422,8 @@
>       Node *start_mem_dest = mm->memory_at(alias_idx_dest);
>       Node* mem = start_mem_dest;
>   
> -    assert(copy_type != T_OBJECT, "only tightly coupled allocations for object arrays");
> +    BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
> +    assert(copy_type != T_OBJECT || !bs->array_copy_requires_gc_barriers(false, T_OBJECT, false, BarrierSetC2::Optimization), "only tightly coupled allocations for object arrays");
>       bool same_alias = (alias_idx_src == alias_idx_dest);
>   
>       if (count > 0) {
> 
> 
> New webrev:
> 
> http://cr.openjdk.java.net/~roland/8210887/webrev.02/
> 
> Roland.
> 

From Alan.Bateman at oracle.com  Sun Sep 30 15:31:03 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Sun, 30 Sep 2018 16:31:03 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over
 non-volatile memory
In-Reply-To: <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <c31673fd-133f-56e7-2e82-77e5109be9fc@oracle.com>
 <e7a20676-e76e-27c6-a770-44382d4d77ad@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <ba59cf3a-1b07-1530-79d1-d7f22a1cb900@redhat.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <ad813d34-3ebd-6f66-332a-06b9446367c0@redhat.com>
 <c4fdca2f-7c6a-7429-6b1a-efdff2994bf6@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <df03ca64-d68a-ca8b-5309-395672ac7dcb@redhat.com>
Message-ID: <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com>

On 26/09/2018 14:27, Andrew Dinn wrote:
> :
> I'm not clear why we should only use one flag. The two flags I specified
> reflect two independent use cases, one where data stored in an NVM
> device is accessed read-only and another where it is accessed
> read-write. Are you suggesting that the read-only case is redundant? I'm
> not sure I agree. For example, a utility which might want to review the
> state of persistent data while a service is off-line would really want
> to pass flag READ_ONLY_PERSISTENT. Of course, it could employ
> READ_WRITE_PERSISTENT (or equivalently, SYNC) and just not write the
> data but, mutatis mutandis, that same argument would remove the case for
> flag READ_ONLY.
>
I'm wrong on this point. The map method takes a single MapMode, not a set of 
modes as I was assuming, so you are right that it needs two new modes, 
not one. I do think we should revisit the name, though, as the native 
flag is MAP_SYNC.
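For reference, `FileChannel.map` has the shape `map(MapMode mode, long position, long size)`, so each mapping is requested with exactly one mode value. A small sketch against an ordinary temporary file (class name hypothetical; the persistent modes under discussion are not used here):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; maps an ordinary file, not an NVM device.
class SingleMapModeDemo {
    static byte firstByteViaMap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // One MapMode per call: access kind and any other property must be
            // folded into that single value rather than combined as flags.
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return mbb.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("map-demo", ".bin");
        tmp.toFile().deleteOnExit(); // mapped files may not be deletable immediately
        Files.write(tmp, new byte[] {42, 7});
        System.out.println(firstByteViaMap(tmp)); // prints: 42
    }
}
```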

-Alan