From Pengfei.Li at arm.com Mon Sep 3 05:49:46 2018
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 3 Sep 2018 05:49:46 +0000
Subject: RFR(S): 8210152: Optimize integer divisible by power-of-2 check
In-Reply-To: <97407ba1-7aec-0893-b540-7e1472ce9529@oracle.com>
References: <97407ba1-7aec-0893-b540-7e1472ce9529@oracle.com>
Message-ID:

Hi Vladimir, Dean,

Thanks for your review.

> I don't see where negation is coming from for 'X % 2 == 0' expression.
> It should be only 2 instructions: 'cmp (X and 1), 0'

The 'cmp (X and 1), 0' is just what we expected. But there is a redundant
conditional negation coming from the handling of a possibly negative X in
"X % 2". For instance, with X = -5, "X % 2" should be -1. So the
"(X and 1)" operation alone is not enough; we have to negate the result.

> I will look on it next week. But it would be nice if you can provide
> small test to show this issue.

I've already provided a case of "if (a%2 == 0) { ... }" in the JBS
description. The code that is generated and what can be optimized are
listed there. See https://bugs.openjdk.java.net/browse/JDK-8210152 for
details. You can also see the test case for this optimization that I
attached below.

> It looks like your matching may allow more patterns than expected. I was
> expecting it to look for < 0 or >= 0 for the conditional negation, but I
> don't see it.

Yes. I didn't limit the if condition to < 0 or >= 0, so it will match
more patterns. But nothing goes wrong if this ideal transformation
applies to more cases. In pseudo code, if someone writes:

  if ( some_condition ) {
    x = -x;
  }
  if ( x == 0 ) {
    do_something();
  }

The negation in the first if-clause can always be eliminated, whatever
the condition is.
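The argument above can be checked with a small sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, and remainderWithNegation is a hand-written model of "mask, then conditionally negate", not the code C2 actually emits. It compares that model against Java's % operator, and the mask-only zero check against "x % 4 == 0".

```java
final class PowerOfTwoDivisibility {
    // Hypothetical model of a signed remainder by a power of two:
    // mask the low bits of the magnitude, then negate for negative x.
    // mask is (divisor - 1), e.g. 3 for a divisor of 4.
    static int remainderWithNegation(int x, int mask) {
        int r = (x < 0 ? -x : x) & mask;
        return x < 0 ? -r : r;
    }

    // Zero check without any negation: only the low bits are inspected.
    static boolean divisibleMaskOnly(int x, int mask) {
        return (x & mask) == 0;
    }

    public static void main(String[] args) {
        // Same dividends as in the attached test case.
        int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
        for (int x : dividends) {
            boolean sameRemainder = remainderWithNegation(x, 3) == x % 4;
            boolean sameZeroCheck = divisibleMaskOnly(x, 3) == (x % 4 == 0);
            System.out.println(x + ": remainder model matches = " + sameRemainder
                    + ", mask-only zero check matches = " + sameZeroCheck);
        }
    }
}
```

The negation only changes the sign of the remainder, and a value is zero exactly when its negation is zero, so the zero check does not need it.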
--
Thanks,
Pengfei

-- my test case attached below --

public class Foo {

  public static void main(String[] args) {
    int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
    for (int i = 0; i < dividends.length; i++) {
      int x = dividends[i];
      System.out.println(testDivisible(x));
      System.out.println(testModulo(x));
      testCondNeg(x);
    }
    return;
  }

  public static int testDivisible(int x) {
    // Modulo result is only for zero check
    if (x % 4 == 0) {
      return 444;
    }
    return 555;
  }

  public static int testModulo(int x) {
    int y = x % 4;
    if (y == 0) {
      return 222;
    }
    // Modulo result is used elsewhere
    System.out.println(y);
    return 333;
  }

  public static void testCondNeg(int x) {
    // Pure conditional negation
    if (printAndIfNeg(x)) {
      x = -x;
    }
    if (x == 0) {
      System.out.println("zero!");
    }
  }

  static boolean printAndIfNeg(int x) {
    System.out.println(x);
    return x <= 0;
  }
}

From rwestrel at redhat.com Mon Sep 3 07:21:16 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 03 Sep 2018 09:21:16 +0200
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com>
Message-ID:

Hi Vladimir,

Thanks for the review. Here is a new webrev that should address your
comment.

http://cr.openjdk.java.net/~roland/8209544/webrev.01/

Roland.

From erik.osterlund at oracle.com Mon Sep 3 08:37:04 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 3 Sep 2018 10:37:04 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2
In-Reply-To:
References: <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com> <5B86B7CE.3030507@oracle.com> <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com> <5B86BBB6.7000401@oracle.com> <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com> <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com>
Message-ID: <5B8CF2B0.2060509@oracle.com>

Hi Roland,

First of all, I apologize for getting your name wrong in the last email.

On 2018-08-31 16:46, Roland Westrelin wrote:
>> Well... C1 uses CAS in the heap only for the Unsafe CAS intrinsic,
>> which is indeed inserted at parse time. And all other GCs alter the
>> CFG for the GC barriers in their CAS barriers, using LIR. Except
>> Epsilon I suppose.
> Are you talking about for instance G1BarrierSetC1::pre_barrier()? That
> method adds control flow within a basic block. It doesn't hack the CFG
> (it doesn't add new basic blocks). How can the register allocator
> compute liveness without a correct CFG? Either
> G1BarrierSetC1::pre_barrier() is a simple enough case that register
> allocation is correct or there are some nasty bugs in there. In any
> case, building control flow within a block like
> G1BarrierSetC1::pre_barrier() does is an ugly hack. Doing anything more
> complicated that way is asking for trouble.

The C1 basic blocks are built and optimized as part of the HIR and are
not to be changed after that. Once the HIR is generated, the LIR inserts
operations required for lowering this optimized HIR to machine code.
After IR::compute_code() of the HIR, those basic blocks are set in
stone. That means that any control flow alterations needed by the
LIRGenerator, which comes into play after that, are going to use branches
within the HIR basic block instead (as we promised not to change the HIR
basic blocks after the HIR is built and optimized). I can see how that
might feel like a hack, but that is kind of the way that things are
currently done in C1. It is used this way for all barrier sets today
(UseCondCardMark for card marking GCs, for G1, ZGC), and it's also used
by T_BOOLEAN normalization, switch statements, checking for referents in
unsafe intrinsics etc. I suppose the stubs inserted at the LIR level
also similarly break the basic block abstraction of the HIR level. These
are things that can of course be changed into a more strict basic block
model even at the LIR level. But I don't know how much that would help
given that this is just the pass before lowering to machine code. But
that is a whole different discussion.

I do not propose to move the GC barriers into the HIR - it is too early.
I propose to insert them at the LIR level like all the other GCs, in a
similar way to all the other GCs, using the same mechanisms used by all
the other GCs.

@Roman: If you feel more comfortable using your own LIR_Op with your own
lowering or stubs instead because you want this written in assembly for
whatever reason, then I am fine with that too as long as it is contained
in the shenandoah folders. What I do have reservations against is
changing the API that everybody else uses so that the LIRGenerator raw
CAS gets lowered into a non-raw Access call to the macro assembler,
passing temporary registers used by Shenandoah from above into the raw
CAS used by the non-raw macro assembler access CAS.

For example, in ZGC we have a LIR_Op class, LIR_OpZLoadBarrierTest,
defined in zBarrierSetC1.cpp, which allows us to do custom machine
dependent lowering of the test itself, and which can be inserted into
the LIR list.

I hope we are on the same page here!

Thanks,
/Erik

> Roland.

From rwestrel at redhat.com Mon Sep 3 08:41:21 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 03 Sep 2018 10:41:21 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2
In-Reply-To: <5B8CF2B0.2060509@oracle.com>
References: <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com> <5B86B7CE.3030507@oracle.com> <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com> <5B86BBB6.7000401@oracle.com> <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com> <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com> <5B8CF2B0.2060509@oracle.com>
Message-ID:

Hi Erik,

> The C1 basic blocks are built and optimized as part of the HIR and are
> not to be changed after that. Once the HIR is generated, the LIR inserts
> operations required for lowering this optimized HIR to machine code.
> After IR::compute_code() of the HIR, those basic blocks are set in
> stone. That means that any control flow alterations needed by the
> LIRGenerator, which comes into play after that, are going to use branches
> within the HIR basic block instead (as we promised not to change the HIR
> basic blocks after the HIR is built and optimized). I can see how that
> might feel like a hack, but that is kind of the way that things are
> currently done in C1. It is used this way for all barrier sets today
> (UseCondCardMark for card marking GCs, for G1, ZGC), and it's also used
> by T_BOOLEAN normalization, switch statements, checking for referents in
> unsafe intrinsics etc. I suppose the stubs inserted at the LIR level
> also similarly break the basic block abstraction of the HIR level. These
> are things that can of course be changed into a more strict basic block
> model even at the LIR level. But I don't know how much that would help
> given that this is just the pass before lowering to machine code. But
> that is a whole different discussion.

Adding a loop within a basic block is simply not possible. The register
allocator won't know it's a loop and has no way to know operands are
live across iterations. So it's not like we even have a choice.

Roland.

From erik.osterlund at oracle.com Mon Sep 3 08:54:34 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 3 Sep 2018 10:54:34 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2
In-Reply-To:
References: <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com> <5B86B7CE.3030507@oracle.com> <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com> <5B86BBB6.7000401@oracle.com> <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com> <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com> <5B8CF2B0.2060509@oracle.com>
Message-ID: <5B8CF6CA.1010600@oracle.com>

Hi Roman,

Who would clobber those registers between iterations though in your
tight loop?

/Erik

From rwestrel at redhat.com Mon Sep 3 08:58:00 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 03 Sep 2018 10:58:00 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2
In-Reply-To: <5B8CF6CA.1010600@oracle.com>
References: <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com> <5B86B7CE.3030507@oracle.com> <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com> <5B86BBB6.7000401@oracle.com> <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com> <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com> <5B8CF2B0.2060509@oracle.com> <5B8CF6CA.1010600@oracle.com>
Message-ID:

> Who would clobber those registers between iterations though in your
> tight loop?

Ignoring cas, but with a simple example:

  input = 0;
loop_entry:
  input++;
  array[i] = input;
  // some other code
  goto loop_entry;

input is live across iterations, but given that the loop is hidden in a
basic block, the register allocator expects it to be live only from its
initialization to the store in the array. So it's free to assign a
register to input and reuse that register in whatever code is in the
rest of the loop body.

Roland.

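Roland's example can be reproduced with a toy liveness computation. This is only a sketch under stated assumptions: it is a textbook backward dataflow over a five-instruction list, not C1's linear scan allocator, every name in it is hypothetical, and "some other code" is modelled as a write to a variable called tmp. It computes which variables are live after each instruction, once with the back edge hidden (as when the loop sits inside one basic block) and once with it visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class HiddenLoopLiveness {
    // One pseudo-instruction. jumpTarget is the index of an unconditional
    // jump target, or -1 when control simply falls through.
    static final class Instr {
        final String text;
        final Set<String> defs;
        final Set<String> uses;
        final int jumpTarget;

        Instr(String text, Set<String> defs, Set<String> uses, int jumpTarget) {
            this.text = text;
            this.defs = defs;
            this.uses = uses;
            this.jumpTarget = jumpTarget;
        }
    }

    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // The example from the mail; index 1 is loop_entry.
    static List<Instr> example() {
        return Arrays.asList(
            new Instr("input = 0", vars("input"), vars(), -1),
            new Instr("input++", vars("input"), vars("input"), -1),
            new Instr("array[i] = input", vars(), vars("array", "i", "input"), -1),
            new Instr("tmp = ... (some other code)", vars("tmp"), vars(), -1),
            new Instr("goto loop_entry", vars(), vars(), 1));
    }

    // Backward liveness iterated to a fixed point. With seeBackEdges ==
    // false the jump is treated as the end of straight-line code.
    static List<Set<String>> liveOut(List<Instr> code, boolean seeBackEdges) {
        int n = code.size();
        List<Set<String>> liveIn = new ArrayList<>();
        List<Set<String>> liveOut = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            liveIn.add(new HashSet<>());
            liveOut.add(new HashSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = n - 1; i >= 0; i--) {
                Instr ins = code.get(i);
                Set<String> out = new HashSet<>();
                if (ins.jumpTarget >= 0) {
                    if (seeBackEdges) {
                        out.addAll(liveIn.get(ins.jumpTarget));
                    }
                } else if (i + 1 < n) {
                    out.addAll(liveIn.get(i + 1));
                }
                Set<String> in = new HashSet<>(out);
                in.removeAll(ins.defs);
                in.addAll(ins.uses);
                if (!out.equals(liveOut.get(i)) || !in.equals(liveIn.get(i))) {
                    liveOut.set(i, out);
                    liveIn.set(i, in);
                    changed = true;
                }
            }
        }
        return liveOut;
    }

    public static void main(String[] args) {
        List<Instr> code = example();
        List<Set<String>> hidden = liveOut(code, false);
        List<Set<String>> visible = liveOut(code, true);
        for (int i = 0; i < code.size(); i++) {
            System.out.println(code.get(i).text
                + " | input live afterwards, loop hidden: " + hidden.get(i).contains("input")
                + ", loop visible: " + visible.get(i).contains("input"));
        }
    }
}
```

In this model, with the back edge hidden, input is no longer live after the array store, so an allocator would be free to hand its register to tmp. With the back edge visible, input stays live through the rest of the loop body.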
From rkennke at redhat.com Mon Sep 3 08:59:13 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 03 Sep 2018 10:59:13 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2
In-Reply-To: <5B8CF2B0.2060509@oracle.com>
References: <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com> <5B86B7CE.3030507@oracle.com> <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com> <5B86BBB6.7000401@oracle.com> <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com> <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com> <5B8CF2B0.2060509@oracle.com>
Message-ID:

I wasn't sure that the BarrierSetC1 interface allows defining custom
ops. This sounds like a good natural solution. Ditto for C2. Let's see
if we can make that work.

Roman

From erik.osterlund at oracle.com Mon Sep 3 09:25:12 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 3 Sep 2018 11:25:12 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2
In-Reply-To:
References: <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com> <5B86B7CE.3030507@oracle.com> <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com> <5B86BBB6.7000401@oracle.com> <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com> <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com> <5B8CF2B0.2060509@oracle.com>
Message-ID: <5B8CFDF8.9040700@oracle.com>

Hi Roman,

It did not use to be possible, as it needed its own enum switches all
over the place. But as part of my C1 barrier set interface work, I
wanted to make it possible to make your own LIR_Ops in the barrier set
as well without cluttering the switches, and inserted appropriate
virtual calls to the LIR_Ops allowing you to do that. Now, basically, if
your LIR_Op id is lir_none (which the default constructor sets it to),
then it will use virtual calls into your LIR_Op in the switch
statements.

I see how inserting LIR loops in the HIR basic block in the general case
can go horribly wrong, as Roland showed in his example. So if you feel
that defining your own LIR_Op and lowering it in your barrier set is the
more natural solution for Shenandoah, you can use that mechanism of
course.

It sounds like we have reached an agreement?

Thanks,
/Erik

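The dispatch rule Erik describes (an op whose code is lir_none is handled through virtual calls instead of a case in the shared switch) can be pictured with a small model. This is a sketch only: it is written in Java rather than HotSpot's C++, and Code, Op, LoadBarrierTestOp and lower are hypothetical stand-ins, not the actual C1 classes or their signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CustomOpDispatch {
    // Hypothetical opcode enum; NONE plays the role of lir_none.
    enum Code { NONE, MOVE, ADD }

    // Hypothetical base class: the no-argument constructor leaves the
    // code at NONE, so subclasses are routed to emitCode() by default.
    static class Op {
        final Code code;

        Op() {
            this(Code.NONE);
        }

        Op(Code code) {
            this.code = code;
        }

        // Hook for ops that bring their own lowering.
        void emitCode(List<String> out) {
            throw new IllegalStateException("no custom lowering for " + code);
        }
    }

    // An op defined by one barrier set, unknown to the shared switch.
    static final class LoadBarrierTestOp extends Op {
        @Override
        void emitCode(List<String> out) {
            out.add("custom: load barrier test");
        }
    }

    // Shared lowering: built-in codes are handled inline, NONE defers
    // to the op itself through a virtual call.
    static void lower(Op op, List<String> out) {
        switch (op.code) {
            case MOVE:
                out.add("builtin: move");
                break;
            case ADD:
                out.add("builtin: add");
                break;
            case NONE:
                op.emitCode(out);
                break;
        }
    }

    static List<String> lowerAll(List<Op> ops) {
        List<String> out = new ArrayList<>();
        for (Op op : ops) {
            lower(op, out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Op> ops = Arrays.asList(
            new Op(Code.MOVE), new LoadBarrierTestOp(), new Op(Code.ADD));
        System.out.println(lowerAll(ops));
    }
}
```

The point of the pattern is that a new op type needs no new enum value and no new case in the shared switch; it only overrides the hook.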
From rkennke at redhat.com Mon Sep 3 09:57:51 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 3 Sep 2018 11:57:51 +0200
Subject: RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2
In-Reply-To: <5B8CFDF8.9040700@oracle.com>
References: <55efde87-1da8-e644-f45d-51dfbb3b934c@redhat.com> <5B86B7CE.3030507@oracle.com> <9c22a683-9065-6a2e-8f7c-202cc30cb378@redhat.com> <5B86BBB6.7000401@oracle.com> <026c2eef-fa68-cc74-0763-c3c54018342a@redhat.com> <4D4C2A84-7FB2-4E51-8EAF-51EBF73D0F14@oracle.com> <5B8CF2B0.2060509@oracle.com> <5B8CFDF8.9040700@oracle.com>
Message-ID: <40ef7c2e-faad-ce6e-3fda-3e1c66aaf517@redhat.com>

Hi Erik,

> It did not use to be possible as it needed its own enum switches all
> over the place. But as part of my C1 barrier set interface work, I
> wanted to make it possible to make your own LIR_Ops in the barrier set
> as well without cluttering the switches and inserted appropriate virtual
> calls to the LIR_Ops allowing you to do that. Now, basically, if your
> LIR_Op id is lir_none (which the default constructor sets it to), then
> it will use virtual calls into your LIR_Op in the switch statements.
>
> I see how inserting LIR loops in the HIR basic block in the general case
> can go horribly wrong as Roland showed in his example. So if you feel
> like defining your own LIR_Op and lower it in your barrier set is the
> more natural solution for Shenandoah, you can use that mechanism of course.
>
> It sounds like we have reached an agreement?

I think so, at least for now. We'll try to turn our cmpxchg-oop problem
into a LIR_Op and a C2 node and see how that goes.

I withdraw this RFR.

Thanks a lot,
Roman

From goetz.lindenmaier at sap.com Mon Sep 3 12:27:56 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 3 Sep 2018 12:27:56 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <346da54af45243c4bdaf475f118a450d@sap.com>
References: <346da54af45243c4bdaf475f118a450d@sap.com>
Message-ID: <9553d65d98f74f37a35b49a1e39f015e@sap.com>

Hi Michihiro,

I had a look at your change.

First, this should have been reviewed on hotspot-compiler-dev. It is
clearly a compiler change.
http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is
for "Technical discussion about the development of the HotSpot virtual
machine that's not specific to any particular component" while
hotspot-compiler-dev is for "Technical discussion about the development
of the HotSpot bytecode compilers".

Also, I can not find all of the mail traffic in
http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
Is this a problem of the pipermail server?

For some reason this webrev lacks the links to browse the diffs.
Do you need to use a more recent webrev? You can obtain it with
hg clone http://hg.openjdk.java.net/code-tools/webrev/ .

Why do you rename vnoreg to vnoregi?

Besides that the change is fine, thanks for implementing this!

Best regards,
Goetz.

> -----Original Message-----
> From: Doerr, Martin
> Sent: Tuesday, 28 August 2018 19:35
> To: Gustavo Romero; Michihiro Horie
> Cc: Lindenmaier, Goetz; hotspot-dev at openjdk.java.net;
> ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>
> Hi Michihiro,
>
> thank you for implementing it. I have just taken a first look at your
> webrev.01.
>
> It looks basically good. Only the Power version check seems to be incorrect.
> VM_Version::has_popcntb() checks for Power5.
> I believe most instructions are available with Power7.
> Some (vsubudm, ..., vmuluwm, vpopcntw) were introduced with Power8?
> We should check this carefully.
>
> Also, the indentation in register_ppc.hpp could be improved.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero
> Sent: Thursday, 26 July 2018 16:02
> To: Michihiro Horie
> Cc: Lindenmaier, Goetz; hotspot-dev at openjdk.java.net; Doerr, Martin;
> ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
> Hi Michi,
>
> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
> > I updated webrev:
> > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>
> Thanks for providing an updated webrev and for fixing indentation and
> function order in assembler_ppc.inline.hpp as well. I have no further
> comments :)
>
>
> Best Regards,
> Gustavo
>
> >
> > Best regards,
> > --
> > Michihiro,
> > IBM Research - Tokyo
> >
> > From: Gustavo Romero
> > To: Michihiro Horie/Japan/IBM at IBMJP,
> > ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> > Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
> > Date: 2018/07/25 23:05
> > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >
> > Hi Michi,
> >
> > On 07/25/2018 02:43 AM, Michihiro Horie wrote:
> > > Dear all,
> > >
> > > Would you review the following change?
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
> > > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
> > >
> > > This change adds support for vectorized arithmetic calculation with SLP.
> > >
> > > The to_vr function is added to convert VSR to VR. Currently, vecX is
> > > associated with a VSR class vs_reg that only defines VSR32-51 in
> > > ppc.ad, which exactly overlap with the VRs. Instruction APIs
> > > receiving VRs use to_vr via vecX. Another thing is the change in
> > > sqrtF_reg to enable the matching with SqrtVF. I think the change in
> > > sqrtF_reg would be fine due to ConvD2FNode::Value in
> > > convertnode.cpp.
> >
> > Looks good. Just a few comments:
> >
> > - In vmul4F_reg() would it be reasonable to use xvmulsp instead of
> >   vmaddfp in order to avoid the splat?
> >
> > - Although all instructions added by your change were introduced in
> >   ISA 2.06, so POWER7 and above are OK, as I see probes for
> >   PowerArchitecturePPC64=6|5 in vm_version_ppc.cpp (line 64), I'm
> >   wondering if there is any control point to guarantee that these
> >   instructions won't be emitted on a CPU that does not support them.
> >
> > - I think that in general strings in format %{} are in upper case.
For instance, > > this the current output on optoassembly for vmul4F: > > > > 2941835 5b4 ADDI R24, R24, #64 > > 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F > > 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > > > > I think it would be better to be in upper case instead. I also think that if > > the node match emits more than one instruction all instructions must be > listed > > in format %{}, since it's meant for detailed debugging. Finally I think it > > would be better to replace \t! by \t// in that string (unless I'm missing any > > special meaning for that char). So for vmul4F it would be something like: > > > > 2941835 5b4 ADDI R24, R24, #64 > > VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 > > 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F > > 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > > > > > > But feel free to change anything just after you get additional reviews :) > > > > > > > I confirmed this change with JTREG. In addition, I used attached micro > benchmarks. > > > /(See attached file: slp_microbench.zip)/ > > > > Thanks for sharing it. > > Btw, another option to host it would be in the CR > > server, in http://cr.openjdk.java.net/~mhorie/8208171 > > > > > > Best regards, > > Gustavo > > > > > > > > Best regards, > > > -- > > > Michihiro, > > > IBM Research - Tokyo > > > > > > > > > From gromero at linux.vnet.ibm.com Mon Sep 3 12:56:44 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 3 Sep 2018 09:56:44 -0300 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: <9553d65d98f74f37a35b49a1e39f015e@sap.com> References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> Message-ID: <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> Hi Goetz, On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: > Also, I can not find all of the mail traffic in > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. 
> Is this a problem of the pipermail server?
>
> For some reason this webrev lacks the links to browse the diffs.
> Do you need to use a more recent webrev? You can obtain it with
> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .

Yes, probably it was a problem of the pipermail server or of some relay. I
noticed the same thing, i.e. at least one of Michi's replies reached me but
missed a mailing list.

The initial discussion is here:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html

I understand Martin reviewed the last webrev in that thread, which is
http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)

Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html

and Michi's reply to Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, taken from
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).

and your last review:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html

HTH.

Best regards,
Gustavo

> Why do you rename vnoreg to vnoregi?
>
> Besides that the change is fine, thanks for implementing this!
>
> Best regards,
> Goetz.
>
>
>> -----Original Message-----
>> From: Doerr, Martin
>> Sent: Tuesday, 28 August 2018 19:35
>> To: Gustavo Romero ; Michihiro Horie
>> Cc: Lindenmaier, Goetz ; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Michihiro,
>>
>> thank you for implementing it. I have just taken a first look at your
>> webrev.01.
>>
>> It looks basically good. Only the Power version check seems to be incorrect.
>> VM_Version::has_popcntb() checks for Power5.
>> I believe most instructions are available with Power7.
>> Some (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>> Power8?
>> We should check this carefully.
>>
>> Also, indentation in register_ppc.hpp could be improved.
>>
>> Thanks and best regards,
>> Martin

From martin.doerr at sap.com Mon Sep 3 17:18:18 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 3 Sep 2018 17:18:18 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo and Michihiro,

we noticed jtreg test failures when using this change:
compiler/runtime/safepoints/TestRegisterRestoring.java
compiler/runtime/Test7196199.java

TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.

We didn't see it on all machines, so it might be an issue with saving and
restoring VR registers in the signal handler.
The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.

That's what I found out so far. Maybe you have an idea?

I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your
patch is applied. Looks like matching the vector nodes needs to be prevented.

Best regards,
Martin

From gromero at linux.vnet.ibm.com Mon Sep 3 22:15:23 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 3 Sep 2018 19:15:23 -0300
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com> <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com>
Message-ID:

Hi Vladimir,

Thanks a lot for reviewing it and for your comments.
On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
> Hi Gustavo,
>
> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with TieredStopAtLevel < 4 flag

Yes. Although, as far as I can see, all tests currently either enable C2
explicitly (for instance, by passing -Xcomp and -XX:-TieredCompilation) or
enable it implicitly through a warm-up before testing, I agree that nothing
forbids one from switching off C2 by setting TieredStopAtLevel to a level < 4
for a test case. It also looks better to list explicitly which compilers
support RTM instead of the ones that don't.

I've updated the webrev accordingly:

http://cr.openjdk.java.net/~gromero/8209972/v2/

The diff in there looks odd, so I generated another one with --patience for a
better (IMO) diff format:

http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff

> Also can platforms check be replaced with one vmRTMCPU()? Is ppc64 return true from vmRTMCPU()?

For example, on Linux the following cases are possible regarding CPU / OS RTM
support:

POWER7   : cpu = false, os = false        => vm.rtm.cpu = false
POWER8   : cpu = true,  os = false | true => vm.rtm.cpu = false | true
POWER9 VM: cpu = true,  os = false | true => vm.rtm.cpu = false | true
POWER9 NV: cpu = true,  os = false        => vm.rtm.cpu = false

PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support RTM.
In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os", the
"vm.rtm.os" term really looks redundant, because "vm.rtm.cpu" being 'true'
implies "vm.rtm.os" being 'true' as well; otherwise the JVM would never
advertise the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true
for Linux and for AIX.

That said, I don't think that the platforms check can be replaced with one
vmRTMCPU(), because in some cases it's necessary to run a test for cpu = false
and compiler = true, i.e. it's necessary to run a test on an unsupported CPU
for a given platform _only if_ the compiler in use supports RTM (like C2). So
if, for instance, we define 'vm.rtm.compiler = C2 && vmRTMCPU()' and use
'vm.rtm.compiler' in @requires, we "tie" CPU+OS RTM support to compiler RTM
support: the evaluation returns 'false' for cpu = false and compiler = true,
skipping the test (vm.rtm.compiler = false). On the other hand, to select the
test in that case one could match on '!vm.rtm.compiler', but if
compiler = false (Graal), '!vm.rtm.compiler' evaluates to 'true' and the test
runs even though the Graal compiler is selected, which is wrong. Hence the
'vm.rtm.compiler' property must not rely on vmRTMCPU() and must contain its
own list of compilers with RTM support for each platform, IMO. Basically we
can't ask the JVM about the compiler's support for RTM, since the JVM can only
tell us about the RTM support of the CPU and OS on which it is running.

> And since you check cpu in here why not to replace all vm.rtm.* with one vm.rtm.supported? In such case you would need only one @requires checks in tests instead of:
>
> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler

I think it's not possible either. Currently there are 5 match cases in RTM tests:

gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
 * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
 * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
 * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
 * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)

which can be summarized as 5 cases:

1: !(flavor == "server" & !emulatedClient & cpu & os)
2: flavor == "server" & !emulatedClient & cpu & os
3: (!cpu) & (flavor == "server" & !emulatedClient)
4: cpu & !(flavor == "server" & !emulatedClient)
5: no @requires

I understand that cases 1 and 2 (since CPU implies OS) can be simplified as:

1: !(flavor == "server" & !emulatedClient & cpu)
2: flavor == "server" & !emulatedClient & cpu
3: (!cpu) & (flavor == "server" & !emulatedClient)
4: cpu & !(flavor == "server" & !emulatedClient)
5: no @requires

and case 2 is merely the negation of case 1, so we have 4 cases:

1: !(flavor == "server" & !emulatedClient & cpu)
3: (!cpu) & (flavor == "server" & !emulatedClient)
4: cpu & !(flavor == "server" & !emulatedClient)
5: no @requires

We could simplify further by making P = (flavor == "server" & !emulatedClient),
which gives:

1: !(P & cpu)
3: (!cpu) & (P)
4: cpu & !(P)
5: no @requires

So if we add a compiler = C2 && (x64 | PPC) property to each of them, in order
to run the tests only if the selected compiler on a given platform has RTM
support (skipping Graal, for instance):

1: !(P & cpu) & compiler
3: (!cpu) & (P) & compiler
4: cpu & !(P) & compiler
5: no @requires & compiler

So it looks like we would need at least 3 properties, but IMO it's not worth
adding another property P = (flavor == "server" & !emulatedClient) just to
further simplify the @requires line.
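
The reduction above can be checked mechanically. The following is only a minimal sketch in plain Java, not code from the webrev: the class and method names are hypothetical, the booleans p, cpu, os and compiler are stand-ins for the jtreg properties discussed above, and case 2 is kept alongside case 1 purely for illustration. It assumes, as argued above, that cpu = true never occurs together with os = false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in the JDK test library.
class RtmRequiresSketch {

    // Stand-ins for the jtreg properties:
    //   p        ~ vm.flavor == "server" & !vm.emulatedClient
    //   cpu      ~ vm.rtm.cpu
    //   os       ~ vm.rtm.os
    //   compiler ~ vm.rtm.compiler

    /** Case 2 as currently written in the tests: P & cpu & os. */
    static boolean case2WithOs(boolean p, boolean cpu, boolean os) {
        return p && cpu && os;
    }

    /** Case 2 with the os term dropped: P & cpu. */
    static boolean case2WithoutOs(boolean p, boolean cpu) {
        return p && cpu;
    }

    /**
     * Walks every combination allowed by the assumption "cpu implies os"
     * and reports whether dropping the os term ever changes the result.
     */
    static boolean osTermIsRedundant() {
        boolean[] values = {false, true};
        for (boolean p : values) {
            for (boolean cpu : values) {
                for (boolean os : values) {
                    if (cpu && !os) {
                        continue; // excluded by assumption: cpu = true implies os = true
                    }
                    if (case2WithOs(p, cpu, os) != case2WithoutOs(p, cpu)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    /** Numbers of the @requires cases (with the compiler term added) that would select a test. */
    static List<Integer> selectedCases(boolean p, boolean cpu, boolean compiler) {
        List<Integer> selected = new ArrayList<>();
        if (!(p && cpu) && compiler) selected.add(1);
        if (p && cpu && compiler)    selected.add(2);
        if (!cpu && p && compiler)   selected.add(3);
        if (cpu && !p && compiler)   selected.add(4);
        if (compiler)                selected.add(5); // formerly "no @requires"
        return selected;
    }

    public static void main(String[] args) {
        System.out.println("os term redundant: " + osTermIsRedundant());
        boolean[] values = {false, true};
        for (boolean compiler : values) {
            for (boolean p : values) {
                for (boolean cpu : values) {
                    System.out.println("compiler=" + compiler + " P=" + p + " cpu=" + cpu
                            + " -> cases " + selectedCases(p, cpu, compiler));
                }
            }
        }
    }
}
```

With compiler = false every case evaluates to false regardless of P and cpu, which is consistent with the "no tests selected" result reported for the Graal scenario.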
To sum up, I think it's only possible to replace 'cpu & os' by 'cpu', so I
updated the webrev, removing the vm.rtm.os property and keeping only
vm.rtm.cpu and vm.rtm.compiler (plus the flavor and emulatedClient checks).

I've tested the following scenarios and observed no regression [1]:

1. X86_64 w/ RTM
2. X86_64 w/ RTM + Graal enabled
3. POWER7: no CPU+OS support for RTM
4. POWER8: CPU+OS support for RTM

But I think we need a confirmation from SAP about AIX.

Best regards,
Gustavo


[1]

** X86_64 w/ RTM ** Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java Passed: compiler/rtm/cli/TestRTMRetryCountOption.java Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java Passed: compiler/rtm/locking/TestRTMAbortThreshold.java Passed: compiler/rtm/locking/TestRTMAbortRatio.java Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java Passed: compiler/rtm/locking/TestRTMLockingThreshold.java Passed: compiler/rtm/locking/TestRTMRetryCount.java Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java Passed:
compiler/rtm/locking/TestUseRTMDeopt.java Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java Test results: passed: 30 ** X86_64 w/ RTM + Graal enabled ** Test results: no tests selected (all RTM tests skipped) ** POWER7: no CPU+OS support for RTM ** Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java Passed: compiler/rtm/cli/TestRTMRetryCountOption.java Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java Test results: passed: 10 ** POWER8: CPU+OS support for RTM ** Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java Passed: compiler/rtm/cli/TestRTMRetryCountOption.java Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java Passed: 
compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java Passed: compiler/rtm/locking/TestRTMAbortRatio.java Passed: compiler/rtm/locking/TestRTMAbortThreshold.java Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java Passed: compiler/rtm/locking/TestRTMLockingThreshold.java Passed: compiler/rtm/locking/TestRTMRetryCount.java Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java Passed: compiler/rtm/locking/TestUseRTMDeopt.java Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java Test results: passed: 30 > Thanks, > Vladimir > > On 8/31/18 8:38 AM, Gustavo Romero wrote: >> Hi, >> >> Could the following small change be reviewed please? >> >> Bug : https://bugs.openjdk.java.net/browse/JDK-8209972 >> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/ >> >> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal) >> is selected on platforms that can have CPU/OS with RTM support. >> >> It also disables all RTM tests for any other platform that has not a single >> compiler supporting RTM. 
>> >> The RTM support was first added to C2 compiler and once checkers for RTM >> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM they >> assume that a compiler supporting RTM is available for sure ("rtm" is >> advertised only if RTM is supported by both CPU and OS). Later the JVM >> began to allow the selection of a compiler different from C2, like Graal, >> and it became possible to select a compiler without RTM support despite the >> fact that both the CPU and the OS support RTM. Thus for platforms >> supporting Graal or any other specific compiler the compiler availability for >> the RTM tests must be adjusted and if the selected compiler does not >> support RTM then all RTM tests must be skipped, including the ones meant >> for platforms without CPU or OS RTM support (e.g. *Unsupported*.java) >> because in some cases, like in TestUseRTMLockingOptionOnUnsupportedCPU.java, >> the test expects JVM initialization errors that will never occur because the >> problem is not that the RTM support for CPU or OS is missing, but rather >> because the selected compiler does not support RTM. >> >> That change adds a new VM property 'vm.rtm.compiler' which can be used to >> filter out compilers without RTM support for specific platforms and adapts >> the current RTM tests to use that new property. >> >> Nothing changes regarding the number of passing/selected tests for the >> various cpu/os/compiler combinations on platforms that currently might >> support RTM [1], except when Graal is in use. >> >> Thank you. 
>>
>> Best regards,
>> Gustavo
>>
>>
>> [1]
>>
>> ** X64 w/ CPU and OS supporting RTM **
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> Test results: passed: 30
>>
>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support **
>> Test results: no tests selected (all RTM tests skipped)
>>
>> ** POWER8 w/ CPU and OS supporting RTM **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java
>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java
>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java
>> Passed: compiler/rtm/locking/TestRTMRetryCount.java
>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java
>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java
>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java
>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java
>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java
>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java
>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java
>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java
>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> Test results: passed: 30
>>
>> ** POWER7 wo/ CPU and OS supporting RTM **
>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java
>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java
>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java
>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java
>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java
>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java
>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java
>> Test results: passed: 10
>>
>

From HORIE at jp.ibm.com Tue Sep 4 05:32:01 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 4 Sep 2018 14:32:01 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To:
References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
Message-ID:

Hi Goetz, Martin, and Gustavo,

>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
>a compiler change.
>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
>while hotspot-compiler-dev is for
>"Technical discussion about the development of the HotSpot bytecode compilers"
Understood. I will use hotspot-compiler-dev for future RFRs, thanks.

> Why do you rename vnoreg to vnoregi?
I followed the coding style used for vsnoreg and vsnoregi, but the renaming does not look necessary, so I will revert this part. Should I also rename vsnoregi to vsnoreg?

>we noticed jtreg test failures when using this change:
>compiler/runtime/safepoints/TestRegisterRestoring.java
>compiler/runtime/Test7196199.java
>
>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>
>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
Thank you for letting me know about the issue. I will try to reproduce it on a SUSE machine.

>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
Thank you for pointing out another issue. I do not currently hit this problem, but preventing the vector nodes from being matched makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so in any case it might be better to prepare them in the same way as the Replicate* rules.

Gustavo, thanks for the wrap-up!
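For trying to reproduce the "-XX:-SuperwordUseVSX" problem, a loop of the following shape may be a reasonable starting point. It is only a minimal sketch and is not taken from the patch or its benchmarks: the class name SlpMulLoop, the array size, and the iteration count are all assumptions. A float multiply loop like this is a typical candidate for C2's superword (SLP) pass, but whether it is really turned into a packed 4-float multiply depends on the JVM build and the flags in use.

```java
// Minimal sketch: all names and sizes are illustrative assumptions.
final class SlpMulLoop {

    // Element-wise multiply; the kind of loop SLP may turn into packed float multiplies.
    static void mul(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] * b[i];
        }
    }

    // Runs the kernel repeatedly so the JIT has a chance to compile it,
    // then returns the sum of the result array as a simple sanity check.
    static float checksum(int size, int iterations) {
        float[] a = new float[size];
        float[] b = new float[size];
        float[] c = new float[size];
        for (int i = 0; i < size; i++) {
            a[i] = i % 7;   // small integers keep the float arithmetic exact
            b[i] = 2.0f;
        }
        for (int it = 0; it < iterations; it++) {
            mul(a, b, c);
        }
        float sum = 0.0f;
        for (float v : c) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(checksum(1024, 20_000));
    }
}
```

On a PPC64 build, running it once with default flags and once with -XX:-SuperwordUseVSX and comparing the printed value would show whether the result, or the VM itself, is affected.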
Best regards,
--
Michihiro,
IBM Research - Tokyo

From: "Doerr, Martin"
To: Gustavo Romero, "Lindenmaier, Goetz", Michihiro Horie
Cc: hotspot compiler, "hotspot-dev at openjdk.java.net"
Date: 2018/09/04 02:18
Subject: RE: RFR: 8208171: PPC64: Enrich SLP support

Hi Gustavo and Michihiro,

we noticed jtreg test failures when using this change:
compiler/runtime/safepoints/TestRegisterRestoring.java
compiler/runtime/Test7196199.java

TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.

We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.

That's what I found out so far. Maybe you have an idea?

I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.

Best regards,
Martin

-----Original Message-----
From: hotspot-dev On Behalf Of Gustavo Romero
Sent: Monday, 3 September 2018 14:57
To: Lindenmaier, Goetz; Michihiro Horie
Cc: hotspot compiler; hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support

Hi Goetz,

On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
> Also, I can not find all of the mail traffic in
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html .
> Is this a problem of the pipermail server?
>
> For some reason this webrev lacks the links to browse the diffs.
> Do you need to use a more recent webrev? You can obtain it with
> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .

Yes, probably it was a problem of the pipermail or in some relay.
I noted the same thing, i.e. at least one reply from Michi reached
me but missed one of the mailing lists.

The initial discussion is here:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html

I understand Martin reviewed the last webrev in that thread, which is
http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html )

Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html

and Michi's reply to Martin's review of webrev.01:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html ).

and your last review:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html


HTH.

Best regards,
Gustavo

> Why do you rename vnoreg to vnoregi?
>
> Besides that the change is fine, thanks for implementing this!
>
> Best regards,
> Goetz.
>
>
>> -----Original Message-----
>> From: Doerr, Martin
>> Sent: Tuesday, 28 August 2018 19:35
>> To: Gustavo Romero; Michihiro Horie
>> Cc: Lindenmaier, Goetz; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Michihiro,
>>
>> thank you for implementing it. I have just taken a first look at your
>> webrev.01.
>>
>> It looks basically good. Only the Power version check seems to be incorrect.
>> VM_Version::has_popcntb() checks for Power5.
>> I believe most instructions are available with Power7.
>> Some of them (vsubudm, ..., vmuluwm, vpopcntw) were introduced with
>> Power8?
>> We should check this carefully.
>>
>> Also, indentation in register_ppc.hpp could get improved.
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero
>> Sent: Thursday, 26 July 2018 16:02
>> To: Michihiro Horie
>> Cc: Lindenmaier, Goetz; hotspot-dev at openjdk.java.net; Doerr, Martin; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Michi,
>>
>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>> I updated webrev:
>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>
>> Thanks for providing an updated webrev and for fixing indentation and function
>> order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>
>>
>> Best Regards,
>> Gustavo
>>
>>>
>>> Best regards,
>>> --
>>> Michihiro,
>>> IBM Research - Tokyo
>>>
>>> From: Gustavo Romero
>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
>>> Date: 2018/07/25 23:05
>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>
>>> ----------------------------------------
>>>
>>>
>>>
>>> Hi Michi,
>>>
>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>> > Dear all,
>>> >
>>> > Would you review the following change?
>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>> >
>>> > This change adds support for vectorized arithmetic calculation with SLP.
>>> >
>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, which are exactly overlapped with VRs. Instruction APIs receiving VRs use the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the matching with SqrtVF. I think the change in sqrtF_reg would be fine due to the ConvD2FNode::Value in convertnode.cpp.
>>>
>>> Looks good. Just a few comments:
>>>
>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in
>>> order to avoid the splat?
>>>
>>> - Although all instructions added by your change were introduced in ISA 2.06,
>>> so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in
>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to
>>> guarantee that these instructions won't be emitted on a CPU that does not
>>> support them.
>>>
>>> - I think that in general strings in format %{} are in upper case. For instance,
>>> this is the current output on optoassembly for vmul4F:
>>>
>>> 2941835 5b4 ADDI R24, R24, #64
>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F
>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector
>>>
>>> I think it would be better to be in upper case instead.
I also think that if
>>> the node match emits more than one instruction all instructions must be listed
>>> in format %{}, since it's meant for detailed debugging. Finally I think it
>>> would be better to replace \t! by \t// in that string (unless I'm missing any
>>> special meaning for that char). So for vmul4F it would be something like:
>>>
>>> 2941835 5b4 ADDI R24, R24, #64
>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34
>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F
>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector
>>>
>>>
>>> But feel free to change anything just after you get additional reviews :)
>>>
>>>
>>> > I confirmed this change with JTREG. In addition, I used attached micro benchmarks.
>>> > /(See attached file: slp_microbench.zip)/
>>>
>>> Thanks for sharing it.
>>> Btw, another option to host it would be in the CR
>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>> >
>>> > Best regards,
>>> > --
>>> > Michihiro,
>>> > IBM Research - Tokyo
>>> >
>>>
>>>
>>>
>

From goetz.lindenmaier at sap.com Tue Sep 4 06:12:19 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 4 Sep 2018 06:12:19 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To:
References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com>
Message-ID: <24ecdf108c9c4d77a3a673ecaaca3ab3@sap.com>

> > Why do you rename vnoreg to vnoregi?
> I followed the way of coding for vsnoreg and vsnoregi, but the renaming
> does not look necessary. I would get this part back. Should I also rename
> vsnoregi to vsnoreg?
I think it would be more consistent, but it's not that important :)

Best regards,
Goetz.

From HORIE at jp.ibm.com Tue Sep 4 07:36:06 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 4 Sep 2018 16:36:06 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <24ecdf108c9c4d77a3a673ecaaca3ab3@sap.com>
References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <24ecdf108c9c4d77a3a673ecaaca3ab3@sap.com>
Message-ID:

Hi Goetz,

>I think it would be more consistent, but it's not that important :)
Thank you for your comments, then I would firstly try to resolve the crash issues.

Best regards,
--
Michihiro,
IBM Research - Tokyo
> Thank you for letting me know the issue, I will try to reproduce this on a SUSE > machine. > > > >I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when > your patch is applied. Looks like matching the vector nodes needs to be > prevented. > Thank you for pointing out another issue. Currently I do not hit this problem, > but preventing to match the vector nodes makes sense to avoid the crash. I > did not prepare match rules for non-vector nodes, so it might be better to > prepare them similarly like the Replicate* rules, in any case. > > > Gustavo, thanks for the wrap-up! > > > Best regards, > -- > Michihiro, > IBM Research - Tokyo > > "Doerr, Martin" ---2018/09/04 02:18:24---Hi Gustavo and Michihiro, we > noticed jtreg test failures when using this change: > > From: "Doerr, Martin" > To: Gustavo Romero , "Lindenmaier, Goetz" > , Michihiro Horie > Cc: hotspot compiler , "hotspot- > dev at openjdk.java.net" > Date: 2018/09/04 02:18 > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > > ________________________________ > > > > > Hi Gustavo and Michihiro, > > we noticed jtreg test failures when using this change: > compiler/runtime/safepoints/TestRegisterRestoring.java > compiler/runtime/Test7196199.java > > TestRegisterRestoring is a simple test which returns arbitrary results instead > of 10000. > > We didn't see it on all machines, so it might be an issue with saving&restoring > VR registers in the signal handler. > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" > with kernel 4.4.126-94.22-default. > > That's what I found out so far. Maybe you have an idea? > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when > your patch is applied. Looks like matching the vector nodes needs to be > prevented. > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-dev On Behalf Of > Gustavo Romero > Sent: Montag, 3. 
September 2018 14:57 > To: Lindenmaier, Goetz ; Michihiro Horie > > Cc: hotspot compiler ; hotspot- > dev at openjdk.java.net > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > Hi Goetz, > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: > > Also, I can not find all of the mail traffic in > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018- > August/thread.html. > > Is this a problem of the pipermail server? > > > > For some reason this webrev lacks the links to browse the diffs. > > Do you need to use a more recent webrev? You can obtain it with > > hg clone http://hg.openjdk.java.net/code-tools/webrev/ . > > Yes, probably it was a problem of the pipermail or in some relay. > I noted the same thing, i.e. at least one Michi reply arrived > to me but missed a ML. > > The initial discussion is here: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018- > July/003613.html > > I understand Martin reviewed the last webrev in that thread, which is > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018- > July/003615.html) > > Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018- > August/033958.html > > and Michi's reply to Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018- > August/003632.html (with webrev.02, > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018- > August/003632.html). > > and your last review: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018- > September/030419.html > > > HTH. > > Best regards, > Gustavo > > > Why do you rename vnoreg to vnoregi? > > > > Besides that the change is fine, thanks for implementing this! > > > > Best regards, > > Goetz. > > > > > >> -----Original Message----- > >> From: Doerr, Martin > >> Sent: Dienstag, 28. 
August 2018 19:35 > >> To: Gustavo Romero ; Michihiro Horie > >> > >> Cc: Lindenmaier, Goetz ; hotspot- > >> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, > Volker > >> > >> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > >> > >> Hi Michihiro, > >> > >> thank you for implementing it. I have just taken a first look at your > >> webrev.01. > >> > >> It looks basically good. Only the Power version check seems to be > incorrect. > >> VM_Version::has_popcntb() checks for Power5. > >> I believe most instructions are available with Power7. > >> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with > >> Power8? > >> We should check this carefully. > >> > >> Also, indentation in register_ppc.hpp could get improved. > >> > >> Thanks and best regard, > >> Martin > >> > >> > >> -----Original Message----- > >> From: Gustavo Romero > >> Sent: Donnerstag, 26. Juli 2018 16:02 > >> To: Michihiro Horie > >> Cc: Lindenmaier, Goetz ; hotspot- > >> dev at openjdk.java.net; Doerr, Martin ; ppc-aix- > >> port-dev at openjdk.java.net; Simonis, Volker > >> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >> > >> Hi Michi, > >> > >> On 07/26/2018 01:43 AM, Michihiro Horie wrote: > >>> I updated webrev: > >>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ > >> > >> Thanks for providing an updated webrev and for fixing indentation and > >> function > >> order in assembler_ppc.inline.hpp as well. 
I have no further comments :) > >> > >> > >> Best Regards, > >> Gustavo > >> > >>> > >>> Best regards, > >>> -- > >>> Michihiro, > >>> IBM Research - Tokyo > >>> > >>> From: Gustavo Romero > >>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net > >>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" > >>> Date: 2018/07/25 23:05 > >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >>> > >>> > >>> Hi Michi, > >>> > >>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: > >>> > Dear all, > >>> > > >>> > Would you review the following change? 
> >>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 > >>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 > >>> > > >>> > This change adds support for vectorized arithmetic calculation with SLP. > >>> > > >>> > The to_vr function is added to convert VSR to VR. Currently, vecX is associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, which are exactly overlapped with VRs. Instruction APIs receiving VRs use the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the matching with SqrtVF. I think the change in sqrtF_reg would be fine due to the ConvD2FNode::Value in convertnode.cpp. > >>> > >>> Looks good. Just a few comments: > >>> > >>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in > >>> order to avoid the splat? > >>> > >>> - Although all instructions added by your change were introduced in ISA 2.06, > >>> so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in > >>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to > >>> guarantee that these instructions won't be emitted on a CPU that does not > >>> support them. > >>> > >>> - I think that in general strings in format %{} are in upper case. For instance, > >>> this is the current output on optoassembly for vmul4F: > >>> > >>> 2941835 5b4 ADDI R24, R24, #64 > >>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F > >>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > >>> > >>> I think it would be better to be in upper case instead. I also think that if > >>> the node match emits more than one instruction all instructions must be listed > >>> in format %{}, since it's meant for detailed debugging. Finally I think it > >>> would be better to replace \t! by \t// in that string (unless I'm missing any > >>> special meaning for that char). 
So for vmul4F it would be something > like: > >>> > >>> 2941835 5b4 ADDI R24, R24, #64 > >>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 > >>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F > >>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > >>> > >>> > >>> But feel free to change anything just after you get additional reviews :) > >>> > >>> > >>> > I confirmed this change with JTREG. In addition, I used attached micro > >> benchmarks. > >>> > /(See attached file: slp_microbench.zip)/ > >>> > >>> Thanks for sharing it. > >>> Btw, another option to host it would be in the CR > >>> server, in http://cr.openjdk.java.net/~mhorie/8208171 > >>> > >>> > >>> Best regards, > >>> Gustavo > >>> > >>> > > >>> > Best regards, > >>> > -- > >>> > Michihiro, > >>> > IBM Research - Tokyo > >>> > > >>> > >>> > >>> > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From lutz.schmidt at sap.com Tue Sep 4 08:29:09 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 4 Sep 2018 08:29:09 +0000 Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard Message-ID: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com> Dear All, may I please request reviews for this small, s390-only patch. It fixes some shift operations which relied on behavior not covered by the language standard. Bug: https://bugs.openjdk.java.net/browse/JDK-8210319 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/ Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From martin.doerr at sap.com Tue Sep 4 09:28:06 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 4 Sep 2018 09:28:06 +0000 Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard In-Reply-To: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com> References: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com> Message-ID: Hi Lutz, looks good. Thanks for improving. Best regards, Martin From: hotspot-compiler-dev On Behalf Of Schmidt, Lutz Sent: Dienstag, 4. September 2018 10:29 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard Dear All, may I please request reviews for this small, s390-only patch. It fixes some shift operations which relied on behavior not covered by the language standard. Bug: https://bugs.openjdk.java.net/browse/JDK-8210319 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/ Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From shade at redhat.com Tue Sep 4 10:28:55 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Sep 2018 12:28:55 +0200 Subject: RFR (XS) 8210355: Minimal and Zero non-PCH builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8210355 Fix: diff -r 3ee917225506 src/hotspot/share/code/vtableStubs.cpp --- a/src/hotspot/share/code/vtableStubs.cpp Tue Sep 04 14:47:55 2018 +0800 +++ b/src/hotspot/share/code/vtableStubs.cpp Tue Sep 04 12:23:23 2018 +0200 @@ -26,6 +26,7 @@ #include "code/vtableStubs.hpp" #include "compiler/compileBroker.hpp" #include "compiler/disassembler.hpp" +#include "logging/log.hpp" #include "memory/allocation.inline.hpp" #include "memory/resourceArea.hpp" #include "oops/instanceKlass.hpp" Seems like it is transitively included from somewhere (compiler?) 
in most configuration, but it breaks for Minimal and Zero non-PCH builds which are configured without C1/C2. Zero is still broken by other thing. Testing: Linux x86_64 minimal builds Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From erik.osterlund at oracle.com Tue Sep 4 10:32:43 2018 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 4 Sep 2018 12:32:43 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: References: Message-ID: <5B8E5F4B.5060707@oracle.com> Hi, Any more takers? Full: http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/ Thanks, /Erik On 2018-08-30 17:06, Erik Österlund wrote: > Hi Roland, > > Thank you for the review. > > On 2018-08-30 13:21, Roland Westrelin wrote: >> Hi Erik, >> >>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00 >> make_load() already calls _gvn.transform(), right? > > Yes you are right. I will remove the redundant _gvn.transform call of > the access_load; it is redundant indeed. > >> You don't set MO_UNORDERED. Why is it not required? > > MO_UNORDERED is the default MO of loads and stores. It is set up in > the C2Access object using fixup_decorators() which sets sane defaults > for various decorators, including MO. > > Thanks, > /Erik > >> Roland. > From shade at redhat.com Tue Sep 4 10:33:03 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Sep 2018 12:33:03 +0200 Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) Message-ID: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8210357 Seems like VtableStub::pd_code_size_limit is gone, and should be purged from Zero too. 
Fix: diff -r bc76fd44b029 src/hotspot/cpu/zero/vtableStubs_zero.cpp --- a/src/hotspot/cpu/zero/vtableStubs_zero.cpp Tue Sep 04 12:28:12 2018 +0200 +++ b/src/hotspot/cpu/zero/vtableStubs_zero.cpp Tue Sep 04 12:30:21 2018 +0200 @@ -37,11 +37,6 @@ return NULL; } -int VtableStub::pd_code_size_limit(bool is_vtable_stub) { - ShouldNotCallThis(); - return 0; -} - int VtableStub::pd_code_alignment() { ShouldNotCallThis(); return 0; Testing: Linux x86_64 zero build Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From rickard.backman at oracle.com Tue Sep 4 10:36:22 2018 From: rickard.backman at oracle.com (Rickard Bäckman) Date: Tue, 4 Sep 2018 12:36:22 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: References: Message-ID: <20180904103622.sijpgfiltco4mxd2@rbackman> Looks good. /R On 08/30, Erik Österlund wrote: > Hi, > > The JFR getEventWriter() intrinsics have code in C1 and C2 that manually > resolves jobjects. This should go through the Access API to make sure the > necessary GC barriers are inserted. > > I noticed this in an attempt to move JNI handle processing out of the pause > (among other things). It crashed in kitchensink. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8210158 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00 > > I tested the patch by running it, together with a patch that moves out JNI > handle processing outside of the pause, through hs-tier1-3, as well as > running it through Kitchensink24H (as it originally crashed in kitchensink). 
> > Thanks, > /Erik From erik.osterlund at oracle.com Tue Sep 4 10:40:11 2018 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 4 Sep 2018 12:40:11 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: <20180904103622.sijpgfiltco4mxd2@rbackman> References: <20180904103622.sijpgfiltco4mxd2@rbackman> Message-ID: <5B8E610B.70909@oracle.com> Hi Rickard, Thank you for the review. /Erik On 2018-09-04 12:36, Rickard Bäckman wrote: > Looks good. > > /R > > On 08/30, Erik Österlund wrote: >> Hi, >> >> The JFR getEventWriter() intrinsics have code in C1 and C2 that manually >> resolves jobjects. This should go through the Access API to make sure the >> necessary GC barriers are inserted. >> >> I noticed this in an attempt to move JNI handle processing out of the pause >> (among other things). It crashed in kitchensink. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8210158 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00 >> >> I tested the patch by running it, together with a patch that moves out JNI >> handle processing outside of the pause, through hs-tier1-3, as well as >> running it through Kitchensink24H (as it originally crashed in kitchensink). >> >> Thanks, >> /Erik From tobias.hartmann at oracle.com Tue Sep 4 11:09:07 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 4 Sep 2018 13:09:07 +0200 Subject: RFR (XS) 8210355: Minimal and Zero non-PCH builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) In-Reply-To: References: Message-ID: <097fa86d-7e07-d125-5b54-2b91ebb75fdf@oracle.com> Hi Aleksey, looks good to me and can be considered trivial. 
Best regards, Tobias On 04.09.2018 12:28, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8210355 > > Fix: > > diff -r 3ee917225506 src/hotspot/share/code/vtableStubs.cpp > --- a/src/hotspot/share/code/vtableStubs.cpp Tue Sep 04 14:47:55 2018 +0800 > +++ b/src/hotspot/share/code/vtableStubs.cpp Tue Sep 04 12:23:23 2018 +0200 > @@ -26,6 +26,7 @@ > #include "code/vtableStubs.hpp" > #include "compiler/compileBroker.hpp" > #include "compiler/disassembler.hpp" > +#include "logging/log.hpp" > #include "memory/allocation.inline.hpp" > #include "memory/resourceArea.hpp" > #include "oops/instanceKlass.hpp" > > Seems like it is transitively included from somewhere (compiler?) in most configuration, but it > breaks for Minimal and Zero non-PCH builds which are configured without C1/C2. Zero is still broken > by other thing. > > Testing: Linux x86_64 minimal builds > > Thanks, > -Aleksey > From tobias.hartmann at oracle.com Tue Sep 4 11:09:22 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 4 Sep 2018 13:09:22 +0200 Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) In-Reply-To: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com> References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com> Message-ID: <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com> Hi Aleksey, looks good to me and can be considered trivial. Best regards, Tobias On 04.09.2018 12:33, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8210357 > > Seems like VtableStub::pd_code_size_limit is gone, and should be purged from Zero too. 
> > Fix: > > diff -r bc76fd44b029 src/hotspot/cpu/zero/vtableStubs_zero.cpp > --- a/src/hotspot/cpu/zero/vtableStubs_zero.cpp Tue Sep 04 12:28:12 2018 +0200 > +++ b/src/hotspot/cpu/zero/vtableStubs_zero.cpp Tue Sep 04 12:30:21 2018 +0200 > @@ -37,11 +37,6 @@ > return NULL; > } > > -int VtableStub::pd_code_size_limit(bool is_vtable_stub) { > - ShouldNotCallThis(); > - return 0; > -} > - > int VtableStub::pd_code_alignment() { > ShouldNotCallThis(); > return 0; > > > Testing: Linux x86_64 zero build > > Thanks, > -Aleksey > From shade at redhat.com Tue Sep 4 11:21:55 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Sep 2018 13:21:55 +0200 Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) In-Reply-To: <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com> References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com> <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com> Message-ID: <0d67f0f0-ffe0-a6d4-5d49-896eb2621a0f@redhat.com> Thanks, pushed. -Aleksey On 09/04/2018 01:09 PM, Tobias Hartmann wrote: > Hi Aleksey, > > looks good to me and can be considered trivial. > > Best regards, > Tobias > > > On 04.09.2018 12:33, Aleksey Shipilev wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8210357 >> >> Seems like VtableStub::pd_code_size_limit is gone, and should be purged from Zero too. 
>> >> Fix: >> >> diff -r bc76fd44b029 src/hotspot/cpu/zero/vtableStubs_zero.cpp >> --- a/src/hotspot/cpu/zero/vtableStubs_zero.cpp Tue Sep 04 12:28:12 2018 +0200 >> +++ b/src/hotspot/cpu/zero/vtableStubs_zero.cpp Tue Sep 04 12:30:21 2018 +0200 >> @@ -37,11 +37,6 @@ >> return NULL; >> } >> >> -int VtableStub::pd_code_size_limit(bool is_vtable_stub) { >> - ShouldNotCallThis(); >> - return 0; >> -} >> - >> int VtableStub::pd_code_alignment() { >> ShouldNotCallThis(); >> return 0; >> >> >> Testing: Linux x86_64 zero build >> >> Thanks, >> -Aleksey >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From shade at redhat.com Tue Sep 4 11:22:09 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Sep 2018 13:22:09 +0200 Subject: RFR (XS) 8210355: Minimal and Zero non-PCH builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) In-Reply-To: <097fa86d-7e07-d125-5b54-2b91ebb75fdf@oracle.com> References: <097fa86d-7e07-d125-5b54-2b91ebb75fdf@oracle.com> Message-ID: Thanks, pushed. -Aleksey On 09/04/2018 01:09 PM, Tobias Hartmann wrote: > Hi Aleksey, > > looks good to me and can be considered trivial. > > Best regards, > Tobias > > On 04.09.2018 12:28, Aleksey Shipilev wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8210355 >> >> Fix: >> >> diff -r 3ee917225506 src/hotspot/share/code/vtableStubs.cpp >> --- a/src/hotspot/share/code/vtableStubs.cpp Tue Sep 04 14:47:55 2018 +0800 >> +++ b/src/hotspot/share/code/vtableStubs.cpp Tue Sep 04 12:23:23 2018 +0200 >> @@ -26,6 +26,7 @@ >> #include "code/vtableStubs.hpp" >> #include "compiler/compileBroker.hpp" >> #include "compiler/disassembler.hpp" >> +#include "logging/log.hpp" >> #include "memory/allocation.inline.hpp" >> #include "memory/resourceArea.hpp" >> #include "oops/instanceKlass.hpp" >> >> Seems like it is transitively included from somewhere (compiler?) 
in most configuration, but it >> breaks for Minimal and Zero non-PCH builds which are configured without C1/C2. Zero is still broken >> by other thing. >> >> Testing: Linux x86_64 minimal builds >> >> Thanks, >> -Aleksey >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From sgehwolf at redhat.com Tue Sep 4 11:44:39 2018 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Tue, 04 Sep 2018 13:44:39 +0200 Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) In-Reply-To: <0d67f0f0-ffe0-a6d4-5d49-896eb2621a0f@redhat.com> References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com> <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com> <0d67f0f0-ffe0-a6d4-5d49-896eb2621a0f@redhat.com> Message-ID: On Tue, 2018-09-04 at 13:21 +0200, Aleksey Shipilev wrote: > Thanks, pushed. Thanks for the Zero build fixes, Aleksey! Cheers, Severin From lutz.schmidt at sap.com Tue Sep 4 12:58:50 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 4 Sep 2018 12:58:50 +0000 Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard In-Reply-To: References: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com> Message-ID: Hi Martin, thanks for the review! Regards, Lutz From: "Doerr, Martin (martin.doerr at sap.com)" Date: Tuesday, 4. September 2018 at 11:28 To: Lutz Schmidt , "hotspot-compiler-dev at openjdk.java.net" Subject: RE: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard Hi Lutz, looks good. Thanks for improving. Best regards, Martin From: hotspot-compiler-dev On Behalf Of Schmidt, Lutz Sent: Dienstag, 4. September 2018 10:29 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard Dear All, may I please request reviews for this small, s390-only patch. 
It fixes some shift operations which relied on behavior not covered by the language standard. Bug: https://bugs.openjdk.java.net/browse/JDK-8210319 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/ Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Tue Sep 4 13:01:15 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 4 Sep 2018 13:01:15 +0000 Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) In-Reply-To: References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com> <50b8c457-31bc-dc56-223a-f8e655744d50@oracle.com> <0d67f0f0-ffe0-a6d4-5d49-896eb2621a0f@redhat.com> Message-ID: Hi folks, thanks for fixing my failures! Best regards, Lutz On 04.09.18, 13:44, "hotspot-compiler-dev on behalf of Severin Gehwolf" wrote: On Tue, 2018-09-04 at 13:21 +0200, Aleksey Shipilev wrote: > Thanks, pushed. Thanks for the Zero build fixes, Aleksey! Cheers, Severin From lutz.schmidt at sap.com Tue Sep 4 13:14:56 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 4 Sep 2018 13:14:56 +0000 Subject: RFR (XS) 8210357: Zero builds fail after JDK-8207343 (Automate vtable/itable stub size calculation) In-Reply-To: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com> References: <1a517f29-eace-8e8b-4387-848700bad214@redhat.com> Message-ID: Sorry for the mess! Got the below mail just 10 minutes ago. May be related to internet connectivity issues here this morning. Yes, pd_code_size_limit() should be gone on all platforms. I did a grep across the source tree to find all occurrences and obviously missed zero. Don't know why. Regards, Lutz On 04.09.18, 12:33, "hotspot-compiler-dev on behalf of Aleksey Shipilev" wrote: Bug: https://bugs.openjdk.java.net/browse/JDK-8210357 Seems like VtableStub::pd_code_size_limit is gone, and should be purged from Zero too. 
Fix: diff -r bc76fd44b029 src/hotspot/cpu/zero/vtableStubs_zero.cpp --- a/src/hotspot/cpu/zero/vtableStubs_zero.cpp Tue Sep 04 12:28:12 2018 +0200 +++ b/src/hotspot/cpu/zero/vtableStubs_zero.cpp Tue Sep 04 12:30:21 2018 +0200 @@ -37,11 +37,6 @@ return NULL; } -int VtableStub::pd_code_size_limit(bool is_vtable_stub) { - ShouldNotCallThis(); - return 0; -} - int VtableStub::pd_code_alignment() { ShouldNotCallThis(); return 0; Testing: Linux x86_64 zero build Thanks, -Aleksey From gromero at linux.vnet.ibm.com Tue Sep 4 13:42:02 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 4 Sep 2018 10:42:02 -0300 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code Message-ID: Hi, May I please request reviews for this tiny change that fixes two uninitialized variables in PPC64 C1 LIR code? Bug : https://bugs.openjdk.java.net/browse/JDK-8210320 Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/ GCC 4.8 does not complain about these two uninitialized pointers ('data' and 'md') but more recent versions, like 5.4.0 and 7.3.1, complain about it: In file included from /home/gromero/hg/jdk/jdk/src/hotspot/share/c1/c1_Compilation.hpp:29:0, from /home/gromero/hg/jdk/jdk/src/hotspot/share/precompiled/precompiled.hpp:286: /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp: In member function 'void LIR_Assembler::emit_typecheck_helper(LIR_OpTypeCheck*, Label*, Label*, Label*)': /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp:595:100: warning: 'data' may be used uninitialized in this function [-Wmaybe-uninitialized] int byte_offset_of_slot(ciProfileData* data, ByteSize slot_offset_in_data) { return in_bytes(offset_of_slot(data, slot_offset_in_data)); } ^ /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2400:18: note: 'data' was declared here ciProfileData* data; ^ /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2483:78: warning: 'md' 
may be used uninitialized in this function [-Wmaybe-uninitialized] type_profile_helper(mdo, mdo_offset_bias, md, data, recv, Rtmp1, success); ^ Thank you. Best regards, Gustavo From shade at redhat.com Tue Sep 4 13:49:12 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Sep 2018 15:49:12 +0200 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code In-Reply-To: References: Message-ID: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com> On 09/04/2018 03:42 PM, Gustavo Romero wrote: > May I please request reviews for this tiny change that fixes two > uninitialized variables in PPC64 C1 LIR code? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8210320 > Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/ Looks good and trivial to me. -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From matthias.baesken at sap.com Tue Sep 4 13:48:54 2018 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 4 Sep 2018 13:48:54 +0000 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code In-Reply-To: References: Message-ID: <4ca7e1f4303a44df8c0ada10cb402718@sap.com> Hi Gustavo, looks good (not a reviewer however). It might not hurt to initialize md and data in emit_opTypeCheck in the same file as well (even without gcc complaints): void LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck* op) { LIR_Code code = op->code(); if (code == lir_store_check) { Register value = op->object()->as_register(); Register array = op->array()->as_register(); Register k_RInfo = op->tmp1()->as_register(); Register klass_RInfo = op->tmp2()->as_register(); Register Rtmp1 = op->tmp3()->as_register(); bool should_profile = op->should_profile(); __ verify_oop(value); CodeStub* stub = op->stub(); // Check if it needs to be profiled. 
ciMethodData* md; ciProfileData* data; ... Best regards, Matthias > -----Original Message----- > From: ppc-aix-port-dev On Behalf Of Gustavo Romero > Sent: Dienstag, 4. September 2018 15:42 > To: hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code > Importance: High > > Hi, > > May I please request reviews for this tiny change that fixes two > uninitialized variables in PPC64 C1 LIR code? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8210320 > Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/ > > GCC 4.8 does not complain about these two uninitialized pointers ('data' > and 'md') but more recent versions, like 5.4.0 and 7.3.1, complain about > it: > > In file included from /home/gromero/hg/jdk/jdk/src/hotspot/share/c1/c1_Compilation.hpp:29:0, > from /home/gromero/hg/jdk/jdk/src/hotspot/share/precompiled/precompiled.hpp:286: > /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp: In member function 'void LIR_Assembler::emit_typecheck_helper(LIR_OpTypeCheck*, Label*, Label*, Label*)': > /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp:595:100: warning: 'data' may be used uninitialized in this function [-Wmaybe-uninitialized] > int byte_offset_of_slot(ciProfileData* data, ByteSize slot_offset_in_data) { return in_bytes(offset_of_slot(data, slot_offset_in_data)); } > ^ > /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2400:18: note: 'data' was declared here > ciProfileData* data; > ^ > /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2483:78: warning: 'md' may be used uninitialized in this function [-Wmaybe-uninitialized] > type_profile_helper(mdo, mdo_offset_bias, md, data, recv, Rtmp1, success); > ^ > > Thank you. 
> > Best regards, > Gustavo From gromero at linux.vnet.ibm.com Tue Sep 4 14:11:05 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 4 Sep 2018 11:11:05 -0300 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code In-Reply-To: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com> References: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com> Message-ID: <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com> Hi Matthias and Aleksey, Thanks for reviewing it. On 09/04/2018 10:49 AM, Aleksey Shipilev wrote: > On 09/04/2018 03:42 PM, Gustavo Romero wrote: >> May I please request reviews for this tiny change that fixes two >> uninitialized variables in PPC64 C1 LIR code? >> >> Bug : https://bugs.openjdk.java.net/browse/JDK-8210320 >> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/ > > Looks good and trivial to me. Aleksey, I've updated that change to include another case pointed out by Matthias: http://cr.openjdk.java.net/~gromero/8210320/v2/ I think it's still trivial as before? If so it means I can push it once I receive a second OK from you? I also think I don't need to push it first to the 'submit' repo since it's a PPC64-only change. Is that correct? Thank you. Best regards, Gustavo From martin.doerr at sap.com Tue Sep 4 14:12:09 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 4 Sep 2018 14:12:09 +0000 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code In-Reply-To: References: Message-ID: Hi Gustavo, it's not a real bug, just a build warning. But it needs to get fixed. Thanks for doing it. Reviewed. Best regards, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Gustavo Romero Sent: Dienstag, 4. 
September 2018 15:42 To: hotspot-compiler-dev at openjdk.java.net Cc: ppc-aix-port-dev at openjdk.java.net Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code Importance: High Hi, May I please request reviews for this tiny change that fixes two uninitialized variables in PPC64 C1 LIR code? Bug : https://bugs.openjdk.java.net/browse/JDK-8210320 Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/ GCC 4.8 does not complain about these two uninitialized pointers ('data' and 'md') but more recent versions, like 5.4.0 and 7.3.1, complain about it: In file included from /home/gromero/hg/jdk/jdk/src/hotspot/share/c1/c1_Compilation.hpp:29:0, from /home/gromero/hg/jdk/jdk/src/hotspot/share/precompiled/precompiled.hpp:286: /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp: In member function 'void LIR_Assembler::emit_typecheck_helper(LIR_OpTypeCheck*, Label*, Label*, Label*)': /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp:595:100: warning: 'data' may be used uninitialized in this function [-Wmaybe-uninitialized] int byte_offset_of_slot(ciProfileData* data, ByteSize slot_offset_in_data) { return in_bytes(offset_of_slot(data, slot_offset_in_data)); } ^ /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2400:18: note: 'data' was declared here ciProfileData* data; ^ /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2483:78: warning: 'md' may be used uninitialized in this function [-Wmaybe-uninitialized] type_profile_helper(mdo, mdo_offset_bias, md, data, recv, Rtmp1, success); ^ Thank you. 
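For readers following the thread, here is a minimal sketch of the kind of code that tends to draw a -Wmaybe-uninitialized warning like the one quoted above. It assumes, as the should_profile code quoted elsewhere in this thread suggests (the webrev itself is not reproduced here), that the pointer is assigned and later read under the same condition. ProfileData and slot_offset_or_default are hypothetical names for illustration only, not HotSpot code, and initializing the pointer to NULL is simply the usual way to quiet such a warning; it may or may not match the actual change.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for profiling metadata; not a HotSpot type.
struct ProfileData {
  int slot_offset;
};

// The pointer is assigned only when should_profile is true and is read
// only under that same condition. A compiler that cannot prove the two
// conditions are linked may still warn, so the pointer gets an explicit
// initial value.
int slot_offset_or_default(bool should_profile, ProfileData* candidate) {
  ProfileData* data = NULL;  // without "= NULL" the later read may be flagged

  if (should_profile) {
    assert(candidate != NULL);
    data = candidate;
  }

  // ... unrelated work would happen here ...

  if (should_profile) {
    return data->slot_offset;  // safe: data was assigned above
  }
  return -1;  // default when profiling is off
}
```

Calling slot_offset_or_default(true, &p) returns p.slot_offset, while slot_offset_or_default(false, NULL) returns -1 without touching the pointer, so behavior is unchanged and only the declaration differs.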
Best regards, Gustavo From martin.doerr at sap.com Tue Sep 4 14:15:29 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 4 Sep 2018 14:15:29 +0000 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code In-Reply-To: <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com> References: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com> <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com> Message-ID: Hi Gustavo, > I think it's still trivial as before? Yes. > If so it means I can push it once I receive a second OK from you? > > I also think I don't need to push it first to the 'submit' repo since it's > a PPC64-only change. Is that correct? That's fine (assuming you have run a local build). Best regards, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Gustavo Romero Sent: Dienstag, 4. September 2018 16:11 To: Aleksey Shipilev ; hotspot-compiler-dev at openjdk.java.net; Baesken, Matthias Cc: ppc-aix-port-dev at openjdk.java.net Subject: Re: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code Hi Matthias and Aleksey, Thanks for reviewing it. On 09/04/2018 10:49 AM, Aleksey Shipilev wrote: > On 09/04/2018 03:42 PM, Gustavo Romero wrote: >> May I please request reviews for this tiny change that fixes two >> uninitialized variables in PPC64 C1 LIR code? >> >> Bug : https://bugs.openjdk.java.net/browse/JDK-8210320 >> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/ > > Looks good and trivial to me. Aleksey, I've updated that change to include another case pointed out by Matthias: http://cr.openjdk.java.net/~gromero/8210320/v2/ I think it's still trivial as before? If so it means I can push it once I receive a second OK from you? I also think I don't need to push it first to the 'submit' repo since it's a PPC64-only change. Is that correct? Thank you. 
Best regards, Gustavo From shade at redhat.com Tue Sep 4 14:15:32 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Sep 2018 16:15:32 +0200 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code In-Reply-To: <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com> References: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com> <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com> Message-ID: On 09/04/2018 04:11 PM, Gustavo Romero wrote: > http://cr.openjdk.java.net/~gromero/8210320/v2/ > > I think it's still trivial as before? Yes. > If so it means I can push it once I receive a second OK from you? Yes, this is trivial, and AFAIU only one Reviewer is required for trivial issues. > I also think I don't need to push it first to the 'submit' repo since it's > a PPC64-only change. Is that correct? Yes, I don't see the need to test trivial patches like this with submit repo. Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From gromero at linux.vnet.ibm.com Tue Sep 4 14:47:54 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 4 Sep 2018 11:47:54 -0300 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code In-Reply-To: References: Message-ID: <9accaa5c-5bd5-b07f-e246-c3956bff5643@linux.vnet.ibm.com> Hi Martin! On 09/04/2018 11:12 AM, Doerr, Martin wrote: > Hi Gustavo, > > it's not a real bug, just a build warning. But it needs to get fixed. Thanks for doing it. Reviewed. Thanks for reviewing it. Yes, I agree. Btw, I tried to precisely determine which change was introduced in gcc 7.3 (for instance) hoping it was only a matter of an additional -Wextra or -Wall in a gcc spec but it turned out that that was not the case apparently... 
I could not find a reasonable change in gcc flags or source code that might cause such a warning when gcc 7.3 is used. I've created a "test case" from JVM code for that [1] (which is still 4.4 MiB since I didn't have the chance to prune it further). Curiously enough, the following simple code triggers the same warning on _both_ gcc 4.8 and 7.3 when compiled with:

$ g++ -Wuninitialized -O3 mu.cpp -c -o mu

mu.cpp:

  #include <stdio.h>  // for printf

  void foo(int x) {
    printf("%d\n", x+1);
  }

  int main(int argc, char** argv) {
    int x;  // left uninitialized when argc is not 1, 2, or 3
    switch (argc) {
      case 1: x = 1; break;
      case 2: x = 4; break;
      case 3: x = 5;
    }
    foo(x);
  }

whereas code [1] only triggers the warning in question when gcc 7.3 is used (using the exact same flags):

$ g++ -Wuninitialized -O3 ok.cpp -c -o ok.o

Passing '-v' to gcc to check the flags from the spec didn't show any clue. Toolchain folks were also not able to tell any differences that could account for that behavior on gcc 7.3 without a detailed look...

Anyway, it's only a note :)

Thanks.

Best regards,
Gustavo

[1] http://cr.openjdk.java.net/~gromero/misc/ok.cpp

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: hotspot-compiler-dev On Behalf Of Gustavo Romero
> Sent: Dienstag, 4. September 2018 15:42
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code
> Importance: High
>
> Hi,
>
> May I please request reviews for this tiny change that fixes two
> uninitialized variables in PPC64 C1 LIR code?
>
> Bug   : https://bugs.openjdk.java.net/browse/JDK-8210320
> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/
>
> GCC 4.8 does not complain about these two uninitialized pointers ('data'
> and 'md') but more recent versions, like 5.4.0 and 7.3.1, complain about
> it:
>
> In file included from /home/gromero/hg/jdk/jdk/src/hotspot/share/c1/c1_Compilation.hpp:29:0,
>                  from /home/gromero/hg/jdk/jdk/src/hotspot/share/precompiled/precompiled.hpp:286:
> /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp: In member function ‘void LIR_Assembler::emit_typecheck_helper(LIR_OpTypeCheck*, Label*, Label*, Label*)’:
> /home/gromero/hg/jdk/jdk/src/hotspot/share/ci/ciMethodData.hpp:595:100: warning: ‘data’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>    int byte_offset_of_slot(ciProfileData* data, ByteSize slot_offset_in_data) { return in_bytes(offset_of_slot(data, slot_offset_in_data)); }
>    ^
> /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2400:18: note: ‘data’ was declared here
>    ciProfileData* data;
>    ^
> /home/gromero/hg/jdk/jdk/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp:2483:78: warning: ‘md’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>    type_profile_helper(mdo, mdo_offset_bias, md, data, recv, Rtmp1, success);
>    ^
>
> Thank you.
>
> Best regards,
> Gustavo
>

From gromero at linux.vnet.ibm.com Tue Sep 4 14:49:36 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 4 Sep 2018 11:49:36 -0300
Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code
In-Reply-To:
References: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com> <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com>
Message-ID: <04bb4c95-6252-1837-e7bf-4ad2d2e411c0@linux.vnet.ibm.com>

On 09/04/2018 11:15 AM, Doerr, Martin wrote:
> Hi Gustavo,
>
>> I think it's still trivial as before?
> Yes.
>
>> If so it means I can push it once I receive a second OK from you?
>> >> I also think I don't need to push it first to the 'submit' repo since it's >> a PPC64-only change. Is that correct? > That's fine (assuming you have run a local build). Sure :) Regards, Gustavo > Best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Gustavo Romero > Sent: Dienstag, 4. September 2018 16:11 > To: Aleksey Shipilev ; hotspot-compiler-dev at openjdk.java.net; Baesken, Matthias > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code > > Hi Matthias and Aleksey, > > Thanks for reviewing it. > > On 09/04/2018 10:49 AM, Aleksey Shipilev wrote: >> On 09/04/2018 03:42 PM, Gustavo Romero wrote: >>> May I please request reviews for this tiny change that fixes two >>> uninitialized variables in PPC64 C1 LIR code? >>> >>> Bug : https://bugs.openjdk.java.net/browse/JDK-8210320 >>> Webrev: http://cr.openjdk.java.net/~gromero/8210320/v1/ >> >> Looks good and trivial to me. > > Aleksey, I've updated that change to include another case pointed out by Matthias: > > http://cr.openjdk.java.net/~gromero/8210320/v2/ > > I think it's still trivial as before? > > If so it means I can push it once I receive a second OK from you? > > I also think I don't need to push it first to the 'submit' repo since it's > a PPC64-only change. Is that correct? > > Thank you. > > Best regards, > Gustavo > From gromero at linux.vnet.ibm.com Tue Sep 4 14:52:35 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 4 Sep 2018 11:52:35 -0300 Subject: RFR(xs): 8210320: PPC64: Fix uninitialized variable in C1 LIR assembler code In-Reply-To: References: <36d840d7-1050-80a4-5e78-fdb36a734702@redhat.com> <96db713b-a1af-d0f6-55fe-053f4780dab8@linux.vnet.ibm.com> Message-ID: On 09/04/2018 11:15 AM, Aleksey Shipilev wrote: > On 09/04/2018 04:11 PM, Gustavo Romero wrote: >> http://cr.openjdk.java.net/~gromero/8210320/v2/ >> >> I think it's still trivial as before? 
> > Yes. > >> If so it means I can push it once I receive a second OK from you? > > Yes, this is trivial, and AFAIU only one Reviewer is required for trivial issues. Got it. Thanks for confirming it. Either way Martin reviewed it also by now. >> I also think I don't need to push it first to the 'submit' repo since it's >> a PPC64-only change. Is that correct? > > Yes, I don't see the need to test trivial patches like this with submit repo. OK. Thanks! Best regards, Gustavo > Thanks, > -Aleksey > > From martin.doerr at sap.com Tue Sep 4 16:20:58 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 4 Sep 2018 16:20:58 +0000 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> Message-ID: <57ebd30a66504577a6b2ec267aee4b69@sap.com> Hi Michihiro, thanks for looking into the problems. I also prefer "vnoreg" and "vsnoreg". I'd be fine with just adding "&& SuperwordUseVSX" for the new rules in "match_rule_supported". Can you reproduce the test failures? The very same VM works fine on a different Power8 machine which uses the same instructions by C2. The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1"). I have seen several linux kernel changes regarding saving and restoring the VSX registers. I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC". Maybe something is missing which tells the kernel that we're using it. But that's just a guess. Best regards, Martin From: Michihiro Horie Sent: Dienstag, 4. September 2018 07:32 To: Doerr, Martin Cc: Lindenmaier, Goetz ; Gustavo Romero ; hotspot compiler ; hotspot-dev at openjdk.java.net Subject: RE: RFR: 8208171: PPC64: Enrich SLP support Hi Goetz, Martin, and Gustavo, >First, this should have been reviewed on hotspot-compiler-dev. 
It is clearly >a compiler change. >https://urldefense.proofpoint.com/v2/url?u=http-3A__mail.openjdk.java.net_mailman_listinfo&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oecsIpYF-cifqq2i1JEH0Q&m=AwJriSOfe9Z0niOEpp6HGgsCBhKwnM19dyn4CipYwyU&s=O9RJz8qw_uJHSJyEdWsuR2j_lgnquX3sbwyEgkFZ3YQ&e= says that hotspot-dev is for >"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component" >while hotspot-compiler-dev is for >"Technical discussion about the development of the HotSpot bytecode compilers" I understood the instruction and would use hotspot-compiler-dev in future RFRs, thanks. > Why do you rename vnoreg to vnoregi? I followed the way of coding for vsnoreg and vsnoregi, but the renaming does not look necessary. I would get this part back. Should I also rename vsnoregi to vsnoreg? >we noticed jtreg test failures when using this change: >compiler/runtime/safepoints/TestRegisterRestoring.java >compiler/runtime/Test7196199.java > >TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. > >We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. >The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine. >I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. Thank you for pointing out another issue. Currently I do not hit this problem, but preventing to match the vector nodes makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them similarly like the Replicate* rules, in any case. Gustavo, thanks for the wrap-up! 
Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Gustavo Romero, "Lindenmaier, Goetz", Michihiro Horie Cc: hotspot compiler, "hotspot-dev at openjdk.java.net" Date: 2018/09/04 02:18 Subject: RE: RFR: 8208171: PPC64: Enrich SLP support ________________________________ Hi Gustavo and Michihiro, we noticed jtreg test failures when using this change: compiler/runtime/safepoints/TestRegisterRestoring.java compiler/runtime/Test7196199.java TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. That's what I found out so far. Maybe you have an idea? I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. Best regards, Martin -----Original Message----- From: hotspot-dev On Behalf Of Gustavo Romero Sent: Montag, 3. September 2018 14:57 To: Lindenmaier, Goetz; Michihiro Horie Cc: hotspot compiler; hotspot-dev at openjdk.java.net Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Goetz, On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: > Also, I can not find all of the mail traffic in > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. > Is this a problem of the pipermail server? > > For some reason this webrev lacks the links to browse the diffs. > Do you need to use a more recent webrev? You can obtain it with > hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
Yes, probably it was a problem of the pipermail or in some relay. I noted the same thing, i.e. at least one Michi reply arrived to me but missed a ML. The initial discussion is here: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html I understand Martin reviewed the last webrev in that thread, which is http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html) Martin's review of webrev.01: http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html and Michi's reply to Martin's review of webrev.01: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html). and your last review: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html HTH. Best regards, Gustavo > Why do you rename vnoreg to vnoregi? > > Besides that the change is fine, thanks for implementing this! > > Best regards, > Goetz. > > >> -----Original Message----- >> From: Doerr, Martin >> Sent: Dienstag, 28. August 2018 19:35 >> To: Gustavo Romero >; Michihiro Horie >> > >> Cc: Lindenmaier, Goetz >; hotspot- >> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker >> > >> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support >> >> Hi Michihiro, >> >> thank you for implementing it. I have just taken a first look at your >> webrev.01. >> >> It looks basically good. Only the Power version check seems to be incorrect. >> VM_Version::has_popcntb() checks for Power5. >> I believe most instructions are available with Power7. >> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with >> Power8? >> We should check this carefully. >> >> Also, indentation in register_ppc.hpp could get improved. 
>> >> Thanks and best regard, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero > >> Sent: Donnerstag, 26. Juli 2018 16:02 >> To: Michihiro Horie > >> Cc: Lindenmaier, Goetz >; hotspot- >> dev at openjdk.java.net; Doerr, Martin >; ppc-aix- >> port-dev at openjdk.java.net; Simonis, Volker > >> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >> >> Hi Michi, >> >> On 07/26/2018 01:43 AM, Michihiro Horie wrote: >>> I updated webrev: >>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ >> >> Thanks for providing an updated webrev and for fixing indentation and >> function >> order in assembler_ppc.inline.hpp as well. I have no further comments :) >> >> >> Best Regards, >> Gustavo >> >>> >>> Best regards, >>> -- >>> Michihiro, >>> IBM Research - Tokyo >>> >>> Inactive hide details for Gustavo Romero ---2018/07/25 23:05:32---Hi Michi, >> On 07/25/2018 02:43 AM, Michihiro Horie wrote:Gustavo Romero --- >> 2018/07/25 23:05:32---Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie >> wrote: >>> >>> From: Gustavo Romero > >>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port- >> dev at openjdk.java.net, hotspot-dev at openjdk.java.net >>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" >> > >>> Date: 2018/07/25 23:05 >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>> >>> ------------------------------------------------------------------------------------------- >> ---------------------------------------------------------------------------------------------- >> ---------------------------------------------------------------------------------------------- >> ---------------------------------------------------------------------------------------------- >> ---------------------------------------------------------------------------------------------- >> ---------------------------------------------------------------------------------------------- >> 
---------------------------------------------------------------------------------------------- >> ---------------------------------------------------------------------------------------------- >> ---------------------------------------------------------------------------------------------- >> ---------------------------------------------------------------------------------------------- >> ----------------------------------------------------- >>> >>> >>> >>> Hi Michi, >>> >>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: >>> > Dear all, >>> > >>> > Would you review the following change? >>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 >>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 >>> > >>> > This change adds support for vectorized arithmetic calculation with SLP. >>> > >>> > The to_vr function is added to convert VSR to VR. Currently, vecX is >> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, >> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the >> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the >> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to >> the ConvD2FNode::Value in convertnode.cpp. >>> >>> Looks good. Just a few comments: >>> >>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of >> vmaddfp in >>> order to avoid the splat? >>> >>> - Although all instructions added by your change where introduced in ISA >> 2.06, >>> so POWER7 and above are OK, as I see probes for >> PowerArchictecturePPC64=6|5 in >>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point >> to >>> guarantee that these instructions won't be emitted on a CPU that does >> not >>> support them. >>> >>> - I think that in general string in format %{} are in upper case. For instance, >>> this the current output on optoassembly for vmul4F: >>> >>> 2941835 5b4 ADDI R24, R24, #64 >>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! 
mul packed4F >>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>> >>> I think it would be better to be in upper case instead. I also think that if >>> the node match emits more than one instruction all instructions must be >> listed >>> in format %{}, since it's meant for detailed debugging. Finally I think it >>> would be better to replace \t! by \t// in that string (unless I'm missing any >>> special meaning for that char). So for vmul4F it would be something like: >>> >>> 2941835 5b4 ADDI R24, R24, #64 >>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 >>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F >>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>> >>> >>> But feel free to change anything just after you get additional reviews :) >>> >>> >>> > I confirmed this change with JTREG. In addition, I used attached micro >> benchmarks. >>> > /(See attached file: slp_microbench.zip)/ >>> >>> Thanks for sharing it. >>> Btw, another option to host it would be in the CR >>> server, in http://cr.openjdk.java.net/~mhorie/8208171 >>> >>> >>> Best regards, >>> Gustavo >>> >>> > >>> > Best regards, >>> > -- >>> > Michihiro, >>> > IBM Research - Tokyo >>> > >>> >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From vladimir.kozlov at oracle.com Tue Sep 4 17:23:23 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Sep 2018 10:23:23 -0700 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: <20180904103622.sijpgfiltco4mxd2@rbackman> References: <20180904103622.sijpgfiltco4mxd2@rbackman> Message-ID: +1 Thanks, Vladimir On 9/4/18 3:36 AM, Rickard Bäckman wrote: > Looks good.
> > /R > > On 08/30, Erik Österlund wrote: >> Hi, >> >> The JFR getEventWriter() intrinsics have code in C1 and C2 that manually >> resolves jobjects. This should go through the Access API to make sure the >> necessary GC barriers are inserted. >> >> I noticed this in an attempt to move JNI handle processing out of the pause >> (among other things). It crashed in kitchensink. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8210158 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00 >> >> I tested the patch by running it, together with a patch that moves out JNI >> handle processing outside of the pause, through hs-tier1-3, as well as >> running it through Kitchensink24H (as it originally crashed in kitchensink). >> >> Thanks, >> /Erik From vladimir.kozlov at oracle.com Tue Sep 4 17:25:58 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Sep 2018 10:25:58 -0700 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> Message-ID: <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> Looks good. Thanks, Vladimir On 9/3/18 12:21 AM, Roland Westrelin wrote: > > Hi Vladimir, > > Thanks for the review. Here is a new webrev that should address your > comment. > > http://cr.openjdk.java.net/~roland/8209544/webrev.01/ > > Roland. > From erik.osterlund at oracle.com Tue Sep 4 17:55:30 2018 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 4 Sep 2018 19:55:30 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: References: <20180904103622.sijpgfiltco4mxd2@rbackman> Message-ID: <1D400C8F-D4CF-44E3-82C4-00EB32AE103C@oracle.com> Hi Vladimir, Thank you for the review. /Erik > On 4 Sep 2018, at 19:23, Vladimir Kozlov wrote: > > +1 > > Thanks, > Vladimir > >> On 9/4/18 3:36 AM, Rickard Bäckman wrote: >> Looks good.
>> /R >>> On 08/30, Erik Österlund wrote: >>> Hi, >>> >>> The JFR getEventWriter() intrinsics have code in C1 and C2 that manually >>> resolves jobjects. This should go through the Access API to make sure the >>> necessary GC barriers are inserted. >>> >>> I noticed this in an attempt to move JNI handle processing out of the pause >>> (among other things). It crashed in kitchensink. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8210158 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00 >>> >>> I tested the patch by running it, together with a patch that moves out JNI >>> handle processing outside of the pause, through hs-tier1-3, as well as >>> running it through Kitchensink24H (as it originally crashed in kitchensink). >>> >>> Thanks, >>> /Erik From vladimir.kozlov at oracle.com Tue Sep 4 18:40:42 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Sep 2018 11:40:42 -0700 Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal In-Reply-To: References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com> <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com> Message-ID: <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com> Thank you, Gustavo, for the detailed answer. I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now. About the property check (flavor == "server" & !emulatedClient) in tests: if C2 (and Graal in the future, when it supports RTM) is used, this check is always true because C2 (and Graal) is only available in the Server VM and is disabled in the emulatedClient setting. This check was added before we had the C2 usage check, which can now be done with vm.rtm.compiler. Thanks, Vladimir On 9/3/18 3:15 PM, Gustavo Romero wrote: > Hi Vladimir, > > Thanks a lot for reviewing it and for your comments.
>
> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>> Hi Gustavo,
>>
>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with
>> TieredStopAtLevel < 4 flag
>
> Yes, although currently afaics all tests will explicitly enable C2 (for
> instance, passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2
> through a warming up before testing, I agree that nothing forbids one to
> switch off C2 with TieredStopAtLevel = level < 4 for a test case. It also
> looks better to list explicitly which compilers do support RTM instead of
> the ones that don't support it.
>
> I've updated the webrev accordingly:
>
> http://cr.openjdk.java.net/~gromero/8209972/v2/
>
> diff in there looks odd so I generated another one with --patience for a
> better (IMO) diff format:
>
> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
>
>
>> Also can platforms check be replaced with one vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
>
> For example, on Linux the following cases are possible regarding CPU / OS
> RTM support:
>
> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>
> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it
> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies
> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise
> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for
> Linux and for AIX.
>
> That said I don't think that the platforms check can be replaced with one
> vmRTMCPU(), because in some cases it's necessary to run a test for
> cpu = false and compiler = true, i.e. it's necessary to run a test on an
> unsupported CPU for a given platform _only if_ the compiler in use supports
> RTM (like C2). So if, for instance, we do:
>
> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires
> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation
> returns 'false' for cpu = false and compiler = true, skipping the test
> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler'
> as 'true' and run the test in that case one could match for
> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will
> be evaluated as 'true' and the test will run even though the Graal
> compiler is selected, which is wrong.
>
> Hence the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must
> contain its own list of supported compilers with RTM support for each
> platform IMO. Basically we can't ask the JVM about the compiler's support
> for RTM, since the JVM can only tell us about the CPU+OS support for RTM
> regarding the CPU and OS on which the JVM is running.
>
>
>> And since you check cpu in here why not replace all vm.rtm.* with one vm.rtm.supported? In that case you would need
>> only one @requires check in tests instead of:
>>
>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>
> I think it's not possible either.
Currently there are 5 match cases in
> RTM tests:
>
> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>
> which can be simplified as 5 cases:
>
> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
> 2:            flavor == "server" & !emulatedClient  & cpu & os
> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
> 4:   cpu  & !(flavor == "server" & !emulatedClient)
> 5: no @requires
>
> I understand that case 1 and 2 (since CPU implies OS) can be simplified as:
>
>
> 1:          !(flavor == "server" & !emulatedClient  & cpu)
> 2:            flavor == "server" & !emulatedClient  & cpu
> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
> 4:   cpu  & !(flavor == "server" & !emulatedClient)
> 5: no @requires
>
> and case 1 and 2 are mere opposites, so we have 4 cases:
>
> 1:          !(flavor == "server" & !emulatedClient  & cpu)
> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
> 4:   cpu  & !(flavor == "server" & !emulatedClient)
> 5: no @requires
>
> We could simplify further making P = (flavor == "server" & !emulatedClient),
> and make:
>
> 1:          !(P & cpu)
> 3: (!cpu) &  (P)
> 4:   cpu  & !(P)
> 5: no @requires
>
> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
> order to control running the tests only if the selected compiler on a
> given platform has RTM support (skipping Graal, for instance), we get:
>
> 1:          !(P & cpu) & compiler
> 3: (!cpu) &  (P)       & compiler
> 4:   cpu  & !(P)       & compiler
> 5: no @requires
& compiler > > So it looks like that at minimum we would need 3 properties, but IMO it's > not worth to add another property P = (flavor == "server" & !emulatedClient) > just to simplify further the @requires line. > > In summing up, I think it's only possible to replace 'cpu & os' by 'cpu', > so I updated the webrev removing the vm.rtm.os property and keeping only > vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks). > > I've tested the following scenarios and observed no regression [1]: > > 1. X86_64 w/ RTM > 2. X86_64 w/ RTM + Graal enabled > 3. POWER7: no CPU+OS support for RTM > 4. POWER8: CPU+OS support for RTM > > But I think we need a confirmation from SAP about AIX. > > > Best regards, > Gustavo > > [1] > > ** X86_64 w/ RTM ** > Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java > Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java > Passed: compiler/rtm/cli/TestRTMRetryCountOption.java > Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java > Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java > Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java > Passed: compiler/rtm/locking/TestRTMAbortThreshold.java > Passed: compiler/rtm/locking/TestRTMAbortRatio.java > Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java > Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java > Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java > Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java > Passed: compiler/rtm/locking/TestRTMLockingThreshold.java > Passed: 
compiler/rtm/locking/TestRTMRetryCount.java > Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java > Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java > Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java > Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java > Passed: compiler/rtm/locking/TestUseRTMDeopt.java > Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java > Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java > Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java > Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java > Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java > Test results: passed: 30 > > > ** X86_64 w/ RTM + Graal enabled ** > Test results: no tests selected (all RTM tests skipped) > > > ** POWER7: no CPU+OS support for RTM ** > Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java > Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java > Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java > Passed: compiler/rtm/cli/TestRTMRetryCountOption.java > Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java > Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java > Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java > Test results: passed: 10 > > > ** POWER8: CPU+OS support for RTM ** > Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java > Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java > Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java > Passed: compiler/rtm/cli/TestRTMRetryCountOption.java > Passed: 
compiler/rtm/cli/TestRTMSpinLoopCountOption.java > Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java > Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java > Passed: compiler/rtm/locking/TestRTMAbortRatio.java > Passed: compiler/rtm/locking/TestRTMAbortThreshold.java > Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java > Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java > Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java > Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java > Passed: compiler/rtm/locking/TestRTMLockingThreshold.java > Passed: compiler/rtm/locking/TestRTMRetryCount.java > Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java > Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java > Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java > Passed: compiler/rtm/locking/TestUseRTMDeopt.java > Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java > Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java > Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java > Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java > Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java > Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java > Test results: passed: 30 > > >> Thanks, >> Vladimir >> >> On 8/31/18 8:38 AM, Gustavo Romero wrote: >>> Hi, >>> >>> Could the following small change be reviewed please? >>> >>> Bug   : https://bugs.openjdk.java.net/browse/JDK-8209972 >>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/ >>> >>> It disables all RTM tests when a compiler not supporting RTM (e.g. 
Graal) >>> is selected on platforms that can have CPU/OS with RTM support. >>> >>> It also disables all RTM tests for any other platform that has not a single >>> compiler supporting RTM. >>> >>> The RTM support was first added to C2 compiler and once checkers for RTM >>> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM they >>> assume that a compiler supporting RTM is available for sure ("rtm" is >>> advertised only if RTM is supported by both CPU and OS). Later the JVM >>> began to allow the selection of a compiler different from C2, like Graal, >>> and it became possible to select a compiler without RTM support despite the >>> fact that both the CPU and the OS support RTM. Thus for platforms >>> supporting Graal or any other specific compiler the compiler availability for >>> the RTM tests must be adjusted and if the selected compiler does not >>> support RTM then all RTM tests must be skipped, including the ones meant >>> for platforms without CPU or OS RTM support (e.g. *Unsupported*.java) >>> because in some cases, like in TestUseRTMLockingOptionOnUnsupportedCPU.java, >>> the test expects JVM initialization errors that will never occur because the >>> problem is not that the RTM support for CPU or OS is missing, but rather >>> because the selected compiler does not support RTM. >>> >>> That change adds a new VM property 'vm.rtm.compiler' which can be used to >>> filter out compilers without RTM support for specific platforms and adapts >>> the current RTM tests to use that new property. >>> >>> Nothing changes regarding the number of passing/selected tests for the >>> various cpu/os/compiler combinations on platforms that currently might >>> support RTM [1], except when Graal is in use. >>> >>> Thank you. 
>>> >>> Best regards, >>> Gustavo >>> >>> >>> [1] >>> >>> ** X64 w/ CPU and OS supporting RTM ** >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>> Passed: 
compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>> Test results: passed: 30 >>> >>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support ** >>> Test results: no tests selected (all RTM tests skipped) >>> >>> ** POWER8 w/ CPU and OS supporting RTM ** >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>> Passed: 
compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>> Test results: passed: 30 >>> >>> ** POWER7 wo/ CPU and OS supporting RTM ** >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Test results: passed: 10 >>> >> > From nils.eliasson at oracle.com Tue Sep 4 19:50:49 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 4 Sep 2018 21:50:49 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: <5B8E5F4B.5060707@oracle.com> References: <5B8E5F4B.5060707@oracle.com> Message-ID: <29357f36-4afa-9f98-5430-d12df410ca4d@oracle.com> Looks good! // Nils On 2018-09-04 12:32, Erik Österlund wrote: > Hi, > > Any more takers? > > Full: > http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/ > > Thanks, > /Erik > > On 2018-08-30 17:06, Erik Österlund wrote: >> Hi Roland, >> >> Thank you for the review. 
>> >> On 2018-08-30 13:21, Roland Westrelin wrote: >>> Hi Erik, >>> >>>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00 >>> make_load() already calls _gvn.transform(), right? >> >> Yes you are right. I will remove the redundant _gvn.transform call of >> the access_load; it is redundant indeed. >> >>> You don't set MO_UNORDERED. Why is it not required? >> >> MO_UNORDERED is the default MO of loads and stores. It is set up in >> the C2Access object using fixup_decorators() which sets sane defaults >> for various decorators, including MO. >> >> Thanks, >> /Erik >> >>> Roland. >> > From erik.osterlund at oracle.com Tue Sep 4 20:00:56 2018 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 4 Sep 2018 22:00:56 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: <29357f36-4afa-9f98-5430-d12df410ca4d@oracle.com> References: <5B8E5F4B.5060707@oracle.com> <29357f36-4afa-9f98-5430-d12df410ca4d@oracle.com> Message-ID: <66BA2B51-3F67-4A15-ADC2-51529DEB14E9@oracle.com> Hi Nils, Thank you for the review! /Erik > On 4 Sep 2018, at 21:50, Nils Eliasson wrote: > > Looks good! > > // Nils > > >> On 2018-09-04 12:32, Erik Österlund wrote: >> Hi, >> >> Any more takers? >> >> Full: >> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/ >> >> Thanks, >> /Erik >> >>> On 2018-08-30 17:06, Erik Österlund wrote: >>> Hi Roland, >>> >>> Thank you for the review. >>> >>>> On 2018-08-30 13:21, Roland Westrelin wrote: >>>> Hi Erik, >>>> >>>>> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.00 >>>> make_load() already calls _gvn.transform(), right? >>> >>> Yes you are right. I will remove the redundant _gvn.transform call of the access_load; it is redundant indeed. >>> >>>> You don't set MO_UNORDERED. Why is it not required? >>> >>> MO_UNORDERED is the default MO of loads and stores. It is set up in the C2Access object using fixup_decorators() which sets sane defaults for various decorators, including MO. 
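The "default MO" behaviour described in the reply above can be pictured with a small sketch. This is only an illustration under stated assumptions, not HotSpot's actual code: the class name DecoratorDefaults, the method fixupDecorators and all bit values are hypothetical; only the idea (if the caller sets no memory-ordering decorator, MO_UNORDERED is filled in) comes from the thread.

```java
// Toy model of "fix up decorators with sane defaults".
// All names and bit values here are hypothetical and chosen for this sketch only.
final class DecoratorDefaults {
    static final long MO_UNORDERED = 1L << 0;
    static final long MO_RELAXED   = 1L << 1;
    static final long MO_ACQUIRE   = 1L << 2;
    static final long MO_SEQ_CST   = 1L << 3;
    // Mask covering every memory-ordering (MO) decorator bit of this sketch.
    static final long MO_MASK = MO_UNORDERED | MO_RELAXED | MO_ACQUIRE | MO_SEQ_CST;
    // An unrelated decorator, standing in for "where the access happens".
    static final long IN_HEAP = 1L << 8;

    /** Adds MO_UNORDERED only when the caller specified no memory ordering at all. */
    static long fixupDecorators(long decorators) {
        if ((decorators & MO_MASK) == 0) {
            decorators |= MO_UNORDERED;
        }
        return decorators;
    }

    public static void main(String[] args) {
        long plainLoad = fixupDecorators(IN_HEAP);                // no MO given
        long acquireLoad = fixupDecorators(IN_HEAP | MO_ACQUIRE); // explicit MO kept
        System.out.println((plainLoad & MO_UNORDERED) != 0);      // prints true
        System.out.println((acquireLoad & MO_UNORDERED) != 0);    // prints false
    }
}
```

Under this model a call site that passes no MO decorator behaves exactly like one that passes MO_UNORDERED explicitly, which is why setting it by hand is unnecessary.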
>>> >>> Thanks, >>> /Erik >>> >>>> Roland. >>> >> > From gromero at linux.vnet.ibm.com Tue Sep 4 22:03:22 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 4 Sep 2018 19:03:22 -0300 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: <57ebd30a66504577a6b2ec267aee4b69@sap.com> References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <57ebd30a66504577a6b2ec267aee4b69@sap.com> Message-ID: <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com> Hi Martin and Michi, On 09/04/2018 01:20 PM, Doerr, Martin wrote: > Can you reproduce the test failures? > > The very same VM works fine on a different Power8 machine which uses the same instructions by C2. > > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1"). > > I have seen several linux kernel changes regarding saving and restoring the VSX registers. > > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC". > > Maybe something is missing which tells the kernel that we're using it. But that's just a guess. Facilities like FP (fp registers), VEC (vector registers - aka VMX/Altivec), and VSX (vector-scalar registers) are usually disabled on a newly created process. Once any instruction associated with these facilities is used in the process it causes an exception that is handled by the kernel [1, 2, 3]: the kernel enables the facility that caused the exception (see load_up_fp & friends) and re-executes the instruction when the kernel returns control to the process in userspace. Starting from kernel v4.6 [4] there is a simple heuristic that employs an 8-bit counter to help track if a process, after using these facilities for the first time, continues to use the facilities. 
The counters (load_fp and load_vec) are incremented on each context switch and if the process stops using the FP or VEC facilities then they are disabled again, with the FP/VEC/VSX save/restore on context switches being disabled as well, in order to improve context-switch performance by avoiding the FP/VEC/VSX register save/restore. Either way (before or after the change introduced in v4.6) *that mechanism is opaque to userspace*, particularly to the process using these facilities. If a given facility cannot be enabled by the kernel (for instance, because the CPU does not support it), the kernel sends a SIGILL to the process. It's possible to inspect the thread member dynamics/state from userspace using tools like 'systemtap' (for example, this simple script can be used to inspect the VRSAVE register on a given thread that is running a program called 'vrsave_' [5]) or using the 'perf' tool. "tsk->thread.used_vsr" [6] is actually associated with the VSX facility whilst MSR_VEC is associated with the VEC/VMX/Altivec facility [7], so "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a new process, or if the load_fp and load_vec counters overflowed and became zero, disabling VSX, or if only FP or only VEC - not both - were used in the process). In that case the kernel will also enable VSX by MSR |= MSR_VSX. A similar mechanism drives the FP (MSR_FP) and the VEC (MSR_VEC) facilities. If both FP and VEC facilities are used the VSX facility is enabled automatically since FP+VEC regsets == VSX regset [8]. Thus, as this mechanism is entirely opaque to userspace, I understand that if a program has to tell the kernel it wants to use any of these facilities (FP/VEC/VSX) before using them, there is something wrong going on in kernelspace. Martin and Michi, if you want any help drilling into it further on the kernel side please let me know, maybe I can help. I haven't had the chance to reproduce the crash yet, so if I find anything meaningful about it tomorrow I'll keep you posted. 
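To make the lazy-enable heuristic described above easier to follow, here is a toy userspace model in Java. It is a sketch under stated assumptions and not kernel code: the names FacilityModel, use and contextSwitch are hypothetical, only one facility with one 8-bit counter is modelled, and the counter is assumed to start at 1 on first use and to disable the facility when it wraps to zero.

```java
// Toy model of one lazily enabled facility (e.g. FP or VEC) with an 8-bit counter.
// Hypothetical names; this only mirrors the description in the mail, not the kernel source.
final class FacilityModel {
    private boolean enabled = false; // stands in for the MSR bit (e.g. MSR_FP or MSR_VEC)
    private int loadCounter = 0;     // stands in for an 8-bit counter such as load_fp
    private int traps = 0;           // number of "facility unavailable" exceptions taken

    /** The process executes an instruction that needs the facility. */
    void use() {
        if (!enabled) {
            traps++;          // exception is taken, the kernel enables the facility
            enabled = true;
            loadCounter = 1;  // assumption of this sketch: counting starts at 1
        }
        // the instruction then (re-)executes; userspace never notices the trap
    }

    /** One context switch: the 8-bit counter is bumped and the facility is dropped on wrap-around. */
    void contextSwitch() {
        if (!enabled) {
            return; // nothing to save/restore for a disabled facility
        }
        loadCounter = (loadCounter + 1) & 0xFF;
        if (loadCounter == 0) {
            enabled = false; // next use() will trap again
        }
    }

    boolean isEnabled() { return enabled; }
    int traps() { return traps; }

    public static void main(String[] args) {
        FacilityModel fp = new FacilityModel();
        fp.use();                        // first use: one trap
        for (int i = 0; i < 255; i++) {
            fp.contextSwitch();          // counter goes 2, 3, ..., 255, then wraps to 0
        }
        fp.use();                        // facility was dropped, so this traps again
        System.out.println(fp.traps());  // prints 2
    }
}
```

The point of the model is that the program itself never asks for the facility: it simply executes the instruction, and enabling, disabling and re-enabling all happen on the kernel side.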
Kind regards, Gustavo [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP) [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec) [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX) [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239 [5] http://cr.openjdk.java.net/~gromero/script.d [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310 [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250 [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437 > Best regards, > > Martin > > *From:*Michihiro Horie > *Sent:* Dienstag, 4. September 2018 07:32 > *To:* Doerr, Martin > *Cc:* Lindenmaier, Goetz ; Gustavo Romero ; hotspot compiler ; hotspot-dev at openjdk.java.net > *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support > > Hi Goetz, Martin, and Gustavo, > > >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly >>a compiler change. _ > _>https://urldefense.proofpoint.com/v2/url?u=http-3A__mail.openjdk.java.net_mailman_listinfo&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oecsIpYF-cifqq2i1JEH0Q&m=AwJriSOfe9Z0niOEpp6HGgsCBhKwnM19dyn4CipYwyU&s=O9RJz8qw_uJHSJyEdWsuR2j_lgnquX3sbwyEgkFZ3YQ&e= says that hotspot-dev is for >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component" >>while hotspot-compiler-dev is for >>"Technical discussion about the development of the HotSpot bytecode compilers" > I understood the instruction and would use hotspot-compiler-dev in future RFRs, thanks. > > >> Why do you rename vnoreg to vnoregi? > I followed the way of coding for vsnoreg and vsnoregi, but the renaming does not look necessary. I would get this part back. 
Should I also rename vsnoregi to vsnoreg? > > >>we noticed jtreg test failures when using this change: >>compiler/runtime/safepoints/TestRegisterRestoring.java >>compiler/runtime/Test7196199.java >> >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. >> >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine. > > >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > Thank you for pointing out another issue. Currently I do not hit this problem, but preventing to match the vector nodes makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them similarly like the Replicate* rules, in any case. > > > Gustavo, thanks for the wrap-up! 
> > > Best regards, > -- > Michihiro, > IBM Research - Tokyo > > From: "Doerr, Martin" > To: Gustavo Romero, "Lindenmaier, Goetz", Michihiro Horie > Cc: hotspot compiler, "hotspot-dev at openjdk.java.net" > Date: 2018/09/04 02:18 > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > > -------- > > > > > Hi Gustavo and Michihiro, > > we noticed jtreg test failures when using this change: > compiler/runtime/safepoints/TestRegisterRestoring.java > compiler/runtime/Test7196199.java > > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. > > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. 
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > That's what I found out so far. Maybe you have an idea? > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-dev > On Behalf Of Gustavo Romero > Sent: Montag, 3. September 2018 14:57 > To: Lindenmaier, Goetz >; Michihiro Horie > > Cc: hotspot compiler >; hotspot-dev at openjdk.java.net > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > Hi Goetz, > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: >> Also, I can not find all of the mail traffic in >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. >> Is this a problem of the pipermail server? >> >> For some reason this webrev lacks the links to browse the diffs. >> Do you need to use a more recent webrev? You can obtain it with >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ . > > Yes, probably it was a problem of the pipermail or in some relay. > I noted the same thing, i.e. at least one Michi reply arrived > to me but missed a ML. > > The initial discussion is here: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html > > I understand Martin reviewed the last webrev in that thread, which is > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html) > > Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html > > and Michi's reply to Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html). 
> > and your last review: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html > > > HTH. > > Best regards, > Gustavo > >> Why do you rename vnoreg to vnoregi? >> >> Besides that the change is fine, thanks for implementing this! >> >> Best regards, >> Goetz. >> >> >>> -----Original Message----- >>> From: Doerr, Martin >>> Sent: Dienstag, 28. August 2018 19:35 >>> To: Gustavo Romero >; Michihiro Horie >>> > >>> Cc: Lindenmaier, Goetz >; hotspot- >>> dev at openjdk.java.net ; ppc-aix-port-dev at openjdk.java.net ; Simonis, Volker >>> > >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michihiro, >>> >>> thank you for implementing it. I have just taken a first look at your >>> webrev.01. >>> >>> It looks basically good. Only the Power version check seems to be incorrect. >>> VM_Version::has_popcntb() checks for Power5. >>> I believe most instructions are available with Power7. >>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with >>> Power8? >>> We should check this carefully. >>> >>> Also, indentation in register_ppc.hpp could get improved. >>> >>> Thanks and best regard, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero > >>> Sent: Donnerstag, 26. Juli 2018 16:02 >>> To: Michihiro Horie > >>> Cc: Lindenmaier, Goetz >; hotspot- >>> dev at openjdk.java.net ; Doerr, Martin >; ppc-aix- >>> port-dev at openjdk.java.net ; Simonis, Volker > >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michi, >>> >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: >>>> I updated webrev: >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ >>> >>> Thanks for providing an updated webrev and for fixing indentation and >>> function >>> order in assembler_ppc.inline.hpp as well. 
I have no further comments :) >>> >>> >>> Best Regards, >>> Gustavo >>> >>>> >>>> Best regards, >>>> -- >>>> Michihiro, >>>> IBM Research - Tokyo >>>> >>>> From: Gustavo Romero >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" >>>> Date: 2018/07/25 23:05 >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>>> >>>> -------- >>>> >>>> >>>> >>>> Hi Michi, >>>> >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: >>>> > Dear all, >>>> > >>>> > Would you review the following change? 
>>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 >>>> > >>>> > This change adds support for vectorized arithmetic calculation with SLP. >>>> > >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is >>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, >>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the >>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the >>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to >>> the ConvD2FNode::Value in convertnode.cpp. >>>> >>>> Looks good. Just a few comments: >>>> >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of >>> vmaddfp in >>>> order to avoid the splat? >>>> >>>> - Although all instructions added by your change were introduced in ISA >>> 2.06, >>>> so POWER7 and above are OK, as I see probes for >>> PowerArchitecturePPC64=6|5 in >>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point >>> to >>>> guarantee that these instructions won't be emitted on a CPU that does >>> not >>>> support them. >>>> >>>> - I think that in general strings in format %{} are in upper case. For instance, >>>> this is the current output on optoassembly for vmul4F: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> I think it would be better to be in upper case instead. I also think that if >>>> the node match emits more than one instruction all instructions must be >>> listed >>>> in format %{}, since it's meant for detailed debugging. Finally I think it >>>> would be better to replace \t! by \t// in that string (unless I'm missing any >>>> special meaning for that char). 
So for vmul4F it would be something like: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 >>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> >>>> But feel free to change anything just after you get additional reviews :) >>>> >>>> >>>> > I confirmed this change with JTREG. In addition, I used attached micro >>> benchmarks. >>>> > /(See attached file: slp_microbench.zip)/ >>>> >>>> Thanks for sharing it. >>>> Btw, another option to host it would be in the CR >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 >>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> > >>>> > Best regards, >>>> > -- >>>> > Michihiro, >>>> > IBM Research - Tokyo >>>> > >>>> >>>> >>>> >> > > > > From rwestrel at redhat.com Wed Sep 5 08:05:00 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 05 Sep 2018 10:05:00 +0200 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> Message-ID: Thanks for the review. Anyone else? Roland. From rwestrel at redhat.com Wed Sep 5 08:06:06 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 05 Sep 2018 10:06:06 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: <5B8E5F4B.5060707@oracle.com> References: <5B8E5F4B.5060707@oracle.com> Message-ID: Hi Erik, > http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/ This one still has useless _gvn.transform() calls. Roland. 
From erik.osterlund at oracle.com Wed Sep 5 08:16:19 2018 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 5 Sep 2018 10:16:19 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: References: <5B8E5F4B.5060707@oracle.com> Message-ID: <5B8F90D3.5000000@oracle.com> Hi Roland, On 2018-09-05 10:06, Roland Westrelin wrote: > Hi Erik, > >> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01/ > This one still has useless _gvn.transform() calls. Fixed. Full: http://cr.openjdk.java.net/~eosterlund/8210158/webrev.02 Incremental: http://cr.openjdk.java.net/~eosterlund/8210158/webrev.01_02 Thanks, /Erik > > Roland. From rwestrel at redhat.com Wed Sep 5 08:16:35 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 05 Sep 2018 10:16:35 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: <5B8F90D3.5000000@oracle.com> References: <5B8E5F4B.5060707@oracle.com> <5B8F90D3.5000000@oracle.com> Message-ID: > http://cr.openjdk.java.net/~eosterlund/8210158/webrev.02 Looks good. Thank you. Roland. From erik.osterlund at oracle.com Wed Sep 5 08:20:46 2018 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 5 Sep 2018 10:20:46 +0200 Subject: RFR: 8210158: Accessorize JFR getEventWriter() intrinsics In-Reply-To: References: <5B8E5F4B.5060707@oracle.com> <5B8F90D3.5000000@oracle.com> Message-ID: <5B8F91DE.3040709@oracle.com> Hi Roland, Thank you for the review. /Erik On 2018-09-05 10:16, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~eosterlund/8210158/webrev.02 > Looks good. Thank you. > > Roland. 
From vladimir.x.ivanov at oracle.com Wed Sep 5 09:22:05 2018 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 5 Sep 2018 12:22:05 +0300 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> Message-ID: <3c8ae9e3-e3f2-df0e-0add-4d1c54589198@oracle.com> > http://cr.openjdk.java.net/~roland/8209544/webrev.01/ Looks good. Best regards, Vladimir Ivanov From HORIE at jp.ibm.com Wed Sep 5 10:22:57 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Wed, 5 Sep 2018 19:22:57 +0900 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com> References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <57ebd30a66504577a6b2ec267aee4b69@sap.com> <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com> Message-ID: Hi Martin, Gustavo, I cannot still reproduce the problem. I noticed the machine I have is not SUSE but OpenSUSE with 4.1.21-14-default. I've also tried kernel 4.4.0-31-generic but it's Ubuntu. Gustavo, is there any suspicious change before/after v4.4, which Martin got the crash? Apart from the problem, I uploaded the latest webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/ Best regards, -- Michihiro, IBM Research - Tokyo From: Gustavo Romero To: "Doerr, Martin" , Michihiro Horie/Japan/IBM at IBMJP Cc: "Lindenmaier, Goetz" , hotspot compiler Date: 2018/09/05 07:03 Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Martin and Michi, On 09/04/2018 01:20 PM, Doerr, Martin wrote: > Can you reproduce the test failures? > > The very same VM works fine on a different Power8 machine which uses the same instructions by C2. > > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1"). 
> > I have seen several linux kernel changes regarding saving and restoring the VSX registers. > > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC". > > Maybe something is missing which tells the kernel that we're using it. But that's just a guess. Facilities like FP (fp registers), VEC (vector registers - aka VMX/Altivec), and VSX (vector-scalar registers) are usually disabled on a newborn process. Once any instruction associated with these facilities is used in the process, it causes an exception that is handled by the kernel [1, 2, 3]: the kernel enables the facility that caused the exception (see load_up_fp & friends) and re-executes the instruction when it returns control to the process in userspace. Starting from kernel v4.6 [4] there is a simple heuristic that employs an 8-bit counter to help track whether a process, after using these facilities for the first time, continues to use them. The counters (load_fp and load_vec) are incremented on each context switch, and if the process stops using the FP or VEC facilities they are disabled again, with the FP/VEC/VSX register save/restore on context switches being disabled as well, in order to make context switches cheaper. Either way (before or after the change introduced in v4.6) *that mechanism is opaque to userspace*, particularly to the process using these facilities. If a given facility cannot be enabled by the kernel (in case the CPU does not support it), the kernel sends a SIGILL to the process. It's possible to inspect the thread member dynamics/state from userspace using tools like 'systemtap' (for example, this simple script can be used to inspect the VRSAVE register of a given thread that is running a program called 'vrsave_' [5]) or using the 'perf' tool. 
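(Editorial sketch, not kernel code: the lazy-enable heuristic described above can be modelled roughly as follows. The class `LazyFacility`, its method names, and the exact point at which the counter is bumped are assumptions made purely for illustration; the real logic lives in the powerpc exception and context-switch paths referenced as [1]-[4].)

```java
/** Toy model of lazy FP/VEC enabling with an 8-bit "still in use" counter. */
public class LazyFacility {
    private boolean enabled = false; // models the facility bit in the MSR
    private int loadCount = 0;       // models the 8-bit load_fp / load_vec counter
    private int traps = 0;           // "facility unavailable" exceptions taken so far

    /** The process executes an instruction that needs the facility. */
    public void useInstruction() {
        if (!enabled) {
            traps++;          // exception: the kernel enables the facility and re-executes
            enabled = true;
            loadCount = 1;
        }
    }

    /** A context switch: keep restoring eagerly until the 8-bit counter wraps. */
    public void contextSwitch() {
        if (!enabled) {
            return;
        }
        loadCount = (loadCount + 1) & 0xFF; // 8-bit wrap-around
        if (loadCount == 0) {
            enabled = false;  // stop saving/restoring; the next use traps again
        }
    }

    public boolean isEnabled() { return enabled; }

    public int traps() { return traps; }

    public static void main(String[] args) {
        LazyFacility vec = new LazyFacility();
        vec.useInstruction();                 // first use traps and enables
        for (int i = 0; i < 255; i++) {
            vec.contextSwitch();              // counter wraps on the 255th switch
        }
        System.out.println(vec.isEnabled());  // false
        vec.useInstruction();                 // traps again, transparently to the process
        System.out.println(vec.traps());      // 2
    }
}
```

The point of the model is only that the process never asks for the facility: the trap-and-enable path is entered from the kernel side, which is why the mechanism is invisible to userspace.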
"tsk->thread.used_vsr" [6] is actually associated with the VSX facility whilst MSR_VEC is associated with the VEC/VMX/Altivec facility [7], so "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a new process, or if the load_fp and load_vec counters overflowed and became zero, disabling VSX, or if only FP or only VEC - not both - were used in the process). In that case the kernel will also enable VSX by MSR |= MSR_VSX. A similar mechanism drives the FP (MSR_FP) and the VEC (MSR_VEC) facilities. If both FP and VEC facilities are used the VSX facility is enabled automatically since FP+VEC regsets == VSX regset [8]. Thus, as this mechanism is entirely opaque to userspace, I understand that if a program has to tell the kernel it wants to use any of these facilities (FP/VEC/VSX) before using it, there is something going wrong in kernelspace. Martin and Michi, if you want any help drilling into it further on the kernel side please let me know, maybe I can help. I didn't have the chance to reproduce the crash yet, so if I find anything meaningful about it tomorrow I'll keep you posted. Kind regards, Gustavo [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP) [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec) [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX) [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239 [5] http://cr.openjdk.java.net/~gromero/script.d [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310 [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250 [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437 > Best regards, > > Martin > > *From:*Michihiro Horie > *Sent:* Dienstag, 4. 
September 2018 07:32 > *To:* Doerr, Martin > *Cc:* Lindenmaier, Goetz ; Gustavo Romero ; hotspot compiler ; hotspot-dev at openjdk.java.net > *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support > > Hi Goetz, Martin, and Gustavo, > > >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly >>a compiler change. _ > _> http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component" >>while hotspot-compiler-dev is for >>"Technical discussion about the development of the HotSpot bytecode compilers" > I understood the instruction and would use hotspot-compiler-dev in future RFRs, thanks. > > >> Why do you rename vnoreg to vnoregi? > I followed the way of coding for vsnoreg and vsnoregi, but the renaming does not look necessary. I would get this part back. Should I also rename vsnoregi to vsnoreg? > > >>we noticed jtreg test failures when using this change: >>compiler/runtime/safepoints/TestRegisterRestoring.java >>compiler/runtime/Test7196199.java >> >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. >> >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine. > > >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > Thank you for pointing out another issue. Currently I do not hit this problem, but preventing to match the vector nodes makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them similarly like the Replicate* rules, in any case. 
> > > Gustavo, thanks for the wrap-up! > > > Best regards, > -- > Michihiro, > IBM Research - Tokyo > > From: "Doerr, Martin" > To: Gustavo Romero, "Lindenmaier, Goetz", Michihiro Horie > Cc: hotspot compiler, "hotspot-dev at openjdk.java.net" > Date: 2018/09/04 02:18 > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > > Hi Gustavo and Michihiro, > > we noticed jtreg test failures when using this change: > compiler/runtime/safepoints/TestRegisterRestoring.java > compiler/runtime/Test7196199.java > > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. > > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. 
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > That's what I found out so far. Maybe you have an idea? > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-dev > On Behalf Of Gustavo Romero > Sent: Montag, 3. September 2018 14:57 > To: Lindenmaier, Goetz >; Michihiro Horie > > Cc: hotspot compiler >; hotspot-dev at openjdk.java.net > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > Hi Goetz, > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: >> Also, I can not find all of the mail traffic in >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. >> Is this a problem of the pipermail server? >> >> For some reason this webrev lacks the links to browse the diffs. >> Do you need to use a more recent webrev? You can obtain it with >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ . > > Yes, probably it was a problem of the pipermail or in some relay. > I noted the same thing, i.e. at least one Michi reply arrived > to me but missed a ML. > > The initial discussion is here: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html > > I understand Martin reviewed the last webrev in that thread, which is > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.01/> (taken from > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html ) > > Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html > > and Michi's reply to Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html ). 
> > and your last review: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html > > > HTH. > > Best regards, > Gustavo > >> Why do you rename vnoreg to vnoregi? >> >> Besides that the change is fine, thanks for implementing this! >> >> Best regards, >> Goetz. >> >> >>> -----Original Message----- >>> From: Doerr, Martin >>> Sent: Dienstag, 28. August 2018 19:35 >>> To: Gustavo Romero >; Michihiro Horie >>> > >>> Cc: Lindenmaier, Goetz >; hotspot- >>> dev at openjdk.java.net ; ppc-aix-port-dev at openjdk.java.net ; Simonis, Volker >>> > >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michihiro, >>> >>> thank you for implementing it. I have just taken a first look at your >>> webrev.01. >>> >>> It looks basically good. Only the Power version check seems to be incorrect. >>> VM_Version::has_popcntb() checks for Power5. >>> I believe most instructions are available with Power7. >>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with >>> Power8? >>> We should check this carefully. >>> >>> Also, indentation in register_ppc.hpp could get improved. >>> >>> Thanks and best regard, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero > >>> Sent: Donnerstag, 26. Juli 2018 16:02 >>> To: Michihiro Horie > >>> Cc: Lindenmaier, Goetz >; hotspot- >>> dev at openjdk.java.net ; Doerr, Martin >; ppc-aix- >>> port-dev at openjdk.java.net ; Simonis, Volker > >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michi, >>> >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: >>>> I updated webrev: >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.01/> >>> >>> Thanks for providing an updated webrev and for fixing indentation and >>> function >>> order in assembler_ppc.inline.hpp as well. 
I have no further comments :) >>> >>> >>> Best Regards, >>> Gustavo >>> >>>> >>>> Best regards, >>>> -- >>>> Michihiro, >>>> IBM Research - Tokyo >>>> >>>> From: Gustavo Romero >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" >>>> Date: 2018/07/25 23:05 >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>>> >>>> Hi Michi, >>>> >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: >>>> > Dear all, >>>> > >>>> > Would you review the following change? 
>>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.00> >>>> > >>>> > This change adds support for vectorized arithmetic calculation with SLP. >>>> > >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is >>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, >>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the >>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the >>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to >>> the ConvD2FNode::Value in convertnode.cpp. >>>> >>>> Looks good. Just a few comments: >>>> >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of >>> vmaddfp in >>>> order to avoid the splat? >>>> >>>> - Although all instructions added by your change where introduced in ISA >>> 2.06, >>>> so POWER7 and above are OK, as I see probes for >>> PowerArchictecturePPC64=6|5 in >>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point >>> to >>>> guarantee that these instructions won't be emitted on a CPU that does >>> not >>>> support them. >>>> >>>> - I think that in general string in format %{} are in upper case. For instance, >>>> this the current output on optoassembly for vmul4F: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> I think it would be better to be in upper case instead. I also think that if >>>> the node match emits more than one instruction all instructions must be >>> listed >>>> in format %{}, since it's meant for detailed debugging. Finally I think it >>>> would be better to replace \t! by \t// in that string (unless I'm missing any >>>> special meaning for that char). 
So for vmul4F it would be something like: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 >>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> >>>> But feel free to change anything just after you get additional reviews :) >>>> >>>> >>>> > I confirmed this change with JTREG. In addition, I used attached micro >>> benchmarks. >>>> > /(See attached file: slp_microbench.zip)/ >>>> >>>> Thanks for sharing it. >>>> Btw, another option to host it would be in the CR >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 >>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> > >>>> > Best regards, >>>> > -- >>>> > Michihiro, >>>> > IBM Research - Tokyo >>>> > >>>> >>>> >>>> >> > > > > From dmitry.chuyko at bell-sw.com Wed Sep 5 15:50:34 2018 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Wed, 5 Sep 2018 18:50:34 +0300 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> Message-ID: I made a few runs on ThunderX2 (aarch64). It is funny but I see almost the reverse difference in small.AESBench.encrypt: ~4% regression for both -XX:-UseSwitchProfiling and patched version against current code. No difference for full.AESBench.encrypt. Stub code is the same and profiles differ slightly: Mainline 53.91% runtime stub StubRoutines::aescrypt_encryptBlock (128 bytes) 29.76% runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes) 7.64% 
c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 868 (356 bytes) -XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling 57.08% runtime stub StubRoutines::aescrypt_encryptBlock (132 bytes) 26.95% runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes) 7.85% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 860 (384 bytes) Patched 58.15% runtime stub StubRoutines::aescrypt_encryptBlock (132 bytes) 26.44% runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes) 6.67% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 866 (128 bytes) -Dmitry On 09/05/2018 11:05 AM, Roland Westrelin wrote: > Thanks for the review. Anyone else? > > Roland. From vladimir.kozlov at oracle.com Wed Sep 5 16:00:44 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Sep 2018 09:00:44 -0700 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> Message-ID: Hi Dmitry, What are (* bytes) values? Is it bytecode size? Why is it different? Thanks, Vladimir On 9/5/18 8:50 AM, Dmitry Chuyko wrote: > I made a few runs on ThunderX2 (aarch64). It is funny but I see almost the reverse difference in small.AESBench.encrypt: ~4% > regression for both -XX:-UseSwitchProfiling and patched version against current code. No difference for > full.AESBench.encrypt. > > Stub code is the same and profiles differ slightly: > > Mainline > 53.91% runtime stub StubRoutines::aescrypt_encryptBlock (128 bytes) > 29.76% runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes) > 7.64% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 868 (356 bytes) > > -XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling > 57.08% runtime stub StubRoutines::aescrypt_encryptBlock (132 bytes) > 26.95% runtime stub 
StubRoutines::aescrypt_encryptBlock (40 bytes) > 7.85% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 860 (384 bytes) > > Patched > 58.15% runtime stub StubRoutines::aescrypt_encryptBlock (132 bytes) > 26.44% runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes) > 6.67% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 866 (128 bytes) > > -Dmitry > > On 09/05/2018 11:05 AM, Roland Westrelin wrote: >> Thanks for the review. Anyone else? >> >> Roland. > From gromero at linux.vnet.ibm.com Wed Sep 5 16:20:31 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 5 Sep 2018 13:20:31 -0300 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> Message-ID: <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com> Hi Martin, On 09/03/2018 02:18 PM, Doerr, Martin wrote: > Hi Gustavo and Michihiro, > > we noticed jtreg test failures when using this change: > compiler/runtime/safepoints/TestRegisterRestoring.java > compiler/runtime/Test7196199.java > > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. Just to confirm I understood the description correctly: Were you able to check that it's returning random values for the array instead of 10_000, or did you just check that the test failed? Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm asking because I'm able to make that test fail due to a timeout, but I'm not sure it's the same failure you got there. Look (I'm using the same kernel as yours): http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt Thank you. Best regards, Gustavo > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. 
> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > That's what I found out so far. Maybe you have an idea? > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-dev On Behalf Of Gustavo Romero > Sent: Montag, 3. September 2018 14:57 > To: Lindenmaier, Goetz ; Michihiro Horie > Cc: hotspot compiler ; hotspot-dev at openjdk.java.net > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > Hi Goetz, > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: >> Also, I can not find all of the mail traffic in >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. >> Is this a problem of the pipermail server? >> >> For some reason this webrev lacks the links to browse the diffs. >> Do you need to use a more recent webrev? You can obtain it with >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ . > > Yes, probably it was a problem of the pipermail or in some relay. > I noted the same thing, i.e. at least one Michi reply arrived > to me but missed a ML. > > The initial discussion is here: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html > > I understand Martin reviewed the last webrev in that thread, which is > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html) > > Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html > > and Michi's reply to Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html). 
> > and your last review: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html > > > HTH. > > Best regards, > Gustavo > >> Why do you rename vnoreg to vnoregi? >> >> Besides that the change is fine, thanks for implementing this! >> >> Best regards, >> Goetz. >> >> >>> -----Original Message----- >>> From: Doerr, Martin >>> Sent: Dienstag, 28. August 2018 19:35 >>> To: Gustavo Romero ; Michihiro Horie >>> >>> Cc: Lindenmaier, Goetz ; hotspot- >>> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker >>> >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michihiro, >>> >>> thank you for implementing it. I have just taken a first look at your >>> webrev.01. >>> >>> It looks basically good. Only the Power version check seems to be incorrect. >>> VM_Version::has_popcntb() checks for Power5. >>> I believe most instructions are available with Power7. >>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with >>> Power8? >>> We should check this carefully. >>> >>> Also, indentation in register_ppc.hpp could get improved. >>> >>> Thanks and best regard, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero >>> Sent: Donnerstag, 26. Juli 2018 16:02 >>> To: Michihiro Horie >>> Cc: Lindenmaier, Goetz ; hotspot- >>> dev at openjdk.java.net; Doerr, Martin ; ppc-aix- >>> port-dev at openjdk.java.net; Simonis, Volker >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michi, >>> >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: >>>> I updated webrev: >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ >>> >>> Thanks for providing an updated webrev and for fixing indentation and >>> function >>> order in assembler_ppc.inline.hpp as well. 
I have no further comments :) >>> >>> >>> Best Regards, >>> Gustavo >>> >>>> >>>> Best regards, >>>> -- >>>> Michihiro, >>>> IBM Research - Tokyo >>>> >>>> From: Gustavo Romero >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" >>>> Date: 2018/07/25 23:05 >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>>> >>>> Hi Michi, >>>> >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: >>>> > Dear all, >>>> > >>>> > Would you review the following change? 
>>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 >>>> > >>>> > This change adds support for vectorized arithmetic calculation with SLP. >>>> > >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is >>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, >>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the >>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the >>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to >>> the ConvD2FNode::Value in convertnode.cpp. >>>> >>>> Looks good. Just a few comments: >>>> >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of >>> vmaddfp in >>>> order to avoid the splat? >>>> >>>> - Although all instructions added by your change where introduced in ISA >>> 2.06, >>>> so POWER7 and above are OK, as I see probes for >>> PowerArchictecturePPC64=6|5 in >>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point >>> to >>>> guarantee that these instructions won't be emitted on a CPU that does >>> not >>>> support them. >>>> >>>> - I think that in general string in format %{} are in upper case. For instance, >>>> this the current output on optoassembly for vmul4F: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> I think it would be better to be in upper case instead. I also think that if >>>> the node match emits more than one instruction all instructions must be >>> listed >>>> in format %{}, since it's meant for detailed debugging. Finally I think it >>>> would be better to replace \t! by \t// in that string (unless I'm missing any >>>> special meaning for that char). 
So for vmul4F it would be something like: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 >>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> >>>> But feel free to change anything just after you get additional reviews :) >>>> >>>> >>>> > I confirmed this change with JTREG. In addition, I used attached micro >>> benchmarks. >>>> > /(See attached file: slp_microbench.zip)/ >>>> >>>> Thanks for sharing it. >>>> Btw, another option to host it would be in the CR >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 >>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> > >>>> > Best regards, >>>> > -- >>>> > Michihiro, >>>> > IBM Research - Tokyo >>>> > >>>> >>>> >>>> >> > From dmitry.chuyko at bell-sw.com Wed Sep 5 16:24:51 2018 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Wed, 5 Sep 2018 19:24:51 +0300 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> Message-ID: <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com> On 09/05/2018 07:00 PM, Vladimir Kozlov wrote: > Hi Dmitry, > > What are (* bytes) values? Is it bytecode size? Why is it different? It is the distance between the first and last captured event addresses in a particular hot region. Perf attributes one more instruction (2 instructions further down) in the 132-byte case; it is just a comparison with 52 (0.37%). The code is the same so this doesn't look too suspicious to me. But the different percentages for the stub parts do. Note that the per-region percentage distribution after inlining looks the same, e.g. ....[Hottest Methods (after inlining)].............................................................. 83.67% runtime stub StubRoutines::aescrypt_encryptBlock 7.69% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 868 
4.34% c2, level 4 org.openjdk.bench.javax.crypto.small.generated.AESBench_encrypt_jmhTest::encrypt_thrpt_jmhStub, version 889 and 84.03% runtime stub StubRoutines::aescrypt_encryptBlock 7.85% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 860 4.22% c2, level 4 org.openjdk.bench.javax.crypto.small.generated.AESBench_encrypt_jmhTest::encrypt_thrpt_jmhStub, version 880 -Dmitry > > Thanks, > Vladimir > > On 9/5/18 8:50 AM, Dmitry Chuyko wrote: >> I made a few runs on ThunderX2 (aarch64). It is funny but I see almost >> the reverse difference in small.AESBench.encrypt: ~4% regression for both >> -XX:-UseSwitchProfiling and patched version against current code. No >> difference for full.AESBench.encrypt. >> >> Stub code is the same and profiles differ slightly: >> >> Mainline >> 53.91% runtime stub StubRoutines::aescrypt_encryptBlock (128 bytes) >> 29.76% runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes) >> 7.64% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 868 (356 bytes) >> >> -XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling >> 57.08% runtime stub StubRoutines::aescrypt_encryptBlock (132 bytes) >> 26.95% runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes) >> 7.85% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 860 (384 bytes) >> >> Patched >> 58.15% runtime stub StubRoutines::aescrypt_encryptBlock (132 bytes) >> 26.44% runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes) >> 6.67% c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 866 (128 bytes) >> >> -Dmitry >> >> On 09/05/2018 11:05 AM, Roland Westrelin wrote: >>> Thanks for the review. Anyone else? >>> >>> Roland. 
>>

From dmitry.chuyko at bell-sw.com  Wed Sep  5 17:11:44 2018
From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko)
Date: Wed, 5 Sep 2018 20:11:44 +0300
Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11
In-Reply-To: <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com>
Message-ID:

On 09/05/2018 07:24 PM, Dmitry Chuyko wrote:
> On 09/05/2018 07:00 PM, Vladimir Kozlov wrote:
>> Hi Dmitry,
>>
>> What are (* bytes) values? Is it bytecode size? Why it is different?
> It is a distance between captured event addresses in particular hot
> region (first and last).
> Perf attributes one more instruction (2 instrs down) in 132 bytes
> case, it is just a comparison with 52 (0.37%). The code is the same so
> this doesn't look too suspicious to me.

Or it does :-) That may be a branch / branch miss inside the stub. And
then we may see extra instructions attributed, and the branch itself.
The extra part of region 1 is

    __ cmpw(keylen, 44);
    __ br(Assembler::EQ, L_doLast);

    __ aese(v0, v1);
    __ aesmc(v0, v0);
    __ aese(v0, v2);
    __ aesmc(v0, v0);

    __ ld1(v1, v2, __ T16B, __ post(key, 32));
    __ rev32(v1, __ T16B, v1);
    __ rev32(v2, __ T16B, v2);

    __ cmpw(keylen, 52);
    __ br(Assembler::EQ, L_doLast);

Region 2 is what happens in L_doLast.

-prof perfnorm shows 7-14% more branch misses.

> But different percentage for stub parts does. Note, regions percentage
> distribution after inlining looks the same, e.g.
>
> ....[Hottest Methods (after inlining)]..............................................................
>  83.67%        runtime stub  StubRoutines::aescrypt_encryptBlock
>   7.69%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 868
>   4.34%         c2, level 4  org.openjdk.bench.javax.crypto.small.generated.AESBench_encrypt_jmhTest::encrypt_thrpt_jmhStub, version 889
>
> and
>
>  84.03%        runtime stub  StubRoutines::aescrypt_encryptBlock
>   7.85%         c2, level 4  com.sun.crypto.provider.CipherCore::doFinal, version 860
>   4.22%         c2, level 4  org.openjdk.bench.javax.crypto.small.generated.AESBench_encrypt_jmhTest::encrypt_thrpt_jmhStub, version 880
>
> -Dmitry
>
>>
>> Thanks,
>> Vladimir
>>
>> On 9/5/18 8:50 AM, Dmitry Chuyko wrote:
>>> I made few runs on ThunderX2 (aarch64). It is funny but I see almost
>>> reverse difference in small.AESBench.encrypt: ~4% regression for
>>> both -XX:-UseSwitchProfiling and patched version against current
>>> code. No difference for full.AESBench.encrypt.
>>>
>>> Stub code is the same and profiles differ slightly:
>>>
>>> Mainline
>>>   53.91%        runtime stub StubRoutines::aescrypt_encryptBlock (128 bytes)
>>>   29.76%        runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes)
>>>    7.64%         c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 868 (356 bytes)
>>>
>>> -XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling
>>>   57.08%        runtime stub StubRoutines::aescrypt_encryptBlock (132 bytes)
>>>   26.95%        runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes)
>>>    7.85%         c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 860 (384 bytes)
>>>
>>> Patched
>>>   58.15%        runtime stub StubRoutines::aescrypt_encryptBlock (132 bytes)
>>>   26.44%        runtime stub StubRoutines::aescrypt_encryptBlock (40 bytes)
>>>    6.67%         c2, level 4 com.sun.crypto.provider.CipherCore::doFinal, version 866 (128 bytes)
>>>
>>> -Dmitry
>>>
>>> On 09/05/2018 11:05 AM, Roland Westrelin wrote:
>>>> Thanks for the review. Anyone else?
>>>>
>>>> Roland.
>>>
>

From martin.doerr at sap.com  Wed Sep  5 17:45:22 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 5 Sep 2018 17:45:22 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com>
References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com>
Message-ID: <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com>

Hi Gustavo,

thank you for your detailed explanation. I wonder what happens with the registers when VSX gets disabled, but the regs are read again many context switches later. But I guess this is solved somehow.

I'm getting different incorrect results every time I run the test on some machines, while other machines always compute the correct result and the test passes.

But I found out that the problem shows up with different kernel versions (4.4.0-101-generic, 3.10.0-693.1.1.el7.ppc64le, 4.4.126-94.22-default). So I guess it's rather unlikely that the problem is only caused by the OS.
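
For context, here is a minimal sketch of the kind of loop such a register-restoring test exercises. It is an assumption reconstructed from the bytecode annotations in this thread (faload, fastore, iinc and goto inside `increment`) and from the expected value of 10000; the class name, array length, run count and the `firstBadIndex` helper are hypothetical and not taken from the actual jtreg source.

```java
// Hypothetical sketch, not the actual jtreg test source.
final class RegisterRestoreSketch {
    static final int ITERATIONS = 10_000;

    // Assumption: the outer loop uses a long counter so that a safepoint poll
    // stays inside it, while the inner int loop over a float[] is a candidate
    // for SLP vectorization. A vector register holding the addend would then
    // have to survive every poll.
    static void increment(float[] array) {
        for (long l = 0; l < ITERATIONS; l++) {
            for (int i = 0; i < array.length; i++) {
                array[i] += 1.0f;
            }
        }
    }

    // Returns the index of the first element that is not exactly ITERATIONS,
    // or -1 if every element has the expected value.
    static int firstBadIndex(float[] array) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] != (float) ITERATIONS) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] array = new float[100];
        // Repeat so the method gets JIT-compiled and safepoints have a chance to hit.
        for (int run = 0; run < 200; run++) {
            java.util.Arrays.fill(array, 0.0f);
            increment(array);
            int bad = firstBadIndex(array);
            if (bad >= 0) {
                throw new RuntimeException(
                    "run " + run + ": array[" + bad + "] = " + array[bad]);
            }
        }
        System.out.println("all runs produced " + ITERATIONS);
    }
}
```

Every integer up to 2^24 is exactly representable as a float, so adding 1.0f ten thousand times must give exactly 10000.0f; any other value points at corrupted state rather than rounding.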
After more investigation, it rather looks like v0 is not preserved across the safepoint:

vs32 = v0, vs36 = v4, vs40 = v8

  0x00007fff6813e6d0: extsw   r15,r17
  0x00007fff6813e6d4: rldic   r18,r17,2,30
  0x00007fff6813e6d8: add     r18,r21,r18
  0x00007fff6813e6dc: addi    r20,r18,16
  0x00007fff6813e6e0: addi    r18,r18,16
  0x00007fff6813e6e4: lxvd2x  vs36,0,r18
  0x00007fff6813e6e8: vaddfp  v4,v4,v0
  0x00007fff6813e6ec: rldicr  r15,r15,2,61
  0x00007fff6813e6f0: add     r15,r21,r15
  0x00007fff6813e6f4: addi    r18,r15,32
  0x00007fff6813e6f8: addi    r15,r15,32
  0x00007fff6813e6fc: lxvd2x  vs40,0,r15  ;*faload {reexecute=0 rethrow=0 return_oop=0}
                                          ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@21 (line 62)

  0x00007fff6813e700: stxvd2x vs36,0,r20
  0x00007fff6813e704: vaddfp  v4,v8,v0
  0x00007fff6813e708: stxvd2x vs36,0,r18  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
                                          ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@24 (line 62)

  0x00007fff6813e70c: addi    r17,r17,8   ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                          ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@25 (line 61)

  0x00007fff6813e710: cmpw    cr5,r17,r24
  0x00007fff6813e714: blt     cr5,0x00007fff6813e6d0  ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                          ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)

  ;; B15: # B14 B16 <- B14 Freq: 12356.3

  0x00007fff6813e718: ld      r15,288(r16)  ; ImmutableOopMap{R21=Oop }
                                          ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                          ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)

  0x00007fff6813e71c: tdlgei  r15,8       ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                          ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)
                                          ;   {poll}
  0x00007fff6813e720: cmpw    cr6,r17,r24
  0x00007fff6813e724: blt     cr6,0x00007fff6813e6d0


At the end of the method, I see v4_float = {10000, 10000, 10000, 10000} on machines on which the test passes.
On a machine on which it fails, e.g.
v4_float = {0xffffffff, 0x8a296200, 0xffffffff, 0xffffffff} I thought we had already checked saving and restoring vector registers at safepoints, but seems like we have missed something. Best regards, Martin -----Original Message----- From: Gustavo Romero Sent: Mittwoch, 5. September 2018 18:21 To: Doerr, Martin ; Lindenmaier, Goetz ; Michihiro Horie Cc: hotspot compiler ; hotspot-dev at openjdk.java.net Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Martin, On 09/03/2018 02:18 PM, Doerr, Martin wrote: > Hi Gustavo and Michihiro, > > we noticed jtreg test failures when using this change: > compiler/runtime/safepoints/TestRegisterRestoring.java > compiler/runtime/Test7196199.java > > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. Just to confirm I understood the description correctly: Where you able to check it's returning random values for the array instead of 10_000 or you just checked that test failed? Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm asking because I'm able to fail that test due to a timeout, but not sure if it's the same you got there. Look (I'm using the same kernel as yours): http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt Thank you. Best regards, Gustavo > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > That's what I found out so far. Maybe you have an idea? > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-dev On Behalf Of Gustavo Romero > Sent: Montag, 3. 
September 2018 14:57 > To: Lindenmaier, Goetz ; Michihiro Horie > Cc: hotspot compiler ; hotspot-dev at openjdk.java.net > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > Hi Goetz, > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: >> Also, I can not find all of the mail traffic in >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. >> Is this a problem of the pipermail server? >> >> For some reason this webrev lacks the links to browse the diffs. >> Do you need to use a more recent webrev? You can obtain it with >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ . > > Yes, probably it was a problem of the pipermail or in some relay. > I noted the same thing, i.e. at least one Michi reply arrived > to me but missed a ML. > > The initial discussion is here: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html > > I understand Martin reviewed the last webrev in that thread, which is > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html) > > Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html > > and Michi's reply to Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html). > > and your last review: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html > > > HTH. > > Best regards, > Gustavo > >> Why do you rename vnoreg to vnoregi? >> >> Besides that the change is fine, thanks for implementing this! >> >> Best regards, >> Goetz. >> >> >>> -----Original Message----- >>> From: Doerr, Martin >>> Sent: Dienstag, 28. 
August 2018 19:35 >>> To: Gustavo Romero ; Michihiro Horie >>> >>> Cc: Lindenmaier, Goetz ; hotspot- >>> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker >>> >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michihiro, >>> >>> thank you for implementing it. I have just taken a first look at your >>> webrev.01. >>> >>> It looks basically good. Only the Power version check seems to be incorrect. >>> VM_Version::has_popcntb() checks for Power5. >>> I believe most instructions are available with Power7. >>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with >>> Power8? >>> We should check this carefully. >>> >>> Also, indentation in register_ppc.hpp could get improved. >>> >>> Thanks and best regard, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero >>> Sent: Donnerstag, 26. Juli 2018 16:02 >>> To: Michihiro Horie >>> Cc: Lindenmaier, Goetz ; hotspot- >>> dev at openjdk.java.net; Doerr, Martin ; ppc-aix- >>> port-dev at openjdk.java.net; Simonis, Volker >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michi, >>> >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: >>>> I updated webrev: >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ >>> >>> Thanks for providing an updated webrev and for fixing indentation and >>> function >>> order in assembler_ppc.inline.hpp as well. 
I have no further comments :)
>>>
>>>
>>> Best Regards,
>>> Gustavo
>>>
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro,
>>>> IBM Research - Tokyo
>>>>
>>>> From: Gustavo Romero
>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
>>>> Date: 2018/07/25 23:05
>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>>
>>>>
>>>> Hi Michi,
>>>>
>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>> > Dear all,
>>>> >
>>>> > Would you review the following change?
>>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 >>>> > >>>> > This change adds support for vectorized arithmetic calculation with SLP. >>>> > >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is >>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, >>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the >>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the >>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to >>> the ConvD2FNode::Value in convertnode.cpp. >>>> >>>> Looks good. Just a few comments: >>>> >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of >>> vmaddfp in >>>> order to avoid the splat? >>>> >>>> - Although all instructions added by your change where introduced in ISA >>> 2.06, >>>> so POWER7 and above are OK, as I see probes for >>> PowerArchictecturePPC64=6|5 in >>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point >>> to >>>> guarantee that these instructions won't be emitted on a CPU that does >>> not >>>> support them. >>>> >>>> - I think that in general string in format %{} are in upper case. For instance, >>>> this the current output on optoassembly for vmul4F: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> I think it would be better to be in upper case instead. I also think that if >>>> the node match emits more than one instruction all instructions must be >>> listed >>>> in format %{}, since it's meant for detailed debugging. Finally I think it >>>> would be better to replace \t! by \t// in that string (unless I'm missing any >>>> special meaning for that char). 
So for vmul4F it would be something like:
>>>>
>>>> 2941835 5b4 ADDI R24, R24, #64
>>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34
>>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F
>>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector
>>>>
>>>>
>>>> But feel free to change anything just after you get additional reviews :)
>>>>
>>>>
>>>> > I confirmed this change with JTREG. In addition, I used attached micro
>>> benchmarks.
>>>> > /(See attached file: slp_microbench.zip)/
>>>>
>>>> Thanks for sharing it.
>>>> Btw, another option to host it would be in the CR
>>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>> >
>>>> > Best regards,
>>>> > --
>>>> > Michihiro,
>>>> > IBM Research - Tokyo
>>>> >
>>>>
>>>>
>>>>
>>
>

From martin.doerr at sap.com  Wed Sep  5 18:10:01 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 5 Sep 2018 18:10:01 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com>
References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com> <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com>
Message-ID: <90edf96036f24d91acfd6b649d65c41b@sap.com>

Hi Michihiro,

support for POLL_AT_VECTOR_LOOP is required in the handler_blob / RegisterSaver like on x86.
We haven't seen any issues with the current code, but I think this affects jdk11, too.
(We could also switch off SuperwordUseVSX for jdk11u.)

Do you agree?

Best regards,
Martin


-----Original Message-----
From: Doerr, Martin
Sent: Wednesday, 5 September 2018 19:45
To: 'Gustavo Romero' ; Lindenmaier, Goetz ; Michihiro Horie
Cc: hotspot compiler ; hotspot-dev at openjdk.java.net
Subject: RE: RFR: 8208171: PPC64: Enrich SLP support

Hi Gustavo,

thank you for your detailed explanation.
I wonder what happens with the registers when VSX gets disabled, but the regs are read again many context switches later. But I guess this is solved somehow. I'm getting different incorrect results every time I run the test on some machines, while other machines always compute the correct result and the test passes. But I found out that the problem shows up with different kernel versions (4.4.0-101-generic, 3.10.0-693.1.1.el7.ppc64le, 4.4.126-94.22-default). So I guess it's rather unlikely that the problem is only caused by the OS. After more investigation, it rather looks like v0 is not preserved across the safepoint: vs32 = v0, vs36 = v4, vs40 = v8 0x00007fff6813e6d0: extsw r15,r17 0x00007fff6813e6d4: rldic r18,r17,2,30 0x00007fff6813e6d8: add r18,r21,r18 0x00007fff6813e6dc: addi r20,r18,16 0x00007fff6813e6e0: addi r18,r18,16 0x00007fff6813e6e4: lxvd2x vs36,0,r18 0x00007fff6813e6e8: vaddfp v4,v4,v0 0x00007fff6813e6ec: rldicr r15,r15,2,61 0x00007fff6813e6f0: add r15,r21,r15 0x00007fff6813e6f4: addi r18,r15,32 0x00007fff6813e6f8: addi r15,r15,32 0x00007fff6813e6fc: lxvd2x vs40,0,r15 ;*faload {reexecute=0 rethrow=0 return_oop=0} ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 21 (line 62) 0x00007fff6813e700: stxvd2x vs36,0,r20 0x00007fff6813e704: vaddfp v4,v8,v0 0x00007fff6813e708: stxvd2x vs36,0,r18 ;*fastore {reexecute=0 rethrow=0 return_oop=0} ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 24 (line 62) 0x00007fff6813e70c: addi r17,r17,8 ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 25 (line 61) 0x00007fff6813e710: cmpw cr5,r17,r24 0x00007fff6813e714: blt cr5,0x00007fff6813e6d0 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61) ;; B15: # B14 B16 <- B14 Freq: 12356.3 0x00007fff6813e718: ld r15,288(r16) ; ImmutableOopMap{R21=Oop } ;*goto {reexecute=1 rethrow=0 return_oop=0} ; - 
compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61) 0x00007fff6813e71c: tdlgei r15,8 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61) ; {poll} 0x00007fff6813e720: cmpw cr6,r17,r24 0x00007fff6813e724: blt cr6,0x00007fff6813e6d0 At the end of the method, I see v4_float = {10000, 10000, 10000, 10000} on machines on which the test passes. On a machine on which it fails, e.g. v4_float = {0xffffffff, 0x8a296200, 0xffffffff, 0xffffffff} I thought we had already checked saving and restoring vector registers at safepoints, but seems like we have missed something. Best regards, Martin -----Original Message----- From: Gustavo Romero Sent: Mittwoch, 5. September 2018 18:21 To: Doerr, Martin ; Lindenmaier, Goetz ; Michihiro Horie Cc: hotspot compiler ; hotspot-dev at openjdk.java.net Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Martin, On 09/03/2018 02:18 PM, Doerr, Martin wrote: > Hi Gustavo and Michihiro, > > we noticed jtreg test failures when using this change: > compiler/runtime/safepoints/TestRegisterRestoring.java > compiler/runtime/Test7196199.java > > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. Just to confirm I understood the description correctly: Where you able to check it's returning random values for the array instead of 10_000 or you just checked that test failed? Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm asking because I'm able to fail that test due to a timeout, but not sure if it's the same you got there. Look (I'm using the same kernel as yours): http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt Thank you. Best regards, Gustavo > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. 
> > That's what I found out so far. Maybe you have an idea? > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-dev On Behalf Of Gustavo Romero > Sent: Montag, 3. September 2018 14:57 > To: Lindenmaier, Goetz ; Michihiro Horie > Cc: hotspot compiler ; hotspot-dev at openjdk.java.net > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > Hi Goetz, > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: >> Also, I can not find all of the mail traffic in >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. >> Is this a problem of the pipermail server? >> >> For some reason this webrev lacks the links to browse the diffs. >> Do you need to use a more recent webrev? You can obtain it with >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ . > > Yes, probably it was a problem of the pipermail or in some relay. > I noted the same thing, i.e. at least one Michi reply arrived > to me but missed a ML. > > The initial discussion is here: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html > > I understand Martin reviewed the last webrev in that thread, which is > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html) > > Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html > > and Michi's reply to Martin's review of webrev.01: > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html). > > and your last review: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html > > > HTH. 
> > Best regards, > Gustavo > >> Why do you rename vnoreg to vnoregi? >> >> Besides that the change is fine, thanks for implementing this! >> >> Best regards, >> Goetz. >> >> >>> -----Original Message----- >>> From: Doerr, Martin >>> Sent: Dienstag, 28. August 2018 19:35 >>> To: Gustavo Romero ; Michihiro Horie >>> >>> Cc: Lindenmaier, Goetz ; hotspot- >>> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker >>> >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michihiro, >>> >>> thank you for implementing it. I have just taken a first look at your >>> webrev.01. >>> >>> It looks basically good. Only the Power version check seems to be incorrect. >>> VM_Version::has_popcntb() checks for Power5. >>> I believe most instructions are available with Power7. >>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with >>> Power8? >>> We should check this carefully. >>> >>> Also, indentation in register_ppc.hpp could get improved. >>> >>> Thanks and best regard, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero >>> Sent: Donnerstag, 26. Juli 2018 16:02 >>> To: Michihiro Horie >>> Cc: Lindenmaier, Goetz ; hotspot- >>> dev at openjdk.java.net; Doerr, Martin ; ppc-aix- >>> port-dev at openjdk.java.net; Simonis, Volker >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>> >>> Hi Michi, >>> >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: >>>> I updated webrev: >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ >>> >>> Thanks for providing an updated webrev and for fixing indentation and >>> function >>> order in assembler_ppc.inline.hpp as well. 
I have no further comments :)
>>>
>>>
>>> Best Regards,
>>> Gustavo
>>>
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro,
>>>> IBM Research - Tokyo
>>>>
>>>> From: Gustavo Romero
>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
>>>> Date: 2018/07/25 23:05
>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>>
>>>>
>>>> Hi Michi,
>>>>
>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>> > Dear all,
>>>> >
>>>> > Would you review the following change?
>>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 >>>> > >>>> > This change adds support for vectorized arithmetic calculation with SLP. >>>> > >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is >>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, >>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the >>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the >>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to >>> the ConvD2FNode::Value in convertnode.cpp. >>>> >>>> Looks good. Just a few comments: >>>> >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of >>> vmaddfp in >>>> order to avoid the splat? >>>> >>>> - Although all instructions added by your change where introduced in ISA >>> 2.06, >>>> so POWER7 and above are OK, as I see probes for >>> PowerArchictecturePPC64=6|5 in >>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point >>> to >>>> guarantee that these instructions won't be emitted on a CPU that does >>> not >>>> support them. >>>> >>>> - I think that in general string in format %{} are in upper case. For instance, >>>> this the current output on optoassembly for vmul4F: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> I think it would be better to be in upper case instead. I also think that if >>>> the node match emits more than one instruction all instructions must be >>> listed >>>> in format %{}, since it's meant for detailed debugging. Finally I think it >>>> would be better to replace \t! by \t// in that string (unless I'm missing any >>>> special meaning for that char). 
So for vmul4F it would be something like: >>>> >>>> 2941835 5b4 ADDI R24, R24, #64 >>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 >>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>> >>>> >>>> But feel free to change anything just after you get additional reviews :) >>>> >>>> >>>> > I confirmed this change with JTREG. In addition, I used attached micro >>> benchmarks. >>>> > /(See attached file: slp_microbench.zip)/ >>>> >>>> Thanks for sharing it. >>>> Btw, another option to host it would be in the CR >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 >>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> > >>>> > Best regards, >>>> > -- >>>> > Michihiro, >>>> > IBM Research - Tokyo >>>> > >>>> >>>> >>>> >> > From gromero at linux.vnet.ibm.com Wed Sep 5 18:29:25 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 5 Sep 2018 15:29:25 -0300 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com> References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <0fb73e86-12b8-a5d1-f9bb-5f4963606fbc@linux.vnet.ibm.com> <6eb4c5c5d42d49bf8564e8bfae1b3e9b@sap.com> Message-ID: <71b0c876-25a2-f757-375f-686477d2d2c7@linux.vnet.ibm.com> Hi Martin, On 09/05/2018 02:45 PM, Doerr, Martin wrote: > Hi Gustavo, > > thank you for your detailed explanation. I wonder what happens with the registers when VSX gets disabled, but the regs are read again many context switches later. But I guess this is solved somehow. No problem! 
Yes, the kernel solves that too: once a VSX instruction is used again
while VSX is disabled, it generates an exception and the exception handler
calls load_up_vsx():

https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/vector.S#L119

load_up_vsx() in turn calls load_up_fpu() and load_up_vec() to load the
FP and VEC registers from the thread struct associated with the task that
wants to use the VSX facility again. That thread struct contains the correct
FP/VEC/VSX registers, saved many context switches earlier when the facilities
were disabled.

The best description of what happens in this case (valid for VEC and VSX as
well) can be found in the load_up_fpu() comment:

https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/fpu.S#L79-L83 :

 * This task wants to use the FPU now.
 * On UP, disable FP for the task which had the FPU previously,
 * and save its floating-point registers in its thread_struct.
 * Load up this task's FP registers from its thread_struct,
 * enable the FPU for the current task and return to the task.

Here UP stands for uniprocessor (i.e. a machine with only one CPU).


> I'm getting different incorrect results every time I run the test on some machines, while other machines always compute the correct result and the test passes.
>
> But I found out that the problem shows up with different kernel versions (4.4.0-101-generic, 3.10.0-693.1.1.el7.ppc64le, 4.4.126-94.22-default). So I guess it's rather unlikely that the problem is only caused by the OS.
>
> After more investigation, it rather looks like v0 is not preserved across the safepoint:
>
> vs32 = v0, vs36 = v4, vs40 = v8
>
> 0x00007fff6813e6d0: extsw   r15,r17
> 0x00007fff6813e6d4: rldic   r18,r17,2,30
> 0x00007fff6813e6d8: add     r18,r21,r18
> 0x00007fff6813e6dc: addi    r20,r18,16
> 0x00007fff6813e6e0: addi    r18,r18,16
> 0x00007fff6813e6e4: lxvd2x  vs36,0,r18
> 0x00007fff6813e6e8: vaddfp  v4,v4,v0
> 0x00007fff6813e6ec: rldicr  r15,r15,2,61
> 0x00007fff6813e6f0: add     r15,r21,r15
> 0x00007fff6813e6f4: addi    r18,r15,32
> 0x00007fff6813e6f8: addi    r15,r15,32
> 0x00007fff6813e6fc: lxvd2x  vs40,0,r15  ;*faload {reexecute=0 rethrow=0 return_oop=0}
>                                         ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@21 (line 62)
>
> 0x00007fff6813e700: stxvd2x vs36,0,r20
> 0x00007fff6813e704: vaddfp  v4,v8,v0
> 0x00007fff6813e708: stxvd2x vs36,0,r18  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
>                                         ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@24 (line 62)
>
> 0x00007fff6813e70c: addi    r17,r17,8   ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                                         ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@25 (line 61)
>
> 0x00007fff6813e710: cmpw    cr5,r17,r24
> 0x00007fff6813e714: blt     cr5,0x00007fff6813e6d0  ;*goto {reexecute=0 rethrow=0 return_oop=0}
>                                         ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)
>
> ;; B15: # B14 B16 <- B14 Freq: 12356.3
>
> 0x00007fff6813e718: ld      r15,288(r16)  ; ImmutableOopMap{R21=Oop }
>                                         ;*goto {reexecute=1 rethrow=0 return_oop=0}
>                                         ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)
>
> 0x00007fff6813e71c: tdlgei  r15,8       ;*goto {reexecute=0 rethrow=0 return_oop=0}
>                                         ; - compiler.runtime.safepoints.TestRegisterRestoring::increment@28 (line 61)
>                                         ; {poll}
> 0x00007fff6813e720: cmpw    cr6,r17,r24
> 0x00007fff6813e724: blt     cr6,0x00007fff6813e6d0
>
>
> At the end of the method, I see v4_float = {10000, 10000, 10000, 10000} on machines on which the test passes.
> On a machine on which it fails, e.g. v4_float = {0xffffffff, 0x8a296200, 0xffffffff, 0xffffffff}
>
> I thought we had already checked saving and restoring vector registers at safepoints, but seems like we have missed something.

OK. So I was not able to reproduce it yet... But it looks like you pointed out a
solution to Michi already, so I'll stay tuned. Thanks.

Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero
> Sent: Wednesday, 5 September 2018 18:21
> To: Doerr, Martin ; Lindenmaier, Goetz ; Michihiro Horie
> Cc: hotspot compiler ; hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
> Hi Martin,
>
> On 09/03/2018 02:18 PM, Doerr, Martin wrote:
>> Hi Gustavo and Michihiro,
>>
>> we noticed jtreg test failures when using this change:
>> compiler/runtime/safepoints/TestRegisterRestoring.java
>> compiler/runtime/Test7196199.java
>>
>> TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
>
> Just to confirm I understood the description correctly:
>
> Were you able to check it's returning random values for the
> array instead of 10_000 or you just checked that test failed?
>
> Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm
> asking because I'm able to fail that test due to a timeout, but not sure
> if it's the same you got there. Look (I'm using the same kernel as yours):
>
> http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt
>
>
> Thank you.
>
> Best regards,
> Gustavo
>
>> We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>>
>> That's what I found out so far. Maybe you have an idea?
>>
>> I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
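For readers following the thread without the jtreg sources at hand, the failure mode described above can be pictured with a loop of the following shape. This is only a sketch inferred from the quoted disassembly (a float array updated via faload/fastore in a method named increment, with a safepoint poll on the loop back-edge). The class name RegisterRestoringSketch, the helper firstBadIndex and the array length are assumptions for illustration; this is not the actual source of TestRegisterRestoring.java.

```java
public class RegisterRestoringSketch {
    // Number of outer iterations; every element should end up equal to this.
    static final int ROUNDS = 10_000;

    // The outer loop is assumed to keep a safepoint poll (a long induction
    // variable is used here for that purpose), while the inner loop is a
    // candidate for SLP vectorization. If a vector register holding the
    // constant 1.0f were clobbered across the safepoint, the sums would be off.
    static void increment(float[] array) {
        for (long l = 0; l < ROUNDS; l++) {
            for (int i = 0; i < array.length; i++) {
                array[i] += 1;
            }
        }
    }

    // Returns the index of the first element that differs from ROUNDS,
    // or -1 if every element has the expected value.
    static int firstBadIndex(float[] array) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] != ROUNDS) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] array = new float[100]; // length is an arbitrary choice
        increment(array);
        int bad = firstBadIndex(array);
        if (bad >= 0) {
            throw new RuntimeException("array[" + bad + "] = " + array[bad]
                    + " but expected " + ROUNDS);
        }
        System.out.println("all elements equal " + ROUNDS);
    }
}
```

On a correct VM the check always passes, since adding 1 to a float 10000 times is exact. A run that reports arbitrary values, as in the v4_float dump above, would point at register state being lost rather than at the arithmetic.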
>> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: hotspot-dev On Behalf Of Gustavo Romero >> Sent: Montag, 3. September 2018 14:57 >> To: Lindenmaier, Goetz ; Michihiro Horie >> Cc: hotspot compiler ; hotspot-dev at openjdk.java.net >> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >> >> Hi Goetz, >> >> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: >>> Also, I can not find all of the mail traffic in >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. >>> Is this a problem of the pipermail server? >>> >>> For some reason this webrev lacks the links to browse the diffs. >>> Do you need to use a more recent webrev? You can obtain it with >>> hg clone http://hg.openjdk.java.net/code-tools/webrev/ . >> >> Yes, probably it was a problem of the pipermail or in some relay. >> I noted the same thing, i.e. at least one Michi reply arrived >> to me but missed a ML. >> >> The initial discussion is here: >> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html >> >> I understand Martin reviewed the last webrev in that thread, which is >> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from >> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html) >> >> Martin's review of webrev.01: >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html >> >> and Michi's reply to Martin's review of webrev.01: >> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, >> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html). >> >> and your last review: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html >> >> >> HTH. >> >> Best regards, >> Gustavo >> >>> Why do you rename vnoreg to vnoregi? >>> >>> Besides that the change is fine, thanks for implementing this! >>> >>> Best regards, >>> Goetz. 
>>> >>> >>>> -----Original Message----- >>>> From: Doerr, Martin >>>> Sent: Dienstag, 28. August 2018 19:35 >>>> To: Gustavo Romero ; Michihiro Horie >>>> >>>> Cc: Lindenmaier, Goetz ; hotspot- >>>> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker >>>> >>>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support >>>> >>>> Hi Michihiro, >>>> >>>> thank you for implementing it. I have just taken a first look at your >>>> webrev.01. >>>> >>>> It looks basically good. Only the Power version check seems to be incorrect. >>>> VM_Version::has_popcntb() checks for Power5. >>>> I believe most instructions are available with Power7. >>>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with >>>> Power8? >>>> We should check this carefully. >>>> >>>> Also, indentation in register_ppc.hpp could get improved. >>>> >>>> Thanks and best regard, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Gustavo Romero >>>> Sent: Donnerstag, 26. Juli 2018 16:02 >>>> To: Michihiro Horie >>>> Cc: Lindenmaier, Goetz ; hotspot- >>>> dev at openjdk.java.net; Doerr, Martin ; ppc-aix- >>>> port-dev at openjdk.java.net; Simonis, Volker >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>>> >>>> Hi Michi, >>>> >>>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: >>>>> I updated webrev: >>>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ >>>> >>>> Thanks for providing an updated webrev and for fixing indentation and >>>> function >>>> order in assembler_ppc.inline.hpp as well. 
I have no further comments :) >>>> >>>> >>>> Best Regards, >>>> Gustavo >>>> >>>>> >>>>> Best regards, >>>>> -- >>>>> Michihiro, >>>>> IBM Research - Tokyo >>>>> >>>>> Inactive hide details for Gustavo Romero ---2018/07/25 23:05:32---Hi Michi, >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:Gustavo Romero --- >>>> 2018/07/25 23:05:32---Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie >>>> wrote: >>>>> >>>>> From: Gustavo Romero >>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port- >>>> dev at openjdk.java.net, hotspot-dev at openjdk.java.net >>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" >>>> >>>>> Date: 2018/07/25 23:05 >>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support >>>>> >>>>> ------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ---------------------------------------------------------------------------------------------- >>>> ----------------------------------------------------- >>>>> >>>>> >>>>> >>>>> Hi Michi, >>>>> >>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: >>>>> > Dear all, >>>>> > >>>>> > Would you review the following change? 
>>>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 >>>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 >>>>> > >>>>> > This change adds support for vectorized arithmetic calculation with SLP. >>>>> > >>>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is >>>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, >>>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the >>>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the >>>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to >>>> the ConvD2FNode::Value in convertnode.cpp. >>>>> >>>>> Looks good. Just a few comments: >>>>> >>>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of >>>> vmaddfp in >>>>> order to avoid the splat? >>>>> >>>>> - Although all instructions added by your change where introduced in ISA >>>> 2.06, >>>>> so POWER7 and above are OK, as I see probes for >>>> PowerArchictecturePPC64=6|5 in >>>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point >>>> to >>>>> guarantee that these instructions won't be emitted on a CPU that does >>>> not >>>>> support them. >>>>> >>>>> - I think that in general string in format %{} are in upper case. For instance, >>>>> this the current output on optoassembly for vmul4F: >>>>> >>>>> 2941835 5b4 ADDI R24, R24, #64 >>>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F >>>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>>> >>>>> I think it would be better to be in upper case instead. I also think that if >>>>> the node match emits more than one instruction all instructions must be >>>> listed >>>>> in format %{}, since it's meant for detailed debugging. Finally I think it >>>>> would be better to replace \t! by \t// in that string (unless I'm missing any >>>>> special meaning for that char). 
So for vmul4F it would be something like: >>>>> >>>>> 2941835 5b4 ADDI R24, R24, #64 >>>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 >>>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F >>>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector >>>>> >>>>> >>>>> But feel free to change anything just after you get additional reviews :) >>>>> >>>>> >>>>> > I confirmed this change with JTREG. In addition, I used attached micro >>>> benchmarks. >>>>> > /(See attached file: slp_microbench.zip)/ >>>>> >>>>> Thanks for sharing it. >>>>> Btw, another option to host it would be in the CR >>>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 >>>>> >>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>> > >>>>> > Best regards, >>>>> > -- >>>>> > Michihiro, >>>>> > IBM Research - Tokyo >>>>> > >>>>> >>>>> >>>>> >>> >> > From gromero at linux.vnet.ibm.com Wed Sep 5 18:34:25 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 5 Sep 2018 15:34:25 -0300 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <57ebd30a66504577a6b2ec267aee4b69@sap.com> <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com> Message-ID: Hi Michi, On 09/05/2018 07:22 AM, Michihiro Horie wrote: > Hi Martin, Gustavo, > > I cannot still reproduce the problem. I noticed the machine I have is not SUSE but OpenSUSE with 4.1.21-14-default. I've also tried kernel 4.4.0-31-generic but it's Ubuntu. > > Gustavo, is there any suspicious change before/after v4.4, which Martin got the crash? Nope, nothing I'm aware of... However looks like Martin found no issues with your last revision. Anyway, if you need a machine with SLES 12 SP3 installed I have one that I can share. Drop me a Slack message if you need it. 

Regards,
Gustavo

>
> Apart from the problem, I uploaded the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> From: Gustavo Romero
> To: "Doerr, Martin" , Michihiro Horie/Japan/IBM at IBMJP
> Cc: "Lindenmaier, Goetz" , hotspot compiler
> Date: 2018/09/05 07:03
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
>
> Hi Martin and Michi,
>
> On 09/04/2018 01:20 PM, Doerr, Martin wrote:
> > Can you reproduce the test failures?
> >
> > The very same VM works fine on a different Power8 machine which uses the same instructions by C2.
> >
> > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
> >
> > I have seen several Linux kernel changes regarding saving and restoring the VSX registers.
> >
> > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC".
> >
> > Maybe something is missing which tells the kernel that we're using it. But that's just a guess.
>
> Facilities like FP (fp registers), VEC (vector registers - aka VMX/Altivec), and
> VSX (vector-scalar registers) are usually disabled on a newly created process. Once
> any instruction associated with these facilities is used in the process it causes
> an exception that is handled by the kernel [1, 2, 3]: the kernel enables the
> facility that caused the exception (see load_up_fp & friends) and re-executes the
> instruction when the kernel returns control back to the process in userspace.
>
> Starting from kernel v4.6 [4] there is a simple heuristic that employs an 8-bit
> counter to help track if a process, after using these facilities for the first
> time, continues to use the facilities. The counters (load_fp and load_vec) are
> incremented on each context switch and if the process stops using the FP or VEC
> facilities then they are disabled again, with FP/VEC/VSX save/restore on context
> switches being disabled as well in order to improve the performance of context
> switches by avoiding the FP/VEC/VSX register save/restore.
>
> Either way (before or after the change introduced in v4.6) *that mechanism is
> opaque to userspace*, particularly to the process using these facilities. If a
> given facility cannot be enabled by the kernel (in case the CPU does not support
> it), the kernel sends a SIGILL to the process. It's possible to inspect the thread
> member dynamics/state from userspace using tools like 'systemtap' (for
> example, this simple script can be used to inspect the VRSAVE register of a given
> thread that is running a program called 'vrsave_' [5]) or using the 'perf' tool.
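The enable-on-first-use behaviour and the 8-bit counter heuristic described above can be pictured with a small toy model. The sketch below is plain Java, not kernel code: the class LazyFacilityModel, its fields and its methods are all hypothetical names chosen for illustration, and it simplifies the heuristic by disabling the facility whenever the counter wraps, without tracking whether the task actually used the registers in between.

```java
public class LazyFacilityModel {
    // Stand-in for the register copy kept in the task's thread struct.
    private final float[] threadStruct = new float[4];
    // Live register file; null models "facility disabled" (MSR bit clear).
    private float[] live = null;
    // Models the 8-bit load counter (values 0..255).
    private int loadCounter = 0;
    // Number of "facility unavailable" exceptions taken so far.
    private int traps = 0;

    // Every "vector instruction" goes through here. If the facility is
    // disabled, this models the exception path: restore the registers from
    // the thread struct, enable the facility, then continue.
    public float[] regs() {
        if (live == null) {
            traps++;
            live = threadStruct.clone();
            loadCounter = 1;
        }
        return live;
    }

    // Models a context switch away from the task: save the live registers
    // and bump the counter; when the 8-bit counter wraps to zero the
    // facility is disabled again (simplification, see lead-in).
    public void contextSwitch() {
        if (live == null) {
            return; // nothing to save while the facility is disabled
        }
        System.arraycopy(live, 0, threadStruct, 0, live.length);
        loadCounter = (loadCounter + 1) & 0xFF;
        if (loadCounter == 0) {
            live = null;
        }
    }

    public boolean enabled() { return live != null; }

    public int traps() { return traps; }

    public static void main(String[] args) {
        LazyFacilityModel task = new LazyFacilityModel();
        task.regs()[0] = 42f; // first use takes the first trap
        for (int i = 0; i < 300; i++) {
            task.contextSwitch();
        }
        System.out.println("enabled after 300 switches: " + task.enabled());
        System.out.println("value after re-enable: " + task.regs()[0]);
        System.out.println("traps taken: " + task.traps());
    }
}
```

The point of the model is the one made in the mail: the task never has to announce that it uses the facility, and a value written before the facility was disabled is still there after it is transparently re-enabled.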
>
> "tsk->thread.used_vsr" [6] is actually associated with the VSX facility whilst
> MSR_VEC is associated with the VEC/VMX/Altivec facility [7], so
> "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a new
> process, or if the load_fp and load_vec counters overflowed and became zero,
> disabling VSX, or if only FP or only VEC - not both - were used in the process).
> In that case the kernel will also enable VSX by MSR |= MSR_VSX. A similar
> mechanism drives the FP (MSR_FP) and the VEC (MSR_VEC) facilities.
>
> If both FP and VEC facilities are used the VSX facility is enabled automatically
> since FP+VEC regsets == VSX regset [8].
>
> Thus, as this mechanism is entirely opaque to userspace, I understand that if a
> program has to tell the kernel it wants to use any of these facilities
> (FP/VEC/VSX) before using it, there is something wrong going on in kernelspace.
>
> Martin and Michi, if you want any help on drilling it further at kernel side
> please let me know, maybe I can help.
>
> I didn't have the chance to reproduce the crash yet, so if I find anything
> meaningful about it tomorrow I'll keep you posted.
>
>
> Kind regards,
> Gustavo
>
> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP)
> [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec)
> [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX)
> [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239
> [5] http://cr.openjdk.java.net/~gromero/script.d
> [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310
> [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250
> [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437
>
> > Best regards,
> >
> > Martin
> >
> > From: Michihiro Horie
> > Sent: Tuesday, 4 September 2018 07:32
> > To: Doerr, Martin
> > Cc: Lindenmaier, Goetz ; Gustavo Romero ; hotspot compiler ; hotspot-dev at openjdk.java.net
> > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
> >
> > Hi Goetz, Martin, and Gustavo,
> >
> >
> >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
> >>a compiler change.
> >>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
> >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
> >>while hotspot-compiler-dev is for
> >>"Technical discussion about the development of the HotSpot bytecode compilers"
> > Understood, I will use hotspot-compiler-dev for future RFRs, thanks.
> >
> >
> >> Why do you rename vnoreg to vnoregi?
> > I followed the way of coding for vsnoreg and vsnoregi, but the renaming does not look necessary. I will revert this part. Should I also rename vsnoregi to vsnoreg?
> > > > > >>we noticed jtreg test failures when using this change: > >>compiler/runtime/safepoints/TestRegisterRestoring.java > >>compiler/runtime/Test7196199.java > >> > >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. > >> > >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine. > > > > > >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Thank you for pointing out another issue. Currently I do not hit this problem, but preventing to match the vector nodes makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them similarly like the Replicate* rules, in any case. > > > > > > Gustavo, thanks for the wrap-up! 
> > > > > > Best regards, > > -- > > Michihiro, > > IBM Research - Tokyo > > > > Inactive hide details for "Doerr, Martin" ---2018/09/04 02:18:24---Hi Gustavo and Michihiro, we noticed jtreg test failures whe"Doerr, Martin" ---2018/09/04 02:18:24---Hi Gustavo and Michihiro, we noticed jtreg test failures when using this change: > > > > From: "Doerr, Martin" > > > To: Gustavo Romero >, "Lindenmaier, Goetz" >, Michihiro Horie > > > Cc: hotspot compiler >, "hotspot-dev at openjdk.java.net " > > > Date: 2018/09/04 02:18 > > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > > > > > > > > > Hi Gustavo and Michihiro, > > > > we noticed jtreg test failures when using this change: > > compiler/runtime/safepoints/TestRegisterRestoring.java > > compiler/runtime/Test7196199.java > > > > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. 
> > > > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > > > That's what I found out so far. Maybe you have an idea? > > > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > > > Best regards, > > Martin > > > > > > -----Original Message----- > > From: hotspot-dev > On Behalf Of Gustavo Romero > > Sent: Montag, 3. September 2018 14:57 > > To: Lindenmaier, Goetz >; Michihiro Horie > > > Cc: hotspot compiler >; hotspot-dev at openjdk.java.net > > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > > > Hi Goetz, > > > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: > >> Also, I can not find all of the mail traffic in > >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. > >> Is this a problem of the pipermail server? > >> > >> For some reason this webrev lacks the links to browse the diffs. > >> Do you need to use a more recent webrev? ?You can obtain it with > >> hg clone http://hg.openjdk.java.net/code-tools/webrev/?. > > > > Yes, probably it was a problem of the pipermail or in some relay. > > I noted the same thing, i.e. at least one Michi reply arrived > > to me but missed a ML. > > > > The initial discussion is here: > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html > > > > I understand Martin reviewed the last webrev in that thread, which is > > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ ? 
?(taken from > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html) > > > > Martin's review of webrev.01: > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html > > > > and Michi's reply to Martin's review of webrev.01: > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html?(with webrev.02, > > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html). > > > > and your last review: > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html > > > > > > HTH. > > > > Best regards, > > Gustavo > > > >> Why do you rename vnoreg to vnoregi? > >> > >> Besides that the change is fine, thanks for implementing this! > >> > >> Best regards, > >> ? ?Goetz. > >> > >> > >>> -----Original Message----- > >>> From: Doerr, Martin > >>> Sent: Dienstag, 28. August 2018 19:35 > >>> To: Gustavo Romero >; Michihiro Horie > >>> > > >>> Cc: Lindenmaier, Goetz >; hotspot- > >>> dev at openjdk.java.net ; ppc-aix-port-dev at openjdk.java.net ; Simonis, Volker > >>> > > >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > >>> > >>> Hi Michihiro, > >>> > >>> thank you for implementing it. I have just taken a first look at your > >>> webrev.01. > >>> > >>> It looks basically good. Only the Power version check seems to be incorrect. > >>> VM_Version::has_popcntb() checks for Power5. > >>> I believe most instructions are available with Power7. > >>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with > >>> Power8? > >>> We should check this carefully. > >>> > >>> Also, indentation in register_ppc.hpp could get improved. > >>> > >>> Thanks and best regard, > >>> Martin > >>> > >>> > >>> -----Original Message----- > >>> From: Gustavo Romero > > >>> Sent: Donnerstag, 26. 
Juli 2018 16:02 > >>> To: Michihiro Horie > > >>> Cc: Lindenmaier, Goetz >; hotspot- > >>> dev at openjdk.java.net ; Doerr, Martin >; ppc-aix- > >>> port-dev at openjdk.java.net ; Simonis, Volker > > >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >>> > >>> Hi Michi, > >>> > >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: > >>>> I updated webrev: > >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ ? > >>> > >>> Thanks for providing an updated webrev and for fixing indentation and > >>> function > >>> order in assembler_ppc.inline.hpp as well. I have no further comments :) > >>> > >>> > >>> Best Regards, > >>> Gustavo > >>> > >>>> > >>>> Best regards, > >>>> -- > >>>> Michihiro, > >>>> IBM Research - Tokyo > >>>> > >>>> Inactive hide details for Gustavo Romero ---2018/07/25 23:05:32---Hi Michi, > >>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:Gustavo Romero --- > >>> 2018/07/25 23:05:32---Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie > >>> wrote: > >>>> > >>>> From: Gustavo Romero > > >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port- > >>> dev at openjdk.java.net , hotspot-dev at openjdk.java.net > >>>> Cc: goetz.lindenmaier at sap.com , volker.simonis at sap.com , "Doerr, Martin" > >>> > > >>>> Date: 2018/07/25 23:05 > >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >>>> > >>>> ------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> 
---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ----------------------------------------------------- > >>>> > >>>> > >>>> > >>>> Hi Michi, > >>>> > >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: > >>>> ? > Dear all, > >>>> ? > > >>>> ? > Would you review the following change? > >>>> ? > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 > >>>> ? > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 ? > >>>> ? > > >>>> ? > This change adds support for vectorized arithmetic calculation with SLP. > >>>> ? > > >>>> ? > The to_vr function is added to convert VSR to VR. Currently, vecX is > >>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, > >>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the > >>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the > >>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to > >>> the ConvD2FNode::Value in convertnode.cpp. > >>>> > >>>> Looks good. Just a few comments: > >>>> > >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of > >>> vmaddfp in > >>>> ? ? order to avoid the splat? > >>>> > >>>> - Although all instructions added by your change where introduced in ISA > >>> 2.06, > >>>> ? ? so POWER7 and above are OK, as I see probes for > >>> PowerArchictecturePPC64=6|5 in > >>>> ? ? vm_version_ppc.cpp (line 64), ?I'm wondering if there is any control point > >>> to > >>>> ? ? guarantee that these instructions won't be emitted on a CPU that does > >>> not > >>>> ? ? support them. 
> >>>> > >>>> - I think that in general string in format %{} are in upper case. For instance, > >>>> ? ? this the current output on optoassembly for vmul4F: > >>>> > >>>> 2941835 5b4 ? ? ADDI ? ?R24, R24, #64 > >>>> 2941836 5b8 ? ? vmaddfp ?VSR32,VSR32,VSR36 ? ? ?! mul packed4F > >>>> 2941837 5c0 ? ? STXVD2X ? ? [R17], VSR32 ? ? ? ?// store 16-byte Vector > >>>> > >>>> ? ? I think it would be better to be in upper case instead. I also think that if > >>>> ? ? the node match emits more than one instruction all instructions must be > >>> listed > >>>> ? ? in format %{}, since it's meant for detailed debugging. Finally I think it > >>>> ? ? would be better to replace \t! by \t// in that string (unless I'm missing any > >>>> ? ? special meaning for that char). So for vmul4F it would be something like: > >>>> > >>>> 2941835 5b4 ? ? ADDI ? ? ?R24, R24, #64 > >>>> ? ? ? ? ? ? ? ? ? VSPLTISW ?VSR34, 0 ? ? ? ? ? ? ? ? // Splat 0 imm in VSR34 > >>>> 2941836 5b8 ? ? VMADDFP ? VSR32,VSR32,VSR36,VSR34 ?// Mul packed4F > >>>> 2941837 5c0 ? ? STXVD2X ? [R17], VSR32 ? ? ? ? ? ? // store 16-byte Vector > >>>> > >>>> > >>>> But feel free to change anything just after you get additional reviews :) > >>>> > >>>> > >>>> ? > I confirmed this change with JTREG. In addition, I used attached micro > >>> benchmarks. > >>>> ? > /(See attached file: slp_microbench.zip)/ > >>>> > >>>> Thanks for sharing it. > >>>> Btw, another option to host it would be in the CR > >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 ? > >>>> > >>>> > >>>> Best regards, > >>>> Gustavo > >>>> > >>>> ? > > >>>> ? > Best regards, > >>>> ? > -- > >>>> ? > Michihiro, > >>>> ? > IBM Research - Tokyo > >>>> ? 
> > >>>> > >>>> > >>>> > >> > > > > > > > > > > > From ekaterina.pavlova at oracle.com Wed Sep 5 20:17:18 2018 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 5 Sep 2018 13:17:18 -0700 Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times out intermittently on Linux-X64 In-Reply-To: References: <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com> <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com> <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com> Message-ID: <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com> On 8/29/18 11:41 AM, Doug Simon wrote: > > >> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote: >> >> On 8/29/18 2:11 AM, Doug Simon wrote: >>> When running these tests on Graal tip against JDK 11, I get: >>> >>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math >>> ... >>> 10 longest running test classes: >>> 21.115 ms org.graalvm.compiler.jtt.lang.Math_log10 >>> 11.921 ms org.graalvm.compiler.jtt.lang.Math_log >>> 10.460 ms org.graalvm.compiler.jtt.lang.Math_sqrt >>> 3.525 ms org.graalvm.compiler.jtt.lang.Math_pow >>> 1.937 ms org.graalvm.compiler.jtt.lang.Math_sin >>> 1.689 ms org.graalvm.compiler.jtt.lang.Math_tan >>> 1.550 ms org.graalvm.compiler.jtt.lang.Math_exp >>> 1.537 ms org.graalvm.compiler.jtt.lang.Math_cos >>> 1.526 ms org.graalvm.compiler.jtt.lang.Math_abs >>> 338 ms org.graalvm.compiler.jtt.lang.Math_round >>> 10 longest running tests: >>> 10.583 ms run7(org.graalvm.compiler.jtt.lang.Math_log) >>> 10.335 ms run7(org.graalvm.compiler.jtt.lang.Math_sqrt) >>> 3.468 ms run11(org.graalvm.compiler.jtt.lang.Math_pow) >>> 1.666 ms run5(org.graalvm.compiler.jtt.lang.Math_sin) >>> 1.533 ms run5(org.graalvm.compiler.jtt.lang.Math_tan) >>> 1.519 ms run8(org.graalvm.compiler.jtt.lang.Math_exp) >>> 1.456 ms run3(org.graalvm.compiler.jtt.lang.Math_cos) >>> 1.371 ms run7(org.graalvm.compiler.jtt.lang.Math_abs) >>> 1.024 ms 
run0(org.graalvm.compiler.jtt.lang.Math_log) >>> 84 ms run0(org.graalvm.compiler.jtt.lang.Math_sin) >>> >>> All seems as expected. >>> >>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK: >>> >>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol >>> this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName(); >>> ^ >>> symbol: method isAnonymous() >>> location: variable type of type HotSpotResolvedObjectType >>> 1 error >>> >>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added. >> >> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null. I think I added isAnonymous() first and then getHostClass() was added later. > > I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync. Doug, could you please point to the bug id this issue is going to be tracked by. thanks, -katya > -Doug > >>> -Doug >>> >>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova wrote: >>>> >>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt >>>> >>>> >>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote: >>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log always were more than 10 times slower than other org.graalvm.compiler.jtt.lang.Math tests. But I agree, lets wait what Graal team will say regarding this slowness. 
>>>>> thanks, >>>>> -katya >>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote: >>>>>> Before increase timeout Labs should look on this test and investigate it strange behavior - last iteration takes long time: >>>>>> >>>>>> run5: Passed 228.9 ms >>>>>> run6: Passed 145.7 ms >>>>>> run7: Passed 833395.5 ms >>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines. >>>>>>> Increased default timeout (120 seconds) in twice. Please review. >>>>>>> >>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8208100 >>>>>>> webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html >>>>>>> testing: Tested by running the test 10 times on all platforms. >>>>>>> >>>>>>> >>>>>>> thanks, >>>>>>> -katya > From doug.simon at oracle.com Wed Sep 5 20:29:06 2018 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 5 Sep 2018 22:29:06 +0200 Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times out intermittently on Linux-X64 In-Reply-To: <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com> References: <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com> <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com> <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com> <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com> Message-ID: <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com> Hi Katya, > On 5 Sep 2018, at 22:17, Ekaterina Pavlova wrote: > > On 8/29/18 11:41 AM, Doug Simon wrote: >>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote: >>> >>> On 8/29/18 2:11 AM, Doug Simon wrote: >>>> When running these tests on Graal tip against JDK 11, I get: >>>> >>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math >>>> ... 
>>>> 10 longest running test classes: >>>> 21.115 ms org.graalvm.compiler.jtt.lang.Math_log10 >>>> 11.921 ms org.graalvm.compiler.jtt.lang.Math_log >>>> 10.460 ms org.graalvm.compiler.jtt.lang.Math_sqrt >>>> 3.525 ms org.graalvm.compiler.jtt.lang.Math_pow >>>> 1.937 ms org.graalvm.compiler.jtt.lang.Math_sin >>>> 1.689 ms org.graalvm.compiler.jtt.lang.Math_tan >>>> 1.550 ms org.graalvm.compiler.jtt.lang.Math_exp >>>> 1.537 ms org.graalvm.compiler.jtt.lang.Math_cos >>>> 1.526 ms org.graalvm.compiler.jtt.lang.Math_abs >>>> 338 ms org.graalvm.compiler.jtt.lang.Math_round >>>> 10 longest running tests: >>>> 10.583 ms run7(org.graalvm.compiler.jtt.lang.Math_log) >>>> 10.335 ms run7(org.graalvm.compiler.jtt.lang.Math_sqrt) >>>> 3.468 ms run11(org.graalvm.compiler.jtt.lang.Math_pow) >>>> 1.666 ms run5(org.graalvm.compiler.jtt.lang.Math_sin) >>>> 1.533 ms run5(org.graalvm.compiler.jtt.lang.Math_tan) >>>> 1.519 ms run8(org.graalvm.compiler.jtt.lang.Math_exp) >>>> 1.456 ms run3(org.graalvm.compiler.jtt.lang.Math_cos) >>>> 1.371 ms run7(org.graalvm.compiler.jtt.lang.Math_abs) >>>> 1.024 ms run0(org.graalvm.compiler.jtt.lang.Math_log) >>>> 84 ms run0(org.graalvm.compiler.jtt.lang.Math_sin) >>>> >>>> All seems as expected. >>>> >>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK: >>>> >>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol >>>> this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName(); >>>> ^ >>>> symbol: method isAnonymous() >>>> location: variable type of type HotSpotResolvedObjectType >>>> 1 error >>>> >>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added. 
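
The reflection option mentioned in the paragraph above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the fix that was applied: the helper name isAnonymousCompat and the two stand-in classes are hypothetical, and the sketch assumes the renamed method is called isUnsafeAnonymous, following the rename tracked in JDK-8209301. Against the real HotSpotResolvedObjectType, the reflective call may additionally require the JVMCI package to be exported to the calling module.

```java
import java.lang.reflect.Method;

public class AnonymousCheck {

    // Hypothetical helper (not part of Graal or JVMCI): calls whichever
    // no-argument boolean query method the given object exposes, trying the
    // assumed post-rename name first and the pre-rename name second.
    public static boolean isAnonymousCompat(Object type) {
        String[] candidates = {"isUnsafeAnonymous", "isAnonymous"};
        for (String name : candidates) {
            try {
                Method m = type.getClass().getMethod(name);
                return (Boolean) m.invoke(type);
            } catch (NoSuchMethodException e) {
                // This type does not declare that name; try the next one.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot call " + name, e);
            }
        }
        throw new IllegalStateException("no anonymous-class query method found");
    }

    // Stand-in for the API before the rename; used only for this demo.
    public static class OldApiType {
        public boolean isAnonymous() { return true; }
    }

    // Stand-in for the API after the rename; used only for this demo.
    public static class NewApiType {
        public boolean isUnsafeAnonymous() { return false; }
    }

    public static void main(String[] args) {
        System.out.println(isAnonymousCompat(new OldApiType()));
        System.out.println(isAnonymousCompat(new NewApiType()));
    }
}
```

A lookup like this costs a reflective call on every use, so caching the resolved Method would be worth considering if the check sat on a hot path. The alternative raised in the reply that follows, testing getHostClass() != null, avoids reflection altogether as long as that method keeps its name.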
>>> >>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null. I think I added isAnonymous() first and then getHostClass() was added later. >> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync. > > Doug, could you please point to the bug id this issue is going to be tracked by. I don't have a bug id for this issue - feel free to open one and assign it to me. I left a note pointing out the Graal compilation issue along with Dean's recommended fix: https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481 -Doug >> -Doug >>>> -Doug >>>> >>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova wrote: >>>>> >>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt >>>>> >>>>> >>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote: >>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log always were more than 10 times slower than other org.graalvm.compiler.jtt.lang.Math tests. But I agree, lets wait what Graal team will say regarding this slowness. >>>>>> thanks, >>>>>> -katya >>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote: >>>>>>> Before increase timeout Labs should look on this test and investigate it strange behavior - last iteration takes long time: >>>>>>> >>>>>>> run5: Passed 228.9 ms >>>>>>> run6: Passed 145.7 ms >>>>>>> run7: Passed 833395.5 ms >>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines. >>>>>>>> Increased default timeout (120 seconds) in twice. Please review. 
>>>>>>>> >>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8208100 >>>>>>>> webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html >>>>>>>> testing: Tested by running the test 10 times on all platforms. >>>>>>>> >>>>>>>> >>>>>>>> thanks, >>>>>>>> -katya From ekaterina.pavlova at oracle.com Wed Sep 5 21:10:50 2018 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 5 Sep 2018 14:10:50 -0700 Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times out intermittently on Linux-X64 In-Reply-To: <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com> References: <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com> <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com> <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com> <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com> <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com> Message-ID: <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com> Hi Doug, I have created JDK-8210434. -katya On 9/5/18 1:29 PM, Doug Simon wrote: > Hi Katya, > >> On 5 Sep 2018, at 22:17, Ekaterina Pavlova wrote: >> >> On 8/29/18 11:41 AM, Doug Simon wrote: >>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote: >>>> >>>> On 8/29/18 2:11 AM, Doug Simon wrote: >>>>> When running these tests on Graal tip against JDK 11, I get: >>>>> >>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math >>>>> ... 
>>>>> 10 longest running test classes: >>>>> 21.115 ms org.graalvm.compiler.jtt.lang.Math_log10 >>>>> 11.921 ms org.graalvm.compiler.jtt.lang.Math_log >>>>> 10.460 ms org.graalvm.compiler.jtt.lang.Math_sqrt >>>>> 3.525 ms org.graalvm.compiler.jtt.lang.Math_pow >>>>> 1.937 ms org.graalvm.compiler.jtt.lang.Math_sin >>>>> 1.689 ms org.graalvm.compiler.jtt.lang.Math_tan >>>>> 1.550 ms org.graalvm.compiler.jtt.lang.Math_exp >>>>> 1.537 ms org.graalvm.compiler.jtt.lang.Math_cos >>>>> 1.526 ms org.graalvm.compiler.jtt.lang.Math_abs >>>>> 338 ms org.graalvm.compiler.jtt.lang.Math_round >>>>> 10 longest running tests: >>>>> 10.583 ms run7(org.graalvm.compiler.jtt.lang.Math_log) >>>>> 10.335 ms run7(org.graalvm.compiler.jtt.lang.Math_sqrt) >>>>> 3.468 ms run11(org.graalvm.compiler.jtt.lang.Math_pow) >>>>> 1.666 ms run5(org.graalvm.compiler.jtt.lang.Math_sin) >>>>> 1.533 ms run5(org.graalvm.compiler.jtt.lang.Math_tan) >>>>> 1.519 ms run8(org.graalvm.compiler.jtt.lang.Math_exp) >>>>> 1.456 ms run3(org.graalvm.compiler.jtt.lang.Math_cos) >>>>> 1.371 ms run7(org.graalvm.compiler.jtt.lang.Math_abs) >>>>> 1.024 ms run0(org.graalvm.compiler.jtt.lang.Math_log) >>>>> 84 ms run0(org.graalvm.compiler.jtt.lang.Math_sin) >>>>> >>>>> All seems as expected. >>>>> >>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK: >>>>> >>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol >>>>> this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName(); >>>>> ^ >>>>> symbol: method isAnonymous() >>>>> location: variable type of type HotSpotResolvedObjectType >>>>> 1 error >>>>> >>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added. 
>>>> >>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null. I think I added isAnonymous() first and then getHostClass() was added later. >>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync. >> >> Doug, could you please point to the bug id this issue is going to be tracked by. > > I don't have a bug id for this issue - feel free to open one and assign it to me. > > I left a note pointing out the Graal compilation issue along with Dean's recommended fix: > > https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481 > > -Doug > >>> -Doug >>>>> -Doug >>>>> >>>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova wrote: >>>>>> >>>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt >>>>>> >>>>>> >>>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote: >>>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log always were more than 10 times slower than other org.graalvm.compiler.jtt.lang.Math tests. But I agree, lets wait what Graal team will say regarding this slowness. >>>>>>> thanks, >>>>>>> -katya >>>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote: >>>>>>>> Before increase timeout Labs should look on this test and investigate it strange behavior - last iteration takes long time: >>>>>>>> >>>>>>>> run5: Passed 228.9 ms >>>>>>>> run6: Passed 145.7 ms >>>>>>>> run7: Passed 833395.5 ms >>>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote: >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines. 
>>>>>>>>> Increased default timeout (120 seconds) in twice. Please review. >>>>>>>>> >>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8208100 >>>>>>>>> webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html >>>>>>>>> testing: Tested by running the test 10 times on all platforms. >>>>>>>>> >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> -katya > From doug.simon at oracle.com Wed Sep 5 21:14:43 2018 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 5 Sep 2018 23:14:43 +0200 Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times out intermittently on Linux-X64 In-Reply-To: <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com> References: <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com> <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com> <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com> <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com> <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com> <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com> Message-ID: > On 5 Sep 2018, at 23:10, Ekaterina Pavlova wrote: > > Hi Doug, > > I have created JDK-8210434. Ok. I thought you were talking about a bug id for the failing tests. Dean, I'll re-assign JDK-8210434 to you since it's a jaotc issue. -Doug > On 9/5/18 1:29 PM, Doug Simon wrote: >> Hi Katya, >>> On 5 Sep 2018, at 22:17, Ekaterina Pavlova wrote: >>> >>> On 8/29/18 11:41 AM, Doug Simon wrote: >>>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote: >>>>> >>>>> On 8/29/18 2:11 AM, Doug Simon wrote: >>>>>> When running these tests on Graal tip against JDK 11, I get: >>>>>> >>>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math >>>>>> ... 
>>>>>> 10 longest running test classes: >>>>>> 21.115 ms org.graalvm.compiler.jtt.lang.Math_log10 >>>>>> 11.921 ms org.graalvm.compiler.jtt.lang.Math_log >>>>>> 10.460 ms org.graalvm.compiler.jtt.lang.Math_sqrt >>>>>> 3.525 ms org.graalvm.compiler.jtt.lang.Math_pow >>>>>> 1.937 ms org.graalvm.compiler.jtt.lang.Math_sin >>>>>> 1.689 ms org.graalvm.compiler.jtt.lang.Math_tan >>>>>> 1.550 ms org.graalvm.compiler.jtt.lang.Math_exp >>>>>> 1.537 ms org.graalvm.compiler.jtt.lang.Math_cos >>>>>> 1.526 ms org.graalvm.compiler.jtt.lang.Math_abs >>>>>> 338 ms org.graalvm.compiler.jtt.lang.Math_round >>>>>> 10 longest running tests: >>>>>> 10.583 ms run7(org.graalvm.compiler.jtt.lang.Math_log) >>>>>> 10.335 ms run7(org.graalvm.compiler.jtt.lang.Math_sqrt) >>>>>> 3.468 ms run11(org.graalvm.compiler.jtt.lang.Math_pow) >>>>>> 1.666 ms run5(org.graalvm.compiler.jtt.lang.Math_sin) >>>>>> 1.533 ms run5(org.graalvm.compiler.jtt.lang.Math_tan) >>>>>> 1.519 ms run8(org.graalvm.compiler.jtt.lang.Math_exp) >>>>>> 1.456 ms run3(org.graalvm.compiler.jtt.lang.Math_cos) >>>>>> 1.371 ms run7(org.graalvm.compiler.jtt.lang.Math_abs) >>>>>> 1.024 ms run0(org.graalvm.compiler.jtt.lang.Math_log) >>>>>> 84 ms run0(org.graalvm.compiler.jtt.lang.Math_sin) >>>>>> >>>>>> All seems as expected. >>>>>> >>>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK: >>>>>> >>>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol >>>>>> this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName(); >>>>>> ^ >>>>>> symbol: method isAnonymous() >>>>>> location: variable type of type HotSpotResolvedObjectType >>>>>> 1 error >>>>>> >>>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added. 
>>>>> >>>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null. I think I added isAnonymous() first and then getHostClass() was added later. >>>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync. >>> >>> Doug, could you please point to the bug id this issue is going to be tracked by. >> I don't have a bug id for this issue - feel free to open one and assign it to me. >> I left a note pointing out the Graal compilation issue along with Dean's recommended fix: >> https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481 >> -Doug >>>> -Doug >>>>>> -Doug >>>>>> >>>>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova wrote: >>>>>>> >>>>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt >>>>>>> >>>>>>> >>>>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote: >>>>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log always were more than 10 times slower than other org.graalvm.compiler.jtt.lang.Math tests. But I agree, lets wait what Graal team will say regarding this slowness. >>>>>>>> thanks, >>>>>>>> -katya >>>>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote: >>>>>>>>> Before increase timeout Labs should look on this test and investigate it strange behavior - last iteration takes long time: >>>>>>>>> >>>>>>>>> run5: Passed 228.9 ms >>>>>>>>> run6: Passed 145.7 ms >>>>>>>>> run7: Passed 833395.5 ms >>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines. 
>>>>>>>>>> Increased default timeout (120 seconds) in twice. Please review.
>>>>>>>>>>
>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8208100
>>>>>>>>>> webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html
>>>>>>>>>> testing: Tested by running the test 10 times on all platforms.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>> -katya
>

From ekaterina.pavlova at oracle.com Wed Sep 5 21:22:35 2018
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Wed, 5 Sep 2018 14:22:35 -0700
Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times out intermittently on Linux-X64
In-Reply-To:
References: <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com> <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com> <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com> <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com> <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com> <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com>
Message-ID: <1ffdeaed-892c-cb5a-31b6-21955147008c@oracle.com>

Well, compiler/graalunit/JttLangMathALTest.java doesn't really fail; the test just runs slowly because of the slow
org.graalvm.compiler.jtt.lang.Math_log sub-tests. The Graal team doesn't see this slowness
when running these tests from the Graal workspace. However, the latest JDK is not used by default, and the attempt
to use the latest JDK failed because of 8209301.

Let me know if I am missing anything.

thanks,
-katya

On 9/5/18 2:14 PM, Doug Simon wrote:
>
>
>> On 5 Sep 2018, at 23:10, Ekaterina Pavlova wrote:
>>
>> Hi Doug,
>>
>> I have created JDK-8210434.
>
> Ok. I thought you were talking about a bug id for the failing tests.
>
> Dean, I'll re-assign JDK-8210434 to you since it's a jaotc issue.
> > -Doug > >> On 9/5/18 1:29 PM, Doug Simon wrote: >>> Hi Katya, >>>> On 5 Sep 2018, at 22:17, Ekaterina Pavlova wrote: >>>> >>>> On 8/29/18 11:41 AM, Doug Simon wrote: >>>>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote: >>>>>> >>>>>> On 8/29/18 2:11 AM, Doug Simon wrote: >>>>>>> When running these tests on Graal tip against JDK 11, I get: >>>>>>> >>>>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math >>>>>>> ... >>>>>>> 10 longest running test classes: >>>>>>> 21.115 ms org.graalvm.compiler.jtt.lang.Math_log10 >>>>>>> 11.921 ms org.graalvm.compiler.jtt.lang.Math_log >>>>>>> 10.460 ms org.graalvm.compiler.jtt.lang.Math_sqrt >>>>>>> 3.525 ms org.graalvm.compiler.jtt.lang.Math_pow >>>>>>> 1.937 ms org.graalvm.compiler.jtt.lang.Math_sin >>>>>>> 1.689 ms org.graalvm.compiler.jtt.lang.Math_tan >>>>>>> 1.550 ms org.graalvm.compiler.jtt.lang.Math_exp >>>>>>> 1.537 ms org.graalvm.compiler.jtt.lang.Math_cos >>>>>>> 1.526 ms org.graalvm.compiler.jtt.lang.Math_abs >>>>>>> 338 ms org.graalvm.compiler.jtt.lang.Math_round >>>>>>> 10 longest running tests: >>>>>>> 10.583 ms run7(org.graalvm.compiler.jtt.lang.Math_log) >>>>>>> 10.335 ms run7(org.graalvm.compiler.jtt.lang.Math_sqrt) >>>>>>> 3.468 ms run11(org.graalvm.compiler.jtt.lang.Math_pow) >>>>>>> 1.666 ms run5(org.graalvm.compiler.jtt.lang.Math_sin) >>>>>>> 1.533 ms run5(org.graalvm.compiler.jtt.lang.Math_tan) >>>>>>> 1.519 ms run8(org.graalvm.compiler.jtt.lang.Math_exp) >>>>>>> 1.456 ms run3(org.graalvm.compiler.jtt.lang.Math_cos) >>>>>>> 1.371 ms run7(org.graalvm.compiler.jtt.lang.Math_abs) >>>>>>> 1.024 ms run0(org.graalvm.compiler.jtt.lang.Math_log) >>>>>>> 84 ms run0(org.graalvm.compiler.jtt.lang.Math_sin) >>>>>>> >>>>>>> All seems as expected. 
>>>>>>> >>>>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK: >>>>>>> >>>>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol >>>>>>> this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName(); >>>>>>> ^ >>>>>>> symbol: method isAnonymous() >>>>>>> location: variable type of type HotSpotResolvedObjectType >>>>>>> 1 error >>>>>>> >>>>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added. >>>>>> >>>>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null. I think I added isAnonymous() first and then getHostClass() was added later. >>>>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync. >>>> >>>> Doug, could you please point to the bug id this issue is going to be tracked by. >>> I don't have a bug id for this issue - feel free to open one and assign it to me. 
>>> I left a note pointing out the Graal compilation issue along with Dean's recommended fix: >>> https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481 >>> -Doug >>>>> -Doug >>>>>>> -Doug >>>>>>> >>>>>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova wrote: >>>>>>>> >>>>>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt >>>>>>>> >>>>>>>> >>>>>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote: >>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log always were more than 10 times slower than other org.graalvm.compiler.jtt.lang.Math tests. But I agree, lets wait what Graal team will say regarding this slowness. >>>>>>>>> thanks, >>>>>>>>> -katya >>>>>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote: >>>>>>>>>> Before increase timeout Labs should look on this test and investigate it strange behavior - last iteration takes long time: >>>>>>>>>> >>>>>>>>>> run5: Passed 228.9 ms >>>>>>>>>> run6: Passed 145.7 ms >>>>>>>>>> run7: Passed 833395.5 ms >>>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote: >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to timeout on slow machines. >>>>>>>>>>> Increased default timeout (120 seconds) in twice. Please review. >>>>>>>>>>> >>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8208100 >>>>>>>>>>> webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html >>>>>>>>>>> testing: Tested by running the test 10 times on all platforms. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> thanks, >>>>>>>>>>> -katya >> > From doug.simon at oracle.com Wed Sep 5 21:35:50 2018 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 5 Sep 2018 23:35:50 +0200 Subject: RFR (XS) 8208100: compiler/graalunit/JttLangMathALTest.java times out intermittently on Linux-X64 In-Reply-To: <1ffdeaed-892c-cb5a-31b6-21955147008c@oracle.com> References: <92a20185-5422-6110-38a6-140ea71dfbd0@oracle.com> <68A63ADA-A4C1-4CB0-A869-B6B995BCAE04@oracle.com> <37916ab4-32fa-4d7d-0783-20adeaf5f7f7@oracle.com> <5d86a9ab-eded-08e5-b790-ed221ff05892@oracle.com> <2A0A38E8-460D-4B66-A0A6-55ACE5A2688F@oracle.com> <75f90080-aaa5-5b9c-a1a3-f580eec4919b@oracle.com> <1ffdeaed-892c-cb5a-31b6-21955147008c@oracle.com> Message-ID: <7DFCD9B1-CADB-428A-9C22-3331A8FBE28E@oracle.com> > On 5 Sep 2018, at 23:22, Ekaterina Pavlova wrote: > > Well, compiler/graalunit/JttLangMathALTest.java doesn't really fail, the test just runs slowly because > slow org.graalvm.compiler.jtt.lang.Math_log sub-tests. Graal team doesn't see this slowness > when running these tests from Graal ws. However latest jdk is not used by default. The attempt > to use latest jdk failed because of 8209301. > > Let me know if I am missing anything. Nope - that clears things up for me - thanks. Once 8209301 is resolved, I can help with 8208100 (assuming I can reproduce it). -Doug > > thanks, > -katya > > On 9/5/18 2:14 PM, Doug Simon wrote: >>> On 5 Sep 2018, at 23:10, Ekaterina Pavlova wrote: >>> >>> Hi Doug, >>> >>> I have created JDK-8210434. >> Ok. I thought you were talking about a bug id for the failing tests. >> Dean, I'll re-assign JDK-8210434 to you since it's a jaotc issue. 
>> -Doug >>> On 9/5/18 1:29 PM, Doug Simon wrote: >>>> Hi Katya, >>>>> On 5 Sep 2018, at 22:17, Ekaterina Pavlova wrote: >>>>> >>>>> On 8/29/18 11:41 AM, Doug Simon wrote: >>>>>>> On 29 Aug 2018, at 19:23, dean.long at oracle.com wrote: >>>>>>> >>>>>>> On 8/29/18 2:11 AM, Doug Simon wrote: >>>>>>>> When running these tests on Graal tip against JDK 11, I get: >>>>>>>> >>>>>>>> mx --java-home=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home unittest org.graalvm.compiler.jtt.lang.Math >>>>>>>> ... >>>>>>>> 10 longest running test classes: >>>>>>>> 21.115 ms org.graalvm.compiler.jtt.lang.Math_log10 >>>>>>>> 11.921 ms org.graalvm.compiler.jtt.lang.Math_log >>>>>>>> 10.460 ms org.graalvm.compiler.jtt.lang.Math_sqrt >>>>>>>> 3.525 ms org.graalvm.compiler.jtt.lang.Math_pow >>>>>>>> 1.937 ms org.graalvm.compiler.jtt.lang.Math_sin >>>>>>>> 1.689 ms org.graalvm.compiler.jtt.lang.Math_tan >>>>>>>> 1.550 ms org.graalvm.compiler.jtt.lang.Math_exp >>>>>>>> 1.537 ms org.graalvm.compiler.jtt.lang.Math_cos >>>>>>>> 1.526 ms org.graalvm.compiler.jtt.lang.Math_abs >>>>>>>> 338 ms org.graalvm.compiler.jtt.lang.Math_round >>>>>>>> 10 longest running tests: >>>>>>>> 10.583 ms run7(org.graalvm.compiler.jtt.lang.Math_log) >>>>>>>> 10.335 ms run7(org.graalvm.compiler.jtt.lang.Math_sqrt) >>>>>>>> 3.468 ms run11(org.graalvm.compiler.jtt.lang.Math_pow) >>>>>>>> 1.666 ms run5(org.graalvm.compiler.jtt.lang.Math_sin) >>>>>>>> 1.533 ms run5(org.graalvm.compiler.jtt.lang.Math_tan) >>>>>>>> 1.519 ms run8(org.graalvm.compiler.jtt.lang.Math_exp) >>>>>>>> 1.456 ms run3(org.graalvm.compiler.jtt.lang.Math_cos) >>>>>>>> 1.371 ms run7(org.graalvm.compiler.jtt.lang.Math_abs) >>>>>>>> 1.024 ms run0(org.graalvm.compiler.jtt.lang.Math_log) >>>>>>>> 84 ms run0(org.graalvm.compiler.jtt.lang.Math_sin) >>>>>>>> >>>>>>>> All seems as expected. 
>>>>>>>> >>>>>>>> I wanted to try it against JDK tip but https://bugs.openjdk.java.net/browse/JDK-8209301 means GitHub Graal cannot be compiled against the latest JDK: >>>>>>>> >>>>>>>> /Users/dsimon/graal/graal/compiler/src/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTCompiledClass.java:77: error: cannot find symbol >>>>>>>> this.metadataName = type.isAnonymous() ? "anon<" + classId + ">" : type.getName(); >>>>>>>> ^ >>>>>>>> symbol: method isAnonymous() >>>>>>>> location: variable type of type HotSpotResolvedObjectType >>>>>>>> 1 error >>>>>>>> >>>>>>>> To fix this, AOTCompiledClass.java will either have to use reflection to access HotSpotResolvedObjectType.isAnonymous[Unsafe]Anonymous or a jdk12 versioned project will have to be added. >>>>>>> >>>>>>> It looks like they forgot to rename getHostClass(), so we could replace uses of isAnonymous() with getHostClass() != null. I think I added isAnonymous() first and then getHostClass() was added later. >>>>>> I assume `getHostClass() != null` is more expensive than `isAnonymous()` but it probably doesn't matter here. Either way, this will have to be resolved before the next Graal sync. >>>>> >>>>> Doug, could you please point to the bug id this issue is going to be tracked by. >>>> I don't have a bug id for this issue - feel free to open one and assign it to me. 
>>>> I left a note pointing out the Graal compilation issue along with Dean's recommended fix:
>>>> https://bugs.openjdk.java.net/browse/JDK-8209301?focusedCommentId=14208481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14208481
>>>> -Doug
>>>>>> -Doug
>>>>>>>> -Doug
>>>>>>>>
>>>>>>>>> On 29 Aug 2018, at 10:08, Ekaterina Pavlova wrote:
>>>>>>>>>
>>>>>>>>> I meant org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_sqrt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/29/18 1:05 AM, Ekaterina Pavlova wrote:
>>>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log and org.graalvm.compiler.jtt.lang.Math_log have always been more than 10 times slower than the other org.graalvm.compiler.jtt.lang.Math tests. But I agree, let's wait to hear what the Graal team says about this slowness.
>>>>>>>>>> thanks,
>>>>>>>>>> -katya
>>>>>>>>>> On 8/28/18 3:30 PM, Vladimir Kozlov wrote:
>>>>>>>>>>> Before increasing the timeout, Labs should look at this test and investigate its strange behavior - the last iteration takes a long time:
>>>>>>>>>>>
>>>>>>>>>>> run5: Passed 228.9 ms
>>>>>>>>>>> run6: Passed 145.7 ms
>>>>>>>>>>> run7: Passed 833395.5 ms
>>>>>>>>>>> org.graalvm.compiler.jtt.lang.Math_log finished 836029.5 ms
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 8/27/18 9:03 AM, Ekaterina Pavlova wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> compiler/graalunit/JttLangMathALTest.java continues to time out on slow machines.
>>>>>>>>>>>> Doubled the default timeout (120 seconds). Please review.
>>>>>>>>>>>>
>>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8208100
>>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~epavlova/8208100/webrev.00/index.html
>>>>>>>>>>>> testing: Tested by running the test 10 times on all platforms.
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> thanks, >>>>>>>>>>>> -katya >>> > From gromero at linux.vnet.ibm.com Wed Sep 5 22:18:27 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 5 Sep 2018 19:18:27 -0300 Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal In-Reply-To: <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com> References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com> <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com> <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com> Message-ID: <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com> Hi Vladimir, On 09/04/2018 03:40 PM, Vladimir Kozlov wrote: > Thank you Gustavo for detailed answer. > > I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now. Thanks for reviewing it! > About property check (flavor == "server" & !emulatedClient) in tests. If C2 (and Graal in a future when it supports RTM) is used, this check is always true because C2 (and Graal) is only available in Server VM and it is disabled in emulatedClient setting. This check was added before we have C2 usage check which can be done now with vm.rtm.compiler. Thanks, I was not aware of it. 
I've updated the webrev removing "flavor == "server" & !emulatedClient": http://cr.openjdk.java.net/~gromero/8209972/v3/ "hg diff --patience": http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff Testing (on Linux): ** X86_64 w/ CPU+OS RTM support + Graal VM ** Test results: no tests selected (all RTM tests skipped) ** POWER8 w/ CPU+OS support ** Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java Passed: compiler/rtm/cli/TestRTMRetryCountOption.java Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java Passed: compiler/rtm/locking/TestRTMAbortRatio.java Passed: compiler/rtm/locking/TestRTMAbortThreshold.java Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java Passed: compiler/rtm/locking/TestRTMLockingThreshold.java Passed: compiler/rtm/locking/TestRTMRetryCount.java Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java Passed: compiler/rtm/locking/TestUseRTMDeopt.java Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java Passed: 
compiler/rtm/locking/TestUseRTMForStackLocks.java Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java Test results: passed: 30 ** X86_64 w/ CPU+OS support ** Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java Passed: compiler/rtm/cli/TestRTMRetryCountOption.java Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java Passed: compiler/rtm/locking/TestRTMAbortRatio.java Passed: compiler/rtm/locking/TestRTMAbortThreshold.java Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java Passed: compiler/rtm/locking/TestRTMLockingThreshold.java Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java Passed: compiler/rtm/locking/TestRTMRetryCount.java Passed: compiler/rtm/locking/TestUseRTMDeopt.java Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java 
Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java Test results: passed: 30 ** POWER7 wo/ CPU+OS RTM support ** Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java Passed: compiler/rtm/cli/TestRTMRetryCountOption.java Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java Test results: passed: 10 ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support ** Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java Passed: compiler/rtm/cli/TestRTMRetryCountOption.java Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java Test results: passed: 10 Best regards, Gustavo > Thanks, > Vladimir > > On 9/3/18 3:15 PM, Gustavo Romero wrote: >> Hi Vladimir, >> >> Thanks a lot for reviewing it and for your comments. 
>>
>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote:
>>> Hi Gustavo,
>>>
>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with TieredStopAtLevel < 4 flag
>>
>> Yes, although currently afaics all tests explicitly enable C2 (for
>> instance, passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2
>> through a warm-up before testing, I agree that nothing forbids one from
>> switching off C2 with TieredStopAtLevel = level < 4 for a test case. It also
>> looks better to list explicitly which compilers do support RTM instead of
>> the ones that don't support it.
>>
>> I've updated the webrev accordingly:
>>
>> http://cr.openjdk.java.net/~gromero/8209972/v2/
>>
>> The diff in there looks odd, so I generated another one with --patience for a
>> better (IMO) diff format:
>>
>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff
>>
>>
>>> Also can platforms check be replaced with one vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
>>
>> For example, on Linux the following cases are possible regarding CPU / OS
>> RTM support:
>>
>> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
>> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>>
>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it
>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies
>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise
>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for
>> Linux and for AIX.
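The redundancy argument above can be checked mechanically. The following is a minimal sketch, assuming a simplified model in which vm.rtm.cpu is true only when both CPU and OS support RTM and vm.rtm.os simply mirrors OS support; the class and method names are hypothetical and are not the VMProps.java code.

```java
public class RtmCpuProperty {
    /** Assumed model: "rtm" is advertised only when CPU and OS both support it. */
    public static boolean vmRtmCpu(boolean cpuHasRtm, boolean osHasRtm) {
        return cpuHasRtm && osHasRtm;
    }

    /** Assumed model of the vm.rtm.os property. */
    public static boolean vmRtmOs(boolean osHasRtm) {
        return osHasRtm;
    }

    /** True if "vm.rtm.cpu & vm.rtm.os" equals "vm.rtm.cpu" for every CPU/OS combination. */
    public static boolean osTermIsRedundant() {
        boolean[] values = {false, true};
        for (boolean cpu : values) {
            for (boolean os : values) {
                boolean both = vmRtmCpu(cpu, os) && vmRtmOs(os);
                if (both != vmRtmCpu(cpu, os)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("POWER7 (cpu=false, os=false):    " + vmRtmCpu(false, false));
        System.out.println("POWER8 (cpu=true,  os=true):     " + vmRtmCpu(true, true));
        System.out.println("POWER9 NV (cpu=true, os=false):  " + vmRtmCpu(true, false));
        System.out.println("os term redundant: " + osTermIsRedundant());
    }
}
```

Under this model, dropping vm.rtm.os from an @requires line does not change which tests are selected.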
>>
>> That said I don't think that the platforms check can be replaced with one
>> vmRTMCPU(), because in some cases it's necessary to run a test for
>> cpu = false and compiler = true, i.e. it's necessary to run a test on an
>> unsupported CPU for a given platform _only if_ the compiler in use supports
>> RTM (like C2). So if, for instance, we do:
>>
>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires
>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation
>> returns 'false' for cpu = false and compiler = true, skipping the test
>> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler'
>> as 'true' and run the test in that case one could match for
>> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will
>> be evaluated as 'true' and the test will run even though the Graal
>> compiler is selected, which is wrong.
>>
>> Hence the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must
>> contain its own list of supported compilers with RTM support for each
>> platform IMO. Basically we can't ask the JVM about the compiler's support
>> for RTM, since the JVM can only tell us about the CPU+OS support for RTM
>> regarding the CPU and OS on which the JVM is running.
>>
>>
>>> And since you check cpu in here why not to replace all vm.rtm.* with one vm.rtm.supported? In such a case you would need only one @requires check in tests instead of:
>>>
>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>
>> I think it's not possible either. Currently there are 5 match cases in
>> RTM tests:
>>
>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>>
>> which can be summarized as 5 cases:
>>
>> 1: !(flavor == "server" & !emulatedClient & cpu & os)
>> 2: flavor == "server" & !emulatedClient & cpu & os
>> 3: (!cpu) & (flavor == "server" & !emulatedClient)
>> 4: cpu & !(flavor == "server" & !emulatedClient)
>> 5: no @requires
>>
>> I understand that cases 1 and 2 (since CPU implies OS) can be simplified as:
>>
>>
>> 1: !(flavor == "server" & !emulatedClient & cpu)
>> 2: flavor == "server" & !emulatedClient & cpu
>> 3: (!cpu) & (flavor == "server" & !emulatedClient)
>> 4: cpu & !(flavor == "server" & !emulatedClient)
>> 5: no @requires
>>
>> and cases 1 and 2 are mere opposites, so we have 4 cases:
>>
>> 1: !(flavor == "server" & !emulatedClient & cpu)
>> 3: (!cpu) & (flavor == "server" & !emulatedClient)
>> 4: cpu & !(flavor == "server" & !emulatedClient)
>> 5: no @requires
>>
>> We could simplify further by making P = (flavor == "server" & !emulatedClient),
>> which gives:
>>
>> 1: !(P & cpu)
>> 3: (!cpu) & (P)
>> 4: cpu & !(P)
>> 5: no @requires
>>
>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
>> order to control running the tests only if the selected compiler on a
>> given platform has RTM support (skipping Graal, for instance):
>>
>> 1: !(P & cpu) & compiler
>> 3: (!cpu) & (P) & compiler
>> 4: cpu & !(P) & compiler
>> 5: no @requires & compiler
>>
>> So it looks like at minimum we would need 3 properties, but IMO it's
>> not worth adding another property P = (flavor == "server" & !emulatedClient) >>
just to simplify further the @requires line. >> >> In summing up, I think it's only possible to replace 'cpu & os' by 'cpu', >> so I updated the webrev removing the vm.rtm.os property and keeping only >> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks). >> >> I've tested the following scenarios and observed no regression [1]: >> >> 1. X86_64 w/ RTM >> 2. X86_64 w/ RTM + Graal enabled >> 3. POWER7: no CPU+OS support for RTM >> 4. POWER8: CPU+OS support for RTM >> >> But I think we need a confirmation from SAP about AIX. >> >> >> Best regards, >> Gustavo >> >> [1] >> >> ** X86_64 w/ RTM ** >> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >> Passed: compiler/rtm/locking/TestRTMRetryCount.java >> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >> Passed: 
compiler/rtm/locking/TestUseRTMAfterLockInflation.java >> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >> Test results: passed: 30 >> >> >> ** X86_64 w/ RTM + Graal enabled ** >> Test results: no tests selected (all RTM tests skipped) >> >> >> ** POWER7: no CPU+OS support for RTM ** >> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >> Test results: passed: 10 >> >> >> ** POWER8: CPU+OS support for RTM ** >> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >> Passed: 
compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >> Passed: compiler/rtm/locking/TestRTMRetryCount.java >> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >> Test results: passed: 30 >> >> >>> Thanks, >>> Vladimir >>> >>> On 8/31/18 8:38 AM, Gustavo Romero wrote: >>>> Hi, >>>> >>>> Could the following small change be reviewed please? >>>> >>>> Bug : https://bugs.openjdk.java.net/browse/JDK-8209972 >>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/ >>>> >>>> It disables all RTM tests when a compiler not supporting RTM (e.g. 
Graal) >>>> is selected on platforms that can have CPU/OS with RTM support. >>>> >>>> It also disables all RTM tests for any other platform that has not a single >>>> compiler supporting RTM. >>>> >>>> The RTM support was first added to C2 compiler and once checkers for RTM >>>> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM they >>>> assume that a compiler supporting RTM is available for sure ("rtm" is >>>> advertised only if RTM is supported by both CPU and OS). Later the JVM >>>> began to allow the selection of a compiler different from C2, like Graal, >>>> and it became possible to select a compiler without RTM support despite the >>>> fact that both the CPU and the OS support RTM. Thus for platforms >>>> supporting Graal or any other specific compiler the compiler availability for >>>> the RTM tests must be adjusted and if the selected compiler does not >>>> support RTM then all RTM tests must be skipped, including the ones meant >>>> for platforms without CPU or OS RTM support (e.g. *Unsupported*.java) >>>> because in some cases, like in TestUseRTMLockingOptionOnUnsupportedCPU.java, >>>> the test expects JVM initialization errors that will never occur because the >>>> problem is not that the RTM support for CPU or OS is missing, but rather >>>> because the selected compiler does not support RTM. >>>> >>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to >>>> filter out compilers without RTM support for specific platforms and adapts >>>> the current RTM tests to use that new property. >>>> >>>> Nothing changes regarding the number of passing/selected tests for the >>>> various cpu/os/compiler combinations on platforms that currently might >>>> support RTM [1], except when Graal is in use. >>>> >>>> Thank you. 
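As a rough illustration of how such a 'vm.rtm.compiler' property could gate test selection, here is a minimal sketch. It assumes the "C2 on x64 or PPC" rule mentioned elsewhere in this thread; the class, the method names and the architecture strings are hypothetical and do not reproduce the webrev.

```java
public class RtmCompilerProperty {
    /** Hypothetical rule: only C2 has RTM support, and only on x64 and ppc64. */
    public static boolean vmRtmCompiler(boolean c2Enabled, String arch) {
        boolean archHasRtmCompiler = arch.equals("x64") || arch.equals("ppc64");
        return c2Enabled && archHasRtmCompiler;
    }

    /** Models "@requires vm.rtm.cpu & vm.rtm.compiler" (supported-config tests). */
    public static boolean supportedConfigSelected(boolean vmRtmCpu, boolean vmRtmCompiler) {
        return vmRtmCpu && vmRtmCompiler;
    }

    /** Models "@requires !vm.rtm.cpu & vm.rtm.compiler" (unsupported-config tests). */
    public static boolean unsupportedConfigSelected(boolean vmRtmCpu, boolean vmRtmCompiler) {
        return !vmRtmCpu && vmRtmCompiler;
    }

    public static void main(String[] args) {
        // Graal selected on x64 with CPU+OS RTM support: C2 is not in use.
        boolean compiler = vmRtmCompiler(false, "x64");
        System.out.println("supported tests run:   " + supportedConfigSelected(true, compiler));
        System.out.println("unsupported tests run: " + unsupportedConfigSelected(true, compiler));
    }
}
```

With the compiler property false, both kinds of test are skipped regardless of CPU/OS support, which is the behavior the change description asks for when Graal is selected.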
>>>> >>>> Best regards, >>>> Gustavo >>>> >>>> >>>> [1] >>>> >>>> ** X64 w/ CPU and OS supporting RTM ** >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>> Passed: 
compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>> Test results: passed: 30 >>>> >>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support ** >>>> Test results: no tests selected (all RTM tests skipped) >>>> >>>> ** POWER8 w/ CPU and OS supporting RTM ** >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java 
>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>> Test results: passed: 30 >>>> >>>> ** POWER7 wo/ CPU and OS supporting RTM ** >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Test results: passed: 10 >>>> >>> >> > From vladimir.kozlov at oracle.com Wed Sep 5 22:54:32 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Sep 2018 15:54:32 -0700 Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal In-Reply-To: <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com> References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com> <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com> <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com> <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com> Message-ID: <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com> v3 looks good. 
Thanks, Vladimir On 9/5/18 3:18 PM, Gustavo Romero wrote: > Hi Vladimir, > > On 09/04/2018 03:40 PM, Vladimir Kozlov wrote: >> Thank you Gustavo for detailed answer. >> >> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now. > > Thanks for reviewing it! > > >> About property check (flavor == "server" & !emulatedClient) in tests. If C2 (and Graal in a future when it supports >> RTM) is used, this check is always true because C2 (and Graal) is only available in Server VM and it is disabled in >> emulatedClient setting. This check was added before we have C2 usage check which can be done now with vm.rtm.compiler. > > Thanks, I was not aware of it. I've updated the webrev removing > "flavor == "server" & !emulatedClient": > > http://cr.openjdk.java.net/~gromero/8209972/v3/ > > "hg diff --patience": > > http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff > > Testing (on Linux): > > ** X86_64 w/ CPU+OS RTM support + Graal VM ** > Test results: no tests selected (all RTM tests skipped) > > ** POWER8 w/ CPU+OS support ** > Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java > Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java > Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java > Passed: compiler/rtm/cli/TestRTMRetryCountOption.java > Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java > Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java > Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java > Passed: 
compiler/rtm/locking/TestRTMAbortRatio.java > Passed: compiler/rtm/locking/TestRTMAbortThreshold.java > Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java > Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java > Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java > Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java > Passed: compiler/rtm/locking/TestRTMLockingThreshold.java > Passed: compiler/rtm/locking/TestRTMRetryCount.java > Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java > Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java > Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java > Passed: compiler/rtm/locking/TestUseRTMDeopt.java > Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java > Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java > Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java > Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java > Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java > Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java > Test results: passed: 30 > > ** X86_64 w/ CPU+OS support ** > Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java > Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java > Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java > Passed: compiler/rtm/cli/TestRTMRetryCountOption.java > Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java > Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java > Passed: 
compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java > Passed: compiler/rtm/locking/TestRTMAbortRatio.java > Passed: compiler/rtm/locking/TestRTMAbortThreshold.java > Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java > Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java > Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java > Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java > Passed: compiler/rtm/locking/TestRTMLockingThreshold.java > Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java > Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java > Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java > Passed: compiler/rtm/locking/TestRTMRetryCount.java > Passed: compiler/rtm/locking/TestUseRTMDeopt.java > Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java > Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java > Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java > Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java > Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java > Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java > Test results: passed: 30 > > ** POWER7 wo/ CPU+OS RTM support ** > Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java > Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java > Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java > Passed: compiler/rtm/cli/TestRTMRetryCountOption.java > Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java > Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java > Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java > Test results: passed: 10 > > ** POWER9 NV CPU w/ CPU RTM support and OS 
wo/ RTM support ** > Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java > Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java > Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java > Passed: compiler/rtm/cli/TestRTMRetryCountOption.java > Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java > Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java > Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java > Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java > Test results: passed: 10 > > > Best regards, > Gustavo > >> Thanks, >> Vladimir >> >> On 9/3/18 3:15 PM, Gustavo Romero wrote: >>> Hi Vladimir, >>> >>> Thanks a lot for reviewing it and for your comments. >>> >>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote: >>>> Hi Gustavo, >>>> >>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off >>>> with TieredStopAtLevel < 4 flag >>> >>> Yes, although currently afaics all tests will explicitly enabled C2 (for >>> instance, passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2 >>> through a warming up before testing, I agree that nothing forbids one to >>> switch off C2 with TieredStopAtLevel = level < 4 for a test case. It also >>> looks better to list explicitly which compilers do support RTM instead of >>> the ones that don't support it. >>> >>> I've updated the webrev accordingly: >>> >>> http://cr.openjdk.java.net/~gromero/8209972/v2/ >>> >>> diff in there looks odd so I generated another one with --patience for a >>> better (IMO) diff format: >>> >>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff >>> >>> >>>> Also can platforms check be replaced with one vmRTMCPU()? 
Does ppc64 return true from vmRTMCPU()?
>>>
>>> For example, on Linux the following cases are possible regarding CPU / OS
>>> RTM support:
>>>
>>> POWER7   : cpu = false, os = false         => vm.rtm.cpu = false
>>> POWER8   : cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>> POWER9 VM: cpu = true,  os = false | true  => vm.rtm.cpu = false | true
>>> POWER9 NV: cpu = true,  os = false         => vm.rtm.cpu = false
>>>
>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support
>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it
>>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies
>>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise
>>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for
>>> Linux and for AIX.
>>>
>>> That said, I don't think that the platforms check can be replaced with one
>>> vmRTMCPU(), because in some cases it's necessary to run a test for
>>> cpu = false and compiler = true, i.e. it's necessary to run a test on an
>>> unsupported CPU for a given platform _only if_ the compiler in use supports
>>> RTM (like C2). So if, for instance, we do:
>>>
>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires
>>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation
>>> returns 'false' for cpu = false and compiler = true, skipping the test
>>> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler'
>>> as 'true' and run the test in that case one could match for
>>> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will
>>> be evaluated as 'true' and the test will run even though the Graal
>>> compiler is selected, which is wrong.
>>>
>>> Hence the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must
>>> contain its own list of supported compilers with RTM support for each
>>> platform IMO. Basically we can't ask the JVM about the compiler's support
>>> for RTM, since the JVM can only tell us about the CPU+OS support for RTM
>>> for the CPU and OS on which the JVM is running.
>>>
>>>
>>>> And since you check cpu in here why not replace all vm.rtm.* with one vm.rtm.supported? In such case you would
>>>> need only one @requires check in tests instead of:
>>>>
>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>>
>>> I think it's not possible either. Currently there are 5 match cases in
>>> RTM tests:
>>>
>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u
>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os)
>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os
>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient)
>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient)
>>>
>>> which can be written as 5 cases:
>>>
>>> 1:          !(flavor == "server" & !emulatedClient  & cpu & os)
>>> 2:            flavor == "server" & !emulatedClient  & cpu & os
>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>> 5: no @requires
>>>
>>> I understand that cases 1 and 2 (since CPU implies OS) can be simplified as:
>>>
>>>
>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>> 2:            flavor == "server" & !emulatedClient  & cpu
>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>> 5: no @requires
>>>
>>> and cases 1 and 2 are mere opposites, so we have 4 cases:
>>>
>>> 1:          !(flavor == "server" & !emulatedClient  & cpu)
>>> 3: (!cpu) &  (flavor == "server" & !emulatedClient)
>>> 4:   cpu  & !(flavor == "server" & !emulatedClient)
>>> 5: no @requires
>>>
>>> We could simplify further by making P = (flavor == "server" & !emulatedClient),
>>> and make:
>>>
>>> 1:          !(P & cpu)
>>> 3: (!cpu) &  (P)
>>> 4:   cpu  & !(P)
>>> 5: no @requires
>>>
>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in
>>> order to control running the tests only if the selected compiler on a
>>> given platform has RTM support (skipping Graal, for instance):
>>>
>>> 1:          !(P & cpu) & compiler
>>> 3: (!cpu) &  (P)       & compiler
>>> 4:   cpu  & !(P)       & compiler
>>> 5: no @requires        & compiler
>>>
>>> So it looks like at minimum we would need 3 properties, but IMO it's
>>> not worth adding another property P = (flavor == "server" & !emulatedClient)
>>> just to simplify the @requires line further.
>>>
>>> Summing up, I think it's only possible to replace 'cpu & os' by 'cpu',
>>> so I updated the webrev removing the vm.rtm.os property and keeping only
>>> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks).
>>>
>>> I've tested the following scenarios and observed no regression [1]:
>>>
>>> 1. X86_64 w/ RTM
>>> 2. X86_64 w/ RTM + Graal enabled
>>> 3. POWER7: no CPU+OS support for RTM
>>> 4. POWER8: CPU+OS support for RTM
>>>
>>> But I think we need a confirmation from SAP about AIX.
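The case analysis in the quoted message can be checked mechanically. What follows is a minimal sketch, not code from the webrev: it treats P = (flavor == "server" & !emulatedClient), cpu and compiler as plain booleans and prints which of the simplified @requires cases would select a test. The class and method names are hypothetical and exist only for this illustration.

```java
public class RtmRequiresTable {
    // P stands for (flavor == "server" & !emulatedClient), as in the message.

    // Case 1: !(P & cpu) & compiler
    static boolean case1(boolean p, boolean cpu, boolean compiler) {
        return !(p && cpu) && compiler;
    }

    // Case 3: (!cpu) & (P) & compiler
    static boolean case3(boolean p, boolean cpu, boolean compiler) {
        return !cpu && p && compiler;
    }

    // Case 4: cpu & !(P) & compiler
    static boolean case4(boolean p, boolean cpu, boolean compiler) {
        return cpu && !p && compiler;
    }

    // Case 5: no other condition, only the compiler check
    static boolean case5(boolean compiler) {
        return compiler;
    }

    public static void main(String[] args) {
        boolean[] values = { false, true };
        System.out.println("P     cpu   compiler | case1 case3 case4 case5");
        for (boolean compiler : values) {
            for (boolean cpu : values) {
                for (boolean p : values) {
                    System.out.printf("%-5b %-5b %-8b | %-5b %-5b %-5b %-5b%n",
                            p, cpu, compiler,
                            case1(p, cpu, compiler), case3(p, cpu, compiler),
                            case4(p, cpu, compiler), case5(compiler));
                }
            }
        }
    }
}
```

Every row with compiler = false comes out false in all four columns, which is the intended effect when a compiler without RTM support such as Graal is selected: no RTM test is run, including the ones written for unsupported CPUs.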
>>> >>> >>> Best regards, >>> Gustavo >>> >>> [1] >>> >>> ** X86_64 w/ RTM ** >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>> Passed: 
compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>> Test results: passed: 30 >>> >>> >>> ** X86_64 w/ RTM + Graal enabled ** >>> Test results: no tests selected (all RTM tests skipped) >>> >>> >>> ** POWER7: no CPU+OS support for RTM ** >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Test results: passed: 10 >>> >>> >>> ** POWER8: CPU+OS support for RTM ** >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>> Passed: 
compiler/rtm/locking/TestRTMAbortThreshold.java >>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>> Test results: passed: 30 >>> >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 8/31/18 8:38 AM, Gustavo Romero wrote: >>>>> Hi, >>>>> >>>>> Could the following small change be reviewed please? >>>>> >>>>> Bug?? : https://bugs.openjdk.java.net/browse/JDK-8209972 >>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/ >>>>> >>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal) >>>>> is selected on platforms that can have CPU/OS with RTM support. >>>>> >>>>> It also disables all RTM tests for any other platform that has not a single >>>>> compiler supporting RTM. >>>>> >>>>> The RTM support was first added to C2 compiler and once checkers for RTM >>>>> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM they >>>>> assume that a compiler supporting RTM is available for sure ("rtm" is >>>>> advertised only if RTM is supported by both CPU and OS). 
Later the JVM >>>>> began to allow the selection of a compiler different from C2, like Graal, >>>>> and it became possible to select a compiler without RTM support despite the >>>>> fact that both the CPU and the OS support RTM. Thus for platforms >>>>> supporting Graal or any other specific compiler the compiler availability for >>>>> the RTM tests must be adjusted and if the selected compiler does not >>>>> support RTM then all RTM tests must be skipped, including the ones meant >>>>> for platforms without CPU or OS RTM support (e.g. *Unsupported*.java) >>>>> because in some cases, like in TestUseRTMLockingOptionOnUnsupportedCPU.java, >>>>> the test expects JVM initialization errors that will never occur because the >>>>> problem is not that the RTM support for CPU or OS is missing, but rather >>>>> because the selected compiler does not support RTM. >>>>> >>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to >>>>> filter out compilers without RTM support for specific platforms and adapts >>>>> the current RTM tests to use that new property. >>>>> >>>>> Nothing changes regarding the number of passing/selected tests for the >>>>> various cpu/os/compiler combinations on platforms that currently might >>>>> support RTM [1], except when Graal is in use. >>>>> >>>>> Thank you. 
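As an illustration of what such a property could compute, here is a minimal sketch built on the condition summarized elsewhere in this thread as "compiler = C2 && (x64 | PPC)". It is not the code from the webrev: the class name, the method signature and the boolean parameters are assumptions standing in for whatever compiler and platform checks the test library actually performs.

```java
public class RtmCompilerProperty {
    // Hypothetical helper. The three parameters stand in for the real
    // checks (is C2 enabled, is the platform x64, is the platform PPC).
    static String vmRTMCompiler(boolean c2Enabled, boolean isX64, boolean isPPC) {
        // A compiler with RTM support is available only when C2 is in use
        // on a platform where C2 implements RTM locking.
        boolean supported = c2Enabled && (isX64 || isPPC);
        return Boolean.toString(supported);
    }

    public static void main(String[] args) {
        // C2 on x64: RTM tests may be selected.
        System.out.println("C2 on x64:    " + vmRTMCompiler(true, true, false));
        // Graal selected instead of C2 on x64: RTM tests are skipped.
        System.out.println("Graal on x64: " + vmRTMCompiler(false, true, false));
        // C2 on some other platform: RTM tests are skipped.
        System.out.println("C2 elsewhere: " + vmRTMCompiler(true, false, false));
    }
}
```

Keeping the CPU and OS checks out of this helper matches the argument made in the review: the property answers only whether the selected compiler can use RTM, so tests for unsupported CPUs can still require it.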
>>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>> >>>>> [1] >>>>> >>>>> ** X64 w/ CPU and OS supporting RTM ** >>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>> Passed: 
compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>> Test results: passed: 30 >>>>> >>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support ** >>>>> Test results: no tests selected (all RTM tests skipped) >>>>> >>>>> ** POWER8 w/ CPU and OS supporting RTM ** >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>> Passed: 
compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>> Test results: passed: 30 >>>>> >>>>> ** POWER7 wo/ CPU and OS supporting RTM ** >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Test results: passed: 10 >>>>> >>>> >>> >> > From sandhya.viswanathan at intel.com Wed Sep 5 23:09:33 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 5 Sep 2018 23:09:33 +0000 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> Recently there have been couple of high priority issues with regards to high bank of XMM register (XMM16-XMM31) usage by C2: https://bugs.openjdk.java.net/browse/JDK-8207746 https://bugs.openjdk.java.net/browse/JDK-8209735 Please find below a patch which attempts to clean up the XMM 
register handling by using register groups. http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ The patch provides a restricted set of registers to the match rules in the ad file based on the underlying architecture. The aim is to remove special handling/workaround from macro assembler and assembler. By removing the special handling, the patch reduces the overall code size by about 1800 lines of code. Your review and feedback is very welcome. Best Regards, Sandhya -------------- next part -------------- An HTML attachment was scrubbed... URL: From gromero at linux.vnet.ibm.com Wed Sep 5 23:53:07 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 5 Sep 2018 20:53:07 -0300 Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal In-Reply-To: <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com> References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com> <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com> <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com> <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com> <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com> Message-ID: On 09/05/2018 07:54 PM, Vladimir Kozlov wrote: > v3 looks good. Thanks a lot Vladimir. @Goetz, would you mind to review v3 please? It touches code meant for AIX but I don't expect any change in the end. http://cr.openjdk.java.net/~gromero/8209972/v3/ Thank you. Best regards, Gustavo > Thanks, > Vladimir > > On 9/5/18 3:18 PM, Gustavo Romero wrote: >> Hi Vladimir, >> >> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote: >>> Thank you Gustavo for detailed answer. >>> >>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now. >> >> Thanks for reviewing it! >> >> >>> About property check (flavor == "server" & !emulatedClient) in tests. If C2 (and Graal in a future when it supports RTM) is used, this check is always true because C2 (and Graal) is only available in Server VM and it is disabled in emulatedClient setting. 
This check was added before we have C2 usage check which can be done now with vm.rtm.compiler. >> >> Thanks, I was not aware of it. I've updated the webrev removing >> "flavor == "server" & !emulatedClient": >> >> http://cr.openjdk.java.net/~gromero/8209972/v3/ >> >> "hg diff --patience": >> >> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff >> >> Testing (on Linux): >> >> ** X86_64 w/ CPU+OS RTM support + Graal VM ** >> Test results: no tests selected (all RTM tests skipped) >> >> ** POWER8 w/ CPU+OS support ** >> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >> Passed: compiler/rtm/locking/TestRTMRetryCount.java >> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >> Passed: 
compiler/rtm/locking/TestRTMTotalCountIncrRate.java >> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >> Test results: passed: 30 >> >> ** X86_64 w/ CPU+OS support ** >> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >> Passed: 
compiler/rtm/locking/TestRTMTotalCountIncrRate.java >> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >> Passed: compiler/rtm/locking/TestRTMRetryCount.java >> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >> Test results: passed: 30 >> >> ** POWER7 wo/ CPU+OS RTM support ** >> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >> Test results: passed: 10 >> >> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support ** >> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >> Passed: 
compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >> Test results: passed: 10 >> >> >> Best regards, >> Gustavo >> >>> Thanks, >>> Vladimir >>> >>> On 9/3/18 3:15 PM, Gustavo Romero wrote: >>>> Hi Vladimir, >>>> >>>> Thanks a lot for reviewing it and for your comments. >>>> >>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote: >>>>> Hi Gustavo, >>>>> >>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with TieredStopAtLevel < 4 flag >>>> >>>> Yes, although currently afaics all tests will explicitly enabled C2 (for >>>> instance, passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2 >>>> through a warming up before testing, I agree that nothing forbids one to >>>> switch off C2 with TieredStopAtLevel = level < 4 for a test case. It also >>>> looks better to list explicitly which compilers do support RTM instead of >>>> the ones that don't support it. >>>> >>>> I've updated the webrev accordingly: >>>> >>>> http://cr.openjdk.java.net/~gromero/8209972/v2/ >>>> >>>> diff in there looks odd so I generated another one with --patience for a >>>> better (IMO) diff format: >>>> >>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff >>>> >>>> >>>>> Also can platforms check be replaced with one vmRTMCPU()? Is ppc64 return true from vmRTMCPU()? 
>>>> >>>> For example, on Linux the following cases are possible regarding CPU / OS >>>> RTM support: >>>> >>>> POWER7 : cpu = false, os = false => vm.rtm.cpu = false >>>> POWER8 : cpu = true, os = false | true => vm.rtm.cpu = false | true >>>> POWER9 VM: cpu = true, os = false | true => vm.rtm.cpu = false | true >>>> POWER9 NV: cpu = true, os = false => vm.rtm.cpu = false >>>> >>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support >>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it >>>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies >>>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise >>>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for >>>> Linux and for AIX. >>>> >>>> That said I don't think that the platforms check can be replaced with one >>>> vmRTMCPU(), because in some cases it's necessary to run a test for >>>> cpu = false and compiler = true, i.e. it's necessary to run a test on an >>>> unsupported CPU for a given platform _only if_ the compiler in use supports >>>> RTM (like C2). So if, for instance, we do: >>>> >>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires >>>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation >>>> returns 'false' for cpu = false and compiler = true, skipping the test >>>> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler' >>>> as 'true' and run the test in that case one could match for >>>> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will >>>> be evaluated as 'true' and the test will run even thought the Graal >>>> compiler is selected, which is wrong. >>>> >>>> Hence 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must >>>> contain its own list of supported compilers with RTM support for each >>>> platform IMO. 
Basically we can't ask the JVM about the compiler's support >>>> for RTM, since the JVM can only tell us about the CPU+OS support for RTM >>>> regarding the CPU and OS in which the JVM is running on. >>>> >>>> >>>>> And since you check cpu in here why not to replace all vm.rtm.* with one vm.rtm.supported? In such case you would need only one @requires checks in tests instead of: >>>>> >>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler >>>> >>>> I think it's not possible either. Currently there are 5 match cases in >>>> RTM tests: >>>> >>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u >>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os) >>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os >>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient) >>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient) >>>> >>>> which can be simplified 5 cases as: >>>> >>>> 1: !(flavor == "server" & !emulatedClient & cpu & os) >>>> 2: flavor == "server" & !emulatedClient & cpu & os >>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>> 5: no @requires >>>> >>>> I understand that case 1 and 2 (since CPU implies OS) can be simplified as: >>>> >>>> >>>> 1: !(flavor == "server" & !emulatedClient & cpu) >>>> 2: flavor == "server" & !emulatedClient & cpu >>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>> 5: no @requires >>>> >>>> and case 1 and 2 are mere opposites, so we have 4 cases: >>>> >>>> 1: !(flavor == "server" & !emulatedClient & cpu) >>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>> 5: no @requires >>>> >>>> We could simplify further making P = (flavor == "server" & !emulatedClient), 
>>>> and make:
>>>>
>>>> 1: !(P & cpu)
>>>> 3: (!cpu) & (P)
>>>> 4: cpu & !(P)
>>>> 5: no @requires
>>>>
>>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them, in
>>>> order to run the tests only if the selected compiler on a
>>>> given platform has RTM support (skipping Graal, for instance), we get:
>>>>
>>>> 1: !(P & cpu) & compiler
>>>> 3: (!cpu) & (P) & compiler
>>>> 4: cpu & !(P) & compiler
>>>> 5: no @requires & compiler
>>>>
>>>> So it looks like at minimum we would need 3 properties, but IMO it's
>>>> not worth adding another property P = (flavor == "server" & !emulatedClient)
>>>> just to simplify the @requires line further.
>>>>
>>>> Summing up, I think it's only possible to replace 'cpu & os' by 'cpu',
>>>> so I updated the webrev removing the vm.rtm.os property and keeping only
>>>> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks).
>>>>
>>>> I've tested the following scenarios and observed no regression [1]:
>>>>
>>>> 1. X86_64 w/ RTM
>>>> 2. X86_64 w/ RTM + Graal enabled
>>>> 3. POWER7: no CPU+OS support for RTM
>>>> 4. POWER8: CPU+OS support for RTM
>>>>
>>>> But I think we need a confirmation from SAP about AIX.
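The simplified @requires cases above can be checked mechanically. The following is only a minimal sketch in plain Java, not the jtreg property code: the class name RtmRequiresSketch and the helpers propP and propCompiler are hypothetical names introduced for illustration, and the boolean inputs stand in for whatever the VM would actually report. It prints which of cases 1, 3 and 4 would select a test for each combination of P, cpu and compiler.

```java
public class RtmRequiresSketch {
    // P = (flavor == "server" & !emulatedClient), as abbreviated in the mail.
    // Hypothetical helper; not part of any test library.
    static boolean propP(String flavor, boolean emulatedClient) {
        return "server".equals(flavor) && !emulatedClient;
    }

    // compiler = C2 && (x64 | PPC); parameter names are illustrative only.
    static boolean propCompiler(boolean c2, boolean x64, boolean ppc) {
        return c2 && (x64 || ppc);
    }

    // Case 1: !(P & cpu) & compiler
    static boolean case1(boolean p, boolean cpu, boolean compiler) {
        return !(p && cpu) && compiler;
    }

    // Case 3: (!cpu) & (P) & compiler
    static boolean case3(boolean p, boolean cpu, boolean compiler) {
        return !cpu && p && compiler;
    }

    // Case 4: cpu & !(P) & compiler
    static boolean case4(boolean p, boolean cpu, boolean compiler) {
        return cpu && !p && compiler;
    }

    public static void main(String[] args) {
        boolean[] values = { false, true };
        System.out.println("P     cpu   compiler | case1 case3 case4");
        // Enumerate every combination of the three properties.
        for (boolean p : values) {
            for (boolean cpu : values) {
                for (boolean compiler : values) {
                    System.out.printf("%-5b %-5b %-8b | %-5b %-5b %-5b%n",
                            p, cpu, compiler,
                            case1(p, cpu, compiler),
                            case3(p, cpu, compiler),
                            case4(p, cpu, compiler));
                }
            }
        }
    }
}
```

Whenever compiler is false (for instance with Graal selected), none of the three cases selects a test, which matches the "all RTM tests skipped" result reported for the Graal scenario.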
>>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> [1] >>>> >>>> ** X86_64 w/ RTM ** >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>> 
Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>> Test results: passed: 30 >>>> >>>> >>>> ** X86_64 w/ RTM + Graal enabled ** >>>> Test results: no tests selected (all RTM tests skipped) >>>> >>>> >>>> ** POWER7: no CPU+OS support for RTM ** >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Test results: passed: 10 >>>> >>>> >>>> ** POWER8: CPU+OS support for RTM ** >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java 
>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>> Test results: passed: 30 >>>> >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote: >>>>>> Hi, >>>>>> >>>>>> Could the following small change be reviewed please? >>>>>> >>>>>> Bug : https://bugs.openjdk.java.net/browse/JDK-8209972 >>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/ >>>>>> >>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal) >>>>>> is selected on platforms that can have CPU/OS with RTM support. >>>>>> >>>>>> It also disables all RTM tests for any other platform that has not a single >>>>>> compiler supporting RTM. 
>>>>>> >>>>>> The RTM support was first added to C2 compiler and once checkers for RTM >>>>>> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM they >>>>>> assume that a compiler supporting RTM is available for sure ("rtm" is >>>>>> advertised only if RTM is supported by both CPU and OS). Later the JVM >>>>>> began to allow the selection of a compiler different from C2, like Graal, >>>>>> and it became possible to select a compiler without RTM support despite the >>>>>> fact that both the CPU and the OS support RTM. Thus for platforms >>>>>> supporting Graal or any other specific compiler the compiler availability for >>>>>> the RTM tests must be adjusted and if the selected compiler does not >>>>>> support RTM then all RTM tests must be skipped, including the ones meant >>>>>> for platforms without CPU or OS RTM support (e.g. *Unsupported*.java) >>>>>> because in some cases, like in TestUseRTMLockingOptionOnUnsupportedCPU.java, >>>>>> the test expects JVM initialization errors that will never occur because the >>>>>> problem is not that the RTM support for CPU or OS is missing, but rather >>>>>> because the selected compiler does not support RTM. >>>>>> >>>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to >>>>>> filter out compilers without RTM support for specific platforms and adapts >>>>>> the current RTM tests to use that new property. >>>>>> >>>>>> Nothing changes regarding the number of passing/selected tests for the >>>>>> various cpu/os/compiler combinations on platforms that currently might >>>>>> support RTM [1], except when Graal is in use. >>>>>> >>>>>> Thank you. 
>>>>>> >>>>>> Best regards, >>>>>> Gustavo >>>>>> >>>>>> >>>>>> [1] >>>>>> >>>>>> ** X64 w/ CPU and OS supporting RTM ** >>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>> Passed: 
compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>> Test results: passed: 30 >>>>>> >>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support ** >>>>>> Test results: no tests selected (all RTM tests skipped) >>>>>> >>>>>> ** POWER8 w/ CPU and OS supporting RTM ** >>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java 
>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>> Test results: passed: 30 >>>>>> >>>>>> ** POWER7 wo/ CPU and OS supporting RTM ** >>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>> Test results: passed: 10 >>>>>> >>>>> >>>> >>> >> > From HORIE at jp.ibm.com Thu Sep 6 03:27:34 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Thu, 6 Sep 2018 12:27:34 +0900 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <57ebd30a66504577a6b2ec267aee4b69@sap.com> <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com> Message-ID: Hi Martin, Gustavo, Thank you 
for the detailed discussion and for narrowing down the current issue on ppc64.

> We haven't seen any issues with the current code, but I think this affects jdk11, too. (We could also switch off SuperwordUseVSX for jdk11u.) Do you agree?

Yes, I agree. The following is the latest webrev, which switches off SuperwordUseVSX by default.
http://cr.openjdk.java.net/~mhorie/8208171/webrev.04/

Best regards,
--
Michihiro,
IBM Research - Tokyo

From: Gustavo Romero
To: Michihiro Horie/Japan/IBM at IBMJP
Cc: "Lindenmaier, Goetz", hotspot compiler, "Doerr, Martin"
Date: 2018/09/06 03:34
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support

Hi Michi,

On 09/05/2018 07:22 AM, Michihiro Horie wrote:
> Hi Martin, Gustavo,
>
> I still cannot reproduce the problem. I noticed the machine I have is not SUSE but OpenSUSE with 4.1.21-14-default. I've also tried kernel 4.4.0-31-generic but it's Ubuntu.
>
> Gustavo, is there any suspicious change before/after v4.4, with which Martin got the crash?

Nope, nothing I'm aware of... However, it looks like Martin found no issues with
your last revision. Anyway, if you need a machine with SLES 12 SP3 installed I
have one that I can share. Drop me a Slack message if you need it.

Regards,
Gustavo

>
> Apart from the problem, I uploaded the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> From: Gustavo Romero
> To: "Doerr, Martin", Michihiro Horie/Japan/IBM at IBMJP
> Cc: "Lindenmaier, Goetz", hotspot compiler
> Date: 2018/09/05 07:03
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
> Hi Martin and Michi,
>
> On 09/04/2018 01:20 PM, Doerr, Martin wrote:
> > Can you reproduce the test failures?
> >
> > The very same VM works fine on a different Power8 machine which uses the same instructions by C2.
> >
> > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
> >
> > I have seen several linux kernel changes regarding saving and restoring the VSX registers.
> >
> > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC".
> >
> > Maybe something is missing which tells the kernel that we're using it. But that's just a guess.
>
> Facilities like FP (fp registers), VEC (vector registers - aka VMX/Altivec), and
> VSX (vector-scalar registers) are usually disabled on a newly created process. Once
> any instruction associated to these facilities is used in the process it causes
> an exception that is treated by the kernel [1, 2, 3]: the kernel enables the
> facility that caused the exception (see load_up_fp & friends) and re-executes the
> instruction when the kernel returns control back to the process in userspace.
>
> Starting from kernel v4.6 [4] there is a simple heuristic that employs an 8-bit
> counter to help track if a process, after using these facilities for the first
> time, continues to use the facilities. The counters (load_fp and load_vec) are
> incremented on each context switch and if the process stops using the FP or VEC
> facilities then they are disabled again with FP/VEC/VSX save/restore on context
> switches being disabled as well in order to improve the performance on context
> switches by avoiding the FP/VEC/VSX register save/restore.
>
> Either way (before or after the change introduced in v4.6) *that mechanism is
> opaque to userspace*, particularly to the process using these facilities. If a
> given facility is not enabled by the kernel (in case the CPU does not support
> it), the kernel sends a SIGILL to the process.
It's possible to inspect the thread
> member dynamics/state from userspace using tools like 'systemtap' (for
> example, this simple script can be used to inspect the VRSAVE register on a given
> thread that is running a program called 'vrsave_' [5]) or using the 'perf' tool.
>
> "tsk->thread.used_vsr" [6] is actually associated to the VSX facility whilst
> MSR_VEC is associated to the VEC/VMX/Altivec facility [7], so
> "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a new
> process or if the load_fp and load_vec counters overflowed and became zero
> disabling VSX or if only FP or only VEC - not both - were used in the process).
> In that case the kernel will also enable the VSX by MSR |= MSR_VSX. A similar
> mechanism drives the FP (MSR_FP) and the VEC (MSR_VEC) facilities.
>
> If both FP and VEC facilities are used the VSX facility is enabled automatically
> since FP+VEC regsets == VSX regset [8].
>
> Thus as this mechanism is entirely opaque to userspace I understand that if a
> program has to tell the kernel it wants to use any of these facilities
> (FP/VEC/VSX) before using it there is something wrong going on in kernelspace.
>
> Martin and Michi, if you want any help on drilling it further at kernel side
> please let me know, maybe I can help.
>
> I didn't have the chance to reproduce the crash yet, so if I find anything
> meaningful about it tomorrow I'll keep you posted.
>
>
> Kind regards,
> Gustavo
>
> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869
(FP)
> [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec)
> [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX)
> [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239
> [5] http://cr.openjdk.java.net/~gromero/script.d
> [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310
> [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250
> [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437
>
> > Best regards,
> >
> > Martin
> >
> > *From:* Michihiro Horie
> > *Sent:* Tuesday, 4 September 2018 07:32
> > *To:* Doerr, Martin
> > *Cc:* Lindenmaier, Goetz; Gustavo Romero; hotspot compiler; hotspot-dev at openjdk.java.net
> > *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support
> >
> > Hi Goetz, Martin, and Gustavo,
> >
> >
> >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly
> >>a compiler change.
> >>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for
> >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component"
> >>while hotspot-compiler-dev is for
> >>"Technical discussion about the development of the HotSpot bytecode compilers"
> > I understand and will use hotspot-compiler-dev for future RFRs, thanks.
> >
> >
> >> Why do you rename vnoreg to vnoregi?
> > I followed the coding style of vsnoreg and vsnoregi, but the renaming does not look necessary. I will revert this part. Should I also rename vsnoregi to vsnoreg?
> > > > > >>we noticed jtreg test failures when using this change: > >>compiler/runtime/safepoints/TestRegisterRestoring.java > >>compiler/runtime/Test7196199.java > >> > >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. > >> > >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine. > > > > > >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Thank you for pointing out another issue. Currently I do not hit this problem, but preventing to match the vector nodes makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them similarly like the Replicate* rules, in any case. > > > > > > Gustavo, thanks for the wrap-up! 
> >
> >
> > Best regards,
> > --
> > Michihiro,
> > IBM Research - Tokyo
> >
> > From: "Doerr, Martin"
> > To: Gustavo Romero, "Lindenmaier, Goetz", Michihiro Horie
> > Cc: hotspot compiler, "hotspot-dev at openjdk.java.net"
> > Date: 2018/09/04 02:18
> > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
> >
> > Hi Gustavo and Michihiro,
> >
> > we noticed jtreg test failures when using this change:
> > compiler/runtime/safepoints/TestRegisterRestoring.java
> > compiler/runtime/Test7196199.java
> >
> > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
> >
> > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
> > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
> >
> > That's what I found out so far. Maybe you have an idea?
> >
> > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
> >
> > Best regards,
> > Martin
> >
> >
> > -----Original Message-----
> > From: hotspot-dev On Behalf Of Gustavo Romero
> > Sent: Monday, 3 September 2018 14:57
> > To: Lindenmaier, Goetz; Michihiro Horie
> > Cc: hotspot compiler; hotspot-dev at openjdk.java.net
> > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >
> > Hi Goetz,
> >
> > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
> >> Also, I can not find all of the mail traffic in
> >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
> >> Is this a problem of the pipermail server?
> >>
> >> For some reason this webrev lacks the links to browse the diffs.
> >> Do you need to use a more recent webrev? You can obtain it with
> >> hg clone http://hg.openjdk.java.net/code-tools/webrev/.
> >
> > Yes, probably it was a problem of the pipermail or in some relay.
> > I noted the same thing, i.e. at least one Michi reply arrived
> > to me but missed a ML.
> >
> > The initial discussion is here:
> > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
> >
> > I understand Martin reviewed the last webrev in that thread, which is
> > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from
> > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html )
> >
> > Martin's review of webrev.01:
> > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
> >
> > and Michi's reply to Martin's review of webrev.01:
> > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
> > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html ).
> >
> > and your last review:
> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
> >
> >
> > HTH.
> >
> > Best regards,
> > Gustavo
> >
> >> Why do you rename vnoreg to vnoregi?
> >>
> >> Besides that the change is fine, thanks for implementing this!
> >>
> >> Best regards,
> >>   Goetz.
> >>
> >>
> >>> -----Original Message-----
> >>> From: Doerr, Martin
> >>> Sent: Tuesday, 28 August 2018 19:35
> >>> To: Gustavo Romero; Michihiro Horie
> >>> Cc: Lindenmaier, Goetz; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
> >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
> >>>
> >>> Hi Michihiro,
> >>>
> >>> thank you for implementing it. I have just taken a first look at your
> >>> webrev.01.
> >>>
> >>> It looks basically good. Only the Power version check seems to be incorrect.
> >>> VM_Version::has_popcntb() checks for Power5.
> >>> I believe most instructions are available with Power7.
> >>> Some (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
> >>> Power8?
> >>> We should check this carefully.
> >>>
> >>> Also, indentation in register_ppc.hpp could be improved.
> >>>
> >>> Thanks and best regards,
> >>> Martin
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Gustavo Romero
> >>> Sent: Thursday, 26 July 2018 16:02
> >>> To: Michihiro Horie
> >>> Cc: Lindenmaier, Goetz; hotspot-dev at openjdk.java.net; Doerr, Martin; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
> >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >>>
> >>> Hi Michi,
> >>>
> >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
> >>>> I updated webrev:
> >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
> >>>
> >>> Thanks for providing an updated webrev and for fixing indentation and function
> >>> order in assembler_ppc.inline.hpp as well. I have no further comments :)
> >>>
> >>>
> >>> Best Regards,
> >>> Gustavo
> >>>
> >>>>
> >>>> Best regards,
> >>>> --
> >>>> Michihiro,
> >>>> IBM Research - Tokyo
> >>>>
> >>>> From: Gustavo Romero
> >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
> >>>> Date: 2018/07/25 23:05
> >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> >>>>
> >>>>
> >>>> Hi Michi,
> >>>>
> >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
> >>>> > Dear all,
> >>>> >
> >>>> > Would you review the following change?
> >>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
> >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
> >>>> >
> >>>> > This change adds support for vectorized arithmetic calculation with SLP.
> >>>> >
> >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, which are exactly overlapped with VRs. Instruction APIs receiving VRs use the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the matching with SqrtVF. I think the change in sqrtF_reg would be fine due to the ConvD2FNode::Value in convertnode.cpp.
> >>>>
> >>>> Looks good. Just a few comments:
> >>>>
> >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in
> >>>>   order to avoid the splat?
> >>>> > >>>> - Although all instructions added by your change were introduced in ISA > >>> 2.06, > >>>> so POWER7 and above are OK, as I see probes for > >>> PowerArchitecturePPC64=6|5 in > >>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point > >>> to > >>>> guarantee that these instructions won't be emitted on a CPU that does > >>> not > >>>> support them. > >>>> > >>>> - I think that in general strings in format %{} are in upper case. For instance, > >>>> this is the current output on optoassembly for vmul4F: > >>>> > >>>> 2941835 5b4 ADDI R24, R24, #64 > >>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F > >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > >>>> > >>>> I think it would be better to be in upper case instead. I also think that if > >>>> the node match emits more than one instruction all instructions must be > >>> listed > >>>> in format %{}, since it's meant for detailed debugging. Finally I think it > >>>> would be better to replace \t! by \t// in that string (unless I'm missing any > >>>> special meaning for that char). So for vmul4F it would be something like: > >>>> > >>>> 2941835 5b4 ADDI R24, R24, #64 > >>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 > >>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F > >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > >>>> > >>>> > >>>> But feel free to change anything just after you get additional reviews :) > >>>> > >>>> > >>>> > I confirmed this change with JTREG. In addition, I used attached micro > >>> benchmarks. > >>>> > /(See attached file: slp_microbench.zip)/ > >>>> > >>>> Thanks for sharing it.
> >>>> Btw, another option to host it would be in the CR > >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 > >>>> > >>>> > >>>> Best regards, > >>>> Gustavo > >>>> > >>>> > > >>>> > Best regards, > >>>> > -- > >>>> > Michihiro, > >>>> > IBM Research - Tokyo > >>>> > > >>>> > >>>> > >>>> > >> > > > > > > > > > > >
From rwestrel at redhat.com Thu Sep 6 07:16:57 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 06 Sep 2018 09:16:57 +0200 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com> Message-ID: Hi Dmitry, > -prof perfnorm shows 7-14% more branch misses. My patch doesn't make any change to the stubs. It only tweaks c2 compiled code. Do you see any difference in the code generated for com.sun.crypto.provider.CipherCore::doFinal? Roland.
From rwestrel at redhat.com Thu Sep 6 07:17:13 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 06 Sep 2018 09:17:13 +0200 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: <3c8ae9e3-e3f2-df0e-0add-4d1c54589198@oracle.com> References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <3c8ae9e3-e3f2-df0e-0add-4d1c54589198@oracle.com> Message-ID: Thanks for the review, Vladimir. Roland.
From dmitry.chuyko at bell-sw.com Thu Sep 6 10:59:28 2018 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Thu, 6 Sep 2018 13:59:28 +0300 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com> Message-ID: <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com> On 09/06/2018 10:16 AM, Roland Westrelin wrote: > Hi Dmitry, > >> -prof perfnorm shows 7-14% more branch misses. > My patch doesn't make any change to the stubs. It only tweaks c2 > compiled code. One guess could be that other code influenced branches prediction in the stub. > Do you see any difference in the code generated for > com.sun.crypto.provider.CipherCore::doFinal? Yes. Here is how it looks like: Current 0x0000fffca85ffd68: add w16, w4, w10 ;*invokestatic addExact {reexecute=0 rethrow=0 return_oop=0} ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 17 (line 328) ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) 0.02% 0x0000fffca85ffd6c: cmp w14, #0x1 0x0000fffca85ffd70: b.cc 0x0000fffca85fff20 // b.lo, b.ul, b.last ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0} ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 25 (line 329) ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) 0x0000fffca85ffd74: lsl x11, x11, #3 ;*getfield padding {reexecute=0 rethrow=0 return_oop=0} ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 92 (line 344) ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) Patched/-XX:-UseSwitchProfiling 0x0000fffcd86000e4: add w16, w4, w15 ;*invokestatic addExact {reexecute=0 rethrow=0 return_oop=0} ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 17 (line 328) ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) 0.01% 0x0000fffcd86000e8: cmp w14, #0x7 
0x0000fffcd86000ec: b.eq 0x0000fffcd8600234 // b.none ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0} ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 25 (line 329) ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) 0x0000fffcd86000f0: lsl x11, x12, #3 ;*getfield padding {reexecute=0 rethrow=0 return_oop=0} ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 92 (line 344) ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) There are also several instructions/blocks rearrangements. -Dmitry > > Roland. From rwestrel at redhat.com Thu Sep 6 13:10:43 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 06 Sep 2018 15:10:43 +0200 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com> References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com> <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com> Message-ID: > Yes. 
Here is how it looks like: > > Current > > 0x0000fffca85ffd68: add w16, w4, w10 ;*invokestatic addExact {reexecute=0 rethrow=0 return_oop=0} > ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 17 (line 328) > ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) > 0.02% 0x0000fffca85ffd6c: cmp w14, #0x1 > 0x0000fffca85ffd70: b.cc 0x0000fffca85fff20 // b.lo, b.ul, b.last > ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0} > ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 25 (line 329) > ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) > 0x0000fffca85ffd74: lsl x11, x11, #3 ;*getfield padding {reexecute=0 rethrow=0 return_oop=0} > ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 92 (line 344) > ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) > > Patched/-XX:-UseSwitchProfiling > > 0x0000fffcd86000e4: add w16, w4, w15 ;*invokestatic addExact {reexecute=0 rethrow=0 return_oop=0} > ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 17 (line 328) > ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) > 0.01% 0x0000fffcd86000e8: cmp w14, #0x7 > 0x0000fffcd86000ec: b.eq 0x0000fffcd8600234 // b.none ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0} > ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 25 (line 329) > ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) > 0x0000fffcd86000f0: lsl x11, x12, #3 ;*getfield padding {reexecute=0 rethrow=0 return_oop=0} > ; - com.sun.crypto.provider.CipherCore::getOutputSizeByOperation at 92 (line 344) > ; - com.sun.crypto.provider.CipherCore::doFinal at 20 (line 917) > > There are also several instructions/blocks rearrangements. That does seem like a pretty minimal difference and not a reason not to push that change. What do you think? Roland. 
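For anyone who wants to look at this kind of difference outside of CipherCore, a minimal sketch follows. It is not the CipherCore source: the class SwitchSketch, the constants MODE_A and MODE_B and the method outputSize are hypothetical stand-ins. The case labels are sparse on purpose so that javac is likely to emit a lookupswitch, and the loop is biased so that the switch profile is lopsided. The assumption is that running it once as-is and once with -XX:-UseSwitchProfiling (which may need -XX:+UnlockDiagnosticVMOptions, and -XX:+PrintAssembly needs the hsdis library) shows a comparable change in which case C2 tests first.

```java
public class SwitchSketch {
    // Hypothetical mode constants, sparse so a lookupswitch is the likely bytecode.
    static final int MODE_A = 0;
    static final int MODE_B = 7;

    // Hypothetical stand-in for a method that switches on a mode value.
    static int outputSize(int mode, int inputLen) {
        switch (mode) {
            case MODE_B:
                return inputLen + 16; // rarely taken in the loop below
            default:
                return inputLen;      // taken almost always
        }
    }

    public static void main(String[] args) {
        long sum = 0;
        // Run long enough for C2 to compile the switch with a biased profile.
        for (int i = 0; i < 1_000_000; i++) {
            int mode = (i % 1000 == 0) ? MODE_B : MODE_A;
            sum += outputSize(mode, i & 0xff);
        }
        // Print the sum so the loop cannot be optimized away.
        System.out.println(sum);
    }
}
```

The sketch only reproduces the shape of the code (one rare case, one hot default); whether the generated branches match the listing above depends on the JDK build and platform.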
From dmitry.chuyko at bell-sw.com Thu Sep 6 13:20:43 2018 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Thu, 6 Sep 2018 16:20:43 +0300 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com> <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com> Message-ID: <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com> On 09/06/2018 04:10 PM, Roland Westrelin wrote: >> Yes. Here is how it looks like: >> ................................... > That does seem like a pretty minimal difference and not a reason not to > push that change. What do you think? I agree, it looks like something we should investigate in aarch64 port. -Dmitry > > Roland. From adinn at redhat.com Fri Sep 7 12:58:59 2018 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 7 Sep 2018 13:58:59 +0100 Subject: RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> Message-ID: Hi Dmitrij On 22/08/18 11:04, Andrew Dinn wrote: > Thank you for the revised webrev and new test results. I am now working > through them. I will post comments as soon as I have given the new code > a full read and assessed the new results. I am afraid that may take a > day or two, for which delay advance apologies. This review has taken a great deal longer than expected. I am sorry but that is because the documentation for the code you have submitted is still seriously inadequate and I have had to put a lot of work into revising it before I can fully review the code. I am still finishing off that last task but I wanted to start providing you with some feedback and also to enlist your help in checking that my revisions are correct. 
I plan to provide feedback in 3 stages to match the 3 steps in the review that I am doing as follows: 1) Correct the original 'algorithm' you started from 2) Correct the 'modified algorithm' that is meant to describe the behaviour of your code 3) Propose any necessary corrections/improvements to the generated code So, let's start with step 1. The 'original' algorithm located in file macroAssembler_aarch64_pow.cpp is really just a fragment of C code with a few missing elements (e.g. the origin of values P1, P2, ... is not explained, hugeX, tiny are not defined). Although this code has the virtue that it is known to be correct (or at least has been verified by long use and the eyes of experts in numerical computation) it still fails to provide important information about what the 'algorithm' is supposed to do. That information is critical for anyone coming to it fresh to be able to understand what is happening. The first omission is several pieces of background mathematics that are neither explained in the code nor referenced. The mathematics includes the formulae on which the algorithm is based and the numerical approximation to these formulae that is employed to define the algorithm. This is needed to explain /how/ and /why/ a) the two different computations of log2(x) and b) the computation of exp(x) are performed as they are and to justify that the results are valid. The second omission is detailed descriptions of what most of the more complex individual steps in the algorithm do. Many of the logic, floating point and branching operations which compute intermediate results are extremely opaque. This is particularly so for the steps which manipulate bit patterns in the long representation of the fp values being used. However, some of the straight fp arithmetic is also highly problematic. The other thing I think needs to be made clearer is the relationship between the various special case return points in the code and the special case rules they relate to.
This is not so critical for the original algorithm because the C code at least has a regular and standardised control flow. However, labelling the exit paths is still useful here and will be much more useful if used both here and in the modified algorithm (and we'll come to that later). I have rewritten the algorithm to achieve what I think is needed to patch these omissions. The redraft of this part of the code is available here: http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt I assume you are familiar with the relevant mathematics and how it has been used to derive the algorithm. If so then I would like you to review this rewrite and ensure that there are no mathematical errors in it. I would also like you to check that the explanatory comments for the individual steps in the algorithm do not contain any errors. If you are not familiar with the mathematics then please let me know. I need to know whether this has been reviewed by someone competent to do so. n.b. one little detail you might easily miss. I removed lg2, lg2_h and lg2_l from the first table of constants as neither log(x) algorithm needs them (it relies on ivln2). I renamed the entries in the second table from LG2, etc to /ln2/, etc and changed the name accordingly at point of use. The computation of exp(x) actually does need ln2. One of the code changes is to remove these redundant entries from your table pow_coeff1. Ok, as for the next 2 steps, I will post a follow-up to deal with them once I have completed my review. That will include a heavily revised version of your 'modified algorithm' (which is still in progress) plus suggestions for changes to the code that I have found along the way. Just as a preliminary I'll summarize what is wrong below. Note that I have not yet found any errors in how the generated code implements the mathematics but I am still not happy with it because it is extremely unclear.
Correcting the 'modified algorithm' is a necessary, critical step to improving the clarity of the code. So, in overview, what is wrong with your 'modified algorithm'? Well, the thing that is immediately obvious is that it is /not/ actually the algorithm you have employed. It is simply a mangled version of the C code you started from that bears only a tenuous relation to the code structure it is supposed to summarize. Now, I'm happy for you to use C to model the generated code if possible and, in fact, am in the process of writing a proper algorithm that looks as much like C code as possible /but/ also actually describes what your generated code does. The problem is that what you have written is not only /not/ C it is also i) incoherent, ii) retains elements from the original code that don't exist at all in the generated code and iii) omits important elements of the generated code. So, firstly, let's deal with the problem as it relates to control flow. Your 'modified algorithm' includes various tags mentioning the word 'label' suggesting that some transfer of control is to be effected. However, these are tacked onto statement blocks connected via 'if (cond)' tests or 'else' keywords that are meant to imply some alternative control flow. Essentially, your generated code relies on gotos which do not fit a standard if/else flow model and you have tried to bodge some sort of goto model on top of the original valid, gotoless C control flow with no clear definition of how that is meant to work. Honestly, if your generated code uses a goto control flow then your C algorithm is going to have to do the same in order to clearly summarize what the code actually does. The second major problem is one I pointed out in my earlier note, i.e. that the data values described in the 'modified algorithm' do not correctly match the ones operated on in your generated code. Your algorithm lists many redundant values used in the original algorithm (e.g.
ix, iy, ax, yisint) even though your code doesn't ever explicitly construct most of those values (n.b. this is not just limited to the 32 bit half-word quantities). Instead the code frequently pulls the relevant value, as needed, out of other data that it does construct and holds in registers -- sometimes across control branches. At other times it performs an equivalent operation on a different, related data value. Your response to my request was to add comments which labelled some of these on-the-fly created values or alternative values with the original names but that ignores the fact that the names and the values referenced in the comments do not actually match. Contrariwise, a lot of the values the code does actually operate on are not mentioned in the algorithm. Indeed, it is worse than that because they are not coherently identified even in the generated code. Data items stored in registers are referred to using the utterly redundant symbolic aliases tmp0, tmp2, etc for registers r0, r1 etc. What is worse, the same meaningless symbolic names get reused for completely different data items. For example, at one point tmp2 identifies the exponent of y stored in r2 and later it identifies the absolute value of y also stored in r2, overwriting the exponent. Your algorithm really ought to mention values like exp_y or ay (or even |y|) for these cases and the code should correspondingly define exp_y and ay as aliases for register r2. These meaningful names should then be used when loading the constructed value into a register and at every subsequent point of use where that constructed value is valid. This is not all that is wrong with the 'modified algorithm' but it is enough to make it not just useless but worse than useless. What you have written provides a hand-wave towards what the code does that fails to summarize it with any accuracy or clarity and equally fails to clarify the difference between it and the C code you started from.
That only makes the whole picture less clear not more so. As I said, I will provide a better version of the 'modified algorithm' in a follow-up and then discuss possible code changes. Please review the linked file above while I prepare that. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
From martin.doerr at sap.com Fri Sep 7 13:39:25 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 7 Sep 2018 13:39:25 +0000 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <57ebd30a66504577a6b2ec267aee4b69@sap.com> <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com> Message-ID: Hi Michihiro, I've created a new bug for the vector register save issue: https://bugs.openjdk.java.net/browse/JDK-8210497 I'd like to fix that one first. I can push your webrev.03 afterwards when tests are passing and review is completed. Best regards, Martin From: Michihiro Horie Sent: Thursday, 6 September 2018 05:28 To: Gustavo Romero Cc: Lindenmaier, Goetz ; hotspot compiler ; Doerr, Martin Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Martin, Gustavo, Thank you for giving the detailed discussions and narrowing down the current issue on ppc64. > We haven't seen any issues with the current code, but I think this affects jdk11, too. (We could also switch off SuperwordUseVSX for jdk11u.) Do you agree? Yes, I agree. Following is the latest webrev switching off SuperwordUseVSX by default.
http://cr.openjdk.java.net/~mhorie/8208171/webrev.04/ Best regards, -- Michihiro, IBM Research - Tokyo From: Gustavo Romero To: Michihiro Horie/Japan/IBM at IBMJP Cc: "Lindenmaier, Goetz", hotspot compiler, "Doerr, Martin" Date: 2018/09/06 03:34 Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Michi, On 09/05/2018 07:22 AM, Michihiro Horie wrote: > Hi Martin, Gustavo, > > I still cannot reproduce the problem. I noticed the machine I have is not SUSE but OpenSUSE with 4.1.21-14-default. I've also tried kernel 4.4.0-31-generic but it's Ubuntu. > > Gustavo, is there any suspicious change before/after v4.4, with which Martin got the crash? Nope, nothing I'm aware of... However, it looks like Martin found no issues with your last revision. Anyway, if you need a machine with SLES 12 SP3 installed I have one that I can share. Drop me a Slack message if you need it.
Regards, Gustavo > > Apart from the problem, I uploaded the latest webrev: > http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/ > > > Best regards, > -- > Michihiro, > IBM Research - Tokyo > > From: Gustavo Romero > > To: "Doerr, Martin" >, Michihiro Horie/Japan/IBM at IBMJP > Cc: "Lindenmaier, Goetz" >, hotspot compiler > > Date: 2018/09/05 07:03 > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > ---------- > > > > Hi Martin and Michi, > > On 09/04/2018 01:20 PM, Doerr, Martin wrote: > > Can you reproduce the test failures? > > > > The very same VM works fine on a different Power8 machine which uses the same instructions by C2. > > > > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1").
> > > > I have seen several linux kernel changes regarding saving and restoring the VSX registers. > > > > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC". > > > > Maybe something is missing which tells the kernel that we're using it. But that's just a guess. > > Facilities like FP (fp registers), VEC (vector registers - aka VMX/Altivec), and > VSX (vector-scalar registers) are usually disabled on a new born process. Once > any instruction associated to these facilities is used in the process it causes > an exception that is treated by the kernel [1, 2, 3]: kernel enables the > facility that caused the exception (see load_up_fp & friends) and re-executes the > instruction when kernel returns the control back to the process in userspace. > > Starting from kernel v4.6 [4] there is a simple heuristic that employs an 8-bit > counter to help track if a process, after using these facilities for the first > time, continues to use the facilities. The counters (load_fp and load_vec) are > incremented on each context switch and if the process stops using the FP or VEC > facilities then they are disabled again with FP/VEC/VSX save/restore on context > switches being disabled as well in order to improve the performance on context > switches by avoiding the FP/VEC/VSX register save/restore. > > Either way (before or after the change introduced in v4.6) *that mechanism is > opaque to userspace*, particularly to the process using these facilities. If a > given facility is not enabled by the kernel (in case the CPU does not support > it), the kernel sends a SIGILL to the process. It's possible to inspect the thread > member dynamics/state from userspace using tools like 'systemtap' (for > example, this simple script can be used to inspect a VRSAVE register on a given > thread that is running a program called 'vrsave_' [5]) or using the 'perf' tool.
> > "tsk->thread.used_vsr" [6] is actually associated to the VSX facility whilst > MSR_VEC is associated to the VEC/VMX/Altivec facility [7], so > "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a new > process or if the load_fp and load_vec counters overflowed and became zero > disabling VSX or if only FP or only VEC - not both - were used in the process). > In that case kernel will also enable the VSX by MSR |= MSR_VSX. A similar > mechanism drives the FP (MSR_FP) and the VEC (MSR_VEC) facilities. > > If both FP and VEC facilities are used the VSX facility is enabled automatically > since FP+VEC regsets == VSX regset [8]. > > Thus as this mechanism is entirely opaque to userspace I understand that if a > program has to tell the kernel it wants to use any of these facilities > (FP/VEC/VSX) before using it there is something wrong going on in kernelspace. > > Martin and Michi, if you want any help on drilling it further at kernel side > please let me know, maybe I can help. > > I didn't have the chance to reproduce the crash yet, so if I find anything > meaningful about it tomorrow I'll keep you posted.
> > > Kind regards, > Gustavo > > [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP) > [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec) > [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX) > [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239 > [5] http://cr.openjdk.java.net/~gromero/script.d > [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310 > [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250 > [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437 > > > Best regards, > > > > Martin > > > > *From:*Michihiro Horie > > > *Sent:* Dienstag, 4. September 2018 07:32 > > *To:* Doerr, Martin > > > *Cc:* Lindenmaier, Goetz >; Gustavo Romero >; hotspot compiler >; hotspot-dev at openjdk.java.net > > *Subject:* RE: RFR: 8208171: PPC64: Enrich SLP support > > > > Hi Goetz, Martin, and Gustavo, > > > > > >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly > >>a compiler change. _ > > _>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for > >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component" > >>while hotspot-compiler-dev is for > >>"Technical discussion about the development of the HotSpot bytecode compilers" > > I understood the instruction and would use hotspot-compiler-dev in future RFRs, thanks. > > > > > >> Why do you rename vnoreg to vnoregi? > > I followed the way of coding for vsnoreg and vsnoregi, but the renaming does not look necessary. I would get this part back. Should I also rename vsnoregi to vsnoreg? 
> > > > > >>we noticed jtreg test failures when using this change: > >>compiler/runtime/safepoints/TestRegisterRestoring.java > >>compiler/runtime/Test7196199.java > >> > >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. > >> > >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine. > > > > > >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Thank you for pointing out another issue. Currently I do not hit this problem, but preventing to match the vector nodes makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them similarly like the Replicate* rules, in any case. > > > > > > Gustavo, thanks for the wrap-up! 
> > > > > > Best regards, > > -- > > Michihiro, > > IBM Research - Tokyo > > > > Inactive hide details for "Doerr, Martin" ---2018/09/04 02:18:24---Hi Gustavo and Michihiro, we noticed jtreg test failures whe"Doerr, Martin" ---2018/09/04 02:18:24---Hi Gustavo and Michihiro, we noticed jtreg test failures when using this change: > > > > From: "Doerr, Martin" > > > To: Gustavo Romero >, "Lindenmaier, Goetz" >, Michihiro Horie > > > Cc: hotspot compiler >, "hotspot-dev at openjdk.java.net " > > > Date: 2018/09/04 02:18 > > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > > > > > > > > > Hi Gustavo and Michihiro, > > > > we noticed jtreg test failures when using this change: > > compiler/runtime/safepoints/TestRegisterRestoring.java > > compiler/runtime/Test7196199.java > > > > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. 
> > > > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > > > That's what I found out so far. Maybe you have an idea? > > > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > > > Best regards, > > Martin > > > > > > -----Original Message----- > > From: hotspot-dev > On Behalf Of Gustavo Romero > > Sent: Montag, 3. September 2018 14:57 > > To: Lindenmaier, Goetz >; Michihiro Horie > > > Cc: hotspot compiler >; hotspot-dev at openjdk.java.net > > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > > > Hi Goetz, > > > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: > >> Also, I can not find all of the mail traffic in > >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. > >> Is this a problem of the pipermail server? > >> > >> For some reason this webrev lacks the links to browse the diffs. > >> Do you need to use a more recent webrev? You can obtain it with > >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ . > > > > Yes, probably it was a problem of the pipermail or in some relay. > > I noted the same thing, i.e. at least one Michi reply arrived > > to me but missed a ML. 
> > > > The initial discussion is here: > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html > > > > I understand Martin reviewed the last webrev in that thread, which is > > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ (taken from > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html) > > > > Martin's review of webrev.01: > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html > > > > and Michi's reply to Martin's review of webrev.01: > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, > > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html). > > > > and your last review: > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html > > > > > > HTH. > > > > Best regards, > > Gustavo > > > >> Why do you rename vnoreg to vnoregi? > >> > >> Besides that the change is fine, thanks for implementing this! > >> > >> Best regards, > >> Goetz. > >> > >> > >>> -----Original Message----- > >>> From: Doerr, Martin > >>> Sent: Dienstag, 28. August 2018 19:35 > >>> To: Gustavo Romero >; Michihiro Horie > >>> > > >>> Cc: Lindenmaier, Goetz >; hotspot- > >>> dev at openjdk.java.net ; ppc-aix-port-dev at openjdk.java.net ; Simonis, Volker > >>> > > >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > >>> > >>> Hi Michihiro, > >>> > >>> thank you for implementing it. I have just taken a first look at your > >>> webrev.01. > >>> > >>> It looks basically good. Only the Power version check seems to be incorrect. > >>> VM_Version::has_popcntb() checks for Power5. > >>> I believe most instructions are available with Power7. > >>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with > >>> Power8? > >>> We should check this carefully. > >>> > >>> Also, indentation in register_ppc.hpp could get improved. 
> >>> > >>> Thanks and best regards, > >>> Martin > >>> > >>> > >>> -----Original Message----- > >>> From: Gustavo Romero > >>> Sent: Donnerstag, 26. Juli 2018 16:02 > >>> To: Michihiro Horie > >>> Cc: Lindenmaier, Goetz; hotspot-dev at openjdk.java.net; Doerr, Martin; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker > >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >>> > >>> Hi Michi, > >>> > >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: > >>>> I updated webrev: > >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ > >>> > >>> Thanks for providing an updated webrev and for fixing indentation and function > >>> order in assembler_ppc.inline.hpp as well. I have no further comments :) > >>> > >>> > >>> Best Regards, > >>> Gustavo > >>> > >>>> > >>>> Best regards, > >>>> -- > >>>> Michihiro, > >>>> IBM Research - Tokyo > >>>> > >>>> From: Gustavo Romero > >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net > >>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" > >>>> Date: 2018/07/25 23:05 > >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >>>>
> >>>> Hi Michi, > >>>> > >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: > >>>> > Dear all, > >>>> > > >>>> > Would you review the following change? > >>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 > >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 > >>>> > > >>>> > This change adds support for vectorized arithmetic calculation with SLP. > >>>> > > >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, which are exactly overlapped with VRs. Instruction APIs receiving VRs use the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the matching with SqrtVF. I think the change in sqrtF_reg would be fine due to the ConvD2FNode::Value in convertnode.cpp. > >>>> > >>>> Looks good. Just a few comments: > >>>> > >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in > >>>> order to avoid the splat? > >>>> > >>>> - Although all instructions added by your change were introduced in ISA 2.06, > >>>> so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in > >>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to > >>>> guarantee that these instructions won't be emitted on a CPU that does not > >>>> support them.
> >>>> > >>>> - I think that in general string in format %{} are in upper case. For instance, > >>>> this the current output on optoassembly for vmul4F: > >>>> > >>>> 2941835 5b4 ADDI R24, R24, #64 > >>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F > >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > >>>> > >>>> I think it would be better to be in upper case instead. I also think that if > >>>> the node match emits more than one instruction all instructions must be > >>> listed > >>>> in format %{}, since it's meant for detailed debugging. Finally I think it > >>>> would be better to replace \t! by \t// in that string (unless I'm missing any > >>>> special meaning for that char). So for vmul4F it would be something like: > >>>> > >>>> 2941835 5b4 ADDI R24, R24, #64 > >>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 > >>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F > >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > >>>> > >>>> > >>>> But feel free to change anything just after you get additional reviews :) > >>>> > >>>> > >>>> > I confirmed this change with JTREG. In addition, I used attached micro > >>> benchmarks. > >>>> > /(See attached file: slp_microbench.zip)/ > >>>> > >>>> Thanks for sharing it. > >>>> Btw, another option to host it would be in the CR > >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 > >>>> > >>>> > >>>> Best regards, > >>>> Gustavo > >>>> > >>>> > > >>>> > Best regards, > >>>> > -- > >>>> > Michihiro, > >>>> > IBM Research - Tokyo > >>>> > > >>>> > >>>> > >>>> > >> > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From dmitrij.pochepko at bell-sw.com Fri Sep 7 13:40:23 2018 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 7 Sep 2018 16:40:23 +0300 Subject: RFR(S): 8210461 - AArch64: Math.cos intrinsic gives incorrect results Message-ID: Hi, please review a small fix for 8210461 - AArch64: Math.cos intrinsic gives incorrect results The large argument reduction code has a bug in one of its code branches. C code of the affected place: iq[jz] = (int)(z-two24B*fw); With the bug it was calculated as iq[jz] = (int)(z+two24B*fw); // by fmadd instruction The fix is to change it into an fmsub instruction for correct calculation. I also re-parsed most of the code in search of the same errors. It seems no other issues were found. This bug wasn't caught by jtreg and jck tests, so I added a separate small test for this case. webrev: http://cr.openjdk.java.net/~dpochepk/8210461/webrev.01/ CR: https://bugs.openjdk.java.net/browse/JDK-8210461 Testing: I tested this patch via new and old tests. All passed. I also ran this new test on x86. This patch should be pushed into jdk12 and backported into jdk11. Thanks, Dmitrij From aph at redhat.com Fri Sep 7 13:52:06 2018 From: aph at redhat.com (Andrew Haley) Date: Fri, 7 Sep 2018 14:52:06 +0100 Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic gives incorrect results In-Reply-To: References: Message-ID: On 09/07/2018 02:40 PM, Dmitrij Pochepko wrote: > C-code: of affected place: > > iq[jz] = (int)(z-two24B*fw); > > with bug it was calculated as iq[jz] = (int)(z+two24B*fw); // by fmadd > instruction > > Fix is to change it into fmsub instruction for correct calculation. Am I right to think that this code branch has never been tested? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Fri Sep 7 14:03:18 2018 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 7 Sep 2018 17:03:18 +0300 Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic gives incorrect results In-Reply-To: References: Message-ID: <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com> I remember debugging this branch while running JCK tests. I haven't checked precisely, but fw was probably 0 in those cases, so z - two24B*fw and z + two24B*fw gave the same result. That would explain such behavior. On 07/09/18 16:52, Andrew Haley wrote: > On 09/07/2018 02:40 PM, Dmitrij Pochepko wrote: >> C-code: of affected place: >> >> iq[jz] = (int)(z-two24B*fw); >> >> with bug it was calculated as iq[jz] = (int)(z+two24B*fw); // by fmadd >> instruction >> >> Fix is to change it into fmsub instruction for correct calculation. > Am I right to think that this code branch has never been tested? > From HORIE at jp.ibm.com Fri Sep 7 14:55:43 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Fri, 7 Sep 2018 23:55:43 +0900 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: References: <346da54af45243c4bdaf475f118a450d@sap.com> <9553d65d98f74f37a35b49a1e39f015e@sap.com> <48ee8bca-fd73-7854-84fd-20b836c00651@linux.vnet.ibm.com> <57ebd30a66504577a6b2ec267aee4b69@sap.com> <8a093a6e-631e-4c20-07dc-96bc29f8c411@linux.vnet.ibm.com> Message-ID: Hi Martin, >I've created a new bug for the vector register save issue: >https://bugs.openjdk.java.net/browse/JDK-8210497 >I'd like to fix that one first. Thank you very much for handling the issue. I really appreciate it. >I can push your webrev.03 afterwards when tests are passing and review is completed. This would be great, thanks again.
Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie, Gustavo Romero Cc: "Lindenmaier, Goetz", hotspot compiler Date: 2018/09/07 22:39 Subject: RE: RFR: 8208171: PPC64: Enrich SLP support Hi Michihiro, I've created a new bug for the vector register save issue: https://bugs.openjdk.java.net/browse/JDK-8210497 I'd like to fix that one first. I can push your webrev.03 afterwards when tests are passing and review is completed. Best regards, Martin From: Michihiro Horie Sent: Donnerstag, 6. September 2018 05:28 To: Gustavo Romero Cc: Lindenmaier, Goetz; hotspot compiler; Doerr, Martin Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Martin, Gustavo, Thank you for the detailed discussion and for narrowing down the current issue on ppc64. > We haven't seen any issues with the current code, but I think this affects jdk11, too. (We could also switch off SuperwordUseVSX for jdk11u.) Do you agree? Yes, I agree. Following is the latest webrev switching off SuperwordUseVSX by default. http://cr.openjdk.java.net/~mhorie/8208171/webrev.04/ Best regards, -- Michihiro, IBM Research - Tokyo From: Gustavo Romero To: Michihiro Horie/Japan/IBM at IBMJP Cc: "Lindenmaier, Goetz", hotspot compiler <hotspot-compiler-dev at openjdk.java.net>, "Doerr, Martin" <martin.doerr at sap.com> Date: 2018/09/06 03:34 Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Michi, On 09/05/2018 07:22 AM, Michihiro Horie wrote: > Hi Martin, Gustavo, > > I still cannot reproduce the problem. I noticed the machine I have is not SUSE but OpenSUSE with 4.1.21-14-default. I've also tried kernel 4.4.0-31-generic but it's Ubuntu. > > Gustavo, is there any suspicious change before/after v4.4, with which Martin got the crash?
Nope, nothing I'm aware of... However looks like Martin found no issues with your last revision. Anyway, if you need a machine with SLES 12 SP3 installed I have one that I can share. Drop me a Slack message if you need it. Regards, Gustavo > > Apart from the problem, I uploaded the latest webrev: > http://cr.openjdk.java.net/~mhorie/8208171/webrev.03/ > > > Best regards, > -- > Michihiro, > IBM Research - Tokyo > > From: Gustavo Romero > To: "Doerr, Martin", Michihiro Horie/Japan/IBM at IBMJP > Cc: "Lindenmaier, Goetz", hotspot compiler <hotspot-compiler-dev at openjdk.java.net> > Date: 2018/09/05 07:03 > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >
> > Hi Martin and Michi, > > On 09/04/2018 01:20 PM, Doerr, Martin wrote: > > Can you reproduce the test failures? > > > > The very same VM works fine on a different Power8 machine which uses the same instructions by C2. > > > > The VM was built on the machine where it works ("SUSE Linux Enterprise Server 12 SP1"). > > > > I have seen several linux kernel changes regarding saving and restoring the VSX registers. > > > > I still haven't found out how the kernel determines things like "tsk->thread.used_vsr" which is used to set "msr |= MSR_VEC". > > > > Maybe something is missing which tells the kernel that we're using it. But that's just a guess. > > Facilities like FP (fp registers), VEC (vector registers - aka VMX/Altivec), and > VSX (vector-scalar registers) are usually disabled on a new born process.
Once > any instruction associated to these facilities is used in the process it causes > an exception that is treated by the kernel [1, 2, 3]: the kernel enables the > facility that caused the exception (see load_up_fp & friends) and re-executes the > instruction when the kernel returns control back to the process in userspace. > > Starting from kernel v4.6 [4] there is a simple heuristic that employs an 8-bit > counter to help track if a process, after using these facilities for the first > time, continues to use the facilities. The counters (load_fp and load_vec) are > incremented on each context switch and if the process stops using the FP or VEC > facilities then they are disabled again with FP/VEC/VSX save/restore on context > switches being disabled as well in order to improve the performance on context > switches by avoiding the FP/VEC/VSX register save/restore. > > Either way (before or after the change introduced in v4.6) *that mechanism is > opaque to userspace*, particularly to the process using these facilities. If a > given facility is not enabled by the kernel (in case the CPU does not support > it), the kernel sends a SIGILL to the process. It's possible to inspect the thread > member dynamics/state from userspace using tools like 'systemtap' (for > example, this simple script can be used to inspect the VRSAVE register on a given > thread that is running a program called 'vrsave_' [5]) or using the 'perf' tool. > > "tsk->thread.used_vsr" [6] is actually associated to the VSX facility whilst > MSR_VEC is associated to the VEC/VMX/Altivec facility [7], so > "tsk->thread.used_vsr" is set to 1 once a VSX instruction is used (if it's a new > process or if the load_fp and load_vec counters overflowed and became zero > disabling VSX or if only FP or only VEC - not both - were used in the process). > In that case the kernel will also enable VSX by MSR |= MSR_VSX. A similar > mechanism drives the FP (MSR_FP) and the VEC (MSR_VEC) facilities.
> > If both FP and VEC facilities are used the VSX facility is enabled automatically > since FP+VEC regsets == VSX regset [8]. > > Thus as this mechanism is entirely opaque to userspace I understand that if a > program has to tell the kernel it wants to use any of these facilities > (FP/VEC/VSX) before using it there is something wrong going on in kernelspace. > > Martin and Michi, if you want any help on drilling it further at kernel side > please let me know, maybe I can help. > > I didn't have the chance to reproduce the crash yet, so if I find anything > meaningful about it tomorrow I'll keep you posted. > > > Kind regards, > Gustavo > > [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L851-L869 (FP) > [2] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VEC/VMX/Altivec) > [3] https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/exceptions-64s.S#L1197-L1211 (VSX) > [4] https://github.com/torvalds/linux/commit/70fe3d980#diff-ef76830326856a12ea2b45630123d1adR239 > [5] http://cr.openjdk.java.net/~gromero/script.d > [6] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R310 > [7] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R250 > [8] https://github.com/torvalds/linux/commit/70fe3d980#diff-cc409475871baa8652ae4a5b4be7f715R437 > > > Best regards, > > > > Martin > > > > From: Michihiro Horie > > Sent: Dienstag, 4. September 2018 07:32 > > To: Doerr, Martin > > Cc: Lindenmaier, Goetz; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net > > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > > > > Hi Goetz, Martin, and Gustavo, > > > > > >>First, this should have been reviewed on hotspot-compiler-dev. It is clearly > >>a compiler change.
> > >>http://mail.openjdk.java.net/mailman/listinfo says that hotspot-dev is for > >>"Technical discussion about the development of the HotSpot virtual machine that's not specific to any particular component" > >>while hotspot-compiler-dev is for > >>"Technical discussion about the development of the HotSpot bytecode compilers" > > I understood the instruction and would use hotspot-compiler-dev in future RFRs, thanks. > > > > > >> Why do you rename vnoreg to vnoregi? > > I followed the way of coding for vsnoreg and vsnoregi, but the renaming does not look necessary. I would get this part back. Should I also rename vsnoregi to vsnoreg? > > > > > >>we noticed jtreg test failures when using this change: > >>compiler/runtime/safepoints/TestRegisterRestoring.java > >>compiler/runtime/Test7196199.java > >> > >>TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. > >> > >>We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > >>The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > Thank you for letting me know the issue, I will try to reproduce this on a SUSE machine. > > > > > >>I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > Thank you for pointing out another issue. Currently I do not hit this problem, but preventing to match the vector nodes makes sense to avoid the crash. I did not prepare match rules for non-vector nodes, so it might be better to prepare them similarly like the Replicate* rules, in any case. > > > > > > Gustavo, thanks for the wrap-up!
> > > > > > Best regards, > > -- > > Michihiro, > > IBM Research - Tokyo > > > > Inactive hide details for "Doerr, Martin" ---2018/09/04 02:18:24---Hi Gustavo and Michihiro, we noticed jtreg test failures whe"Doerr, Martin" ---2018/09/04 02:18:24---Hi Gustavo and Michihiro, we noticed jtreg test failures when using this change: > > > > From: "Doerr, Martin" > > > To: Gustavo Romero >, "Lindenmaier, Goetz" >, Michihiro Horie > > > Cc: hotspot compiler >, "hotspot-dev at openjdk.java.net " > > > Date: 2018/09/04 02:18 > > Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > > > > > > > > > Hi Gustavo and Michihiro, > > > > we noticed jtreg test failures when using this change: > > compiler/runtime/safepoints/TestRegisterRestoring.java > > compiler/runtime/Test7196199.java > > > > TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000. 
> > > > We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler. > > The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default. > > > > That's what I found out so far. Maybe you have an idea? > > > > I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented. > > > > Best regards, > > Martin > > > > > > -----Original Message----- > > From: hotspot-dev > On Behalf Of Gustavo Romero > > Sent: Montag, 3. September 2018 14:57 > > To: Lindenmaier, Goetz >; Michihiro Horie > > > Cc: hotspot compiler >; hotspot-dev at openjdk.java.net > > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > > > Hi Goetz, > > > > On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote: > >> Also, I can not find all of the mail traffic in > >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html. > >> Is this a problem of the pipermail server? > >> > >> For some reason this webrev lacks the links to browse the diffs. > >> Do you need to use a more recent webrev? You can obtain it with > >> hg clone http://hg.openjdk.java.net/code-tools/webrev/ . > > > > Yes, probably it was a problem of the pipermail or in some relay. > > I noted the same thing, i.e. at least one Michi reply arrived > > to me but missed a ML. 
> > > > The initial discussion is here: > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html > > > > I understand Martin reviewed the last webrev in that thread, which is > > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.01/> < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.01/> (taken from > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html ) > > > > Martin's review of webrev.01: > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html > > > > and Michi's reply to Martin's review of webrev.01: > > http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02, > > taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html ). > > > > and your last review: > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html > > > > > > HTH. > > > > Best regards, > > Gustavo > > > >> Why do you rename vnoreg to vnoregi? > >> > >> Besides that the change is fine, thanks for implementing this! > >> > >> Best regards, > >> Goetz. > >> > >> > >>> -----Original Message----- > >>> From: Doerr, Martin > >>> Sent: Dienstag, 28. August 2018 19:35 > >>> To: Gustavo Romero >; Michihiro Horie > >>> > > >>> Cc: Lindenmaier, Goetz >; hotspot- > >>> dev at openjdk.java.net ; ppc-aix-port-dev at openjdk.java.net ; Simonis, Volker > >>> > > >>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support > >>> > >>> Hi Michihiro, > >>> > >>> thank you for implementing it. I have just taken a first look at your > >>> webrev.01. > >>> > >>> It looks basically good. Only the Power version check seems to be incorrect. > >>> VM_Version::has_popcntb() checks for Power5. > >>> I believe most instructions are available with Power7. > >>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with > >>> Power8? > >>> We should check this carefully. 
> >>> > >>> Also, indentation in register_ppc.hpp could get improved. > >>> > >>> Thanks and best regard, > >>> Martin > >>> > >>> > >>> -----Original Message----- > >>> From: Gustavo Romero > > >>> Sent: Donnerstag, 26. Juli 2018 16:02 > >>> To: Michihiro Horie > > >>> Cc: Lindenmaier, Goetz >; hotspot- > >>> dev at openjdk.java.net ; Doerr, Martin >; ppc-aix- > >>> port-dev at openjdk.java.net ; Simonis, Volker > > >>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >>> > >>> Hi Michi, > >>> > >>> On 07/26/2018 01:43 AM, Michihiro Horie wrote: > >>>> I updated webrev: > >>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.01/> < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.01/> > >>> > >>> Thanks for providing an updated webrev and for fixing indentation and > >>> function > >>> order in assembler_ppc.inline.hpp as well. I have no further comments :) > >>> > >>> > >>> Best Regards, > >>> Gustavo > >>> > >>>> > >>>> Best regards, > >>>> -- > >>>> Michihiro, > >>>> IBM Research - Tokyo > >>>> > >>>> Inactive hide details for Gustavo Romero ---2018/07/25 23:05:32---Hi Michi, > >>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:Gustavo Romero --- > >>> 2018/07/25 23:05:32---Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie > >>> wrote: > >>>> > >>>> From: Gustavo Romero > > >>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port- > >>> dev at openjdk.java.net , hotspot-dev at openjdk.java.net > >>>> Cc: goetz.lindenmaier at sap.com , volker.simonis at sap.com , "Doerr, Martin" > >>> > > >>>> Date: 2018/07/25 23:05 > >>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > >>>> > >>>> ------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> 
---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ---------------------------------------------------------------------------------------------- > >>> ----------------------------------------------------- > >>>> > >>>> > >>>> > >>>> Hi Michi, > >>>> > >>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote: > >>>> > Dear all, > >>>> > > >>>> > Would you review the following change? > >>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 > >>>> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.00> < http://cr.openjdk.java.net/%7Emhorie/8208171/webrev.00> > >>>> > > >>>> > This change adds support for vectorized arithmetic calculation with SLP. > >>>> > > >>>> > The to_vr function is added to convert VSR to VR. Currently, vecX is > >>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, > >>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the > >>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the > >>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to > >>> the ConvD2FNode::Value in convertnode.cpp. > >>>> > >>>> Looks good. Just a few comments: > >>>> > >>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of > >>> vmaddfp in > >>>> order to avoid the splat? 
> >>>> > >>>> - Although all instructions added by your change where introduced in ISA > >>> 2.06, > >>>> so POWER7 and above are OK, as I see probes for > >>> PowerArchictecturePPC64=6|5 in > >>>> vm_version_ppc.cpp (line 64), I'm wondering if there is any control point > >>> to > >>>> guarantee that these instructions won't be emitted on a CPU that does > >>> not > >>>> support them. > >>>> > >>>> - I think that in general string in format %{} are in upper case. For instance, > >>>> this the current output on optoassembly for vmul4F: > >>>> > >>>> 2941835 5b4 ADDI R24, R24, #64 > >>>> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F > >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > >>>> > >>>> I think it would be better to be in upper case instead. I also think that if > >>>> the node match emits more than one instruction all instructions must be > >>> listed > >>>> in format %{}, since it's meant for detailed debugging. Finally I think it > >>>> would be better to replace \t! by \t// in that string (unless I'm missing any > >>>> special meaning for that char). So for vmul4F it would be something like: > >>>> > >>>> 2941835 5b4 ADDI R24, R24, #64 > >>>> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34 > >>>> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F > >>>> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector > >>>> > >>>> > >>>> But feel free to change anything just after you get additional reviews :) > >>>> > >>>> > >>>> > I confirmed this change with JTREG. In addition, I used attached micro > >>> benchmarks. > >>>> > /(See attached file: slp_microbench.zip)/ > >>>> > >>>> Thanks for sharing it. 
> >>>> Btw, another option to host it would be in the CR > >>>> server, in http://cr.openjdk.java.net/~mhorie/8208171 > >>>> > >>>> > >>>> Best regards, > >>>> Gustavo > >>>> > >>>> > > >>>> > Best regards, > >>>> > -- > >>>> > Michihiro, > >>>> > IBM Research - Tokyo > >>>> > > >>>> > >>>> > >>>> > >> From aph at redhat.com Fri Sep 7 15:26:36 2018 From: aph at redhat.com (Andrew Haley) Date: Fri, 7 Sep 2018 16:26:36 +0100 Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic gives incorrect results In-Reply-To: <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com> References: <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com> Message-ID: <5ebc307e-e1ce-06e8-d42c-36206685c679@redhat.com> On 09/07/2018 03:03 PM, Dmitrij Pochepko wrote: > I remember debugging this branch while running JCK tests. > > Haven't checked precisely, but probably fw was 0 on those cases, so, z > - two24B*fw and z + tmp24B*fw. It would explain such behavior. I see. I wrote some simple code to stress test argument reduction, and it immediately failed. The range reduction code is so horribly complicated that the *first thing* to have done should have been to stress test it, and evidently that was not done. The code, as it stands, is so complicated and tangled that it is almost impossible for anybody to debug and analyse. Its documentation is inadequate, for the same reasons that Andrew Dinn explained with respect to pow(). I can't have any confidence that there aren't more lurking bugs, and this method is too important to risk breakage. It needs some major reworking. In hindsight, I should not have accepted it.
It's too late to get this fixed in the JDK 11 release, so it's going to go out broken on AArch64. I'll disable the intrinsic in JDK devel and tell the distro packagers to patch their packages. Then we can rewrite this intrinsic with a view to fixing its maintainability and documentation, and perhaps including it in JDK 12. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Fri Sep 7 16:11:34 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 7 Sep 2018 16:11:34 +0000 Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Message-ID: Hi, we noticed that the RegisterSaver misses code to save and restore the vector registers on PPC64. I'd like to fix that. Bug: https://bugs.openjdk.java.net/browse/JDK-8210497 Webrev: http://cr.openjdk.java.net/~mdoerr/8210497_PPC64_save_CR/webrev.00/ This webrev already fixes the following tests when JDK-8208171 webrev.03 is applied: compiler/runtime/safepoints/TestRegisterRestoring.java compiler/runtime/Test7196199.java I'll try to test the OopMap part. This may be tricky. Best regards, Martin From dmitrij.pochepko at bell-sw.com Fri Sep 7 16:42:41 2018 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 7 Sep 2018 19:42:41 +0300 Subject: RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> Message-ID: Hi Andrew, Thank you again for looking into it in such detail. It will take me some time to review your draft with comments related to original code. Looking forward to working on improving the code and algorithm description after that.
Small note though: since you're adding documentation to original code, it probably would make sense to update it in original location as well at src/hotspot/share/runtime/sharedRuntimeTrans.cpp Thanks, Dmitrij On 07/09/18 15:58, Andrew Dinn wrote: > Hi Dmitrij > > On 22/08/18 11:04, Andrew Dinn wrote: >> Thank you for the revised webrev and new test results. I am now working >> through them. I will post comments as soon as I have given the new code >> a full read and assessed the new results. I am afraid that may take a >> day or two, for which delay advance apologies. > This review has taken a great deal longer than expected. I am sorry but > that is because the documentation for the code you have submitted is > still seriously inadequate and I have had to put a lot of work into > revising it before I can fully review the code. > > I am still finishing off that last task but I wanted to start providing > you with some feedback and also to enlist your help in checking that my > revisions are correct. I plan to provide feedback in 3 stages to match > the 3 steps in the review that I am doing as follows: > > 1) Correct the original 'algorithm' you started from > > 2) Correct the 'modified algorithm' that is meant to describe the > behaviour of your code > > 3) Propose any necessary corrections/improvements to the generated code > > So, let's start with step 1. > > The 'original' algorithm located in file macroAssembler_aarch64_pow.cpp > is really just a fragment of C code with a few missing elements (e.g. > the origin of values P1, P2, ... is not explained, hugeX, tiny are not > defined). Although this code has the virtue that it is known to be > correct (or at least has been verified by long use and the eyes of > experts in numerical computation) it still fails to provide important > information about what the 'algorithm' is supposed to do. That > information is critical for anyone coming to it fresh to be able to > understand what is happening.
> > The first omission is several pieces of background mathematics that are > neither explained in the code nor referenced. The mathematics includes > the formulae on which the algorithm is based and the numerical > approximation to these formulae that is employed to define the > algorithm. This is needed to explain /how/ and /why/ a) the two > different computations of log2(x) and b) the computation of exp(x) are > performed as they are and to justify that the results are valid. > > The second omission is detailed descriptions of what most of the more > complex individual steps in the algorithm do. Many of the logic, > floating point and branching operations which compute intermediate > results are extremely opaque. This is particularly so for the steps > which manipulate bit patterns in the long representation of the fp > values being used. However, some of the straight fp arithmetic is also > highly problematic. > > The other thing I think needs to be made clearer is the relationship > between the various special case return points in the code and the > special case rules they relate to. This is not so critical for the > original algorithm because the C code at least has a regular and > standardised control flow. However, labelling the exit paths is still > useful here and will be much more useful if used both here and in the > modified algorithm (and we'll come to that later). > > I have rewritten the algorithm to achieve what I think is needed to > patch these omissions. The redraft of this part of the code is available > here: > > http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt > > I assume you are familiar with the relevant mathematics and how it has > been used to derive the algorithm. If so then I would like you to review > this rewrite and ensure that there are no mathematical errors in it. I > would also like you to check that the explanatory comments for the > individual steps in the algorithm do not contain any errors.
> > If you are not familiar with the mathematics then please let me know. I > need to know whether this has been reviewed by someone competent to do so. > > n.b. one little detail you might easily miss. I removed lg2, lg2_h and > lg2_l from the first table of constants as neither log(x) algorithm > needs them (it relies on ivln2). I renamed the entries in the second > table from LG2, etc to /ln2/, etc and changed the name accordingly at > point of use. The computation of exp(x) actually does need ln2. One of > the code changes is to remove these redundant entries from your table > pow_coeff1. > > Ok, as for the next 2 steps, I will post a follow-up to deal with them once > I have completed my review. That will include a heavily revised version > of your 'modified algorithm' (which is still in progress) plus > suggestions for changes to the code that I have found along the way. > Just as a preliminary I'll summarize what is wrong below. > > Note that I have not yet found any errors in how the generated code > implements the mathematics but I am still not happy with it because it > is extremely unclear. Correcting the 'modified algorithm' is a > necessary, critical step to improving the clarity of the code. > > So, in overview, what is wrong with your 'modified algorithm'. Well, the > thing that is immediately obvious is that it is /not/ actually the > algorithm you have employed. It is simply a mangled version of the C > code you started from that bears only a tenuous relation to the code > structure it is supposed to summarize. Now, I'm happy for you to use C > to model the generated code if possible and, in fact, am in the process > of writing a proper algorithm that looks as much like C code as possible > /but/ also actually describes what your generated code does.
The problem > is that what you have written is not only /not/ C it is also i) > incoherent, ii) retains elements from the original code that don't exist > at all in the generated code and iii) omits important elements of the > generated code. > > So, firstly, let's deal with the problem as it relates to control flow. > Your 'modified algorithm' includes various tags mentioning the word > 'label' suggesting that some transfer of control is to be effected. > However, these are tacked onto statement blocks connected via 'if > (cond)' tests or 'else' keywords that are meant to imply some > alternative control flow. Essentially, your generated code relies on > gotos which do not fit a standard if/else flow model and you have tried > to bodge some sort of goto model on top of the original valid, gotoless > C control flow with no clear definition of how that is meant to work. > Honestly, if your generated code uses a goto control flow then your C > algorithm is going to have to do the same in order to clearly summarize > what the code actually does. > > The second major problem is one I pointed out in my earlier note, i.e. > that the data values described in the 'modified algorithm' do not > correctly match the ones operated on in your generated code. Your > algorithm lists many redundant values used in the original algorithm > (e.g. ix, iy, ax, yisint) even though your code doesn't ever explicitly > construct most of those values (n.b. this is not just limited to the 32 > bit half-word quantities). Instead the code frequently pulls the > relevant value, as needed, out of other data that it does construct and > holds in registers -- sometimes across control branches. At other times > it performs an equivalent operation on a different, related data value.
> Your response to my request was to add comments which labelled some of > these on-the-fly created values or alternative values with the original > names but that ignores the fact that the names and the values referenced > in the comments do not actually match. > > Contrariwise, a lot of the values the code does actually operate on are > not mentioned in the algorithm. Indeed, it is worse than that because > they are not coherently identified even in the generated code. Data > items stored in registers are referred to using the utterly redundant > symbolic aliases tmp0, tmp2, etc for registers r0, r1 etc. What is worse > the same meaningless symbolic names get reused for completely different > data items. > > For example, at one point tmp2 identifies the exponent of y stored in r2 > and later it identifies the absolute value of y also stored in r2, > overwriting the exponent. Your algorithm really ought to mention values > like exp_y or ay (or even |y|) for these cases and the code should > correspondingly define exp_y and ay as an alias for register r2. These > meaningful names should then be used when loading the constructed value > into a register and at every subsequent point of use where that > constructed value is valid. > > This is not all that is wrong with the 'modified algorithm' but it is > enough to make it not just useless but worse than useless. What you have > written provides a hand-wave towards what the code does that fails to > summarize it with any accuracy or clarity and equally fails to clarify > the difference between it and the C code you started from. That only > makes the whole picture less clear not more so. > > As I said, I will provide a better version of the 'modified algorithm' > in a follow-up and then discuss possible code changes. Please review the > linked file above while I prepare that. 
> > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From dmitrij.pochepko at bell-sw.com Fri Sep 7 16:45:36 2018 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 7 Sep 2018 19:45:36 +0300 Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic gives incorrect results In-Reply-To: <5ebc307e-e1ce-06e8-d42c-36206685c679@redhat.com> References: <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com> <5ebc307e-e1ce-06e8-d42c-36206685c679@redhat.com> Message-ID: Hi Andrew, Ok. I'm really sorry to have introduced such a bug and I agree that the best strategy is to disable the intrinsic temporarily for sin and cos. I aim to work with Andrew Dinn on pow to calibrate and enhance documentation and algorithm there first. Then I'll get back to sin/cos and revise it in the same manner. Meanwhile, do we have to abandon this particular patch? It still resolves this particular problem and it would be a waste to re-debug and fix this problem later. Thanks, Dmitrij On 07/09/18 18:26, Andrew Haley wrote: > On 09/07/2018 03:03 PM, Dmitrij Pochepko wrote: >> I remember debugging this branch while running JCK tests. >> >> Haven't checked precisely, but probably fw was 0 on those cases, so, z >> - two24B*fw and z + tmp24B*fw. It would explain such behavior. > I see. > > I wrote some simple code to stress test argument reduction, and it > immediately failed. The range reduction code is so horribly > complicated that the *first thing* to have done should have been to > stress test it, and evidently that was not done. > > The code, as it stands, is so complicated and tangled that it is > almost impossible for anybody to debug and analyse. Its documentation > is inadequate, for the same reasons that Andrew Dinn explained with > respect to pow().
I can't have any confidence that there aren't more > lurking bugs, and this method is too important to risk breakage. It > needs some major reworking. In hindsight, I should not have accepted > it. > > It's too late to get this fixed in the JDK 11 release, so it's going > to go out broken on AArch64. I'll disable the intrinsic in JDK devel > and tell the distro packagers to do patch their packages. Then we can > rewrite this intrinsic with a view of fixing its maintainability and > documentation, and perhaps including it in JDK 12. > From aph at redhat.com Sat Sep 8 09:14:07 2018 From: aph at redhat.com (Andrew Haley) Date: Sat, 8 Sep 2018 10:14:07 +0100 Subject: [aarch64-port-dev ] RFR(S): 8210461 - AArch64: Math.cos intrinsic gives incorrect results In-Reply-To: References: <4ef01df1-0c2f-78ca-f2e5-f10b78247140@bell-sw.com> <5ebc307e-e1ce-06e8-d42c-36206685c679@redhat.com> Message-ID: <7e4549e8-232e-c329-b92e-a368f3b7b7ba@redhat.com> On 09/07/2018 05:45 PM, Dmitrij Pochepko wrote: > Meanwhile, do we have to abandon this particular patch? It still resolve > this particular problem and it would be a waste to re-debug and fix this > problem later. No, we don't have to abandon this patch. Please push it to JDK head, thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From adinn at redhat.com Sun Sep 9 08:08:12 2018 From: adinn at redhat.com (Andrew Dinn) Date: Sun, 9 Sep 2018 09:08:12 +0100 Subject: RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> Message-ID: Hi Dmitrij On 07/09/18 17:42, Dmitrij Pochepko wrote: > Thank you again for looking into it in such details. It will take me > some time to review your draft with comments related to original code. > Looking forward to work on improving the code and algorithm description > after that. You are welcome. 
However, thanks are not needed. This is simply what I am required to do as a reviewer. > Small note though: since you're adding documentation to original code, > it probably would make sense to update it in original location as well > at src/hotspot/share/runtime/sharedRuntimeTrans.cpp I agree that it would be better if comments in that shared code were also updated. However, I recommend we pursue that task as a follow-up once we have fixed the intrinsic. Also, it's important to note that the omissions in the above file are, to a degree, mitigated by the /slightly/ more complete documentation in file src/java.base/share/classes/java/lang/FdLibm.java. Comments in the methods for computing log(x) and exp(x) in the latter file include some of the same details of the maths/algorithms that I described (I only found these comments after deriving the relevant maths myself :-). So, we might consider upgrading the comments in the Java source and adding a cross-reference to that file from the C source. The code itself is almost identical so one commented version should work for both. I'd still like my comments to remain in your generator code. This is the most complex implementation version and has the greatest divergence from the original. So, it will be the focus of any nasty bugs that arise. Having an explanation of the maths and algorithm right there in the generator source is going to ensure whoever has to fix any such bug is best prepared to do so. Also, it will pin down the version of the shared code from which the generator was derived. The shared code ought not to be updated without changing the generator code but keeping the C template in with the generator is safer. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From jayaprabhakar at gmail.com Mon Sep 10 03:58:02 2018 From: jayaprabhakar at gmail.com (jayaprabhakar k) Date: Sun, 9 Sep 2018 20:58:02 -0700 Subject: Any way to avoid JIT overhead for small programs when using AOT? Message-ID: Hi, I understand that at present AOT and -Xint are not compatible. I see the code explicitly disables AOT when -Xint is set. For extremely short programs, typically used by beginners learning Java, I see that CDS, AOT and Xint all help reduce the startup time. While CDS works with both AOT and Xint, multiplying the benefits, AOT and Xint do not. Is there a way to keep both AOT + Xint: for classes/methods that are precompiled, use AOT code, and for others just interpret? If not now, would it be possible in the future? Thanks, JP From Pengfei.Li at arm.com Mon Sep 10 04:24:16 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Mon, 10 Sep 2018 04:24:16 +0000 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check Message-ID: Hi Dean / Vladimir / JDK experts, Do you have any further questions or comments on this patch? Or should I make some modifications on it, such as adding some limitations to the matching condition? I appreciate your help. -- Thanks, Pengfei > -----Original Message----- > From: Pengfei Li (Arm Technology China) > Sent: Monday, September 3, 2018 13:50 > To: 'dean.long at oracle.com' ; 'Vladimir Kozlov' > ; hotspot-compiler-dev at openjdk.java.net; > hotspot-dev at openjdk.java.net > Cc: nd > Subject: RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check > > Hi Vladimir, Dean, > > Thanks for your review. > > > I don't see where negation is coming from for 'X % 2 == 0' expression. > > It should be only 2 instructions: 'cmp (X and 1), 0' > The 'cmp (X and 1), 0' is just what we expected.
But there's redundant > conditional negation coming from the possibly negative X handling in "X % 2". > For instance, X = -5, "X % 2" should be -1. So only "(X and 1)" operation is not > enough. We have to negate the result. > > > I will look on it next week. But it would be nice if you can provide small test > to show this issue. > I've already provided a case of "if (a%2 == 0) { ... }" in JBS description. What > code generated and what can be optimized are listed there. > You could see https://bugs.openjdk.java.net/browse/JDK-8210152 for details. > You could also see the test case for this optimization I attached below. > > > It looks like your matching may allow more patterns than expected. I was > expecting it to look for < 0 or >= 0 for the conditional negation, but I don't see > it. > Yes. I didn't limit the if condition to <0 or >= 0 so it will match more patterns. > But nothing is going wrong if this ideal transformation applies on more cases. > In pseudo code, if someone writes: > if ( some_condition ) { x = -x; } > if ( x == 0 ) { do_something(); } > The negation in 1st if-clause could always be eliminated whatever the > condition is. 
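
The two facts the quoted argument relies on can be checked with a small Java sketch. It is not part of the patch and says nothing about how C2 implements the transformation; the class and method names are hypothetical. It only illustrates that, for an `int` and a power-of-two divisor, the zero test on `x % 4` agrees with the zero test on `x & 3`, and that a conditional negation never changes whether a value is zero.

```java
public class DivisibleCheck {
    // Reference form: Java's % truncates toward zero, so the result can be negative.
    static boolean divisibleByRem(int x) {
        return x % 4 == 0;
    }

    // Mask form: in two's complement the low two bits are zero exactly
    // when x is a multiple of 4, for negative x as well.
    static boolean divisibleByMask(int x) {
        return (x & 3) == 0;
    }

    // Conditional negation followed by a zero check; the flag cannot
    // change the answer because -x == 0 only when x == 0.
    static boolean zeroAfterCondNeg(int x, boolean negate) {
        if (negate) {
            x = -x;
        }
        return x == 0;
    }

    public static void main(String[] args) {
        // Same dividends as the test case in the thread.
        int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
        for (int x : dividends) {
            boolean same = divisibleByRem(x) == divisibleByMask(x);
            boolean negIrrelevant = zeroAfterCondNeg(x, true) == zeroAfterCondNeg(x, false);
            System.out.println(x + " " + same + " " + negIrrelevant);
        }
    }
}
```

Note that `0x80000000` is included deliberately: negating it overflows back to itself, which is still non-zero, so the zero check is unaffected there too.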
> > -- > Thanks, > Pengfei > > > -- my test case attached below -- > public class Foo { > > public static void main(String[] args) { > int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 }; > for (int i = 0; i < dividends.length; i++) { > int x = dividends[i]; > System.out.println(testDivisible(x)); > System.out.println(testModulo(x)); > testCondNeg(x); > } > return; > } > > public static int testDivisible(int x) { > // Modulo result is only for zero check > if (x % 4 == 0) { > return 444; > } > return 555; > } > > public static int testModulo(int x) { > int y = x % 4; > if (y == 0) { > return 222; > } > // Modulo result is used elsewhere > System.out.println(y); > return 333; > } > > public static void testCondNeg(int x) { > // Pure conditional negation > if (printAndIfNeg(x)) { > x = -x; > } > if (x == 0) { > System.out.println("zero!"); > } > } > > static boolean printAndIfNeg(int x) { > System.out.println(x); > return x <= 0; > } > } From dean.long at oracle.com Mon Sep 10 08:00:29 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 10 Sep 2018 01:00:29 -0700 Subject: Any way to avoid JIT overhead for small programs when using AOT? In-Reply-To: References: Message-ID: On 9/9/18 8:58 PM, jayaprabhakar k wrote: > Hi, > I understand that at present AOT and -Xint are not compatible. I see > the code explicitly disables AOT when -Xint is set > . > > For extremely short programs, typically used by beginners learning > Java, I see that CDS, AOT and Xint all help reduce the startup time. > While CDS works with both AOT and Xint, multiplying the benefits, AOT > and Xint do not. > > Is there a way to keep both AOT + Xint, For classes/methods that are > precompiled, use AOT code, and for others just interpret? If not now, > would it be possible in the future? > > Thanks, > JP Hi JP. Yes, it could be possible in the future. One problem is MethodHandle intrinsics.
With -Xint, there's no code heap, so no place to generate native adapters for those intrinsics. dl From aph at redhat.com Mon Sep 10 08:17:59 2018 From: aph at redhat.com (Andrew Haley) Date: Mon, 10 Sep 2018 09:17:59 +0100 Subject: Any way to avoid JIT overhead for small programs when using AOT? In-Reply-To: References: Message-ID: <2753b70f-67c7-ef7a-ca40-49266f502401@redhat.com> On 09/10/2018 04:58 AM, jayaprabhakar k wrote: > I understand that at present AOT and -Xint are not compatible. I see the > code explicitly disables AOT when -Xint is set > > . > > For extremely short programs, typically used by beginners learning Java, I > see that CDS, AOT and Xint all help reduce the startup time. While CDS > works with both AOT and Xint, multiplying the benefits, AOT and Xint do > not. > > Is there a way to keep both AOT + Xint, For classes/methods that are > precompiled, use AOT code, and for others just interpret? If not now, would > it be possible in the future? Does it significantly help? If you precompile the Java library and your programs are extremely short, you'll see very little compilation activity. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kuaiwei.kw at alibaba-inc.com Mon Sep 10 08:39:42 2018 From: kuaiwei.kw at alibaba-inc.com (Kuai Wei) Date: Mon, 10 Sep 2018 16:39:42 +0800 Subject: JIT: C2 doesn't skip post barrier for new allocated objects Message-ID: <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw@alibaba-inc.com> Hi, Recently I checked the optimization of reducing G1 post barrier for new allocated object. But I found it doesn't work as expected. I wrote a simple test case to store an oop in the initialize function or just after the init function.
public class StoreTest { static String val="x"; public static Foo testMethod() { Foo newfoo = new Foo(val); newfoo.b=val; // the store barrier could be reduced return newfoo; } public static void main(String []args) { Foo obj = new Foo(val); // init Foo class testMethod(); } static class Foo { Object a; Object b; public Foo(Object val) { this.a=val; // the store barrier could be reduced }; } } I inline Foo: and Object:: when compile testMethod by C2, so I think the 2 store marked red don't need post barrier. But I still found post barrier in generated assembly code. The test command: java -Xcomp -Xbatch -XX:+UseG1GC -XX:CompileCommandFile=compile_command -Xbatch -XX:+PrintCompilation -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining StoreTest compile_command: compileonly, StoreTest::testMethod compileonly, StoreTest$Foo:: inline, StoreTest$Foo:: compileonly, java.lang.Object:: inline, java.lang.Object:: print, StoreTest::testMethod I checked the node graph in parsing phase. The optimization depends on GraphKit::just_allocated_object to detect new allocate object. The idea is to check control of store is control proj of allocation. But in parse phase , there's a Region node between control proj and control of store. The region just has one input edge. So it could be optimized later. The region node is generated when C2 inline init method of super class, I think it's used in exit map to merge all exit path. The change is simple, in just_allocated_object, I checked if there's region node with only 1 input. With the change, we can see good performance improvement in pressure test. Could you check the change and give comments about it? graphKit.cpp // We use this to determine if an object is so "fresh" that // it does not require card marks. 
Node* GraphKit::just_allocated_object(Node* current_control) { - if (C->recent_alloc_ctl() == current_control) + Node * ctrl = current_control; + if (CheckJustAllocatedAggressive) { + // Object:: is invoked after allocation, most of invoke nodes + // will be reduced, but a region node is kept in parse time, we check + // the pattern and skip the region node + if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) { + ctrl = ctrl->in(1); + } + } + if (C->recent_alloc_ctl() == ctrl) return C->recent_alloc_obj(); return NULL; } Thanks, Kevin From 944797358 at qq.com Mon Sep 10 11:10:51 2018 From: 944797358 at qq.com (Andy Law) Date: Mon, 10 Sep 2018 19:10:51 +0800 Subject: [PATCH] 8202414: Unsafe crash in C2 Message-ID: This change is only about: Disabling the un-aligned C2 `clear_memory()` optimization when using Unsafe to write to an unaligned address. ``` java -version openjdk version "1.8.0-internal-debug" OpenJDK Runtime Environment (build 1.8.0-internal-debug-***_2018_09_03_19_31-b00) OpenJDK 64-Bit Server VM (build 25.71-b00-debug, mixed mode) ``` This issue 8202414 is about: ArrayObjects of -XX:+UseCompressedOops on 64-bit has a 12 bits header and a 4 bits length. So the length address is from 12th to 16th bytes. If we use Unsafe.putInt() to write at the 17th bit, the C2 `clear_memory()` will mistakenly do `done_offset -= BytesPerInt;`, then `done_offset` will become 13. And then it will clear the address from the 13th bit, making the array length change to a different value. When a GC happens, it will crash. I didn't find the unaligned memory support of `clear_memory()`, so I only do a small fix to keep the effect minimal: When Unsafe.put*() writes to an unaligned memory as above, it will cause the assert to fail. So when it fails, we don't do any optimizations instead, and the problem is solved. I don't know if it is a good solution?
It is only 3 lines of code, so please have a look:) Thank you! Andy -------------- next part -------------- A non-text attachment was scrubbed... Name: openjdk-patch-8202414.diff Type: application/octet-stream Size: 463 bytes Desc: not available URL: From goetz.lindenmaier at sap.com Mon Sep 10 14:15:34 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 10 Sep 2018 14:15:34 +0000 Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard In-Reply-To: References: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com> Message-ID: <0cc673af4a354ddd81cd9cf639c281a6@sap.com> HI Lutz, looks good to me, too. Thanks, Goetz. > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Dienstag, 4. September 2018 14:59 > To: Doerr, Martin ; hotspot-compiler- > dev at openjdk.java.net > Subject: [CAUTION] Re: RFR(S): 8210319: [s390]: Use of shift operators not > covered by cpp standard > > Hi Martin, > > thanks for the review! > > Regards, > > Lutz > > > > From: "Doerr, Martin (martin.doerr at sap.com)" > Date: Tuesday, 4. September 2018 at 11:28 > To: Lutz Schmidt , "hotspot-compiler- > dev at openjdk.java.net" > Subject: RE: RFR(S): 8210319: [s390]: Use of shift operators not covered by > cpp standard > > > > Hi Lutz, > > > > looks good. Thanks for improving. > > > > Best regards, > > Martin > > > > > > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Dienstag, 4. September 2018 10:29 > To: hotspot-compiler-dev at openjdk.java.net > Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp > standard > > > > Dear All, > > > > may I please request reviews for this small, s390-only patch. It fixes some > shift operations which relied on behavior not covered by the language > standard. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8210319 > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/ > > > > > Thank you! > > Lutz > > From lutz.schmidt at sap.com Mon Sep 10 14:20:01 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 10 Sep 2018 14:20:01 +0000 Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp standard In-Reply-To: <0cc673af4a354ddd81cd9cf639c281a6@sap.com> References: <383FFEC2-0FB7-402B-95B1-6ACD017BB630@sap.com> <0cc673af4a354ddd81cd9cf639c281a6@sap.com> Message-ID: <1E85BE87-07BC-4A86-A72C-512F4F297F01@sap.com> Thank you, Goetz! With two positive reviews, and with the patch having been active in our nightly tests for several days, I'll go ahead and push. Regards, Lutz ?On 10.09.18, 16:15, "Lindenmaier, Goetz" wrote: HI Lutz, looks good to me, too. Thanks, Goetz. > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Dienstag, 4. September 2018 14:59 > To: Doerr, Martin ; hotspot-compiler- > dev at openjdk.java.net > Subject: [CAUTION] Re: RFR(S): 8210319: [s390]: Use of shift operators not > covered by cpp standard > > Hi Martin, > > thanks for the review! > > Regards, > > Lutz > > > > From: "Doerr, Martin (martin.doerr at sap.com)" > Date: Tuesday, 4. September 2018 at 11:28 > To: Lutz Schmidt , "hotspot-compiler- > dev at openjdk.java.net" > Subject: RE: RFR(S): 8210319: [s390]: Use of shift operators not covered by > cpp standard > > > > Hi Lutz, > > > > looks good. Thanks for improving. > > > > Best regards, > > Martin > > > > > > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Dienstag, 4. September 2018 10:29 > To: hotspot-compiler-dev at openjdk.java.net > Subject: RFR(S): 8210319: [s390]: Use of shift operators not covered by cpp > standard > > > > Dear All, > > > > may I please request reviews for this small, s390-only patch. 
It fixes some
> shift operations which relied on behavior not covered by the language
> standard.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8210319
>
> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8210319.00/
>
> Thank you!
>
> Lutz

From lutz.schmidt at sap.com Mon Sep 10 15:59:51 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 10 Sep 2018 15:59:51 +0000
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To:
References:
Message-ID:

Hi Andy,

to avoid misunderstandings, please be precise when talking about bits and bytes. ArrayObjects (with -XX:+UseCompressedOops) have a 16-byte header, whereof the last four bytes (byte# 12..15) designate the array length (in #elements).

And now, just checking if I got your intention right: I read your text below as well as the description and comments in https://bugs.openjdk.java.net/browse/JDK-8202414. In essence, you are trying to perform a ?-byte store into a byte array by means of an unaligned putInt() call.

To my understanding, putInt() is not designed for unaligned accesses. Even "worse", it relies on the store address being at least 4-byte aligned. That's what I learn e.g. from http://www.docjar.com/docs/api/sun/misc/Unsafe.html. And that's the reason why your code (sometimes) destroys the length field of the ArrayObject header.

Your fix would just ignore (copy nothing) calls with an unaligned end_offset. Why would you then call the unsafe function at all?

Yes, your patch would probably help in your situation. It just puts a blanket of silence over a call with unsupported parameters.

That's how far I can comment. I am neither a reviewer nor in a position to decide if such an interface violation should be handled gracefully (e.g. by throwing an exception) or if the status quo is ok.
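
The offset arithmetic discussed in this thread can be modelled with a small sketch. It is not HotSpot code: the class and method names are hypothetical, the object is modelled as a plain little-endian byte buffer, and the `% BytesPerLong` condition guarding `done_offset -= BytesPerInt` is an assumption (the thread only quotes the subtraction itself). It only illustrates how an end offset of 17 leads to a 4-byte zeroing store at offset 13 that overlaps the length field in bytes 12..15.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class UnalignedClearSketch {
    static final int BYTES_PER_INT = 4;
    static final int BYTES_PER_LONG = 8;
    // Array length lives in header bytes 12..15 (per the description above).
    static final int LENGTH_OFFSET = 12;

    // Assumed shape of the adjustment: step back one int when the end
    // offset is not 8-byte aligned, expecting it to be 4-byte aligned.
    static int doneOffset(int endOffset) {
        int done = endOffset;
        if (done % BYTES_PER_LONG != 0) {
            done -= BYTES_PER_INT;
        }
        return done;
    }

    // Models the trailing 32-bit zeroing store and returns the array
    // length that is read back from the header afterwards.
    static int lengthAfterClear(int arrayLength, int endOffset) {
        ByteBuffer obj = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        obj.putInt(LENGTH_OFFSET, arrayLength);
        int done = doneOffset(endOffset);
        if (done < endOffset) {
            obj.putInt(done, 0); // zeroes bytes done .. done+3
        }
        return obj.getInt(LENGTH_OFFSET);
    }

    public static void main(String[] args) {
        // Aligned end offset: the zeroing store starts at 16, header untouched.
        System.out.println(lengthAfterClear(0x01020304, 20));
        // Unaligned end offset 17: the store starts at 13 and clobbers the length.
        System.out.println(doneOffset(17));
        System.out.println(lengthAfterClear(0x01020304, 17));
    }
}
```

In this model the aligned case leaves the length intact, while the unaligned case zeroes bytes 13..16 and so changes the stored length, which matches the corruption described in the bug report.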
Thank you
Lutz

On 10.09.18, 13:10, "hotspot-compiler-dev on behalf of Andy Law" wrote:

    This change is only about:
    Disabling the unaligned C2 `clear_memory()` optimization when using Unsafe to write to an unaligned address.

    ```
    java -version
    openjdk version "1.8.0-internal-debug"
    OpenJDK Runtime Environment (build 1.8.0-internal-debug-***_2018_09_03_19_31-b00)
    OpenJDK 64-Bit Server VM (build 25.71-b00-debug, mixed mode)
    ```

    This issue 8202414 is about:
    ArrayObjects of -XX:+UseCompressedOops on 64-bit has a 12 bits header and a 4 bits length. So the length address is from 12th to 16th bytes.
    If we use Unsafe.putInt() to write at the 17th bit, the C2 `clear_memory()` will mistakenly do `done_offset -= BytesPerInt;`, then `done_offset` will become 13. It will then clear memory starting from the 13th bit, which changes the array length to a different value. When a GC happens, it will crash.

    I didn't find unaligned memory support in `clear_memory()`, so I only made a small fix to keep the effect minimal:
    When Unsafe.put*() writes to an unaligned address as above, it causes the assert to fail. So when it fails, we don't do any optimization instead, and the problem is solved.

    I don't know if it is a good solution? It is only 3 lines of code, so please have a look:) Thank you!

    Andy

From vladimir.kozlov at oracle.com Mon Sep 10 17:57:17 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 10:57:17 -0700
Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check
In-Reply-To:
References:
Message-ID:

I finally have time to look at it and I agree with your changes.

The only comment I have is to add a check for SubI on the other branch (not only on the True branch). Negation may occur on either branch since you accept all conditions for negation.

Thanks,
Vladimir

On 9/9/18 9:24 PM, Pengfei Li (Arm Technology China) wrote:
> Hi Dean / Vladimir / JDK experts,
>
> Do you have any further questions or comments on this patch?
Or should I make some modifications on it, such as adding some limitations to the matching condition? > I appreciate your help. > > -- > Thanks, > Pengfei > > >> -----Original Message----- >> From: Pengfei Li (Arm Technology China) >> Sent: Monday, September 3, 2018 13:50 >> To: 'dean.long at oracle.com' ; 'Vladimir Kozlov' >> ; hotspot-compiler-dev at openjdk.java.net; >> hotspot-dev at openjdk.java.net >> Cc: nd >> Subject: RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check >> >> Hi Vladimir, Dean, >> >> Thanks for your review. >> >>> I don't see where negation is coming from for 'X % 2 == 0' expression. >>> It should be only 2 instructions: 'cmp (X and 1), 0' >> The 'cmp (X and 1), 0' is just what we expected. But there's redundant >> conditional negation coming from the possibly negative X handling in "X % 2". >> For instance, X = -5, "X % 2" should be -1. So only "(X and 1)" operation is not >> enough. We have to negate the result. >> >>> I will look on it next week. But it would be nice if you can provide small test >> to show this issue. >> I've already provided a case of "if (a%2 == 0) { ... }" in JBS description. What >> code generated and what can be optimized are listed there. >> You could see https://bugs.openjdk.java.net/browse/JDK-8210152 for details. >> You could also see the test case for this optimization I attached below. >> >>> It looks like your matching may allow more patterns than expected. I was >> expecting it to look for < 0 or >= 0 for the conditional negation, but I don't see >> it. >> Yes. I didn't limit the if condition to <0 or >= 0 so it will match more patterns. >> But nothing is going wrong if this ideal transformation applies on more cases. >> In pseudo code, if someone writes: >> if ( some_condition ) { x = -x; } >> if ( x == 0 ) { do_something(); } >> The negation in 1st if-clause could always be eliminated whatever the >> condition is. 
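
The equivalence argued in the quoted paragraph can be checked with a short sketch. The class and method names are hypothetical, and `remViaMaskAndNegate` is only one plausible shape for a division-free `x % 4` (mask the magnitude, then negate for negative `x`); it is not claimed to be the exact code C2 emits. The point is that a result compared only against zero does not need the negation, because `-r == 0` exactly when `r == 0`.

```java
class DivisibleByPow2Sketch {
    // Reference form: signed remainder, then a zero check.
    static boolean divisibleByRem(int x) {
        return x % 4 == 0;
    }

    // Mask form: the low two bits decide divisibility by 4 for any sign.
    static boolean divisibleByMask(int x) {
        return (x & 3) == 0;
    }

    // Division-free remainder: mask the magnitude, then restore the sign
    // with a conditional negation (needed only if the value itself is used).
    static int remViaMaskAndNegate(int x) {
        int r = (x < 0 ? -x : x) & 3;
        return x < 0 ? -r : r;
    }

    public static void main(String[] args) {
        // Same dividends as the test case attached in this thread.
        int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
        for (int x : dividends) {
            if (remViaMaskAndNegate(x) != x % 4) {
                throw new AssertionError("remainder mismatch for " + x);
            }
            if (divisibleByRem(x) != divisibleByMask(x)) {
                throw new AssertionError("zero-check mismatch for " + x);
            }
        }
        System.out.println("all forms agree");
    }
}
```

When the remainder is used elsewhere, as in `testModulo` in the attached test case, the negation still matters for the value and only the zero check can skip it.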
>> >> -- >> Thanks, >> Pengfei >> >> >> -- my test case attached below -- >> public class Foo { >> >> public static void main(String[] args) { >> int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 }; >> for (int i = 0; i < dividends.length; i++) { >> int x = dividends[i]; >> System.out.println(testDivisible(x)); >> System.out.println(testModulo(x)); >> testCondNeg(x); >> } >> return; >> } >> >> public static int testDivisible(int x) { >> // Modulo result is only for zero check >> if (x % 4 == 0) { >> return 444; >> } >> return 555; >> } >> >> public static int testModulo(int x) { >> int y = x % 4; >> if (y == 0) { >> return 222; >> } >> // Modulo result is used elsewhere >> System.out.println(y); >> return 333; >> } >> >> public static void testCondNeg(int x) { >> // Pure conditional negation >> if (printAndIfNeg(x)) { >> x = -x; >> } >> if (x == 0) { >> System.out.println("zero!"); >> } >> } >> >> static boolean printAndIfNeg(int x) { >> System.out.println(x); >> return x <= 0; >> } >> } From Alan.Bateman at oracle.com Mon Sep 10 18:05:18 2018 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 10 Sep 2018 19:05:18 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <50ed4716-b76e-6557-1146-03084776c160@redhat.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> Message-ID: <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> On 20/08/2018 16:18, Andrew Dinn wrote: > Hi Alan, > > Round 4: > > I have redrafted the JEP and updated the implementation in the light of > your last feedback: > > JEP JIRA: https://bugs.openjdk.java.net/browse/JDK-8207851 > > Formatted JEP: http://openjdk.java.net/jeps/8207851 > > New webrev: http://cr.openjdk.java.net/~adinn/pmem/webrev.04/ > > The updated JEP looks much better. 
I realize we've been through several iterations on this but I'm now wondering if the MappedByteBuffer is the right API. As you've shown, it's straightforward to map a region of NVM and use the existing API, I'm just not sure if it's the right API. I think I'd like to see a few examples of how the API might be used. ByteBuffers aren't intended for use by concurrent threads and I just wonder if the examples might need that. I also wonder if there is a possible connection with work in Project Panama and whether it's worth exploring if its scopes and pointers could be backed by NVM.

The Risks and Assumptions section mentions the 2GB limit, which is another reminder that the MBB API may not be the right API.

The 2-arg force method to msync a region makes sense, although it might be more consistent for the second parameter to be the length rather than the end offset.

A detail for later is whether UOE might be more appropriate for implementations that do not support the XXX_PERSISTENT modes.

-Alan.

From aph at redhat.com Mon Sep 10 18:29:47 2018
From: aph at redhat.com (Andrew Haley)
Date: Mon, 10 Sep 2018 19:29:47 +0100
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To:
References:
Message-ID:

On 09/10/2018 04:59 PM, Schmidt, Lutz wrote:
> To my understanding, putInt() is not designed for unaligned accesses. Even "worse", it relies on the store address to be at least 4-byte aligned. That's what I learn e.g. from http://www.docjar.com/docs/api/sun/misc/Unsafe.html. And that's the reason why your code (sometimes) destroys the length field of the ArrayObject header.

Exactly: user error, don't do that. The doc is clear, I think.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Mon Sep 10 19:34:55 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 12:34:55 -0700
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To:
References:
Message-ID: <3347f875-74a2-47a9-9108-3e1685107423@oracle.com>

Thank you, Andy

Unfortunately, your change may leave uninitialized (not zeroed) bytes in the object. Instead, unaligned stores should be treated as subword stores:

diff -r b9f6a4427da9 src/hotspot/share/opto/memnode.cpp
--- a/src/hotspot/share/opto/memnode.cpp
+++ b/src/hotspot/share/opto/memnode.cpp
@@ -4095,10 +4095,11 @@
    // See if this store needs a zero before it or under it.
    intptr_t zeroes_needed = st_off;
 
-   if (st_size < BytesPerInt) {
+   if (st_size < BytesPerInt || (zeroes_needed % BytesPerInt) != 0) {
      // Look for subword stores which only partially initialize words.
      // If we find some, we must lay down some word-level zeroes first,
      // underneath the subword stores.
+     // Do the same for unaligned stores.
      //
      // Examples:
      //   byte[] a = { p,q,r,s }  =>  a[0]=p,a[1]=q,a[2]=r,a[3]=s

Rahul, the bug is assigned to you. Please test this solution.

Thanks,
Vladimir

On 9/10/18 4:10 AM, Andy Law wrote:
> This change is only about:
> Disabling the unaligned C2 `clear_memory()` optimization when using Unsafe to write to an unaligned address.
>
> ```
> java -version
> openjdk version "1.8.0-internal-debug"
> OpenJDK Runtime Environment (build 1.8.0-internal-debug-***_2018_09_03_19_31-b00)
> OpenJDK 64-Bit Server VM (build 25.71-b00-debug, mixed mode)
> ```
>
> This issue 8202414 is about:
> ArrayObjects of -XX:+UseCompressedOops on 64-bit has a 12 bits header and a 4 bits length. So the length address is from 12th to 16th bytes.
> If we use Unsafe.putInt() to write at the 17th bit, the C2 `clear_memory()` will mistakenly do `done_offset -= BytesPerInt;`, then `done_offset` will become 13.
It will then clear memory starting from the 13th bit, which changes the array length to a different value. When a GC happens, it will crash.
>
> I didn't find unaligned memory support in `clear_memory()`, so I only made a small fix to keep the effect minimal:
> When Unsafe.put*() writes to an unaligned address as above, it causes the assert to fail. So when it fails, we don't do any optimization instead, and the problem is solved.
>
> I don't know if it is a good solution? It is only 3 lines of code, so please have a look:) Thank you!
>
>
> Andy
>

From dean.long at oracle.com Mon Sep 10 20:08:02 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 10 Sep 2018 13:08:02 -0700
Subject: RFR(XS) 8210434: [Graal] 8209301 prevents GitHub Graal from compiling with latest JDK
Message-ID: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com>

http://cr.openjdk.java.net/~dlong/8210434/webrev/
https://bugs.openjdk.java.net/browse/JDK-8210434

This change reverts the 8209301 rename in AOTCompiledClass and adds back HotSpotResolvedObjectType.isAnonymous to preserve compatibility.

dl

From cthalinger at twitter.com Mon Sep 10 20:17:44 2018
From: cthalinger at twitter.com (Christian Thalinger)
Date: Mon, 10 Sep 2018 22:17:44 +0200
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To:
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com> <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com> <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com> <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com> <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com>
Message-ID: <4F954DE5-DD8C-4395-8E40-6D341C42649C@twitter.com>

> On Sep 6, 2018, at 1:53 AM, Gustavo Romero wrote:
>
> On 09/05/2018 07:54 PM, Vladimir Kozlov wrote:
>> v3 looks good.
>
> Thanks a lot Vladimir.
>
> @Goetz, would you mind reviewing v3 please?

Is he on vacation? :-)

> It touches code meant for AIX but
> I don't expect any change in the end.
> > http://cr.openjdk.java.net/~gromero/8209972/v3/ > > Thank you. > > > Best regards, > Gustavo > >> Thanks, >> Vladimir >> On 9/5/18 3:18 PM, Gustavo Romero wrote: >>> Hi Vladimir, >>> >>> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote: >>>> Thank you Gustavo for detailed answer. >>>> >>>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now. >>> >>> Thanks for reviewing it! >>> >>> >>>> About property check (flavor == "server" & !emulatedClient) in tests. If C2 (and Graal in a future when it supports RTM) is used, this check is always true because C2 (and Graal) is only available in Server VM and it is disabled in emulatedClient setting. This check was added before we have C2 usage check which can be done now with vm.rtm.compiler. >>> >>> Thanks, I was not aware of it. I've updated the webrev removing >>> "flavor == "server" & !emulatedClient": >>> >>> http://cr.openjdk.java.net/~gromero/8209972/v3/ >>> >>> "hg diff --patience": >>> >>> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff >>> >>> Testing (on Linux): >>> >>> ** X86_64 w/ CPU+OS RTM support + Graal VM ** >>> Test results: no tests selected (all RTM tests skipped) >>> >>> ** POWER8 w/ CPU+OS support ** >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>> Passed: 
compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>> Test results: passed: 30 >>> >>> ** X86_64 w/ CPU+OS support ** >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>> 
Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>> Test results: passed: 30 >>> >>> ** POWER7 wo/ CPU+OS RTM support ** >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: 
compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Test results: passed: 10 >>> >>> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support ** >>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>> Test results: passed: 10 >>> >>> >>> Best regards, >>> Gustavo >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/3/18 3:15 PM, Gustavo Romero wrote: >>>>> Hi Vladimir, >>>>> >>>>> Thanks a lot for reviewing it and for your comments. >>>>> >>>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote: >>>>>> Hi Gustavo, >>>>>> >>>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with TieredStopAtLevel < 4 flag >>>>> >>>>> Yes, although currently afaics all tests will explicitly enabled C2 (for >>>>> instance, passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2 >>>>> through a warming up before testing, I agree that nothing forbids one to >>>>> switch off C2 with TieredStopAtLevel = level < 4 for a test case. 
It also >>>>> looks better to list explicitly which compilers do support RTM instead of >>>>> the ones that don't support it. >>>>> >>>>> I've updated the webrev accordingly: >>>>> >>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/ >>>>> >>>>> diff in there looks odd so I generated another one with --patience for a >>>>> better (IMO) diff format: >>>>> >>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff >>>>> >>>>> >>>>>> Also can platforms check be replaced with one vmRTMCPU()? Is ppc64 return true from vmRTMCPU()? >>>>> >>>>> For example, on Linux the following cases are possible regarding CPU / OS >>>>> RTM support: >>>>> >>>>> POWER7 : cpu = false, os = false => vm.rtm.cpu = false >>>>> POWER8 : cpu = true, os = false | true => vm.rtm.cpu = false | true >>>>> POWER9 VM: cpu = true, os = false | true => vm.rtm.cpu = false | true >>>>> POWER9 NV: cpu = true, os = false => vm.rtm.cpu = false >>>>> >>>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support >>>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it >>>>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies >>>>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise >>>>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for >>>>> Linux and for AIX. >>>>> >>>>> That said I don't think that the platforms check can be replaced with one >>>>> vmRTMCPU(), because in some cases it's necessary to run a test for >>>>> cpu = false and compiler = true, i.e. it's necessary to run a test on an >>>>> unsupported CPU for a given platform _only if_ the compiler in use supports >>>>> RTM (like C2). 
So if, for instance, we do:
>>>>>
>>>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires
>>>>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation
>>>>> returns 'false' for cpu = false and compiler = true, skipping the test
>>>>> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler'
>>>>> as 'true' and run the test in that case one could match for
>>>>> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will
>>>>> be evaluated as 'true' and the test will run even though the Graal
>>>>> compiler is selected, which is wrong.
>>>>>
>>>>> Hence the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must
>>>>> contain its own list of supported compilers with RTM support for each
>>>>> platform IMO. Basically we can't ask the JVM about the compiler's support
>>>>> for RTM, since the JVM can only tell us about the CPU+OS support for RTM
>>>>> regarding the CPU and OS on which the JVM is running.
>>>>>
>>>>>
>>>>>> And since you check cpu in here why not replace all vm.rtm.* with one vm.rtm.supported? In such a case you would need only one @requires check in tests instead of:
>>>>>>
>>>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>>>>
>>>>> I think it's not possible either.
Currently there are 5 match cases in >>>>> RTM tests: >>>>> >>>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u >>>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os) >>>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os >>>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient) >>>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient) >>>>> >>>>> which can be simplified 5 cases as: >>>>> >>>>> 1: !(flavor == "server" & !emulatedClient & cpu & os) >>>>> 2: flavor == "server" & !emulatedClient & cpu & os >>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>> 5: no @requires >>>>> >>>>> I understand that case 1 and 2 (since CPU implies OS) can be simplified as: >>>>> >>>>> >>>>> 1: !(flavor == "server" & !emulatedClient & cpu) >>>>> 2: flavor == "server" & !emulatedClient & cpu >>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>> 5: no @requires >>>>> >>>>> and case 1 and 2 are mere opposites, so we have 4 cases: >>>>> >>>>> 1: !(flavor == "server" & !emulatedClient & cpu) >>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>> 5: no @requires >>>>> >>>>> We could simplify further making P = (flavor == "server" & !emulatedClient), >>>>> and make: >>>>> >>>>> 1: !(P & cpu) >>>>> 3: (!cpu) & (P) >>>>> 4: cpu & !(P) >>>>> 5: no @requires >>>>> >>>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in >>>>> order to control running the tests only if the selected compiler on a >>>>> given platform has RTM support (skipping Graal, for instance): >>>>> >>>>> 1: !(P & cpu) & compiler >>>>> 3: (!cpu) & (P) & compiler >>>>> 4: cpu & !(P) & compiler >>>>> 5: no @requires & compiler >>>>> >>>>> So it 
looks like that at minimum we would need 3 properties, but IMO it's >>>>> not worth to add another property P = (flavor == "server" & !emulatedClient) >>>>> just to simplify further the @requires line. >>>>> >>>>> In summing up, I think it's only possible to replace 'cpu & os' by 'cpu', >>>>> so I updated the webrev removing the vm.rtm.os property and keeping only >>>>> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks). >>>>> >>>>> I've tested the following scenarios and observed no regression [1]: >>>>> >>>>> 1. X86_64 w/ RTM >>>>> 2. X86_64 w/ RTM + Graal enabled >>>>> 3. POWER7: no CPU+OS support for RTM >>>>> 4. POWER8: CPU+OS support for RTM >>>>> >>>>> But I think we need a confirmation from SAP about AIX. >>>>> >>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>> [1] >>>>> >>>>> ** X86_64 w/ RTM ** >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>> Passed: 
compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>> Test results: passed: 30 >>>>> >>>>> >>>>> ** X86_64 w/ RTM + Graal enabled ** >>>>> Test results: no tests selected (all RTM tests skipped) >>>>> >>>>> >>>>> ** POWER7: no CPU+OS support for RTM ** >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Test results: passed: 10 >>>>> >>>>> >>>>> ** POWER8: CPU+OS support for RTM ** >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>> Passed: 
compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>> Test results: passed: 30 
>>>>> >>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Could the following small change be reviewed please? >>>>>>> >>>>>>> Bug : https://bugs.openjdk.java.net/browse/JDK-8209972 >>>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/ >>>>>>> >>>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal) >>>>>>> is selected on platforms that can have CPU/OS with RTM support. >>>>>>> >>>>>>> It also disables all RTM tests for any other platform that has not a single >>>>>>> compiler supporting RTM. >>>>>>> >>>>>>> The RTM support was first added to C2 compiler and once checkers for RTM >>>>>>> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM they >>>>>>> assume that a compiler supporting RTM is available for sure ("rtm" is >>>>>>> advertised only if RTM is supported by both CPU and OS). Later the JVM >>>>>>> began to allow the selection of a compiler different from C2, like Graal, >>>>>>> and it became possible to select a compiler without RTM support despite the >>>>>>> fact that both the CPU and the OS support RTM. Thus for platforms >>>>>>> supporting Graal or any other specific compiler the compiler availability for >>>>>>> the RTM tests must be adjusted and if the selected compiler does not >>>>>>> support RTM then all RTM tests must be skipped, including the ones meant >>>>>>> for platforms without CPU or OS RTM support (e.g. *Unsupported*.java) >>>>>>> because in some cases, like in TestUseRTMLockingOptionOnUnsupportedCPU.java, >>>>>>> the test expects JVM initialization errors that will never occur because the >>>>>>> problem is not that the RTM support for CPU or OS is missing, but rather >>>>>>> because the selected compiler does not support RTM. 
>>>>>>> >>>>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to >>>>>>> filter out compilers without RTM support for specific platforms and adapts >>>>>>> the current RTM tests to use that new property. >>>>>>> >>>>>>> Nothing changes regarding the number of passing/selected tests for the >>>>>>> various cpu/os/compiler combinations on platforms that currently might >>>>>>> support RTM [1], except when Graal is in use. >>>>>>> >>>>>>> Thank you. >>>>>>> >>>>>>> Best regards, >>>>>>> Gustavo >>>>>>> >>>>>>> >>>>>>> [1] >>>>>>> >>>>>>> ** X64 w/ CPU and OS supporting RTM ** >>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>>> 
Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>>> Test results: passed: 30 >>>>>>> >>>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support ** >>>>>>> Test results: no tests selected (all RTM tests skipped) >>>>>>> >>>>>>> ** POWER8 w/ CPU and OS supporting RTM ** >>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>>> Passed: 
compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>>> Test results: passed: 30 >>>>>>> >>>>>>> ** POWER7 wo/ CPU and OS supporting RTM ** >>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>> 
Test results: passed: 10 >>>>>>> >>>>>> >>>>> >>>> >>> > From doug.simon at oracle.com Mon Sep 10 20:33:17 2018 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 10 Sep 2018 22:33:17 +0200 Subject: RFR(XS) 8210434: [Graal] 8209301 prevents GitHub Graal from compiling with latest JDK In-Reply-To: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com> References: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com> Message-ID: <7E57CA1D-4B3F-46A2-94AD-6CE169093E94@oracle.com> Looks good to me. > On 10 Sep 2018, at 22:08, dean.long at oracle.com wrote: > > http://cr.openjdk.java.net/~dlong/8210434/webrev/ > https://bugs.openjdk.java.net/browse/JDK-8210434 > > This change reverts the 8209301 rename in AOTCompiledClass and adds back HotSpotResolvedObjectType.isAnonymous to preserve compatibility. > > dl From vladimir.kozlov at oracle.com Mon Sep 10 21:23:50 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Sep 2018 14:23:50 -0700 Subject: RFR(XS) 8210434: [Graal] 8209301 prevents GitHub Graal from compiling with latest JDK In-Reply-To: <7E57CA1D-4B3F-46A2-94AD-6CE169093E94@oracle.com> References: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com> <7E57CA1D-4B3F-46A2-94AD-6CE169093E94@oracle.com> Message-ID: <492e8d6b-1897-8dd4-5d79-4ee2a7c1f60f@oracle.com> +1 Thanks, Vladimir On 9/10/18 1:33 PM, Doug Simon wrote: > Looks good to me. > >> On 10 Sep 2018, at 22:08, dean.long at oracle.com wrote: >> >> http://cr.openjdk.java.net/~dlong/8210434/webrev/ >> https://bugs.openjdk.java.net/browse/JDK-8210434 >> >> This change reverts the 8209301 rename in AOTCompiledClass and adds back HotSpotResolvedObjectType.isAnonymous to preserve compatibility. 
>> >> dl > From dean.long at oracle.com Mon Sep 10 23:18:07 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 10 Sep 2018 16:18:07 -0700 Subject: RFR(XS) 8210434: [Graal] 8209301 prevents GitHub Graal from compiling with latest JDK In-Reply-To: <492e8d6b-1897-8dd4-5d79-4ee2a7c1f60f@oracle.com> References: <87209ade-9c45-2edd-3370-31a447e38588@oracle.com> <7E57CA1D-4B3F-46A2-94AD-6CE169093E94@oracle.com> <492e8d6b-1897-8dd4-5d79-4ee2a7c1f60f@oracle.com> Message-ID: Thanks Doug and Vladimir. dl On 9/10/18 2:23 PM, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir > > On 9/10/18 1:33 PM, Doug Simon wrote: >> Looks good to me. >> >>> On 10 Sep 2018, at 22:08, dean.long at oracle.com wrote: >>> >>> http://cr.openjdk.java.net/~dlong/8210434/webrev/ >>> https://bugs.openjdk.java.net/browse/JDK-8210434 >>> >>> This change reverts the 8209301 rename in AOTCompiledClass and adds >>> back HotSpotResolvedObjectType.isAnonymous to preserve compatibility. >>> >>> dl >> From 944797358 at qq.com Tue Sep 11 00:36:57 2018 From: 944797358 at qq.com (Andy Law) Date: Tue, 11 Sep 2018 08:36:57 +0800 Subject: [PATCH] 8202414: Unsafe crash in C2 Message-ID: Hi Lutz and Andrew, Thank you for your reply and sorry for my typos :) TL;DR I think it is the optimization of `clear_memory()`which cause this problem, in my understanding it may not be a user fault :) When running the example on the bug list using `-XX:DisableIntrinsic=_putInt`, or use interpreter/C1 only will make this problem go away, due to the fact that program will go to another branch. 
In function `clear_memory()`, it will make an optimization which will clear the contents of the memory, in fact

    if (done_offset > start_offset) {  // [1]
        // it will clear the memory from start_offset to done_offset
    }

    if (done_offset < end_offset) {  // [2]
        // it will clear the memory by using an int (0) to clear the memory at done_offset
    }

|<--------------- 16-byte header --------------->|
          |   0000   0001   1000   1101   |

If it is aligned, it won't have any problem, but if it is not aligned, as in the example, this optimization will mis-clear the `0000 0001` to `0000 0000`, so the array length becomes 141. Then it will crash when GC happens.

It is the optimization which causes this problem, so skipping the optimization when the address is not aligned may solve the problem.
By the way I didn't find the unaligned message in the doc :( but I think you're right and it should be aligned when using Unsafe, though it is a deprecated API :) It won't be reproduced using the templateInterpreter or the C1 compiler, due to the fact that they won't do this optimization.

Thank you :),
Andy

From 944797358 at qq.com Tue Sep 11 00:42:27 2018
From: 944797358 at qq.com (Andy Law)
Date: Tue, 11 Sep 2018 08:42:27 +0800
Subject: [PATCH] 8202414: Unsafe crash in C2
Message-ID: <1BBBEB69-6301-41EE-AD50-A02AA152D757@qq.com>

Hi Vladimir,

Thank you for your reply :)

However, I think my patch is as below

diff --git a/src/share/vm/opto/memnode.cpp b/src/share/vm/opto/memnode.cpp
--- a/src/share/vm/opto/memnode.cpp
+++ b/src/share/vm/opto/memnode.cpp
@@ -2923,8 +2923,11 @@
     return mem;
   }
 
+  if ((end_offset % BytesPerInt) != 0) {
+    return mem;
+  }
+
   Compile* C = phase->C;
-  assert((end_offset % BytesPerInt) == 0, "odd end offset");
   intptr_t done_offset = end_offset;
   if ((done_offset % BytesPerLong) != 0) {
     done_offset -= BytesPerInt;

Maybe I mis-submitted some code ...?
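[Editorial sketch, not part of the original message.] To illustrate why an `end_offset` that is not a multiple of `BytesPerInt` is a problem for the rounding shown in the patch context, here is a small Java model of the two steps marked [1] and [2]. The object layout is an assumption made only for illustration: a little-endian 16-byte array header with the length field at bytes 12 to 15, and a zeroing range of [16, 17). These offsets were picked because they reproduce the 397 to 141 symptom reported in this thread; they are not taken from the bug report, and `ClearMemoryModel` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ClearMemoryModel {
    static final int BYTES_PER_INT = 4;
    static final int BYTES_PER_LONG = 8;

    // Models the constant-offset rounding: bulk-clear up to doneOffset,
    // then finish with a single 32-bit zero store at doneOffset.
    static void clearMemory(ByteBuffer obj, int startOffset, int endOffset) {
        int doneOffset = endOffset;
        if (doneOffset % BYTES_PER_LONG != 0) {
            doneOffset -= BYTES_PER_INT;
        }
        if (doneOffset > startOffset) {      // [1] bulk clear of [start, done)
            for (int i = startOffset; i < doneOffset; i++) {
                obj.put(i, (byte) 0);
            }
        }
        if (doneOffset < endOffset) {        // [2] trailing int store
            obj.putInt(doneOffset, 0);
        }
    }

    // Builds a fake 32-byte object, stores the array length at the ASSUMED
    // offset 12, runs the model, and reads the length back.
    static int lengthAfterClear(int length, int startOffset, int endOffset) {
        ByteBuffer obj = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        obj.putInt(12, length);
        clearMemory(obj, startOffset, endOffset);
        return obj.getInt(12);
    }

    public static void main(String[] args) {
        // Aligned end: only bytes 16..23 are cleared, the length survives.
        System.out.println(lengthAfterClear(397, 16, 24));
        // Unaligned end: doneOffset becomes 13, so the int store at [2]
        // reaches back into the header and zeroes the 0000 0001 byte.
        System.out.println(lengthAfterClear(397, 16, 17));
    }
}
```

Under these assumptions the second call returns 141, because 397 is 0x018D and losing the 0x01 byte leaves 0x8D. The model says nothing about which fix is right; it only shows that the trailing int store can land below `start_offset` once `end_offset` is odd.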
Sorry for bothering :(

Thanks,
Andy

From vladimir.kozlov at oracle.com Tue Sep 11 01:08:38 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 18:08:38 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
Message-ID:

Very nice. Thank you, Sandhya.

I would like to see more meaningful naming in .ad files: instead of rreg* and ovec*, have something like vlReg* and legVec*.

New load_from_* and load_to_* instructions in .ad files should be renamed as follows and collocated with the other Move*_reg_reg* instructions:

instruct MoveF2VL(vlRegF dst, regF src)
instruct MoveVL2F(regF dst, vlRegF src)

You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?

Also please explain why these registers are used when UseAVX == 0:

+instruct absD_reg(rregD dst) %{
   predicate((UseSSE>=2) && (UseAVX == 0));

We switch off evex, so regular regD (only legacy registers in this case) should work too:

 661   if (UseAVX < 3) {
 662     _features &= ~CPU_AVX512F;

The next checks could be combined by using a new function in vm_version_x86.hpp (you already have some):

+reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, vectors_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} );

I would suggest testing these changes on different machines (non-avx512 and avx512) and with different UseAVX values.
Thanks,
Vladimir

On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
> Recently there have been couple of high priority issues with regards to high bank of XMM register
> (XMM16-XMM31) usage by C2:
>
> https://bugs.openjdk.java.net/browse/JDK-8207746
>
> https://bugs.openjdk.java.net/browse/JDK-8209735
>
> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>
> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>
> The patch provides a restricted set of registers to the match rules in the ad file based on the
> underlying architecture.
>
> The aim is to remove special handling/workaround from macro assembler and assembler.
>
> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
>
> Your review and feedback is very welcome.
>
> Best Regards,
>
> Sandhya
>

From vladimir.kozlov at oracle.com Tue Sep 11 01:20:17 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Sep 2018 18:20:17 -0700
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To: <1BBBEB69-6301-41EE-AD50-A02AA152D757@qq.com>
References: <1BBBEB69-6301-41EE-AD50-A02AA152D757@qq.com>
Message-ID: <94b45235-1493-bb55-2d7d-ea90350a91ab@oracle.com>

Hi Andy,

What I sent is *my* suggested fix because I think your fix (below) is not correct. InitializeNode::complete_stores() assumes that the call to ClearArrayNode::clear_memory() will generate code to zero that part of the object, and your change does not generate such code.
Thanks,
Vladimir

On 9/10/18 5:42 PM, Andy Law wrote:
> Hi Vladimir,
>
> Thank you for your reply:)
>
> However, I think my patch is as below
>
> diff --git a/src/share/vm/opto/memnode.cpp b/src/share/vm/opto/memnode.cpp
> --- a/src/share/vm/opto/memnode.cpp
> +++ b/src/share/vm/opto/memnode.cpp
> @@ -2923,8 +2923,11 @@
>      return mem;
>    }
>
> +  if ((end_offset % BytesPerInt) != 0) {
> +    return mem;
> +  }
> +
>    Compile* C = phase->C;
> -  assert((end_offset % BytesPerInt) == 0, "odd end offset");
>    intptr_t done_offset = end_offset;
>    if ((done_offset % BytesPerLong) != 0) {
>      done_offset -= BytesPerInt;
>
> Maybe I mis-submitted some code ...?
> Sorry for bothering :(
>
> Thanks,
> Andy
>
>

From dean.long at oracle.com Tue Sep 11 03:12:40 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 10 Sep 2018 20:12:40 -0700
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To:
References:
Message-ID:

Hi Andy. Did you notice the difference between these two intrinsics:

  case vmIntrinsics::_putInt:          return inline_unsafe_access(is_store, T_INT, Relaxed, false);
  [...]
  case vmIntrinsics::_putIntUnaligned: return inline_unsafe_access(is_store, T_INT, Relaxed, true);

The last argument (bool unaligned) for _putInt is saying that it does not support unaligned accesses. Looking at jdk.internal.misc.Unsafe instead of sun.misc.Unsafe should make the difference clearer.

dl

On 9/10/18 5:36 PM, Andy Law wrote:
> Hi Lutz and Andrew,
>
> Thank you for your reply and sorry for my typos :)
>
> TL;DR
> I think it is the optimization of `clear_memory()` which causes this
> problem, in my understanding it may not be a user fault :)
>
> When running the example on the bug list using
> `-XX:DisableIntrinsic=_putInt`, or use interpreter/C1 only will make
> this problem go away, due to the fact that program will go to another
> branch.
>
> In function `clear_memory()`, it will make an optimization which will
> clear the contents of the memory, in fact
>
>     if (done_offset > start_offset) {  // [1]
>         // it will clear the memory from start_offset to done_offset
>     }
>
>     if (done_offset < end_offset) {  // [2]
>         // it will clear the memory by using an int (0) to clear the
>         // memory at done_offset
>     }
>
> |<--------------- 16-byte header --------------->| byte[397]) --->
>           |   0000   0001   1000   1101   |
> If it is aligned, it won't have any problem, but if it is not aligned,
> as in the example, this optimization will mis-clear the `0000 0001` to
> `0000 0000`, so the array length becomes 141. Then it will crash when
> GC happens.
>
> It is the optimization which causes this problem, so skipping the
> optimization when the address is not aligned may solve the problem.
> By the way I didn't find the unaligned message in the doc :( but I
> think you're right and it should be aligned when using Unsafe, though
> it is a deprecated API :) It won't be reproduced using the
> templateInterpreter or the C1 compiler, due to the fact that they won't
> do this optimization.
>
> Thank you :),
> Andy

From 944797358 at qq.com Tue Sep 11 03:22:52 2018
From: 944797358 at qq.com (Andy Law)
Date: Tue, 11 Sep 2018 11:22:52 +0800
Subject: [PATCH] 8202414: Unsafe crash in C2
In-Reply-To:
References:
Message-ID: <4FEA91FB-DAB2-4F69-862B-383F16DA31E7@qq.com>

Hi Dean,

Thanks for pointing it out, I didn't notice it before because I mainly use OpenJDK 8. Now I get it, thank you :)

Andy

> On Sep 11, 2018, at 11:12, dean.long at oracle.com wrote:
>
> :_putIntUnaligned

From jayaprabhakar at gmail.com Tue Sep 11 06:22:51 2018
From: jayaprabhakar at gmail.com (jayaprabhakar k)
Date: Mon, 10 Sep 2018 23:22:51 -0700
Subject: Any way to avoid JIT overhead for small programs when using AOT?
In-Reply-To: References: Message-ID: On Mon, 10 Sep 2018 at 01:40, wrote: > Send hotspot-compiler-dev mailing list submissions to > hotspot-compiler-dev at openjdk.java.net > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.openjdk.java.net/mailman/listinfo/hotspot-compiler-dev > or, via email, send a message with subject or body 'help' to > hotspot-compiler-dev-request at openjdk.java.net > > You can reach the person managing the list at > hotspot-compiler-dev-owner at openjdk.java.net > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of hotspot-compiler-dev digest..." > > > Today's Topics: > > 1. Any way to avoid JIT overhead for small programs when using > AOT? (jayaprabhakar k) > 2. [PING] RE: RFR(S): 8210152: Optimize integer divisible by > power-of-2 check (Pengfei Li (Arm Technology China)) > 3. Re: Any way to avoid JIT overhead for small programs when > using AOT? (dean.long at oracle.com) > 4. Re: Any way to avoid JIT overhead for small programs when > using AOT? (Andrew Haley) > 5. JIT: C2 doesn't skip post barrier for new allocated objects > (Kuai Wei) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 9 Sep 2018 20:58:02 -0700 > From: jayaprabhakar k > To: hotspot-compiler-dev at openjdk.java.net > Subject: Any way to avoid JIT overhead for small programs when using > AOT? > Message-ID: > Si+mtt7YVS2tAtMfD9MWhxOJkHky9T_i68EMHWvgHgEioQ at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hi, > I understand that at present AOT and -Xint are not compatible. I see the > code explicitly disables AOT when -Xint is set > < > http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp > > > . > > For extremely short programs, typically used by beginners learning Java, I > see that CDS, AOT and Xint all help reduce the startup time. 
While CDS > works with both AOT and Xint, multiplying the benefits, AOT and Xint do > not. > > Is there a way to keep both AOT + Xint, For classes/methods that are > precompiled, use AOT code, and for others just interpret? If not now, would > it be possible in the future? > > Thanks, > JP > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180909/86bb6624/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Mon, 10 Sep 2018 04:24:16 +0000 > From: "Pengfei Li (Arm Technology China)" > To: "dean.long at oracle.com" , Vladimir Kozlov > , " > hotspot-compiler-dev at openjdk.java.net" > , > "hotspot-dev at openjdk.java.net" , > "Pengfei Li (Arm Technology China)" > Cc: nd > Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by > power-of-2 check > Message-ID: > < > DB7PR08MB31150B1D6C7E547538B2B99A96050 at DB7PR08MB3115.eurprd08.prod.outlook.com > > > > Content-Type: text/plain; charset="utf-8" > > Hi Dean / Vladimir / JDK experts, > > Do you have any further questions or comments on this patch? Or should I > make some modifications on it, such as adding some limitations to the > matching condition? > I appreciate your help. > > -- > Thanks, > Pengfei > > > > -----Original Message----- > > From: Pengfei Li (Arm Technology China) > > Sent: Monday, September 3, 2018 13:50 > > To: 'dean.long at oracle.com' ; 'Vladimir Kozlov' > > ; hotspot-compiler-dev at openjdk.java.net; > > hotspot-dev at openjdk.java.net > > Cc: nd > > Subject: RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 > check > > > > Hi Vladimir, Dean, > > > > Thanks for your review. > > > > > I don't see where negation is coming from for 'X % 2 == 0' expression. > > > It should be only 2 instructions: 'cmp (X and 1), 0' > > The 'cmp (X and 1), 0' is just what we expected. 
But there's redundant > > conditional negation coming from the possibly negative X handling in "X > % 2". > > For instance, X = -5, "X % 2" should be -1. So only "(X and 1)" > operation is not > > enough. We have to negate the result. > > > > > I will look on it next week. But it would be nice if you can provide > small test > > to show this issue. > > I've already provided a case of "if (a%2 == 0) { ... }" in JBS > description. What > > code generated and what can be optimized are listed there. > > You could see https://bugs.openjdk.java.net/browse/JDK-8210152 for > details. > > You could also see the test case for this optimization I attached below. > > > > > It looks like your matching may allow more patterns than expected. I > was > > expecting it to look for < 0 or >= 0 for the conditional negation, but I > don't see > > it. > > Yes. I didn't limit the if condition to <0 or >= 0 so it will match more > patterns. > > But nothing is going wrong if this ideal transformation applies on more > cases. > > In pseudo code, if someone writes: > > if ( some_condition ) { x = -x; } > > if ( x == 0 ) { do_something(); } > > The negation in 1st if-clause could always be eliminated whatever the > > condition is. 
> > > > -- > > Thanks, > > Pengfei > > > > > > -- my test case attached below -- > > public class Foo { > > > > public static void main(String[] args) { > > int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 }; > > for (int i = 0; i < dividends.length; i++) { > > int x = dividends[i]; > > System.out.println(testDivisible(x)); > > System.out.println(testModulo(x)); > > testCondNeg(x); > > } > > return; > > } > > > > public static int testDivisible(int x) { > > // Modulo result is only for zero check > > if (x % 4 == 0) { > > return 444; > > } > > return 555; > > } > > > > public static int testModulo(int x) { > > int y = x % 4; > > if (y == 0) { > > return 222; > > } > > // Modulo result is used elsewhere > > System.out.println(y); > > return 333; > > } > > > > public static void testCondNeg(int x) { > > // Pure conditional negation > > if (printAndIfNeg(x)) { > > x = -x; > > } > > if (x == 0) { > > System.out.println("zero!"); > > } > > } > > > > static boolean printAndIfNeg(int x) { > > System.out.println(x); > > return x <= 0; > > } > > } > > ------------------------------ > > Message: 3 > Date: Mon, 10 Sep 2018 01:00:29 -0700 > From: dean.long at oracle.com > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: Any way to avoid JIT overhead for small programs when > using AOT? > Message-ID: > Content-Type: text/plain; charset="utf-8"; Format="flowed" > > On 9/9/18 8:58 PM, jayaprabhakar k wrote: > > Hi, > > I understand that at present AOT and -Xint are not compatible. I see > > the code explicitly disables AOT when -Xint is set > > < > http://cr.openjdk.java.net/%7Ekvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp > >. > > > > For extremely short programs, typically used by beginners learning > > Java, I see that CDS, AOT and Xint all help reduce the startup time. > > While CDS works with both AOT and Xint, multiplying the benefits, AOT > > and Xint do not. 
> > > > Is there a way to keep both AOT?+ Xint, For classes/methods that are > > precompiled, use AOT code, and for others just interpret? If not now, > > would it be possible in the future? > > > > Thanks, > > JP > > Hi JP.? Yes, it could be possible in the future.? One problem is > MethodHandle intrinsics.? With -Xint, there's no code heap, so no place > to generate native adapters for those intrinsics. > > dl > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180910/5f3ec9cd/attachment-0001.html > > > > ------------------------------ > > Message: 4 > Date: Mon, 10 Sep 2018 09:17:59 +0100 > From: Andrew Haley > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: Any way to avoid JIT overhead for small programs when > using AOT? > Message-ID: <2753b70f-67c7-ef7a-ca40-49266f502401 at redhat.com> > Content-Type: text/plain; charset=utf-8 > > On 09/10/2018 04:58 AM, jayaprabhakar k wrote: > > > I understand that at present AOT and -Xint are not compatible. I see the > > code explicitly disables AOT when -Xint is set > > < > http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp > > > > . > > > > For extremely short programs, typically used by beginners learning Java, > I > > see that CDS, AOT and Xint all help reduce the startup time. While CDS > > works with both AOT and Xint, multiplying the benefits, AOT and Xint do > > not. > > > > Is there a way to keep both AOT + Xint, For classes/methods that are > > precompiled, use AOT code, and for others just interpret? If not now, > would > > it be possible in the future? > > Does it significantly help? If you precompile the Java library and your > programs > are extremely short, you'll see very little compilation activity. > Thanks Andrew. I don't see any compilation (The default -XX:CompileThreshold is quite large), but the overhead still seems to be large. 
I ran a small test on AWS T2 instances. The test class just has an empty main method. But I could reproduce the exact same behavior when run with the *--dry-run* command line option. So most of the delay happens on startup.

-- Default --
$ perf stat -e cpu-clock -r50 java -XX:+UseG1GC EmptyMainMethod

 Performance counter stats for 'java -XX:+UseG1GC EmptyMainMethod' (50 runs):

     104.039398 cpu-clock (msec)        ( +- 0.39% )

    0.093801870 seconds time elapsed    ( +- 2.66% )

-- Xint --
$ perf stat -e cpu-clock -r50 java -XX:+UseG1GC -Xint EmptyMainMethod

 Performance counter stats for 'java -XX:+UseG1GC -Xint EmptyMainMethod' (50 runs):

      76.203249 cpu-clock (msec)        ( +- 0.33% )

    0.083464038 seconds time elapsed    ( +- 2.03% )

-- AOT --
$ perf stat -e cpu-clock -r50 java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod

 Performance counter stats for 'java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod' (50 runs):

     102.416037 cpu-clock (msec)        ( +- 0.22% )

    0.083394143 seconds time elapsed    ( +- 0.92% )

-- --
The source code for the test is

public class EmptyMainMethod {
  public static void main(String[] args) {
  }
}
--

This delay seems consistent with most programs created by school students learning Java.

Context for the request: I am the developer of the Codiva.io online Java IDE. Many teachers recommend it for their students to learn Java. To support spiky load, I run the programs on the server in a container with reduced resource limits for each run. At a 10% CPU limit, the difference gets to around 200ms.

>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > > ------------------------------ > > Message: 5 > Date: Mon, 10 Sep 2018 16:39:42 +0800 > From: "Kuai Wei" > To: "hotspot compiler" > Subject: JIT: C2 doesn't skip post barrier for new allocated objects > Message-ID: > <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw at alibaba-inc.com> > Content-Type: text/plain; charset="utf-8" > > > Hi, > > Recently I checked the optimization of reducing G1 post barrier for new > allocated object. But I found it doesn't work as expected. > I wrote a simple test case to store oop in initialize function or just > after init function . > public class StoreTest { > static String val="x"; > > public static Foo testMethod() { > Foo newfoo = new Foo(val); > newfoo.b=val; // the store barrier could be reduced > return newfoo; > } > > public static void main(String []args) { > Foo obj = new Foo(val); // init Foo class > testMethod(); > } > > static class Foo { > Object a; > Object b; > public Foo(Object val) { > this.a=val; // the store barrier could be reduced > }; > } > } > I inline Foo: and Object:: when compile testMethod by C2, so I > think the 2 store marked red don't need post barrier. But I still found > post barrier in generated assembly code. > The test command: java -Xcomp -Xbatch -XX:+UseG1GC > -XX:CompileCommandFile=compile_command -Xbatch -XX:+PrintCompilation > -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining > StoreTest > compile_command: > compileonly, StoreTest::testMethod > compileonly, StoreTest$Foo:: > inline, StoreTest$Foo:: > compileonly, java.lang.Object:: > inline, java.lang.Object:: > print, StoreTest::testMethod > > I checked the node graph in parsing phase. The optimization depends on > GraphKit::just_allocated_object to detect new allocate object. The idea is > to check control of store is control proj of allocation. But in parse phase > , there's a Region node between control proj and control of store. 
The > region just has one input edge. So it could be optimized later. The region > node is generated when C2 inline init method of super class, I think it's > used in exit map to merge all exit path. > > The change is simple, in just_allocated_object, I checked if there's > region node with only 1 input. With the change, we can see good performance > improvement in pressure test. > > Could you check the change and give comments about it? > > graphKit.cpp > // We use this to determine if an object is so "fresh" that > // it does not require card marks. > Node* GraphKit::just_allocated_object(Node* current_control) { > - if (C->recent_alloc_ctl() == current_control) > + Node * ctrl = current_control; > + if (CheckJustAllocatedAggressive) { > + // Object:: is invoked after allocation, most of invoke nodes > + // will be reduced, but a region node is kept in parse time, we check > + // the pattern and skip the region node > + if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) { > + ctrl = ctrl->in(1); > + } > + } > + if (C->recent_alloc_ctl() == ctrl) > return C->recent_alloc_obj(); > return NULL; > } > Thanks, > Kevin > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180910/0f0f7161/attachment.html > > > > End of hotspot-compiler-dev Digest, Vol 136, Issue 30 > ***************************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Tue Sep 11 08:23:40 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 11 Sep 2018 10:23:40 +0200 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: References: Message-ID: > The only comment I have is to add check for SubI on other branch (not > only on True branch). Negation may occur on either branch since you > accept all conditions for negation. 
Can't we make this more general and support a phi with any number of inputs (not only 2 data inputs) as long as it's a mix of X and -X? Roland. From tobias.hartmann at oracle.com Tue Sep 11 08:31:27 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 11 Sep 2018 10:31:27 +0200 Subject: [12] RFR(S): 8210387: C2 compilation fails with "assert(node->_last_del == _last) failed: must have deleted the edge just produced" Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8210387 http://cr.openjdk.java.net/~thartmann/8210387/webrev.00/ During CCP, before removing unreachable regions, we first replace all dead phi users with TOP by calling PhaseIterGVN::replace_node. Replacing a phi node can trigger removal of other nodes some of which might also be phi users of the same region. This breaks verification of the DUIterator because only one node is expected to be removed. Very similar to the code in RegionNode::Ideal [1], we need to refresh the iterator and start iterating from the beginning for as long as there is progress. Thanks, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/file/bbc7157ad9c5/src/hotspot/share/opto/cfgnode.cpp#l538 From adinn at redhat.com Tue Sep 11 09:06:35 2018 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 11 Sep 2018 10:06:35 +0100 Subject: RFR: 8210578: AArch64: Invalid encoding for fmlsvs instruction Message-ID: <80d98039-c412-b638-c764-0794da1ba26f@redhat.com> Can I please have a review for this trivial patch to correct the encoding for fmlsvs. 
JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8210578 Patch: diff -r bbc7157ad9c5 src/hotspot/cpu/aarch64/assembler_aarch64.hpp --- a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp Tue Sep 11 09:14:36 2018 +0200 +++ b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp Tue Sep 11 09:42:41 2018 +0100 @@ -2356,7 +2356,7 @@ // FMLA/FMLS - Vector - Scalar INSN(fmlavs, 0, 0b0001); - INSN(fmlsvs, 0, 0b0001); + INSN(fmlsvs, 0, 0b0101); // FMULX - Vector - Scalar INSN(fmulxvs, 1, 0b1001); The corrected bit identifies the sub_op which distinguishes a fused add multiply vector by scalar (fmlavs) and add from a fused multiply vector by scalar and subtract (fmlsvs). Testing: It appears that this instruction has never been exercised (by contrast, fmlavs has -- by the power intrinsic I am currently reviewing). All I have done to check this patch is ensure I can rebuild the JVM (there isn't really any opportunity to test it until it is needed in an intrinsic). Can I assume this is trivial enough to be pushed without running a submit job? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From claes.redestad at oracle.com Tue Sep 11 10:39:57 2018 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 11 Sep 2018 12:39:57 +0200 Subject: Any way to avoid JIT overhead for small programs when using AOT? In-Reply-To: References: Message-ID: <447a323d-58bb-2917-3599-d0515c0eef3f@oracle.com> Hi, On 2018-09-11 08:22, jayaprabhakar k wrote: > > > I understand that at present AOT and -Xint are not compatible. I > see the > > code explicitly disables AOT when -Xint is set > > > > > > . > > > > For extremely short programs, typically used by beginners > learning Java, I > > see that CDS, AOT and Xint all help reduce the startup time. 
> While CDS > > works with both AOT and Xint, multiplying the benefits, AOT and > Xint do > > not. > > > > Is there a way to keep both AOT + Xint, For classes/methods that are > > precompiled, use AOT code, and for others just interpret? If not > now, would > > it be possible in the future? > > Does it significantly help? If you precompile the Java library and > your programs > are extremely short, you'll see very little compilation activity. > > Thanks Andrew. > I don't see any compilation (The default -XX:CompileThreshold is quite > large), but the overhead still seems to be large. I ran a small test > on AWS T2 instances. > The test class just has empty main method. But I could reproduce the > exact same behavior when run with *--dry-run* command line option. > > So most of the delay happens on startup. > > -- Default -- > $ perf stat -e cpu-clock -r50 java -XX:+UseG1GC EmptyMainMethod > > Performance counter stats for 'java -XX:+UseG1GC EmptyMainMethod' (50 runs): > > 104.039398 cpu-clock (msec) ( +- 0.39% ) > > 0.093801870 seconds time elapsed ( +- 2.66% ) > > -- Xint -- > perf stat -e cpu-clock -r50 java -XX:+UseG1GC -Xint EmptyMainMethod > > Performance counter stats for 'java -XX:+UseG1GC -Xint EmptyMainMethod' (50 runs): > > 76.203249 cpu-clock (msec) ( +- 0.33% ) > > 0.083464038 seconds time elapsed ( +- 2.03% ) > -- AOT -- > $ perf stat -e cpu-clock -r50 java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod > > Performance counter stats for 'java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod' (50 runs): > > 102.416037 cpu-clock (msec) ( +- 0.22% ) > > 0.083394143 seconds time elapsed ( +- 0.92% ) > -- there might always be some things executed by the interpreter, some of which might get hot enough to trigger compilations. And if you've compiled your AOT library with support for tiered compilation you might also see C2 jobs fired off early. 
You can indirectly avoid some of this by stopping the JIT from trying to go beyond C1 level optimization:

-XX:TieredStopAtLevel=1

In your constrained environment you might also want to limit the number of compiler threads the system could be spinning up to a minimum:

-XX:CICompilerCount=1

With this I see a significant reduction in cpu-clock time on my local machine (recent build from jdk/jdk):

AOT:
         81.064838      cpu-clock (msec)              ( +-  1.13% )
       0.073530160 seconds time elapsed               ( +-  1.05% )

AOT -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1
         54.584255      cpu-clock (msec)              ( +-  1.16% )
       0.054806668 seconds time elapsed               ( +-  1.35% )

There's some I/O and extra linking overhead of starting up with an AOT archive, so -Xint might still outperform on a hello world:

         52.138182      cpu-clock (msec)              ( +-  1.60% )
       0.053423763 seconds time elapsed               ( +-  1.67% )

Generally the static startup overhead of AOT should be amortized rather quickly, say, once you have something that runs for more than a couple of hundred milliseconds.

HTH

/Claes

>
> --
> The source code for the test is
>
> public class EmptyMainMethod {
>   public static void main(String[] args) {
>
>   }
> }
>
>
> --
> This delay seems consistent with most programs created by school
> students learning Java.
>
> Context for the request: I am the developer of Codiva.io online Java
> IDE . Many teachers recommend it for their
> students to learn java. To support spiky load, I run the programs on
> the server on a container with reduced resource limits for each run.
> At 10% CPU limit, the difference gets around 200ms.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From lutz.schmidt at sap.com Tue Sep 11 10:43:37 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 11 Sep 2018 10:43:37 +0000 Subject: [PATCH] 8202414: Unsafe crash in C2 In-Reply-To: References: Message-ID: Hi Andy, unfortunately, we have two mail thread heads on the same topic. So I will try to give a very brief summary: - I agree, it's not your fault. The "user" is InitializeNode::complete_stores(). - clear_memory() is not prepared to handle unaligned (less than BytesPerInt) offsets. - Your patch just leaves the memory uninitialized in case of unaligned offsets. - Vladimir's patch fixes the root cause, i.e. the caller of clear_memory(). - Your patch removes the safety net from clear_memory(). Another reason why I don't like it. In essence, I suggest to go with Vladimir's patch, provided the tests Vladimir requested work out ok: ---BEGIN Vladimir's patch --- diff -r b9f6a4427da9 src/hotspot/share/opto/memnode.cpp --- a/src/hotspot/share/opto/memnode.cpp +++ b/src/hotspot/share/opto/memnode.cpp @@ -4095,10 +4095,11 @@ // See if this store needs a zero before it or under it. intptr_t zeroes_needed = st_off; - if (st_size < BytesPerInt) { + if (st_size < BytesPerInt || (zeroes_needed % BytesPerInt) != 0) { // Look for subword stores which only partially initialize words. // If we find some, we must lay down some word-level zeroes first, // underneath the subword stores. + // Do the same for unaligned stores. // // Examples: // byte[] a = { p,q,r,s } => a[0]=p,a[1]=q,a[2]=r,a[3]=s ---END Vladimir's patch --- BTW, I requested to be precise, so I have to correct myself. The length field of ArrayOop is at offset 12 (@klass_gap_offset_in_bytes()) only if UseCompressedClassPointers is true. It does not depend on UseCompressedOops. Thanks, Lutz From: Andy Law <944797358 at qq.com> Date: Tuesday, 11. 
September 2018 at 02:36
To: "hotspot-compiler-dev at openjdk.java.net" , Lutz Schmidt , "aph at redhat.com"
Subject: Re: [PATCH] 8202414: Unsafe crash in C2

Hi Lutz and Andrew,

Thank you for your reply and sorry for my typos :)

TL;DR
I think it is the optimization of `clear_memory()` which cause this problem, in my understanding it may not be a user fault :)

When running the example on the bug list using `-XX:DisableIntrinsic=_putInt`, or use interpreter/C1 only will make this problem go away, due to the fact that program will go to another branch.

In function `clear_memory()`, it will make an optimization which will clear the context of the memory, in fact

    if (done_offset > start_offset) {  // [1]
        // it will clear the memory from start_offset to done_offset
    }

    if (done_offset < end_offset) {  // [2]
        // it will clear the memory by using a Int (0) to clear the memory of done_offset
    }

|<------------------ 16-byte header ------------------>|
|                          |  0000  0001  1000  1101   |

If it is aligned, it won't have any problem but, if it is not aligned as the example, this optimization will mis-clear the `0000 0001` to `0000 0000`, so the array length becomes 141. Then it will crash when gc happens.

It is the optimization which cause this problem, so when it is not aligned, we don't do this optimization for this unaligned address may solve the problem.
By the way I didn't find the unaligned message on the doc:( but I think you're right and it should be aligned when using Unsafe, though it is an deprecated API :) It won't be reproduced using the templateInterpreter or C1 compiler, due to the fact that they won't do this optimization.
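
The mis-clear described above can be reproduced on a plain byte buffer. The following is only a minimal sketch, not HotSpot code: it assumes a little-endian layout with the 4-byte array length at offset 12 (which, per this thread, holds when UseCompressedClassPointers is true), and the unaligned store offset 13 is chosen purely for illustration. The names `UnalignedClearSketch` and `lengthAfterZeroStore` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Simulation only: a byte buffer stands in for the object header and body.
final class UnalignedClearSketch {
    // Assumed offset of the array length field (illustrative, see lead-in).
    static final int LENGTH_OFFSET = 12;

    // Writes `length` into the simulated header, then performs a 4-byte
    // zero store at `storeOffset`, and returns the length read back.
    static int lengthAfterZeroStore(int length, int storeOffset) {
        ByteBuffer obj = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        obj.putInt(LENGTH_OFFSET, length); // bytes 12..15 hold the length
        obj.putInt(storeOffset, 0);        // zeroes storeOffset..storeOffset+3
        return obj.getInt(LENGTH_OFFSET);
    }

    public static void main(String[] args) {
        // 397 is 0000 0001 1000 1101 in binary, as in the diagram above.
        // A zero store at offset 16 starts after the header: length is intact.
        System.out.println(lengthAfterZeroStore(397, 16)); // prints 397
        // A zero store at offset 13 overlaps bytes 13..15 of the length field,
        // wiping the 0000 0001 byte and leaving only 1000 1101.
        System.out.println(lengthAfterZeroStore(397, 13)); // prints 141
    }
}
```

With these assumptions the second call returns 141, matching the corrupted array length reported above.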
Thank you:), Andy From 944797358 at qq.com Tue Sep 11 11:16:10 2018 From: 944797358 at qq.com (Andy Law) Date: Tue, 11 Sep 2018 19:16:10 +0800 Subject: [PATCH] 8202414: Unsafe crash in C2 In-Reply-To: References: Message-ID: <3CF1BA06-6C2B-4DFC-ACA9-2F082766922B@qq.com> Hi Lutz, Nice summary and I totally agree with your points. Thanks, Andy > On Sep 11, 2018, at 18:43, Schmidt, Lutz wrote: > > Hi Andy, > > unfortunately, we have two mail thread heads on the same topic. So I will try to give a very brief summary: > > - I agree, it's not your fault. The "user" is InitializeNode::complete_stores(). > - clear_memory() is not prepared to handle unaligned (less than BytesPerInt) offsets. > - Your patch just leaves the memory uninitialized in case of unaligned offsets. > - Vladimir's patch fixes the root cause, i.e. the caller of clear_memory(). > - Your patch removes the safety net from clear_memory(). Another reason why I don't like it. > > In essence, I suggest to go with Vladimir's patch, provided the tests Vladimir requested work out ok: > > ---BEGIN Vladimir's patch --- > diff -r b9f6a4427da9 src/hotspot/share/opto/memnode.cpp > --- a/src/hotspot/share/opto/memnode.cpp > +++ b/src/hotspot/share/opto/memnode.cpp > @@ -4095,10 +4095,11 @@ > // See if this store needs a zero before it or under it. > intptr_t zeroes_needed = st_off; > > - if (st_size < BytesPerInt) { > + if (st_size < BytesPerInt || (zeroes_needed % BytesPerInt) != 0) { > // Look for subword stores which only partially initialize words. > // If we find some, we must lay down some word-level zeroes first, > // underneath the subword stores. > + // Do the same for unaligned stores. > // > // Examples: > // byte[] a = { p,q,r,s } => a[0]=p,a[1]=q,a[2]=r,a[3]=s > ---END Vladimir's patch --- > > BTW, I requested to be precise, so I have to correct myself. The length field of ArrayOop is at offset 12 (@klass_gap_offset_in_bytes()) only if UseCompressedClassPointers is true. 
It does not depend on UseCompressedOops. > > Thanks, > Lutz > > From: Andy Law <944797358 at qq.com> > Date: Tuesday, 11. September 2018 at 02:36 > To: "hotspot-compiler-dev at openjdk.java.net" , Lutz Schmidt , "aph at redhat.com" > Subject: Re: [PATCH] 8202414: Unsafe crash in C2 > > Hi Lutz and Andrew, > > Thank you for your reply and sorry for my typos :) > > TL;DR > I think it is the optimization of `clear_memory()` which cause this problem, in my understanding it may not be a user fault :) > > When running the example on the bug list using `-XX:DisableIntrinsic=_putInt`, or use interpreter/C1 only will make this problem go away, due to the fact that program will go to another branch. > > In function `clear_memory()`, it will make an optimization which will clear the context of the memory, in fact > > if (done_offset > start_offset) { // [1] > // it will clear the memory from start_offset to done_offset > } > > if (done_offset < end_offset) { // [2] > // it will clear the memory by using a Int (0) to clear the memory of done_offset > } > > |<--------------- 16-byte header --->| > | 0000 0001 1000 1101 | > If it is aligned, it won't have any problem but, if it is not aligned as the example, this optimization will mis-clear the `0000 0001` to `0000 0000`, so the array length becomes 141. Then it will crash when gc happens. > > It is the optimization which cause this problem, so when it is not aligned, we don't do this optimization for this unaligned address may solve the problem. > By the way I didn't find the unaligned message on the doc:( but I think you're right and it should be aligned when using Unsafe, though it is an deprecated API :) It won't be reproduced using the templateInterpreter or C1 compiler, due to the fact that they won't do this optimization. 
> > Thank you:), > Andy > From vladimir.kozlov at oracle.com Tue Sep 11 16:01:44 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Sep 2018 09:01:44 -0700 Subject: [12] RFR(S): 8210387: C2 compilation fails with "assert(node->_last_del == _last) failed: must have deleted the edge just produced" In-Reply-To: References: Message-ID: <0d4748d9-6d7d-b76f-a950-8040b5911151@oracle.com> Looks good. Thanks, Vladimir On 9/11/18 1:31 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8210387 > http://cr.openjdk.java.net/~thartmann/8210387/webrev.00/ > > During CCP, before removing unreachable regions, we first replace all dead phi users with TOP by > calling PhaseIterGVN::replace_node. Replacing a phi node can trigger removal of other nodes some of > which might also be phi users of the same region. This breaks verification of the DUIterator because > only one node is expected to be removed. Very similar to the code in RegionNode::Ideal [1], we need > to refresh the iterator and start iterating from the beginning for as long as there is progress. > > Thanks, > Tobias > > [1] http://hg.openjdk.java.net/jdk/jdk/file/bbc7157ad9c5/src/hotspot/share/opto/cfgnode.cpp#l538 > From tobias.hartmann at oracle.com Tue Sep 11 16:03:36 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 11 Sep 2018 18:03:36 +0200 Subject: [12] RFR(S): 8210387: C2 compilation fails with "assert(node->_last_del == _last) failed: must have deleted the edge just produced" In-Reply-To: <0d4748d9-6d7d-b76f-a950-8040b5911151@oracle.com> References: <0d4748d9-6d7d-b76f-a950-8040b5911151@oracle.com> Message-ID: <6596277b-b5f8-5fe5-217a-89997a523f23@oracle.com> Thanks Vladimir. Best regards, Tobias On 11.09.2018 18:01, Vladimir Kozlov wrote: > Looks good. 
> > Thanks, > Vladimir > > On 9/11/18 1:31 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8210387 >> http://cr.openjdk.java.net/~thartmann/8210387/webrev.00/ >> >> During CCP, before removing unreachable regions, we first replace all dead phi users with TOP by >> calling PhaseIterGVN::replace_node. Replacing a phi node can trigger removal of other nodes some of >> which might also be phi users of the same region. This breaks verification of the DUIterator because >> only one node is expected to be removed. Very similar to the code in RegionNode::Ideal [1], we need >> to refresh the iterator and start iterating from the beginning for as long as there is progress. >> >> Thanks, >> Tobias >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/bbc7157ad9c5/src/hotspot/share/opto/cfgnode.cpp#l538 >> From vladimir.kozlov at oracle.com Tue Sep 11 17:55:03 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Sep 2018 10:55:03 -0700 Subject: [PATCH] 8202414: Unsafe crash in C2 In-Reply-To: <3CF1BA06-6C2B-4DFC-ACA9-2F082766922B@qq.com> References: <3CF1BA06-6C2B-4DFC-ACA9-2F082766922B@qq.com> Message-ID: Dean have additional comments in bug report regarding unaligned store in general (need to set C2_UNALIGNED). This make me nervous because stores collected by Initialized node are converted to raw stores [1], could be combined and may change such properties. I think we need make sure to not collect such unaligned stores by Initialize node. And in fact we do have such check [2] but in this case store is not marked as unaligned. Because, as Dean pointed before, intrinsic for putInt() do not mark store as unaligned [3]. We can argue that putIntUnaligned() should be used but we can't guarantee that user will use correct one or it is even available as this bug shows. That is why we need to check if store/load is unaligned regardless of which Unsafe method is used. 
At least for cases when offset is constant. I think we need to fix LibraryCallKit::inline_unsafe_access() and also InitializeNode::can_capture_store() because during parse phase offset could be not constant. I am retracting my suggested fix and let someone to work on this. Thanks, Vladimir [1] http://hg.openjdk.java.net/jdk/jdk/file/74dde8b66b7f/src/hotspot/share/opto/memnode.cpp#l3721 [2] http://hg.openjdk.java.net/jdk/jdk/file/74dde8b66b7f/src/hotspot/share/opto/memnode.cpp#l3478 [3] http://hg.openjdk.java.net/jdk/jdk/file/74dde8b66b7f/src/hotspot/share/opto/library_call.cpp#l2315 On 9/11/18 4:16 AM, Andy Law wrote: > Hi Lutz, > > Nice summary and I totally agree with your points. > > Thanks, > Andy > >> On Sep 11, 2018, at 18:43, Schmidt, Lutz wrote: >> >> Hi Andy, >> >> unfortunately, we have two mail thread heads on the same topic. So I will try to give a very brief summary: >> >> - I agree, it's not your fault. The "user" is InitializeNode::complete_stores(). >> - clear_memory() is not prepared to handle unaligned (less than BytesPerInt) offsets. >> - Your patch just leaves the memory uninitialized in case of unaligned offsets. >> - Vladimir's patch fixes the root cause, i.e. the caller of clear_memory(). >> - Your patch removes the safety net from clear_memory(). Another reason why I don't like it. >> >> In essence, I suggest to go with Vladimir's patch, provided the tests Vladimir requested work out ok: >> >> ---BEGIN Vladimir's patch --- >> diff -r b9f6a4427da9 src/hotspot/share/opto/memnode.cpp >> --- a/src/hotspot/share/opto/memnode.cpp >> +++ b/src/hotspot/share/opto/memnode.cpp >> @@ -4095,10 +4095,11 @@ >> // See if this store needs a zero before it or under it. >> intptr_t zeroes_needed = st_off; >> >> - if (st_size < BytesPerInt) { >> + if (st_size < BytesPerInt || (zeroes_needed % BytesPerInt) != 0) { >> // Look for subword stores which only partially initialize words. 
>> // If we find some, we must lay down some word-level zeroes first, >> // underneath the subword stores. >> + // Do the same for unaligned stores. >> // >> // Examples: >> // byte[] a = { p,q,r,s } => a[0]=p,a[1]=q,a[2]=r,a[3]=s >> ---END Vladimir's patch --- >> >> BTW, I requested to be precise, so I have to correct myself. The length field of ArrayOop is at offset 12 (@klass_gap_offset_in_bytes()) only if UseCompressedClassPointers is true. It does not depend on UseCompressedOops. >> >> Thanks, >> Lutz >> >> From: Andy Law <944797358 at qq.com> >> Date: Tuesday, 11. September 2018 at 02:36 >> To: "hotspot-compiler-dev at openjdk.java.net" , Lutz Schmidt , "aph at redhat.com" >> Subject: Re: [PATCH] 8202414: Unsafe crash in C2 >> >> Hi Lutz and Andrew, >> >> Thank you for your reply and sorry for my typos :) >> >> TL;DR >> I think it is the optimization of `clear_memory()` which cause this problem, in my understanding it may not be a user fault :) >> >> When running the example on the bug list using `-XX:DisableIntrinsic=_putInt`, or use interpreter/C1 only will make this problem go away, due to the fact that program will go to another branch. >> >> In function `clear_memory()`, it will make an optimization which will clear the context of the memory, in fact >> >> if (done_offset > start_offset) { // [1] >> // it will clear the memory from start_offset to done_offset >> } >> >> if (done_offset < end_offset) { // [2] >> // it will clear the memory by using a Int (0) to clear the memory of done_offset >> } >> >> |<--------------- 16-byte header --->| >> | 0000 0001 1000 1101 | >> If it is aligned, it won't have any problem but, if it is not aligned as the example, this optimization will mis-clear the `0000 0001` to `0000 0000`, so the array length becomes 141. Then it will crash when gc happens. >> >> It is the optimization which cause this problem, so when it is not aligned, we don't do this optimization for this unaligned address may solve the problem. 
>> By the way I didn't find the unaligned message on the doc:( but I think you're right and it should be aligned when using Unsafe, though it is an deprecated API :) It won't be reproduced using the templateInterpreter or C1 compiler, due to the fact that they won't do this optimization. >> >> Thank you:), >> Andy >> > > > From sandhya.viswanathan at intel.com Tue Sep 11 21:58:43 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 11 Sep 2018 21:58:43 +0000 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> Hi Vladimir, Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, September 10, 2018 6:09 PM To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction Very nice. Thank you, Sandhya. I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*. >>> Yes, accepted. New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate with other Move*_reg_reg* instructions: instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, vlRegF src) >>> Yes, accepted. You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. 
Also please explain why these registers are used when UseAVX == 0?: +instruct absD_reg(rregD dst) %{ predicate((UseSSE>=2) && (UseAVX == 0)); we switch off evex so regular regD (only legacy register in this case) should work too: 661 if (UseAVX < 3) { 662 _features &= ~CPU_AVX512F; >>> Yes, accepted. It could be regD here. Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, +vectors_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && VM_Version::supports_avx512vl() %} ); >>> Yes, accepted. I would suggest to test these changes on different machines (non-avx512 and avx512) and with different UseAVX values. >>> Will do. Thanks, Vladimir On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: > Recently there have been couple of high priority issues with regards > to high bank of XMM register > (XMM16-XMM31) usage by C2: > > https://bugs.openjdk.java.net/browse/JDK-8207746 > > https://bugs.openjdk.java.net/browse/JDK-8209735 > > Please find below a patch which attempts to clean up the XMM register handling by using register groups. > > http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ > > > The patch provides a restricted set of registers to the match rules in > the ad file based on the underlying architecture. > > The aim is to remove special handling/workaround from macro assembler and assembler. > > By removing the special handling, the patch reduces the overall code size by about 1800 lines of code. > > Your review and feedback is very welcome. 
> > Best Regards, > > Sandhya > From vladimir.kozlov at oracle.com Wed Sep 12 00:11:25 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Sep 2018 17:11:25 -0700 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> Message-ID: <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> Thank you. I want to discuss next issue: > You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. This is what I thought. You increase registers pressure this way which may cause spills on stack. Also we don't check that register could be the same as result you may get unneeded moves. I would advice add memory moves at least. An other question. You use movflt() and movdbl() which use either movap[s|d] and movs[s|d] instructions: http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164 Are these instructions work when avx512vl is not available? I see for vectors you use vpxor+vinserti* combination. Last question. I notice next UseAVX check in vectors spills code in x86.ad: if ((UseAVX < 2) || VM_Version::supports_avx512vl()) Should it be (UseAVX < 3)? Thanks, Vladimir On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback. 
> > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, September 10, 2018 6:09 PM > To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > Very nice. Thank you, Sandhya. > > I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*. > >>>> Yes, accepted. > > New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate with other Move*_reg_reg* instructions: > > instruct MoveF2VL(vlRegF dst, regF src) > instruct MoveVL2F(regF dst, vlRegF src) >>>> Yes, accepted. > > You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. > > Also please explain why these registers are used when UseAVX == 0?: > > +instruct absD_reg(rregD dst) %{ > predicate((UseSSE>=2) && (UseAVX == 0)); > > we switch off evex so regular regD (only legacy register in this case) should work too: > 661 if (UseAVX < 3) { > 662 _features &= ~CPU_AVX512F; > >>>> Yes, accepted. It could be regD here. > > Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): > > +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, > +vectors_reg_legacy, %{ > VM_Version::supports_evex() && VM_Version::supports_avx512bw() && VM_Version::supports_avx512dq() && > VM_Version::supports_avx512vl() %} ); > >>>> Yes, accepted. > > I would suggest to test these changes on different machines (non-avx512 and avx512) and with different UseAVX values. > >>>> Will do. 
> > Thanks, > Vladimir > > On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >> Recently there have been couple of high priority issues with regards >> to high bank of XMM register >> (XMM16-XMM31) usage by C2: >> >> https://bugs.openjdk.java.net/browse/JDK-8207746 >> >> https://bugs.openjdk.java.net/browse/JDK-8209735 >> >> Please find below a patch which attempts to clean up the XMM register handling by using register groups. >> >> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >> >> >> The patch provides a restricted set of registers to the match rules in >> the ad file based on the underlying architecture. >> >> The aim is to remove special handling/workaround from macro assembler and assembler. >> >> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code. >> >> Your review and feedback is very welcome. >> >> Best Regards, >> >> Sandhya >> From sandhya.viswanathan at intel.com Wed Sep 12 01:13:32 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 12 Sep 2018 01:13:32 +0000 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> Hi Vladimir, Thanks a lot for the detailed review. I really appreciate your feedback. Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, September 11, 2018 5:11 PM To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction Thank you. 
I want to discuss the next issue: > You did not add instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >>>> Let us take an example, e.g. loading into rregF. First the value gets loaded from memory into regF, and then a register-to-register move puts it into rregF, and vice versa. This is what I thought. You increase register pressure this way, which may cause spills to the stack. Also, we don't check whether the registers could be the same, so as a result you may get unneeded moves. I would advise adding memory moves at least. Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask (matcher.cpp). I would like the register allocator to get all the possible registers on an architecture for the idealreg2reg mask. I was concerned that multiple instruct rules in the .ad file for LoadF from memory might cause problems. I would have to assign a higher cost to loading into a restricted register set like vlReg. Then I decided that the register allocator can handle this much better than I could by adding rules to load from memory. This is with the background that regF always covers all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and whether I should still add the rules to load from memory into vlReg and legVec.
The specific code from matcher.cpp that I am referring to is:
  MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
#endif
  MachNode *spillI = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
  MachNode *spillL = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
  MachNode *spillF = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
  MachNode *spillD = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
  MachNode *spillP = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
  ....
  idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
Another question. You use movflt() and movdbl(), which use either movap[s|d] or movs[s|d] instructions: http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164 Do these instructions work when avx512vl is not available? I see that for vectors you use a vpxor+vinserti* combination. Sandhya >>> Yes, the scalar floating point instructions are available with AVX512 encoding when avx512vl is not available. That is why you would see not just movflt and movdbl but all the other scalar operations like addss, addsd etc. using the entire xmm range (xmm0-31). In other words, they are AVX512F instructions. Last question. I noticed the following UseAVX check in the vector spill code in x86.ad: if ((UseAVX < 2) || VM_Version::supports_avx512vl()) Should it be (UseAVX < 3)? Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. Thanks, Vladimir On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback. 
> > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, September 10, 2018 6:09 PM > To: Viswanathan, Sandhya ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > Very nice. Thank you, Sandhya. > > I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*. > >>>> Yes, accepted. > > New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate with other Move*_reg_reg* instructions: > > instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, > vlRegF src) >>>> Yes, accepted. > > You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. > > Also please explain why these registers are used when UseAVX == 0?: > > +instruct absD_reg(rregD dst) %{ > predicate((UseSSE>=2) && (UseAVX == 0)); > > we switch off evex so regular regD (only legacy register in this case) should work too: > 661 if (UseAVX < 3) { > 662 _features &= ~CPU_AVX512F; > >>>> Yes, accepted. It could be regD here. > > Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): > > +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, > +vectors_reg_legacy, %{ > VM_Version::supports_evex() && VM_Version::supports_avx512bw() && > VM_Version::supports_avx512dq() && > VM_Version::supports_avx512vl() %} ); > >>>> Yes, accepted. > > I would suggest to test these changes on different machines (non-avx512 and avx512) and with different UseAVX values. > >>>> Will do. 
> > Thanks, > Vladimir > > On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >> Recently there have been couple of high priority issues with regards >> to high bank of XMM register >> (XMM16-XMM31) usage by C2: >> >> https://bugs.openjdk.java.net/browse/JDK-8207746 >> >> https://bugs.openjdk.java.net/browse/JDK-8209735 >> >> Please find below a patch which attempts to clean up the XMM register handling by using register groups. >> >> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >> >> >> The patch provides a restricted set of registers to the match rules >> in the ad file based on the underlying architecture. >> >> The aim is to remove special handling/workaround from macro assembler and assembler. >> >> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code. >> >> Your review and feedback is very welcome. >> >> Best Regards, >> >> Sandhya >> From vladimir.kozlov at oracle.com Wed Sep 12 03:53:46 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Sep 2018 20:53:46 -0700 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> Message-ID: Thank you, Sandhya I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. Vladimir On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Thanks a lot for the detailed review. I really appreciate your feedback. > Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice. 
> > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, September 11, 2018 5:11 PM > To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > Thank you. > > I want to discuss next issue: > > > You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? > >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. > > This is what I thought. You increase registers pressure this way which may cause spills on stack. > Also we don't check that register could be the same as result you may get unneeded moves. > > I would advice add memory moves at least. > > Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask (matcher.cpp). I would like the register allocator to get all the possible register on an architecture for idealreg2reg mask. I wondered that multiple instruct rules in .ad file for LoadF from memory might cause problems. I would have to have higher cost for loading into restricted register set like vlReg. Then I decided that the register allocator can handle this in much better way than me adding rules to load from memory. This is with the background that the regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if I should still add the rules to load from memory into vlReg and legVec. 
The specific code from matcher.cpp that I am referring to is: > MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); > #endif > MachNode *spillI = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); > MachNode *spillL = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsO > nlyOnTest, false)); > MachNode *spillF = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); > MachNode *spillD = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); > MachNode *spillP = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); > .... > idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); > > An other question. You use movflt() and movdbl() which use either movap[s|d] and movs[s|d] instructions: > http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164 > Are these instructions work when avx512vl is not available? I see for vectors you use > vpxor+vinserti* combination. > > Sandhya >>> Yes the scalar floating point instructions are available with AVX512 encoding when avx512vl is not available. That is why you would see not just movflt, movdbl but all the other scalar operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions. > > Last question. I notice next UseAVX check in vectors spills code in x86.ad: > if ((UseAVX < 2) || VM_Version::supports_avx512vl()) > > Should it be (UseAVX < 3)? > > Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. > > Thanks, > Vladimir > > On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback. 
>> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Monday, September 10, 2018 6:09 PM >> To: Viswanathan, Sandhya ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> >> Very nice. Thank you, Sandhya. >> >> I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*. >> >>>>> Yes, accepted. >> >> New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate with other Move*_reg_reg* instructions: >> >> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >> vlRegF src) >>>>> Yes, accepted. >> >> You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. >> >> Also please explain why these registers are used when UseAVX == 0?: >> >> +instruct absD_reg(rregD dst) %{ >> predicate((UseSSE>=2) && (UseAVX == 0)); >> >> we switch off evex so regular regD (only legacy register in this case) should work too: >> 661 if (UseAVX < 3) { >> 662 _features &= ~CPU_AVX512F; >> >>>>> Yes, accepted. It could be regD here. >> >> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >> >> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >> +vectors_reg_legacy, %{ >> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >> VM_Version::supports_avx512dq() && >> VM_Version::supports_avx512vl() %} ); >> >>>>> Yes, accepted. >> >> I would suggest to test these changes on different machines (non-avx512 and avx512) and with different UseAVX values. >> >>>>> Will do. 
>> >> Thanks, >> Vladimir >> >> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>> Recently there have been couple of high priority issues with regards >>> to high bank of XMM register >>> (XMM16-XMM31) usage by C2: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>> >>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>> >>> Please find below a patch which attempts to clean up the XMM register handling by using register groups. >>> >>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>> >>> >>> The patch provides a restricted set of registers to the match rules >>> in the ad file based on the underlying architecture. >>> >>> The aim is to remove special handling/workaround from macro assembler and assembler. >>> >>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code. >>> >>> Your review and feedback is very welcome. >>> >>> Best Regards, >>> >>> Sandhya >>> From rwestrel at redhat.com Wed Sep 12 07:56:58 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 12 Sep 2018 09:56:58 +0200 Subject: RFR: 8210578: AArch64: Invalid encoding for fmlsvs instruction In-Reply-To: <80d98039-c412-b638-c764-0794da1ba26f@redhat.com> References: <80d98039-c412-b638-c764-0794da1ba26f@redhat.com> Message-ID: Patch looks good to me. Roland. From adinn at redhat.com Wed Sep 12 08:07:39 2018 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 12 Sep 2018 09:07:39 +0100 Subject: RFR: 8210578: AArch64: Invalid encoding for fmlsvs instruction In-Reply-To: References: <80d98039-c412-b638-c764-0794da1ba26f@redhat.com> Message-ID: On 12/09/18 08:56, Roland Westrelin wrote: > > Patch looks good to me. Thanks for the review Roland. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From Pengfei.Li at arm.com Wed Sep 12 09:50:44 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Wed, 12 Sep 2018 09:50:44 +0000 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: References: Message-ID: Hi, I've updated the patch based on Vladimir's comment. I added checks for SubI on both branches of the diamond phi. Thanks also to Roland for the suggestion to support a Phi with 3 or more inputs. But I think the matching rule would become much more complex if we added this, and I'm not sure there are any real-world scenarios that would benefit from it, so I didn't add it. New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ I've run the full jtreg test suite with the new patch and found no new issues. Please let me know if you have other comments or suggestions. If there are no further issues, I need your help to sponsor and push the patch. -- Thanks, Pengfei > -----Original Message----- > From: Roland Westrelin > Sent: Tuesday, September 11, 2018 16:24 > To: Vladimir Kozlov ; Pengfei Li (Arm > Technology China) ; dean.long at oracle.com; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > Cc: nd > Subject: Re: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check > > > > The only comment I have is to add check for SubI on other branch (not > > only on True branch). Negation may occur on either branch since you > > accept all conditions for negation. > > Can't we make this more general and support a phi with any number of > inputs (not only 2 data inputs) as long as it's a mix of X and -X? > > Roland. 
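[Editorial sketch, not part of the webrev] The 8210152 thread rests on two integer identities: for a power-of-2 divisor, "x % 4 == 0" gives the same answer as "(x & 3) == 0" even for negative x, and a zero check is unaffected by any conditional negation because x == 0 exactly when -x == 0 (including 0x80000000, which negates to itself). The Java below only illustrates those identities at the source level; it makes no claim about how C2 matches the ideal graph. The class and method names are hypothetical, and the inputs are the dividends from the test case posted earlier in the thread.

```java
public class DivisibleCheckSketch {
    // Hypothetical helper: divisibility by 4 written with the remainder operator.
    static boolean divisibleByRemainder(int x) {
        return x % 4 == 0;
    }

    // Hypothetical helper: the same test using only a mask of the low two bits.
    static boolean divisibleByMask(int x) {
        return (x & 3) == 0;
    }

    // Hypothetical helper: a zero check that follows an arbitrary conditional negation.
    static boolean isZeroAfterCondNeg(int x, boolean negate) {
        if (negate) {
            x = -x;
        }
        return x == 0;
    }

    public static void main(String[] args) {
        // Same dividends as the Foo test case from the thread.
        int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
        for (int x : dividends) {
            // Both formulations of the divisibility check must agree.
            boolean sameDivisibility = divisibleByRemainder(x) == divisibleByMask(x);
            // The zero check must not depend on whether the negation happened.
            boolean negationIrrelevant = isZeroAfterCondNeg(x, true) == (x == 0)
                    && isZeroAfterCondNeg(x, false) == (x == 0);
            System.out.println(x + ": " + sameDivisibility + " " + negationIrrelevant);
        }
    }
}
```

Both flags should hold for every input, which is why the thread treats the conditional negation in front of the zero check as redundant regardless of the branch condition.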
From goetz.lindenmaier at sap.com Wed Sep 12 10:12:53 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 12 Sep 2018 10:12:53 +0000 Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint In-Reply-To: References: Message-ID: <41135339cfc6421d86cd3b7147eee525@sap.com> Hi Martin, I had a look at your fix and it looks good. I currently can't judge though whether even more vector support is needed in some other place. But this is not subject of this fix I guess. Reviewed. Best regards, Goetz. PS: I would appreciate if you put the 'save_registers' argument on a line of its own wherever this is done for all other arguments. No new webrev needed. From: Doerr, Martin Sent: Friday, 7 September 2018 18:12 To: 'hotspot-compiler-dev at openjdk.java.net' ; Michihiro Horie (HORIE at jp.ibm.com) ; Gustavo Romero ; Lindenmaier, Goetz Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Hi, we noticed that the RegisterSaver misses code to save and restore the vector registers on PPC64. I'd like to fix that. Bug: https://bugs.openjdk.java.net/browse/JDK-8210497 Webrev: http://cr.openjdk.java.net/~mdoerr/8210497_PPC64_save_CR/webrev.00/ This webrev already fixes the following tests when JDK-8208171 webrev.03 is applied: compiler/runtime/safepoints/TestRegisterRestoring.java compiler/runtime/Test7196199.java I'll try to test the OopMap part. This may be tricky. Best regards, Martin From lutz.schmidt at sap.com Wed Sep 12 10:23:51 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 12 Sep 2018 10:23:51 +0000 Subject: [CAUTION] RE: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Message-ID: <762D04D8-0739-4212-93EE-4B901D3D9031@sap.com> Hi Martin, your changes look good overall. I'm not a reviewer, so that judgement doesn't help you much. 
I have found a few details you may want to consider: src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp @line 272: the new frame is pushed, but @line 275 comment says frame is not yet pushed. Is there a reason why you need both, R30 and R31, as scratch in push_frame_reg_args_and_save_live_registers()? Regards, Lutz From: hotspot-compiler-dev on behalf of Götz Lindenmaier Date: Wednesday, 12. September 2018 at 12:12 To: "Doerr, Martin (martin.doerr at sap.com)" , "'hotspot-compiler-dev at openjdk.java.net'" , "Michihiro Horie (HORIE at jp.ibm.com)" , Gustavo Romero Subject: [CAUTION] RE: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Hi Martin, I had a look at your fix and it looks good. I currently can't judge though whether even more vector support is needed in some other place. But this is not subject of this fix I guess. Reviewed. Best regards, Goetz. PS: I would appreciate if you put the 'save_registers' argument on a line of its own wherever this is done for all other arguments. No new webrev needed. From: Doerr, Martin Sent: Friday, 7 September 2018 18:12 To: 'hotspot-compiler-dev at openjdk.java.net' ; Michihiro Horie (HORIE at jp.ibm.com) ; Gustavo Romero ; Lindenmaier, Goetz Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Hi, we noticed that the RegisterSaver misses code to save and restore the vector registers on PPC64. I'd like to fix that. Bug: https://bugs.openjdk.java.net/browse/JDK-8210497 Webrev: http://cr.openjdk.java.net/~mdoerr/8210497_PPC64_save_CR/webrev.00/ This webrev already fixes the following tests when JDK-8208171 webrev.03 is applied: compiler/runtime/safepoints/TestRegisterRestoring.java compiler/runtime/Test7196199.java I'll try to test the OopMap part. This may be tricky. Best regards, Martin 
From martin.doerr at sap.com Wed Sep 12 10:44:32 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 12 Sep 2018 10:44:32 +0000 Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Message-ID: <1246d39ddb974707aaed8b9f66791f43@sap.com> Hi Götz and Lutz, thank you for the reviews. I'll update what you requested before pushing: I'll add line breaks for the new arguments. And I'll update the comment. Thanks for pointing me to it, Lutz. R31 is used to determine the return pc which is used by one of the callers (generate_handler_blob). Please note that this register usage is unrelated to this change and I didn't touch it in this changelist. Best regards, Martin -----Original Message----- From: Schmidt, Lutz Sent: Wednesday, 12 September 2018 12:24 To: Lindenmaier, Goetz ; Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' ; Michihiro Horie (HORIE at jp.ibm.com) ; Gustavo Romero Subject: Re: RE: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Hi Martin, your changes look good overall. I'm not a reviewer, so that judgement doesn't help you much. I have found a few details you may want to consider: src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp @line 272: the new frame is pushed, but @line 275 comment says frame is not yet pushed. Is there a reason why you need both, R30 and R31, as scratch in push_frame_reg_args_and_save_live_registers()? Regards, Lutz From: hotspot-compiler-dev on behalf of Götz Lindenmaier Date: Wednesday, 12. September 2018 at 12:12 To: "Doerr, Martin (martin.doerr at sap.com)" , "'hotspot-compiler-dev at openjdk.java.net'" , "Michihiro Horie (HORIE at jp.ibm.com)" , Gustavo Romero Subject: [CAUTION] RE: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Hi Martin, I had a look at your fix and it looks good. I currently can't judge though whether even more vector support is needed in some other place. But this is not subject of this fix I guess. Reviewed. 
Best regards, Goetz. PS: I would appreciate if you put the 'save_registers' argument on a line of its own wherever this is done for all other arguments. No new webrev needed. From: Doerr, Martin Sent: Friday, 7 September 2018 18:12 To: 'hotspot-compiler-dev at openjdk.java.net' ; Michihiro Horie (HORIE at jp.ibm.com) ; Gustavo Romero ; Lindenmaier, Goetz Subject: RFR(M): 8210497: [PPC64] Vector registers not saved across safepoint Hi, we noticed that the RegisterSaver misses code to save and restore the vector registers on PPC64. I'd like to fix that. Bug: https://bugs.openjdk.java.net/browse/JDK-8210497 Webrev: http://cr.openjdk.java.net/~mdoerr/8210497_PPC64_save_CR/webrev.00/ This webrev already fixes the following tests when JDK-8208171 webrev.03 is applied: compiler/runtime/safepoints/TestRegisterRestoring.java compiler/runtime/Test7196199.java I'll try to test the OopMap part. This may be tricky. Best regards, Martin From aph at redhat.com Wed Sep 12 14:36:40 2018 From: aph at redhat.com (Andrew Haley) Date: Wed, 12 Sep 2018 15:36:40 +0100 Subject: JIT: C2 doesn't skip post barrier for new allocated objects In-Reply-To: <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw@alibaba-inc.com> References: <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw@alibaba-inc.com> Message-ID: On 09/10/2018 09:39 AM, Kuai Wei wrote: > Recently I checked the optimization of reducing G1 post barrier for new allocated object. But I found it doesn't work as expected. > I wrote a simple test case to store oop in initialize function or just after init function . I believe you are correct. We need a bug report created for this. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From HORIE at jp.ibm.com Wed Sep 12 16:10:51 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Thu, 13 Sep 2018 01:10:51 +0900 Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad Message-ID: Dear all, Would you please review the following change? Bug: https://bugs.openjdk.java.net/browse/JDK-8210660 Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/ In the code currently emitted for replicating a floating point value in ppc.ad, the floating point value is first stored so that it can be loaded as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating point registers overlap with vsx registers 0-31. We can use vsx instructions for replicating the floating point value by mapping the floating point register to the vsx register. Best regards, -- Michihiro, IBM Research - Tokyo From rkennke at redhat.com Wed Sep 12 20:11:51 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 12 Sep 2018 22:11:51 +0200 Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2 Message-ID: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com> This introduces an abstraction to deal with object equality in BarrierSetC2. This is needed by GCs, like Shenandoah, that can have different copies of the same object alive. The approach chosen here is slightly different from what we did in e.g. BarrierSetAssembler and the runtime Access API: instead of owning the whole equality, it only provides a resolve-like method to resolve the operands to stable values. The reason for doing this is that it's easier to do it this way in intrinsics if those barriers are detached from the actual CmpP. This is because the barriers create new memory states, and we'd have to create memory Phis around those things, which is considerably more complex. 
I chose to add a new resolve_for_obj_equals(a, b) method instead of using two calls, resolve(a); resolve(b);, because this allows for an optimization: if either a or b is known to be NULL, we can elide the barriers for both. This is not possible with two independent resolve() calls. Bug: https://bugs.openjdk.java.net/browse/JDK-8210656 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/ Testing: passes hotspot/jtreg:tier1 What do you think about this? Thanks, Roman From vladimir.kozlov at oracle.com Wed Sep 12 20:52:59 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Sep 2018 13:52:59 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet Message-ID: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8210220 Don't register an AOT method if the corresponding Java method has breakpoints (for debugging); otherwise the AOT method, which does not stop at breakpoints, will be executed. The JIT has a similar check [1]. I also removed AOT code which is unused and which we forgot to remove. Tested hs-tier1-3. thanks, Vladimir [1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845 From vladimir.kozlov at oracle.com Wed Sep 12 21:15:42 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Sep 2018 14:15:42 -0700 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: References: Message-ID: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> Looks good. Thanks, Vladimir On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote: > Hi, > > I've updated the patch based on Vladimir's comment. 
I added checks for SubI on both branches of the diamond phi. > Also thanks Roland for the suggestion that supporting a Phi with 3 or more inputs. But I think the matching rule will be much more complex if we add this. And I'm not sure if there are any real case scenario which can benefit from this support. So I didn't add it in. > > New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ > I've run jtreg full test with the new patch and no new issues found. > > Please let me know if you have other comments or suggestions. If no further issues, I need your help to sponsor and push the patch. > > -- > Thanks, > Pengfei > > >> -----Original Message----- >> From: Roland Westrelin >> Sent: Tuesday, September 11, 2018 16:24 >> To: Vladimir Kozlov ; Pengfei Li (Arm >> Technology China) ; dean.long at oracle.com; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net >> Cc: nd >> Subject: Re: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check >> >> >>> The only comment I have is to add check for SubI on other branch (not >>> only on True branch). Negation may occur on either branch since you >>> accept all conditions for negation. >> >> Can't we make this more general and support a phi with any number of >> inputs (not only 2 data inputs) as long as it's a mix of X and -X? >> >> Roland. From dean.long at oracle.com Thu Sep 13 00:45:48 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 12 Sep 2018 17:45:48 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet In-Reply-To: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> Message-ID: Hi Vladimir. C1 and C2 use ciEnv which also grabs locks and checks JvmtiExport::can_hotswap_or_post_breakpoint() and Dependencies::check_evol_method(). 
But if the breakpoint count can only be changed by the VM thread at a safepoint, then your fix looks good as long as we don't enter a safepoint before the code is registered. How about adding a NoSafepointVerifier to publish_aot()? dl On 9/12/18 1:52 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8210220/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8210220 > > Don't register AOT method if corresponding java method has breakpoints > (for debugging) otherwise AOT method will be executed which do not > stop at breakpoint. JIT has similar check [1]. > > I also removed AOT code which is not used and we forgot to remove. > > Tested hs-tier1-3. > > thanks, > Vladimir > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845 From vladimir.kozlov at oracle.com Thu Sep 13 01:25:30 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Sep 2018 18:25:30 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet In-Reply-To: References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> Message-ID: Thank you, Dean. The breakpoint is set at a safepoint: http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411 But why is it important not to be at a safepoint in publish_aot()? If the AOT code is registered first and the breakpoint is set afterwards, the AOT methods will be deoptimized by CodeCache::flush_dependents_on_method(), which is called from BreakpointInfo::set(). Vladimir On 9/12/18 5:45 PM, dean.long at oracle.com wrote: > Hi Vladimir. C1 and C2 use ciEnv which also grabs locks and checks JvmtiExport::can_hotswap_or_post_breakpoint() and > Dependencies::check_evol_method(). But if the breakpoint count can only be changed by the VM thread at a safepoint, > then your fix looks good as long as we don't enter a safepoint before the code is registered. 
How about adding a > NoSafepointVerifier to publish_aot()? > > dl > > > On 9/12/18 1:52 PM, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8210220 >> >> Don't register AOT method if corresponding java method has breakpoints (for debugging) otherwise AOT method will be >> executed which do not stop at breakpoint. JIT has similar check [1]. >> >> I also removed AOT code which is not used and we forgot to remove. >> >> Tested hs-tier1-3. >> >> thanks, >> Vladimir >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845 > From dean.long at oracle.com Thu Sep 13 01:51:32 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 12 Sep 2018 18:51:32 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet In-Reply-To: References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> Message-ID: On 9/12/18 6:25 PM, Vladimir Kozlov wrote: > Thank you, Dean > > Breakpoint is set at safepoint: > > http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411 > > > But why it is important to not be at safepoint in publish_aot(). If > AOT is registered first and then breakpoint is set AOT methods will be > deoptimized by CodeCache::flush_dependents_on_method() which is called > from BreakpointInfo::set(). I mean you can't do this:
1) check breakpoint count
2) safepoint
3) register code
The AOT code is not visible to CodeCache::flush_dependents_on_method() until the cmpxchg(). NoSafepointVerifier would catch any changes in the future that introduce a safepoint. dl > > Vladimir > > On 9/12/18 5:45 PM, dean.long at oracle.com wrote: >> Hi Vladimir. C1 and C2 use ciEnv which also grabs locks and checks >> JvmtiExport::can_hotswap_or_post_breakpoint() and >> Dependencies::check_evol_method(). 
But if the breakpoint count can >> only be changed by the VM thread at a safepoint, then your fix looks >> good as long as we don't enter a safepoint before the code is >> registered. How about adding a NoSafepointVerifier to publish_aot()? >> >> dl >> >> >> On 9/12/18 1:52 PM, Vladimir Kozlov wrote: >>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8210220 >>> >>> Don't register AOT method if corresponding java method has >>> breakpoints (for debugging) otherwise AOT method will be executed >>> which do not stop at breakpoint. JIT has similar check [1]. >>> >>> I also removed AOT code which is not used and we forgot to remove. >>> >>> Tested hs-tier1-3. >>> >>> thanks, >>> Vladimir >>> >>> [1] >>> http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845 >> From igor.ignatyev at oracle.com Thu Sep 13 03:46:50 2018 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 12 Sep 2018 20:46:50 -0700 Subject: RFR(XS) : 8210699 : Problem list tests which times out in Xcomp mode Message-ID: http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html > 62 lines changed: 62 ins; 0 del; 0 mod; Hi all, could you please review this small fix which introduces a Xcomp-specific problem-list? the analysis of the failures from last two weeks showed that most of timeouts w/ -Xcomp happen in a handful number of tests. the patch puts the tests which times out only w/ Xcomp. 
for the record, here is the statistics on how many timeouts we have observed: java/lang/invoke/MethodHandles/CatchExceptionTest : 33 times, on different platforms vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH : 32 times, only on solaris-sparc runtime/appcds/cacheObject/DifferentHeapSizes : 17 times, only on solaris-sparc java/lang/Class/forName/modules/TestDriver : 10 times, only on solaris-sparc webrev: http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html JBS: https://bugs.openjdk.java.net/browse/JDK-8210699 Thanks, -- Igor From vladimir.kozlov at oracle.com Thu Sep 13 04:00:33 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Sep 2018 21:00:33 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet In-Reply-To: References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> Message-ID: <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com> Yes, you are right I will add NoSafepointVerifier and will rerun testing. Thanks, Vladimir On 9/12/18 6:51 PM, dean.long at oracle.com wrote: > On 9/12/18 6:25 PM, Vladimir Kozlov wrote: >> Thank you, Dean >> >> Breakpoint is set at safepoint: >> >> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411 >> >> But why it is important to not be at safepoint in publish_aot(). If AOT is registered first and then breakpoint is set >> AOT methods will be deoptimized by CodeCache::flush_dependents_on_method() which is called from BreakpointInfo::set(). > > I mean you can't do this: > > 1) check breakpoint count > 2) safepoint > 3) register code > > The AOT code is not visible to CodeCache::flush_dependents_on_method() until the cmpxchg(). > NoSafepointVerifier would catch any changes in the future that introduce a safepoint. > > dl > >> >> Vladimir >> >> On 9/12/18 5:45 PM, dean.long at oracle.com wrote: >>> Hi Vladimir. 
C1 and C2 use ciEnv which also grabs locks and checks JvmtiExport::can_hotswap_or_post_breakpoint() and >>> Dependencies::check_evol_method(). But if the breakpoint count can only be changed by the VM thread at a safepoint, >>> then your fix looks good as long as we don't enter a safepoint before the code is registered. How about adding a >>> NoSafepointVerifier to publish_aot()? >>> >>> dl >>> >>> >>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote: >>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8210220 >>>> >>>> Don't register AOT method if corresponding java method has breakpoints (for debugging) otherwise AOT method will be >>>> executed which do not stop at breakpoint. JIT has similar check [1]. >>>> >>>> I also removed AOT code which is not used and we forgot to remove. >>>> >>>> Tested hs-tier1-3. >>>> >>>> thanks, >>>> Vladimir >>>> >>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845 >>> > From vladimir.kozlov at oracle.com Thu Sep 13 04:02:06 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Sep 2018 21:02:06 -0700 Subject: RFR(XS) : 8210699 : Problem list tests which times out in Xcomp mode In-Reply-To: References: Message-ID: <67fdf5cd-4642-fad1-88ca-1d113fb5ecc4@oracle.com> Good. Thanks, Vladimir On 9/12/18 8:46 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html >> 62 lines changed: 62 ins; 0 del; 0 mod; > > Hi all, > > could you please review this small fix which introduces a Xcomp-specific problem-list? > > the analysis of the failures from last two weeks showed that most of timeouts w/ -Xcomp happen in a handful number of tests. the patch puts the tests which times out only w/ Xcomp. 
> > for the record, here is the statistics on how many timeouts we have observed: > java/lang/invoke/MethodHandles/CatchExceptionTest : 33 times, on different platforms > vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH : 32 times, only on solaris-sparc > runtime/appcds/cacheObject/DifferentHeapSizes : 17 times, only on solaris-sparc > java/lang/Class/forName/modules/TestDriver : 10 times, only on solaris-sparc > > webrev: http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8210699 > > Thanks, > -- Igor > From igor.ignatyev at oracle.com Thu Sep 13 04:55:31 2018 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 12 Sep 2018 21:55:31 -0700 Subject: RFR(XS) : 8210699 : Problem list tests which times out in Xcomp mode In-Reply-To: <67fdf5cd-4642-fad1-88ca-1d113fb5ecc4@oracle.com> References: <67fdf5cd-4642-fad1-88ca-1d113fb5ecc4@oracle.com> Message-ID: Vladimir, thank you for review. -- Igor > On Sep 12, 2018, at 9:02 PM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir > > On 9/12/18 8:46 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html >>> 62 lines changed: 62 ins; 0 del; 0 mod; >> Hi all, >> could you please review this small fix which introduces a Xcomp-specific problem-list? >> the analysis of the failures from last two weeks showed that most of timeouts w/ -Xcomp happen in a handful number of tests. the patch puts the tests which times out only w/ Xcomp. 
>> for the record, here is the statistics on how many timeouts we have observed: >> java/lang/invoke/MethodHandles/CatchExceptionTest : 33 times, on different platforms >> vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH : 32 times, only on solaris-sparc >> runtime/appcds/cacheObject/DifferentHeapSizes : 17 times, only on solaris-sparc >> java/lang/Class/forName/modules/TestDriver : 10 times, only on solaris-sparc >> webrev: http://cr.openjdk.java.net/~iignatyev//8210699/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8210699 >> Thanks, >> -- Igor From martin.doerr at sap.com Thu Sep 13 07:25:28 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 13 Sep 2018 07:25:28 +0000 Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad Message-ID: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com> Hi Michihiro, I have added "RFR(S): 8210660" to the subject. I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used. Besides this, your change looks good to me. Would you like to improve ReplicateD with vector length 2, too? Thanks and best regards, Martin From: Michihiro Horie Sent: Mittwoch, 12. September 2018 18:11 To: hotspot-compiler-dev at openjdk.java.net Cc: Doerr, Martin ; Gustavo Romero ; Lindenmaier, Goetz Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad Dear all, Would you please review the following change? Bug: https://bugs.openjdk.java.net/browse/JDK-8210660 Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/ In the current code emit for replicating the floating point value in ppc.ad, a floating point value is once stored in order to load as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating point registers are overlapped with vsx registers 0-31. 
We can use vsx instructions for replicating the floating point value by mapping the floating point register to the vsx register. Best regards, -- Michihiro, IBM Research - Tokyo -------------- next part -------------- An HTML attachment was scrubbed... URL: From Pengfei.Li at arm.com Thu Sep 13 08:31:01 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Thu, 13 Sep 2018 08:31:01 +0000 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> References: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> Message-ID: Thanks Vladimir. So I still need another reviewer's feedback. -- Thanks, Pengfei > -----Original Message----- > > Looks good. > > Thanks, > Vladimir > > On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote: > > Hi, > > > > I've updated the patch based on Vladimir's comment. I added checks for > SubI on both branches of the diamond phi. > > Also thanks Roland for the suggestion that supporting a Phi with 3 or more > inputs. But I think the matching rule will be much more complex if we add > this. And I'm not sure if there are any real case scenario which can benefit > from this support. So I didn't add it in. > > > > New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ > > I've run jtreg full test with the new patch and no new issues found. > > > > Please let me know if you have other comments or suggestions. If no > further issues, I need your help to sponsor and push the patch. 
> > > > -- > > Thanks, > > Pengfei > > > > > >> -----Original Message----- > >> From: Roland Westrelin > >> Sent: Tuesday, September 11, 2018 16:24 > >> To: Vladimir Kozlov ; Pengfei Li (Arm > >> Technology China) ; dean.long at oracle.com; > >> hotspot- compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > >> Cc: nd > >> Subject: Re: [PING] RE: RFR(S): 8210152: Optimize integer divisible > >> by power- > >> of-2 check > >> > >> > >>> The only comment I have is to add check for SubI on other branch > >>> (not only on True branch). Negation may occur on either branch since > >>> you accept all conditions for negation. > >> > >> Can't we make this more general and support a phi with any number of > >> inputs (not only 2 data inputs) as long as it's a mix of X and -X? > >> > >> Roland. From Pengfei.Li at arm.com Thu Sep 13 09:04:36 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Thu, 13 Sep 2018 09:04:36 +0000 Subject: RFR(S): 8210413: AArch64: Optimize div/rem by constant in C1 Message-ID: Hi, Could you please help review this optimization in C1 AArch64? Currently, there are 2 LIR_Assembler::arithmetic_idiv() methods in c1_LIRAssembler_aarch64.cpp. One is left unimplemented, the other checks whether the divisor is a power-of-2 constant but does nothing optimized then. In this patch, I combined these 2 methods and added 2 below optimizations for integer div/rem. 1) Remove the div-by-zero check if the divisor is known to be a non-zero constant. 2) Use cheaper instructions instead of "sdiv" to do div/rem by a power-of-2 constant (including 1, 2, 4, 8, 16, ...) JBS: https://bugs.openjdk.java.net/browse/JDK-8210413 webrev: http://cr.openjdk.java.net/~njian/8210413/webrev.00/ As Roman Kennke's original code comment said, using the temp register passed into arithmetic_idiv() is problematic. So I also use the rscratch1 directly for intermediate result in div/rem calculations. 
You could refer thread http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-September/006315.html for this issue. I've run jtreg full test with this patch and JVM option "-XX:TieredStopAtLevel=1" on an AArch64 server and no new issues found. -- Thanks, Pengfei From erik.osterlund at oracle.com Thu Sep 13 10:43:21 2018 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 13 Sep 2018 12:43:21 +0200 Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2 In-Reply-To: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com> References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com> Message-ID: <5B9A3F49.4090905@oracle.com> Hi Roman, Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but you have circumstances in which the barriers are unnecessary and can be elided. Any of them having null in their type is one reason, but I suppose there are surely other reasons as well (such as finding dominating write barriers). I see two different approaches for this barrier elision: 1) Elide it during parsing (as you propose) 2) Elide it during Optimize (which I think conceptually looks like a natural fit) I originally proposed a function on BarrierSetC2 that I think I called optimize_barriers() or something like that. The idea was to use this hook to let GC barrier code shave off pointless (not to be confused with useless) barriers that can be removed. Roland thought that this seemed too specific to ZGC to warrant a general API, and I agreed, because indeed only ZGC used this hook at the time. This is today ZBarrierSetC2::find_dominating_barriers which is called straight from Optimize. I wonder if it would make sense to re-instate that hook. Then you could use the existing resolve() barriers during parsing, and leave barrier elision tricks (null checks included, plus other tricks you might have up your sleeve) to Optimize. 
For example, you might be able to walk your list of barriers and disconnect these pointless barriers. What do you think? Thanks, /Erik On 2018-09-12 22:11, Roman Kennke wrote: > This introduces an abstraction to deal with object equality in > BarrierSetC2. This is needed by GCs that can have different copies of > same objects alive like Shenandoah. > > The approach chosen here is slightly different than we did in e.g. > BarrierSetAssembler and the runtime Access API: instead of owning the > whole equality, it only provides a resolve-like method to resolve the > operands to stable values. The reason for doing this is that it's easier > to do this way in intrinsics if those barriers are detached from the > actual CmpP. This is because the barriers create new memory states, and > we'd have to create memphis around those things, which is considerably > more complex. > > I chose to add a new resolve_for_obj_equals(a, b) method instead of > using two calls to resolve(a); resolve(b); because this allows for > optimization: if any of a or b is known to be NULL, we can elide > barriers for both. This is not possible to do with two independent > resolve() calls. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8210656 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/ > > Testing: passes hotspot/jtreg:tier1 > > What do you think about this? > > Thanks, > Roman > From rkennke at redhat.com Thu Sep 13 10:51:29 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 13 Sep 2018 12:51:29 +0200 Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2 In-Reply-To: <5B9A3F49.4090905@oracle.com> References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com> <5B9A3F49.4090905@oracle.com> Message-ID: <1cd9d4d1-5f18-15da-2c30-44effc1b8bf2@redhat.com> Hi Erik, > Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but > you have circumstances in which the barriers are unnecessary and can be > elided. 
Any of them having null in their type is one reason, but I > suppose there are surely other reasons as well (such as finding > dominating write barriers). Yes. We can already handle reasons that relate to 'stand-alone' barriers (like dominating write-barriers and others). However, this one is different because it relates to the *combination* of the two operands. I.e. a property of operand A or B would affect barriers for both A *and* B. This seems tricky to do after parsing. I guess we could look at CmpP, check their operands for known-null, and elide the write-barriers then, but this also means we need to check if the write-barriers haven't found other uses in the meantime, etc). Overall, this seemed *much* more hassle, whereas during parsing it comes quite naturally. See our impl: https://paste.fedoraproject.org/paste/Hr~nKkm4HnZo3hmcw3Snnw Roland: how hard/feasible would it be to do something like Erik proposed? I.e. use the usual resolve() for obj-eq and elide barriers later? It might have additional advantage (not sure) to catch cases where type of an object becomes known-null later in the optimization process? Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From aph at redhat.com Thu Sep 13 11:05:52 2018 From: aph at redhat.com (Andrew Haley) Date: Thu, 13 Sep 2018 12:05:52 +0100 Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by constant in C1 In-Reply-To: References: Message-ID: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com> Hi, On 09/13/2018 10:04 AM, Pengfei Li (Arm Technology China) wrote: > Could you please help review this optimization in C1 AArch64? > JBS: https://bugs.openjdk.java.net/browse/JDK-8210413 > webrev: http://cr.openjdk.java.net/~njian/8210413/webrev.00/ It looks fine, but it's really odd that this is only implemented for ints and not longs. Can you do longs too? 
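For illustration only (this is not code from the webrev): a minimal Java sketch of the arithmetic behind replacing "sdiv" with shifts when the divisor is a power-of-2 constant, written for both int and long. The class and method names are hypothetical, and the instruction sequence C1 actually emits on AArch64 may differ. The point is only that a negative dividend needs a bias of 2^k - 1 before the arithmetic shift, so that the result rounds toward zero like Java's "/" and "%".

```java
public final class Pow2DivRem {
    private Pow2DivRem() {}

    /** x / 2^k with round-toward-zero semantics; assumes 1 <= k <= 30. */
    public static int divInt(int x, int k) {
        // (x >> 31) is -1 for negative x and 0 otherwise; the logical shift
        // keeps the low k bits, giving a bias of 2^k - 1 or 0.
        int bias = (x >> 31) >>> (32 - k);
        return (x + bias) >> k;
    }

    /** x % 2^k, derived from the quotient; assumes 1 <= k <= 30. */
    public static int remInt(int x, int k) {
        return x - (divInt(x, k) << k);
    }

    /** Same idea for long; assumes 1 <= k <= 62. */
    public static long divLong(long x, int k) {
        long bias = (x >> 63) >>> (64 - k);
        return (x + bias) >> k;
    }

    /** x % 2^k for long; assumes 1 <= k <= 62. */
    public static long remLong(long x, int k) {
        return x - (divLong(x, k) << k);
    }

    public static void main(String[] args) {
        // Print shift-based results next to the built-in operators.
        System.out.println(divInt(-5, 1) + " vs " + (-5 / 2));
        System.out.println(remInt(-5, 1) + " vs " + (-5 % 2));
        System.out.println(divLong(-90L, 3) + " vs " + (-90L / 8L));
        System.out.println(remLong(-90L, 3) + " vs " + (-90L % 8L));
    }
}
```

The long variants differ from the int ones only in the shift widths (63/64 instead of 31/32), which is why handling both types should be a small extension.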
-- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Thu Sep 13 12:15:21 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 13 Sep 2018 14:15:21 +0200 Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2 In-Reply-To: <5B9A3F49.4090905@oracle.com> References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com> <5B9A3F49.4090905@oracle.com> Message-ID: <8b48dd82-7162-1d5a-527a-fc068a72c009@redhat.com> Hi Erik, I talked to Roland about this. It turns out that we already have this optimization pass, and could just as well live with cmp(resolve(a), resolve(b)). We need a little (shenandoah-specific-) hook in CmpPNode::Ideal() for that though (but we'd need that anyway I suppose). If you agree with that, I'll withdraw this RFR. Ok? Roman > Hi Roman, > > Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but > you have circumstances in which the barriers are unnecessary and can be > elided. Any of them having null in their type is one reason, but I > suppose there are surely other reasons as well (such as finding > dominating write barriers). > > I see two different approaches for this barrier elision: > 1) Elide it during parsing (as you propose) > 2) Elide it during Optimize (which I think conceptually looks like a > natural fit) > > I originally proposed a function on BarrierSetC2 that I think I called > optimize_barriers() or something like that. The idea was to use this > hook to let GC barrier code shave off pointless (not to be confused with > useless) barriers that can be removed. Roland thought that this seemed > too specific to ZGC to warrant a general API, and I agreed, because > indeed only ZGC used this hook at the time. This is today > ZBarrierSetC2::find_dominating_barriers which is called straight from > Optimize. > > I wonder if it would make sense to re-instate that hook. 
Then you could > use the existing resolve() barriers during parsing, and leave barrier > elision tricks (null checks included, plus other tricks you might have > up your sleeve) to Optimize. For example, you might be able to walk your > list of barriers and disconnect these pointless barriers. What do you > think? > > Thanks, > /Erik > > On 2018-09-12 22:11, Roman Kennke wrote: >> This introduces an abstraction to deal with object equality in >> BarrierSetC2. This is needed by GCs that can have different copies of >> same objects alive like Shenandoah. >> >> The approach chosen here is slightly different than we did in e.g. >> BarrierSetAssembler and the runtime Access API: instead of owning the >> whole equality, it only provides a resolve-like method to resolve the >> operands to stable values. The reason for doing this is that it's easier >> to do this way in intrinsics if those barriers are detached from the >> actual CmpP. This is because the barriers create new memory states, and >> we'd have to create memphis around those things, which is considerably >> more complex. >> >> I chose to add a new resolve_for_obj_equals(a, b) method instead of >> using two calls to resolve(a); resolve(b); because this allows for >> optimization: if any of a or b is known to be NULL, we can elide >> barriers for both. This is not possible to do with two independent >> resolve() calls. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8210656 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/ >> >> Testing: passes hotspot/jtreg:tier1 >> >> What do you think about this? >> >> Thanks, >> Roman >> > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From erik.osterlund at oracle.com Thu Sep 13 12:51:16 2018 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 13 Sep 2018 14:51:16 +0200 Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2 In-Reply-To: <8b48dd82-7162-1d5a-527a-fc068a72c009@redhat.com> References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com> <5B9A3F49.4090905@oracle.com> <8b48dd82-7162-1d5a-527a-fc068a72c009@redhat.com> Message-ID: <5B9A5D44.7040409@oracle.com> Hi Roman, I'm glad this idea works well for you. If you need an Ideal hook for CmpPNode anyway for barrier optimizations, then I suppose we should sort something out there. In that API though, it would be great if it was not specific to CmpPNode. I'm thinking something along the lines of Node* BarrierSetC2::ideal_node(Node* n), and then figure out if something should or should not be done for a given node in the backend. That way, if we need ideal hooks for other node types, we could reuse the same API to ask the BarrierSetC2 if it has more ideal nodes. What do you think? Thanks, /Erik On 2018-09-13 14:15, Roman Kennke wrote: > Hi Erik, > > I talked to Roland about this. It turns out that we already have this > optimization pass, and could just as well live with cmp(resolve(a), > resolve(b)). We need a little (shenandoah-specific-) hook in > CmpPNode::Ideal() for that though (but we'd need that anyway I suppose). > If you agree with that, I'll withdraw this RFR. Ok? > > Roman > >> Hi Roman, >> >> Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but >> you have circumstances in which the barriers are unnecessary and can be >> elided. Any of them having null in their type is one reason, but I >> suppose there are surely other reasons as well (such as finding >> dominating write barriers). 
>> >> I see two different approaches for this barrier elision: >> 1) Elide it during parsing (as you propose) >> 2) Elide it during Optimize (which I think conceptually looks like a >> natural fit) >> >> I originally proposed a function on BarrierSetC2 that I think I called >> optimize_barriers() or something like that. The idea was to use this >> hook to let GC barrier code shave off pointless (not to be confused with >> useless) barriers that can be removed. Roland thought that this seemed >> too specific to ZGC to warrant a general API, and I agreed, because >> indeed only ZGC used this hook at the time. This is today >> ZBarrierSetC2::find_dominating_barriers which is called straight from >> Optimize. >> >> I wonder if it would make sense to re-instate that hook. Then you could >> use the existing resolve() barriers during parsing, and leave barrier >> elision tricks (null checks included, plus other tricks you might have >> up your sleeve) to Optimize. For example, you might be able to walk your >> list of barriers and disconnect these pointless barriers. What do you >> think? >> >> Thanks, >> /Erik >> >> On 2018-09-12 22:11, Roman Kennke wrote: >>> This introduces an abstraction to deal with object equality in >>> BarrierSetC2. This is needed by GCs that can have different copies of >>> same objects alive like Shenandoah. >>> >>> The approach chosen here is slightly different than we did in e.g. >>> BarrierSetAssembler and the runtime Access API: instead of owning the >>> whole equality, it only provides a resolve-like method to resolve the >>> operands to stable values. The reason for doing this is that it's easier >>> to do this way in intrinsics if those barriers are detached from the >>> actual CmpP. This is because the barriers create new memory states, and >>> we'd have to create memphis around those things, which is considerably >>> more complex. 
>>> >>> I chose to add a new resolve_for_obj_equals(a, b) method instead of >>> using two calls to resolve(a); resolve(b); because this allows for >>> optimization: if any of a or b is known to be NULL, we can elide >>> barriers for both. This is not possible to do with two independent >>> resolve() calls. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8210656 >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/ >>> >>> Testing: passes hotspot/jtreg:tier1 >>> >>> What do you think about this? >>> >>> Thanks, >>> Roman >>> > From sandhya.viswanathan at intel.com Thu Sep 13 13:05:49 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Thu, 13 Sep 2018 13:05:49 +0000 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> Hi Vladimir, Please find below the updated webrev with all your comments incorporated: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ I have run the jtreg compiler tests on SKX and KNL which have two different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, September 11, 2018 8:54 PM To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction Thank you, Sandhya I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. Vladimir On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Thanks a lot for the detailed review. 
I really appreciate your feedback. > Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, September 11, 2018 5:11 PM > To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > Thank you. > > I want to discuss next issue: > > > You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? > >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. > > This is what I thought. You increase registers pressure this way which may cause spills on stack. > Also we don't check that register could be the same as result you may get unneeded moves. > > I would advice add memory moves at least. > > Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask (matcher.cpp). I would like the register allocator to get all the possible register on an architecture for idealreg2reg mask. I wondered that multiple instruct rules in .ad file for LoadF from memory might cause problems. I would have to have higher cost for loading into restricted register set like vlReg. Then I decided that the register allocator can handle this in much better way than me adding rules to load from memory. This is with the background that the regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if I should still add the rules to load from memory into vlReg and legVec. 
The specific code from matcher.cpp that I am referring to is: > MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); > #endif > MachNode *spillI = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); > MachNode *spillL = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsO > nlyOnTest, false)); > MachNode *spillF = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); > MachNode *spillD = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); > MachNode *spillP = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); > .... > idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); > > An other question. You use movflt() and movdbl() which use either movap[s|d] and movs[s|d] instructions: > http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164 > Are these instructions work when avx512vl is not available? I see for vectors you use > vpxor+vinserti* combination. > > Sandhya >>> Yes the scalar floating point instructions are available with AVX512 encoding when avx512vl is not available. That is why you would see not just movflt, movdbl but all the other scalar operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions. > > Last question. I notice next UseAVX check in vectors spills code in x86.ad: > if ((UseAVX < 2) || VM_Version::supports_avx512vl()) > > Should it be (UseAVX < 3)? > > Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. > > Thanks, > Vladimir > > On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback. 
>> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Monday, September 10, 2018 6:09 PM >> To: Viswanathan, Sandhya ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> >> Very nice. Thank you, Sandhya. >> >> I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*. >> >>>>> Yes, accepted. >> >> New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate with other Move*_reg_reg* instructions: >> >> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >> vlRegF src) >>>>> Yes, accepted. >> >> You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. >> >> Also please explain why these registers are used when UseAVX == 0?: >> >> +instruct absD_reg(rregD dst) %{ >> predicate((UseSSE>=2) && (UseAVX == 0)); >> >> we switch off evex so regular regD (only legacy register in this case) should work too: >> 661 if (UseAVX < 3) { >> 662 _features &= ~CPU_AVX512F; >> >>>>> Yes, accepted. It could be regD here. >> >> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >> >> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >> +vectors_reg_legacy, %{ >> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >> VM_Version::supports_avx512dq() && >> VM_Version::supports_avx512vl() %} ); >> >>>>> Yes, accepted. >> >> I would suggest to test these changes on different machines (non-avx512 and avx512) and with different UseAVX values. >> >>>>> Will do. 
>>
>> Thanks,
>> Vladimir
>>
>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>> Recently there have been couple of high priority issues with regards
>>> to high bank of XMM register
>>> (XMM16-XMM31) usage by C2:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>
>>>
>>> The patch provides a restricted set of registers to the match rules
>>> in the ad file based on the underlying architecture.
>>>
>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>
>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
>>>
>>> Your review and feedback is very welcome.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>

From dmitrij.pochepko at bell-sw.com Thu Sep 13 14:35:53 2018
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Thu, 13 Sep 2018 17:35:53 +0300
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To:
References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com>
Message-ID: <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>

Hi,

I found 3 items to fix in your comments in
http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt

1)
//           [1, sqrt(3/2)), [sqrt(3/2), sqrt(3)/2), [sqrt(3)/2, 2)
//      i.e. [1, ~1.225],    [~1.225,    ~1.732),    [~1.732, 2)

this one should be:

[1, sqrt(3/2)), [sqrt(3/2), sqrt(3)), [sqrt(3), 2)
i.e. [1, ~1.225],    [~1.225,    ~1.732),    [~1.732,
2)

2) "4) Filter out overflows (z > 1023) or underflows (z < -1077)"

should be:

"4) Filter out overflows (z > 1023) or underflows (z < -1076)"

3) "5) Let |z| = n + r where n is int, 0 <= n < 10, and 0 <= r < 1"

should be:

"5) Let |z| = n + r where n is int, 0 <= n < 1076, and 0 <= r < 1"

The other comments seem fine.

Thanks,
Dmitrij

On 07/09/18 15:58, Andrew Dinn wrote:
>
> I have rewritten the algorithm to achieve what I think is needed to
> patch these omissions. The redraft of this part of the code is available
> here:
>
> http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt
>
>

From rkennke at redhat.com Thu Sep 13 15:37:48 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 13 Sep 2018 17:37:48 +0200
Subject: RFR: JDK-8210656: Object equals abstraction for BarrierSetC2
In-Reply-To: <5B9A5D44.7040409@oracle.com>
References: <07f491e5-fb06-86a3-f6f1-1b9d37544fb0@redhat.com> <5B9A3F49.4090905@oracle.com> <8b48dd82-7162-1d5a-527a-fc068a72c009@redhat.com> <5B9A5D44.7040409@oracle.com>
Message-ID: <87d66c05-9f5c-8e18-a589-b47d21d72681@redhat.com>

Hi Erik,

> I'm glad this idea works well for you. If you need an Ideal hook for
> CmpPNode anyway for barrier optimizations, then I suppose we should sort
> something out there. In that API though, it would be great if it was not
> specific to CmpPNode. I'm thinking something along the lines of Node*
> BarrierSetC2::ideal_node(Node* n), and then figure out if something
> should or should not be done for a given node in the backend. That way,
> if we need ideal hooks for other node types, we could reuse the same API
> to ask the BarrierSetC2 if it has more ideal nodes. What do you think?

Yeah, that would actually be good. We have at least one more place that I know of where we hook into Ideal().
The API would be something like this:

  Node* ideal_node(PhaseGVN* phase, Node* n, bool can_reshape) const;

And a sample usage from CallLeafNode would look like this:

  Node* CallLeafNode::Ideal(PhaseGVN* phase, bool can_reshape) {
    // Give the GC's BarrierSetC2 the first chance to idealize this node.
    Node* ideal = BarrierSet::barrier_set()->barrier_set_c2()->ideal_node(phase, this, can_reshape);
    if (ideal != NULL) {
      return ideal;
    }
    // Otherwise fall back to the regular CallNode idealization.
    return CallNode::Ideal(phase, can_reshape);
  }

Unfortunately (or maybe fortunately) this can't be inserted generically into Node::Ideal(..) because subclasses can't be expected to always call the super implementation.

Thanks for reviewing! I'll withdraw this RFR and push the additional resolve() hooks via another RFE.

Cheers,
Roman

> Thanks,
> /Erik
>
> On 2018-09-13 14:15, Roman Kennke wrote:
>> Hi Erik,
>>
>> I talked to Roland about this. It turns out that we already have this
>> optimization pass, and could just as well live with cmp(resolve(a),
>> resolve(b)). We need a little (shenandoah-specific-) hook in
>> CmpPNode::Ideal() for that though (but we'd need that anyway I suppose).
>> If you agree with that, I'll withdraw this RFR. Ok?
>>
>> Roman
>>
>>> Hi Roman,
>>>
>>> Interesting. So semantically, this is cmp(resolve(a), resolve(b)), but
>>> you have circumstances in which the barriers are unnecessary and can be
>>> elided. Any of them having null in their type is one reason, but I
>>> suppose there are surely other reasons as well (such as finding
>>> dominating write barriers).
>>>
>>> I see two different approaches for this barrier elision:
>>> 1) Elide it during parsing (as you propose)
>>> 2) Elide it during Optimize (which I think conceptually looks like a
>>> natural fit)
>>>
>>> I originally proposed a function on BarrierSetC2 that I think I called
>>> optimize_barriers() or something like that. The idea was to use this
>>> hook to let GC barrier code shave off pointless (not to be confused with
>>> useless) barriers that can be removed.
Roland thought that this seemed >>> too specific to ZGC to warrant a general API, and I agreed, because >>> indeed only ZGC used this hook at the time. This is today >>> ZBarrierSetC2::find_dominating_barriers which is called straight from >>> Optimize. >>> >>> I wonder if it would make sense to re-instate that hook. Then you could >>> use the existing resolve() barriers during parsing, and leave barrier >>> elision tricks (null checks included, plus other tricks you might have >>> up your sleeve) to Optimize. For example, you might be able to walk your >>> list of barriers and disconnect these pointless barriers. What do you >>> think? >>> >>> Thanks, >>> /Erik >>> >>> On 2018-09-12 22:11, Roman Kennke wrote: >>>> This introduces an abstraction to deal with object equality in >>>> BarrierSetC2. This is needed by GCs that can have different copies of >>>> same objects alive like Shenandoah. >>>> >>>> The approach chosen here is slightly different than we did in e.g. >>>> BarrierSetAssembler and the runtime Access API: instead of owning the >>>> whole equality, it only provides a resolve-like method to resolve the >>>> operands to stable values. The reason for doing this is that it's >>>> easier >>>> to do this way in intrinsics if those barriers are detached from the >>>> actual CmpP. This is because the barriers create new memory states, and >>>> we'd have to create memphis around those things, which is considerably >>>> more complex. >>>> >>>> I chose to add a new resolve_for_obj_equals(a, b) method instead of >>>> using two calls to resolve(a); resolve(b); because this allows for >>>> optimization: if any of a or b is known to be NULL, we can elide >>>> barriers for both. This is not possible to do with two independent >>>> resolve() calls. 
>>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8210656 >>>> Webrev: >>>> http://cr.openjdk.java.net/~rkennke/JDK-8210656/webrev.00/ >>>> >>>> Testing: passes hotspot/jtreg:tier1 >>>> >>>> What do you think about this? >>>> >>>> Thanks, >>>> Roman >>>> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From vladimir.kozlov at oracle.com Thu Sep 13 16:01:18 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Sep 2018 09:01:18 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet In-Reply-To: <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com> References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com> Message-ID: Updated changes with NoSafepointVerifier: http://cr.openjdk.java.net/~kvn/8210220/webrev.01/ Vladimir On 9/12/18 9:00 PM, Vladimir Kozlov wrote: > Yes, you are right I will add NoSafepointVerifier and will rerun testing. > > Thanks, > Vladimir > > On 9/12/18 6:51 PM, dean.long at oracle.com wrote: >> On 9/12/18 6:25 PM, Vladimir Kozlov wrote: >>> Thank you, Dean >>> >>> Breakpoint is set at safepoint: >>> >>> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411 >>> >>> But why it is important to not be at safepoint in publish_aot(). If AOT is registered first and >>> then breakpoint is set AOT methods will be deoptimized by CodeCache::flush_dependents_on_method() >>> which is called from BreakpointInfo::set(). >> >> I mean you can't do this: >> >> 1) check breakpoint count >> 2) safepoint >> 3) register code >> >> The AOT code is not visible to CodeCache::flush_dependents_on_method() until the cmpxchg(). >> NoSafepointVerifier would catch any changes in the future that introduce a safepoint. 
>>
>> dl
>>
>>>
>>> Vladimir
>>>
>>> On 9/12/18 5:45 PM, dean.long at oracle.com wrote:
>>>> Hi Vladimir. C1 and C2 use ciEnv which also grabs locks and checks
>>>> JvmtiExport::can_hotswap_or_post_breakpoint() and Dependencies::check_evol_method(). But if the
>>>> breakpoint count can only be changed by the VM thread at a safepoint, then your fix looks good
>>>> as long as we don't enter a safepoint before the code is registered. How about adding a
>>>> NoSafepointVerifier to publish_aot()?
>>>>
>>>> dl
>>>>
>>>>
>>>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote:
>>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8210220
>>>>>
>>>>> Don't register AOT method if corresponding java method has breakpoints (for debugging)
>>>>> otherwise AOT method will be executed which do not stop at breakpoint. JIT has similar check [1].
>>>>>
>>>>> I also removed AOT code which is not used and we forgot to remove.
>>>>>
>>>>> Tested hs-tier1-3.
>>>>>
>>>>> thanks,
>>>>> Vladimir
>>>>>
>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845
>>>>
>>

From HORIE at jp.ibm.com Thu Sep 13 17:05:08 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Fri, 14 Sep 2018 02:05:08 +0900
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
In-Reply-To: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
Message-ID:

Hi Martin,

Thank you so much for your review (and adding the ID in the subject :-).

>I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.

You're right, thanks. I removed a redundant one. I also refactored ReplicateD with vector length 2.
Following is the latest webrev:
http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/

Best regards,
--
Michihiro,
IBM Research - Tokyo

From: "Doerr, Martin"
To: Michihiro Horie, "hotspot-compiler-dev at openjdk.java.net"
Cc: Gustavo Romero, "Lindenmaier, Goetz"
Date: 2018/09/13 16:25
Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad

Hi Michihiro,

I have added "RFR(S): 8210660" to the subject.

I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.

Besides this, your change looks good to me. Would you like to improve ReplicateD with vector length 2, too?

Thanks and best regards,
Martin

From: Michihiro Horie
Sent: Wednesday, 12 September 2018 18:11
To: hotspot-compiler-dev at openjdk.java.net
Cc: Doerr, Martin; Gustavo Romero; Lindenmaier, Goetz
Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad

Dear all,

Would you please review the following change?
Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/

In the current code emit for replicating the floating point value in ppc.ad, a floating point value is once stored in order to load as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating point registers are overlapped with vsx registers 0-31. We can use vsx instructions for replicating the floating point value by mapping the floating point register to the vsx register.

Best regards,
--
Michihiro,
IBM Research - Tokyo

From dean.long at oracle.com Thu Sep 13 18:59:05 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 13 Sep 2018 11:59:05 -0700
Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet
In-Reply-To:
References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com>
Message-ID: <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com>

After the first PauseNoSafepointVerifier, I think you need to check mh->number_of_breakpoints() again, because it could have changed.

dl

On 9/13/18 9:01 AM, Vladimir Kozlov wrote:
> Updated changes with NoSafepointVerifier:
>
> http://cr.openjdk.java.net/~kvn/8210220/webrev.01/
>
> Vladimir
>
> On 9/12/18 9:00 PM, Vladimir Kozlov wrote:
>> Yes, you are right I will add NoSafepointVerifier and will rerun
>> testing.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/12/18 6:51 PM, dean.long at oracle.com wrote:
>>> On 9/12/18 6:25 PM, Vladimir Kozlov wrote:
>>>> Thank you, Dean
>>>>
>>>> Breakpoint is set at safepoint:
>>>>
>>>> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411
>>>>
>>>>
>>>> But why it is important to not be at safepoint in publish_aot(). If
>>>> AOT is registered first and then breakpoint is set AOT methods will
>>>> be deoptimized by CodeCache::flush_dependents_on_method() which is
>>>> called from BreakpointInfo::set().
>>>
>>> I mean you can't do this:
>>>
>>> 1) check breakpoint count
>>> 2) safepoint
>>> 3) register code
>>>
>>> The AOT code is not visible to
>>> CodeCache::flush_dependents_on_method() until the cmpxchg().
>>> NoSafepointVerifier would catch any changes in the future that
>>> introduce a safepoint.
>>>
>>> dl
>>>
>>>>
>>>> Vladimir
>>>>
>>>> On 9/12/18 5:45 PM, dean.long at oracle.com wrote:
>>>>> Hi Vladimir.
C1 and C2 use ciEnv which also grabs locks and >>>>> checks JvmtiExport::can_hotswap_or_post_breakpoint() and >>>>> Dependencies::check_evol_method().? But if the breakpoint count >>>>> can only be changed by the VM thread at a safepoint, then your fix >>>>> looks good as long as we don't enter a safepoint before the code >>>>> is registered.? How about adding a NoSafepointVerifier to >>>>> publish_aot()? >>>>> >>>>> dl >>>>> >>>>> >>>>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote: >>>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/ >>>>>> https://bugs.openjdk.java.net/browse/JDK-8210220 >>>>>> >>>>>> Don't register AOT method if corresponding java method has >>>>>> breakpoints (for debugging) otherwise AOT method will be executed >>>>>> which do not stop at breakpoint. JIT has similar check [1]. >>>>>> >>>>>> I also removed AOT code which is not used and we forgot to remove. >>>>>> >>>>>> Tested hs-tier1-3. >>>>>> >>>>>> thanks, >>>>>> Vladimir >>>>>> >>>>>> [1] >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845 >>>>> >>> From vladimir.kozlov at oracle.com Thu Sep 13 19:25:15 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Sep 2018 12:25:15 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet In-Reply-To: <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com> References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com> <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com> Message-ID: <3144010b-1cd1-cc27-2782-867326916ae1@oracle.com> No, first PauseNoSafepointVerifier is on the path where we exit function without publishing AOT method. May be my comment there is not clear. Do you have better suggestion for comment? 
Thanks, Vladimir On 9/13/18 11:59 AM, dean.long at oracle.com wrote: > After the first PauseNoSafepointVerifier, I think you need to check mh->number_of_breakpoints() > again, because it could have changed. > > dl > > On 9/13/18 9:01 AM, Vladimir Kozlov wrote: >> Updated changes with NoSafepointVerifier: >> >> http://cr.openjdk.java.net/~kvn/8210220/webrev.01/ >> >> Vladimir >> >> On 9/12/18 9:00 PM, Vladimir Kozlov wrote: >>> Yes, you are right I will add NoSafepointVerifier and will rerun testing. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/12/18 6:51 PM, dean.long at oracle.com wrote: >>>> On 9/12/18 6:25 PM, Vladimir Kozlov wrote: >>>>> Thank you, Dean >>>>> >>>>> Breakpoint is set at safepoint: >>>>> >>>>> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411 >>>>> >>>>> But why it is important to not be at safepoint in publish_aot(). If AOT is registered first and >>>>> then breakpoint is set AOT methods will be deoptimized by >>>>> CodeCache::flush_dependents_on_method() which is called from BreakpointInfo::set(). >>>> >>>> I mean you can't do this: >>>> >>>> 1) check breakpoint count >>>> 2) safepoint >>>> 3) register code >>>> >>>> The AOT code is not visible to CodeCache::flush_dependents_on_method() until the cmpxchg(). >>>> NoSafepointVerifier would catch any changes in the future that introduce a safepoint. >>>> >>>> dl >>>> >>>>> >>>>> Vladimir >>>>> >>>>> On 9/12/18 5:45 PM, dean.long at oracle.com wrote: >>>>>> Hi Vladimir.? C1 and C2 use ciEnv which also grabs locks and checks >>>>>> JvmtiExport::can_hotswap_or_post_breakpoint() and Dependencies::check_evol_method().? But if >>>>>> the breakpoint count can only be changed by the VM thread at a safepoint, then your fix looks >>>>>> good as long as we don't enter a safepoint before the code is registered.? How about adding a >>>>>> NoSafepointVerifier to publish_aot()? 
>>>>>> >>>>>> dl >>>>>> >>>>>> >>>>>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote: >>>>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/ >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8210220 >>>>>>> >>>>>>> Don't register AOT method if corresponding java method has breakpoints (for debugging) >>>>>>> otherwise AOT method will be executed which do not stop at breakpoint. JIT has similar check >>>>>>> [1]. >>>>>>> >>>>>>> I also removed AOT code which is not used and we forgot to remove. >>>>>>> >>>>>>> Tested hs-tier1-3. >>>>>>> >>>>>>> thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845 >>>>>> >>>> > From dean.long at oracle.com Thu Sep 13 19:44:07 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 13 Sep 2018 12:44:07 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet In-Reply-To: <3144010b-1cd1-cc27-2782-867326916ae1@oracle.com> References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com> <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com> <3144010b-1cd1-cc27-2782-867326916ae1@oracle.com> Message-ID: <9200d62f-1854-a1ed-2f2e-7becbe1beea0@oracle.com> No it's fine.? I wasn't looking at the full context.? Sorry for the confusion.? This version is good. dl On 9/13/18 12:25 PM, Vladimir Kozlov wrote: > No, first PauseNoSafepointVerifier is on the path where we exit > function without publishing AOT method. > May be my comment there is not clear. Do you have better suggestion > for comment? > > Thanks, > Vladimir > > On 9/13/18 11:59 AM, dean.long at oracle.com wrote: >> After the first PauseNoSafepointVerifier, I think you need to check >> mh->number_of_breakpoints() again, because it could have changed. 
>> >> dl >> >> On 9/13/18 9:01 AM, Vladimir Kozlov wrote: >>> Updated changes with NoSafepointVerifier: >>> >>> http://cr.openjdk.java.net/~kvn/8210220/webrev.01/ >>> >>> Vladimir >>> >>> On 9/12/18 9:00 PM, Vladimir Kozlov wrote: >>>> Yes, you are right I will add NoSafepointVerifier and will rerun >>>> testing. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/12/18 6:51 PM, dean.long at oracle.com wrote: >>>>> On 9/12/18 6:25 PM, Vladimir Kozlov wrote: >>>>>> Thank you, Dean >>>>>> >>>>>> Breakpoint is set at safepoint: >>>>>> >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411 >>>>>> >>>>>> >>>>>> But why it is important to not be at safepoint in publish_aot(). >>>>>> If AOT is registered first and then breakpoint is set AOT methods >>>>>> will be deoptimized by CodeCache::flush_dependents_on_method() >>>>>> which is called from BreakpointInfo::set(). >>>>> >>>>> I mean you can't do this: >>>>> >>>>> 1) check breakpoint count >>>>> 2) safepoint >>>>> 3) register code >>>>> >>>>> The AOT code is not visible to >>>>> CodeCache::flush_dependents_on_method() until the cmpxchg(). >>>>> NoSafepointVerifier would catch any changes in the future that >>>>> introduce a safepoint. >>>>> >>>>> dl >>>>> >>>>>> >>>>>> Vladimir >>>>>> >>>>>> On 9/12/18 5:45 PM, dean.long at oracle.com wrote: >>>>>>> Hi Vladimir.? C1 and C2 use ciEnv which also grabs locks and >>>>>>> checks JvmtiExport::can_hotswap_or_post_breakpoint() and >>>>>>> Dependencies::check_evol_method().? But if the breakpoint count >>>>>>> can only be changed by the VM thread at a safepoint, then your >>>>>>> fix looks good as long as we don't enter a safepoint before the >>>>>>> code is registered.? How about adding a NoSafepointVerifier to >>>>>>> publish_aot()? 
>>>>>>> >>>>>>> dl >>>>>>> >>>>>>> >>>>>>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote: >>>>>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/ >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8210220 >>>>>>>> >>>>>>>> Don't register AOT method if corresponding java method has >>>>>>>> breakpoints (for debugging) otherwise AOT method will be >>>>>>>> executed which do not stop at breakpoint. JIT has similar check >>>>>>>> [1]. >>>>>>>> >>>>>>>> I also removed AOT code which is not used and we forgot to remove. >>>>>>>> >>>>>>>> Tested hs-tier1-3. >>>>>>>> >>>>>>>> thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> [1] >>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845 >>>>>>> >>>>> >> From vladimir.kozlov at oracle.com Thu Sep 13 20:48:20 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Sep 2018 13:48:20 -0700 Subject: [12] RFR(S) 8210220: [AOT] jdwp test cases are failing with error # ERROR: TEST FAILED: Cought IOException while receiving event packet In-Reply-To: <9200d62f-1854-a1ed-2f2e-7becbe1beea0@oracle.com> References: <6012da06-b2d8-5819-4bb3-7c69ac2970fe@oracle.com> <1203c5d6-c74a-e205-9dcb-eb43fb6971d3@oracle.com> <3c63df34-7f40-aac7-77aa-97f6d6a482cb@oracle.com> <3144010b-1cd1-cc27-2782-867326916ae1@oracle.com> <9200d62f-1854-a1ed-2f2e-7becbe1beea0@oracle.com> Message-ID: Thank you, Dean Vladimir On 9/13/18 12:44 PM, dean.long at oracle.com wrote: > No it's fine.? I wasn't looking at the full context.? Sorry for the confusion.? This version is good. > > dl > > On 9/13/18 12:25 PM, Vladimir Kozlov wrote: >> No, first PauseNoSafepointVerifier is on the path where we exit function without publishing AOT >> method. >> May be my comment there is not clear. Do you have better suggestion for comment? 
>> >> Thanks, >> Vladimir >> >> On 9/13/18 11:59 AM, dean.long at oracle.com wrote: >>> After the first PauseNoSafepointVerifier, I think you need to check mh->number_of_breakpoints() >>> again, because it could have changed. >>> >>> dl >>> >>> On 9/13/18 9:01 AM, Vladimir Kozlov wrote: >>>> Updated changes with NoSafepointVerifier: >>>> >>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.01/ >>>> >>>> Vladimir >>>> >>>> On 9/12/18 9:00 PM, Vladimir Kozlov wrote: >>>>> Yes, you are right I will add NoSafepointVerifier and will rerun testing. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 9/12/18 6:51 PM, dean.long at oracle.com wrote: >>>>>> On 9/12/18 6:25 PM, Vladimir Kozlov wrote: >>>>>>> Thank you, Dean >>>>>>> >>>>>>> Breakpoint is set at safepoint: >>>>>>> >>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/b7bfd64e43a6/src/hotspot/share/prims/jvmtiImpl.cpp#l411 >>>>>>> >>>>>>> But why it is important to not be at safepoint in publish_aot(). If AOT is registered first >>>>>>> and then breakpoint is set AOT methods will be deoptimized by >>>>>>> CodeCache::flush_dependents_on_method() which is called from BreakpointInfo::set(). >>>>>> >>>>>> I mean you can't do this: >>>>>> >>>>>> 1) check breakpoint count >>>>>> 2) safepoint >>>>>> 3) register code >>>>>> >>>>>> The AOT code is not visible to CodeCache::flush_dependents_on_method() until the cmpxchg(). >>>>>> NoSafepointVerifier would catch any changes in the future that introduce a safepoint. >>>>>> >>>>>> dl >>>>>> >>>>>>> >>>>>>> Vladimir >>>>>>> >>>>>>> On 9/12/18 5:45 PM, dean.long at oracle.com wrote: >>>>>>>> Hi Vladimir.? C1 and C2 use ciEnv which also grabs locks and checks >>>>>>>> JvmtiExport::can_hotswap_or_post_breakpoint() and Dependencies::check_evol_method().? But if >>>>>>>> the breakpoint count can only be changed by the VM thread at a safepoint, then your fix >>>>>>>> looks good as long as we don't enter a safepoint before the code is registered.? 
How about
>>>>>>>> adding a NoSafepointVerifier to publish_aot()?
>>>>>>>>
>>>>>>>> dl
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/12/18 1:52 PM, Vladimir Kozlov wrote:
>>>>>>>>> http://cr.openjdk.java.net/~kvn/8210220/webrev.00/
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8210220
>>>>>>>>>
>>>>>>>>> Don't register AOT method if corresponding java method has breakpoints (for debugging)
>>>>>>>>> otherwise AOT method will be executed which do not stop at breakpoint. JIT has similar
>>>>>>>>> check [1].
>>>>>>>>>
>>>>>>>>> I also removed AOT code which is not used and we forgot to remove.
>>>>>>>>>
>>>>>>>>> Tested hs-tier1-3.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/75261571c13d/src/hotspot/share/oops/method.cpp#l845
>>>>>>>>
>>>>>>
>>>
>

From magnus.ihse.bursie at oracle.com Thu Sep 13 22:20:52 2018
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Fri, 14 Sep 2018 00:20:52 +0200
Subject: RFR: JDK-8210731 PropertiesParser does not produce reproducible output
Message-ID: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com>

The file make/langtools/tools/propertiesparser/PropertiesParser.java is used to convert .properties files into .java files as part of the gensrc step.

However, because it creates its output directly from HashMaps, that output is not guaranteed to be stable, and this is causing spurious differences in our cmp-baseline builds.
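
To illustrate the kind of problem described above, here is a minimal sketch. It is not the code from the webrev, and the thread does not say which collection class the fix uses; StableOutputSketch, render() and the sample keys are all hypothetical names made up for this example. It only shows that output generated by iterating a TreeMap or a LinkedHashMap has a defined order, while HashMap makes no ordering guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class StableOutputSketch {

    // Hypothetical generator step: emits one "key=value" line per entry,
    // in whatever order the given map iterates.
    static String render(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample keys are invented for illustration only.
        String[][] input = {
            {"compiler.warn.a", "1"},
            {"compiler.err.b", "2"},
            {"compiler.misc.c", "3"},
        };

        Map<String, String> sorted = new TreeMap<>();         // iterates in key order
        Map<String, String> inOrder = new LinkedHashMap<>();  // iterates in insertion order
        for (String[] kv : input) {
            sorted.put(kv[0], kv[1]);
            inOrder.put(kv[0], kv[1]);
        }

        // Both outputs are fully determined by the input, so repeated
        // builds produce byte-identical text. A plain HashMap gives no
        // such guarantee about iteration order.
        System.out.print(render(sorted));
        System.out.print(render(inOrder));
    }
}
```

Which of the two is preferable probably depends on whether the generated file should follow the order of the .properties input (LinkedHashMap) or be independent of it (TreeMap).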
Bug: https://bugs.openjdk.java.net/browse/JDK-8210731 WebRev: http://cr.openjdk.java.net/~ihse/JDK-8210731-properties-parser-is-not-stable/webrev.01 /Magnus From mandy.chung at oracle.com Thu Sep 13 22:25:44 2018 From: mandy.chung at oracle.com (mandy chung) Date: Thu, 13 Sep 2018 15:25:44 -0700 Subject: RFR: JDK-8210731 PropertiesParser does not produce reproducible output In-Reply-To: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com> References: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com> Message-ID: <4d9ed0b0-770d-2d5f-2629-03d5661690fa@oracle.com> Looks okay to me. Mandy P.S. I cc'ed compiler-dev since I think you meant to cc compiler-dev instead of hotspot-compiler-dev. On 9/13/18 3:20 PM, Magnus Ihse Bursie wrote: > The file make/langtools/tools/propertiesparser/PropertiesParser.java > b/make/langtools/tools/propertiesparser/PropertiesParser.java is used > to convert .properties files into .java files as part of the gensrc step. > > However, due to it's use of creating it's output directly from > HashMaps, it's not guaranteed to be stable, and is causing spurios > differences in our cmp-baseline builds. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8210731 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8210731-properties-parser-is-not-stable/webrev.01 > > /Magnus > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.gibbons at oracle.com Thu Sep 13 22:25:54 2018 From: jonathan.gibbons at oracle.com (Jonathan Gibbons) Date: Thu, 13 Sep 2018 15:25:54 -0700 Subject: RFR: JDK-8210731 PropertiesParser does not produce reproducible output In-Reply-To: <4d9ed0b0-770d-2d5f-2629-03d5661690fa@oracle.com> References: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com> <4d9ed0b0-770d-2d5f-2629-03d5661690fa@oracle.com> Message-ID: <5B9AE3F2.4020404@oracle.com> +1 -- Jon On 09/13/2018 03:25 PM, mandy chung wrote: > Looks okay to me. > > Mandy > P.S. 
I cc'ed compiler-dev since I think you meant to cc compiler-dev > instead of hotspot-compiler-dev. > > On 9/13/18 3:20 PM, Magnus Ihse Bursie wrote: >> The file make/langtools/tools/propertiesparser/PropertiesParser.java >> b/make/langtools/tools/propertiesparser/PropertiesParser.java is used >> to convert .properties files into .java files as part of the gensrc >> step. >> >> However, due to it's use of creating it's output directly from >> HashMaps, it's not guaranteed to be stable, and is causing spurios >> differences in our cmp-baseline builds. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8210731 >> WebRev: >> http://cr.openjdk.java.net/~ihse/JDK-8210731-properties-parser-is-not-stable/webrev.01 >> >> /Magnus >> > From erik.joelsson at oracle.com Thu Sep 13 23:20:13 2018 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Thu, 13 Sep 2018 16:20:13 -0700 Subject: RFR: JDK-8210731 PropertiesParser does not produce reproducible output In-Reply-To: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com> References: <28385672-2843-9499-debd-fd7dc70e3f7c@oracle.com> Message-ID: <2c529f83-0d72-fb47-d6f1-5be45eeac3ef@oracle.com> Hello, Looks good. Perhaps add a comment explaining why the otherwise unusual choice of collection class is used. /Erik On 2018-09-13 15:20, Magnus Ihse Bursie wrote: > The file make/langtools/tools/propertiesparser/PropertiesParser.java > b/make/langtools/tools/propertiesparser/PropertiesParser.java is used > to convert .properties files into .java files as part of the gensrc step. > > However, due to it's use of creating it's output directly from > HashMaps, it's not guaranteed to be stable, and is causing spurios > differences in our cmp-baseline builds. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8210731 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8210731-properties-parser-is-not-stable/webrev.01 > > /Magnus > From igor.veresov at oracle.com Fri Sep 14 03:50:39 2018 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 13 Sep 2018 20:50:39 -0700 Subject: RFR(L) 8210478: Update Graal Message-ID: This is a regular update. Please see the JBS issue for the list of the changes included in this update. JBS: https://bugs.openjdk.java.net/browse/JDK-8210478 Webrev: http://cr.openjdk.java.net/~iveresov/8210478/webrev.00/ Thanks! igor -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Sep 14 04:39:58 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Sep 2018 21:39:58 -0700 Subject: RFR(L) 8210478: Update Graal In-Reply-To: References: Message-ID: <278cdc1f-bc09-4332-abfe-2429225c25d9@oracle.com> Looks good. Thanks, Vladimir On 9/13/18 8:50 PM, Igor Veresov wrote: > This is a regular update. Please see the JBS issue for the list of the changes included in this update. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8210478 > Webrev: http://cr.openjdk.java.net/~iveresov/8210478/webrev.00/ > > > Thanks! > igor > > > From igor.veresov at oracle.com Fri Sep 14 04:56:41 2018 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 13 Sep 2018 21:56:41 -0700 Subject: RFR(L) 8210478: Update Graal In-Reply-To: <278cdc1f-bc09-4332-abfe-2429225c25d9@oracle.com> References: <278cdc1f-bc09-4332-abfe-2429225c25d9@oracle.com> Message-ID: <114529AA-8A68-495A-98E9-EBCC98D663BE@oracle.com> Thanks, Vladimir! igor > On Sep 13, 2018, at 9:39 PM, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir > > On 9/13/18 8:50 PM, Igor Veresov wrote: >> This is a regular update. Please see the JBS issue for the list of the changes included in this update. 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8210478 >> Webrev: http://cr.openjdk.java.net/~iveresov/8210478/webrev.00/ >> Thanks! >> igor From rwestrel at redhat.com Fri Sep 14 07:49:16 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 14 Sep 2018 09:49:16 +0200 Subject: RFR(S): 8210390: C2 still crashes with "assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node" Message-ID: http://cr.openjdk.java.net/~roland/8210390/webrev.00/ PhaseIdealLoop::reorg_offsets() creates some data nodes on the exit path of a counted loop so they are in the outer strip mined loop. Data nodes in the outer strip mined loop are expected to be referenced from the safepoint node. But that's not the case for these new nodes which have all uses outside the outer strip mined loop. This inconsistency causes a later attempt at cloning the loop in the same loop opts pass to break. The fix is to assign control to the new data nodes that's on the outer strip mined loop exit path. Roland. From martin.doerr at sap.com Fri Sep 14 08:29:58 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 14 Sep 2018 08:29:58 +0000 Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad In-Reply-To: References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com> Message-ID: Hi Michihiro, your webrev http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/ looks good to me. I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no "format %{ ... %}" specification so they are missing in the PrintOptoAssembly output. But this seems to be missing in the current version already. We can test it while waiting for a 2nd review. Thanks and best regards, Martin From: Michihiro Horie Sent: Thursday, 13.
September 2018 19:05 To: Doerr, Martin Cc: Lindenmaier, Goetz; Gustavo Romero; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad Hi Martin, Thank you so much for your review (and adding the ID in the subject :-). >I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used. You're right, thanks. I removed a redundant one. I also refactored ReplicateD with vector length 2. Following is the latest webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/ Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie, "hotspot-compiler-dev at openjdk.java.net" Cc: Gustavo Romero, "Lindenmaier, Goetz" Date: 2018/09/13 16:25 Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad ________________________________ Hi Michihiro, I have added "RFR(S): 8210660" to the subject. I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used. Besides this, your change looks good to me. Would you like to improve ReplicateD with vector length 2, too? Thanks and best regards, Martin From: Michihiro Horie Sent: Wednesday, 12 September 2018 18:11 To: hotspot-compiler-dev at openjdk.java.net Cc: Doerr, Martin; Gustavo Romero; Lindenmaier, Goetz Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad Dear all, Would you please review the following change?
Bug: https://bugs.openjdk.java.net/browse/JDK-8210660 Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/ In the code currently emitted for replicating a floating point value in ppc.ad, the floating point value is first stored and then loaded back as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating point registers overlap with vsx registers 0-31. We can use vsx instructions for replicating the floating point value by mapping the floating point register to the vsx register. Best regards, -- Michihiro, IBM Research - Tokyo From aph at redhat.com Fri Sep 14 09:34:34 2018 From: aph at redhat.com (Andrew Haley) Date: Fri, 14 Sep 2018 10:34:34 +0100 Subject: [aarch64-port-dev ] RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> Message-ID: <169c83ec-3e2f-a001-22c0-08528b3f189f@redhat.com> On 09/07/2018 01:58 PM, Andrew Dinn wrote: > I have rewritten the algorithm to achieve what I think is needed to > patch these omissions. The redraft of this part of the code is available > here: > > http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt I know that you're very good at using punctuation, capitalization, and grammar in written text. However, for some reason you omit these in comments. In this case, it would be much easier to read your comments if they were recast as sentences in grammatical English. Sure, some of them could be simply noun phrases. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From adinn at redhat.com Fri Sep 14 09:59:38 2018 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 14 Sep 2018 10:59:38 +0100 Subject: RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com> References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com> Message-ID: <26281e69-0354-9abe-1ffc-36c10fd93d68@redhat.com> Hi Dmitrij, On 13/09/18 15:35, Dmitrij Pochepko wrote: > I found 3 items to fix in your comments in > http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt > > 1) > > // [1, sqrt(3/2)), [sqrt(3/2), sqrt(3)/2), [sqrt(3)/2, 2) > // i.e. [1, ~1.225], [~1.225, ~1.732), [~1.732, 2) > > this one should be: > > [1, sqrt(3/2)), [sqrt(3/2), sqrt(3)), [sqrt(3), 2) > i.e. [1, ~1.225], [~1.225, ~1.732), [~1.732, 2) > > > 2) > > "4) Filter out overflows (z > 1023) or underflows (z < -1077)" > > should be: > > "4) Filter out overflows (z > 1023) or underflows (z < -1076)" > > 3) "5) Let |z| = n + r where n is int, 0 <= n < 10, and 0 <= r < 1" > > should be: > > "5) Let |z| = n + r where n is int, 0 <= n < 1076, and 0 <= r < 1" > > Other comments seem fine Thank you for the corrections. I will update the file on cr.openjdk.java.net accordingly. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From zhaixiang at loongson.cn Fri Sep 14 11:31:14 2018 From: zhaixiang at loongson.cn (Leslie Zhai) Date: Fri, 14 Sep 2018 19:31:14 +0800 Subject: RFR(XS): 8024128: guarantee(codelet_size > 0 && (size_t)codelet_size > 2*K) failed: not enough space for interpreter generation Message-ID: Hi, I just quoted the old thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012243.html I think we should increase it more for future otherwise you will have to always catch up with interpreter changes. Increase it to 256 * 1024 and 224 * 1024 Vladimir On 10/16/13 12:22 PM, Albert Noll wrote: > Hi, > > could I have a review for this patch? > > bug: https://bugs.openjdk.java.net/browse/JDK-8026708 > webrev: http://cr.openjdk.java.net/~anoll/8026708/webrev.00/ > > > Problem: Not enough room for interpreter. My last patch did not solve > the problem for solaris-amd64. > A local build (solaris-amd64) of the most recent > hotspot-comp version requires a template interpreter > size of 211K (obtained with -XX:+PrintInterpreter). > There have been some modifications to the template > interpreter in the last couple of weeks which might have > triggered this error. > > Solution: Increase interpreter size by 8k (32-bit and 64-bit). > > Testing: Failing test case in solaris-amd64 ----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< --- I found that `InterpreterCodeSize` had been changed from 200K to 208K [1], then changed from 208K to 256K [2] by Albert. But if built with-debug-level=fastdebug/slowdebug, it will be multiplied by four: NOT_PRODUCT(code_size *= 4;) // debug uses extra interpreter code space Then it might trigger the "Native memory allocation (malloc) failed to allocate xxx bytes for CodeCache: no room for Interpreter" issue.
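To make the arithmetic concrete: with the 256K value mentioned above, the quoted NOT_PRODUCT line yields four times that amount in a debug build. The following is a small sketch that assumes only what the message states; the class and method names are hypothetical, and this is not HotSpot code.

```java
public class InterpreterSizeSketch {
    static final int K = 1024;

    // Hypothetical helper mirroring the quoted rule: debug (non-product)
    // builds multiply the base interpreter code size by four.
    public static int effectiveCodeSize(int baseSize, boolean debugBuild) {
        return debugBuild ? baseSize * 4 : baseSize;
    }

    public static void main(String[] args) {
        int base = 256 * K; // the value the message says is currently in use
        System.out.println("product: " + effectiveCodeSize(base, false) / K + "K");
        System.out.println("debug:   " + effectiveCodeSize(base, true) / K + "K");
        // prints:
        // product: 256K
        // debug:   1024K
    }
}
```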
I don't want to always catch up with interpreter changes by guessing the suitable number, not too small, not too big :) Please give me some suggestions about the root cause, thanks a lot! Leslie Zhai [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/6d7eba360ba4 [2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/74e00b98d5dd From HORIE at jp.ibm.com Fri Sep 14 11:42:15 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Fri, 14 Sep 2018 20:42:15 +0900 Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad In-Reply-To: References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com> Message-ID: Hi Martin, Thank you for your comment to improve this change and for testing it. I uploaded a new webrev with format statements. http://cr.openjdk.java.net/~mhorie/8210660/webrev.02/ Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie Cc: "Lindenmaier, Goetz", Gustavo Romero, "hotspot-compiler-dev at openjdk.java.net" Date: 2018/09/14 17:30 Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad Hi Michihiro, your webrev http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/ looks good to me. I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no "format %{ ... %}" specification so they are missing in the PrintOptoAssembly output. But this seems to be missing in the current version already. We can test it while waiting for a 2nd review. Thanks and best regards, Martin From: Michihiro Horie Sent: Thursday, 13 September 2018 19:05 To: Doerr, Martin Cc: Lindenmaier, Goetz; Gustavo Romero; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad Hi Martin, Thank you so much for your review (and adding the ID in the subject :-). >I don't think we need 2 nodes for ReplicateF with vector length 4.
Both ones in your webrev are for Power8 so only one will be used. You're right, thanks. I removed a redundant one. I also refactored ReplicateD with vector length 2. Following is the latest webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/ Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net> Cc: Gustavo Romero, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com> Date: 2018/09/13 16:25 Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad Hi Michihiro, I have added "RFR(S): 8210660" to the subject. I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used. Besides this, your change looks good to me. Would you like to improve ReplicateD with vector length 2, too? Thanks and best regards, Martin From: Michihiro Horie Sent: Wednesday, 12 September 2018 18:11 To: hotspot-compiler-dev at openjdk.java.net Cc: Doerr, Martin; Gustavo Romero <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad Dear all, Would you please review the following change? Bug: https://bugs.openjdk.java.net/browse/JDK-8210660 Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/ In the code currently emitted for replicating a floating point value in ppc.ad, the floating point value is first stored and then loaded back as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating point registers overlap with vsx registers 0-31.
We can use vsx instructions for replicating the floating point value by mapping the floating point register to the vsx register. Best regards, -- Michihiro, IBM Research - Tokyo From rkennke at redhat.com Fri Sep 14 12:56:07 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 14 Sep 2018 14:56:07 +0200 Subject: RFR: JDK-8210752: Remaining explicit barriers for C2 Message-ID: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com> Please review the following change: JDK-8210187 introduced explicit barriers for C2. There've been a few missing: - Unsafe accesses also require explicit barriers when it's unknown if the access is on-heap or off-heap. In this case, C2 turns the access into a raw access, in which case the access_load/store APIs cannot determine what to do. Emitting explicit barriers solves this for Shenandoah: in case of raw access, base will be NULL, which gets handled by a null-check (in this case the barrier is ignored); for on-heap access, the null-check will fail and the barrier is triggered correctly. - One arraycopy barrier on dst was erroneously emitted for ACCESS_READ where it should be ACCESS_WRITE (my mistake) - Object equality using CmpP requires stable oops, and thus barriers on both operands. - vectorizedMismatch() and copyMemory() also require explicit barriers before building the addresses and feeding them into the calls. Bug: https://bugs.openjdk.java.net/browse/JDK-8210752 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8210752/webrev.00/ Thanks, Roman
From tobias.hartmann at oracle.com Fri Sep 14 13:42:39 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 14 Sep 2018 15:42:39 +0200 Subject: RFR(S): 8210390: C2 still crashes with "assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node" In-Reply-To: References: Message-ID: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com> Hi Roland, that looks good to me. Best regards, Tobias On 14.09.2018 09:49, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8210390/webrev.00/ > > PhaseIdealLoop::reorg_offsets() creates some data nodes on the exit path > of a counted loop so they are in the outer strip mined loop. Data nodes > in the outer strip mined loop are expected to be referenced from the > safepoint node. But that's not the case for these new nodes which have > all uses outside the outer strip mined loop. This inconsistency causes a > later attempt at cloning the loop in the same loop opts pass to break. > > The fix is to assign control to the new data nodes that's on the outer > strip mined loop exit path. > > Roland. > From rwestrel at redhat.com Fri Sep 14 14:47:21 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 14 Sep 2018 16:47:21 +0200 Subject: RFR(S): 8210390: C2 still crashes with "assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node" In-Reply-To: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com> References: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com> Message-ID: > that looks good to me. Thanks for the review, Tobias. Roland.
From goetz.lindenmaier at sap.com Fri Sep 14 15:18:12 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 14 Sep 2018 15:18:12 +0000 Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal In-Reply-To: <4F954DE5-DD8C-4395-8E40-6D341C42649C@twitter.com> References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com> <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com> <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com> <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com> <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com> , <4F954DE5-DD8C-4395-8E40-6D341C42649C@twitter.com> Message-ID: <20A4FBE2-A091-4437-A2D8-9806C8DC1837@sap.com> Hi, Gustavo, thanks for the offlist explanations. The change simplifies the matter nicely. Looks good, reviewed. Best regards, Götz > On 10.09.2018 at 22:17, Christian Thalinger wrote: > > > >> On Sep 6, 2018, at 1:53 AM, Gustavo Romero wrote: >> >> On 09/05/2018 07:54 PM, Vladimir Kozlov wrote: >>> v3 looks good. >> >> Thanks a lot Vladimir. >> >> @Goetz, would you mind reviewing v3 please? > > Is he on vacation? :-) > >> It touches code meant for AIX but >> I don't expect any change in the end. >> >> http://cr.openjdk.java.net/~gromero/8209972/v3/ >> >> Thank you. >> >> >> Best regards, >> Gustavo >> >>> Thanks, >>> Vladimir >>>> On 9/5/18 3:18 PM, Gustavo Romero wrote: >>>> Hi Vladimir, >>>> >>>>> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote: >>>>> Thank you Gustavo for detailed answer. >>>>> >>>>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now. >>>> >>>> Thanks for reviewing it! >>>> >>>> >>>>> About property check (flavor == "server" & !emulatedClient) in tests. If C2 (and Graal in a future when it supports RTM) is used, this check is always true because C2 (and Graal) is only available in Server VM and it is disabled in emulatedClient setting. This check was added before we have C2 usage check which can be done now with vm.rtm.compiler.
>>>> >>>> Thanks, I was not aware of it. I've updated the webrev removing >>>> "flavor == "server" & !emulatedClient": >>>> >>>> http://cr.openjdk.java.net/~gromero/8209972/v3/ >>>> >>>> "hg diff --patience": >>>> >>>> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff >>>> >>>> Testing (on Linux): >>>> >>>> ** X86_64 w/ CPU+OS RTM support + Graal VM ** >>>> Test results: no tests selected (all RTM tests skipped) >>>> >>>> ** POWER8 w/ CPU+OS support ** >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>> Passed: 
compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>> Test results: passed: 30 >>>> >>>> ** X86_64 w/ CPU+OS support ** >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>> Passed: 
compiler/rtm/locking/TestRTMSpinLoopCount.java >>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>> Test results: passed: 30 >>>> >>>> ** POWER7 wo/ CPU+OS RTM support ** >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Test results: passed: 10 >>>> >>>> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support ** >>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>> Passed: 
compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>> Test results: passed: 10 >>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> On 9/3/18 3:15 PM, Gustavo Romero wrote: >>>>>> Hi Vladimir, >>>>>> >>>>>> Thanks a lot for reviewing it and for your comments. >>>>>> >>>>>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote: >>>>>>> Hi Gustavo, >>>>>>> >>>>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with TieredStopAtLevel < 4 flag >>>>>> >>>>>> Yes, although currently afaics all tests will explicitly enable C2 (for >>>>>> instance, passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2 >>>>>> through a warming up before testing, I agree that nothing forbids one to >>>>>> switch off C2 with TieredStopAtLevel = level < 4 for a test case. It also >>>>>> looks better to list explicitly which compilers do support RTM instead of >>>>>> the ones that don't support it. >>>>>> >>>>>> I've updated the webrev accordingly: >>>>>> >>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/ >>>>>> >>>>>> diff in there looks odd so I generated another one with --patience for a >>>>>> better (IMO) diff format: >>>>>> >>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff >>>>>> >>>>>> >>>>>>> Also can platforms check be replaced with one vmRTMCPU()? Does ppc64 return true from vmRTMCPU()?
>>>>>> >>>>>> For example, on Linux the following cases are possible regarding CPU / OS >>>>>> RTM support: >>>>>> >>>>>> POWER7 : cpu = false, os = false => vm.rtm.cpu = false >>>>>> POWER8 : cpu = true, os = false | true => vm.rtm.cpu = false | true >>>>>> POWER9 VM: cpu = true, os = false | true => vm.rtm.cpu = false | true >>>>>> POWER9 NV: cpu = true, os = false => vm.rtm.cpu = false >>>>>> >>>>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support >>>>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it >>>>>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies >>>>>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise >>>>>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for >>>>>> Linux and for AIX. >>>>>> >>>>>> That said I don't think that the platforms check can be replaced with one >>>>>> vmRTMCPU(), because in some cases it's necessary to run a test for >>>>>> cpu = false and compiler = true, i.e. it's necessary to run a test on an >>>>>> unsupported CPU for a given platform _only if_ the compiler in use supports >>>>>> RTM (like C2). So if, for instance, we do: >>>>>> >>>>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires >>>>>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation >>>>>> returns 'false' for cpu = false and compiler = true, skipping the test >>>>>> (vm.rtm.compiler = false). On the other hand, to evaluate 'vm.rtm.compiler' >>>>>> as 'true' and run the test in that case one could match for >>>>>> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will >>>>>> be evaluated as 'true' and the test will run even though the Graal >>>>>> compiler is selected, which is wrong. >>>>>> >>>>>> Hence 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must >>>>>> contain its own list of supported compilers with RTM support for each >>>>>> platform IMO.
Basically we can't ask the JVM about the compiler's support >>>>>> for RTM, since the JVM can only tell us about the CPU+OS support for RTM >>>>>> regarding the CPU and OS in which the JVM is running on. >>>>>> >>>>>> >>>>>>> And since you check cpu in here why not to replace all vm.rtm.* with one vm.rtm.supported? In such case you would need only one @requires checks in tests instead of: >>>>>>> >>>>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler >>>>>> >>>>>> I think it's not possible either. Currently there are 5 match cases in >>>>>> RTM tests: >>>>>> >>>>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u >>>>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os) >>>>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os >>>>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient) >>>>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient) >>>>>> >>>>>> which can be simplified 5 cases as: >>>>>> >>>>>> 1: !(flavor == "server" & !emulatedClient & cpu & os) >>>>>> 2: flavor == "server" & !emulatedClient & cpu & os >>>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>>> 5: no @requires >>>>>> >>>>>> I understand that case 1 and 2 (since CPU implies OS) can be simplified as: >>>>>> >>>>>> >>>>>> 1: !(flavor == "server" & !emulatedClient & cpu) >>>>>> 2: flavor == "server" & !emulatedClient & cpu >>>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>>> 5: no @requires >>>>>> >>>>>> and case 1 and 2 are mere opposites, so we have 4 cases: >>>>>> >>>>>> 1: !(flavor == "server" & !emulatedClient & cpu) >>>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>>> 5: no @requires >>>>>> 
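The step in the lists above that replaces 'cpu & os' with 'cpu' rests on the earlier observation that vm.rtm.cpu is true only when the OS supports RTM as well. The following small sketch checks that step by enumerating every combination under that assumption. The class and method names are hypothetical and not part of the jtreg test library; p stands for (flavor == "server" & !emulatedClient).

```java
public class RtmRequiresSketch {
    // The predicate as written in the tests: P & cpu & os.
    static boolean withOs(boolean p, boolean cpu, boolean os) {
        return p && cpu && os;
    }

    // The simplified predicate with the os term dropped: P & cpu.
    static boolean withoutOs(boolean p, boolean cpu) {
        return p && cpu;
    }

    // Returns true if dropping the os term never changes the result,
    // given the assumption that cpu == true implies os == true.
    public static boolean osTermIsRedundant() {
        boolean[] values = { false, true };
        for (boolean p : values) {
            for (boolean cpu : values) {
                for (boolean os : values) {
                    if (cpu && !os) {
                        continue; // excluded by the assumption stated above
                    }
                    if (withOs(p, cpu, os) != withoutOs(p, cpu)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(osTermIsRedundant()); // prints: true
    }
}
```

Without the assumption, the combination cpu = true, os = false would make the two predicates differ, which is why the simplification is only valid if the JVM never advertises the rtm feature on an OS lacking RTM support.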
>>>>>> We could simplify further making P = (flavor == "server" & !emulatedClient), >>>>>> and make: >>>>>> >>>>>> 1: !(P & cpu) >>>>>> 3: (!cpu) & (P) >>>>>> 4: cpu & !(P) >>>>>> 5: no @requires >>>>>> >>>>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in >>>>>> order to control running the tests only if the selected compiler on a >>>>>> given platform has RTM support (skipping Graal, for instance): >>>>>> >>>>>> 1: !(P & cpu) & compiler >>>>>> 3: (!cpu) & (P) & compiler >>>>>> 4: cpu & !(P) & compiler >>>>>> 5: no @requires & compiler >>>>>> >>>>>> So it looks like that at minimum we would need 3 properties, but IMO it's >>>>>> not worth to add another property P = (flavor == "server" & !emulatedClient) >>>>>> just to simplify further the @requires line. >>>>>> >>>>>> In summing up, I think it's only possible to replace 'cpu & os' by 'cpu', >>>>>> so I updated the webrev removing the vm.rtm.os property and keeping only >>>>>> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks). >>>>>> >>>>>> I've tested the following scenarios and observed no regression [1]: >>>>>> >>>>>> 1. X86_64 w/ RTM >>>>>> 2. X86_64 w/ RTM + Graal enabled >>>>>> 3. POWER7: no CPU+OS support for RTM >>>>>> 4. POWER8: CPU+OS support for RTM >>>>>> >>>>>> But I think we need a confirmation from SAP about AIX. 
>>>>>> >>>>>> >>>>>> Best regards, >>>>>> Gustavo >>>>>> >>>>>> [1] >>>>>> >>>>>> ** X86_64 w/ RTM ** >>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java 
>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>> Test results: passed: 30 >>>>>> >>>>>> >>>>>> ** X86_64 w/ RTM + Graal enabled ** >>>>>> Test results: no tests selected (all RTM tests skipped) >>>>>> >>>>>> >>>>>> ** POWER7: no CPU+OS support for RTM ** >>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>> Test results: passed: 10 >>>>>> >>>>>> >>>>>> ** POWER8: CPU+OS support for RTM ** >>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>> Passed: 
compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>> Test results: passed: 30 >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Could the following small change be reviewed please? >>>>>>>> >>>>>>>> Bug : https://bugs.openjdk.java.net/browse/JDK-8209972 >>>>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/ >>>>>>>> >>>>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal) >>>>>>>> is selected on platforms that can have CPU/OS with RTM support. >>>>>>>> >>>>>>>> It also disables all RTM tests for any other platform that has not a single >>>>>>>> compiler supporting RTM. 
>>>>>>>> >>>>>>>> The RTM support was first added to C2 compiler and once checkers for RTM >>>>>>>> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM they >>>>>>>> assume that a compiler supporting RTM is available for sure ("rtm" is >>>>>>>> advertised only if RTM is supported by both CPU and OS). Later the JVM >>>>>>>> began to allow the selection of a compiler different from C2, like Graal, >>>>>>>> and it became possible to select a compiler without RTM support despite the >>>>>>>> fact that both the CPU and the OS support RTM. Thus for platforms >>>>>>>> supporting Graal or any other specific compiler the compiler availability for >>>>>>>> the RTM tests must be adjusted and if the selected compiler does not >>>>>>>> support RTM then all RTM tests must be skipped, including the ones meant >>>>>>>> for platforms without CPU or OS RTM support (e.g. *Unsupported*.java) >>>>>>>> because in some cases, like in TestUseRTMLockingOptionOnUnsupportedCPU.java, >>>>>>>> the test expects JVM initialization errors that will never occur because the >>>>>>>> problem is not that the RTM support for CPU or OS is missing, but rather >>>>>>>> because the selected compiler does not support RTM. >>>>>>>> >>>>>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to >>>>>>>> filter out compilers without RTM support for specific platforms and adapts >>>>>>>> the current RTM tests to use that new property. >>>>>>>> >>>>>>>> Nothing changes regarding the number of passing/selected tests for the >>>>>>>> various cpu/os/compiler combinations on platforms that currently might >>>>>>>> support RTM [1], except when Graal is in use. >>>>>>>> >>>>>>>> Thank you. 
>>>>>>>> >>>>>>>> Best regards, >>>>>>>> Gustavo >>>>>>>> >>>>>>>> >>>>>>>> [1] >>>>>>>> >>>>>>>> ** X64 w/ CPU and OS supporting RTM ** >>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java 
>>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>>>> Test results: passed: 30 >>>>>>>> >>>>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support ** >>>>>>>> Test results: no tests selected (all RTM tests skipped) >>>>>>>> >>>>>>>> ** POWER8 w/ CPU and OS supporting RTM ** >>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>>>> Passed: 
compiler/rtm/locking/TestRTMRetryCount.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>>>> Test results: passed: 30 >>>>>>>> >>>>>>>> ** POWER7 wo/ CPU and OS supporting RTM ** >>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>>> Test results: passed: 10 >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >> > From adinn at redhat.com Fri Sep 14 15:29:47 2018 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 14 Sep 2018 16:29:47 +0100 Subject: RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com> References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> 
<44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com>
Message-ID: <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com>

On 13/09/18 15:35, Dmitrij Pochepko wrote:
> Other comments seem fine

I am glad to hear that you did not find any errors in my analysis.
However, I also need to ask you to answer a question that was implicit
in my earlier note. I said:

  "I assume you are familiar with the relevant mathematics and how it
  has been used to derive the algorithm. If so then I would like you to
  review this rewrite and ensure that there are no mathematical errors
  in it. I would also like you to check that the explanatory comments
  of the individual steps in the algorithm do not contain any errors.

  If you are not familiar with the mathematics then please let me know.
  I need to know whether this has been reviewed by someone competent to
  do so."

As you didn't respond to this I will have to ask you explicitly this
time. Do you have a background in mathematics and numerical analysis
that means you understand how the original algorithm has been arrived
at? Equally, how your algorithm may legitimately vary from that original?

I'll break this down into several steps:

Do you understand the (elementary) theory that explains how the various
polynomial expansions I described in my comments converge to the
original log and exp functions?

Do you understand the theory that explains how partial polynomial sums
(Remez polynomials) can be used to approximate these polynomial
expansions within specified ranges?

Do you know how the coefficients of these Remez polynomials can be
derived to any necessary accuracy?

Do you understand how the computation of the values of those Remez
polynomials must proceed in order to guarantee accuracy in the computed
result in the presence of rounding errors?

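The point about evaluation order and rounding can be made concrete with a small sketch. The Java class below is not taken from the patch under review; `PolyEval`, `horner` and `estrin` are hypothetical names chosen for illustration, and the coefficients in `main` are plain Taylor coefficients of exp, not Remez coefficients. It evaluates one polynomial c[0] + c[1]*x + ... + c[n]*x^n in Horner form and in Estrin form. The two forms are algebraically identical, but they perform the floating-point operations in a different order, so for general inputs the results may differ in the last bits.

```java
// Illustrative only: two evaluation orders for the same polynomial.
final class PolyEval {

    // Horner form: a serial chain, ((c[n]*x + c[n-1])*x + ...)*x + c[0].
    static double horner(double[] c, double x) {
        double r = 0.0;
        for (int i = c.length - 1; i >= 0; i--) {
            r = r * x + c[i];
        }
        return r;
    }

    // Estrin form: combine neighbouring coefficients pairwise as
    // (c[2i] + c[2i+1]*x), then repeat on the results with x^2, x^4, ...
    static double estrin(double[] c, double x) {
        if (c.length == 0) {
            return 0.0;
        }
        double[] t = c.clone();   // work on a copy, keep the input intact
        int n = t.length;
        double p = x;             // current power of x: x, x^2, x^4, ...
        while (n > 1) {
            int half = (n + 1) / 2;
            for (int i = 0; i < half; i++) {
                int lo = 2 * i;
                int hi = lo + 1;
                // An unpaired trailing coefficient is carried over unchanged.
                t[i] = (hi < n) ? t[lo] + t[hi] * p : t[lo];
            }
            n = half;
            p = p * p;
        }
        return t[0];
    }

    public static void main(String[] args) {
        // Taylor coefficients 1/k! of exp(x), k = 0..7 (illustration only).
        double[] c = new double[8];
        double fact = 1.0;
        for (int k = 0; k < c.length; k++) {
            if (k > 0) {
                fact *= k;
            }
            c[k] = 1.0 / fact;
        }
        double x = 0.3;
        double h = horner(c, x);
        double e = estrin(c, x);
        System.out.println("horner = " + h);
        System.out.println("estrin = " + e);
        System.out.println("difference = " + (h - e));
    }
}
```

Horner form is a single dependency chain, while Estrin form exposes independent sub-expressions that can be computed in parallel, which is presumably why a port would want it. Whether the reordering keeps the error within the bound the original algorithm relies on is a separate question that the sketch does not answer.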
Can you provide a mathematical proof that the variations you have
introduced into the computational process (specifically the move from
Horner form to Estrin form) will not introduce rounding errors?

I certainly cannot lay claim to a /thorough/ understanding of most, if
not all, those topics. If you also cannot then I think we need to bring
in someone who does.

In particular, it is the last point that matters most of all here as
this is where you have /chosen/ to make your algorithm diverge from the
code you inherited.

As regards the rest of the background maths, we do at least know that
the other aspects of the algorithm -- in its original manifestation --
have been checked by numerical experts. Hence, if we ensure that your
algorithm implements /equivalent/ steps then it ought to inherit the
same guarantees of correctness.

So, the only task as far as most of the code is concerned is to iron out
any errors you might inadvertently have introduced. I have several nits
to pick in that regard that I will be posting shortly.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From gromero at linux.vnet.ibm.com Fri Sep 14 15:58:27 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 14 Sep 2018 12:58:27 -0300
Subject: RFR (S): 8209972: [GRAAL] Don't run RTM tests with Graal
In-Reply-To: <20A4FBE2-A091-4437-A2D8-9806C8DC1837@sap.com>
References: <9148c105-1bd8-0852-9130-6b3c7823330e@linux.vnet.ibm.com> <3ae68161-0f78-2766-655f-fe3fe4a4719f@oracle.com> <9e7a0c2b-47c0-45b7-a394-a88849cfa3a1@oracle.com> <15d5f2c1-96ce-c379-233f-e71218a63e1e@linux.vnet.ibm.com> <2e905821-4525-e210-5cc5-50a6f09a39f6@oracle.com> <4F954DE5-DD8C-4395-8E40-6D341C42649C@twitter.com> <20A4FBE2-A091-4437-A2D8-9806C8DC1837@sap.com>
Message-ID: <606d94b1-6099-4b5b-c992-99e1d1c5661d@linux.vnet.ibm.com>

Hi Götz,

On 09/14/2018 12:18 PM, Lindenmaier, Goetz wrote:
> Hi,
>
> Gustavo, thanks for the offlist explanations.
>
> The change simplifies the matter nicely.
> Looks good, reviewed.

Thanks a lot for reviewing it! I'll push it today.

Best regards,
Gustavo

> Best regards, Götz
>
>> On 10.09.2018 at 22:17, Christian Thalinger wrote:
>>
>>
>>
>>> On Sep 6, 2018, at 1:53 AM, Gustavo Romero wrote:
>>>
>>> On 09/05/2018 07:54 PM, Vladimir Kozlov wrote:
>>>> v3 looks good.
>>>
>>> Thanks a lot Vladimir.
>>>
>>> @Goetz, would you mind reviewing v3, please?
>>
>> Is he on vacation? :-)
>>
>>> It touches code meant for AIX but
>>> I don't expect any change in the end.
>>>
>>> http://cr.openjdk.java.net/~gromero/8209972/v3/
>>>
>>> Thank you.
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>> Thanks,
>>>> Vladimir
>>>>> On 9/5/18 3:18 PM, Gustavo Romero wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>> On 09/04/2018 03:40 PM, Vladimir Kozlov wrote:
>>>>>> Thank you Gustavo for the detailed answer.
>>>>>>
>>>>>> I agree with your suggestion to keep vm.rtm.cpu and remove vm.rtm.os. Changes in VMProps.java are good now.
>>>>>
>>>>> Thanks for reviewing it!
>>>>> >>>>> >>>>>> About property check (flavor == "server" & !emulatedClient) in tests. If C2 (and Graal in a future when it supports RTM) is used, this check is always true because C2 (and Graal) is only available in Server VM and it is disabled in emulatedClient setting. This check was added before we have C2 usage check which can be done now with vm.rtm.compiler. >>>>> >>>>> Thanks, I was not aware of it. I've updated the webrev removing >>>>> "flavor == "server" & !emulatedClient": >>>>> >>>>> http://cr.openjdk.java.net/~gromero/8209972/v3/ >>>>> >>>>> "hg diff --patience": >>>>> >>>>> http://cr.openjdk.java.net/~gromero/8209972/v3/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v3.diff >>>>> >>>>> Testing (on Linux): >>>>> >>>>> ** X86_64 w/ CPU+OS RTM support + Graal VM ** >>>>> Test results: no tests selected (all RTM tests skipped) >>>>> >>>>> ** POWER8 w/ CPU+OS support ** >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>> Passed: 
compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>> Test results: passed: 30 >>>>> >>>>> ** X86_64 w/ CPU+OS support ** >>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Passed: 
compiler/rtm/locking/TestRTMAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>> Test results: passed: 30 >>>>> >>>>> ** POWER7 wo/ CPU+OS RTM support ** >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Test 
results: passed: 10 >>>>> >>>>> ** POWER9 NV CPU w/ CPU RTM support and OS wo/ RTM support ** >>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>> Test results: passed: 10 >>>>> >>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>>> On 9/3/18 3:15 PM, Gustavo Romero wrote: >>>>>>> Hi Vladimir, >>>>>>> >>>>>>> Thanks a lot for reviewing it and for your comments. >>>>>>> >>>>>>>> On 08/31/2018 03:12 PM, Vladimir Kozlov wrote: >>>>>>>> Hi Gustavo, >>>>>>>> >>>>>>>> I think you should replace !Compiler.isGraalEnabled() with Compiler.isC2Enabled() because C2 may be switched off with TieredStopAtLevel < 4 flag >>>>>>> >>>>>>> Yes, although currently afaics all tests will explicitly enabled C2 (for >>>>>>> instance, passing -Xcomp and -XX:-TieredCompilation) or implicitly enable C2 >>>>>>> through a warming up before testing, I agree that nothing forbids one to >>>>>>> switch off C2 with TieredStopAtLevel = level < 4 for a test case. It also >>>>>>> looks better to list explicitly which compilers do support RTM instead of >>>>>>> the ones that don't support it. 
>>>>>>> >>>>>>> I've updated the webrev accordingly: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/ >>>>>>> >>>>>>> diff in there looks odd so I generated another one with --patience for a >>>>>>> better (IMO) diff format: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~gromero/8209972/v2/8209972_GRAAL_dont_run_RTM_tests_with_Graal_v2.diff >>>>>>> >>>>>>> >>>>>>>> Also can platforms check be replaced with one vmRTMCPU()? Is ppc64 return true from vmRTMCPU()? >>>>>>> >>>>>>> For example, on Linux the following cases are possible regarding CPU / OS >>>>>>> RTM support: >>>>>>> >>>>>>> POWER7 : cpu = false, os = false => vm.rtm.cpu = false >>>>>>> POWER8 : cpu = true, os = false | true => vm.rtm.cpu = false | true >>>>>>> POWER9 VM: cpu = true, os = false | true => vm.rtm.cpu = false | true >>>>>>> POWER9 NV: cpu = true, os = false => vm.rtm.cpu = false >>>>>>> >>>>>>> PPC64 will return 'true' for vmRTMCPU() _only_ if both CPU and OS support >>>>>>> RTM. In other words, when @requires asks for "vm.rtm.cpu & vm.rtm.os" it >>>>>>> really looks like a tautology because if "vm.rtm.cpu" is 'true' it implies >>>>>>> "vm.rtm.os" being 'true' as well, otherwise the JVM would never advertise >>>>>>> the "rtm" feature (which is used by "vm.rtm.cpu"). That seems true for >>>>>>> Linux and for AIX. >>>>>>> >>>>>>> That said I don't think that the platforms check can be replaced with one >>>>>>> vmRTMCPU(), because in some cases it's necessary to run a test for >>>>>>> cpu = false and compiler = true, i.e. it's necessary to run a test on an >>>>>>> unsupported CPU for a given platform _only if_ the compiler in use supports >>>>>>> RTM (like C2). So if, for instance, we do: >>>>>>> >>>>>>> 'vm.rtm.compiler = C2 && vmRTMCPU()' and use 'vm.rtm.compiler' in @requires >>>>>>> we "tie" CPU+OS RTM support to compiler RTM support and the evaluation >>>>>>> returns 'false' for cpu = false and compiler = true, skipping the test >>>>>>> (vm.rtm.compiler = false). 
On the other hand, to evaluate 'vm.rtm.compiler'
>>>>>>> as 'true' and run the test in that case one could match for
>>>>>>> '!vm.rtm.compiler', but if compiler = false (Graal) '!vm.rtm.compiler' will
>>>>>>> be evaluated as 'true' and the test will run even though the Graal
>>>>>>> compiler is selected, which is wrong.
>>>>>>>
>>>>>>> Hence the 'vm.rtm.compiler' property must not rely on vmRTMCPU() and must
>>>>>>> contain its own list of compilers with RTM support for each
>>>>>>> platform IMO. Basically we can't ask the JVM about the compiler's support
>>>>>>> for RTM, since the JVM can only tell us about the CPU+OS support for RTM
>>>>>>> of the CPU and OS on which the JVM is running.
>>>>>>>
>>>>>>>
>>>>>>>> And since you check cpu in here why not replace all vm.rtm.* with one vm.rtm.supported? In that case you would need only one @requires check in tests instead of:
>>>>>>>>
>>>>>>>> vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os & vm.rtm.compiler
>>>>>>>
>>>>>>> I think it's not possible either.
Currently there are 5 match cases in >>>>>>> RTM tests: >>>>>>> >>>>>>> gromero at moog:~/hg/jdk/jdk/test/hotspot/jtreg/compiler/rtm$ fgrep @require -RIn | cut -d ' ' -f 2-40 | sort -u >>>>>>> * @requires !(vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os) >>>>>>> * @requires vm.flavor == "server" & !vm.emulatedClient & vm.rtm.cpu & vm.rtm.os >>>>>>> * @requires (!vm.rtm.cpu) & (vm.flavor == "server" & !vm.emulatedClient) >>>>>>> * @requires vm.rtm.cpu & !(vm.flavor == "server" & !vm.emulatedClient) >>>>>>> >>>>>>> which can be simplified 5 cases as: >>>>>>> >>>>>>> 1: !(flavor == "server" & !emulatedClient & cpu & os) >>>>>>> 2: flavor == "server" & !emulatedClient & cpu & os >>>>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>>>> 5: no @requires >>>>>>> >>>>>>> I understand that case 1 and 2 (since CPU implies OS) can be simplified as: >>>>>>> >>>>>>> >>>>>>> 1: !(flavor == "server" & !emulatedClient & cpu) >>>>>>> 2: flavor == "server" & !emulatedClient & cpu >>>>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>>>> 5: no @requires >>>>>>> >>>>>>> and case 1 and 2 are mere opposites, so we have 4 cases: >>>>>>> >>>>>>> 1: !(flavor == "server" & !emulatedClient & cpu) >>>>>>> 3: (!cpu) & (flavor == "server" & !emulatedClient) >>>>>>> 4: cpu & !(flavor == "server" & !emulatedClient) >>>>>>> 5: no @requires >>>>>>> >>>>>>> We could simplify further making P = (flavor == "server" & !emulatedClient), >>>>>>> and make: >>>>>>> >>>>>>> 1: !(P & cpu) >>>>>>> 3: (!cpu) & (P) >>>>>>> 4: cpu & !(P) >>>>>>> 5: no @requires >>>>>>> >>>>>>> So if we add a compiler = C2 && (x64 | PPC) property to each of them in >>>>>>> order to control running the tests only if the selected compiler on a >>>>>>> given platform has RTM support (skipping Graal, for instance): >>>>>>> >>>>>>> 1: !(P & cpu) & compiler >>>>>>> 3: (!cpu) & (P) & 
compiler >>>>>>> 4: cpu & !(P) & compiler >>>>>>> 5: no @requires & compiler >>>>>>> >>>>>>> So it looks like that at minimum we would need 3 properties, but IMO it's >>>>>>> not worth to add another property P = (flavor == "server" & !emulatedClient) >>>>>>> just to simplify further the @requires line. >>>>>>> >>>>>>> In summing up, I think it's only possible to replace 'cpu & os' by 'cpu', >>>>>>> so I updated the webrev removing the vm.rtm.os property and keeping only >>>>>>> vm.rtm.cpu and vm.rtm.compiler (plus flavor and emulatedClient checks). >>>>>>> >>>>>>> I've tested the following scenarios and observed no regression [1]: >>>>>>> >>>>>>> 1. X86_64 w/ RTM >>>>>>> 2. X86_64 w/ RTM + Graal enabled >>>>>>> 3. POWER7: no CPU+OS support for RTM >>>>>>> 4. POWER8: CPU+OS support for RTM >>>>>>> >>>>>>> But I think we need a confirmation from SAP about AIX. >>>>>>> >>>>>>> >>>>>>> Best regards, >>>>>>> Gustavo >>>>>>> >>>>>>> [1] >>>>>>> >>>>>>> ** X86_64 w/ RTM ** >>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>>> Passed: 
compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>>> Test results: passed: 30 >>>>>>> >>>>>>> >>>>>>> ** X86_64 w/ RTM + Graal enabled ** >>>>>>> Test results: no tests selected (all RTM tests skipped) >>>>>>> >>>>>>> >>>>>>> ** POWER7: no CPU+OS support for RTM ** >>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>>>> Passed: 
compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>> Test results: passed: 10 >>>>>>> >>>>>>> >>>>>>> ** POWER8: CPU+OS support for RTM ** >>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>>> Passed: 
compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>>> Test results: passed: 30 >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>>> On 8/31/18 8:38 AM, Gustavo Romero wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Could the following small change be reviewed please? >>>>>>>>> >>>>>>>>> Bug : https://bugs.openjdk.java.net/browse/JDK-8209972 >>>>>>>>> Webrev: http://cr.openjdk.java.net/~gromero/8209972/v1/ >>>>>>>>> >>>>>>>>> It disables all RTM tests when a compiler not supporting RTM (e.g. Graal) >>>>>>>>> is selected on platforms that can have CPU/OS with RTM support. >>>>>>>>> >>>>>>>>> It also disables all RTM tests for any other platform that has not a single >>>>>>>>> compiler supporting RTM. >>>>>>>>> >>>>>>>>> The RTM support was first added to C2 compiler and once checkers for RTM >>>>>>>>> (notably vm.rtm.cpu) find the feature "rtm" advertised by the JVM they >>>>>>>>> assume that a compiler supporting RTM is available for sure ("rtm" is >>>>>>>>> advertised only if RTM is supported by both CPU and OS). Later the JVM >>>>>>>>> began to allow the selection of a compiler different from C2, like Graal, >>>>>>>>> and it became possible to select a compiler without RTM support despite the >>>>>>>>> fact that both the CPU and the OS support RTM. Thus for platforms >>>>>>>>> supporting Graal or any other specific compiler the compiler availability for >>>>>>>>> the RTM tests must be adjusted and if the selected compiler does not >>>>>>>>> support RTM then all RTM tests must be skipped, including the ones meant >>>>>>>>> for platforms without CPU or OS RTM support (e.g. 
*Unsupported*.java) >>>>>>>>> because in some cases, like in TestUseRTMLockingOptionOnUnsupportedCPU.java, >>>>>>>>> the test expects JVM initialization errors that will never occur because the >>>>>>>>> problem is not that the RTM support for CPU or OS is missing, but rather >>>>>>>>> because the selected compiler does not support RTM. >>>>>>>>> >>>>>>>>> That change adds a new VM property 'vm.rtm.compiler' which can be used to >>>>>>>>> filter out compilers without RTM support for specific platforms and adapts >>>>>>>>> the current RTM tests to use that new property. >>>>>>>>> >>>>>>>>> Nothing changes regarding the number of passing/selected tests for the >>>>>>>>> various cpu/os/compiler combinations on platforms that currently might >>>>>>>>> support RTM [1], except when Graal is in use. >>>>>>>>> >>>>>>>>> Thank you. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Gustavo >>>>>>>>> >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>>>>>>>> ** X64 w/ CPU and OS supporting RTM ** >>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>>>>> Passed: 
compiler/rtm/locking/TestRTMAbortRatio.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>>>>> Test results: passed: 30 >>>>>>>>> >>>>>>>>> ** X64 w/ CPU and OS supporting RTM + Graal compiler wo/ RTM support ** >>>>>>>>> Test results: no tests selected (all RTM tests skipped) >>>>>>>>> >>>>>>>>> ** POWER8 w/ CPU and OS supporting RTM ** >>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMTotalCountIncrRateOptionOnSupportedConfig.java >>>>>>>>> Passed: 
compiler/rtm/cli/TestUseRTMDeoptOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnSupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMAfterNonRTMDeopt.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMDeoptOnLowAbortRatio.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingCalculationDelay.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMLockingThreshold.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>>>>>>> Passed: compiler/rtm/locking/TestRTMTotalCountIncrRate.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMAfterLockInflation.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMDeopt.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForInflatedLocks.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMForStackLocks.java >>>>>>>>> Passed: compiler/rtm/locking/TestUseRTMXendForLockBusy.java >>>>>>>>> Passed: compiler/rtm/method_options/TestNoRTMLockElidingOption.java >>>>>>>>> Passed: compiler/rtm/method_options/TestUseRTMLockElidingOption.java >>>>>>>>> Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >>>>>>>>> Test results: passed: 30 >>>>>>>>> >>>>>>>>> ** POWER7 wo/ CPU and OS supporting RTM ** >>>>>>>>> Passed: compiler/rtm/cli/TestPrintPreciseRTMLockingStatisticsOptionOnUnsupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMAbortThresholdOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMLockingCalculationDelayOption.java >>>>>>>>> 
Passed: compiler/rtm/cli/TestRTMLockingThresholdOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMRetryCountOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestRTMSpinLoopCountOption.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMDeoptOptionOnUnsupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMForStackLocksOptionOnUnsupportedConfig.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMLockingOptionOnUnsupportedCPU.java >>>>>>>>> Passed: compiler/rtm/cli/TestUseRTMXendForLockBusyOption.java >>>>>>>>> Test results: passed: 10 >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >> From vladimir.kozlov at oracle.com Fri Sep 14 16:13:01 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Sep 2018 09:13:01 -0700 Subject: RFR(S): 8210390: C2 still crashes with "assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node" In-Reply-To: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com> References: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com> Message-ID: <22635c3c-4c0e-ccc1-9853-46ffa56dd96c@oracle.com> +1 Vladimir On 9/14/18 6:42 AM, Tobias Hartmann wrote: > Hi Roland, > > that looks good to me. > > Best regards, > Tobias > > On 14.09.2018 09:49, Roland Westrelin wrote: >> >> http://cr.openjdk.java.net/~roland/8210390/webrev.00/ >> >> PhaseIdealLoop::reorg_offsets() creates some data nodes on the exit path >> of a counted loop so they are in the outer strip mined loop. Data nodes >> in the outer strip mined loop are expected to be referenced from the >> safepoint node. But that's not the case for these new nodes which have >> all uses outside the outer strip mined loop. This inconsistency causes a >> later attempt at cloning the loop in the same loop opts pass to break. >> >> The fix is to assign control to the new data nodes that's on the outer >> strip mined loop exit path. >> >> Roland. 
>> From vladimir.kozlov at oracle.com Fri Sep 14 18:33:17 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Sep 2018 11:33:17 -0700 Subject: RFR(XS): 8024128: guarantee(codelet_size > 0 && (size_t)codelet_size > 2*K) failed: not enough space for interpreter generation In-Reply-To: References: Message-ID: <156208aa-a44c-fe34-c56f-24ffdb619714@oracle.com> Hi Leslie More context is needed. Is it Client or Server VM? Did you change ReservedCodeCacheSize? Even with *4 it is about 1Mb when CodeCache size is 48Mb and in Tiered case even bigger. Also we need call stack when you hit guarantee(). Regards, Vladimir On 9/14/18 4:31 AM, Leslie Zhai wrote: > Hi, > > I just quoted the old thread > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012243.html > > I think we should increase it more for future otherwise you will have to > always catch up with interpreter changes. > > Increase it to 256 * 1024 and 224 * 1024 > > Vladimir > > On 10/16/13 12:22 PM, Albert Noll wrote: > > Hi, > > > > could I have a review for this patch? > > > > bug: https://bugs.openjdk.java.net/browse/JDK-8026708 > > webrev: http://cr.openjdk.java.net/~anoll/8026708/webrev.00/ > > > > > > Problem: Not enough room for interpreter. My last patch did not solve > > the problem for solaris-amd64. > > A local build (solaris-amd64) of the most recent > > hotspot-comp version requires a template interpreter > > size of 211K (obtained with -XX:+PrintInterpreter). > > There have been some modifications to the template > > interpreter in the last couple of weeks which might have > > triggered this error. > > > > Solution: Increase interpreter size by 8k (32-bit and 64-bit). > > > > Testing: Failing test case in solaris-amd64 > > ----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< --- > > I found that `InterpreterCodeSize` had been changed from 200K to 208K [1],
then changed from 208K > to 256K [2] by Albert. But if built with-debug-level=fastdebug/slowdebug, it will be multiplied by > four: > > NOT_PRODUCT(code_size *= 4;) // debug uses extra interpreter code space > > Then it might trigger Native memory allocation (malloc) failed to allocate xxx bytes for CodeCache: > no room for Interpreter issue. > > I don't want to always catch up with interpreter changes by guessing the suitable number, not too > small, not too big :) Please give me some suggestion about the root cause, thanks a lot! > > Leslie Zhai > > [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/6d7eba360ba4 > > [2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/74e00b98d5dd > > From vladimir.kozlov at oracle.com Fri Sep 14 18:39:28 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Sep 2018 11:39:28 -0700 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> Message-ID: Looks good to me. I will start testing and let you know results. Thanks, Vladimir On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Please find below the updated webrev with all your comments incorporated: > > http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ > > I have run the jtreg compiler tests on SKX and KNL which have two different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms.
> > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, September 11, 2018 8:54 PM > To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > Thank you, Sandhya > > I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. > > Vladimir > > On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> Thanks a lot for the detailed review. I really appreciate your feedback. >> Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, September 11, 2018 5:11 PM >> To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> >> Thank you. >> >> I want to discuss next issue: >> >> > You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >> >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. >> >> This is what I thought. You increase registers pressure this way which may cause spills on stack. >> Also we don't check that register could be the same as result you may get unneeded moves. >> >> I would advice add memory moves at least. >> >> Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask (matcher.cpp). I would like the register allocator to get all the possible register on an architecture for idealreg2reg mask. 
I wondered that multiple instruct rules in .ad file for LoadF from memory might cause problems. I would have to have higher cost for loading into restricted register set like vlReg. Then I decided that the register allocator can handle this in much better way than me adding rules to load from memory. This is with the background that the regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is: >> MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >> #endif >> MachNode *spillI = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >> MachNode *spillL = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsO >> nlyOnTest, false)); >> MachNode *spillF = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >> MachNode *spillD = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >> MachNode *spillP = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >> .... >> idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >> >> An other question. You use movflt() and movdbl() which use either movap[s|d] and movs[s|d] instructions: >> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164 >> Are these instructions work when avx512vl is not available? I see for vectors you use >> vpxor+vinserti* combination. >> >> Sandhya >>> Yes the scalar floating point instructions are available with AVX512 encoding when avx512vl is not available. That is why you would see not just movflt, movdbl but all the other scalar operations like adds, addsd etc using the entire xmm range (xmm0-31). 
In other words they are AVX512F instructions. >> >> Last question. I notice next UseAVX check in vectors spills code in x86.ad: >> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >> >> Should it be (UseAVX < 3)? >> >> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >> >> Thanks, >> Vladimir >> >> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>> Hi Vladimir, >>> >>> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Monday, September 10, 2018 6:09 PM >>> To: Viswanathan, Sandhya ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> >>> Very nice. Thank you, Sandhya. >>> >>> I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*. >>> >>>>>> Yes, accepted. >>> >>> New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate with other Move*_reg_reg* instructions: >>> >>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>> vlRegF src) >>>>>> Yes, accepted. >>> >>> You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store? >>>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa. 
>>> >>> Also please explain why these registers are used when UseAVX == 0?: >>> >>> +instruct absD_reg(rregD dst) %{ >>> predicate((UseSSE>=2) && (UseAVX == 0)); >>> >>> we switch off evex so regular regD (only legacy register in this case) should work too: >>> 661 if (UseAVX < 3) { >>> 662 _features &= ~CPU_AVX512F; >>> >>>>>> Yes, accepted. It could be regD here. >>> >>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >>> >>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>> +vectors_reg_legacy, %{ >>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>> VM_Version::supports_avx512dq() && >>> VM_Version::supports_avx512vl() %} ); >>> >>>>>> Yes, accepted. >>> >>> I would suggest to test these changes on different machines (non-avx512 and avx512) and with different UseAVX values. >>> >>>>>> Will do. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>>> Recently there have been couple of high priority issues with regards >>>> to high bank of XMM register >>>> (XMM16-XMM31) usage by C2: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>>> >>>> Please find below a patch which attempts to clean up the XMM register handling by using register groups. >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>>> >>>> >>>> The patch provides a restricted set of registers to the match rules >>>> in the ad file based on the underlying architecture. >>>> >>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>>> >>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code. >>>> >>>> Your review and feedback is very welcome. 
>>>> >>>> Best Regards, >>>> >>>> Sandhya >>>> From vladimir.kozlov at oracle.com Fri Sep 14 19:12:45 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Sep 2018 12:12:45 -0700 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> Message-ID: <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> I got build failure: workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds] jib > _xmm_regs[16] = xmm16; I also noticed that we don't have RFE for this work. I filed: https://bugs.openjdk.java.net/browse/JDK-8209735 You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next change to yours in src/hotspot/cpu/x86/globals_x86.hpp: - product(intx, UseAVX, 2, \ + product(intx, UseAVX, 3, \ Thanks, Vladimir On 9/14/18 11:39 AM, Vladimir Kozlov wrote: > Looks good to me. I will start testing and let you know results. > > Thanks, > Vladimir > > On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> Please find below the updated webrev with all your comments incorporated: >> >> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >> >> I have run the jtreg compiler tests on SKX and KNL which have two different flavors of AVX512 and >> Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. 
>> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, September 11, 2018 8:54 PM >> To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> >> Thank you, Sandhya >> >> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >> >> Vladimir >> >> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >>> Hi Vladimir, >>> >>> Thanks a lot for the detailed review. I really appreciate your feedback. >>> Please see my response in your email below marked with (Sandhya >>>). Looking forward to your >>> advice. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, September 11, 2018 5:11 PM >>> To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> >>> Thank you. >>> >>> I want to discuss next issue: >>> >>> > You did not added instructions to load these registers from memory (and stack). What happens >>> in such cases when you need to load or store? >>> >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory >>> into regF and then register to register move to rregF and vice versa. >>> >>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >>> Also we don't check that register could be the same as result you may get unneeded moves. >>> >>> I would advice add memory moves at least. >>> >>> Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed >>> that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg >>> mask (matcher.cpp).
I would like the register allocator to get all the possible register on an >>> architecture for idealreg2reg mask. I wondered that multiple instruct rules in .ad file for LoadF >>> from memory might cause problems. I would have to have higher cost for loading into restricted >>> register set like vlReg. Then I decided that the register allocator can handle this in much >>> better way than me adding rules to load from memory. This is with the background that the regF is >>> always all the available registers and vlRegF is the restricted register set. Likewise for VecS >>> and legVecS. Let me know you thoughts on this and if I should still add the rules to load from >>> memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is: >>> MachNode *spillCP = match_tree(new >>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>> #endif >>> MachNode *spillI = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >>> MachNode *spillL = match_tree(new >>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false)); >>> MachNode *spillF = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >>> MachNode *spillD = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >>> MachNode *spillP = match_tree(new >>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>> .... >>> idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >>> >>> An other question. You use movflt() and movdbl() which use either movap[s|d] and movs[s|d] >>> instructions: >>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164 >>> Are these instructions work when avx512vl is not available? I see for vectors you use >>> vpxor+vinserti* combination.
>>> >>> Sandhya >>> Yes the scalar floating point instructions are available with AVX512 encoding when >>> avx512vl is not available. That is why you would see not just movflt, movdbl but all the other >>> scalar operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they >>> are AVX512F instructions. >>> >>> Last question. I notice next UseAVX check in vectors spills code in x86.ad: >>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >>> >>> Should it be (UseAVX < 3)? >>> >>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>>> Hi Vladimir, >>>> >>>> Thanks a lot for your review and feedback. Please see my response in your email below. I will >>>> send an updated webrev incorporating your feedback. >>>> >>>> Best Regards, >>>> Sandhya >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Monday, September 10, 2018 6:09 PM >>>> To: Viswanathan, Sandhya ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>>> >>>> Very nice. Thank you, Sandhya. >>>> >>>> I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have >>>> something like vlReg* and legVec*. >>>> >>>>>>> Yes, accepted. >>>> >>>> New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate >>>> with other Move*_reg_reg* instructions: >>>> >>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>>> vlRegF src) >>>>>>> Yes, accepted. >>>> >>>> You did not added instructions to load these registers from memory (and stack). What happens in >>>> such cases when you need to load or store? >>>>>>> Let us take an example, e.g. for loading into rregF. 
First it gets loaded from memory into >>>>>>> regF and then register to register move to rregF and vice versa. >>>> >>>> Also please explain why these registers are used when UseAVX == 0?: >>>> >>>> +instruct absD_reg(rregD dst) %{ >>>> predicate((UseSSE>=2) && (UseAVX == 0)); >>>> >>>> we switch off evex so regular regD (only legacy register in this case) should work too: >>>> 661 if (UseAVX < 3) { >>>> 662 _features &= ~CPU_AVX512F; >>>> >>>>>>> Yes, accepted. It could be regD here. >>>> >>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >>>> >>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>>> +vectors_reg_legacy, %{ >>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>>> VM_Version::supports_avx512dq() && >>>> VM_Version::supports_avx512vl() %} ); >>>> >>>>>>> Yes, accepted. >>>> >>>> I would suggest to test these changes on different machines (non-avx512 and avx512) and with >>>> different UseAVX values. >>>> >>>>>>> Will do. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>>>> Recently there have been couple of high priority issues with regards >>>>> to high bank of XMM register >>>>> (XMM16-XMM31) usage by C2: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>>>> >>>>> Please find below a patch which attempts to clean up the XMM register handling by using >>>>> register groups. >>>>> >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>>>> >>>>> >>>>> The patch provides a restricted set of registers to the match rules >>>>> in the ad file based on the underlying architecture. >>>>> >>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>>>> >>>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines >>>>> of code.
>>>>>
>>>>> Your review and feedback is very welcome.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Sandhya
>>>>>

From sandhya.viswanathan at intel.com Fri Sep 14 20:27:29 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 14 Sep 2018 20:27:29 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>

Thanks Vladimir, the below should fix this issue:

------------------------------
--- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
+++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
@@ -233,22 +233,6 @@
   _xmm_regs[13]  = xmm13;
   _xmm_regs[14]  = xmm14;
   _xmm_regs[15]  = xmm15;
-  _xmm_regs[16]  = xmm16;
-  _xmm_regs[17]  = xmm17;
-  _xmm_regs[18]  = xmm18;
-  _xmm_regs[19]  = xmm19;
-  _xmm_regs[20]  = xmm20;
-  _xmm_regs[21]  = xmm21;
-  _xmm_regs[22]  = xmm22;
-  _xmm_regs[23]  = xmm23;
-  _xmm_regs[24]  = xmm24;
-  _xmm_regs[25]  = xmm25;
-  _xmm_regs[26]  = xmm26;
-  _xmm_regs[27]  = xmm27;
-  _xmm_regs[28]  = xmm28;
-  _xmm_regs[29]  = xmm29;
-  _xmm_regs[30]  = xmm30;
-  _xmm_regs[31]  = xmm31;
 #endif // _LP64
 
   for (int i = 0; i < 8; i++) {
---------------------------------

I think the gcc version on my desktop is older so didn't catch this.
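
To illustrate the class of error fixed above, here is a minimal standalone sketch; it is not the HotSpot code. It assumes a table sized for 16 XMM registers, as the build error reports. The names kNumXmmRegs, XmmMap, fill_xmm_map and last_xmm are hypothetical. Iterating over the array's own size, or indexing with a compile-time checked accessor, makes an overrun impossible or a hard compile error, instead of depending on which compiler warnings happen to be enabled:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the C1 frame map's XMM table; the real
// HotSpot types and sizes are not reproduced here.
constexpr std::size_t kNumXmmRegs = 16;   // assumed: xmm0..xmm15 only

using XmmMap = std::array<int, kNumXmmRegs>;

// Fill the table by looping over the array's own size, so the loop
// cannot index past the end whatever kNumXmmRegs is set to.
inline XmmMap fill_xmm_map() {
  XmmMap regs{};
  for (std::size_t i = 0; i < regs.size(); i++) {
    regs[i] = static_cast<int>(i);        // store register number i
  }
  return regs;
}

// std::get<N> on a std::array is bounds-checked at compile time:
// std::get<16>(regs) would fail to compile for a 16-element array.
inline int last_xmm(const XmmMap& regs) {
  return std::get<kNumXmmRegs - 1>(regs);
}
```

With this shape, hand-written assignments to indices 16 through 31 would have no equivalent: the loop stops at the array size, and a fixed out-of-range index is rejected at compile time.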
The updated patch, with the above change and the UseAVX default set to 3, is uploaded to:
Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
RFE: https://bugs.openjdk.java.net/browse/JDK-8209735

FYI, I did notice that the default for UseAVX had been rolled back, and I wanted to get confirmation from you before changing it back to 3.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, September 14, 2018 12:13 PM
To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

I got build failure:

workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
jib >   _xmm_regs[16]  = xmm16;

I also noticed that we don't have RFE for this work. I filed:

https://bugs.openjdk.java.net/browse/JDK-8209735

You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next change to yours in src/hotspot/cpu/x86/globals_x86.hpp:

- product(intx, UseAVX, 2, \
+ product(intx, UseAVX, 3, \

Thanks,
Vladimir

On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
> Looks good to me. I will start testing and let you know results.
>
> Thanks,
> Vladimir
>
> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find below the updated webrev with all your comments incorporated:
>>
>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>
>> I have run the jtreg compiler tests on SKX and KNL which have two
>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms.
>> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, September 11, 2018 8:54 PM >> To: Viswanathan, Sandhya ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> >> Thank you, Sandhya >> >> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >> >> Vladimir >> >> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >>> Hi Vladimir, >>> >>> Thanks a lot for the detailed review. I really appreciate your feedback. >>> Please see my response in your email below marked with (Sandhya >>> >>>). Looking forward to your advice. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, September 11, 2018 5:11 PM >>> To: Viswanathan, Sandhya ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>> instruction >>> >>> Thank you. >>> >>> I want to discuss next issue: >>> >>> ?? > You did not added instructions to load these registers from >>> memory (and stack). What happens in such cases when you need to load or store? >>> ?? >>>> Let us take an example, e.g. for loading into rregF. First >>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >>> >>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >>> Also we don't check that register could be the same as result you may get unneeded moves. >>> >>> I would advice add memory moves at least. >>> >>> Sandhya >>>? I had added those rules initially and removed them in >>> the final patch. I noticed that the register allocator uses the >>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask >>> (matcher.cpp). 
I would like the register allocator to get all the >>> possible register on an architecture for idealreg2reg mask. I >>> wondered that multiple instruct rules in .ad file for LoadF from >>> memory might cause problems.? I would have to have higher cost for >>> loading into restricted register set like vlReg. Then I decided that >>> the register allocator can handle this in much better way than me >>> adding rules to load from memory. This is with the background that the regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is: >>> ??? MachNode *spillCP = match_tree(new >>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>> #endif >>> ??? MachNode *spillI? = match_tree(new >>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >>> ??? MachNode *spillL? = match_tree(new >>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, >>> LoadNode::DependsO nlyOnTest, false)); >>> ??? MachNode *spillF? = match_tree(new >>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >>> ??? MachNode *spillD? = match_tree(new >>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >>> ??? MachNode *spillP? = match_tree(new >>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>> ??? .... >>> ??? idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >>> >>> An other question. You use movflt() and movdbl() which use either >>> movap[s|d] and movs[s|d] >>> instructions: >>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu >>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when >>> avx512vl is not available? I see for vectors you use >>> vpxor+vinserti* combination. 
>>> >>> Sandhya >>> Yes the scalar floating point instructions are available >>> with AVX512 encoding when avx512vl is not available. That is why you >>> would see not just movflt, movdbl but all the other scalar >>> operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions. >>> >>> Last question. I notice next UseAVX check in vectors spills code in x86.ad: >>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >>> >>> Should it be (UseAVX < 3)? >>> >>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>>> Hi Vladimir, >>>> >>>> Thanks a lot for your review and feedback. Please see my response >>>> in your email below. I will send an updated webrev incorporating your feedback. >>>> >>>> Best Regards, >>>> Sandhya >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Monday, September 10, 2018 6:09 PM >>>> To: Viswanathan, Sandhya ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>> instruction >>>> >>>> Very nice. Thank you, Sandhya. >>>> >>>> I would like to see more meaningful naming in .ad files - instead >>>> of rreg* and ovec* to have something like vlReg* and legVec*. >>>> >>>>>>> Yes, accepted. >>>> >>>> New load_from_* and load_to_* instructions in .ad files should be >>>> renamed to next and collocate with other Move*_reg_reg* instructions: >>>> >>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>>> vlRegF src) >>>>>>> Yes, accepted. >>>> >>>> You did not added instructions to load these registers from memory >>>> (and stack). What happens in such cases when you need to load or store? >>>>>>> Let us take an example, e.g. for loading into rregF. 
First it >>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>> >>>> Also please explain why these registers are used when UseAVX == 0?: >>>> >>>> +instruct absD_reg(rregD dst) %{ >>>> ?????? predicate((UseSSE>=2) && (UseAVX == 0)); >>>> >>>> we switch off evex so regular regD (only legacy register in this case) should work too: >>>> ???? 661?? if (UseAVX < 3) { >>>> ???? 662???? _features &= ~CPU_AVX512F; >>>> >>>>>>> Yes, accepted. It could be regD here. >>>> >>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >>>> >>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>>> +vectors_reg_legacy, %{ >>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>>> VM_Version::supports_avx512dq() && >>>> VM_Version::supports_avx512vl() %} ); >>>> >>>>>>> Yes, accepted. >>>> >>>> I would suggest to test these changes on different machines >>>> (non-avx512 and avx512) and with different UseAVX values. >>>> >>>>>>> Will do. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>>>> Recently there have been couple of high priority issues with >>>>> regards to high bank of XMM register >>>>> (XMM16-XMM31) usage by C2: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>>>> >>>>> Please find below a patch which attempts to clean up the XMM >>>>> register handling by using register groups. >>>>> >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>>>> >>>>> >>>>> The patch provides a restricted set of registers to the match >>>>> rules in the ad file based on the underlying architecture. >>>>> >>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>>>> >>>>> By removing the special handling, the patch reduces the overall >>>>> code size by about 1800 lines of code. 
>>>>>
>>>>> Your review and feedback is very welcome.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Sandhya
>>>>>

From vladimir.kozlov at oracle.com Fri Sep 14 22:49:49 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 14 Sep 2018 15:49:49 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
Message-ID: <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>

I got the build failure on macOS and Windows. Linux passed for some reason, so I am not surprised you did not notice it.

Anyway, I tested with these changes and got the following failures on AVX1 machines. I am planning to run on AVX512 too.

1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU with AVX1 only

#  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
# Problematic frame:
# V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20

Current CompileTask:
C2:    154    5             java.lang.String::equals (65 bytes)

Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42

------------------------------------------------------------------------------------------------
2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'

#  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
#  assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found)

Current CompileTask:
C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)

Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357
V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409

Vladimir

On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>
>
> Thanks Vladimir, the below should fix this issue:
>
> ------------------------------
> ---
old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700 > +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700 > @@ -233,22 +233,6 @@ > _xmm_regs[13] = xmm13; > _xmm_regs[14] = xmm14; > _xmm_regs[15] = xmm15; > - _xmm_regs[16] = xmm16; > - _xmm_regs[17] = xmm17; > - _xmm_regs[18] = xmm18; > - _xmm_regs[19] = xmm19; > - _xmm_regs[20] = xmm20; > - _xmm_regs[21] = xmm21; > - _xmm_regs[22] = xmm22; > - _xmm_regs[23] = xmm23; > - _xmm_regs[24] = xmm24; > - _xmm_regs[25] = xmm25; > - _xmm_regs[26] = xmm26; > - _xmm_regs[27] = xmm27; > - _xmm_regs[28] = xmm28; > - _xmm_regs[29] = xmm29; > - _xmm_regs[30] = xmm30; > - _xmm_regs[31] = xmm31; > #endif // _LP64 > > for (int i = 0; i < 8; i++) { > --------------------------------- > > I think the gcc version on my desktop is older so didn?t catch this. > > The updated patch along with the above change and setting UseAVX as 3 is uploaded to: > Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/ > RFE: https://bugs.openjdk.java.net/browse/JDK-8209735 > > FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before changing it back to 3. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, September 14, 2018 12:13 PM > To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > I got build failure: > > workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds] > jib > _xmm_regs[16] = xmm16; > > I also noticed that we don't have RFE for this work. I filed: > > https://bugs.openjdk.java.net/browse/JDK-8209735 > > You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). 
I added next change to yours in src/hotspot/cpu/x86/globals_x86.hpp: > > - product(intx, UseAVX, 2, \ > + product(intx, UseAVX, 3, \ > > Thanks, > Vladimir > > On 9/14/18 11:39 AM, Vladimir Kozlov wrote: >> Looks good to me. I will start testing and let you know results. >> >> Thanks, >> Vladimir >> >> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >>> Hi Vladimir, >>> >>> Please find below the updated webrev with all your comments incorporated: >>> >>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >>> >>> I have run the jtreg compiler tests on SKX and KNL which have two >>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, September 11, 2018 8:54 PM >>> To: Viswanathan, Sandhya ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> >>> Thank you, Sandhya >>> >>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >>> >>> Vladimir >>> >>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >>>> Hi Vladimir, >>>> >>>> Thanks a lot for the detailed review. I really appreciate your feedback. >>>> Please see my response in your email below marked with (Sandhya >>>>>>> ). Looking forward to your advice. >>>> >>>> Best Regards, >>>> Sandhya >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, September 11, 2018 5:11 PM >>>> To: Viswanathan, Sandhya ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>> instruction >>>> >>>> Thank you. >>>> >>>> I want to discuss next issue: >>>> >>>> ?? > You did not added instructions to load these registers from >>>> memory (and stack). 
What happens in such cases when you need to load or store? >>>> ?? >>>> Let us take an example, e.g. for loading into rregF. First >>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>> >>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >>>> Also we don't check that register could be the same as result you may get unneeded moves. >>>> >>>> I would advice add memory moves at least. >>>> >>>> Sandhya >>>? I had added those rules initially and removed them in >>>> the final patch. I noticed that the register allocator uses the >>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask >>>> (matcher.cpp). I would like the register allocator to get all the >>>> possible register on an architecture for idealreg2reg mask. I >>>> wondered that multiple instruct rules in .ad file for LoadF from >>>> memory might cause problems.? I would have to have higher cost for >>>> loading into restricted register set like vlReg. Then I decided that >>>> the register allocator can handle this in much better way than me >>>> adding rules to load from memory. This is with the background that the regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is: >>>> ??? MachNode *spillCP = match_tree(new >>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>>> #endif >>>> ??? MachNode *spillI? = match_tree(new >>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >>>> ??? MachNode *spillL? = match_tree(new >>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, >>>> LoadNode::DependsO nlyOnTest, false)); >>>> ??? MachNode *spillF? 
= match_tree(new >>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >>>> ??? MachNode *spillD? = match_tree(new >>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >>>> ??? MachNode *spillP? = match_tree(new >>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>>> ??? .... >>>> ??? idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >>>> >>>> An other question. You use movflt() and movdbl() which use either >>>> movap[s|d] and movs[s|d] >>>> instructions: >>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu >>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when >>>> avx512vl is not available? I see for vectors you use >>>> vpxor+vinserti* combination. >>>> >>>> Sandhya >>> Yes the scalar floating point instructions are available >>>> with AVX512 encoding when avx512vl is not available. That is why you >>>> would see not just movflt, movdbl but all the other scalar >>>> operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions. >>>> >>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad: >>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >>>> >>>> Should it be (UseAVX < 3)? >>>> >>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>>>> Hi Vladimir, >>>>> >>>>> Thanks a lot for your review and feedback. Please see my response >>>>> in your email below. I will send an updated webrev incorporating your feedback. 
>>>>> >>>>> Best Regards, >>>>> Sandhya >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Monday, September 10, 2018 6:09 PM >>>>> To: Viswanathan, Sandhya ; >>>>> hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>> instruction >>>>> >>>>> Very nice. Thank you, Sandhya. >>>>> >>>>> I would like to see more meaningful naming in .ad files - instead >>>>> of rreg* and ovec* to have something like vlReg* and legVec*. >>>>> >>>>>>>> Yes, accepted. >>>>> >>>>> New load_from_* and load_to_* instructions in .ad files should be >>>>> renamed to next and collocate with other Move*_reg_reg* instructions: >>>>> >>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>>>> vlRegF src) >>>>>>>> Yes, accepted. >>>>> >>>>> You did not added instructions to load these registers from memory >>>>> (and stack). What happens in such cases when you need to load or store? >>>>>>>> Let us take an example, e.g. for loading into rregF. First it >>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>> >>>>> Also please explain why these registers are used when UseAVX == 0?: >>>>> >>>>> +instruct absD_reg(rregD dst) %{ >>>>> ?????? predicate((UseSSE>=2) && (UseAVX == 0)); >>>>> >>>>> we switch off evex so regular regD (only legacy register in this case) should work too: >>>>> ???? 661?? if (UseAVX < 3) { >>>>> ???? 662???? _features &= ~CPU_AVX512F; >>>>> >>>>>>>> Yes, accepted. It could be regD here. >>>>> >>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >>>>> >>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>>>> +vectors_reg_legacy, %{ >>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>>>> VM_Version::supports_avx512dq() && >>>>> VM_Version::supports_avx512vl() %} ); >>>>> >>>>>>>> Yes, accepted. 
>>>>> >>>>> I would suggest to test these changes on different machines >>>>> (non-avx512 and avx512) and with different UseAVX values. >>>>> >>>>>>>> Will do. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>>>>> Recently there have been couple of high priority issues with >>>>>> regards to high bank of XMM register >>>>>> (XMM16-XMM31) usage by C2: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>>>>> >>>>>> Please find below a patch which attempts to clean up the XMM >>>>>> register handling by using register groups. >>>>>> >>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>>>>> >>>>>> >>>>>> The patch provides a restricted set of registers to the match >>>>>> rules in the ad file based on the underlying architecture. >>>>>> >>>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>>>>> >>>>>> By removing the special handling, the patch reduces the overall >>>>>> code size by about 1800 lines of code. >>>>>> >>>>>> Your review and feedback is very welcome. 
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Sandhya
>>>>>>

From vladimir.kozlov at oracle.com Sat Sep 15 00:22:41 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 14 Sep 2018 17:22:41 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com>
 <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
Message-ID: <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>

I gave incorrect link to RFE. Here is correct:

https://bugs.openjdk.java.net/browse/JDK-8210764

Vladimir

On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did
> not noticed.
>
> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on
> avx512 too.
>
> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100
> -XX:-TieredCompilation' on CPU with AVX1 only
>
> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
> # Problematic frame:
> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>
> Current CompileTask:
> C2:    154    5             java.lang.String::equals (65 bytes)
>
> Stack: [0x00007f3b10044000,0x00007f3b10145000],  sp=0x00007f3b1013fe70,  free space=1007k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool,
> DirectiveSet*)+0xe42
>
> ------------------------------------------------------------------------------------------------
> 2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'
> #  Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
> #  assert(false) failed: cannot spill interval that is used in first instruction (possible reason:
> no register found)
>
> Current CompileTask:
> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692 bytes)
>
> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],  sp=0x00007f23b9e7f9d0,  free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*,
> Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*,
> char const*, __va_list_tag*)+0x2f
> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
> V  [libjvm.so+0x7e0410]  LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int,
> BufferBlob*, DirectiveSet*)+0x357
> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
> V  [libjvm.so+0xa3cf89]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>
> Vladimir
>
> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>
>> Thanks Vladimir, the below should fix this issue:
>>
>> ------------------------------
>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>> @@ -233,22 +233,6 @@
>>     _xmm_regs[13]  = xmm13;
>>     _xmm_regs[14]  = xmm14;
>>     _xmm_regs[15]  = xmm15;
>> -  _xmm_regs[16]  = xmm16;
>> -  _xmm_regs[17]  = xmm17;
>> -  _xmm_regs[18]  = xmm18;
>> -  _xmm_regs[19]  = xmm19;
>> -  _xmm_regs[20]  = xmm20;
>> -  _xmm_regs[21]  = xmm21;
>> -  _xmm_regs[22]  = xmm22;
>> -  _xmm_regs[23]  = xmm23;
>> -  _xmm_regs[24]  = xmm24;
>> -  _xmm_regs[25]  = xmm25;
>> -  _xmm_regs[26]  = xmm26;
>> -  _xmm_regs[27]  = xmm27;
>> -  _xmm_regs[28]  = xmm28;
>> -  _xmm_regs[29]  = xmm29;
>> -  _xmm_regs[30]  = xmm30;
>> -  _xmm_regs[31]  = xmm31;
>>   #endif // _LP64
>>
>>     for (int i = 0; i < 8; i++) {
>> ---------------------------------
>>
>> I think the gcc version on my desktop is older so didn't catch this.
>>
>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>
>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation
>> from you before changing it back to 3.
>> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Friday, September 14, 2018 12:13 PM >> To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> >> I got build failure: >> >> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the >> end of the array (which contains 16 elements) [-Werror,-Warray-bounds] >> jib >?? _xmm_regs[16]? = xmm16; >> >> I also noticed that we don't have RFE for this work. I filed: >> >> https://bugs.openjdk.java.net/browse/JDK-8209735 >> >> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). >> I added next change to yours in src/hotspot/cpu/x86/globals_x86.hpp: >> >> - product(intx, UseAVX, 2, \ >> + product(intx, UseAVX, 3, \ >> >> Thanks, >> Vladimir >> >> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: >>> Looks good to me. I will start testing and let you know results. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >>>> Hi Vladimir, >>>> >>>> Please find below the updated webrev with all your comments incorporated: >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >>>> >>>> I have run the jtreg compiler tests on SKX and KNL which have two >>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the >>>> three platforms. >>>> >>>> Best Regards, >>>> Sandhya >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, September 11, 2018 8:54 PM >>>> To: Viswanathan, Sandhya ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>>> >>>> Thank you, Sandhya >>>> >>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. 
>>>> >>>> Vladimir >>>> >>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >>>>> Hi Vladimir, >>>>> >>>>> Thanks a lot for the detailed review. I really appreciate your feedback. >>>>> Please see my response in your email below marked with (Sandhya >>>>>>>> ). Looking forward to your advice. >>>>> >>>>> Best Regards, >>>>> Sandhya >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, September 11, 2018 5:11 PM >>>>> To: Viswanathan, Sandhya ; >>>>> hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>> instruction >>>>> >>>>> Thank you. >>>>> >>>>> I want to discuss next issue: >>>>> >>>>> ??? > You did not added instructions to load these registers from >>>>> memory (and stack). What happens in such cases when you need to load or store? >>>>> ??? >>>> Let us take an example, e.g. for loading into rregF. First >>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>> >>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >>>>> Also we don't check that register could be the same as result you may get unneeded moves. >>>>> >>>>> I would advice add memory moves at least. >>>>> >>>>> Sandhya >>>? I had added those rules initially and removed them in >>>>> the final patch. I noticed that the register allocator uses the >>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask >>>>> (matcher.cpp). I would like the register allocator to get all the >>>>> possible register on an architecture for idealreg2reg mask. I >>>>> wondered that multiple instruct rules in .ad file for LoadF from >>>>> memory might cause problems.? I would have to have higher cost for >>>>> loading into restricted register set like vlReg. 
Then I decided that >>>>> the register allocator can handle this in much better way than me >>>>> adding rules to load from memory. This is with the background that the regF is always all the >>>>> available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. >>>>> Let me know you thoughts on this and if I should still add the rules to load from memory into >>>>> vlReg and legVec. The specific code from matcher.cpp that I am referring to is: >>>>> ???? MachNode *spillCP = match_tree(new >>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>>>> #endif >>>>> ???? MachNode *spillI? = match_tree(new >>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >>>>> ???? MachNode *spillL? = match_tree(new >>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, >>>>> LoadNode::DependsO nlyOnTest, false)); >>>>> ???? MachNode *spillF? = match_tree(new >>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >>>>> ???? MachNode *spillD? = match_tree(new >>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >>>>> ???? MachNode *spillP? = match_tree(new >>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>>>> ???? .... >>>>> ???? idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >>>>> >>>>> An other question. You use movflt() and movdbl() which use either >>>>> movap[s|d] and movs[s|d] >>>>> instructions: >>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu >>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when >>>>> avx512vl is not available? I see for vectors you use >>>>> vpxor+vinserti* combination. >>>>> >>>>> Sandhya >>> Yes the scalar floating point instructions are available >>>>> with AVX512 encoding when avx512vl is not available. That is why you >>>>> would see not just movflt, movdbl but all the other scalar >>>>> operations like adds, addsd etc using the entire xmm range (xmm0-31). 
In other words they are >>>>> AVX512F instructions. >>>>> >>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad: >>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >>>>> >>>>> Should it be (UseAVX < 3)? >>>>> >>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>>>>> Hi Vladimir, >>>>>> >>>>>> Thanks a lot for your review and feedback. Please see my response >>>>>> in your email below. I will send an updated webrev incorporating your feedback. >>>>>> >>>>>> Best Regards, >>>>>> Sandhya >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Monday, September 10, 2018 6:09 PM >>>>>> To: Viswanathan, Sandhya ; >>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>>> instruction >>>>>> >>>>>> Very nice. Thank you, Sandhya. >>>>>> >>>>>> I would like to see more meaningful naming in .ad files - instead >>>>>> of rreg* and ovec* to have something like vlReg* and legVec*. >>>>>> >>>>>>>>> Yes, accepted. >>>>>> >>>>>> New load_from_* and load_to_* instructions in .ad files should be >>>>>> renamed to next and collocate with other Move*_reg_reg* instructions: >>>>>> >>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>>>>> vlRegF src) >>>>>>>>> Yes, accepted. >>>>>> >>>>>> You did not added instructions to load these registers from memory >>>>>> (and stack). What happens in such cases when you need to load or store? >>>>>>>>> Let us take an example, e.g. for loading into rregF. First it >>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>>> >>>>>> Also please explain why these registers are used when UseAVX == 0?: >>>>>> >>>>>> +instruct absD_reg(rregD dst) %{ >>>>>> ??????? 
predicate((UseSSE>=2) && (UseAVX == 0)); >>>>>> >>>>>> we switch off evex so regular regD (only legacy register in this case) should work too: >>>>>> ????? 661?? if (UseAVX < 3) { >>>>>> ????? 662???? _features &= ~CPU_AVX512F; >>>>>> >>>>>>>>> Yes, accepted. It could be regD here. >>>>>> >>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have >>>>>> some): >>>>>> >>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>>>>> +vectors_reg_legacy, %{ >>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>>>>> VM_Version::supports_avx512dq() && >>>>>> VM_Version::supports_avx512vl() %} ); >>>>>> >>>>>>>>> Yes, accepted. >>>>>> >>>>>> I would suggest to test these changes on different machines >>>>>> (non-avx512 and avx512) and with different UseAVX values. >>>>>> >>>>>>>>> Will do. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>>>>>> Recently there have been couple of high priority issues with >>>>>>> regards to high bank of XMM register >>>>>>> (XMM16-XMM31) usage by C2: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>>>>>> >>>>>>> Please find below a patch which attempts to clean up the XMM >>>>>>> register handling by using register groups. >>>>>>> >>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>>>>>> >>>>>>> >>>>>>> The patch provides a restricted set of registers to the match >>>>>>> rules in the ad file based on the underlying architecture. >>>>>>> >>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>>>>>> >>>>>>> By removing the special handling, the patch reduces the overall >>>>>>> code size by about 1800 lines of code. >>>>>>> >>>>>>> Your review and feedback is very welcome. 
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Sandhya
>>>>>>>

From zhaixiang at loongson.cn Sat Sep 15 02:51:27 2018
From: zhaixiang at loongson.cn (Leslie Zhai)
Date: Sat, 15 Sep 2018 10:51:27 +0800
Subject: RFR(XS): 8024128: guarantee(codelet_size > 0 && (size_t)codelet_size > 2*K) failed: not enough space for interpreter generation
In-Reply-To: <156208aa-a44c-fe34-c56f-24ffdb619714@oracle.com>
References: <156208aa-a44c-fe34-c56f-24ffdb619714@oracle.com>
Message-ID: <12aa8902-a767-607d-f20c-462e9d0bc306@loongson.cn>

Hi Vladimir,

Thanks for your kind response!

It is the Server VM; I am just debugging HotSpot C2. Yes, I changed ReservedCodeCacheSize to 3m. The failure can be reproduced with jtreg on a jdk8u fastdebug build when InterpreterCodeSize is too big, for example 640K:

$ jtreg -dir:/home/xiangzhai/project/jdk8u/hotspot/test -verbose:all -a -ignore:quiet -timeoutFactor:5 -agentvm -testjdk:/home/xiangzhai/project/jdk8u/build/linux-x86_64-normal-server-fastdebug/images/j2sdk-image compiler/startup/SmallCodeCacheStartup.java

Native memory allocation (malloc) failed to allocate 2621440 bytes for CodeCache: no room for Interpreter

It is clear that the allocation of the BufferBlob -> CodeBlob failed.

It can also be reproduced on a release build when InterpreterCodeSize is too small, for example 200K, this time *without* changing ReservedCodeCacheSize:

$ jtreg -dir:/home/xiangzhai/project/jdk8u/jdk/test -verbose:all -exclude:/home/xiangzhai/project/jdk8u/jdk/test/ProblemList.txt -conc:2 -Xmx512m -a -ignore:quiet -timeoutFactor:5 -agentvm -testjdk:/home/xiangzhai/project/jdk8u/build/linux-x86_64-normal-server-release/images/j2sdk-image com/sun/jdi/AccessSpecifierTest.java

It makes sense that codelet_size = AbstractInterpreter::code()->available_space() - 2*K is too small in that case. CodeBuffer might also fail to verify the allocation for each section because of guarantee(sect->end() <= tend, "sanity");

So for X86 the suitable range of InterpreterCodeSize might be [250K, 600K], but for AArch64 it is 200K [1].
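
The 2621440-byte figure in the error message above is consistent with the debug-build multiplier quoted later in this thread (NOT_PRODUCT(code_size *= 4;)). The following is only a sketch of that arithmetic, assuming the InterpreterCodeSize of 640K and ReservedCodeCacheSize=3m mentioned above; the class and method names are made up for illustration and are not HotSpot code:

```java
public class InterpreterSizeCheck {
    static final long K = 1024;

    // Hypothetical helper: interpreter code size in bytes for a given
    // InterpreterCodeSize (in K). Debug (non-product) builds multiply it by four.
    static long interpreterBytes(long codeSizeK, boolean debugBuild) {
        long bytes = codeSizeK * K;
        return debugBuild ? bytes * 4 : bytes;
    }

    public static void main(String[] args) {
        long reserved = 3 * K * K;                 // ReservedCodeCacheSize=3m
        long wanted = interpreterBytes(640, true); // 640K on a fastdebug build
        System.out.println(wanted);                // 2621440
        System.out.println(reserved - wanted);     // 524288 left for everything else
    }
}
```

With these assumptions the interpreter alone would ask for 2560K of a 3072K code cache, which matches the size reported by the failed allocation.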
What is the root cause behind it? The value is like a magic number, found by running with +PrintInterpreter to get the VM to print out the size. I need to dig deeper to find the root cause.

[1] http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/file/6bc3e4922a8b/src/cpu/aarch64/vm/templateInterpreter_aarch64.hpp#l38

On 2018-09-15 02:33, Vladimir Kozlov wrote:
> Hi Leslie
>
> More context is needed. Is it Client or Server VM? Did you change
> ReservedCodeCacheSize?
> Even with *4 it is about 1Mb when CodeCache size is 48Mb and in Tiered
> case even bigger.
> Also we need call stack when you hit guarantee().
>
> Regards,
> Vladimir
>
> On 9/14/18 4:31 AM, Leslie Zhai wrote:
>> Hi,
>>
>> I just quoted the old thread
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012243.html
>>
>> I think we should increase it more for future otherwise you will have to
>> always catch up with interpreter changes.
>>
>> Increase it to 256 * 1024 and 224 * 1024
>>
>> Vladimir
>>
>> On 10/16/13 12:22 PM, Albert Noll wrote:
>>  > Hi,
>>  >
>>  > could I have a review for this patch?
>>  >
>>  > bug: https://bugs.openjdk.java.net/browse/JDK-8026708
>>  > webrev: http://cr.openjdk.java.net/~anoll/8026708/webrev.00/
>>  >
>>  >
>>  > Problem: Not enough room for interpreter. My last patch did not solve
>>  > the problem for solaris-amd64.
>>  >                 A local build (solaris-amd64) of the most recent
>>  > hotspot-comp version requires a template interpreter
>>  >                 size of 211K (obtained with -XX:+PrintInterpreter).
>>  > There have been some modifications to the template
>>  >                 interpreter in the last couple of weeks which
>> might have
>>  > triggered this error.
>>  >
>>  > Solution: Increase interpreter size by 8k (32-bit and 64-bit).
>>  >
>>  > Testing: Failing test case in solaris-amd64
>>
>> ----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---
>>
>> I found that `InterpreterCodeSize` had been changed from 200K to 208K
>> [1] ,  then changed from 208K to 256K [2] by Albert.  But if built
>> with-debug-level=fastdebug/slowdebug,  it will be multiplied by four:
>>
>> NOT_PRODUCT(code_size *= 4;)  // debug uses extra interpreter code space
>>
>> Then it might trigger Native memory allocation (malloc) failed to
>> allocate xxx bytes for CodeCache: no room for Interpreter issue.
>>
>> I don't want to always catch up with interpreter changes by guessing
>> the suitable number, not too small, not too big :) Please give me
>> some suggestion about the root cause,  thanks a lot!
>>
>> Leslie Zhai
>>
>> [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/6d7eba360ba4
>>
>> [2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/74e00b98d5dd
>>
>>

--
Regards,
Leslie Zhai

From vladimir.kozlov at oracle.com Mon Sep 17 17:14:13 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 17 Sep 2018 10:14:13 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
Message-ID: <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>

I finished testing on an avx512 machine. All passed except the known (TestNaNVector.java) failures.

Thanks,
Vladimir

From sandhya.viswanathan at intel.com Mon Sep 17 18:18:57 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Mon, 17 Sep 2018 18:18:57 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

I have the fix for TestNaNVector.java. This was a corner case, when -XX:MaxVectorSize=4 is given on the command line.

I need your advice on the second problem, with NativeCallTest.java. I am not able to reproduce it, but from my analysis this seems to be the case when there is a call to graal compiled code from C1. The method being called (org.sunflow.math.Matrix4::multiply) has more than 16 floating point arguments. Per the graal calling convention, all the arguments need to be passed in XMM registers. Since there are more than 16 arguments, some of them need to be passed in XMM registers > 15. As you know, I had restricted the C1 register allocator to only XMM0-15, so it has no way of copying the arguments to the appropriate XMM register, say XMM16, before calling the graal compiled method. The solution then seems to be to remove the restriction and allow C1 to have all the registers.

But if I go with this solution, there is one case of negation of floating point in C1 which needs special handling. In C1, negation of floating point is done using xorps and xorpd. The xorpd and xorps instructions are not available in AVX512F (KNL) with the higher bank (XMM > 15). The only assembler level alternative for this seems to be the ugly workaround for KNL (push_zmm(xmm0), .... pop_zmm(xmm0)). Any other solution is not straightforward. Solutions like using subss/subsd from 0.0 don't work, as src/dst can be the same in the call to LIRAssembler, so I cannot load 0.0 in dst and do a subtraction. For C2, I could go with providing a restricted register set to the instruction and letting the register allocator do the work. I am not familiar with the C1 register allocator or codegen.

So my question is: would you be ok with a sequence like the one below for C1, on the Knights family only, when the dst/src register is > 15 for negatesd (xorpd) and negatess (xorps)?

    push_zmm(xmm0);
    movdbl(xmm0, nds);
    vxorpd(xmm0, xmm0, src, Assembler::AVX_128bit);
    movdbl(dst, xmm0);
    pop_zmm(xmm0);

Please let me know your thoughts and if a better solution is possible. In the meantime I will prepare a patch with the above solutions for the two problems that you reported, do some testing and send it to you.

Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
"test result: Error. Use -nativepath to specify the location of native code"
Do I need to give any additional info to jtreg to get over this problem?

Thanks a lot!

Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Monday, September 17, 2018 10:14 AM
To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

I finished testing on avx512 machine. All passed except known (TestNaNVector.java) failures.
Thanks, Vladimir On 9/14/18 5:22 PM, Vladimir Kozlov wrote: > I gave incorrect link to RFE. Here is correct: > > https://bugs.openjdk.java.net/browse/JDK-8210764 > > Vladimir > > On 9/14/18 3:49 PM, Vladimir Kozlov wrote: >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed. >> >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too. >> >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU >> with AVX1 only >> >> #? SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884 >> # Problematic frame: >> # V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >> >> Current CompileTask: >> C2:??? 154??? 5???????????? java.lang.String::equals (65 bytes) >> >> Stack: [0x00007f3b10044000,0x00007f3b10145000],? sp=0x00007f3b1013fe70,? free space=1007k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >> V? [libjvm.so+0x882a72]? PhaseChaitin::gather_lrg_masks(bool)+0x872 >> V? [libjvm.so+0xd82235]? PhaseCFG::global_code_motion()+0xfc5 >> V? [libjvm.so+0xd824b1]? PhaseCFG::do_global_code_motion()+0x51 >> V? [libjvm.so+0xa2c26c]? Compile::Code_Gen()+0x24c >> V? [libjvm.so+0xa2ff82]? Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42 >> >> ------------------------------------------------------------------------------------------------ >> 2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp' >> #? Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073 >> #? assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found) >> >> Current CompileTask: >> C1: 854767 13391?????? 3?????? 
org.sunflow.math.Matrix4::multiply (692 bytes) >> >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],? sp=0x00007f23b9e7f9d0,? free space=1014k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V? [libjvm.so+0x1882202]? VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned >> char*, void*, void*, char const*, int, unsigned long)+0x562 >> V? [libjvm.so+0x1882d2f]? VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, >> __va_list_tag*)+0x2f >> V? [libjvm.so+0xb0bea0]? report_vm_error(char const*, int, char const*, char const*, ...)+0x100 >> V? [libjvm.so+0x7e0410]? LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 >> V? [libjvm.so+0x7e0a20]? LinearScanWalker::activate_current()+0x280 >> V? [libjvm.so+0x7e0c7d]? IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d >> V? [libjvm.so+0x7e1078]? LinearScan::allocate_registers()+0x338 >> V? [libjvm.so+0x7e2135]? LinearScan::do_linear_scan()+0x155 >> V? [libjvm.so+0x70a6bb]? Compilation::emit_lir()+0x99b >> V? [libjvm.so+0x70caff]? Compilation::compile_java_method()+0x42f >> V? [libjvm.so+0x70d974]? Compilation::compile_method()+0x1d4 >> V? [libjvm.so+0x70e547]? Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, >> DirectiveSet*)+0x357 >> V? [libjvm.so+0x71073c]? Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c >> V? [libjvm.so+0xa3cf89]? CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 >> >> Vladimir >> >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote: >>> >>> Thanks Vladimir, the below should fix this issue: >>> >>> ------------------------------ >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700 >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700 >>> @@ -233,22 +233,6 @@ >>> ??? _xmm_regs[13]? = xmm13; >>> ??? _xmm_regs[14]? = xmm14; >>> ??? _xmm_regs[15]? 
= xmm15;
>>> -  _xmm_regs[16]  = xmm16;
>>> -  _xmm_regs[17]  = xmm17;
>>> -  _xmm_regs[18]  = xmm18;
>>> -  _xmm_regs[19]  = xmm19;
>>> -  _xmm_regs[20]  = xmm20;
>>> -  _xmm_regs[21]  = xmm21;
>>> -  _xmm_regs[22]  = xmm22;
>>> -  _xmm_regs[23]  = xmm23;
>>> -  _xmm_regs[24]  = xmm24;
>>> -  _xmm_regs[25]  = xmm25;
>>> -  _xmm_regs[26]  = xmm26;
>>> -  _xmm_regs[27]  = xmm27;
>>> -  _xmm_regs[28]  = xmm28;
>>> -  _xmm_regs[29]  = xmm29;
>>> -  _xmm_regs[30]  = xmm30;
>>> -  _xmm_regs[31]  = xmm31;
>>>   #endif // _LP64
>>>
>>>     for (int i = 0; i < 8; i++) {
>>> ---------------------------------
>>>
>>> I think the gcc version on my desktop is older so didn't catch this.
>>>
>>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before
>>> changing it back to 3.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, September 14, 2018 12:13 PM
>>> To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> I got build failure:
>>>
>>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array
>>> (which contains 16 elements) [-Werror,-Warray-bounds]
>>> jib >   _xmm_regs[16]  = xmm16;
>>>
>>> I also noticed that we don't have RFE for this work. I filed:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>
>>> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago).
I added next >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp: >>> >>> - product(intx, UseAVX, 2, \ >>> + product(intx, UseAVX, 3, \ >>> >>> Thanks, >>> Vladimir >>> >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: >>>> Looks good to me. I will start testing and let you know results. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >>>>> Hi Vladimir, >>>>> >>>>> Please find below the updated webrev with all your comments incorporated: >>>>> >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >>>>> >>>>> I have run the jtreg compiler tests on SKX and KNL which have two >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. >>>>> >>>>> Best Regards, >>>>> Sandhya >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, September 11, 2018 8:54 PM >>>>> To: Viswanathan, Sandhya ; >>>>> hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>>>> >>>>> Thank you, Sandhya >>>>> >>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >>>>> >>>>> Vladimir >>>>> >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >>>>>> Hi Vladimir, >>>>>> >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback. >>>>>> Please see my response in your email below marked with (Sandhya >>>>>>>>> ). Looking forward to your advice. >>>>>> >>>>>> Best Regards, >>>>>> Sandhya >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM >>>>>> To: Viswanathan, Sandhya ; >>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>>> instruction >>>>>> >>>>>> Thank you. 
>>>>>> >>>>>> I want to discuss next issue: >>>>>> >>>>>> ??? > You did not added instructions to load these registers from >>>>>> memory (and stack). What happens in such cases when you need to load or store? >>>>>> ??? >>>> Let us take an example, e.g. for loading into rregF. First >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>>> >>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >>>>>> Also we don't check that register could be the same as result you may get unneeded moves. >>>>>> >>>>>> I would advice add memory moves at least. >>>>>> >>>>>> Sandhya >>>? I had added those rules initially and removed them in >>>>>> the final patch. I noticed that the register allocator uses the >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask >>>>>> (matcher.cpp). I would like the register allocator to get all the >>>>>> possible register on an architecture for idealreg2reg mask. I >>>>>> wondered that multiple instruct rules in .ad file for LoadF from >>>>>> memory might cause problems.? I would have to have higher cost for >>>>>> loading into restricted register set like vlReg. Then I decided that >>>>>> the register allocator can handle this in much better way than me >>>>>> adding rules to load from memory. This is with the background that the regF is always all the available registers >>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if >>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I >>>>>> am referring to is: >>>>>> ???? MachNode *spillCP = match_tree(new >>>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>>>>> #endif >>>>>> ???? MachNode *spillI? = match_tree(new >>>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >>>>>> ???? MachNode *spillL? 
= match_tree(new >>>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, >>>>>> LoadNode::DependsO nlyOnTest, false)); >>>>>> ???? MachNode *spillF? = match_tree(new >>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >>>>>> ???? MachNode *spillD? = match_tree(new >>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >>>>>> ???? MachNode *spillP? = match_tree(new >>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>>>>> ???? .... >>>>>> ???? idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >>>>>> >>>>>> An other question. You use movflt() and movdbl() which use either >>>>>> movap[s|d] and movs[s|d] >>>>>> instructions: >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu >>>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when >>>>>> avx512vl is not available? I see for vectors you use >>>>>> vpxor+vinserti* combination. >>>>>> >>>>>> Sandhya >>> Yes the scalar floating point instructions are available >>>>>> with AVX512 encoding when avx512vl is not available. That is why you >>>>>> would see not just movflt, movdbl but all the other scalar >>>>>> operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions. >>>>>> >>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad: >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >>>>>> >>>>>> Should it be (UseAVX < 3)? >>>>>> >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>>>>>> Hi Vladimir, >>>>>>> >>>>>>> Thanks a lot for your review and feedback. Please see my response >>>>>>> in your email below. I will send an updated webrev incorporating your feedback. 
>>>>>>> >>>>>>> Best Regards, >>>>>>> Sandhya >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Monday, September 10, 2018 6:09 PM >>>>>>> To: Viswanathan, Sandhya ; >>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>>>> instruction >>>>>>> >>>>>>> Very nice. Thank you, Sandhya. >>>>>>> >>>>>>> I would like to see more meaningful naming in .ad files - instead >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*. >>>>>>> >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> New load_from_* and load_to_* instructions in .ad files should be >>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions: >>>>>>> >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>>>>>> vlRegF src) >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> You did not added instructions to load these registers from memory >>>>>>> (and stack). What happens in such cases when you need to load or store? >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>>>> >>>>>>> Also please explain why these registers are used when UseAVX == 0?: >>>>>>> >>>>>>> +instruct absD_reg(rregD dst) %{ >>>>>>> ??????? predicate((UseSSE>=2) && (UseAVX == 0)); >>>>>>> >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too: >>>>>>> ????? 661?? if (UseAVX < 3) { >>>>>>> ????? 662???? _features &= ~CPU_AVX512F; >>>>>>> >>>>>>>>>> Yes, accepted. It could be regD here. 
>>>>>>> >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >>>>>>> >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>>>>>> +vectors_reg_legacy, %{ >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>>>>>> VM_Version::supports_avx512dq() && >>>>>>> VM_Version::supports_avx512vl() %} ); >>>>>>> >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> I would suggest to test these changes on different machines >>>>>>> (non-avx512 and avx512) and with different UseAVX values. >>>>>>> >>>>>>>>>> Will do. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>>>>>>> Recently there have been couple of high priority issues with >>>>>>>> regards to high bank of XMM register >>>>>>>> (XMM16-XMM31) usage by C2: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>>>>>>> >>>>>>>> Please find below a patch which attempts to clean up the XMM >>>>>>>> register handling by using register groups. >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> The patch provides a restricted set of registers to the match >>>>>>>> rules in the ad file based on the underlying architecture. >>>>>>>> >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>>>>>>> >>>>>>>> By removing the special handling, the patch reduces the overall >>>>>>>> code size by about 1800 lines of code. >>>>>>>> >>>>>>>> Your review and feedback is very welcome. 
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Sandhya
>>>>>>>>

From vladimir.kozlov at oracle.com Mon Sep 17 18:32:59 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 17 Sep 2018 11:32:59 -0700
Subject: [12] RFR(S) 8209574: [AOT] breakpoint events are generated in different threads does not meet expected count
Message-ID: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com>

http://cr.openjdk.java.net/~kvn/8209574/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8209574

Disable AOT when debugger is attached.

--
Thanks,
Vladimir

From vladimir.kozlov at oracle.com Mon Sep 17 18:37:18 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 17 Sep 2018 11:37:18 -0700
Subject: [12] RFR(S) 8209574: [AOT] breakpoint events are generated in different threads does not meet expected count
In-Reply-To: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com>
References: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com>
Message-ID: <0b383f5d-75eb-c3a9-5e13-e8d11e0b4c58@oracle.com>

Pressed 'Send' too soon.

I also removed unused AOT functions and added '--info' flag to AOT test driver class AotCompiler.java to investigate cases when jaotc is timed-out (we have several bugs, for example, 8209769).

Thanks,
Vladimir

On 9/17/18 11:32 AM, Vladimir Kozlov wrote:
> http://cr.openjdk.java.net/~kvn/8209574/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8209574
>
> Disable AOT when debugger is attached.
>

From dean.long at oracle.com Mon Sep 17 18:48:11 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 17 Sep 2018 11:48:11 -0700
Subject: [12] RFR(S) 8209574: [AOT] breakpoint events are generated in different threads does not meet expected count
In-Reply-To: <0b383f5d-75eb-c3a9-5e13-e8d11e0b4c58@oracle.com>
References: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com> <0b383f5d-75eb-c3a9-5e13-e8d11e0b4c58@oracle.com>
Message-ID:

>   assert(UseAOT, "called only only when AOT is enabled");

typo "only only".  The rest looks good.

dl

On 9/17/18 11:37 AM, Vladimir Kozlov wrote:
> Pressed 'Send' too soon.
>
> I also removed unused AOT functions and added '--info' flag to AOT
> test driver class AotCompiler.java to investigate cases when jaotc is
> timed-out (we have several bugs, for example, 8209769).
>
> Thanks,
> Vladimir
>
> On 9/17/18 11:32 AM, Vladimir Kozlov wrote:
>> http://cr.openjdk.java.net/~kvn/8209574/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8209574
>>
>> Disable AOT when debugger is attached.
>>

From vladimir.kozlov at oracle.com Mon Sep 17 19:03:09 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 17 Sep 2018 12:03:09 -0700
Subject: [12] RFR(S) 8209574: [AOT] breakpoint events are generated in different threads does not meet expected count
In-Reply-To:
References: <92f97ba3-ab35-b4b4-4364-423eecb6eff3@oracle.com> <0b383f5d-75eb-c3a9-5e13-e8d11e0b4c58@oracle.com>
Message-ID: <7bb8254c-8d91-3667-6cf1-f463d49b9ea8@oracle.com>

Thank you, Dean

On 9/17/18 11:48 AM, dean.long at oracle.com wrote:
> >   assert(UseAOT, "called only only when AOT is enabled");
>
> typo "only only".  The rest looks good.

Fixed.

Vladimir

>
> dl
>
> On 9/17/18 11:37 AM, Vladimir Kozlov wrote:
>> Pressed 'Send' too soon.
>>
>> I also removed unused AOT functions and added '--info' flag to AOT test driver class AotCompiler.java to investigate
>> cases when jaotc is timed-out (we have several bugs, for example, 8209769).
>>
>> Thanks,
>> Vladimir
>>
>> On 9/17/18 11:32 AM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/8209574/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8209574
>>>
>>> Disable AOT when debugger is attached.
>>>
>

From gromero at linux.vnet.ibm.com Mon Sep 17 21:48:38 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 17 Sep 2018 18:48:38 -0300
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
In-Reply-To:
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
Message-ID:

Hi Michi,

On 09/14/2018 08:42 AM, Michihiro Horie wrote:
> Hi Martin,
>
> Thank you for your comment to improve this change and testing it. I uploaded a new webrev with format statements.
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.02/

Thanks for the updated webrev. Only some nits:

Besides the missing closing quotes (") in the MTVSRWZ and XXSPLTW format strings, I see trailing spaces in the following lines:

- immI8 zero %{ (int) 0 %}
+ immI8 zero %{ (int) 0 %}

- xscvdpspn_regF(tmpV, src);
+ xscvdpspn_regF(tmpV, src);

Curiously enough, jcheck [1] is not complaining about them. I found them because I set the color extension [2] in .hgrc:

[extensions]
color =

which marks trailing whitespace in red.

It looks like some trailing spaces also slipped into the related change "8188139: PPC64: Superword Level Parallelization with VSX", in case you want to fix them in a later change.

Finally, I think it would be better in the XXPERMDI format string to replace "Permute 16-byte register" with something like "Splat doubleword", to match the description in XXSPLTW that says "Splat word".

Otherwise, LGTM. Reviewed. I'll sponsor that change.
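
The trailing-space nit above is easy to check mechanically. The following is a minimal sketch, assuming the file has already been read into a list of lines; it is not part of jcheck or of the webrev, and the class and method names are made up for illustration. It reports the 1-based numbers of the lines that end in a space or a tab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; TrailingWhitespace is a hypothetical helper, not a jcheck API.
final class TrailingWhitespace {
    // Returns the 1-based numbers of all lines whose last character is a space or a tab.
    static List<Integer> find(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // an empty line has no trailing whitespace
            }
            char last = line.charAt(line.length() - 1);
            if (last == ' ' || last == '\t') {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "immI8 zero %{ (int) 0 %}",
            "xscvdpspn_regF(tmpV, src); ");  // note the trailing space
        System.out.println(find(lines));     // prints [2]
    }
}
```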

Best regards,
Gustavo

[1] http://openjdk.java.net/projects/code-tools/jcheck/
[2] https://www.mercurial-scm.org/wiki/ColorExtension

>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> From: "Doerr, Martin"
> To: Michihiro Horie
> Cc: "Lindenmaier, Goetz" , Gustavo Romero , "hotspot-compiler-dev at openjdk.java.net"
> Date: 2018/09/14 17:30
> Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>
> ------------------------------------------------------------------------
>
> Hi Michihiro,
>
> your webrev
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
> looks good to me.
>
> I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no "format %{ ... %}" specification so they are missing in the PrintOptoAssembly output. But this seems to be missing in the current version already.
>
> We can test it while waiting for a 2nd review.
>
> Thanks and best regards,
> Martin
>
>
> From: Michihiro Horie
> Sent: Thursday, 13 September 2018 19:05
> To: Doerr, Martin
> Cc: Lindenmaier, Goetz ; Gustavo Romero ; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>
> Hi Martin,
>
> Thank you so much for your review (and adding the ID in the subject :-).
>
> >I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.
> You're right, thanks. I removed a redundant one.
>
> I also refactored ReplicateD with vector length 2.
>
> Following is the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
>
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Michihiro Horie <HORIE at jp.ibm.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
> Date: 2018/09/13 16:25
> Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>
> ------------------------------------------------------------------------
>
> Hi Michihiro,
>
> I have added "RFR(S): 8210660" to the subject.
>
> I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.
> Besides this, your change looks good to me.
>
> Would you like to improve ReplicateD with vector length 2, too?
>
> Thanks and best regards,
> Martin
>
> From: Michihiro Horie <HORIE at jp.ibm.com>
> Sent: Wednesday, 12.
September 2018 18:11
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>
> Dear all,
>
> Would you please review the following change?
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
> Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/
>
> In the current code emit for replicating the floating point value in ppc.ad, a floating point value is once stored in order to load as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating point registers are overlapped with vsx registers 0-31. We can use vsx instructions for replicating the floating point value by mapping the floating point register to the vsx register.
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
>

From HORIE at jp.ibm.com Tue Sep 18 02:35:46 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 18 Sep 2018 11:35:46 +0900
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
In-Reply-To:
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
Message-ID:

Hi Gustavo,

Thanks a lot for your comments and review! I uploaded the latest webrev with the closing quotes added and the trailing spaces removed.

http://cr.openjdk.java.net/~mhorie/8210660/webrev.03/

Best regards,
--
Michihiro,
IBM Research - Tokyo

From: Gustavo Romero
To: Michihiro Horie/Japan/IBM at IBMJP, "Doerr, Martin"
Cc: "Lindenmaier, Goetz" , "hotspot-compiler-dev at openjdk.java.net"
Date: 2018/09/18 06:48
Subject: Re: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad

Hi Michi,

On 09/14/2018 08:42 AM, Michihiro Horie wrote:
> Hi Martin,
>
> Thank you for your comment to improve this change and testing it.
I uploaded a new webrev with format statements.< http://cr.openjdk.java.net/%7Emhorie/8210660/webrev.02/> > http://cr.openjdk.java.net/~mhorie/8210660/webrev.02/ < http://cr.openjdk.java.net/%7Emhorie/8210660/webrev.02/> Thanks for the updated webrev. Only some nits: Besides the missing closing quotes (") in MTVSRWZ and XXSPLTW format strings I see trailing spaces in the following lines: - immI8 zero %{ (int) 0 %} + immI8 zero %{ (int) 0 %} - xscvdpspn_regF(tmpV, src); + xscvdpspn_regF(tmpV, src); Curious enough, jcheck [1] is not complaining about them. I found it because I set the color extension [2] in .hgrc: [extensions] color = which marks trailing whitespaces in red. I looks like some trailing spaces slipped also into related change "8188139: PPC64: Superword Level Parallelization with VSX", in case you want to fix them in a next change. Finally, I think it would be better in XXPERMDI format string to replace "Permute 16-byte register" to something like "Splat doubleword" to be like the description in XXSPLTW that says "Splat word". Otherwise, LGTM. Reviewed. I'll sponsor that change. 
Best regards, Gustavo [1] http://openjdk.java.net/projects/code-tools/jcheck/ [2] https://www.mercurial-scm.org/wiki/ColorExtension > > Best regards, > -- > Michihiro, > IBM Research - Tokyo > > Inactive hide details for "Doerr, Martin" ---2018/09/14 17:30:04---Hi Michihiro, your webrev"Doerr, Martin" ---2018/09/14 17:30:04---Hi Michihiro, your webrev > > From: "Doerr, Martin" > To: Michihiro Horie > Cc: "Lindenmaier, Goetz" , Gustavo Romero , "hotspot-compiler-dev at openjdk.java.net" > Date: 2018/09/14 17:30 > Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > > > Hi Michihiro, > > your webrev > _http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/_ < http://cr.openjdk.java.net/%7Emhorie/8210660/webrev.01/> > looks good to me. > > I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no ?format %{ ? %}? specification so they are missing in the PrintOptoAssembly output. 
> But this seems to be missing in the current version already.
>
> We can test it while waiting for a 2nd review.
>
> Thanks and best regards,
> Martin
>
>
> From: Michihiro Horie
> Sent: Thursday, September 13, 2018 19:05
> To: Doerr, Martin
> Cc: Lindenmaier, Goetz; Gustavo Romero; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>
> Hi Martin,
>
> Thank you so much for your review (and adding the ID in the subject :-).
>
> >I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.
> You're right, thanks. I removed a redundant one.
>
> I also refactored ReplicateD with vector length 2.
>
> Following is the latest webrev:
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Michihiro Horie <HORIE at jp.ibm.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
> Date: 2018/09/13 16:25
> Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>
> ------------------------------------------------------------------------
>
> Hi Michihiro,
>
> I have added "RFR(S): 8210660" to the subject.
>
> I don't think we need 2 nodes for ReplicateF with vector length 4. Both ones in your webrev are for Power8 so only one will be used.
> Besides this, your change looks good to me.
>
> Would you like to improve ReplicateD with vector length 2, too?
>
> Thanks and best regards,
> Martin
>
> From: Michihiro Horie <HORIE at jp.ibm.com>
> Sent: Wednesday, September 12, 2018 18:11
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: RFR: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>
> Dear all,
>
> Would you please review the following change?
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8210660
> Webrev: http://cr.openjdk.java.net/~mhorie/8210660/webrev.00/
>
> In the code currently emitted for replicating a floating point value in ppc.ad, the value is first stored to memory in order to be loaded back as an integer value. However, when SuperwordUseVSX is enabled, this is redundant because the floating point registers overlap with vsx registers 0-31. We can use vsx instructions for replicating the floating point value by mapping the floating point register to the vsx register.
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL:

From gromero at linux.vnet.ibm.com Tue Sep 18 03:46:39 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 18 Sep 2018 00:46:39 -0300
Subject: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
In-Reply-To:
References: <5e113528b9c94b5089b8b7f1b8b22ed8@sap.com>
Message-ID: <5532522c-e1d5-f003-8630-a663978d5f21@linux.vnet.ibm.com>

Hi Michi,

Thanks for the updated webrev. Pushed.
Best regards,
Gustavo

On 09/17/2018 11:35 PM, Michihiro Horie wrote:
> Hi Gustavo,
>
> Thanks a lot for your comments and review! I uploaded the latest webrev, adding the closing quotes and removing the trailing spaces.
>
> http://cr.openjdk.java.net/~mhorie/8210660/webrev.03/
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> From: Gustavo Romero
> To: Michihiro Horie/Japan/IBM at IBMJP, "Doerr, Martin"
> Cc: "Lindenmaier, Goetz", "hotspot-compiler-dev at openjdk.java.net"
> Date: 2018/09/18 06:48
> Subject: Re: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
>
> ------------------------------------------------------------------------
>
> Hi Michi,
>
> On 09/14/2018 08:42 AM, Michihiro Horie wrote:
> > Hi Martin,
> >
> > Thank you for your comment to improve this change and testing it. I uploaded a new webrev with format statements.
> > http://cr.openjdk.java.net/~mhorie/8210660/webrev.02/
>
> Thanks for the updated webrev.
>
> Only some nits:
>
> Besides the missing closing quotes (") in MTVSRWZ and XXSPLTW format
> strings I see trailing spaces in the following lines:
>
> -    immI8  zero %{ (int)  0 %}
> +    immI8  zero %{ (int)  0 %}
>
> -    xscvdpspn_regF(tmpV, src);
> +    xscvdpspn_regF(tmpV, src);
>
> Curiously enough, jcheck [1] is not complaining about them. I found them because I
> set the color extension [2] in .hgrc:
>
> [extensions]
> color =
>
> which marks trailing whitespaces in red.
>
> It looks like some trailing spaces also slipped into the related change
> "8188139: PPC64: Superword Level Parallelization with VSX", in case you want to
> fix them in a next change.
>
> Finally, I think it would be better in the XXPERMDI format string to replace
> "Permute 16-byte register" with something like "Splat doubleword" to be like the
> description in XXSPLTW that says "Splat word".
>
> Otherwise, LGTM. Reviewed.
>
> I'll sponsor that change.
>
>
> Best regards,
> Gustavo
>
> [1] http://openjdk.java.net/projects/code-tools/jcheck/
> [2] https://www.mercurial-scm.org/wiki/ColorExtension
>
>
> > Best regards,
> > --
> > Michihiro,
> > IBM Research - Tokyo
> >
> > From: "Doerr, Martin"
> > To: Michihiro Horie
> > Cc: "Lindenmaier, Goetz", Gustavo Romero, "hotspot-compiler-dev at openjdk.java.net"
> > Date: 2018/09/14 17:30
> > Subject: RE: RFR(S): 8210660: PPC64: Mapping floating point registers to vsx registers in ppc.ad
> >
> > ------------------------------------------------------------------------
> >
> > Hi Michihiro,
> >
> > your webrev
> > http://cr.openjdk.java.net/~mhorie/8210660/webrev.01/
> > looks good to me.
> >
> > I only noticed that some instructs (e.g. for xscvdpspn and xxspltw) have no "format %{ ... %}" specification so they are missing in the PrintOptoAssembly output.
> > But this seems to be missing in the current version already.
> >
> > We can test it while waiting for a 2nd review.
> >
> > Thanks and best regards,
> > Martin
> >

From jcbeyler at google.com Tue Sep 18 04:28:50 2018
From: jcbeyler at google.com (JC Beyler)
Date: Mon, 17 Sep 2018 21:28:50 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
<02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
Message-ID:

Hi Sandhya,

How are you invoking the test for NativeCallTest?

The way I would do it using jtreg would be something like this:

$ export BUILD_TYPE=release
$ export JDK_PATH=wherever you have your JDK

From the test subfolder:

$ wherever-your-jtreg-is/bin/jtreg -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java

Seems to pass for me.

But much easier is:

$ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"

That seems to pass for me as well and is easier to use :)

For information, the make run-test documentation is here:
http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html

Let me know if that helps,
Jc

> Note: For NativeCallTest.java and many others I get the following message
> in the corresponding .jtr file:
> "test result: Error. Use -nativepath to specify the
> location of native code"
> Do I need to give any additional info to jtreg to get over
> this problem?
>
> Thanks a lot!
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Monday, September 17, 2018 10:14 AM
> To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>
> I finished testing on avx512 machine.
> All passed except known (TestNaNVector.java) failures.
>
> Thanks,
> Vladimir
>
> On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
> > I gave incorrect link to RFE. Here is correct:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8210764
> >
> > Vladimir
> >
> > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
> >> Build failure I got on MacOS and Windows.
Linux passed for some reason > so I am not surprise you did not noticed. > >> > >> Anyway. I tested with these changes and got next failures on avx1 > machines. I am planning to run on avx512 too. > >> > >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or > '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU > >> with AVX1 only > >> > >> # SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884 > >> # Problematic frame: > >> # V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20 > >> > >> Current CompileTask: > >> C2: 154 5 java.lang.String::equals (65 bytes) > >> > >> Stack: [0x00007f3b10044000,0x00007f3b10145000], > sp=0x00007f3b1013fe70, free space=1007k > >> Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > >> V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20 > >> V [libjvm.so+0x882a72] PhaseChaitin::gather_lrg_masks(bool)+0x872 > >> V [libjvm.so+0xd82235] PhaseCFG::global_code_motion()+0xfc5 > >> V [libjvm.so+0xd824b1] PhaseCFG::do_global_code_motion()+0x51 > >> V [libjvm.so+0xa2c26c] Compile::Code_Gen()+0x24c > >> V [libjvm.so+0xa2ff82] Compile::Compile(ciEnv*, C2Compiler*, > ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42 > >> > >> > ------------------------------------------------------------------------------------------------ > >> 2. 
> compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java > with '-Xcomp' > >> # Internal Error > (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, > tid=22073 > >> # assert(false) failed: cannot spill interval that is used in first > instruction (possible reason: no register found) > >> > >> Current CompileTask: > >> C1: 854767 13391 3 org.sunflow.math.Matrix4::multiply (692 > bytes) > >> > >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000], > sp=0x00007f23b9e7f9d0, free space=1014k > >> Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > >> V [libjvm.so+0x1882202] VMError::report_and_die(int, char const*, > char const*, __va_list_tag*, Thread*, unsigned > >> char*, void*, void*, char const*, int, unsigned long)+0x562 > >> V [libjvm.so+0x1882d2f] VMError::report_and_die(Thread*, void*, char > const*, int, char const*, char const*, > >> __va_list_tag*)+0x2f > >> V [libjvm.so+0xb0bea0] report_vm_error(char const*, int, char const*, > char const*, ...)+0x100 > >> V [libjvm.so+0x7e0410] > LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 > >> V [libjvm.so+0x7e0a20] LinearScanWalker::activate_current()+0x280 > >> V [libjvm.so+0x7e0c7d] IntervalWalker::walk_to(int) [clone > .constprop.299]+0x9d > >> V [libjvm.so+0x7e1078] LinearScan::allocate_registers()+0x338 > >> V [libjvm.so+0x7e2135] LinearScan::do_linear_scan()+0x155 > >> V [libjvm.so+0x70a6bb] Compilation::emit_lir()+0x99b > >> V [libjvm.so+0x70caff] Compilation::compile_java_method()+0x42f > >> V [libjvm.so+0x70d974] Compilation::compile_method()+0x1d4 > >> V [libjvm.so+0x70e547] Compilation::Compilation(AbstractCompiler*, > ciEnv*, ciMethod*, int, BufferBlob*, > >> DirectiveSet*)+0x357 > >> V [libjvm.so+0x71073c] Compiler::compile_method(ciEnv*, ciMethod*, > int, DirectiveSet*)+0x14c > >> V [libjvm.so+0xa3cf89] > CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 > >> > >> Vladimir > >> > >> On 
9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
> >>>
> >>> Thanks Vladimir, the below should fix this issue:
> >>>
> >>> ------------------------------
> >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
> >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
> >>> @@ -233,22 +233,6 @@
> >>>    _xmm_regs[13] = xmm13;
> >>>    _xmm_regs[14] = xmm14;
> >>>    _xmm_regs[15] = xmm15;
> >>> -  _xmm_regs[16] = xmm16;
> >>> -  _xmm_regs[17] = xmm17;
> >>> -  _xmm_regs[18] = xmm18;
> >>> -  _xmm_regs[19] = xmm19;
> >>> -  _xmm_regs[20] = xmm20;
> >>> -  _xmm_regs[21] = xmm21;
> >>> -  _xmm_regs[22] = xmm22;
> >>> -  _xmm_regs[23] = xmm23;
> >>> -  _xmm_regs[24] = xmm24;
> >>> -  _xmm_regs[25] = xmm25;
> >>> -  _xmm_regs[26] = xmm26;
> >>> -  _xmm_regs[27] = xmm27;
> >>> -  _xmm_regs[28] = xmm28;
> >>> -  _xmm_regs[29] = xmm29;
> >>> -  _xmm_regs[30] = xmm30;
> >>> -  _xmm_regs[31] = xmm31;
> >>>  #endif // _LP64
> >>>
> >>>    for (int i = 0; i < 8; i++) {
> >>> ---------------------------------
> >>>
> >>> I think the gcc version on my desktop is older so didn't catch this.
> >>>
> >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
> >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
> >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
> >>>
> >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before
> >>> changing it back to 3.
> >>> > >>> Best Regards, > >>> Sandhya > >>> > >>> > >>> -----Original Message----- > >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > >>> Sent: Friday, September 14, 2018 12:13 PM > >>> To: Viswanathan, Sandhya ; > hotspot-compiler-dev at openjdk.java.net > >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > >>> > >>> I got build failure: > >>> > >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: > array index 16 is past the end of the array > >>> (which contains 16 elements) [-Werror,-Warray-bounds] > >>> jib > _xmm_regs[16] = xmm16; > >>> > >>> I also noticed that we don't have RFE for this work. I filed: > >>> > >>> https://bugs.openjdk.java.net/browse/JDK-8209735 > >>> > >>> You did not enabled avx512 by default (8209735 change was synced from > jdk 11 into 12 2 weeks ago). I added next > >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp: > >>> > >>> - product(intx, UseAVX, 2, \ > >>> + product(intx, UseAVX, 3, \ > >>> > >>> Thanks, > >>> Vladimir > >>> > >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: > >>>> Looks good to me. I will start testing and let you know results. > >>>> > >>>> Thanks, > >>>> Vladimir > >>>> > >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: > >>>>> Hi Vladimir, > >>>>> > >>>>> Please find below the updated webrev with all your comments > incorporated: > >>>>> > >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ > >>>>> > >>>>> I have run the jtreg compiler tests on SKX and KNL which have two > >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also > tested SPECjvm2008 on the three platforms. 
> >>>>> > >>>>> Best Regards, > >>>>> Sandhya > >>>>> > >>>>> > >>>>> -----Original Message----- > >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > >>>>> Sent: Tuesday, September 11, 2018 8:54 PM > >>>>> To: Viswanathan, Sandhya ; > >>>>> hotspot-compiler-dev at openjdk.java.net > >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > >>>>> > >>>>> Thank you, Sandhya > >>>>> > >>>>> I am satisfied with your detailed answer for memory loads issues. > Okay lets not add them. > >>>>> > >>>>> Vladimir > >>>>> > >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: > >>>>>> Hi Vladimir, > >>>>>> > >>>>>> Thanks a lot for the detailed review. I really appreciate your > feedback. > >>>>>> Please see my response in your email below marked with (Sandhya > >>>>>>>>> ). Looking forward to your advice. > >>>>>> > >>>>>> Best Regards, > >>>>>> Sandhya > >>>>>> > >>>>>> > >>>>>> -----Original Message----- > >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM > >>>>>> To: Viswanathan, Sandhya ; > >>>>>> hotspot-compiler-dev at openjdk.java.net > >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 > >>>>>> instruction > >>>>>> > >>>>>> Thank you. > >>>>>> > >>>>>> I want to discuss next issue: > >>>>>> > >>>>>> > You did not added instructions to load these registers from > >>>>>> memory (and stack). What happens in such cases when you need to > load or store? > >>>>>> >>>> Let us take an example, e.g. for loading into rregF. First > >>>>>> it gets loaded from memory into regF and then register to register > move to rregF and vice versa. > >>>>>> > >>>>>> This is what I thought. You increase registers pressure this way > which may cause spills on stack. > >>>>>> Also we don't check that register could be the same as result you > may get unneeded moves. > >>>>>> > >>>>>> I would advice add memory moves at least. 
> >>>>>>
> >>>>>> Sandhya >>> I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask (matcher.cpp). I would like the register allocator to get all the possible registers on an architecture for the idealreg2reg mask. I wondered whether multiple instruct rules in the .ad file for LoadF from memory might cause problems. I would have to have a higher cost for loading into a restricted register set like vlReg. Then I decided that the register allocator can handle this in a much better way than me adding rules to load from memory. This is with the background that regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and if I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
> >>>>>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
> >>>>>> #endif
> >>>>>>     MachNode *spillI = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
> >>>>>>     MachNode *spillL = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
> >>>>>>     MachNode *spillF = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
> >>>>>>     MachNode *spillD = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
> >>>>>>     MachNode *spillP = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
> >>>>>>     ....
> >>>>>>     idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
> >>>>>>
> >>>>>> Another question. You use movflt() and movdbl() which use either movap[s|d] or movs[s|d] instructions:
> >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
> >>>>>> Do these instructions work when avx512vl is not available? I see that for vectors you use the vpxor+vinserti* combination.
> >>>>>>
> >>>>>> Sandhya >>> Yes the scalar floating point instructions are available with AVX512 encoding when avx512vl is not available. That is why you would see not just movflt, movdbl but all the other scalar operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions.
> >>>>>>
> >>>>>> Last question. I notice the following UseAVX check in the vector spill code in x86.ad:
> >>>>>>     if ((UseAVX < 2) || VM_Version::supports_avx512vl())
> >>>>>>
> >>>>>> Should it be (UseAVX < 3)?
> >>>>>>
> >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vladimir
> >>>>>>
> >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
> >>>>>>> Hi Vladimir,
> >>>>>>>
> >>>>>>> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback.
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Sandhya
> >>>>>>>
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
> >>>>>>> To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
> >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> >>>>>>>
> >>>>>>> Very nice. Thank you, Sandhya.
> >>>>>>>
> >>>>>>> I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*.
> >>>>>>>
> >>>>>>>>>> Yes, accepted.
> >>>>>>> > >>>>>>> New load_from_* and load_to_* instructions in .ad files should be > >>>>>>> renamed to next and collocate with other Move*_reg_reg* > instructions: > >>>>>>> > >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, > >>>>>>> vlRegF src) > >>>>>>>>>> Yes, accepted. > >>>>>>> > >>>>>>> You did not added instructions to load these registers from memory > >>>>>>> (and stack). What happens in such cases when you need to load or > store? > >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it > >>>>>>>>>> gets loaded from memory into regF and then register to register > move to rregF and vice versa. > >>>>>>> > >>>>>>> Also please explain why these registers are used when UseAVX == 0?: > >>>>>>> > >>>>>>> +instruct absD_reg(rregD dst) %{ > >>>>>>> predicate((UseSSE>=2) && (UseAVX == 0)); > >>>>>>> > >>>>>>> we switch off evex so regular regD (only legacy register in this > case) should work too: > >>>>>>> 661 if (UseAVX < 3) { > >>>>>>> 662 _features &= ~CPU_AVX512F; > >>>>>>> > >>>>>>>>>> Yes, accepted. It could be regD here. > >>>>>>> > >>>>>>> Next checks could be combined by using new function in > vm_version_x86.hpp (you already have some): > >>>>>>> > >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, > >>>>>>> +vectors_reg_legacy, %{ > >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && > >>>>>>> VM_Version::supports_avx512dq() && > >>>>>>> VM_Version::supports_avx512vl() %} ); > >>>>>>> > >>>>>>>>>> Yes, accepted. > >>>>>>> > >>>>>>> I would suggest to test these changes on different machines > >>>>>>> (non-avx512 and avx512) and with different UseAVX values. > >>>>>>> > >>>>>>>>>> Will do. 
> >>>>>>> > >>>>>>> Thanks, > >>>>>>> Vladimir > >>>>>>> > >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: > >>>>>>>> Recently there have been couple of high priority issues with > >>>>>>>> regards to high bank of XMM register > >>>>>>>> (XMM16-XMM31) usage by C2: > >>>>>>>> > >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 > >>>>>>>> > >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 > >>>>>>>> > >>>>>>>> Please find below a patch which attempts to clean up the XMM > >>>>>>>> register handling by using register groups. > >>>>>>>> > >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ > >>>>>>>> > >>>>>>>> > >>>>>>>> The patch provides a restricted set of registers to the match > >>>>>>>> rules in the ad file based on the underlying architecture. > >>>>>>>> > >>>>>>>> The aim is to remove special handling/workaround from macro > assembler and assembler. > >>>>>>>> > >>>>>>>> By removing the special handling, the patch reduces the overall > >>>>>>>> code size by about 1800 lines of code. > >>>>>>>> > >>>>>>>> Your review and feedback is very welcome. > >>>>>>>> > >>>>>>>> Best Regards, > >>>>>>>> > >>>>>>>> Sandhya > >>>>>>>> > -- Thanks, Jc -------------- next part -------------- An HTML attachment was scrubbed... URL: From Pengfei.Li at arm.com Tue Sep 18 07:13:00 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 18 Sep 2018 07:13:00 +0000 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> References: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> Message-ID: Hi Reviewers, Is there any other comments, objections or suggestions on the new webrev? If no problems, could anyone help to push this commit? I look forward to your response. -- Thanks, Pengfei > -----Original Message----- > > Looks good. 
> > Thanks, > Vladimir > > On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote: > > Hi, > > > > I've updated the patch based on Vladimir's comment. I added checks for > SubI on both branches of the diamond phi. > > Also thanks Roland for the suggestion that supporting a Phi with 3 or more > inputs. But I think the matching rule will be much more complex if we add > this. And I'm not sure if there are any real case scenario which can benefit > from this support. So I didn't add it in. > > > > New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ > > I've run jtreg full test with the new patch and no new issues found. > > > > Please let me know if you have other comments or suggestions. If no > further issues, I need your help to sponsor and push the patch. > > > > -- > > Thanks, > > Pengfei > > > > From Pengfei.Li at arm.com Tue Sep 18 07:40:53 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 18 Sep 2018 07:40:53 +0000 Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by constant in C1 In-Reply-To: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com> References: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com> Message-ID: Hi Andrew, Thanks for your reminder. I'm adding this for longs and testing it. I will send out a new webrev later. -- Thanks, Pengfei > -----Original Message----- > > Hi, > > On 09/13/2018 10:04 AM, Pengfei Li (Arm Technology China) wrote: > > > Could you please help review this optimization in C1 AArch64? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8210413 > > webrev: http://cr.openjdk.java.net/~njian/8210413/webrev.00/ > > It looks fine, but it's really odd that this is only implemented for ints and not > longs. Can you do longs too? > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. 
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Tue Sep 18 07:58:42 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 18 Sep 2018 09:58:42 +0200 Subject: RFR: JDK-8210829: Modularize allocations in C2 Message-ID: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> Similar to what we've done before to runtime, interpreter and C1, allocations should be owned and implemented by GC, and possible to override by specific collectors. For example, Shenandoah lays out objects differently in heap, and needs one extra store to initialize objects. This proposed change factors out the interesting part of object allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a move-and-rename-job. I had to move some little things around, that is: - for the need-gc-check, I'm passing back the needgc_ctrl to plug into slow-path - for prefetching, instead of passing around the 'length' node, only to determine the number of prefetch lines, I determine this early, and pass through the lines arg. - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out or out-args to stitch together into the regions and phis as appropriate. I see no easy way around that. I tested this using hotspot/jtreg:tier1 and also verified that this fills Shenandoah's needs and run tier3_gc_shenandoah testsuite. http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/ Can I please get reviews? Thanks, Roman -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From doug.simon at oracle.com Tue Sep 18 08:00:09 2018 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 18 Sep 2018 10:00:09 +0200 Subject: RFR: 8210793: [JVMCI] AllocateCompileIdTest.java failed to find DiagnosticCommand.class Message-ID: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com> Please review this tiny change to ensure that the DiagnosticCommand test library class is compiled as part of AllocateCompileIdTest. https://bugs.openjdk.java.net/browse/JDK-8210793 http://cr.openjdk.java.net/~dnsimon/8210793/ -Doug From tobias.hartmann at oracle.com Tue Sep 18 12:18:51 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 18 Sep 2018 08:18:51 -0400 Subject: RFR: 8210793: [JVMCI] AllocateCompileIdTest.java failed to find DiagnosticCommand.class In-Reply-To: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com> References: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com> Message-ID: <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com> Hi Doug, looks good to me. Best regards, Tobias On 18.09.2018 04:00, Doug Simon wrote: > Please review this tiny change to ensure that the DiagnosticCommand test library class is compiled as part of AllocateCompileIdTest. > > https://bugs.openjdk.java.net/browse/JDK-8210793 > http://cr.openjdk.java.net/~dnsimon/8210793/ > > -Doug > From doug.simon at oracle.com Tue Sep 18 13:08:03 2018 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 18 Sep 2018 15:08:03 +0200 Subject: RFR: 8210793: [JVMCI] AllocateCompileIdTest.java failed to find DiagnosticCommand.class In-Reply-To: <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com> References: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com> <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com> Message-ID: <746A2C0D-6AD9-4904-BD99-A1BFC57FBC85@oracle.com> Thanks Tobias. -Doug > On 18 Sep 2018, at 14:18, Tobias Hartmann wrote: > > Hi Doug, > > looks good to me. 
> 
> Best regards,
> Tobias
> 
> On 18.09.2018 04:00, Doug Simon wrote:
>> Please review this tiny change to ensure that the DiagnosticCommand test library class is compiled as part of AllocateCompileIdTest.
>> 
>> https://bugs.openjdk.java.net/browse/JDK-8210793
>> http://cr.openjdk.java.net/~dnsimon/8210793/
>> 
>> -Doug
>> 

From kuaiwei.kw at alibaba-inc.com Tue Sep 18 13:33:50 2018
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Tue, 18 Sep 2018 21:33:50 +0800
Subject: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
Message-ID: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com>

Hi,

I made a patch for https://bugs.openjdk.java.net/browse/JDK-8210853 . Could you help review my change?

Background:
C2 can remove the G1 post barrier when a store targets a newly allocated object. However, the check in just_allocated_object is defeated by a Region node that is created when the initializer method of the super class is inlined. The change checks for this pattern and skips the Region node.

src/hotspot/share/opto/graphKit.cpp
 // We use this to determine if an object is so "fresh" that
 // it does not require card marks.
 Node* GraphKit::just_allocated_object(Node* current_control) {
-  if (C->recent_alloc_ctl() == current_control)
+  Node * ctrl = current_control;
+  // Object::<init> is invoked after allocation, most of invoke nodes
+  // will be reduced, but a region node is kept in parse time, we check
+  // the pattern and skip the region node
+  if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
+    ctrl = ctrl->in(1);
+  }
+  if (C->recent_alloc_ctl() == ctrl)
     return C->recent_alloc_obj();
   return NULL;
 }
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From vladimir.kozlov at oracle.com Tue Sep 18 16:40:32 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Sep 2018 09:40:32 -0700 Subject: RFR: 8210793: [JVMCI] AllocateCompileIdTest.java failed to find DiagnosticCommand.class In-Reply-To: <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com> References: <8FC2A8D7-F1FA-4DF0-857E-8A6B0D617884@oracle.com> <529cd19b-faaa-f68c-333c-7a60b0f06d7f@oracle.com> Message-ID: <2226eb54-622f-5cc6-c6e6-4a302a94d3c5@oracle.com> +1 Thanks, Vladimir On 9/18/18 5:18 AM, Tobias Hartmann wrote: > Hi Doug, > > looks good to me. > > Best regards, > Tobias > > On 18.09.2018 04:00, Doug Simon wrote: >> Please review this tiny change to ensure that the DiagnosticCommand test library class is compiled as part of AllocateCompileIdTest. >> >> https://bugs.openjdk.java.net/browse/JDK-8210793 >> http://cr.openjdk.java.net/~dnsimon/8210793/ >> >> -Doug >> From vladimir.kozlov at oracle.com Tue Sep 18 17:41:51 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Sep 2018 10:41:51 -0700 Subject: RFR: JDK-8210829: Modularize allocations in C2 In-Reply-To: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> Message-ID: <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com> Hi Roman, This looks good. I looked through changes and it generates the same ideal graph as before. It seems you unintentionally changed indent of the comment in barrierSetC2.hpp Thanks, Vladimir On 9/18/18 12:58 AM, Roman Kennke wrote: > Similar to what we've done before to runtime, interpreter and C1, > allocations should be owned and implemented by GC, and possible to > override by specific collectors. For example, Shenandoah lays out > objects differently in heap, and needs one extra store to initialize > objects. > > This proposed change factors out the interesting part of object > allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a > move-and-rename-job. 
I had to move some little things around, that is:
> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into
> slow-path
> - for prefetching, instead of passing around the 'length' node, only to
> determine the number of prefetch lines, I determine this early, and pass
> through the lines arg.
> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out
> or out-args to stitch together into the regions and phis as appropriate.
> I see no easy way around that.
> 
> I tested this using hotspot/jtreg:tier1 and also verified that this
> fills Shenandoah's needs and run tier3_gc_shenandoah testsuite.
> 
> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/
> 
> Can I please get reviews?
> Thanks,
> Roman
> 

From rwestrel at redhat.com Tue Sep 18 19:47:46 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 21:47:46 +0200
Subject: RFR(S): 8210389: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc
Message-ID:

http://cr.openjdk.java.net/~roland/8210389/webrev.00/

With volatile loads, the trailing membar has an edge to the load. After optimizations, that edge can point to a chain of Phis and the membar can be the one use that keeps the phis alive. After matching, that required edge is converted to a precedence edge. Liveness analysis ignores precedence edges, the chain of phis is killed and register allocation finds a node with no use.

As a fix, I propose that, at the end of optimizations, the edge between the volatile load's membar and the phis be removed and all dead phis be killed. As I understand, that edge is not required for correctness because anti dependencies detection code adds a precedence edge between a volatile load and its membar if needed.

I ran full jcstress on x86 and aarch64 with this patch successfully.

Roland.
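
A toy model of the liveness problem described in the 8210389 message above, offered only as a sketch. The Node class and its required/precedence lists are hypothetical stand-ins and not HotSpot's classes, and liveness is modeled as plain reachability over required edges. Under those assumptions it shows how a Phi chain whose only use is the membar's edge loses that use once the edge is treated as a precedence edge.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical classes, not HotSpot's Node/Phi/MemBar.
class PrecedenceEdgeSketch {
    static final class Node {
        final String name;
        final List<Node> required = new ArrayList<>();   // edges that liveness follows
        final List<Node> precedence = new ArrayList<>(); // ordering-only edges, ignored by liveness
        Node(String name) { this.name = name; }
    }

    // Names of all nodes reachable from root through required edges only.
    static Set<String> live(Node root) {
        Set<String> seen = new TreeSet<>();
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (seen.add(n.name)) {
                for (Node in : n.required) {
                    work.push(in);
                }
            }
        }
        return seen;
    }

    // Builds Return -> MemBar -> Phi2 -> Phi1, where the MemBar -> Phi2 edge
    // is either a required edge or a precedence edge, and returns the live set.
    static Set<String> liveSet(boolean edgeIsPrecedence) {
        Node phi1 = new Node("Phi1");
        Node phi2 = new Node("Phi2");
        phi2.required.add(phi1);
        Node membar = new Node("MemBar");
        if (edgeIsPrecedence) {
            membar.precedence.add(phi2);
        } else {
            membar.required.add(phi2);
        }
        Node ret = new Node("Return");
        ret.required.add(membar);
        return live(ret);
    }

    public static void main(String[] args) {
        System.out.println("required edge:   " + liveSet(false));
        System.out.println("precedence edge: " + liveSet(true));
    }
}
```

In this model the Phis drop out of the live set in the second case, which corresponds to the "node with no use" situation the message describes. Removing the edge and the dead Phis before that point, as the message proposes, avoids leaving such nodes behind.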

From rwestrel at redhat.com Tue Sep 18 19:57:53 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 21:57:53 +0200
Subject: RFR(M): 8210885: Convert left over loads/stores to access api
Message-ID:

http://cr.openjdk.java.net/~roland/8210885/webrev.00/

This converts some remaining loads and stores to the access API (as preparation for Shenandoah). This also cleans up the C2 access API: some entry points get a control argument that's in practice useless because current control() is always used.

Roland.

From rwestrel at redhat.com Tue Sep 18 20:09:46 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 22:09:46 +0200
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
Message-ID:

http://cr.openjdk.java.net/~roland/8210887/webrev.00/

This extends the entry point of the C2 access API for arraycopy (in preparation for Shenandoah). This also fixes some incorrect _adr_type's. It also modifies the ArrayCopyNode::Ideal() transform that produces a series of loads/stores so, as a subsequent change, barriers can be added to loads and stores: they need to produce and consume memory state other than the slice of the array being copied.

Roland.

From rwestrel at redhat.com Tue Sep 18 20:17:05 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 18 Sep 2018 22:17:05 +0200
Subject: RFR(S): 8210390: C2 still crashes with "assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node"
In-Reply-To: <22635c3c-4c0e-ccc1-9853-46ffa56dd96c@oracle.com>
References: <0976877f-eaa7-c5e6-fb52-0c0a70467c84@oracle.com> <22635c3c-4c0e-ccc1-9853-46ffa56dd96c@oracle.com>
Message-ID:

Thanks for the review, Vladimir.

Roland.
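
For the 8210887 message above, which mentions the ArrayCopyNode::Ideal() transform that produces a series of loads/stores, here is a source-level sketch of what such an expansion amounts to. It is an illustration only: the class and method names are made up, the fixed length of 4 is an arbitrary choice, and the message does not say which copies C2 actually expands this way.

```java
import java.util.Arrays;

// Illustration only: hypothetical names, not code from the webrev.
class ArrayCopyAsLoadsStores {
    // The copy written as a call to System.arraycopy with a constant length.
    static int[] viaArraycopy(int[] src) {
        int[] dst = new int[4];
        System.arraycopy(src, 0, dst, 0, 4);
        return dst;
    }

    // The same copy written as explicit loads followed by explicit stores.
    // Doing all loads before any store keeps the result correct even if
    // source and destination were the same array.
    static int[] viaLoadsStores(int[] src) {
        int[] dst = new int[4];
        int e0 = src[0];
        int e1 = src[1];
        int e2 = src[2];
        int e3 = src[3];
        dst[0] = e0;
        dst[1] = e1;
        dst[2] = e2;
        dst[3] = e3;
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40, 50};
        System.out.println(Arrays.equals(viaArraycopy(src), viaLoadsStores(src)));
    }
}
```

Once the copy is a sequence of individual loads and stores, each one is a place where a collector could attach a barrier, which seems to be the motivation the message gives for letting those loads and stores see memory state beyond the slice of the copied array.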
From sandhya.viswanathan at intel.com Tue Sep 18 20:47:18 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 18 Sep 2018 20:47:18 +0000 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4A75@FMSMSX126.amr.corp.intel.com> Hi Jc, Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java. Best Regards, Sandhya From: JC Beyler [mailto:jcbeyler at google.com] Sent: Monday, September 17, 2018 9:29 PM To: Viswanathan, Sandhya Cc: vladimir.kozlov at oracle.com; hotspot-compiler-dev Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction Hi Sandhya, How are you invoking the test for NativeCallTest? The way I would do it using jtreg would be something like this: $ export BUILD_TYPE=release $ export JDK_PATH=wherever you have your JDK From the test subfolder: $ wherever-your-jtreg-is/bin/jtreg -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java Seems to pass for me. 
But much easier is: $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java" That seems to pass for me as well and is easier to use :) For information, the make run-test documentation is here: http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html Let me know if that helps, Jc Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file: "test result: Error. Use -nativepath to specify the location of native code" Do I need to give any additional info to jtreg to get over this problem? Thanks a lot! Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, September 17, 2018 10:14 AM To: Viswanathan, Sandhya >; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction I finished testing on avx512 machine. All passed except known (TestNaNVector.java) failures. Thanks, Vladimir On 9/14/18 5:22 PM, Vladimir Kozlov wrote: > I gave incorrect link to RFE. Here is correct: > > https://bugs.openjdk.java.net/browse/JDK-8210764 > > Vladimir > > On 9/14/18 3:49 PM, Vladimir Kozlov wrote: >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed. >> >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too. >> >> 1. 
compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU >> with AVX1 only >> >> # SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884 >> # Problematic frame: >> # V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20 >> >> Current CompileTask: >> C2: 154 5 java.lang.String::equals (65 bytes) >> >> Stack: [0x00007f3b10044000,0x00007f3b10145000], sp=0x00007f3b1013fe70, free space=1007k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20 >> V [libjvm.so+0x882a72] PhaseChaitin::gather_lrg_masks(bool)+0x872 >> V [libjvm.so+0xd82235] PhaseCFG::global_code_motion()+0xfc5 >> V [libjvm.so+0xd824b1] PhaseCFG::do_global_code_motion()+0x51 >> V [libjvm.so+0xa2c26c] Compile::Code_Gen()+0x24c >> V [libjvm.so+0xa2ff82] Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42 >> >> ------------------------------------------------------------------------------------------------ >> 2. 
compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp' >> # Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073 >> # assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found) >> >> Current CompileTask: >> C1: 854767 13391 3 org.sunflow.math.Matrix4::multiply (692 bytes) >> >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000], sp=0x00007f23b9e7f9d0, free space=1014k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1882202] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned >> char*, void*, void*, char const*, int, unsigned long)+0x562 >> V [libjvm.so+0x1882d2f] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, >> __va_list_tag*)+0x2f >> V [libjvm.so+0xb0bea0] report_vm_error(char const*, int, char const*, char const*, ...)+0x100 >> V [libjvm.so+0x7e0410] LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 >> V [libjvm.so+0x7e0a20] LinearScanWalker::activate_current()+0x280 >> V [libjvm.so+0x7e0c7d] IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d >> V [libjvm.so+0x7e1078] LinearScan::allocate_registers()+0x338 >> V [libjvm.so+0x7e2135] LinearScan::do_linear_scan()+0x155 >> V [libjvm.so+0x70a6bb] Compilation::emit_lir()+0x99b >> V [libjvm.so+0x70caff] Compilation::compile_java_method()+0x42f >> V [libjvm.so+0x70d974] Compilation::compile_method()+0x1d4 >> V [libjvm.so+0x70e547] Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, >> DirectiveSet*)+0x357 >> V [libjvm.so+0x71073c] Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c >> V [libjvm.so+0xa3cf89] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 >> >> Vladimir >> >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote: >>> >>> Thanks Vladimir, the below should fix 
this issue:
>>> 
>>> ------------------------------
>>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>>> @@ -233,22 +233,6 @@
>>>    _xmm_regs[13] = xmm13;
>>>    _xmm_regs[14] = xmm14;
>>>    _xmm_regs[15] = xmm15;
>>> -  _xmm_regs[16] = xmm16;
>>> -  _xmm_regs[17] = xmm17;
>>> -  _xmm_regs[18] = xmm18;
>>> -  _xmm_regs[19] = xmm19;
>>> -  _xmm_regs[20] = xmm20;
>>> -  _xmm_regs[21] = xmm21;
>>> -  _xmm_regs[22] = xmm22;
>>> -  _xmm_regs[23] = xmm23;
>>> -  _xmm_regs[24] = xmm24;
>>> -  _xmm_regs[25] = xmm25;
>>> -  _xmm_regs[26] = xmm26;
>>> -  _xmm_regs[27] = xmm27;
>>> -  _xmm_regs[28] = xmm28;
>>> -  _xmm_regs[29] = xmm29;
>>> -  _xmm_regs[30] = xmm30;
>>> -  _xmm_regs[31] = xmm31;
>>>  #endif // _LP64
>>> 
>>>    for (int i = 0; i < 8; i++) {
>>> ---------------------------------
>>> 
>>> I think the gcc version on my desktop is older so didn't catch this.
>>> 
>>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>> 
>>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before
>>> changing it back to 3.
>>> 
>>> Best Regards,
>>> Sandhya
>>> 
>>> 
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, September 14, 2018 12:13 PM
>>> To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>> 
>>> I got build failure:
>>> 
>>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array
>>> (which contains 16 elements) [-Werror,-Warray-bounds]
>>> jib > _xmm_regs[16] = xmm16;
>>> 
>>> I also noticed that we don't have RFE for this work.
I filed: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>> >>> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp: >>> >>> - product(intx, UseAVX, 2, \ >>> + product(intx, UseAVX, 3, \ >>> >>> Thanks, >>> Vladimir >>> >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: >>>> Looks good to me. I will start testing and let you know results. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >>>>> Hi Vladimir, >>>>> >>>>> Please find below the updated webrev with all your comments incorporated: >>>>> >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >>>>> >>>>> I have run the jtreg compiler tests on SKX and KNL which have two >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. >>>>> >>>>> Best Regards, >>>>> Sandhya >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, September 11, 2018 8:54 PM >>>>> To: Viswanathan, Sandhya >; >>>>> hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>>>> >>>>> Thank you, Sandhya >>>>> >>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >>>>> >>>>> Vladimir >>>>> >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >>>>>> Hi Vladimir, >>>>>> >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback. >>>>>> Please see my response in your email below marked with (Sandhya >>>>>>>>> ). Looking forward to your advice. 
>>>>>> >>>>>> Best Regards, >>>>>> Sandhya >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM >>>>>> To: Viswanathan, Sandhya >; >>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>>> instruction >>>>>> >>>>>> Thank you. >>>>>> >>>>>> I want to discuss next issue: >>>>>> >>>>>> > You did not added instructions to load these registers from >>>>>> memory (and stack). What happens in such cases when you need to load or store? >>>>>> >>>> Let us take an example, e.g. for loading into rregF. First >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>>> >>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >>>>>> Also we don't check that register could be the same as result you may get unneeded moves. >>>>>> >>>>>> I would advice add memory moves at least. >>>>>> >>>>>> Sandhya >>> I had added those rules initially and removed them in >>>>>> the final patch. I noticed that the register allocator uses the >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask >>>>>> (matcher.cpp). I would like the register allocator to get all the >>>>>> possible register on an architecture for idealreg2reg mask. I >>>>>> wondered that multiple instruct rules in .ad file for LoadF from >>>>>> memory might cause problems. I would have to have higher cost for >>>>>> loading into restricted register set like vlReg. Then I decided that >>>>>> the register allocator can handle this in much better way than me >>>>>> adding rules to load from memory. This is with the background that the regF is always all the available registers >>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. 
Let me know your thoughts on this and if
>>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I
>>>>>> am referring to is:
>>>>>>   MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>> #endif
>>>>>>   MachNode *spillI = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>>>>   MachNode *spillL = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsOnlyOnTest, false));
>>>>>>   MachNode *spillF = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>>>>   MachNode *spillD = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>>>>   MachNode *spillP = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>>>   ....
>>>>>>   idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>>>>> 
>>>>>> Another question. You use movflt() and movdbl() which use either
>>>>>> movap[s|d] and movs[s|d]
>>>>>> instructions:
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>>>>>> Do these instructions work when
>>>>>> avx512vl is not available? I see for vectors you use
>>>>>> vpxor+vinserti* combination.
>>>>>> 
>>>>>> Sandhya >>> Yes the scalar floating point instructions are available
>>>>>> with AVX512 encoding when avx512vl is not available. That is why you
>>>>>> would see not just movflt, movdbl but all the other scalar
>>>>>> operations like addss, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions.
>>>>>> 
>>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad:
>>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>>>> 
>>>>>> Should it be (UseAVX < 3)?
>>>>>> 
>>>>>> Sandhya >>> Yes, that is a very good catch.
I will fix this in the updated webrev. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>>>>>> Hi Vladimir, >>>>>>> >>>>>>> Thanks a lot for your review and feedback. Please see my response >>>>>>> in your email below. I will send an updated webrev incorporating your feedback. >>>>>>> >>>>>>> Best Regards, >>>>>>> Sandhya >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Monday, September 10, 2018 6:09 PM >>>>>>> To: Viswanathan, Sandhya >; >>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>>>> instruction >>>>>>> >>>>>>> Very nice. Thank you, Sandhya. >>>>>>> >>>>>>> I would like to see more meaningful naming in .ad files - instead >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*. >>>>>>> >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> New load_from_* and load_to_* instructions in .ad files should be >>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions: >>>>>>> >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>>>>>> vlRegF src) >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> You did not added instructions to load these registers from memory >>>>>>> (and stack). What happens in such cases when you need to load or store? >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>>>> >>>>>>> Also please explain why these registers are used when UseAVX == 0?: >>>>>>> >>>>>>> +instruct absD_reg(rregD dst) %{ >>>>>>> predicate((UseSSE>=2) && (UseAVX == 0)); >>>>>>> >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too: >>>>>>> 661 if (UseAVX < 3) { >>>>>>> 662 _features &= ~CPU_AVX512F; >>>>>>> >>>>>>>>>> Yes, accepted. It could be regD here. 
>>>>>>> >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >>>>>>> >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>>>>>> +vectors_reg_legacy, %{ >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>>>>>> VM_Version::supports_avx512dq() && >>>>>>> VM_Version::supports_avx512vl() %} ); >>>>>>> >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> I would suggest to test these changes on different machines >>>>>>> (non-avx512 and avx512) and with different UseAVX values. >>>>>>> >>>>>>>>>> Will do. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>>>>>>> Recently there have been couple of high priority issues with >>>>>>>> regards to high bank of XMM register >>>>>>>> (XMM16-XMM31) usage by C2: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>>>>>>> >>>>>>>> Please find below a patch which attempts to clean up the XMM >>>>>>>> register handling by using register groups. >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> The patch provides a restricted set of registers to the match >>>>>>>> rules in the ad file based on the underlying architecture. >>>>>>>> >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>>>>>>> >>>>>>>> By removing the special handling, the patch reduces the overall >>>>>>>> code size by about 1800 lines of code. >>>>>>>> >>>>>>>> Your review and feedback is very welcome. >>>>>>>> >>>>>>>> Best Regards, >>>>>>>> >>>>>>>> Sandhya >>>>>>>> -- Thanks, Jc -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rwestrel at redhat.com Tue Sep 18 21:32:58 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 18 Sep 2018 23:32:58 +0200 Subject: RFR: JDK-8210752: Remaining explicit barriers for C2 In-Reply-To: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com> References: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com> Message-ID: > http://cr.openjdk.java.net/~rkennke/JDK-8210752/webrev.00/ That looks good to me. Roland. From rwestrel at redhat.com Tue Sep 18 21:39:00 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 18 Sep 2018 23:39:00 +0200 Subject: RFR: JDK-8210829: Modularize allocations in C2 In-Reply-To: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> Message-ID: > http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/ That looks good to me. Roland. From sandhya.viswanathan at intel.com Tue Sep 18 23:52:43 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 18 Sep 2018 23:52:43 +0000 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com> Hi Vladimir, Please find below the updated webrev with fixes for the two issues: Patch: 
http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
RFE: https://bugs.openjdk.java.net/browse/JDK-8210764

Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS as the temporary register type for intrinsics instead of legVecD. This test was only failing with -XX:MaxVectorSize=4. The file modified is x86_64.ad.

Fix for compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java was to allow all xmm registers (xmm0-xmm31) for C1 and handle floating point abs and negate appropriately by providing a temp register. The C1 files are modified for this fix.

I reran compiler jtreg tests with setting nativepath appropriately on Haswell, SKX and KNL.

Best Regards,
Sandhya

From: Viswanathan, Sandhya
Sent: Tuesday, September 18, 2018 1:47 PM
To: 'JC Beyler'
Cc: vladimir.kozlov at oracle.com; hotspot-compiler-dev
Subject: RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Hi Jc,

Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.

Best Regards,
Sandhya

From: JC Beyler [mailto:jcbeyler at google.com]
Sent: Monday, September 17, 2018 9:29 PM
To: Viswanathan, Sandhya
Cc: vladimir.kozlov at oracle.com; hotspot-compiler-dev
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Hi Sandhya,

How are you invoking the test for NativeCallTest? The way I would do it using jtreg would be something like this:

$ export BUILD_TYPE=release
$ export JDK_PATH=wherever you have your JDK

From the test subfolder:
$ wherever-your-jtreg-is/bin/jtreg -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java

Seems to pass for me.
But much easier is: $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java" That seems to pass for me as well and is easier to use :) For information, the make run-test documentation is here: http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html Let me know if that helps, Jc Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file: "test result: Error. Use -nativepath to specify the location of native code" Do I need to give any additional info to jtreg to get over this problem? Thanks a lot! Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, September 17, 2018 10:14 AM To: Viswanathan, Sandhya >; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction I finished testing on avx512 machine. All passed except known (TestNaNVector.java) failures. Thanks, Vladimir On 9/14/18 5:22 PM, Vladimir Kozlov wrote: > I gave incorrect link to RFE. Here is correct: > > https://bugs.openjdk.java.net/browse/JDK-8210764 > > Vladimir > > On 9/14/18 3:49 PM, Vladimir Kozlov wrote: >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed. >> >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too. >> >> 1. 
compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU >> with AVX1 only >> >> # SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884 >> # Problematic frame: >> # V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20 >> >> Current CompileTask: >> C2: 154 5 java.lang.String::equals (65 bytes) >> >> Stack: [0x00007f3b10044000,0x00007f3b10145000], sp=0x00007f3b1013fe70, free space=1007k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20 >> V [libjvm.so+0x882a72] PhaseChaitin::gather_lrg_masks(bool)+0x872 >> V [libjvm.so+0xd82235] PhaseCFG::global_code_motion()+0xfc5 >> V [libjvm.so+0xd824b1] PhaseCFG::do_global_code_motion()+0x51 >> V [libjvm.so+0xa2c26c] Compile::Code_Gen()+0x24c >> V [libjvm.so+0xa2ff82] Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42 >> >> ------------------------------------------------------------------------------------------------ >> 2. 
with '-Xcomp' >> # Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073 >> # assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found) >> >> Current CompileTask: >> C1: 854767 13391 3 org.sunflow.math.Matrix4::multiply (692 bytes) >> >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000], sp=0x00007f23b9e7f9d0, free space=1014k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1882202] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned >> char*, void*, void*, char const*, int, unsigned long)+0x562 >> V [libjvm.so+0x1882d2f] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, >> __va_list_tag*)+0x2f >> V [libjvm.so+0xb0bea0] report_vm_error(char const*, int, char const*, char const*, ...)+0x100 >> V [libjvm.so+0x7e0410] LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 >> V [libjvm.so+0x7e0a20] LinearScanWalker::activate_current()+0x280 >> V [libjvm.so+0x7e0c7d] IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d >> V [libjvm.so+0x7e1078] LinearScan::allocate_registers()+0x338 >> V [libjvm.so+0x7e2135] LinearScan::do_linear_scan()+0x155 >> V [libjvm.so+0x70a6bb] Compilation::emit_lir()+0x99b >> V [libjvm.so+0x70caff] Compilation::compile_java_method()+0x42f >> V [libjvm.so+0x70d974] Compilation::compile_method()+0x1d4 >> V [libjvm.so+0x70e547] Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, >> DirectiveSet*)+0x357 >> V [libjvm.so+0x71073c] Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c >> V [libjvm.so+0xa3cf89] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 >> >> Vladimir >> >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote: >>> >>> Thanks Vladimir, the below should fix this issue: >>> >>> ------------------------------ >>> --- 
old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700 >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700 >>> @@ -233,22 +233,6 @@ >>> _xmm_regs[13] = xmm13; >>> _xmm_regs[14] = xmm14; >>> _xmm_regs[15] = xmm15; >>> - _xmm_regs[16] = xmm16; >>> - _xmm_regs[17] = xmm17; >>> - _xmm_regs[18] = xmm18; >>> - _xmm_regs[19] = xmm19; >>> - _xmm_regs[20] = xmm20; >>> - _xmm_regs[21] = xmm21; >>> - _xmm_regs[22] = xmm22; >>> - _xmm_regs[23] = xmm23; >>> - _xmm_regs[24] = xmm24; >>> - _xmm_regs[25] = xmm25; >>> - _xmm_regs[26] = xmm26; >>> - _xmm_regs[27] = xmm27; >>> - _xmm_regs[28] = xmm28; >>> - _xmm_regs[29] = xmm29; >>> - _xmm_regs[30] = xmm30; >>> - _xmm_regs[31] = xmm31; >>> #endif // _LP64 >>> >>> for (int i = 0; i < 8; i++) { >>> --------------------------------- >>> >>> I think the gcc version on my desktop is older so didn't catch this. >>> >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to: >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/ >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735 >>> >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before >>> changing it back to 3. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Friday, September 14, 2018 12:13 PM >>> To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> >>> I got build failure: >>> >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array >>> (which contains 16 elements) [-Werror,-Warray-bounds] >>> jib > _xmm_regs[16] = xmm16; >>> >>> I also noticed that we don't have RFE for this work.
I filed: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>> >>> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp: >>> >>> - product(intx, UseAVX, 2, \ >>> + product(intx, UseAVX, 3, \ >>> >>> Thanks, >>> Vladimir >>> >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: >>>> Looks good to me. I will start testing and let you know results. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >>>>> Hi Vladimir, >>>>> >>>>> Please find below the updated webrev with all your comments incorporated: >>>>> >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >>>>> >>>>> I have run the jtreg compiler tests on SKX and KNL which have two >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. >>>>> >>>>> Best Regards, >>>>> Sandhya >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, September 11, 2018 8:54 PM >>>>> To: Viswanathan, Sandhya >; >>>>> hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>>>> >>>>> Thank you, Sandhya >>>>> >>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >>>>> >>>>> Vladimir >>>>> >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >>>>>> Hi Vladimir, >>>>>> >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback. >>>>>> Please see my response in your email below marked with (Sandhya >>>>>>>>> ). Looking forward to your advice. 
>>>>>> >>>>>> Best Regards, >>>>>> Sandhya >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM >>>>>> To: Viswanathan, Sandhya >; >>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>>> instruction >>>>>> >>>>>> Thank you. >>>>>> >>>>>> I want to discuss next issue: >>>>>> >>>>>> > You did not added instructions to load these registers from >>>>>> memory (and stack). What happens in such cases when you need to load or store? >>>>>> >>>> Let us take an example, e.g. for loading into rregF. First >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>>> >>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >>>>>> Also we don't check that register could be the same as result you may get unneeded moves. >>>>>> >>>>>> I would advice add memory moves at least. >>>>>> >>>>>> Sandhya >>> I had added those rules initially and removed them in >>>>>> the final patch. I noticed that the register allocator uses the >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask >>>>>> (matcher.cpp). I would like the register allocator to get all the >>>>>> possible register on an architecture for idealreg2reg mask. I >>>>>> wondered that multiple instruct rules in .ad file for LoadF from >>>>>> memory might cause problems. I would have to have higher cost for >>>>>> loading into restricted register set like vlReg. Then I decided that >>>>>> the register allocator can handle this in much better way than me >>>>>> adding rules to load from memory. This is with the background that the regF is always all the available registers >>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. 
Let me know your thoughts on this and if >>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I >>>>>> am referring to is: >>>>>> MachNode *spillCP = match_tree(new >>>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>>>>> #endif >>>>>> MachNode *spillI = match_tree(new >>>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >>>>>> MachNode *spillL = match_tree(new >>>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, >>>>>> LoadNode::DependsOnlyOnTest, false)); >>>>>> MachNode *spillF = match_tree(new >>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >>>>>> MachNode *spillD = match_tree(new >>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >>>>>> MachNode *spillP = match_tree(new >>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>>>>> .... >>>>>> idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >>>>>> >>>>>> Another question. You use movflt() and movdbl() which use either >>>>>> movap[s|d] or movs[s|d] >>>>>> instructions: >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164 >>>>>> Do these instructions work when >>>>>> avx512vl is not available? I see for vectors you use >>>>>> vpxor+vinserti* combination. >>>>>> >>>>>> Sandhya >>> Yes the scalar floating point instructions are available >>>>>> with AVX512 encoding when avx512vl is not available. That is why you >>>>>> would see not just movflt, movdbl but all the other scalar >>>>>> operations like addss, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions. >>>>>> >>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad: >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >>>>>> >>>>>> Should it be (UseAVX < 3)? >>>>>> >>>>>> Sandhya >>> Yes, that is a very good catch.
I will fix this in the updated webrev. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>>>>>> Hi Vladimir, >>>>>>> >>>>>>> Thanks a lot for your review and feedback. Please see my response >>>>>>> in your email below. I will send an updated webrev incorporating your feedback. >>>>>>> >>>>>>> Best Regards, >>>>>>> Sandhya >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Monday, September 10, 2018 6:09 PM >>>>>>> To: Viswanathan, Sandhya >; >>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>>>>>> instruction >>>>>>> >>>>>>> Very nice. Thank you, Sandhya. >>>>>>> >>>>>>> I would like to see more meaningful naming in .ad files - instead >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*. >>>>>>> >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> New load_from_* and load_to_* instructions in .ad files should be >>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions: >>>>>>> >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>>>>>> vlRegF src) >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> You did not added instructions to load these registers from memory >>>>>>> (and stack). What happens in such cases when you need to load or store? >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >>>>>>> >>>>>>> Also please explain why these registers are used when UseAVX == 0?: >>>>>>> >>>>>>> +instruct absD_reg(rregD dst) %{ >>>>>>> predicate((UseSSE>=2) && (UseAVX == 0)); >>>>>>> >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too: >>>>>>> 661 if (UseAVX < 3) { >>>>>>> 662 _features &= ~CPU_AVX512F; >>>>>>> >>>>>>>>>> Yes, accepted. It could be regD here. 
>>>>>>> >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >>>>>>> >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>>>>>> +vectors_reg_legacy, %{ >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>>>>>> VM_Version::supports_avx512dq() && >>>>>>> VM_Version::supports_avx512vl() %} ); >>>>>>> >>>>>>>>>> Yes, accepted. >>>>>>> >>>>>>> I would suggest to test these changes on different machines >>>>>>> (non-avx512 and avx512) and with different UseAVX values. >>>>>>> >>>>>>>>>> Will do. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>>>>>>> Recently there have been couple of high priority issues with >>>>>>>> regards to high bank of XMM register >>>>>>>> (XMM16-XMM31) usage by C2: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>>>>>>> >>>>>>>> Please find below a patch which attempts to clean up the XMM >>>>>>>> register handling by using register groups. >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> The patch provides a restricted set of registers to the match >>>>>>>> rules in the ad file based on the underlying architecture. >>>>>>>> >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>>>>>>> >>>>>>>> By removing the special handling, the patch reduces the overall >>>>>>>> code size by about 1800 lines of code. >>>>>>>> >>>>>>>> Your review and feedback is very welcome. >>>>>>>> >>>>>>>> Best Regards, >>>>>>>> >>>>>>>> Sandhya >>>>>>>> -- Thanks, Jc -------------- next part -------------- An HTML attachment was scrubbed... 
From patric.hedlin at oracle.com Wed Sep 19 09:25:19 2018 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Wed, 19 Sep 2018 11:25:19 +0200 Subject: RFR(XS): 8210284: "assert((av & 0x00000001) == 0) failed: unsupported V8" on Solaris 11.4 Message-ID: Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8210284 Webrev: http://cr.openjdk.java.net/~phedlin/tr8210284 Testing: Verified that the JVM (in debug build) will not assert on start-up when running Solaris 11.4, after applying the update. Best regards, Patric From rkennke at redhat.com Wed Sep 19 09:40:31 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 19 Sep 2018 11:40:31 +0200 Subject: RFR: JDK-8210829: Modularize allocations in C2 In-Reply-To: <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com> References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com> Message-ID: Thanks, Vladimir! I'll fix the comment and push it through jdk/submit before pushing to jdk/jdk. Roman > Hi Roman, > > This looks good. I looked through changes and it generates the same > ideal graph as before. > It seems you unintentionally changed indent of the comment in > barrierSetC2.hpp > > Thanks, > Vladimir > > On 9/18/18 12:58 AM, Roman Kennke wrote: >> Similar to what we've done before to runtime, interpreter and C1, >> allocations should be owned and implemented by GC, and possible to >> override by specific collectors. For example, Shenandoah lays out >> objects differently in heap, and needs one extra store to initialize >> objects. >> >> This proposed change factors out the interesting part of object >> allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a >> move-and-rename-job.
I had to move some little things around, that is: >> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into >> slow-path >> - for prefetching, instead of passing around the 'length' node, only to >> determine the number of prefetch lines, I determine this early, and pass >> through the lines arg. >> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out >> or out-args to stitch together into the regions and phis as appropriate. >> I see no easy way around that. >> >> I tested this using hotspot/jtreg:tier1 and also verified that this >> fills Shenandoah's needs and run tier3_gc_shenandoah testsuite. >> >> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/ >> >> Can I please get reviews? >> Thanks, >> Roman >> From erik.osterlund at oracle.com Wed Sep 19 12:19:23 2018 From: erik.osterlund at oracle.com (Erik Österlund) Date: Wed, 19 Sep 2018 14:19:23 +0200 Subject: RFR(XS): 8210284: "assert((av & 0x00000001) == 0) failed: unsupported V8" on Solaris 11.4 In-Reply-To: References: Message-ID: <18901554-24c4-64b0-3f06-9a1d029f8d85@oracle.com> Hi Patric, Looks good. Thanks, /Erik On 2018-09-19 11:25, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8210284 > > Webrev: http://cr.openjdk.java.net/~phedlin/tr8210284 > > > > Testing: Verified that the JVM (in debug build) will not assert on > start-up when running Solaris 11.4, after applying the update.
> > > Best regards, > Patric From vladimir.kozlov at oracle.com Wed Sep 19 16:22:56 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Sep 2018 09:22:56 -0700 Subject: RFR(XS): 8210284: "assert((av & 0x00000001) == 0) failed: unsupported V8" on Solaris 11.4 In-Reply-To: References: Message-ID: <88e85a78-d9a9-9b05-a895-9e1349aaecba@oracle.com> Looks good. Thanks, Vladimir On 9/19/18 2:25 AM, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8210284 > > Webrev: http://cr.openjdk.java.net/~phedlin/tr8210284 > > > > Testing: Verified that the JVM (in debug build) will not assert on > start-up when running Solaris 11.4, after applying the update. > > > Best regards, > Patric From vladimir.kozlov at oracle.com Wed Sep 19 16:53:54 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Sep 2018 09:53:54 -0700 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com> Message-ID: Thank you, Sandhya I submitted new testing.
Vladimir On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Please find below the updated webrev with fixes for the two issues: > > Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/ > > > RFE: https://bugs.openjdk.java.net/browse/JDK-8210764 > > Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS as the temporary register type for intrinsics > instead of legVecD. > > This test was only failing with -XX:MaxVectorSize=4. > > The file modified is x86_64.ad. > > Fix for compiler/vectorization/TestNaNVector.java was to allow all xmm registers (xmm0-xmm31) for C1 and handle floating > point abs and negate appropriately by providing a temp register. > > The C1 files are modified for this fix. > > I reran compiler jtreg tests with setting nativepath appropriately on Haswell, SKX and KNL. > > Best Regards, > > Sandhya > > *From:*Viswanathan, Sandhya > *Sent:* Tuesday, September 18, 2018 1:47 PM > *To:* 'JC Beyler' > *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev > *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > Hi Jc, > > Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java. > > Best Regards, > > Sandhya > > *From:*JC Beyler [mailto:jcbeyler at google.com] > *Sent:* Monday, September 17, 2018 9:29 PM > *To:* Viswanathan, Sandhya > > *Cc:* vladimir.kozlov at oracle.com ; hotspot-compiler-dev > > > *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > Hi Sandhya, > > How are you invoking the test for NativeCallTest? 
> > The way I would do it using jtreg would be something like this: > > $ export BUILD_TYPE=release > > $ export JDK_PATH=wherever you have your JDK > > From the test subfolder: > > $ wherever-your-jtreg-is/bin/jtreg -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java > > Seems to pass for me. > > But much easier is: > > $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java" > > That seems to pass for me as well and is easier to use :) > > For information, the make run-test documentation is here: > > http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html > > Let me know if that helps, > > Jc > > Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file: > "test result: Error. Use -nativepath to specify the location of native code" > Do I need to give any additional info to jtreg to get over this problem? > > Thanks a lot! > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, September 17, 2018 10:14 AM > To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > > I finished testing on avx512 machine. > All passed except known (TestNaNVector.java) failures. > > Thanks, > Vladimir > > On 9/14/18 5:22 PM, Vladimir Kozlov wrote: > > I gave incorrect link to RFE. Here is correct: > > > > https://bugs.openjdk.java.net/browse/JDK-8210764 > > > > Vladimir > > > > On 9/14/18 3:49 PM, Vladimir Kozlov wrote: > >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed.
> >> > >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too. > >> > >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU > >> with AVX1 only > >> > >> # SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884 > >> # Problematic frame: > >> # V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20 > >> > >> Current CompileTask: > >> C2: 154 5 java.lang.String::equals (65 bytes) > >> > >> Stack: [0x00007f3b10044000,0x00007f3b10145000], sp=0x00007f3b1013fe70, free space=1007k > >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > >> V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20 > >> V [libjvm.so+0x882a72] PhaseChaitin::gather_lrg_masks(bool)+0x872 > >> V [libjvm.so+0xd82235] PhaseCFG::global_code_motion()+0xfc5 > >> V [libjvm.so+0xd824b1] PhaseCFG::do_global_code_motion()+0x51 > >> V [libjvm.so+0xa2c26c] Compile::Code_Gen()+0x24c > >> V [libjvm.so+0xa2ff82] Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42 > >> > >> ------------------------------------------------------------------------------------------------ > >> 2. with '-Xcomp' > >> # Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073 > >> # assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register found) > >> > >> Current CompileTask: > >> C1: 854767 13391 3 org.sunflow.math.Matrix4::multiply (692 bytes) > >> > >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000], sp=0x00007f23b9e7f9d0, free space=1014k > >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > >> V [libjvm.so+0x1882202]
VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562 > >> V [libjvm.so+0x1882d2f] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f > >> V [libjvm.so+0xb0bea0] report_vm_error(char const*, int, char const*, char const*, ...)+0x100 > >> V [libjvm.so+0x7e0410] LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 > >> V [libjvm.so+0x7e0a20] LinearScanWalker::activate_current()+0x280 > >> V [libjvm.so+0x7e0c7d] IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d > >> V [libjvm.so+0x7e1078] LinearScan::allocate_registers()+0x338 > >> V [libjvm.so+0x7e2135] LinearScan::do_linear_scan()+0x155 > >> V [libjvm.so+0x70a6bb] Compilation::emit_lir()+0x99b > >> V [libjvm.so+0x70caff] Compilation::compile_java_method()+0x42f > >> V [libjvm.so+0x70d974] Compilation::compile_method()+0x1d4 > >> V [libjvm.so+0x70e547] Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x357 > >> V [libjvm.so+0x71073c] Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c > >> V [libjvm.so+0xa3cf89] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 > >> > >> Vladimir > >> > >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote: > >>> > >>> Thanks Vladimir, the below should fix this issue: > >>> > >>> ------------------------------ > >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700 > >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700 > >>> @@ -233,22 +233,6 @@ > >>> _xmm_regs[13] = xmm13; > >>> _xmm_regs[14] = xmm14; > >>> _xmm_regs[15] = xmm15; > >>> - _xmm_regs[16] = xmm16; > >>> - _xmm_regs[17] = xmm17; > >>> - _xmm_regs[18] = xmm18; > >>> - _xmm_regs[19] = xmm19; > >>> - _xmm_regs[20] = xmm20; > >>> - _xmm_regs[21] = xmm21; > >>> -
_xmm_regs[22] = xmm22; > >>> - _xmm_regs[23] = xmm23; > >>> - _xmm_regs[24] = xmm24; > >>> - _xmm_regs[25] = xmm25; > >>> - _xmm_regs[26] = xmm26; > >>> - _xmm_regs[27] = xmm27; > >>> - _xmm_regs[28] = xmm28; > >>> - _xmm_regs[29] = xmm29; > >>> - _xmm_regs[30] = xmm30; > >>> - _xmm_regs[31] = xmm31; > >>> #endif // _LP64 > >>> > >>> for (int i = 0; i < 8; i++) { > >>> --------------------------------- > >>> > >>> I think the gcc version on my desktop is older so didn't catch this. > >>> > >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to: > >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/ > >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735 > >>> > >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before > >>> changing it back to 3. > >>> > >>> Best Regards, > >>> Sandhya > >>> > >>> > >>> -----Original Message----- > >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > >>> Sent: Friday, September 14, 2018 12:13 PM > >>> To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net > >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > >>> > >>> I got build failure: > >>> > >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array > >>> (which contains 16 elements) [-Werror,-Warray-bounds] > >>> jib > _xmm_regs[16] = xmm16; > >>> > >>> I also noticed that we don't have RFE for this work. I filed: > >>> > >>> https://bugs.openjdk.java.net/browse/JDK-8209735 > >>> > >>> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago).
I added next > >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp: > >>> > >>> - product(intx, UseAVX, 2, \ > >>> + product(intx, UseAVX, 3, \ > >>> > >>> Thanks, > >>> Vladimir > >>> > >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: > >>>> Looks good to me. I will start testing and let you know results. > >>>> > >>>> Thanks, > >>>> Vladimir > >>>> > >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: > >>>>> Hi Vladimir, > >>>>> > >>>>> Please find below the updated webrev with all your comments incorporated: > >>>>> > >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ > > >>>>> > >>>>> I have run the jtreg compiler tests on SKX and KNL which have two > >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. > >>>>> > >>>>> Best Regards, > >>>>> Sandhya > >>>>> > >>>>> > >>>>> -----Original Message----- > >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] > >>>>> Sent: Tuesday, September 11, 2018 8:54 PM > >>>>> To: Viswanathan, Sandhya >; > >>>>> hotspot-compiler-dev at openjdk.java.net > >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction > >>>>> > >>>>> Thank you, Sandhya > >>>>> > >>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. > >>>>> > >>>>> Vladimir > >>>>> > >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: > >>>>>> Hi Vladimir, > >>>>>> > >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback. > >>>>>> Please see my response in your email below marked with (Sandhya > >>>>>>>>> ). Looking forward to your advice. 
> >>>>>> > >>>>>> Best Regards, > >>>>>> Sandhya > >>>>>> > >>>>>> > >>>>>> -----Original Message----- > >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM > >>>>>> To: Viswanathan, Sandhya; > >>>>>> hotspot-compiler-dev at openjdk.java.net > >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 > >>>>>> instruction > >>>>>> > >>>>>> Thank you. > >>>>>> > >>>>>> I want to discuss next issue: > >>>>>> > >>>>>> > You did not added instructions to load these registers from > >>>>>> memory (and stack). What happens in such cases when you need to load or store? > >>>>>> >>>> Let us take an example, e.g. for loading into rregF. First > >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. > >>>>>> > >>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. > >>>>>> Also we don't check that register could be the same as result you may get unneeded moves. > >>>>>> > >>>>>> I would advice add memory moves at least. > >>>>>> > >>>>>> Sandhya >>> I had added those rules initially and removed them in > >>>>>> the final patch. I noticed that the register allocator uses the > >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask > >>>>>> (matcher.cpp). I would like the register allocator to get all the > >>>>>> possible register on an architecture for idealreg2reg mask. I > >>>>>> wondered that multiple instruct rules in .ad file for LoadF from > >>>>>> memory might cause problems. I would have to have higher cost for > >>>>>> loading into restricted register set like vlReg. Then I decided that > >>>>>> the register allocator can handle this in much better way than me > >>>>>> adding rules to load from memory. This is with the background that the regF is always all the available registers > >>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS.
Let me know your thoughts on this > and if > >>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp > that I > >>>>>> am referring to is: > >>>>>> MachNode *spillCP = match_tree(new > >>>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); > >>>>>> #endif > >>>>>> MachNode *spillI = match_tree(new > >>>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); > >>>>>> MachNode *spillL = match_tree(new > >>>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, > >>>>>> LoadNode::DependsOnlyOnTest, false)); > >>>>>> MachNode *spillF = match_tree(new > >>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); > >>>>>> MachNode *spillD = match_tree(new > >>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); > >>>>>> MachNode *spillP = match_tree(new > >>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); > >>>>>> .... > >>>>>> idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); > >>>>>> > >>>>>> Another question. You use movflt() and movdbl() which use either > >>>>>> movap[s|d] or movs[s|d] > >>>>>> instructions: > >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164 > >>>>>> Do these instructions work when > >>>>>> avx512vl is not available? I see for vectors you use > >>>>>> vpxor+vinserti* combination. > >>>>>> > >>>>>> Sandhya >>> Yes the scalar floating point instructions are available > >>>>>> with AVX512 encoding when avx512vl is not available. That is why you > >>>>>> would see not just movflt, movdbl but all the other scalar > >>>>>> operations like addss, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F > instructions. > >>>>>> > >>>>>> Last question.
I notice next UseAVX check in vectors spills code in x86.ad : > >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) > >>>>>> > >>>>>> Should it be (UseAVX < 3)? > >>>>>> > >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. > >>>>>> > >>>>>> Thanks, > >>>>>> Vladimir > >>>>>> > >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: > >>>>>>> Hi Vladimir, > >>>>>>> > >>>>>>> Thanks a lot for your review and feedback. Please see my response > >>>>>>> in your email below. I will send an updated webrev incorporating your feedback. > >>>>>>> > >>>>>>> Best Regards, > >>>>>>> Sandhya > >>>>>>> > >>>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] > >>>>>>> Sent: Monday, September 10, 2018 6:09 PM > >>>>>>> To: Viswanathan, Sandhya >; > >>>>>>> hotspot-compiler-dev at openjdk.java.net > >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 > >>>>>>> instruction > >>>>>>> > >>>>>>> Very nice. Thank you, Sandhya. > >>>>>>> > >>>>>>> I would like to see more meaningful naming in .ad files - instead > >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*. > >>>>>>> > >>>>>>>>>> Yes, accepted. > >>>>>>> > >>>>>>> New load_from_* and load_to_* instructions in .ad files should be > >>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions: > >>>>>>> > >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, > >>>>>>> vlRegF src) > >>>>>>>>>> Yes, accepted. > >>>>>>> > >>>>>>> You did not added instructions to load these registers from memory > >>>>>>> (and stack). What happens in such cases when you need to load or store? > >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it > >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. 
> >>>>>>> > >>>>>>> Also please explain why these registers are used when UseAVX == 0?: > >>>>>>> > >>>>>>> +instruct absD_reg(rregD dst) %{ > >>>>>>> predicate((UseSSE>=2) && (UseAVX == 0)); > >>>>>>> > >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too: > >>>>>>> if (UseAVX < 3) { > >>>>>>> _features &= ~CPU_AVX512F; > >>>>>>> > >>>>>>>>>> Yes, accepted. It could be regD here. > >>>>>>> > >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): > >>>>>>> > >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, > >>>>>>> +vectors_reg_legacy, %{ > >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && > >>>>>>> VM_Version::supports_avx512dq() && > >>>>>>> VM_Version::supports_avx512vl() %} ); > >>>>>>> > >>>>>>>>>> Yes, accepted. > >>>>>>> > >>>>>>> I would suggest to test these changes on different machines > >>>>>>> (non-avx512 and avx512) and with different UseAVX values. > >>>>>>> > >>>>>>>>>> Will do. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Vladimir > >>>>>>> > >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: > >>>>>>>> Recently there have been a couple of high priority issues with > >>>>>>>> regard to high bank of XMM register > >>>>>>>> (XMM16-XMM31) usage by C2: > >>>>>>>> > >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 > >>>>>>>> > >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 > >>>>>>>> > >>>>>>>> Please find below a patch which attempts to clean up the XMM > >>>>>>>> register handling by using register groups. > >>>>>>>> > >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ > > >>>>>>>> > >>>>>>>> > >>>>>>>> The patch provides a restricted set of registers to the match > >>>>>>>> rules in the ad file based on the underlying architecture. > >>>>>>>> > >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
> >>>>>>>> > >>>>>>>> By removing the special handling, the patch reduces the overall > >>>>>>>> code size by about 1800 lines of code. > >>>>>>>> > >>>>>>>> Your review and feedback is very welcome. > >>>>>>>> > >>>>>>>> Best Regards, > >>>>>>>> > >>>>>>>> Sandhya > >>>>>>>> > > > -- > > Thanks, > > Jc > From rkennke at redhat.com Wed Sep 19 17:08:11 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 19 Sep 2018 19:08:11 +0200 Subject: RFR: JDK-8210829: Modularize allocations in C2 In-Reply-To: <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com> References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com> Message-ID: Alright, submit repo came back with UNSTABLE. Can somebody here check it and get back to me? Build Details: 2018-09-19-1536076.roman.source 0 Failed Tests Mach5 Tasks Results Summary KILLED: 0 PASSED: 70 UNABLE_TO_RUN: 3 NA: 0 FAILED: 0 EXECUTED_WITH_FAILURE: 2 Test 2 Not run tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-57 Dependency task failed: mach5...-1909-solaris-sparcv9-solaris-sparcv9-build-8 tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-debug-58 Dependency task failed: mach5...solaris-sparcv9-debug-solaris-sparcv9-build-9 > Hi Roman, > > This looks good. I looked through changes and it generates the same > ideal graph as before. > It seems you unintentionally changed indent of the comment in > barrierSetC2.hpp > > Thanks, > Vladimir > > On 9/18/18 12:58 AM, Roman Kennke wrote: >> Similar to what we've done before to runtime, interpreter and C1, >> allocations should be owned and implemented by GC, and possible to >> override by specific collectors. For example, Shenandoah lays out >> objects differently in heap, and needs one extra store to initialize >> objects. >> >> This proposed change factors out the interesting part of object >> allocation (i.e. the actual allocation) into BarrierSetC2. 
It's mostly a >> move-and-rename-job. I had to move some little things around, that is: >> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into >> slow-path >> - for prefetching, instead of passing around the 'length' node, only to >> determine the number of prefetch lines, I determine this early, and pass >> through the lines arg. >> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out >> or out-args to stitch together into the regions and phis as appropriate. >> I see no easy way around that. >> >> I tested this using hotspot/jtreg:tier1 and also verified that this >> fills Shenandoah's needs and run tier3_gc_shenandoah testsuite. >> >> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/ >> >> Can I please get reviews? >> Thanks, >> Roman >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From vladimir.kozlov at oracle.com Wed Sep 19 17:58:53 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Sep 2018 10:58:53 -0700 Subject: RFR: JDK-8210829: Modularize allocations in C2 In-Reply-To: References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com> Message-ID: <81b25b8b-c18f-28b5-1e25-3ecab35d6dc6@oracle.com> Crypto library build failed - 8210912. Mikael just pushed the fix - update your copy: http://hg.openjdk.java.net/jdk/jdk/rev/15094d12a632 Vladimir On 9/19/18 10:08 AM, Roman Kennke wrote: > Alright, submit repo came back with UNSTABLE. Can somebody here check it > and get back to me? 
> > Build Details: 2018-09-19-1536076.roman.source > 0 Failed Tests > Mach5 Tasks Results Summary > > KILLED: 0 > PASSED: 70 > UNABLE_TO_RUN: 3 > NA: 0 > FAILED: 0 > EXECUTED_WITH_FAILURE: 2 > Test > > 2 Not run > > tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-57 > Dependency task failed: > mach5...-1909-solaris-sparcv9-solaris-sparcv9-build-8 > > tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-debug-58 > Dependency task failed: > mach5...solaris-sparcv9-debug-solaris-sparcv9-build-9 > > >> Hi Roman, >> >> This looks good. I looked through changes and it generates the same >> ideal graph as before. >> It seems you unintentionally changed indent of the comment in >> barrierSetC2.hpp >> >> Thanks, >> Vladimir >> >> On 9/18/18 12:58 AM, Roman Kennke wrote: >>> Similar to what we've done before to runtime, interpreter and C1, >>> allocations should be owned and implemented by GC, and possible to >>> override by specific collectors. For example, Shenandoah lays out >>> objects differently in heap, and needs one extra store to initialize >>> objects. >>> >>> This proposed change factors out the interesting part of object >>> allocation (i.e. the actual allocation) into BarrierSetC2. It's mostly a >>> move-and-rename-job. I had to move some little things around, that is: >>> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into >>> slow-path >>> - for prefetching, instead of passing around the 'length' node, only to >>> determine the number of prefetch lines, I determine this early, and pass >>> through the lines arg. >>> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out >>> or out-args to stitch together into the regions and phis as appropriate. >>> I see no easy way around that. >>> >>> I tested this using hotspot/jtreg:tier1 and also verified that this >>> fills Shenandoah's needs and run tier3_gc_shenandoah testsuite. 
>>> >>> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/ >>> >>> Can I please get reviews? >>> Thanks, >>> Roman >>> > > From rkennke at redhat.com Wed Sep 19 20:05:23 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 19 Sep 2018 22:05:23 +0200 Subject: RFR: JDK-8210829: Modularize allocations in C2 In-Reply-To: <81b25b8b-c18f-28b5-1e25-3ecab35d6dc6@oracle.com> References: <8fe187e2-b3fb-5473-257f-e4f6b6a94fcb@redhat.com> <09c53a34-eb68-a614-ee03-468673004b3b@oracle.com> <81b25b8b-c18f-28b5-1e25-3ecab35d6dc6@oracle.com> Message-ID: Thanks, Vladimir. That fixed it, build came out clean and I pushed the change. Thanks, Roman > Crypto library build failed - 8210912. > Mikael just pushed the fix - update your copy: > > http://hg.openjdk.java.net/jdk/jdk/rev/15094d12a632 > > Vladimir > > On 9/19/18 10:08 AM, Roman Kennke wrote: >> Alright, submit repo came back with UNSTABLE. Can somebody here check it >> and get back to me? >> >> Build Details: 2018-09-19-1536076.roman.source >> 0 Failed Tests >> Mach5 Tasks Results Summary >> >> KILLED: 0 >> PASSED: 70 >> UNABLE_TO_RUN: 3 >> NA: 0 >> FAILED: 0 >> EXECUTED_WITH_FAILURE: 2 >> Test >> >> 2 Not run >> >> tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-57 >> >> Dependency task failed: >> mach5...-1909-solaris-sparcv9-solaris-sparcv9-build-8 >> >> tier1-solaris-sparc-jdk_open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-debug-58 >> >> Dependency task failed: >> mach5...solaris-sparcv9-debug-solaris-sparcv9-build-9 >> >> >>> Hi Roman, >>> >>> This looks good. I looked through changes and it generates the same >>> ideal graph as before.
>>> It seems you unintentionally changed indent of the comment in >>> barrierSetC2.hpp >>> >>> Thanks, >>> Vladimir >>> >>> On 9/18/18 12:58 AM, Roman Kennke wrote: >>>> Similar to what we've done before to runtime, interpreter and C1, >>>> allocations should be owned and implemented by GC, and possible to >>>> override by specific collectors. For example, Shenandoah lays out >>>> objects differently in heap, and needs one extra store to initialize >>>> objects. >>>> >>>> This proposed change factors out the interesting part of object >>>> allocation (i.e. the actual allocation) into BarrierSetC2. It's >>>> mostly a >>>> move-and-rename-job. I had to move some little things around, that is: >>>> - for the need-gc-check, I'm passing back the needgc_ctrl to plug into >>>> slow-path >>>> - for prefetching, instead of passing around the 'length' node, only to >>>> determine the number of prefetch lines, I determine this early, and >>>> pass >>>> through the lines arg. >>>> - i_o, needgc_ctrl, fast_oop_ctrl, fast_oop_rawmem are passed as in/out >>>> or out-args to stitch together into the regions and phis as >>>> appropriate. >>>> I see no easy way around that. >>>> >>>> I tested this using hotspot/jtreg:tier1 and also verified that this >>>> fills Shenandoah's needs and run tier3_gc_shenandoah testsuite. >>>> >>>> http://cr.openjdk.java.net/~rkennke/JDK-8210829/webrev.00/ >>>> >>>> Can I please get reviews? >>>> Thanks, >>>> Roman >>>> >> >> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From Pengfei.Li at arm.com Thu Sep 20 04:15:28 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Thu, 20 Sep 2018 04:15:28 +0000 Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by constant in C1 In-Reply-To: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com> References: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com> Message-ID: Hi Andrew, Please find below a new patch that adds the same optimization for longs as well as ints and also fixes an issue. http://cr.openjdk.java.net/~yzhang/8210413/webrev.01/ Could you help look at it again? -- Thanks, Pengfei > -----Original Message----- > > Hi, > > On 09/13/2018 10:04 AM, Pengfei Li (Arm Technology China) wrote: > > > Could you please help review this optimization in C1 AArch64? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8210413 > > webrev: http://cr.openjdk.java.net/~njian/8210413/webrev.00/ > > It looks fine, but it's really odd that this is only implemented for ints and not > longs. Can you do longs too? > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From patric.hedlin at oracle.com Thu Sep 20 09:53:12 2018 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Thu, 20 Sep 2018 11:53:12 +0200 Subject: RFR(S): JDK-8191339: [JVMCI] BigInteger compiler intrinsics on Graal. In-Reply-To: <02f34a26-2a97-6a30-384f-115327781aac@oracle.com> References: <28011331-bd43-2c32-dba4-e41879ffe28a@oracle.com> <02f34a26-2a97-6a30-384f-115327781aac@oracle.com> Message-ID: <661f70d5-7a09-d181-5669-9841b590c7a3@oracle.com> Hi Vladimir, Andrew, Sorry for dropping this after vacation. The testing is a simplistic benchmark (soon to be... I hope) added to Graal (and some directed, a bit too ad hoc, testing not meant for up-streaming to Graal).
I also used a simplified version of a more general JVMCI/VM test case for these options only, but it really does only exercise the JVMCI (not the option propagation in Graal or some other JVMCI "client"), making it less useful. But in essence, Graal is the test-case. On 2018-06-22 18:04, Vladimir Kozlov wrote: > Hi Patric, > > Do you need Graal changes for this? Or it already has these intrinsics > and the only problem is these flags were not set in vm_version_x86.cpp? No further changes have been made to Graal. > > Small note. In vm_version_x86.cpp previous code has already > COMPILER2_OR_JVMCI check. You can remove previous #endif and new > #ifdef. Also change comment for closing #endif at line 1080 to // > COMPILER2_OR_JVMCI > > 1080 #endif // COMPILER2 You are right (actually the intended webrev) and it should look correct now (just a tad old). Best regards, Patric > > What testing you did? > > Thanks, > Vladimir > > On 6/21/18 8:26 AM, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8191339 >> >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8191339/ >> >> >> 8191339: [JVMCI] BigInteger compiler intrinsics on Graal. >> >> Enabling BigInteger intrinsics via JVMCI. >> >> >> >> Best regards, >> Patric From aph at redhat.com Thu Sep 20 10:53:23 2018 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Sep 2018 11:53:23 +0100 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com> References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com> <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com> <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com> Message-ID: On 09/06/2018 02:20 PM, Dmitry Chuyko wrote: > On 09/06/2018 04:10 PM, Roland Westrelin wrote: >>> Yes.
Here is how it looks like: >>> ................................... >> That does seem like a pretty minimal difference and not a reason not to >> push that change. What do you think? > I agree, it looks like something we should investigate in aarch64 port. mkay, but how, exactly? Is it simply the case that Intel is improved so the patch is good, even if AArch64 regresses? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Thu Sep 20 14:18:11 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 20 Sep 2018 16:18:11 +0200 Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize allocations in C2" Message-ID: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8210963 Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0" comparisons in them. 
Fix:

diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
--- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 08:11:21 2018 -0400
+++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 15:49:02 2018 +0200
@@ -27,2 +27,3 @@
 #include "opto/arraycopynode.hpp"
+#include "opto/convertnode.hpp"
 #include "opto/graphKit.hpp"
diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
--- a/src/hotspot/share/opto/macro.cpp Thu Sep 20 08:11:21 2018 -0400
+++ b/src/hotspot/share/opto/macro.cpp Thu Sep 20 15:49:02 2018 +0200
@@ -1729,3 +1729,3 @@

- for ( uint i = 0; i < lines; i++ ) {
+ for ( intx i = 0; i < lines; i++ ) {
 prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
@@ -1782,3 +1782,3 @@
 distance = step_size;
- for ( uint i = 1; i < lines; i++ ) {
+ for ( intx i = 1; i < lines; i++ ) {
 prefetch_adr = new AddPNode( cache_adr, cache_adr,
@@ -1798,3 +1798,3 @@
 uint distance = AllocatePrefetchDistance;
- for ( uint i = 0; i < lines; i++ ) {
+ for ( intx i = 0; i < lines; i++ ) {
 prefetch_adr = new AddPNode( old_eden_top, new_eden_top,

Testing: x86_64, x86_32, armhf builds

Thanks,
-Aleksey
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL:
From adinn at redhat.com Thu Sep 20 14:20:02 2018 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Sep 2018 15:20:02 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> Message-ID: <84fa6c2e-6a6e-a59a-8dff-175f7e50240f@redhat.com> Ping! Could I please get a review of this latest version of the JEP?
This includes responses to all previous comments with changes made both to the JEP and the draft implementation. I would like to get this into JDK12 if at all possible. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander On 10/09/18 19:05, Alan Bateman wrote: > On 20/08/2018 16:18, Andrew Dinn wrote: >> Hi Alan, >> >> Round 4: >> >> I have redrafted the JEP and updated the implementation in the light of >> your last feedback: >> >> JEP JIRA: https://bugs.openjdk.java.net/browse/JDK-8207851 >> >> Formatted JEP: http://openjdk.java.net/jeps/8207851 >> >> New webrev: http://cr.openjdk.java.net/~adinn/pmem/webrev.04/ >> >> > The updated JEP looks much better. > > I realize we've been through several iterations on this but I'm now > wondering if the MappedByteBuffer is the right API. As you've shown, > it's straightforward to map a region of NVM and use the existing API, > I'm just not sure if it's the right API. I think I'd like to see a few > examples of how the API might be used. ByteBuffers aren't intended for > use by concurrent threads and I just wonder if the examples might need > that. I also wonder if there is a possible connection with work in > Project Panama and whether it's worth exploring if its scopes and > pointers could be backed by NVM. The Risks and Assumptions > section mentions the 2GB limit which is another reminder that the MBB > API may not be the right API. > > The 2-arg force method to msync a region makes sense, although it might > be more consistent for the second parameter to be the length than the > end offset. > > A detail for later is whether UOE might be more appropriate for > implementations that do not support the XXX_PERSISTENT modes. > > -Alan.
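
For readers following the API question raised above (a region flush whose second parameter is a length rather than an end offset), the usage pattern can be sketched roughly as below. This is only an illustration under stated assumptions, not the JEP's implementation: it maps an ordinary temporary file with the existing READ_WRITE mode rather than any proposed persistent mode, it assumes a JDK whose MappedByteBuffer offers the two-argument force(index, length), and the class and method names are made up for the example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; the name is illustrative only.
final class MappedForceSketch {

    // Maps a temporary file, writes one int at the given offset, flushes only
    // the bytes that were written, then reads the value back through a second
    // mapping of the same file.
    static int writeAndReadBack(int offset, int value) throws IOException {
        Path file = Files.createTempFile("mbb-sketch", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Assumption: a plain file mapping stands in for NVM here.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putInt(offset, value);
            // Region flush expressed as (index, length), not (from, to).
            buf.force(offset, Integer.BYTES);
            MappedByteBuffer again = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
            return again.getInt(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndReadBack(128, 0xCAFE));
    }
}
```

The sketch is single-threaded on purpose; the concern quoted above about ByteBuffers and concurrent threads is not addressed by it.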
> From rkennke at redhat.com Thu Sep 20 14:20:49 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 20 Sep 2018 16:20:49 +0200 Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize allocations in C2" In-Reply-To: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com> References: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com> Message-ID: <1a282baa-680d-d3b5-7701-d20571b9da77@redhat.com> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8210963 > > Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx > inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems > easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0" > comparisons in them. > > Fix: > > diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp > --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 08:11:21 2018 -0400 > +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 15:49:02 2018 +0200 > @@ -27,2 +27,3 @@ > #include "opto/arraycopynode.hpp" > +#include "opto/convertnode.hpp" > #include "opto/graphKit.hpp" > diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp > --- a/src/hotspot/share/opto/macro.cpp Thu Sep 20 08:11:21 2018 -0400 > +++ b/src/hotspot/share/opto/macro.cpp Thu Sep 20 15:49:02 2018 +0200 > @@ -1729,3 +1729,3 @@ > > - for ( uint i = 0; i < lines; i++ ) { > + for ( intx i = 0; i < lines; i++ ) { > prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt, > @@ -1782,3 +1782,3 @@ > distance = step_size; > - for ( uint i = 1; i < lines; i++ ) { > + for ( intx i = 1; i < lines; i++ ) { > prefetch_adr = new AddPNode( cache_adr, cache_adr, > @@ -1798,3 +1798,3 @@ > uint distance = AllocatePrefetchDistance; > - for ( uint i = 0; i < lines; i++ ) { > + for ( intx i = 0; i < lines; i++ ) { > prefetch_adr = new AddPNode( old_eden_top, new_eden_top, > > > Testing: x86_64, x86_32, armhf builds > Looks good to me. Thanks for fixing this. 
Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From rwestrel at redhat.com Thu Sep 20 14:54:13 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Sep 2018 16:54:13 +0200 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com> <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com> <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com> Message-ID: > mkay, but how, exactly? Is it simply the case that Intel is improved > so the patch is good, even if AArch64 regresses? Well, no, I don't think that's an accurate description of what this is. Dmitry reported a performance regression but the generated code is almost identical with or without the patch (the only difference being that in one case the generated code uses b.cc and in the other b.eq). Dmitry also hypothesized that branch prediction may not perform as well with the patch. That doesn't seem directly related to the patch but more of an unfortunate side effect. So the patch simplifies the IR so less instructions may need to be emitted. That's not x86 specific. It just happens that aarch64 don't seem to be able to take advantage of it but it doesn't increase the number of instructions that aarch64 needs either or forces aarch64 to use less efficient instructions. So overall, it seemed to me there was no reasonable reason to not push this patch. Roland. 
From tobias.hartmann at oracle.com Thu Sep 20 15:22:07 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 20 Sep 2018 11:22:07 -0400 Subject: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects In-Reply-To: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com> References: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com> Message-ID: Hi, isn't this code executed during parsing and therefore it could happen that more inputs are added to the region? For example, by Parse::Block::add_new_path(): http://hg.openjdk.java.net/jdk/jdk/file/75e4ce0fa1ba/src/hotspot/share/opto/parse1.cpp#l1917 Best regards, Tobias On 18.09.2018 09:33, Kuai Wei wrote: > > Hi, > > I made a patch for https://bugs.openjdk.java.net/browse/JDK-8210853 . Could you help review my change? > > Background: > C2 can remove the G1 post barrier for a store to a newly allocated object. But the > just_allocated_object check is defeated by a Region node that is created when the initializer > method of the super class is inlined. The change is to check the pattern and skip the Region node.
>
> src/hotspot/share/opto/graphKit.cpp
>
>  // We use this to determine if an object is so "fresh" that
>  // it does not require card marks.
>  Node* GraphKit::just_allocated_object(Node* current_control) {
> -  if (C->recent_alloc_ctl() == current_control)
> +  Node * ctrl = current_control;
> +  // Object::<init> is invoked after allocation, most of invoke nodes
> +  // will be reduced, but a region node is kept in parse time, we check
> +  // the pattern and skip the region node
> +  if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
> +    ctrl = ctrl->in(1);
> +  }
> +  if (C->recent_alloc_ctl() == ctrl)
>      return C->recent_alloc_obj();
>    return NULL;
>  }
From tobias.hartmann at oracle.com Thu Sep 20 15:29:26 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 20 Sep 2018 11:29:26 -0400 Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize allocations in C2" In-Reply-To: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com> References: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com> Message-ID: <12b325bd-f00e-38ff-b357-f9f4c82d3f69@oracle.com> Hi Aleksey, looks good to me too. Best regards, Tobias On 20.09.2018 10:18, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8210963 > > Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx > inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems > easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0" > comparisons in them.
>
> Fix:
>
> diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 08:11:21 2018 -0400
> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 15:49:02 2018 +0200
> @@ -27,2 +27,3 @@
> #include "opto/arraycopynode.hpp"
> +#include "opto/convertnode.hpp"
> #include "opto/graphKit.hpp"
> diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
> --- a/src/hotspot/share/opto/macro.cpp Thu Sep 20 08:11:21 2018 -0400
> +++ b/src/hotspot/share/opto/macro.cpp Thu Sep 20 15:49:02 2018 +0200
> @@ -1729,3 +1729,3 @@
>
> - for ( uint i = 0; i < lines; i++ ) {
> + for ( intx i = 0; i < lines; i++ ) {
> prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
> @@ -1782,3 +1782,3 @@
> distance = step_size;
> - for ( uint i = 1; i < lines; i++ ) {
> + for ( intx i = 1; i < lines; i++ ) {
> prefetch_adr = new AddPNode( cache_adr, cache_adr,
> @@ -1798,3 +1798,3 @@
> uint distance = AllocatePrefetchDistance;
> - for ( uint i = 0; i < lines; i++ ) {
> + for ( intx i = 0; i < lines; i++ ) {
> prefetch_adr = new AddPNode( old_eden_top, new_eden_top,
>
>
> Testing: x86_64, x86_32, armhf builds
>
> Thanks,
> -Aleksey
>
>
>

From shade at redhat.com Thu Sep 20 15:30:50 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 20 Sep 2018 17:30:50 +0200
Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize allocations in C2"
In-Reply-To: <12b325bd-f00e-38ff-b357-f9f4c82d3f69@oracle.com>
References: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com> <12b325bd-f00e-38ff-b357-f9f4c82d3f69@oracle.com>
Message-ID: <412e47a7-174c-94e0-4dd8-29aedcfd8fbc@redhat.com>

Thanks. Trivial? Can I push without jdk-submit?

-Aleksey

On 09/20/2018 05:29 PM, Tobias Hartmann wrote:
> Hi Aleksey,
>
> looks good to me too.
>
> Best regards,
> Tobias
>
> On 20.09.2018 10:18, Aleksey Shipilev wrote:
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8210963
>>
>> Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx
>> inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems
>> easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0"
>> comparisons in them.
>>
>> Fix:
>>
>> diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
>> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 08:11:21 2018 -0400
>> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 15:49:02 2018 +0200
>> @@ -27,2 +27,3 @@
>> #include "opto/arraycopynode.hpp"
>> +#include "opto/convertnode.hpp"
>> #include "opto/graphKit.hpp"
>> diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
>> --- a/src/hotspot/share/opto/macro.cpp Thu Sep 20 08:11:21 2018 -0400
>> +++ b/src/hotspot/share/opto/macro.cpp Thu Sep 20 15:49:02 2018 +0200
>> @@ -1729,3 +1729,3 @@
>>
>> - for ( uint i = 0; i < lines; i++ ) {
>> + for ( intx i = 0; i < lines; i++ ) {
>> prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
>> @@ -1782,3 +1782,3 @@
>> distance = step_size;
>> - for ( uint i = 1; i < lines; i++ ) {
>> + for ( intx i = 1; i < lines; i++ ) {
>> prefetch_adr = new AddPNode( cache_adr, cache_adr,
>> @@ -1798,3 +1798,3 @@
>> uint distance = AllocatePrefetchDistance;
>> - for ( uint i = 0; i < lines; i++ ) {
>> + for ( intx i = 0; i < lines; i++ ) {
>> prefetch_adr = new AddPNode( old_eden_top, new_eden_top,
>>
>>
>> Testing: x86_64, x86_32, armhf builds
>>
>> Thanks,
>> -Aleksey
>>
>>
>>

From tobias.hartmann at oracle.com Thu Sep 20 16:16:43 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 20 Sep 2018 12:16:43 -0400
Subject: RFR (XS) 8210963: Build failures after "8210829: Modularize allocations in C2"
In-Reply-To: <412e47a7-174c-94e0-4dd8-29aedcfd8fbc@redhat.com>
References: <66f8fba1-f1cf-5291-256e-11fef5184435@redhat.com> <12b325bd-f00e-38ff-b357-f9f4c82d3f69@oracle.com> <412e47a7-174c-94e0-4dd8-29aedcfd8fbc@redhat.com>
Message-ID:

Yes.

Best regards,
Tobias

On 20.09.2018 11:30, Aleksey Shipilev wrote:
> Thanks. Trivial? Can I push without jdk-submit?
>
> -Aleksey
>
> On 09/20/2018 05:29 PM, Tobias Hartmann wrote:
>> Hi Aleksey,
>>
>> looks good to me too.
>>
>> Best regards,
>> Tobias
>>
>> On 20.09.2018 10:18, Aleksey Shipilev wrote:
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8210963
>>>
>>> Missing include for 32-bit platforms makes it fail x86_32 and arm32 builds. Also, uint/intx
>>> inconsistency makes it fail even after includes are proper, because "lines" is now "intx". Seems
>>> easier to fix uint->intx right at uses in for-loops, because "i" is only used for "i == 0"
>>> comparisons in them.
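
A minimal sketch of the uint/intx mismatch described in the report above, under stated assumptions: `intx` is modelled with `intptr_t` (a pointer-width signed integer), and the 32-bit build break is assumed to come from comparing an unsigned loop index against a signed bound. The thread does not quote the actual compiler diagnostic, so that reading is a guess. The function names are hypothetical and are not HotSpot code.

```cpp
#include <cstdint>

// Assumption: stand-in for HotSpot's intx, a pointer-width signed integer.
typedef intptr_t intx;

// The shape the patch moves the loops to: index and bound share one signed
// type, so "i < lines" compares like with like on 32-bit and 64-bit targets.
int count_lines_signed(intx lines) {
  int emitted = 0;
  for (intx i = 0; i < lines; i++) {
    emitted++;  // the real loops build one prefetch node per iteration
  }
  return emitted;
}

// What a mixed comparison amounts to when both operands are 32 bits wide:
// the signed bound is converted to unsigned before the comparison. The cast
// here spells out the conversion the language would otherwise apply silently.
bool mixed_less_32(uint32_t i, int32_t lines) {
  return i < static_cast<uint32_t>(lines);
}

// The same comparison with both operands signed.
bool signed_less_32(int32_t i, int32_t lines) {
  return i < lines;
}
```

With a negative bound the two comparisons disagree, which is the kind of surprise a signed/unsigned warning exists to catch. On a 64-bit target an `unsigned int` index is widened to the 64-bit signed type without changing value, which may be why only the 32-bit builds were reported as failing.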
>>>
>>> Fix:
>>>
>>> diff -r 1fd0f300d4b7 src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
>>> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 08:11:21 2018 -0400
>>> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp Thu Sep 20 15:49:02 2018 +0200
>>> @@ -27,2 +27,3 @@
>>> #include "opto/arraycopynode.hpp"
>>> +#include "opto/convertnode.hpp"
>>> #include "opto/graphKit.hpp"
>>> diff -r 1fd0f300d4b7 src/hotspot/share/opto/macro.cpp
>>> --- a/src/hotspot/share/opto/macro.cpp Thu Sep 20 08:11:21 2018 -0400
>>> +++ b/src/hotspot/share/opto/macro.cpp Thu Sep 20 15:49:02 2018 +0200
>>> @@ -1729,3 +1729,3 @@
>>>
>>> - for ( uint i = 0; i < lines; i++ ) {
>>> + for ( intx i = 0; i < lines; i++ ) {
>>> prefetch_adr = new AddPNode( old_pf_wm, new_pf_wmt,
>>> @@ -1782,3 +1782,3 @@
>>> distance = step_size;
>>> - for ( uint i = 1; i < lines; i++ ) {
>>> + for ( intx i = 1; i < lines; i++ ) {
>>> prefetch_adr = new AddPNode( cache_adr, cache_adr,
>>> @@ -1798,3 +1798,3 @@
>>> uint distance = AllocatePrefetchDistance;
>>> - for ( uint i = 0; i < lines; i++ ) {
>>> + for ( intx i = 0; i < lines; i++ ) {
>>> prefetch_adr = new AddPNode( old_eden_top, new_eden_top,
>>>
>>>
>>> Testing: x86_64, x86_32, armhf builds
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>
>>>
>
>

From jonathan.halliday at redhat.com Thu Sep 20 16:17:33 2018
From: jonathan.halliday at redhat.com (Jonathan Halliday)
Date: Thu, 20 Sep 2018 17:17:33 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To: <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
Message-ID:

Hi Alan

I'm a middleware engineer (transaction engine, message queues, etc) and I evolved the current API design whilst making some of Red Hat's Jakarta EE stack work with persistent memory. It's a good fit for our needs because it pretty much matches the way we currently do off-heap and persistent storage, so porting existing code is a breeze.

For anything that is a 'make this bunch of bytes persistent' use case there isn't really a complex API. We're not trying to pass data structures to and fro as we would when calling a richer C library. The serialization layer takes care of flattening all the structures to an opaque byte[] or ByteBuffer already. We just need to be able to reason about the persistence guarantees the same way we can with the existing sync() call. We already take care of the threading, since existing storage solutions wouldn't work without those safeguards anyhow.

So, there are certainly some use cases for which the current API is a good fit, because those are the ones I designed it for, based on code that already uses and copes with the limitations of MappedByteBuffer.

However... There are cases where we may want to get further optimizations by eliding the serialization to byte[]/ByteBuffer and instead be able to access persistent memory *as objects*. That's a harder problem and may involve language integration rather than just API changes, for example being able to allocate an object whose state (primitive fields, perhaps also object pointers) is backed by an (optionally explicitly specified) area of pmem. It's definitely a more powerful model, but also a much bigger problem to chew on.

Some halfway solution in which we can use Java objects to point into specific areas of memory in a typesafe way (e.g. 'that pmem address should be considered an int') would seem to be something that Panama could overlap with, but it's a convenience layer that could also be modelled by putting higher level abstractions over the proposed low level API. Over time we may have e.g. PersistentLong in the same way that today we have AtomicLong, but it's something that could be tested out in a 3rd party library initially and then migrated into the standard library if it's shown to be useful.

Is the proposed API sufficient for all use cases? Probably not. But it's useful for some and, so far as I can tell, non-harmful to others. Under the new release model what we have now is useful in its own right and should ship sooner rather than later, with additional functionality following later in a modular, agile fashion? I don't really see sufficient advantage in holding this pending e.g. investigation of integration with Panama, though that's definitely an interesting avenue for future work.

Regards

Jonathan

On 10/09/2018 19:05, Alan Bateman wrote:
...
> I realize we've been through several iterations on this but I'm now
> wondering if the MappedByteBuffer is the right API. As you've shown,
> it's straight forward to map a region of NVM and use the existing API,
> I'm just not sure if it's the right API. I think I'd like to see a few
> examples of how the API might be used. ByteBuffers aren't intended for
> use by concurrent threads and I just wonder if the examples might need
> that. I also wonder if there is a possible connection with work in
> Project Panama and whether it's worth exploring if its scopes and
> pointers could be used to backed by NVM. The Risks and Assumption
> section mentions the 2GB limit which is another reminder that the MBB
> API may not be the right API.

--
Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From vladimir.kozlov at oracle.com Thu Sep 20 17:09:02 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 20 Sep 2018 10:09:02 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To:
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
Message-ID: <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>

I hit build failure on SPARC due to shared changes in C1:

workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)".
jib > 1 Error(s) detected.

I assume other platforms are also affected.

Vladimir

On 9/19/18 9:53 AM, Vladimir Kozlov wrote:
> Thank you, Sandhya
>
> I submitted new testing.
>
> Vladimir
>
> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find below the updated webrev with fixes for the two issues:
>>
>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS as the temporary register type for intrinsics
>> instead of legVecD.
>>
>> This test was only failing with -XX:MaxVectorSize=4.
>>
>> The file modified is x86_64.ad.
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to allow all xmm registers (xmm0-xmm31) for C1 and handle
>> floating point abs and negate appropriately by providing a temp register.
>>
>> The C1 files are modified for this fix.
>>
>> I reran compiler jtreg tests with setting nativepath appropriately on Haswell, SKX and KNL.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:* Viswanathan, Sandhya
>> *Sent:* Tuesday, September 18, 2018 1:47 PM
>> *To:* 'JC Beyler'
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev
>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Jc,
>>
>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:* JC Beyler [mailto:jcbeyler at google.com]
>> *Sent:* Monday, September 17, 2018 9:29 PM
>> *To:* Viswanathan, Sandhya
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev
>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Sandhya,
>>
>> How are you invoking the test for NativeCallTest?
>> >> The way I would do it using jtreg would be something like this: >> >> $ export BUILD_TYPE=release >> >> $ export JDK_PATH=wherever you have your JDK >> >> ?From the test subfolder: >> >> $ wherever-your-jtreg-is/bin/jtreg >> -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk >> $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk >> hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java >> >> Seems to pass for me. >> >> But much easier is: >> >> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java" >> >> That seems to pass for me as well and is easier to use :) >> >> For information, the make run-test documentation is here: >> >> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html >> >> Let me know if that helps, >> >> Jc >> >> ??? Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file: >> ???? ? ? ? ? ? ? ? ? ? ? "test result: Error. Use -nativepath to specify the location of native code" >> ???? ? ? ? ? ? ? Do I need to give any additional info to jtreg to get over this problem? >> >> ??? Thanks a lot! >> ??? Best Regards, >> ??? Sandhya >> >> ??? -----Original Message----- >> ??? From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >> ??? Sent: Monday, September 17, 2018 10:14 AM >> ??? To: Viswanathan, Sandhya >; >> ??? hotspot-compiler-dev at openjdk.java.net >> ??? Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> >> ??? I finished testing on avx512 machine. >> ??? All passed except known (TestNaNVector.java) failures. >> >> ??? Thanks, >> ??? Vladimir >> >> ??? On 9/14/18 5:22 PM, Vladimir Kozlov wrote: >> ???? > I gave incorrect link to RFE. Here is correct: >> ???? > >> ???? > https://bugs.openjdk.java.net/browse/JDK-8210764 >> ???? > >> ???? > Vladimir >> ???? > >> ???? 
> On 9/14/18 3:49 PM, Vladimir Kozlov wrote: >> ???? >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed. >> ???? >> >> ???? >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too. >> ???? >> >> ???? >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' >> ??? on CPU >> ???? >> with AVX1 only >> ???? >> >> ???? >> #? SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884 >> ???? >> # Problematic frame: >> ???? >> # V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >> ???? >> >> ???? >> Current CompileTask: >> ???? >> C2:??? 154??? 5???????????? java.lang.String::equals (65 bytes) >> ???? >> >> ???? >> Stack: [0x00007f3b10044000,0x00007f3b10145000],? sp=0x00007f3b1013fe70,? free space=1007k >> ???? >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> ???? >> V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >> ???? >> V? [libjvm.so+0x882a72]? PhaseChaitin::gather_lrg_masks(bool)+0x872 >> ???? >> V? [libjvm.so+0xd82235]? PhaseCFG::global_code_motion()+0xfc5 >> ???? >> V? [libjvm.so+0xd824b1]? PhaseCFG::do_global_code_motion()+0x51 >> ???? >> V? [libjvm.so+0xa2c26c]? Compile::Code_Gen()+0x24c >> ???? >> V? [libjvm.so+0xa2ff82]? Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, >> ??? DirectiveSet*)+0xe42 >> ???? >> >> ???? >> ------------------------------------------------------------------------------------------------ >> ???? >> 2. >> >> ??? with '-Xcomp' >> ???? >> #? Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073 >> ???? >> #? assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register >> ??? found) >> ???? >> >> ???? >> Current CompileTask: >> ???? >> C1: 854767 13391?????? 3?????? 
org.sunflow.math.Matrix4::multiply (692 bytes) >> ???? >> >> ???? >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],? sp=0x00007f23b9e7f9d0,? free space=1014k >> ???? >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> ???? >> V? [libjvm.so+0x1882202]? VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, >> unsigned >> ???? >> char*, void*, void*, char const*, int, unsigned long)+0x562 >> ???? >> V? [libjvm.so+0x1882d2f]? VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, >> ???? >> __va_list_tag*)+0x2f >> ???? >> V? [libjvm.so+0xb0bea0]? report_vm_error(char const*, int, char const*, char const*, ...)+0x100 >> ???? >> V? [libjvm.so+0x7e0410]? LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 >> ???? >> V? [libjvm.so+0x7e0a20]? LinearScanWalker::activate_current()+0x280 >> ???? >> V? [libjvm.so+0x7e0c7d]? IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d >> ???? >> V? [libjvm.so+0x7e1078]? LinearScan::allocate_registers()+0x338 >> ???? >> V? [libjvm.so+0x7e2135]? LinearScan::do_linear_scan()+0x155 >> ???? >> V? [libjvm.so+0x70a6bb]? Compilation::emit_lir()+0x99b >> ???? >> V? [libjvm.so+0x70caff]? Compilation::compile_java_method()+0x42f >> ???? >> V? [libjvm.so+0x70d974]? Compilation::compile_method()+0x1d4 >> ???? >> V? [libjvm.so+0x70e547]? Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, >> ???? >> DirectiveSet*)+0x357 >> ???? >> V? [libjvm.so+0x71073c]? Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c >> ???? >> V? [libjvm.so+0xa3cf89]? CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 >> ???? >> >> ???? >> Vladimir >> ???? >> >> ???? >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote: >> ???? >>> >> ???? >>> Thanks Vladimir, the below should fix this issue: >> ???? >>> >> ???? >>> ------------------------------ >> ???? 
>>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700 >> ???? >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700 >> ???? >>> @@ -233,22 +233,6 @@ >> ???? >>> ??? _xmm_regs[13]? = xmm13; >> ???? >>> ??? _xmm_regs[14]? = xmm14; >> ???? >>> ??? _xmm_regs[15]? = xmm15; >> ???? >>> -? _xmm_regs[16]? = xmm16; >> ???? >>> -? _xmm_regs[17]? = xmm17; >> ???? >>> -? _xmm_regs[18]? = xmm18; >> ???? >>> -? _xmm_regs[19]? = xmm19; >> ???? >>> -? _xmm_regs[20]? = xmm20; >> ???? >>> -? _xmm_regs[21]? = xmm21; >> ???? >>> -? _xmm_regs[22]? = xmm22; >> ???? >>> -? _xmm_regs[23]? = xmm23; >> ???? >>> -? _xmm_regs[24]? = xmm24; >> ???? >>> -? _xmm_regs[25]? = xmm25; >> ???? >>> -? _xmm_regs[26]? = xmm26; >> ???? >>> -? _xmm_regs[27]? = xmm27; >> ???? >>> -? _xmm_regs[28]? = xmm28; >> ???? >>> -? _xmm_regs[29]? = xmm29; >> ???? >>> -? _xmm_regs[30]? = xmm30; >> ???? >>> -? _xmm_regs[31]? = xmm31; >> ???? >>> ? #endif // _LP64 >> ???? >>> >> ???? >>> ??? for (int i = 0; i < 8; i++) { >> ???? >>> --------------------------------- >> ???? >>> >> ???? >>> I think the gcc version on my desktop is older so didn?t catch this. >> ???? >>> >> ???? >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to: >> ???? >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/ >> ??? >> ???? >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735 >> ???? >>> >> ???? >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you >> before >> ???? >>> changing it back to 3. >> ???? >>> >> ???? >>> Best Regards, >> ???? >>> Sandhya >> ???? >>> >> ???? >>> >> ???? >>> -----Original Message----- >> ???? >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >> ???? >>> Sent: Friday, September 14, 2018 12:13 PM >> ???? >>> To: Viswanathan, Sandhya >; >> ??? hotspot-compiler-dev at openjdk.java.net >> ???? 
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> ???? >>> >> ???? >>> I got build failure: >> ???? >>> >> ???? >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array >> ???? >>> (which contains 16 elements) [-Werror,-Warray-bounds] >> ???? >>> jib >?? _xmm_regs[16]? = xmm16; >> ???? >>> >> ???? >>> I also noticed that we don't have RFE for this work. I filed: >> ???? >>> >> ???? >>> https://bugs.openjdk.java.net/browse/JDK-8209735 >> ???? >>> >> ???? >>> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next >> ???? >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp: >> ???? >>> >> ???? >>> - product(intx, UseAVX, 2, \ >> ???? >>> + product(intx, UseAVX, 3, \ >> ???? >>> >> ???? >>> Thanks, >> ???? >>> Vladimir >> ???? >>> >> ???? >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: >> ???? >>>> Looks good to me. I will start testing and let you know results. >> ???? >>>> >> ???? >>>> Thanks, >> ???? >>>> Vladimir >> ???? >>>> >> ???? >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >> ???? >>>>> Hi Vladimir, >> ???? >>>>> >> ???? >>>>> Please find below the updated webrev with all your comments incorporated: >> ???? >>>>> >> ???? >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >> ??? >> ???? >>>>> >> ???? >>>>> I have run the jtreg compiler tests on SKX and KNL which have two >> ???? >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. >> ???? >>>>> >> ???? >>>>> Best Regards, >> ???? >>>>> Sandhya >> ???? >>>>> >> ???? >>>>> >> ???? >>>>> -----Original Message----- >> ???? >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >> ???? >>>>> Sent: Tuesday, September 11, 2018 8:54 PM >> ???? >>>>> To: Viswanathan, Sandhya >; >> ???? >>>>> hotspot-compiler-dev at openjdk.java.net >> ???? 
>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >> ???? >>>>> >> ???? >>>>> Thank you, Sandhya >> ???? >>>>> >> ???? >>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >> ???? >>>>> >> ???? >>>>> Vladimir >> ???? >>>>> >> ???? >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >> ???? >>>>>> Hi Vladimir, >> ???? >>>>>> >> ???? >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback. >> ???? >>>>>> Please see my response in your email below marked with (Sandhya >> ???? >>>>>>>>> ). Looking forward to your advice. >> ???? >>>>>> >> ???? >>>>>> Best Regards, >> ???? >>>>>> Sandhya >> ???? >>>>>> >> ???? >>>>>> >> ???? >>>>>> -----Original Message----- >> ???? >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >> ???? >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM >> ???? >>>>>> To: Viswanathan, Sandhya >; >> ???? >>>>>> hotspot-compiler-dev at openjdk.java.net >> ???? >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >> ???? >>>>>> instruction >> ???? >>>>>> >> ???? >>>>>> Thank you. >> ???? >>>>>> >> ???? >>>>>> I want to discuss next issue: >> ???? >>>>>> >> ???? >>>>>> ??? > You did not added instructions to load these registers from >> ???? >>>>>> memory (and stack). What happens in such cases when you need to load or store? >> ???? >>>>>> ??? >>>> Let us take an example, e.g. for loading into rregF. First >> ???? >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >> ???? >>>>>> >> ???? >>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >> ???? >>>>>> Also we don't check that register could be the same as result you may get unneeded moves. >> ???? >>>>>> >> ???? >>>>>> I would advice add memory moves at least. >> ???? >>>>>> >> ???? >>>>>> Sandhya >>>? I had added those rules initially and removed them in >> ???? 
>>>>>> the final patch. I noticed that the register allocator uses the >> ???? >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask >> ???? >>>>>> (matcher.cpp). I would like the register allocator to get all the >> ???? >>>>>> possible register on an architecture for idealreg2reg mask. I >> ???? >>>>>> wondered that multiple instruct rules in .ad file for LoadF from >> ???? >>>>>> memory might cause problems.? I would have to have higher cost for >> ???? >>>>>> loading into restricted register set like vlReg. Then I decided that >> ???? >>>>>> the register allocator can handle this in much better way than me >> ???? >>>>>> adding rules to load from memory. This is with the background that the regF is always all the available >> ??? registers >> ???? >>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this >> ??? and if >> ???? >>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp >> ??? that I >> ???? >>>>>> am referring to is: >> ???? >>>>>> ???? MachNode *spillCP = match_tree(new >> ???? >>>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >> ???? >>>>>> #endif >> ???? >>>>>> ???? MachNode *spillI? = match_tree(new >> ???? >>>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >> ???? >>>>>> ???? MachNode *spillL? = match_tree(new >> ???? >>>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, >> ???? >>>>>> LoadNode::DependsO nlyOnTest, false)); >> ???? >>>>>> ???? MachNode *spillF? = match_tree(new >> ???? >>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >> ???? >>>>>> ???? MachNode *spillD? = match_tree(new >> ???? >>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >> ???? >>>>>> ???? MachNode *spillP? = match_tree(new >> ???? >>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >> ???? >>>>>> ???? .... >> ???? >>>>>> ???? 
idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >> ???? >>>>>> >> ???? >>>>>> An other question. You use movflt() and movdbl() which use either >> ???? >>>>>> movap[s|d] and movs[s|d] >> ???? >>>>>> instructions: >> ???? >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu >> ???? >>>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when >> ???? >>>>>> avx512vl is not available? I see for vectors you use >> ???? >>>>>> vpxor+vinserti* combination. >> ???? >>>>>> >> ???? >>>>>> Sandhya >>> Yes the scalar floating point instructions are available >> ???? >>>>>> with AVX512 encoding when avx512vl is not available. That is why you >> ???? >>>>>> would see not just movflt, movdbl but all the other scalar >> ???? >>>>>> operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F >> ??? instructions. >> ???? >>>>>> >> ???? >>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad : >> ???? >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >> ???? >>>>>> >> ???? >>>>>> Should it be (UseAVX < 3)? >> ???? >>>>>> >> ???? >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >> ???? >>>>>> >> ???? >>>>>> Thanks, >> ???? >>>>>> Vladimir >> ???? >>>>>> >> ???? >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >> ???? >>>>>>> Hi Vladimir, >> ???? >>>>>>> >> ???? >>>>>>> Thanks a lot for your review and feedback. Please see my response >> ???? >>>>>>> in your email below. I will send an updated webrev incorporating your feedback. >> ???? >>>>>>> >> ???? >>>>>>> Best Regards, >> ???? >>>>>>> Sandhya >> ???? >>>>>>> >> ???? >>>>>>> >> ???? >>>>>>> -----Original Message----- >> ???? >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >> ???? >>>>>>> Sent: Monday, September 10, 2018 6:09 PM >> ???? >>>>>>> To: Viswanathan, Sandhya >; >> ???? >>>>>>> hotspot-compiler-dev at openjdk.java.net >> ???? 
>>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >> ???? >>>>>>> instruction >> ???? >>>>>>> >> ???? >>>>>>> Very nice. Thank you, Sandhya. >> ???? >>>>>>> >> ???? >>>>>>> I would like to see more meaningful naming in .ad files - instead >> ???? >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*. >> ???? >>>>>>> >> ???? >>>>>>>>>> Yes, accepted. >> ???? >>>>>>> >> ???? >>>>>>> New load_from_* and load_to_* instructions in .ad files should be >> ???? >>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions: >> ???? >>>>>>> >> ???? >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >> ???? >>>>>>> vlRegF src) >> ???? >>>>>>>>>> Yes, accepted. >> ???? >>>>>>> >> ???? >>>>>>> You did not added instructions to load these registers from memory >> ???? >>>>>>> (and stack). What happens in such cases when you need to load or store? >> ???? >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it >> ???? >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >> ???? >>>>>>> >> ???? >>>>>>> Also please explain why these registers are used when UseAVX == 0?: >> ???? >>>>>>> >> ???? >>>>>>> +instruct absD_reg(rregD dst) %{ >> ???? >>>>>>> ??????? predicate((UseSSE>=2) && (UseAVX == 0)); >> ???? >>>>>>> >> ???? >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too: >> ???? >>>>>>> ????? 661?? if (UseAVX < 3) { >> ???? >>>>>>> ????? 662???? _features &= ~CPU_AVX512F; >> ???? >>>>>>> >> ???? >>>>>>>>>> Yes, accepted. It could be regD here. >> ???? >>>>>>> >> ???? >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >> ???? >>>>>>> >> ???? >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >> ???? >>>>>>> +vectors_reg_legacy, %{ >> ???? >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >> ???? 
>>>>>>> VM_Version::supports_avx512dq() && >> ???? >>>>>>> VM_Version::supports_avx512vl() %} ); >> ???? >>>>>>> >> ???? >>>>>>>>>> Yes, accepted. >> ???? >>>>>>> >> ???? >>>>>>> I would suggest to test these changes on different machines >> ???? >>>>>>> (non-avx512 and avx512) and with different UseAVX values. >> ???? >>>>>>> >> ???? >>>>>>>>>> Will do. >> ???? >>>>>>> >> ???? >>>>>>> Thanks, >> ???? >>>>>>> Vladimir >> ???? >>>>>>> >> ???? >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >> ???? >>>>>>>> Recently there have been couple of high priority issues with >> ???? >>>>>>>> regards to high bank of XMM register >> ???? >>>>>>>> (XMM16-XMM31) usage by C2: >> ???? >>>>>>>> >> ???? >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >> ???? >>>>>>>> >> ???? >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >> ???? >>>>>>>> >> ???? >>>>>>>> Please find below a patch which attempts to clean up the XMM >> ???? >>>>>>>> register handling by using register groups. >> ???? >>>>>>>> >> ???? >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >> ??? >> ???? >>>>>>>> >> ???? >>>>>>>> >> ???? >>>>>>>> The patch provides a restricted set of registers to the match >> ???? >>>>>>>> rules in the ad file based on the underlying architecture. >> ???? >>>>>>>> >> ???? >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >> ???? >>>>>>>> >> ???? >>>>>>>> By removing the special handling, the patch reduces the overall >> ???? >>>>>>>> code size by about 1800 lines of code. >> ???? >>>>>>>> >> ???? >>>>>>>> Your review and feedback is very welcome. >> ???? >>>>>>>> >> ???? >>>>>>>> Best Regards, >> ???? >>>>>>>> >> ???? >>>>>>>> Sandhya >> ???? 
>>>>>>>> >> >> >> -- >> >> Thanks, >> >> Jc >> From vladimir.kozlov at oracle.com Thu Sep 20 17:12:46 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Sep 2018 10:12:46 -0700 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com> <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com> Message-ID: <8facd4a1-f93a-4f63-daa7-34ad993a556b@oracle.com> Sandhya, you can use jdk submit repo to test build on other Oracle platforms (x64 and SPARC only, no 32-bit): https://wiki.openjdk.java.net/display/Build/Submit+Repo Vladimir On 9/20/18 10:09 AM, Vladimir Kozlov wrote: > I hit build failure on SPARC due to shared changes in C1: > > workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*, > LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)". > jib > 1 Error(s) detected. > > I assume other platforms are also affected. > > Vladimir > > On 9/19/18 9:53 AM, Vladimir Kozlov wrote: >> Thank you, Sandhya >> >> I submitted new testing. 
>> >> Vladimir >> >> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote: >>> Hi Vladimir, >>> >>> Please find below the updated webrev with fixes for the two issues: >>> >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/ >>> >>> >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764 >>> >>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS as the temporary register type for intrinsics >>> instead of legVecD. >>> >>> This test was only failing with -XX:MaxVectorSize=4. >>> >>> The file modified is x86_64.ad. >>> >>> Fix for compiler/vectorization/TestNaNVector.java was to allow all xmm registers (xmm0-xmm31) for C1 and handle >>> floating point abs and negate appropriately by providing a temp register. >>> >>> The C1 files are modified for this fix. >>> >>> I reran compiler jtreg tests with setting nativepath appropriately on Haswell, SKX and KNL. >>> >>> Best Regards, >>> >>> Sandhya >>> >>> *From:*Viswanathan, Sandhya >>> *Sent:* Tuesday, September 18, 2018 1:47 PM >>> *To:* 'JC Beyler' >>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev >>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> >>> Hi Jc, >>> >>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java. >>> >>> Best Regards, >>> >>> Sandhya >>> >>> *From:*JC Beyler [mailto:jcbeyler at google.com] >>> *Sent:* Monday, September 17, 2018 9:29 PM >>> *To:* Viswanathan, Sandhya > >>> *Cc:* vladimir.kozlov at oracle.com ; hotspot-compiler-dev >>> > >>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> >>> Hi Sandhya, >>> >>> How are you invoking the test for NativeCallTest? 
>>> >>> The way I would do it using jtreg would be something like this: >>> >>> $ export BUILD_TYPE=release >>> >>> $ export JDK_PATH=wherever you have your JDK >>> >>> ?From the test subfolder: >>> >>> $ wherever-your-jtreg-is/bin/jtreg >>> -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk >>> $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk >>> hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java >>> >>> Seems to pass for me. >>> >>> But much easier is: >>> >>> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java" >>> >>> That seems to pass for me as well and is easier to use :) >>> >>> For information, the make run-test documentation is here: >>> >>> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html >>> >>> Let me know if that helps, >>> >>> Jc >>> >>> ??? Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file: >>> ???? ? ? ? ? ? ? ? ? ? ? "test result: Error. Use -nativepath to specify the location of native code" >>> ???? ? ? ? ? ? ? Do I need to give any additional info to jtreg to get over this problem? >>> >>> ??? Thanks a lot! >>> ??? Best Regards, >>> ??? Sandhya >>> >>> ??? -----Original Message----- >>> ??? From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >>> ??? Sent: Monday, September 17, 2018 10:14 AM >>> ??? To: Viswanathan, Sandhya >; >>> ??? hotspot-compiler-dev at openjdk.java.net >>> ??? Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> >>> ??? I finished testing on avx512 machine. >>> ??? All passed except known (TestNaNVector.java) failures. >>> >>> ??? Thanks, >>> ??? Vladimir >>> >>> ??? On 9/14/18 5:22 PM, Vladimir Kozlov wrote: >>> ???? > I gave incorrect link to RFE. Here is correct: >>> ???? > >>> ???? 
> https://bugs.openjdk.java.net/browse/JDK-8210764 >>> ???? > >>> ???? > Vladimir >>> ???? > >>> ???? > On 9/14/18 3:49 PM, Vladimir Kozlov wrote: >>> ???? >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed. >>> ???? >> >>> ???? >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too. >>> ???? >> >>> ???? >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' >>> ??? on CPU >>> ???? >> with AVX1 only >>> ???? >> >>> ???? >> #? SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884 >>> ???? >> # Problematic frame: >>> ???? >> # V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >>> ???? >> >>> ???? >> Current CompileTask: >>> ???? >> C2:??? 154??? 5???????????? java.lang.String::equals (65 bytes) >>> ???? >> >>> ???? >> Stack: [0x00007f3b10044000,0x00007f3b10145000],? sp=0x00007f3b1013fe70,? free space=1007k >>> ???? >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >>> ???? >> V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >>> ???? >> V? [libjvm.so+0x882a72]? PhaseChaitin::gather_lrg_masks(bool)+0x872 >>> ???? >> V? [libjvm.so+0xd82235]? PhaseCFG::global_code_motion()+0xfc5 >>> ???? >> V? [libjvm.so+0xd824b1]? PhaseCFG::do_global_code_motion()+0x51 >>> ???? >> V? [libjvm.so+0xa2c26c]? Compile::Code_Gen()+0x24c >>> ???? >> V? [libjvm.so+0xa2ff82]? Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, >>> ??? DirectiveSet*)+0xe42 >>> ???? >> >>> ???? >> ------------------------------------------------------------------------------------------------ >>> ???? >> 2. >>> >>> ??? with '-Xcomp' >>> ???? >> #? Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073 >>> ???? >> #? 
assert(false) failed: cannot spill interval that is used in first instruction (possible reason: no register >>> ??? found) >>> ???? >> >>> ???? >> Current CompileTask: >>> ???? >> C1: 854767 13391?????? 3?????? org.sunflow.math.Matrix4::multiply (692 bytes) >>> ???? >> >>> ???? >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],? sp=0x00007f23b9e7f9d0,? free space=1014k >>> ???? >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >>> ???? >> V? [libjvm.so+0x1882202]? VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, >>> unsigned >>> ???? >> char*, void*, void*, char const*, int, unsigned long)+0x562 >>> ???? >> V? [libjvm.so+0x1882d2f]? VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, >>> ???? >> __va_list_tag*)+0x2f >>> ???? >> V? [libjvm.so+0xb0bea0]? report_vm_error(char const*, int, char const*, char const*, ...)+0x100 >>> ???? >> V? [libjvm.so+0x7e0410]? LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 >>> ???? >> V? [libjvm.so+0x7e0a20]? LinearScanWalker::activate_current()+0x280 >>> ???? >> V? [libjvm.so+0x7e0c7d]? IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d >>> ???? >> V? [libjvm.so+0x7e1078]? LinearScan::allocate_registers()+0x338 >>> ???? >> V? [libjvm.so+0x7e2135]? LinearScan::do_linear_scan()+0x155 >>> ???? >> V? [libjvm.so+0x70a6bb]? Compilation::emit_lir()+0x99b >>> ???? >> V? [libjvm.so+0x70caff]? Compilation::compile_java_method()+0x42f >>> ???? >> V? [libjvm.so+0x70d974]? Compilation::compile_method()+0x1d4 >>> ???? >> V? [libjvm.so+0x70e547]? Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, >>> ???? >> DirectiveSet*)+0x357 >>> ???? >> V? [libjvm.so+0x71073c]? Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c >>> ???? >> V? [libjvm.so+0xa3cf89]? CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 >>> ???? >> >>> ???? >> Vladimir >>> ???? >> >>> ???? 
>> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote: >>> ???? >>> >>> ???? >>> Thanks Vladimir, the below should fix this issue: >>> ???? >>> >>> ???? >>> ------------------------------ >>> ???? >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700 >>> ???? >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700 >>> ???? >>> @@ -233,22 +233,6 @@ >>> ???? >>> ??? _xmm_regs[13]? = xmm13; >>> ???? >>> ??? _xmm_regs[14]? = xmm14; >>> ???? >>> ??? _xmm_regs[15]? = xmm15; >>> ???? >>> -? _xmm_regs[16]? = xmm16; >>> ???? >>> -? _xmm_regs[17]? = xmm17; >>> ???? >>> -? _xmm_regs[18]? = xmm18; >>> ???? >>> -? _xmm_regs[19]? = xmm19; >>> ???? >>> -? _xmm_regs[20]? = xmm20; >>> ???? >>> -? _xmm_regs[21]? = xmm21; >>> ???? >>> -? _xmm_regs[22]? = xmm22; >>> ???? >>> -? _xmm_regs[23]? = xmm23; >>> ???? >>> -? _xmm_regs[24]? = xmm24; >>> ???? >>> -? _xmm_regs[25]? = xmm25; >>> ???? >>> -? _xmm_regs[26]? = xmm26; >>> ???? >>> -? _xmm_regs[27]? = xmm27; >>> ???? >>> -? _xmm_regs[28]? = xmm28; >>> ???? >>> -? _xmm_regs[29]? = xmm29; >>> ???? >>> -? _xmm_regs[30]? = xmm30; >>> ???? >>> -? _xmm_regs[31]? = xmm31; >>> ???? >>> ? #endif // _LP64 >>> ???? >>> >>> ???? >>> ??? for (int i = 0; i < 8; i++) { >>> ???? >>> --------------------------------- >>> ???? >>> >>> ???? >>> I think the gcc version on my desktop is older so didn?t catch this. >>> ???? >>> >>> ???? >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to: >>> ???? >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/ >>> ??? >>> ???? >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735 >>> ???? >>> >>> ???? >>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you >>> before >>> ???? >>> changing it back to 3. >>> ???? >>> >>> ???? >>> Best Regards, >>> ???? >>> Sandhya >>> ???? >>> >>> ???? >>> >>> ???? >>> -----Original Message----- >>> ???? 
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >>> ???? >>> Sent: Friday, September 14, 2018 12:13 PM >>> ???? >>> To: Viswanathan, Sandhya >; >>> ??? hotspot-compiler-dev at openjdk.java.net >>> ???? >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> ???? >>> >>> ???? >>> I got build failure: >>> ???? >>> >>> ???? >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the >>> array >>> ???? >>> (which contains 16 elements) [-Werror,-Warray-bounds] >>> ???? >>> jib >?? _xmm_regs[16]? = xmm16; >>> ???? >>> >>> ???? >>> I also noticed that we don't have RFE for this work. I filed: >>> ???? >>> >>> ???? >>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>> ???? >>> >>> ???? >>> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next >>> ???? >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp: >>> ???? >>> >>> ???? >>> - product(intx, UseAVX, 2, \ >>> ???? >>> + product(intx, UseAVX, 3, \ >>> ???? >>> >>> ???? >>> Thanks, >>> ???? >>> Vladimir >>> ???? >>> >>> ???? >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: >>> ???? >>>> Looks good to me. I will start testing and let you know results. >>> ???? >>>> >>> ???? >>>> Thanks, >>> ???? >>>> Vladimir >>> ???? >>>> >>> ???? >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >>> ???? >>>>> Hi Vladimir, >>> ???? >>>>> >>> ???? >>>>> Please find below the updated webrev with all your comments incorporated: >>> ???? >>>>> >>> ???? >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >>> ??? >>> ???? >>>>> >>> ???? >>>>> I have run the jtreg compiler tests on SKX and KNL which have two >>> ???? >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. >>> ???? >>>>> >>> ???? >>>>> Best Regards, >>> ???? >>>>> Sandhya >>> ???? >>>>> >>> ???? >>>>> >>> ???? >>>>> -----Original Message----- >>> ???? 
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >>> ???? >>>>> Sent: Tuesday, September 11, 2018 8:54 PM >>> ???? >>>>> To: Viswanathan, Sandhya >; >>> ???? >>>>> hotspot-compiler-dev at openjdk.java.net >>> ???? >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction >>> ???? >>>>> >>> ???? >>>>> Thank you, Sandhya >>> ???? >>>>> >>> ???? >>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >>> ???? >>>>> >>> ???? >>>>> Vladimir >>> ???? >>>>> >>> ???? >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >>> ???? >>>>>> Hi Vladimir, >>> ???? >>>>>> >>> ???? >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback. >>> ???? >>>>>> Please see my response in your email below marked with (Sandhya >>> ???? >>>>>>>>> ). Looking forward to your advice. >>> ???? >>>>>> >>> ???? >>>>>> Best Regards, >>> ???? >>>>>> Sandhya >>> ???? >>>>>> >>> ???? >>>>>> >>> ???? >>>>>> -----Original Message----- >>> ???? >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >>> ???? >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM >>> ???? >>>>>> To: Viswanathan, Sandhya >; >>> ???? >>>>>> hotspot-compiler-dev at openjdk.java.net >>> ???? >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>> ???? >>>>>> instruction >>> ???? >>>>>> >>> ???? >>>>>> Thank you. >>> ???? >>>>>> >>> ???? >>>>>> I want to discuss next issue: >>> ???? >>>>>> >>> ???? >>>>>> ??? > You did not added instructions to load these registers from >>> ???? >>>>>> memory (and stack). What happens in such cases when you need to load or store? >>> ???? >>>>>> ??? >>>> Let us take an example, e.g. for loading into rregF. First >>> ???? >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >>> ???? >>>>>> >>> ???? >>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >>> ???? 
>>>>>> Also we don't check that register could be the same as result you may get unneeded moves. >>> ???? >>>>>> >>> ???? >>>>>> I would advice add memory moves at least. >>> ???? >>>>>> >>> ???? >>>>>> Sandhya >>>? I had added those rules initially and removed them in >>> ???? >>>>>> the final patch. I noticed that the register allocator uses the >>> ???? >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask >>> ???? >>>>>> (matcher.cpp). I would like the register allocator to get all the >>> ???? >>>>>> possible register on an architecture for idealreg2reg mask. I >>> ???? >>>>>> wondered that multiple instruct rules in .ad file for LoadF from >>> ???? >>>>>> memory might cause problems.? I would have to have higher cost for >>> ???? >>>>>> loading into restricted register set like vlReg. Then I decided that >>> ???? >>>>>> the register allocator can handle this in much better way than me >>> ???? >>>>>> adding rules to load from memory. This is with the background that the regF is always all the available >>> ??? registers >>> ???? >>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this >>> ??? and if >>> ???? >>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp >>> ??? that I >>> ???? >>>>>> am referring to is: >>> ???? >>>>>> ???? MachNode *spillCP = match_tree(new >>> ???? >>>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>> ???? >>>>>> #endif >>> ???? >>>>>> ???? MachNode *spillI? = match_tree(new >>> ???? >>>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >>> ???? >>>>>> ???? MachNode *spillL? = match_tree(new >>> ???? >>>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, >>> ???? >>>>>> LoadNode::DependsO nlyOnTest, false)); >>> ???? >>>>>> ???? MachNode *spillF? = match_tree(new >>> ???? >>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >>> ???? 
>>>>>> ???? MachNode *spillD? = match_tree(new >>> ???? >>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >>> ???? >>>>>> ???? MachNode *spillP? = match_tree(new >>> ???? >>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >>> ???? >>>>>> ???? .... >>> ???? >>>>>> ???? idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >>> ???? >>>>>> >>> ???? >>>>>> An other question. You use movflt() and movdbl() which use either >>> ???? >>>>>> movap[s|d] and movs[s|d] >>> ???? >>>>>> instructions: >>> ???? >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu >>> ???? >>>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when >>> ???? >>>>>> avx512vl is not available? I see for vectors you use >>> ???? >>>>>> vpxor+vinserti* combination. >>> ???? >>>>>> >>> ???? >>>>>> Sandhya >>> Yes the scalar floating point instructions are available >>> ???? >>>>>> with AVX512 encoding when avx512vl is not available. That is why you >>> ???? >>>>>> would see not just movflt, movdbl but all the other scalar >>> ???? >>>>>> operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F >>> ??? instructions. >>> ???? >>>>>> >>> ???? >>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad : >>> ???? >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >>> ???? >>>>>> >>> ???? >>>>>> Should it be (UseAVX < 3)? >>> ???? >>>>>> >>> ???? >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >>> ???? >>>>>> >>> ???? >>>>>> Thanks, >>> ???? >>>>>> Vladimir >>> ???? >>>>>> >>> ???? >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >>> ???? >>>>>>> Hi Vladimir, >>> ???? >>>>>>> >>> ???? >>>>>>> Thanks a lot for your review and feedback. Please see my response >>> ???? >>>>>>> in your email below. I will send an updated webrev incorporating your feedback. >>> ???? >>>>>>> >>> ???? >>>>>>> Best Regards, >>> ???? 
>>>>>>> Sandhya >>> ???? >>>>>>> >>> ???? >>>>>>> >>> ???? >>>>>>> -----Original Message----- >>> ???? >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com ] >>> ???? >>>>>>> Sent: Monday, September 10, 2018 6:09 PM >>> ???? >>>>>>> To: Viswanathan, Sandhya >; >>> ???? >>>>>>> hotspot-compiler-dev at openjdk.java.net >>> ???? >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >>> ???? >>>>>>> instruction >>> ???? >>>>>>> >>> ???? >>>>>>> Very nice. Thank you, Sandhya. >>> ???? >>>>>>> >>> ???? >>>>>>> I would like to see more meaningful naming in .ad files - instead >>> ???? >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*. >>> ???? >>>>>>> >>> ???? >>>>>>>>>> Yes, accepted. >>> ???? >>>>>>> >>> ???? >>>>>>> New load_from_* and load_to_* instructions in .ad files should be >>> ???? >>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions: >>> ???? >>>>>>> >>> ???? >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst, >>> ???? >>>>>>> vlRegF src) >>> ???? >>>>>>>>>> Yes, accepted. >>> ???? >>>>>>> >>> ???? >>>>>>> You did not added instructions to load these registers from memory >>> ???? >>>>>>> (and stack). What happens in such cases when you need to load or store? >>> ???? >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it >>> ???? >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >>> ???? >>>>>>> >>> ???? >>>>>>> Also please explain why these registers are used when UseAVX == 0?: >>> ???? >>>>>>> >>> ???? >>>>>>> +instruct absD_reg(rregD dst) %{ >>> ???? >>>>>>> ??????? predicate((UseSSE>=2) && (UseAVX == 0)); >>> ???? >>>>>>> >>> ???? >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too: >>> ???? >>>>>>> ????? 661?? if (UseAVX < 3) { >>> ???? >>>>>>> ????? 662???? _features &= ~CPU_AVX512F; >>> ???? >>>>>>> >>> ???? >>>>>>>>>> Yes, accepted. 
It could be regD here. >>> ???? >>>>>>> >>> ???? >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >>> ???? >>>>>>> >>> ???? >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >>> ???? >>>>>>> +vectors_reg_legacy, %{ >>> ???? >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() && >>> ???? >>>>>>> VM_Version::supports_avx512dq() && >>> ???? >>>>>>> VM_Version::supports_avx512vl() %} ); >>> ???? >>>>>>> >>> ???? >>>>>>>>>> Yes, accepted. >>> ???? >>>>>>> >>> ???? >>>>>>> I would suggest to test these changes on different machines >>> ???? >>>>>>> (non-avx512 and avx512) and with different UseAVX values. >>> ???? >>>>>>> >>> ???? >>>>>>>>>> Will do. >>> ???? >>>>>>> >>> ???? >>>>>>> Thanks, >>> ???? >>>>>>> Vladimir >>> ???? >>>>>>> >>> ???? >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >>> ???? >>>>>>>> Recently there have been couple of high priority issues with >>> ???? >>>>>>>> regards to high bank of XMM register >>> ???? >>>>>>>> (XMM16-XMM31) usage by C2: >>> ???? >>>>>>>> >>> ???? >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >>> ???? >>>>>>>> >>> ???? >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >>> ???? >>>>>>>> >>> ???? >>>>>>>> Please find below a patch which attempts to clean up the XMM >>> ???? >>>>>>>> register handling by using register groups. >>> ???? >>>>>>>> >>> ???? >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >>> ??? >>> ???? >>>>>>>> >>> ???? >>>>>>>> >>> ???? >>>>>>>> The patch provides a restricted set of registers to the match >>> ???? >>>>>>>> rules in the ad file based on the underlying architecture. >>> ???? >>>>>>>> >>> ???? >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >>> ???? >>>>>>>> >>> ???? >>>>>>>> By removing the special handling, the patch reduces the overall >>> ???? >>>>>>>> code size by about 1800 lines of code. >>> ???? 
>>>>>>>> >>> >>>>>>>> Your review and feedback is very welcome. >>> >>>>>>>> >>> >>>>>>>> Best Regards, >>> >>>>>>>> >>> >>>>>>>> Sandhya >>> >>>>>>>> >>> >>> >>> -- >>> >>> Thanks, >>> >>> Jc >>> From dmitrij.pochepko at bell-sw.com Thu Sep 20 17:27:28 2018 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Thu, 20 Sep 2018 20:27:28 +0300 Subject: RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com> References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com> <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com> Message-ID: <2aff2d56-4126-5fdb-ece5-68324f974f8c@bell-sw.com> On 14/09/18 18:29, Andrew Dinn wrote: > On 13/09/18 15:35, Dmitrij Pochepko wrote: >> Other comments seem fine > I am glad to hear that you did not find any errors in my analysis. > However, I also need to ask you to answer a question that was implicit > in my earlier note. I said: > > "I assume you are familiar with the relevant mathematics and how it has > been used to derive the algorithm. If so then I would like you to review > this rewrite and ensure that there are no mathematical errors in it. I > would also like you to check that the explanatory comments for the > individual steps in the algorithm do not contain any errors. > > If you are not familiar with the mathematics then please let me know. I > need to know whether this has been reviewed by someone competent to do so." > > As you didn't respond to this I will have to ask you explicitly this > time. Do you have a background in mathematics and numerical analysis > that means you understand how the original algorithm has been arrived > at? Equally, how your algorithm may legitimately vary from that original? Yes, I do have the relevant background in mathematics. And yes to all of the questions below except the last one.
That said, it's always good to have another pair of eyes looking at the review. To be honest, I had to refresh my memory regarding Remez polynomials. > > I'll break this down into several steps: > > Do you understand the (elementary) theory that explains how the various > polynomial expansions I described in my comments converge to the > original log and exp functions? > > Do you understand the theory that explains how partial polynomial sums > (Remez polynomials) can be used to approximate these polynomial > expansions within specified ranges? > > Do you know how the coefficients of these Remez polynomials can be > derived to any necessary accuracy? > > Do you understand how the computation of the values of those Remez > polynomials must proceed in order to guarantee accuracy in the computed > result in the presence of rounding errors? > > Can you provide a mathematical proof that the variations you have > introduced into the computational process (specifically the move from > Horner form to Estrin form) will not introduce rounding errors? I have formal verification for some argument ranges that I considered the most problematic, but the complete proof is too complicated. Looking at the situation from the reviewer's side, I understand why it is safer and easier to maintain if the assembly version duplicates the original fdlibm code. Because of that, I suggest reverting the questionable places to the original schemes, as the performance improvement is not that big. Here is a new webrev with the polynomial calculations changed back to the original scheme. I also changed the scalbn implementation to be the same as the original: http://cr.openjdk.java.net/~dpochepk/8189107/webrev.03/ As expected, it's about 2% slower. Thanks, Dmitrij > > I certainly cannot lay claim to a /thorough/ understanding of most, if > not all, those topics. If you also cannot then I think we need to bring > in someone who does.
In particular, it is the last point that matters > most of all here as this is where you have /chosen/ to make your > algorithm diverge from the code you inherited. > > As regards the rest of the background maths, we do at least know that > the other aspects of the algorithm -- in its original manifestation -- > have been checked by numerical experts. Hence, if we ensure that your > algorithm implements /equivalent/ steps then it ought to inherit the > same guarantees of correctness. So, the only task as far as most of the > code is concerned is to iron out any errors you might inadvertently have > introduced. I have several nits to pick in that regard that which I will > be posting shortly. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From sandhya.viswanathan at intel.com Thu Sep 20 17:53:16 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Thu, 20 Sep 2018 17:53:16 +0000 Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction In-Reply-To: <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com> 
<02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com> <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5771@FMSMSX126.amr.corp.intel.com> Hi Vladimir, In c1_LIRAssembler.hpp, when I added an additional parameter to negate, I did make sure to add it as a default parameter: src/hotspot/share/c1/c1_LIRAssembler.hpp, line 282: void negate(LIR_Opr left, LIR_Opr dest, LIR_Opr tmp = LIR_OprFact::illegalOpr); But I guess that since the function is not just called but also declared and defined for each of the other architectures, I need to add an unused LIR_Opr parameter to the negate function for them. This is along the same lines as what is done in some other c1_LIRAssembler methods. I will make this change and work with Vivek to use the submit repo for testing it on SPARC. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, September 20, 2018 10:09 AM To: Viswanathan, Sandhya ; hotspot-compiler-dev Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction I hit build failure on SPARC due to shared changes in C1: workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)". jib > 1 Error(s) detected. I assume other platforms are also affected. Vladimir On 9/19/18 9:53 AM, Vladimir Kozlov wrote: > Thank you, Sandhya > > I submitted new testing. > > Vladimir > > On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> Please find below the updated webrev with fixes for the two issues: >> >> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/ >> >> >> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764 >> >> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS >> as the temporary register type for intrinsics instead of legVecD.
>> >> This test was only failing with -XX:MaxVectorSize=4. >> >> The file modified is x86_64.ad. >> >> Fix for compiler/vectorization/TestNaNVector.java was to allow all >> xmm registers (xmm0-xmm31) for C1 and handle floating point abs and negate appropriately by providing a temp register. >> >> The C1 files are modified for this fix. >> >> I reran compiler jtreg tests with setting nativepath appropriately on Haswell, SKX and KNL. >> >> Best Regards, >> >> Sandhya >> >> *From:*Viswanathan, Sandhya >> *Sent:* Tuesday, September 18, 2018 1:47 PM >> *To:* 'JC Beyler' >> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev >> >> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 >> instruction >> >> Hi Jc, >> >> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java. >> >> Best Regards, >> >> Sandhya >> >> *From:*JC Beyler [mailto:jcbeyler at google.com] >> *Sent:* Monday, September 17, 2018 9:29 PM >> *To:* Viswanathan, Sandhya > > >> *Cc:* vladimir.kozlov at oracle.com ; >> hotspot-compiler-dev > > >> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >> instruction >> >> Hi Sandhya, >> >> How are you invoking the test for NativeCallTest? >> >> The way I would do it using jtreg would be something like this: >> >> $ export BUILD_TYPE=release >> >> $ export JDK_PATH=wherever you have your JDK >> >> ?From the test subfolder: >> >> $ wherever-your-jtreg-is/bin/jtreg >> -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/su >> pport/test/hotspot/jtreg/native/lib -jdk >> $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk >> hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/t >> est/NativeCallTest.java >> >> Seems to pass for me. 
>> >> But much easier is: >> >> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java" >> >> That seems to pass for me as well and is easier to use :) >> >> For information, the make run-test documentation is here: >> >> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing. >> html >> >> Let me know if that helps, >> >> Jc >> >> ??? Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file: >> ???? ? ? ? ? ? ? ? ? ? ? "test result: Error. Use -nativepath to specify the location of native code" >> ???? ? ? ? ? ? ? Do I need to give any additional info to jtreg to get over this problem? >> >> ??? Thanks a lot! >> ??? Best Regards, >> ??? Sandhya >> >> ??? -----Original Message----- >> ??? From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com >> ] >> ??? Sent: Monday, September 17, 2018 10:14 AM >> ??? To: Viswanathan, Sandhya > >; >> ??? hotspot-compiler-dev at openjdk.java.net >> >> ??? Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >> instruction >> >> ??? I finished testing on avx512 machine. >> ??? All passed except known (TestNaNVector.java) failures. >> >> ??? Thanks, >> ??? Vladimir >> >> ??? On 9/14/18 5:22 PM, Vladimir Kozlov wrote: >> ???? > I gave incorrect link to RFE. Here is correct: >> ???? > >> ???? > https://bugs.openjdk.java.net/browse/JDK-8210764 >> ???? > >> ???? > Vladimir >> ???? > >> ???? > On 9/14/18 3:49 PM, Vladimir Kozlov wrote: >> ???? >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed. >> ???? >> >> ???? >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too. >> ???? >> >> ???? >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' >> ??? on CPU >> ???? >> with AVX1 only >> ???? >> >> ???? >> #? 
SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, >> tid=13884 >> ???? >> # Problematic frame: >> ???? >> # V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >> ???? >> >> ???? >> Current CompileTask: >> ???? >> C2:??? 154??? 5???????????? java.lang.String::equals (65 >> bytes) >> ???? >> >> ???? >> Stack: [0x00007f3b10044000,0x00007f3b10145000],? >> sp=0x00007f3b1013fe70,? free space=1007k >> ???? >> Native frames: (J=compiled Java code, A=aot compiled Java >> code, j=interpreted, Vv=VM code, C=native code) >> ???? >> V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >> ???? >> V? [libjvm.so+0x882a72]? >> PhaseChaitin::gather_lrg_masks(bool)+0x872 >> ???? >> V? [libjvm.so+0xd82235]? PhaseCFG::global_code_motion()+0xfc5 >> ???? >> V? [libjvm.so+0xd824b1]? >> PhaseCFG::do_global_code_motion()+0x51 >> ???? >> V? [libjvm.so+0xa2c26c]? Compile::Code_Gen()+0x24c >> ???? >> V? [libjvm.so+0xa2ff82]? Compile::Compile(ciEnv*, >> C2Compiler*, ciMethod*, int, bool, bool, bool, >> ??? DirectiveSet*)+0xe42 >> ???? >> >> ???? >> >> --------------------------------------------------------------------- >> --------------------------- >> ???? >> 2. >> >> ??? with '-Xcomp' >> ???? >> #? Internal Error >> (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), >> pid=22016, tid=22073 >> ???? >> #? assert(false) failed: cannot spill interval that is used >> in first instruction (possible reason: no register >> ??? found) >> ???? >> >> ???? >> Current CompileTask: >> ???? >> C1: 854767 13391?????? 3?????? >> org.sunflow.math.Matrix4::multiply (692 bytes) >> ???? >> >> ???? >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],? >> sp=0x00007f23b9e7f9d0,? free space=1014k >> ???? >> Native frames: (J=compiled Java code, A=aot compiled Java >> code, j=interpreted, Vv=VM code, C=native code) >> ???? >> V? [libjvm.so+0x1882202]? VMError::report_and_die(int, char >> const*, char const*, __va_list_tag*, Thread*, unsigned >> ???? 
>> char*, void*, void*, char const*, int, unsigned long)+0x562 >> ???? >> V? [libjvm.so+0x1882d2f]? VMError::report_and_die(Thread*, >> void*, char const*, int, char const*, char const*, >> ???? >> __va_list_tag*)+0x2f >> ???? >> V? [libjvm.so+0xb0bea0]? report_vm_error(char const*, int, >> char const*, char const*, ...)+0x100 >> ???? >> V? [libjvm.so+0x7e0410]? >> LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 >> ???? >> V? [libjvm.so+0x7e0a20]? >> LinearScanWalker::activate_current()+0x280 >> ???? >> V? [libjvm.so+0x7e0c7d]? IntervalWalker::walk_to(int) [clone >> .constprop.299]+0x9d >> ???? >> V? [libjvm.so+0x7e1078]? >> LinearScan::allocate_registers()+0x338 >> ???? >> V? [libjvm.so+0x7e2135]? LinearScan::do_linear_scan()+0x155 >> ???? >> V? [libjvm.so+0x70a6bb]? Compilation::emit_lir()+0x99b >> ???? >> V? [libjvm.so+0x70caff]? >> Compilation::compile_java_method()+0x42f >> ???? >> V? [libjvm.so+0x70d974]? Compilation::compile_method()+0x1d4 >> ???? >> V? [libjvm.so+0x70e547]? >> Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, >> BufferBlob*, >> ???? >> DirectiveSet*)+0x357 >> ???? >> V? [libjvm.so+0x71073c]? Compiler::compile_method(ciEnv*, >> ciMethod*, int, DirectiveSet*)+0x14c >> ???? >> V? [libjvm.so+0xa3cf89]? >> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 >> ???? >> >> ???? >> Vladimir >> ???? >> >> ???? >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote: >> ???? >>> >> ???? >>> Thanks Vladimir, the below should fix this issue: >> ???? >>> >> ???? >>> ------------------------------ >> ???? >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 >> 13:10:23.488379912 -0700 >> ???? >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 >> 13:10:23.308379915 -0700 >> ???? >>> @@ -233,22 +233,6 @@ >> ???? >>> ??? _xmm_regs[13]? = xmm13; >> ???? >>> ??? _xmm_regs[14]? = xmm14; >> ???? >>> ??? _xmm_regs[15]? = xmm15; >> ???? >>> -? _xmm_regs[16]? = xmm16; >> ???? >>> -? _xmm_regs[17]? = xmm17; >> ???? 
>>> -? _xmm_regs[18]? = xmm18; >> ???? >>> -? _xmm_regs[19]? = xmm19; >> ???? >>> -? _xmm_regs[20]? = xmm20; >> ???? >>> -? _xmm_regs[21]? = xmm21; >> ???? >>> -? _xmm_regs[22]? = xmm22; >> ???? >>> -? _xmm_regs[23]? = xmm23; >> ???? >>> -? _xmm_regs[24]? = xmm24; >> ???? >>> -? _xmm_regs[25]? = xmm25; >> ???? >>> -? _xmm_regs[26]? = xmm26; >> ???? >>> -? _xmm_regs[27]? = xmm27; >> ???? >>> -? _xmm_regs[28]? = xmm28; >> ???? >>> -? _xmm_regs[29]? = xmm29; >> ???? >>> -? _xmm_regs[30]? = xmm30; >> ???? >>> -? _xmm_regs[31]? = xmm31; >> ???? >>> ? #endif // _LP64 >> ???? >>> >> ???? >>> ??? for (int i = 0; i < 8; i++) { >> ???? >>> --------------------------------- >> ???? >>> >> ???? >>> I think the gcc version on my desktop is older so didn?t catch this. >> ???? >>> >> ???? >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to: >> ???? >>> Patch: >> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/ >> ??? >> ???? >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735 >> ???? >>> >> ???? >>> FYI, I did notice that the default for UseAVX had been >> rolled back and wanted to get confirmation from you before >> ???? >>> changing it back to 3. >> ???? >>> >> ???? >>> Best Regards, >> ???? >>> Sandhya >> ???? >>> >> ???? >>> >> ???? >>> -----Original Message----- >> ???? >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com >> ] >> ???? >>> Sent: Friday, September 14, 2018 12:13 PM >> ???? >>> To: Viswanathan, Sandhya > >; >> ??? hotspot-compiler-dev at openjdk.java.net >> >> ???? >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >> instruction >> ???? >>> >> ???? >>> I got build failure: >> ???? >>> >> ???? >>> >> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: >> array index 16 is past the end of the array >> ???? >>> (which contains 16 elements) [-Werror,-Warray-bounds] >> ???? >>> jib >?? _xmm_regs[16]? = xmm16; >> ???? >>> >> ???? 
>>> I also noticed that we don't have RFE for this work. I filed: >> ???? >>> >> ???? >>> https://bugs.openjdk.java.net/browse/JDK-8209735 >> ???? >>> >> ???? >>> You did not enabled avx512 by default (8209735 change was >> synced from jdk 11 into 12 2 weeks ago). I added next >> ???? >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp: >> ???? >>> >> ???? >>> - product(intx, UseAVX, 2, \ >> ???? >>> + product(intx, UseAVX, 3, \ >> ???? >>> >> ???? >>> Thanks, >> ???? >>> Vladimir >> ???? >>> >> ???? >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote: >> ???? >>>> Looks good to me. I will start testing and let you know results. >> ???? >>>> >> ???? >>>> Thanks, >> ???? >>>> Vladimir >> ???? >>>> >> ???? >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote: >> ???? >>>>> Hi Vladimir, >> ???? >>>>> >> ???? >>>>> Please find below the updated webrev with all your comments incorporated: >> ???? >>>>> >> ???? >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/ >> ??? >> ???? >>>>> >> ???? >>>>> I have run the jtreg compiler tests on SKX and KNL which >> have two >> ???? >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms. >> ???? >>>>> >> ???? >>>>> Best Regards, >> ???? >>>>> Sandhya >> ???? >>>>> >> ???? >>>>> >> ???? >>>>> -----Original Message----- >> ???? >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com >> ] >> ???? >>>>> Sent: Tuesday, September 11, 2018 8:54 PM >> ???? >>>>> To: Viswanathan, Sandhya > >; >> ???? >>>>> hotspot-compiler-dev at openjdk.java.net >> >> ???? >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >> instruction >> ???? >>>>> >> ???? >>>>> Thank you, Sandhya >> ???? >>>>> >> ???? >>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them. >> ???? >>>>> >> ???? >>>>> Vladimir >> ???? >>>>> >> ???? >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote: >> ???? >>>>>> Hi Vladimir, >> ???? >>>>>> >> ???? 
>>>>>> Thanks a lot for the detailed review. I really appreciate your feedback. >> ???? >>>>>> Please see my response in your email below marked with >> (Sandhya >> ???? >>>>>>>>> ). Looking forward to your advice. >> ???? >>>>>> >> ???? >>>>>> Best Regards, >> ???? >>>>>> Sandhya >> ???? >>>>>> >> ???? >>>>>> >> ???? >>>>>> -----Original Message----- >> ???? >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com >> ] >> ???? >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM >> ???? >>>>>> To: Viswanathan, Sandhya > >; >> ???? >>>>>> hotspot-compiler-dev at openjdk.java.net >> >> ???? >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 >> ???? >>>>>> instruction >> ???? >>>>>> >> ???? >>>>>> Thank you. >> ???? >>>>>> >> ???? >>>>>> I want to discuss next issue: >> ???? >>>>>> >> ???? >>>>>> ??? > You did not added instructions to load these >> registers from >> ???? >>>>>> memory (and stack). What happens in such cases when you need to load or store? >> ???? >>>>>> ??? >>>> Let us take an example, e.g. for loading into >> rregF. First >> ???? >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa. >> ???? >>>>>> >> ???? >>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack. >> ???? >>>>>> Also we don't check that register could be the same as result you may get unneeded moves. >> ???? >>>>>> >> ???? >>>>>> I would advice add memory moves at least. >> ???? >>>>>> >> ???? >>>>>> Sandhya >>>? I had added those rules initially and >> removed them in >> ???? >>>>>> the final patch. I noticed that the register allocator >> uses the >> ???? >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg >> mask >> ???? >>>>>> (matcher.cpp). I would like the register allocator to get >> all the >> ???? >>>>>> possible register on an architecture for idealreg2reg >> mask. I >> ???? 
>>>>>> wondered that multiple instruct rules in .ad file for >> LoadF from >> ???? >>>>>> memory might cause problems.? I would have to have higher >> cost for >> ???? >>>>>> loading into restricted register set like vlReg. Then I >> decided that >> ???? >>>>>> the register allocator can handle this in much better way >> than me >> ???? >>>>>> adding rules to load from memory. This is with the >> background that the regF is always all the available >> ??? registers >> ???? >>>>>> and vlRegF is the restricted register set. Likewise for >> VecS and legVecS. Let me know you thoughts on this >> ??? and if >> ???? >>>>>> I should still add the rules to load from memory into >> vlReg and legVec. The specific code from matcher.cpp >> ??? that I >> ???? >>>>>> am referring to is: >> ???? >>>>>> ???? MachNode *spillCP = match_tree(new >> ???? >>>>>> >> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >> ???? >>>>>> #endif >> ???? >>>>>> ???? MachNode *spillI? = match_tree(new >> ???? >>>>>> >> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered)); >> ???? >>>>>> ???? MachNode *spillL? = match_tree(new >> ???? >>>>>> >> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, >> ???? >>>>>> LoadNode::DependsO nlyOnTest, false)); >> ???? >>>>>> ???? MachNode *spillF? = match_tree(new >> ???? >>>>>> >> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered)); >> ???? >>>>>> ???? MachNode *spillD? = match_tree(new >> ???? >>>>>> >> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered)); >> ???? >>>>>> ???? MachNode *spillP? = match_tree(new >> ???? >>>>>> >> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered)); >> ???? >>>>>> ???? .... >> ???? >>>>>> ???? idealreg2regmask[Op_RegF] = &spillF->out_RegMask(); >> ???? >>>>>> >> ???? >>>>>> An other question. You use movflt() and movdbl() which >> use either >> ???? >>>>>> movap[s|d] and movs[s|d] >> ???? >>>>>> instructions: >> ???? 
>>>>>> >> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu >> ???? >>>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions >> work when >> ???? >>>>>> avx512vl is not available? I see for vectors you use >> ???? >>>>>> vpxor+vinserti* combination. >> ???? >>>>>> >> ???? >>>>>> Sandhya >>> Yes the scalar floating point instructions >> are available >> ???? >>>>>> with AVX512 encoding when avx512vl is not available. That >> is why you >> ???? >>>>>> would see not just movflt, movdbl but all the other >> scalar >> ???? >>>>>> operations like adds, addsd etc using the entire xmm >> range (xmm0-31). In other words they are AVX512F >> ??? instructions. >> ???? >>>>>> >> ???? >>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad : >> ???? >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl()) >> ???? >>>>>> >> ???? >>>>>> Should it be (UseAVX < 3)? >> ???? >>>>>> >> ???? >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev. >> ???? >>>>>> >> ???? >>>>>> Thanks, >> ???? >>>>>> Vladimir >> ???? >>>>>> >> ???? >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote: >> ???? >>>>>>> Hi Vladimir, >> ???? >>>>>>> >> ???? >>>>>>> Thanks a lot for your review and feedback. Please see my >> response >> ???? >>>>>>> in your email below. I will send an updated webrev incorporating your feedback. >> ???? >>>>>>> >> ???? >>>>>>> Best Regards, >> ???? >>>>>>> Sandhya >> ???? >>>>>>> >> ???? >>>>>>> >> ???? >>>>>>> -----Original Message----- >> ???? >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com >> ] >> ???? >>>>>>> Sent: Monday, September 10, 2018 6:09 PM >> ???? >>>>>>> To: Viswanathan, Sandhya > >; >> ???? >>>>>>> hotspot-compiler-dev at openjdk.java.net >> >> ???? >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on >> AVX512 >> ???? >>>>>>> instruction >> ???? >>>>>>> >> ???? >>>>>>> Very nice. Thank you, Sandhya. >> ???? >>>>>>> >> ???? 
>>>>>>> I would like to see more meaningful naming in .ad files >> - instead >> ???? >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*. >> ???? >>>>>>> >> ???? >>>>>>>>>> Yes, accepted. >> ???? >>>>>>> >> ???? >>>>>>> New load_from_* and load_to_* instructions in .ad files >> should be >> ???? >>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions: >> ???? >>>>>>> >> ???? >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct >> MoveVL2F(regF dst, >> ???? >>>>>>> vlRegF src) >> ???? >>>>>>>>>> Yes, accepted. >> ???? >>>>>>> >> ???? >>>>>>> You did not added instructions to load these registers >> from memory >> ???? >>>>>>> (and stack). What happens in such cases when you need to load or store? >> ???? >>>>>>>>>> Let us take an example, e.g. for loading into rregF. >> First it >> ???? >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa. >> ???? >>>>>>> >> ???? >>>>>>> Also please explain why these registers are used when UseAVX == 0?: >> ???? >>>>>>> >> ???? >>>>>>> +instruct absD_reg(rregD dst) %{ >> ???? >>>>>>> ??????? predicate((UseSSE>=2) && (UseAVX == 0)); >> ???? >>>>>>> >> ???? >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too: >> ???? >>>>>>> ????? 661?? if (UseAVX < 3) { >> ???? >>>>>>> ????? 662???? _features &= ~CPU_AVX512F; >> ???? >>>>>>> >> ???? >>>>>>>>>> Yes, accepted. It could be regD here. >> ???? >>>>>>> >> ???? >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some): >> ???? >>>>>>> >> ???? >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex, >> ???? >>>>>>> +vectors_reg_legacy, %{ >> ???? >>>>>>> VM_Version::supports_evex() && >> VM_Version::supports_avx512bw() && >> ???? >>>>>>> VM_Version::supports_avx512dq() && >> ???? >>>>>>> VM_Version::supports_avx512vl() %} ); >> ???? >>>>>>> >> ???? >>>>>>>>>> Yes, accepted. >> ???? >>>>>>> >> ???? 
>>>>>>> I would suggest to test these changes on different >> machines >> ???? >>>>>>> (non-avx512 and avx512) and with different UseAVX values. >> ???? >>>>>>> >> ???? >>>>>>>>>> Will do. >> ???? >>>>>>> >> ???? >>>>>>> Thanks, >> ???? >>>>>>> Vladimir >> ???? >>>>>>> >> ???? >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote: >> ???? >>>>>>>> Recently there have been couple of high priority issues >> with >> ???? >>>>>>>> regards to high bank of XMM register >> ???? >>>>>>>> (XMM16-XMM31) usage by C2: >> ???? >>>>>>>> >> ???? >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746 >> ???? >>>>>>>> >> ???? >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735 >> ???? >>>>>>>> >> ???? >>>>>>>> Please find below a patch which attempts to clean up >> the XMM >> ???? >>>>>>>> register handling by using register groups. >> ???? >>>>>>>> >> ???? >>>>>>>> >> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/ >> ??? >> ???? >>>>>>>> >> >> ???? >>>>>>>> >> ???? >>>>>>>> The patch provides a restricted set of registers to the >> match >> ???? >>>>>>>> rules in the ad file based on the underlying architecture. >> ???? >>>>>>>> >> ???? >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler. >> ???? >>>>>>>> >> ???? >>>>>>>> By removing the special handling, the patch reduces the >> overall >> ???? >>>>>>>> code size by about 1800 lines of code. >> ???? >>>>>>>> >> ???? >>>>>>>> Your review and feedback is very welcome. >> ???? >>>>>>>> >> ???? >>>>>>>> Best Regards, >> ???? >>>>>>>> >> ???? >>>>>>>> Sandhya >> ???? 
>>>>>>>> >> >> >> -- >> >> Thanks, >> >> Jc >> From aph at redhat.com Thu Sep 20 17:58:34 2018 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Sep 2018 18:58:34 +0100 Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by constant in C1 In-Reply-To: References: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com> Message-ID: <9d454f5b-475b-5713-7155-c6946f378c3e@redhat.com> On 09/20/2018 05:15 AM, Pengfei Li (Arm Technology China) wrote: > Please find below new patch that added the same optimization for longs as well as ints and also fixed an issue. > http://cr.openjdk.java.net/~yzhang/8210413/webrev.01/ > > Could you help look at it again? That's fine. I'm not exactly delighted by the amount of duplicated code for long and int, but it's very hard to avoid in this case. The patch is good for JDK/JDK. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Pengfei.Li at arm.com Fri Sep 21 06:53:05 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 21 Sep 2018 06:53:05 +0000 Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by constant in C1 In-Reply-To: <9d454f5b-475b-5713-7155-c6946f378c3e@redhat.com> References: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com> <9d454f5b-475b-5713-7155-c6946f378c3e@redhat.com> Message-ID: Thanks for your code review. Could you help push this patch? -- Thanks, Pengfei > -----Original Message----- > > On 09/20/2018 05:15 AM, Pengfei Li (Arm Technology China) wrote: > > Please find below new patch that added the same optimization for longs as > well as ints and also fixed an issue. > > http://cr.openjdk.java.net/~yzhang/8210413/webrev.01/ > > > > Could you help look at it again? > > That's fine. I'm not exactly delighted by the amount of duplicated code for > long and int, but it's very hard to avoid in this case. > The patch is good for JDK/JDK. 
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From adinn at redhat.com Fri Sep 21 08:55:32 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 21 Sep 2018 09:55:32 +0100
Subject: RFR: 8189107 - AARCH64: create intrinsic for pow
In-Reply-To: <2aff2d56-4126-5fdb-ece5-68324f974f8c@bell-sw.com>
References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com> <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com> <2aff2d56-4126-5fdb-ece5-68324f974f8c@bell-sw.com>
Message-ID: <7ca0112f-76f9-538d-f263-8199cbd380ab@redhat.com>

On 20/09/18 18:27, Dmitrij Pochepko wrote:
> On 14/09/18 18:29, Andrew Dinn wrote:
> Yes, I do have relevant background in mathematics. And yes to the
> questions below but for the latest. That said, it's always good to have
> another pair of eyes looking at the review. To be honest, I had to
> refresh my memory regarding Remez polynomials.

Thank you for the confirmation.

. . .

>> Can you provide a mathematical proof that the variations you have
>> introduced into the computational process (specifically the move from
>> Horner form to Estrin form) will not introduce rounding errors?
> I have formal verification for some arguments ranges that I considered
> the most problematic, but the complete proof is too complicated. Looking
> at the situation from reviewer side I understand why it'll be safer and
> easier to maintain to have assembly version duplicate the original
> fdlibm code and because of that I suggest to revert questionable places
> to original schemas as the performance improvement is not that big.

Ok, use of Horner form was one of the things I was going to ask you to restore. I did actually ask Joe Darcy about the use of Estrin form.
If he can provide an argument that it is ok to employ it then we can think about reinstating the vector computation as an upgrade at a later date. I am not surprised its removal makes only a small difference, given how little of the computation it represents. > new webrev with polynomial calculations changed back to original schema. > Also changed scalbn implementation to be the same as original: > http://cr.openjdk.java.net/~dpochepk/8189107/webrev.03/ So, I guess that means you have now actually tested the underflow case? The previous scalbn implementation was one of two places in which the code was seriously broken. In consequence, all computations meant to generate underflowing results were not computed correctly. One of the things wrong in the previous scalbn implementation was the use of cmp rather than cmpw, an error which I see you have now fixed (there were 3 further mistakes in this section of the code). Your rewrite of the case handling looks complex and unnecessarily slow to me. I'll suggest a better fix in a review I am currently writing up. The other place where there was an error was in the Y_IS_HUGE branch. The vector maths code expects to load the first 4 table values in a permuted order (i.e. it assumes they are 0.25, -1/ln2, -0.333333, 0.5) but the corresponding constants in _pow_coef1 have not been permuted. Because of this all computations where 2^31 < y < 2^64 were not computed correctly. This error appears still to be present in the latest version of the code. So, I assume you have also never tested that range of values? For example, try x = 1.0000000001, y = (double)0x1_1000_0000L and compare the result with that obtained from the StrictMath routine. I did explicitly ask you to ensure that all paths through the code were tested in an earlier posting. 
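One way to run the comparison suggested above is a small stand-alone program. The sketch below is only an illustration: the class name PowPathCheck and the helper ulpDistance are made up for this example, and the second probe (1.5, -1800.0) is an assumed input, chosen because its result is subnormal and so should drive the computation through the underflow (scalbn) path. It relies on the documented accuracy of Math.pow (within 1 ulp of the exact result), so a correct intrinsic should land within a couple of ulps of StrictMath.pow.

```java
public class PowPathCheck {
    // Distance, in units in the last place, between two finite doubles of the same sign.
    static long ulpDistance(double a, double b) {
        return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    }

    // Print both results and how far apart they are.
    static void probe(double x, double y) {
        double fast = Math.pow(x, y);          // may be compiled to the intrinsic
        double strict = StrictMath.pow(x, y);  // fdlibm reference result
        System.out.println("x=" + x + " y=" + y
                + " Math.pow=" + fast
                + " StrictMath.pow=" + strict
                + " ulps=" + ulpDistance(fast, strict));
    }

    public static void main(String[] args) {
        // Y_IS_HUGE range: 2^31 < y < 2^64
        probe(1.0000000001, (double) 0x1_1000_0000L);
        // Assumed underflow probe: the result is a subnormal double
        probe(1.5, -1800.0);
    }
}
```

A large ulp distance on either line would point at the corresponding path; a distance of 0 to 2 is what a correct implementation should produce.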
Once I had read through and understood the code it took me about 2 minutes to produce a test program that exercised each of these 2 broken paths (and about 1/2 hour in gdb to detect and fix each problem). I'm very under-impressed that you did not bother to produce such tests as part of your test regime.

These errors are not bizarre or unexpected corner cases. They are paths that can be expected to be taken during normal computations. Testing the code requires, at the very least, driving the code through such paths with suitable inputs and then ensuring the results are valid, so you should really have looked for and found these bugs.

Of course, testing also requires identifying and checking bizarre or unexpected corner cases. However, while it might be understandable if some of those latter cases were missed, there is really no excuse for not checking the known, expected paths with at least some inputs.

It's extremely unhelpful to those who have to maintain this code when a contributor takes the job of testing this lightly. Nor is such behaviour going to encourage anyone to accept further contributions.

You can expect a full review of the code some time today (it will be based on the previous version so you will have to make allowance for the changes you have made in the latest version). There are only a few things I would like you to tweak in the sequence of generated instructions. However, you will need to do a lot of rework to make the generator code more systematic and more readable. That includes 1) introducing some block structure and local declarations to the generator code and 2) adopting a more coherent allocation of values to registers which reflects the naming and local usage of variables in the original algorithm. To simplify that task I will provide a revised algorithm that faithfully reflects the structure of your generated code and clarifies its relation to the original.
regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Fri Sep 21 09:31:28 2018 From: aph at redhat.com (Andrew Haley) Date: Fri, 21 Sep 2018 10:31:28 +0100 Subject: [aarch64-port-dev ] RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: <7ca0112f-76f9-538d-f263-8199cbd380ab@redhat.com> References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com> <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com> <2aff2d56-4126-5fdb-ece5-68324f974f8c@bell-sw.com> <7ca0112f-76f9-538d-f263-8199cbd380ab@redhat.com> Message-ID: On 09/21/2018 09:55 AM, Andrew Dinn wrote: > Ok, use of Horner form was one of the things I was going to ask you to > restore. I did actually ask Joe Darcy about the use of Estrin form. If > he can provide an argument that it is ok to employ it then we can think > about reinstating the vector computation as an upgrade at a later date. > I am not surprised its removal makes only a small difference, given how > little of the computation it represents. I've run the crlibm difficult rounding tests on the patch, with no regressions, so I'm pretty happy about using the Estrin form. Of course it's possible that there will be problems elsewhere, but it's not likely IMO. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Fri Sep 21 12:48:39 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 21 Sep 2018 14:48:39 +0200 Subject: RFR: JDK-8210752: Remaining explicit barriers for C2 In-Reply-To: References: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com> Message-ID: Hi Roland, thanks for reviewing! Any other reviews? Can I push that stuff? 
Roman >> http://cr.openjdk.java.net/~rkennke/JDK-8210752/webrev.00/ > > That looks good to me. > > Roland. > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From shade at redhat.com Fri Sep 21 13:13:07 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 21 Sep 2018 15:13:07 +0200 Subject: RFR: JDK-8210752: Remaining explicit barriers for C2 In-Reply-To: References: <15be8e2d-dba5-2e8a-c851-b6821b81d4b3@redhat.com> Message-ID: <0b832f84-2555-5251-8164-b05732ae8b4a@redhat.com> On 09/21/2018 02:48 PM, Roman Kennke wrote: > Any other reviews? Can I push that stuff? > >>> http://cr.openjdk.java.net/~rkennke/JDK-8210752/webrev.00/ Looks good to me too. Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From adinn at redhat.com Fri Sep 21 15:01:57 2018 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 21 Sep 2018 16:01:57 +0100 Subject: RFR: 8189107 - AARCH64: create intrinsic for pow In-Reply-To: <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com> References: <40f2697c-1dee-e605-4402-3e43ad8b6019@redhat.com> <44f5259c-d40e-30e0-272b-140fe6fbc950@redhat.com> <02da7309-9a70-ea30-fafc-d1d85cfae328@bell-sw.com> <47c4307d-eca1-5270-7e2c-83a31686ef91@redhat.com> Message-ID: <1b770596-8da2-74be-bad2-832bf4a6622a@redhat.com> Hi Dmitrij, As promised here is the review (after sig) based on webrev.02. The review first describes the problems I have identified. It then continues with recommendations for (extensive) rework. 
Since it is based on webrev.02 you will have to make some allowance for the changes you introduced in your webrev.03.

I have revised your webrev to include fixes for the two errors I identified and a new version is available here

  http://cr.openjdk.java.net/~adinn/8189107/webrev.03/

The webrev includes my recommended fix to the scalbn code in macroAssembler_pow.cpp and a correction to the declaration of table _pow_coef1 in stubRoutines_aarch64.cpp. I explain these changes below.

I have also uploaded a commented version of the original algorithm and a revised algorithm based on your code here

  http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt
  http://cr.openjdk.java.net/~adinn/8189107/revised_algorithm.txt

You have seen the original algorithm modulo a few tweaks that emerged in creating the revised version. The revised algorithm /faithfully/ models your webrev.02 code (with a couple of notable exceptions that relate to problems described below). That faithful modelling of the code includes retaining the order of computation of all values. In particular, it models early computation of some data that you appear to have introduced in order to pipeline certain operations.

At the same time, the algorithm also introduces a much more coherent control structure (inserting 'if then else' in place of GOTO everywhere it is possible) and a nested /block structure/ (none of this required reordering btw). It profits from this to introduce block local declarations which scope the values computed and used at successive steps. As far as possible the revised algorithm employs the same naming convention as the original algorithm (I'll explain more on that in the detailed feedback below).

Why does this matter? Well, once the errors get fixed, by far the biggest remaining problems with the existing code are 1) its unclear control structure and 2) its incoherent allocation of data to registers.
The intention of providing block structure and block local use of variables in the revised algorithm is to enable introduction of similar block structuring and block local declarations for registers and branch labels in the generator code. In particular, that will allow the generator code to be rewritten to use symbolic names for registers consistently throughout the function.

So, a small number of register mappings for variables that are global to the algorithm will need to be retained at the start of the function. However, they will use meaningful names like x, exp_mask, one_d, result etc instead of the completely meaningless aliases, tmp1, tmp2 etc, that you provided. Similarly, some of the label declarations (mostly for exit cases) will need to be retained at the top level.

However, most register mappings will be for variables that are local to a block. So, they will need to be declared at the start of that block, making it clear where they are used and where they are no longer in use. Similarly, most label declarations will only need to be declared at the start of the immediately enclosing block that generates the code identified by the label. This will ensure declarations are close to their point of use and are not in scope after they become redundant (or at least not for very long).

Register mappings for variables declared in an outer block that are live across nested blocks will then be able to be used consistently in those inner blocks while clearly identifying precisely what values are being generated, used or updated. The same applies for label declarations. They can be used as branch targets in nested blocks but will not be visible in outer scopes or sibling scopes.

Where possible the revised algorithm employs the same naming convention as the original algorithm for the values it operates on -- with one important modification. A suffix convention has been adopted to clarify some ambiguities present in the original.
For example, this allows us to distinguish the double value x from its long bits representation x_r, its 32 bit high word x_h or its 32 bit positive high word ix_h. The algorithm also employs block local declarations to scope intermediate values, employing names starting with 'tmp'. These are often introduced in small, local blocks, allowing the same names tmp1, tmp2 etc to be reused without ambiguity.

So, I hope it is clear how you can use this algorithm to rewrite the generator code to be much more readable and maintainable -- that being the ultimate goal here. I'm not willing to let the code be committed without this restructuring taking place.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

Problems
--------

1) Error in Y_IS_HUGE case

The vector maths that computes the polynomial for this case expects to load the first 6 coefficients in table _pow_coef1 in the permuted order (0.25, -ivln2, -0.333333333333, 0.5, ivln2_l, ivln2_h). However, the table declaration in stubRoutines_aarch64.cpp declares them in the order (-0.333333333333, 0.25, 0.5, -ivln2, ivln2_l, ivln2_h).

2) scalbn implementation is wrong

The original code was using this algorithm

  1  if (0 >= (j_h >> 20))
  2      tmp1 = n' - 1023               // bias n as an exponent
  3      tmp2 = (tmp1 & 0x3ff) << 52    // install as high bits
  4      two_n_d = LONG_TO_DOUBLE(tmp2)
  5      z = z * two_n_d                // rescale
  6      b(CHECK_RESULT_NEGATION);

The problems are:

line 1: as you spotted the test was wrongly implemented using cmp not cmpw (j_h is computed using an int add so sign extending as a long fails to retain the overflow bit).

line 2: n is the unbiased exponent -- rebiasing it requires adding 1023, not subtracting it

line 3: when we hit underflow the unbiased exponent is in the range [-1024, -1075].
So, after correcting the sub to an add the exponent in tmp1 will be negative (that is precisely the case the if test looks for). Installing the top 12 bits exponent of this negative value into the high bits of a double gives a float with unbiased exponent in range [970, 1023] i.e. a very high positive power of 2 rather than a very low negative one. The result is out by about 300 orders of magnitude!

line 6: As you spotted, the multiply here updates the register storing z rather than installing the result in v0

I explain all this not to point out why it is wrong but to show how your original version can be salvaged with a few small changes. Basically we want to multiply z by 2^n to get the result where n lies between -1023 and -1075. Depending on the values of z and n the result will be either a subnormal double or +0. So, the simplest solution is to do the multiply in two stages. Here is a revised algorithm:

  if (0 >= (j_h >> 20)) {
    double two_n_r   // power of 2 as long bits mapped to r2
    long biased_n    // n' - 1023 mapped to r2
    double two_n_d   // used to rescale z to underflow value
                     // mapped to v17
    // split the rescale into two steps: 2^-512 then the rest
    n = n + 512
    two_n_r = 0x1ff << 52   // high bits for 2^-512
    two_n_d = LONG_TO_DOUBLE(two_n_r)
    z' = z' * two_n_d
    biased_n = n + 1023   // bias remainder -- will now be positive!
    two_n_r = (biased_n & 0x3ff) << 52   // high bits for 2^n
    two_n_d = LONG_TO_DOUBLE(two_n_r)
    result = z' * two_n_d
  } else {
    ...

The code for this is:

  cmpw(zr, tmp10, ASR, 20);
  br(LT, NO_SCALBN);
  // n = tmp1
  // rescale first by 2^-512 and then by the rest
  addw(tmp1, tmp1, 512); // n -> n + 512
  movz(tmp2, 0x1FF0, 48);
  fmovd(v17, tmp2); // 2^-512
  fmuld(v18, v18, v17); // z = z * 2^-512
  addw(tmp2, tmp1, 1023); // biased exponent
  ubfm(tmp2, tmp2, 12, 10);
  fmovd(v17, tmp2); // 2^n
  fmuld(v0, v18, v17); // result = z * 2^n
  b(CHECK_RESULT_NEGATION);
  bind(NO_SCALBN);
  . . .

I think this is simpler than your alternative.
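The two-stage rescale can be checked from Java with a minimal sketch like the one below. It only illustrates the arithmetic and is not the stub code: the names TwoStepScale, powerOfTwo and scaleUnderflow are hypothetical, and it assumes z is a normal double close to 1.0 and n lies in the underflow range discussed above. powerOfTwo builds 2^e directly from its biased exponent, which is only valid for -1022 <= e <= 1023; that restriction is why n is first moved up by 512.

```java
class TwoStepScale {
    // Build 2^e by installing the biased exponent in the high bits of a double.
    // Assumption: -1022 <= e <= 1023, so the biased value fits the 11 bit field.
    static double powerOfTwo(int e) {
        long biased = e + 1023L;
        return Double.longBitsToDouble((biased & 0x7ffL) << 52);
    }

    // Multiply z by 2^n in two stages: first by 2^-512, then by 2^(n + 512).
    // For z near 1.0 the first product is still a normal double, so only the
    // second multiply can round.
    static double scaleUnderflow(double z, int n) {
        double partial = z * powerOfTwo(-512);
        return partial * powerOfTwo(n + 512);
    }

    public static void main(String[] args) {
        double z = 1.5;
        int n = -1030;
        System.out.println("two-step:   " + scaleUnderflow(z, n));
        System.out.println("Math.scalb: " + Math.scalb(z, n));
    }
}
```

For an input such as scaleUnderflow(1.5, -1030) the result should match Math.scalb(1.5, -1030), since the intermediate product is exact and the final value is representable as a subnormal.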
I checked it on several test cases and it agrees with the existing StrictMath code.

3) Use of Estrin form in place of Horner form

I would prefer not to use this without a proof that the re-ordered computation does not introduce rounding errors. I doubt that it will and I suspect that even if it did the error will be very small, certainly small enough that the leeway between what is expected of StrictMath.pow and what is allowed for in Math.pow will not cause anyone's expectations to be challenged. However, even so I'd prefer not to surprise users. Anyway, if Andrew Haley really wants this to be adopted then I'll accept his override and you can leave it in Estrin form.

4) Repetition of code in K_IS_0/K_IS_1 branches

In the Y_IS_NOT_HUGE block 15/17 instructions are generated in the if and else branches for k == 0 and k == 1, implementing what is almost exactly the same computation.

The two generated sequences differ very slightly. In the k == 1 case dp_h and dp_l need to be folded into the computation to subtract ln2(1.5) from the result while in the k == 0 case dp_h and dp_l are both 0.0 and can be ignored. The most important difference is the need to load dp_l/dp_h from the coefficient table in one branch while the other merely moves forward the cursor which points at the table. The other differences consist of two extra operations in the k == 1 branch, an extra fmlavs and an extra faddd, which fold in the dp_l and dp_h values.

An alternative would be to use common code for the computation which always performs the extra fmlavs and faddd. The revised algorithm describes precisely this alternative. To make it work the k = 0 branch needs to install 0.0 in the registers used for dp_h and dp_l (not necessarily by loading from the table). This shortens the branches, relocating 15 common instructions after the branch.

As far as code clarity is concerned it is easier to understand and maintain if the common code is generated only once.
As for performance, I believe this trade off of a few more cycles against code size is also a better choice. Shaving instructions may give a small improvement in a benchmark, especially if the benchmark repeatedly runs with values that exercise only one of the paths. However, in real use the extra code size from the duplicated code is likely to push more code out of cache. Since this is main path code that is actually quite likely to happen.

So, I would like to see the duplication removed unless you can make a really strong case for keeping it. If you can provide such a reason then an explanation of why the duplication is required needs to be provided in the revised algorithm, and the code and the algorithm need to be updated to specify both paths.

5) Repetition of odd/even tests in exit cases.

The original algorithm takes the hit of computing the even/odd/fraction status of y inline, i.e. in the main path, during special case checks. That happens even if the result is not used later. You have chosen to do it at the relevant exits, resulting in more repeated code. These cases will likely be a rare path so the issue of extra code size is not going to be very important relative to the saved cycles. However, the replication of inline code is a maintenance issue. It would be better to use a separate function to generate the common code that computes the test values (lowest non-zero bit idx and exponent of y), avoiding any need to read, check and fix the generator code in different places. Please update the code accordingly.

6) Test code

You need to include a test as part of this patch which checks the outputs are correct for a few selected inputs that exercise the underflow and Y_IS_HUGE branches.
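A standalone check of that kind might look like the sketch below. It relies only on java.lang.Math and java.lang.StrictMath; the class name PowAgreement, the helper agrees and the 2 ulp tolerance are illustrative choices, not part of the patch or of the existing test suite. The inputs are chosen to hit the underflow and Y_IS_HUGE paths.

```java
class PowAgreement {
    // True when Math.pow and StrictMath.pow agree to within maxUlps ulps
    // of the StrictMath result. NaN and infinite results must match exactly.
    static boolean agrees(double x, double y, int maxUlps) {
        double strict = StrictMath.pow(x, y);
        double fast = Math.pow(x, y);
        if (Double.isNaN(strict) || Double.isInfinite(strict)) {
            return Double.compare(strict, fast) == 0;
        }
        return Math.abs(fast - strict) <= maxUlps * Math.ulp(strict);
    }

    public static void main(String[] args) {
        double hugeY = (double) 0x1_1000_0000L;
        double[][] cases = {
            {1.5E-2, 170.0},              // result underflows to a subnormal
            {1.55E-2, 171.3},             // result underflows to a subnormal
            {1.0000000001, hugeY},        // very large integral y
            {1.0000000001, hugeY + 0.3},  // very large non-integral y
        };
        int failures = 0;
        for (double[] c : cases) {
            if (!agrees(c[0], c[1], 2)) {
                failures++;
                System.out.println("mismatch for pow(" + c[0] + ", " + c[1] + ")");
            }
        }
        System.out.println(failures == 0 ? "all cases agree" : failures + " failure(s)");
    }
}
```

The tolerance is deliberately loose because Math.pow is only required to be within 1 ulp of the exact result, so the two methods need not return identical bits.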
I adapted the existing Math tests to include these extra checks:

    public static int testUnderflow() {
        int failures = 0;
        failures += testPowCase(1.5E-2D, 170D, 8.6201344461973E-311D);
        failures += testPowCase(1.55E-2D, 171.3D, 1.00883443217485E-310D);
        // failures += testPowCase(150D, 141.6D, 1.3630829399139085E308);
        return failures;
    }

    public static int testHugeY() {
        int failures = 0;
        long yl = 0x1_1000_0000L;
        failures += testPowCase(1.0000000001, (double)yl, 1.5782873649891997D);
        failures += testPowCase(1.0000000001, (double)yl + 0.3, 1.5782873650365483D);
        return failures;
    }

You don't have to add this to the existing math test suite. A simple standalone test which checks that the StrictMath and Math outputs agree on these inputs is all that is needed.

Rework
------

1) Please check that the revised algorithm I have provided accurately reflects the computations performed by your code.

That will require changing the code to deal with the error cases 1, 2, 4 and 5 above. If you stick with the Estrin form in case 3 then ensure the algorithm is correct. If you stick with Horner form then update the algorithm so it is consistent.

The algorithm currently details all mappings of variable to registers. That is provided as a heuristic which i) allowed me to track usages when writing up the algorithm and ii) will allow you to analyze the current register mappings and rationalize them. Once you have a more coherent strategy for allocating variables to registers, details of the mapping can and should be removed.

As mentioned, the algorithm uses a suffix notation to distinguish different values where there is some scope for ambiguity.
The suffixes are used as follows

  1) '_d' double values (mapped to d registers)
  2) '_hd' and '_ld' hi/lo double pairs used to retain accuracy (mapped to independent d registers)
  3) '_d2' double vectors (mapped to 2d v registers)
  4) '_r' long values that represent the long version of a double value (mapped to general x registers)
  5) '_h' int values that represent the high word of a double value (mapped to general w registers)

In many unambiguous cases a suffix is omitted e.g. x, y, k, n.

2) Sort out inconsistent mappings and unnecessary remappings

One of the problems I faced in writing up the algorithm is that some of your register use appears to be inconsistent -- the same 'logical' variable is mapped to different registers at different points in the code. In some cases, this reflects different use of the same name for different quantities calculated at different stages in the algorithm (for example, z is used to name both log2(x) and then reused later for the fractional part of log2(x)). Most of those reuses are actually catered for by declaring the var in one block and then redeclaring it in a different sibling block. If this block structure is replicated in the code then it will enable z to be mapped differently for each different scope.

However, that's not the preferred outcome. It would make the code easier to follow if every variable named in the algorithm was always mapped to the same register unless there was a clear need to relocate it.

There are also cases where a remapping is performed (without any discernible reason) within the same scope or within nested scopes! For example, after the sign of x_r has been noted (and, possibly, its sign bit removed) the resulting absolute value of x_r is moved from r0 to r1 (using an explicit mov). There doesn't seem to be any need to do this. Likewise, in the COMPUTE_POW_PROCEED block the value for z stored in v0 is reassigned to v18 in the same block! I have flagged these remappings with a !!!
comment and avoided the ambiguity that arises by adding a ' to the remapped value (x_r', z') and using it thereafter. This ensures that uses of the different versions of the value located in different registers can be distinguished.

An example of remapping in nested scope is provided by p_hd and p_ld. At the outermost scope these values are mapped to v5 and v6. However, in the K_IS_SET block they are stored in v17 and v16 (values for v and u, local to the same block, are placed in v5 and v6 so there is no reason why the outer scope mapping could not have been retained).

I'd like to see a much more obvious and obviously consistent plan adopted for mappings before the code is committed.

3) Insert my original and revised algorithms into your code in place of the current ones.

4) Change the code to introduce local blocks as per the revised algorithm

This should mostly be a simple matter of introducing braces into the code at strategic locations (although please re-indent).

5) Change the code to use symbolic names for register arguments and declare those names as Register or FloatRegister in the appropriate local scope

The main point of employing consistent, logical names for values defined in the algorithm is to allow registers employed in the code to be identified using meaningful names rather than using r0, r1, v0, v1, tmp1, tmp2 etc.

So, for example at the top level of the routine you need to declare register mappings for the function global variables and then use them in all the generated instructions e.g. the code should look like

  // double x // input arg
  // double y // input arg
  FloatRegister x = v0;
  FloatRegister y = v1;
  // long y_r // y as long bits
  Register y_r = rscratch2;
  // long x_r // x as long bits
  Register x_r = r0;
  // double one_d // 1.0d
  FloatRegister one_d = v2;
  . . .
  // y_r = DOUBLE_TO_LONG(y)
  fmovd(y_r, y);
  // x_r = DOUBLE_TO_LONG(x)
  fmovd(x_r, x);
  . . .
Similarly, in nested blocks you need to introduce block local names for registers and then use them in the code. For example in the SUBNORMAL_HANDLED block

  bind(SUBNORMAL_HANDLED);
  // block introduced to scope vars/labels in this region
  {
    Label K_IS_SET;
    // int ix_h // |x| with exponent rescaled so 1 =< ix <
    Register ix_h = r2;
    // int k // range 0 or 1 for poly expansion
    Register k = r7;
    // block introduced to scope vars/labels in this subregion
    {
      // int x_h // high word of x
      Register x_h = r2;
      //int mant_x_h // mantissa bits of high word of
      Register mant_x_h = r10;
      . . .
      // x_h = (x_r' >> 32)
      lsr(x_h, x_r, 32);
      // mant_x_h = (x_r >> 32) & 0x000FFFFF
      ubfx(mant_x_h, x_r, 32, 20); // i.e. top 20 fraction bits of x
      . . .
    }
    bind(K_IS_SET);
    . . .

You should be able to hang the code directly off the algorithm as shown above, making it clear that it matches the revised algorithm and allowing meaningful comparison with the original. If you have changed the code in your latest revisions then please update the algorithm accordingly to ensure they continue to correspond with each other.

From adinn at redhat.com Fri Sep 21 15:44:14 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 21 Sep 2018 16:44:14 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To: <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
Message-ID:

Hi Alan,

Thanks for the response and apologies for failing to notice you had posted it some days ago (doh!).

Jonathan Halliday has already explained how Red Hat might want to use this API. Well, what he said, essentially! In particular, this model provides a way of ensuring that raw byte data is able to be persisted coherently from Java with the minimal possible overhead.
It would be up to client code above this layer to implement structuring mechanisms for how those raw bytes get populated with data and to manage any associated issues regarding atomicity, consistency and isolation (i.e. to provide the A, C and I of ACID to this API's D).

The main point of the JEP is to ensure that such a performant base capability is available for known important cases where that is needed such as, for example, a transaction manager or a distributed cache. If equivalent middleware written in C can use persistent memory to bring the persistent storage tier nearer to the CPU and, hence, lower data durability overheads then we really need an equivalently performant option in Java or risk Java dropping out as a player in those middleware markets.

I am glad to hear that other alternatives might be available and would be happy to consider them. However, I'm not sure that this means this option is not still desirable, especially if it is orthogonal to those other alternatives. Most importantly, this one has the advantage that we know it is ready to use and will provide benefits (we have already implemented a journaled transaction log over it with promising results and someone from our messaging team has already been looking into using it to persist log messages).

Indeed, we also know we can use it to provide a base for supporting all the use cases addressed by Intel's libpmem and available to C programmers, e.g. a block store, simply by implementing Java client libraries that provide managed access to the persistent buffer along the same lines as the Intel C libraries.

I'm afraid I am not familiar with Panama 'scopes' and 'pointers' so I can't really compare options here. Can you point me at any info that explains what those terms mean and how it might be possible to use them to access off-heap, persistent data?
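For context, the existing file-backed mapping path that such a capability would sit alongside can be sketched as follows. This uses only the long-standing FileChannel.map and MappedByteBuffer.force calls on an ordinary file, so it shows the programming model rather than the API proposed in the JEP. The class name MappedLogSketch, the method writeRecord and the length-prefixed record layout are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedLogSketch {
    // Write one length-prefixed record at the start of a mapped region and
    // ask for it to be flushed to the backing store.
    // Assumption: record.length + 4 <= capacity.
    static void writeRecord(Path file, int capacity, byte[] record) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE grows the file to 'capacity' bytes if needed.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
            buffer.putInt(record.length);   // 4 byte big-endian length prefix
            buffer.put(record);             // raw payload bytes
            buffer.force();                 // request write-back of the mapped region
        }
    }
}
```

Everything above the raw bytes (record framing, atomicity, recovery) is left to the caller here, which mirrors the division of responsibility described in the message.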
regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From sandhya.viswanathan at intel.com Fri Sep 21 21:30:01 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 21 Sep 2018 21:30:01 +0000
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5771@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2467@FMSMSX126.amr.corp.intel.com> <5001f686-5205-090c-66db-8ef9f03bf6a6@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2520@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com> <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com> <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com> <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com> <3c0fe42a-0606-a060-7435-32547676683b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com> <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5771@FMSMSX126.amr.corp.intel.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5FCD@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Please find the updated webrev with fix for build failure on SPARC and other architectures at:

Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.04/
RFE: https://bugs.openjdk.java.net/browse/JDK-8210764

Vivek submitted this webrev for testing to submit repo yesterday at around noon. We haven't received any email back so far. This is our first time with submit repo.
http://mail.openjdk.java.net/pipermail/jdk-submit-changes/2018-September/003164.html

Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Viswanathan, Sandhya
Sent: Thursday, September 20, 2018 10:53 AM
To: Vladimir Kozlov ; hotspot-compiler-dev
Subject: RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Hi Vladimir,

In C1_LIRAssembler.hpp, when I added an additional parameter to negate, I did make sure to add it as a default parameter:

src/hotspot/share/c1/c1_LIRAssembler.hpp, line 282:
void negate(LIR_Opr left, LIR_Opr dest, LIR_Opr tmp = LIR_OprFact::illegalOpr);

But I guess since the function is not just called but declared/defined in all the other architectures, I need to add an unused LIR_Opr to the negate function for them. This would be on similar lines as done in some other C1_LIRAssembler methods. I will make this change and work with Vivek to use the submit repo for testing it on Sparc.

Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Thursday, September 20, 2018 10:09 AM
To: Viswanathan, Sandhya ; hotspot-compiler-dev
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

I hit build failure on SPARC due to shared changes in C1:

workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)".
jib > 1 Error(s) detected.

I assume other platforms are also affected.

Vladimir

On 9/19/18 9:53 AM, Vladimir Kozlov wrote:
> Thank you, Sandhya
>
> I submitted new testing.
>
> Vladimir
>
> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find below the updated webrev with fixes for the two issues:
>>
>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS
>> as the temporary register type for intrinsics instead of legVecD.
>>
>> This test was only failing with -XX:MaxVectorSize=4.
>>
>> The file modified is x86_64.ad.
>>
>> Fix for compiler/vectorization/TestNaNVector.java was to allow all
>> xmm registers (xmm0-xmm31) for C1 and handle floating point abs and
>> negate appropriately by providing a temp register.
>>
>> The C1 files are modified for this fix.
>>
>> I reran compiler jtreg tests with setting nativepath appropriately on Haswell, SKX and KNL.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:* Viswanathan, Sandhya
>> *Sent:* Tuesday, September 18, 2018 1:47 PM
>> *To:* 'JC Beyler'
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev
>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Jc,
>>
>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
>>
>> Best Regards,
>>
>> Sandhya
>>
>> *From:* JC Beyler [mailto:jcbeyler at google.com]
>> *Sent:* Monday, September 17, 2018 9:29 PM
>> *To:* Viswanathan, Sandhya
>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev
>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Hi Sandhya,
>>
>> How are you invoking the test for NativeCallTest?
>>
>> The way I would do it using jtreg would be something like this:
>>
>> $ export BUILD_TYPE=release
>>
>> $ export JDK_PATH=wherever you have your JDK
>>
>> From the test subfolder:
>>
>> $ wherever-your-jtreg-is/bin/jtreg -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
>>
>> Seems to pass for me.
>>
>> But much easier is:
>>
>> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"
>>
>> That seems to pass for me as well and is easier to use :)
>>
>> For information, the make run-test documentation is here:
>>
>> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html
>>
>> Let me know if that helps,
>>
>> Jc
>>
>>     Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
>>         "test result: Error. Use -nativepath to specify the location of native code"
>>     Do I need to give any additional info to jtreg to get over this problem?
>>
>>     Thanks a lot!
>>     Best Regards,
>>     Sandhya
>>
>>     -----Original Message-----
>>     From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>     Sent: Monday, September 17, 2018 10:14 AM
>>     To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
>>     Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>>     I finished testing on avx512 machine.
>>     All passed except known (TestNaNVector.java) failures.
>>
>>     Thanks,
>>     Vladimir
>>
>>     On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
>>     > I gave incorrect link to RFE. Here is correct:
>>     >
>>     > https://bugs.openjdk.java.net/browse/JDK-8210764
>>     >
>>
> Vladimir >> ???? > >> ???? > On 9/14/18 3:49 PM, Vladimir Kozlov wrote: >> ???? >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed. >> ???? >> >> ???? >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too. >> ???? >> >> ???? >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation' >> ??? on CPU >> ???? >> with AVX1 only >> ???? >> >> ???? >> #? SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, >> tid=13884 >> ???? >> # Problematic frame: >> ???? >> # V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >> ???? >> >> ???? >> Current CompileTask: >> ???? >> C2:??? 154??? 5???????????? java.lang.String::equals (65 >> bytes) >> ???? >> >> ???? >> Stack: [0x00007f3b10044000,0x00007f3b10145000], >> sp=0x00007f3b1013fe70,? free space=1007k >> ???? >> Native frames: (J=compiled Java code, A=aot compiled Java >> code, j=interpreted, Vv=VM code, C=native code) >> ???? >> V? [libjvm.so+0x46f0f0]? MachNode::ideal_reg() const+0x20 >> ???? >> V? [libjvm.so+0x882a72] >> PhaseChaitin::gather_lrg_masks(bool)+0x872 >> ???? >> V? [libjvm.so+0xd82235]? PhaseCFG::global_code_motion()+0xfc5 >> ???? >> V? [libjvm.so+0xd824b1] >> PhaseCFG::do_global_code_motion()+0x51 >> ???? >> V? [libjvm.so+0xa2c26c]? Compile::Code_Gen()+0x24c >> ???? >> V? [libjvm.so+0xa2ff82]? Compile::Compile(ciEnv*, >> C2Compiler*, ciMethod*, int, bool, bool, bool, >> ??? DirectiveSet*)+0xe42 >> ???? >> >> ???? >> >> --------------------------------------------------------------------- >> --------------------------- >> ???? >> 2. >> >> ??? with '-Xcomp' >> ???? >> #? Internal Error >> (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), >> pid=22016, tid=22073 >> ???? >> #? assert(false) failed: cannot spill interval that is used >> in first instruction (possible reason: no register >> ??? found) >> ???? >> >> ???? 
>> Current CompileTask: >> ???? >> C1: 854767 13391?????? 3 org.sunflow.math.Matrix4::multiply >> (692 bytes) >> ???? >> >> ???? >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000], >> sp=0x00007f23b9e7f9d0,? free space=1014k >> ???? >> Native frames: (J=compiled Java code, A=aot compiled Java >> code, j=interpreted, Vv=VM code, C=native code) >> ???? >> V? [libjvm.so+0x1882202]? VMError::report_and_die(int, char >> const*, char const*, __va_list_tag*, Thread*, unsigned >> ???? >> char*, void*, void*, char const*, int, unsigned long)+0x562 >> ???? >> V? [libjvm.so+0x1882d2f]? VMError::report_and_die(Thread*, >> void*, char const*, int, char const*, char const*, >> ???? >> __va_list_tag*)+0x2f >> ???? >> V? [libjvm.so+0xb0bea0]? report_vm_error(char const*, int, >> char const*, char const*, ...)+0x100 >> ???? >> V? [libjvm.so+0x7e0410] >> LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0 >> ???? >> V? [libjvm.so+0x7e0a20] >> LinearScanWalker::activate_current()+0x280 >> ???? >> V? [libjvm.so+0x7e0c7d]? IntervalWalker::walk_to(int) [clone >> .constprop.299]+0x9d >> ???? >> V? [libjvm.so+0x7e1078] >> LinearScan::allocate_registers()+0x338 >> ???? >> V? [libjvm.so+0x7e2135]? LinearScan::do_linear_scan()+0x155 >> ???? >> V? [libjvm.so+0x70a6bb]? Compilation::emit_lir()+0x99b >> ???? >> V? [libjvm.so+0x70caff] >> Compilation::compile_java_method()+0x42f >> ???? >> V? [libjvm.so+0x70d974]? Compilation::compile_method()+0x1d4 >> ???? >> V? [libjvm.so+0x70e547] >> Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, >> BufferBlob*, >> ???? >> DirectiveSet*)+0x357 >> ???? >> V? [libjvm.so+0x71073c]? Compiler::compile_method(ciEnv*, >> ciMethod*, int, DirectiveSet*)+0x14c >> ???? >> V? [libjvm.so+0xa3cf89] >> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409 >> ???? >> >> ???? >> Vladimir >> ???? >> >> ???? >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote: >> ???? >>> >> ???? >>> Thanks Vladimir, the below should fix this issue: >> ???? >>> >> ???? 
>>> ------------------------------ >> ???? >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 >> 13:10:23.488379912 -0700 >> ???? >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 >> 13:10:23.308379915 -0700 >> ???? >>> @@ -233,22 +233,6 @@ >> ???? >>> ??? _xmm_regs[13]? = xmm13; >> ???? >>> ??? _xmm_regs[14]? = xmm14; >> ???? >>> ??? _xmm_regs[15]? = xmm15; >> ???? >>> -? _xmm_regs[16]? = xmm16; >> ???? >>> -? _xmm_regs[17]? = xmm17; >> ???? >>> -? _xmm_regs[18]? = xmm18; >> ???? >>> -? _xmm_regs[19]? = xmm19; >> ???? >>> -? _xmm_regs[20]? = xmm20; >> ???? >>> -? _xmm_regs[21]? = xmm21; >> ???? >>> -? _xmm_regs[22]? = xmm22; >> ???? >>> -? _xmm_regs[23]? = xmm23; >> ???? >>> -? _xmm_regs[24]? = xmm24; >> ???? >>> -? _xmm_regs[25]? = xmm25; >> ???? >>> -? _xmm_regs[26]? = xmm26; >> ???? >>> -? _xmm_regs[27]? = xmm27; >> ???? >>> -? _xmm_regs[28]? = xmm28; >> ???? >>> -? _xmm_regs[29]? = xmm29; >> ???? >>> -? _xmm_regs[30]? = xmm30; >> ???? >>> -? _xmm_regs[31]? = xmm31; >> ???? >>> ? #endif // _LP64 >> ???? >>> >> ???? >>> ??? for (int i = 0; i < 8; i++) { >> ???? >>> --------------------------------- >> ???? >>> >> ???? >>> I think the gcc version on my desktop is older so didn?t catch this. >> ???? >>> >> ???? >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to: >> ???? >>> Patch: >> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/ >> ??? >> ???? >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735 >> ???? >>> >> ???? >>> FYI, I did notice that the default for UseAVX had been >> rolled back and wanted to get confirmation from you before >> ???? >>> changing it back to 3. >> ???? >>> >> ???? >>> Best Regards, >> ???? >>> Sandhya >> ???? >>> >> ???? >>> >> ???? >>> -----Original Message----- >> ???? >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com >> ] >> ???? >>> Sent: Friday, September 14, 2018 12:13 PM >> ???? >>> To: Viswanathan, Sandhya > >; >> ??? 
hotspot-compiler-dev at openjdk.java.net
>> >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>> >>>
>> >>> I got build failure:
>> >>>
>> >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array
>> >>> (which contains 16 elements) [-Werror,-Warray-bounds]
>> >>> jib >   _xmm_regs[16]  = xmm16;
>> >>>
>> >>> I also noticed that we don't have an RFE for this work. I filed:
>> >>>
>> >>> https://bugs.openjdk.java.net/browse/JDK-8209735
>> >>>
>> >>> You did not enable avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next
>> >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>> >>>
>> >>> - product(intx, UseAVX, 2, \
>> >>> + product(intx, UseAVX, 3, \
>> >>>
>> >>> Thanks,
>> >>> Vladimir
>> >>>
>> >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>> >>>> Looks good to me. I will start testing and let you know results.
>> >>>>
>> >>>> Thanks,
>> >>>> Vladimir
>> >>>>
>> >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>> >>>>> Hi Vladimir,
>> >>>>>
>> >>>>> Please find below the updated webrev with all your comments incorporated:
>> >>>>>
>> >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>> >>>>>
>> >>>>> I have run the jtreg compiler tests on SKX and KNL, which have two
>> >>>>> different flavors of AVX512, and Haswell, a non-avx512 system. Also tested SPECjvm2008 on the three platforms.
>> >>>>>
>> >>>>> Best Regards,
>> >>>>> Sandhya
>> >>>>>
>> >>>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>> >>>>> To: Viswanathan, Sandhya;
>> >>>>> hotspot-compiler-dev at openjdk.java.net
>> >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>> >>>>>
>> >>>>> Thank you, Sandhya
>> >>>>>
>> >>>>> I am satisfied with your detailed answer for memory loads issues. Okay, let's not add them.
>> >>>>>
>> >>>>> Vladimir
>> >>>>>
>> >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>> >>>>>> Hi Vladimir,
>> >>>>>>
>> >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>> >>>>>> Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice.
>> >>>>>>
>> >>>>>> Best Regards,
>> >>>>>> Sandhya
>> >>>>>>
>> >>>>>>
>> >>>>>> -----Original Message-----
>> >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>> >>>>>> To: Viswanathan, Sandhya;
>> >>>>>> hotspot-compiler-dev at openjdk.java.net
>> >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>> >>>>>> instruction
>> >>>>>>
>> >>>>>> Thank you.
>> >>>>>>
>> >>>>>> I want to discuss next issue:
>> >>>>>>
>> >>>>>>     > You did not add instructions to load these registers from
>> >>>>>> memory (and stack). What happens in such cases when you need to load or store?
>> >>>>>>     >>>> Let us take an example, e.g. for loading into rregF. First
>> >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>> >>>>>>
>> >>>>>> This is what I thought. You increase register pressure this way, which may cause spills on stack.
>> >>>>>> Also, we don't check that the register could be the same; as a result you may get unneeded moves.
>> >>>>>>
>> >>>>>> I would advise adding memory moves at least.
>> >>>>>>
>> >>>>>> Sandhya >>> I had added those rules initially and removed them in
>> >>>>>> the final patch. I noticed that the register allocator uses the
>> >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask
>> >>>>>> (matcher.cpp). I would like the register allocator to get all the
>> >>>>>> possible registers on an architecture for the idealreg2reg mask. I
>> >>>>>> wondered whether multiple instruct rules in the .ad file for LoadF from
>> >>>>>> memory might cause problems. I would have to have a higher cost for
>> >>>>>> loading into a restricted register set like vlReg. Then I decided that
>> >>>>>> the register allocator can handle this in a much better way than me
>> >>>>>> adding rules to load from memory. This is with the background that regF is always all the available registers
>> >>>>>> and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know your thoughts on this and if
>> >>>>>> I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I
>> >>>>>> am referring to is:
>> >>>>>>      MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>> >>>>>> #endif
>> >>>>>>      MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>> >>>>>>      MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered,
>> >>>>>> LoadNode::DependsOnlyOnTest, false));
>> >>>>>>      MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>> >>>>>>      MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>> >>>>>>      MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>> >>>>>>      ....
>> >>>>>>      idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>> >>>>>>
>> >>>>>> Another question. You use movflt() and movdbl(), which use either
>> >>>>>> movap[s|d] or movs[s|d]
>> >>>>>> instructions:
>> >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>> >>>>>> Do these instructions work when
>> >>>>>> avx512vl is not available? I see for vectors you use
>> >>>>>> vpxor+vinserti* combination.
>> >>>>>>
>> >>>>>> Sandhya >>> Yes, the scalar floating point instructions are available
>> >>>>>> with AVX512 encoding when avx512vl is not available. That is why you
>> >>>>>> would see not just movflt, movdbl but all the other scalar
>> >>>>>> operations like addss, addsd etc. using the entire xmm range (xmm0-31). In other words they are AVX512F instructions.
>> >>>>>>
>> >>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad:
>> >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>> >>>>>>
>> >>>>>> Should it be (UseAVX < 3)?
>> >>>>>>
>> >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Vladimir
>> >>>>>>
>> >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>> >>>>>>> Hi Vladimir,
>> >>>>>>>
>> >>>>>>> Thanks a lot for your review and feedback. Please see my response
>> >>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>> >>>>>>>
>> >>>>>>> Best Regards,
>> >>>>>>> Sandhya
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> -----Original Message-----
>> >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>> >>>>>>> To: Viswanathan, Sandhya;
>> >>>>>>> hotspot-compiler-dev at openjdk.java.net
>> >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>> >>>>>>> instruction
>> >>>>>>>
>> >>>>>>> Very nice. Thank you, Sandhya.
>> >>>>>>>
>> >>>>>>> I would like to see more meaningful naming in .ad files - instead
>> >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
>> >>>>>>>
>> >>>>>>>>>> Yes, accepted.
>> >>>>>>>
>> >>>>>>> New load_from_* and load_to_* instructions in .ad files should be
>> >>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions:
>> >>>>>>>
>> >>>>>>> instruct MoveF2VL(vlRegF dst, regF src)
>> >>>>>>> instruct MoveVL2F(regF dst, vlRegF src)
>> >>>>>>>>>> Yes, accepted.
>> >>>>>>>
>> >>>>>>> You did not add instructions to load these registers from memory
>> >>>>>>> (and stack). What happens in such cases when you need to load or store?
>> >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it
>> >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa.
>> >>>>>>>
>> >>>>>>> Also please explain why these registers are used when UseAVX == 0:
>> >>>>>>>
>> >>>>>>> +instruct absD_reg(rregD dst) %{
>> >>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>> >>>>>>>
>> >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too:
>> >>>>>>>       if (UseAVX < 3) {
>> >>>>>>>         _features &= ~CPU_AVX512F;
>> >>>>>>>
>> >>>>>>>>>> Yes, accepted. It could be regD here.
>> >>>>>>>
>> >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some):
>> >>>>>>>
>> >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex,
>> >>>>>>> +vectors_reg_legacy, %{
>> >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>> >>>>>>> VM_Version::supports_avx512dq() &&
>> >>>>>>> VM_Version::supports_avx512vl() %} );
>> >>>>>>>
>> >>>>>>>>>> Yes, accepted.
>> >>>>>>>
>> >>>>>>> I would suggest testing these changes on different machines
>> >>>>>>> (non-avx512 and avx512) and with different UseAVX values.
>> >>>>>>>
>> >>>>>>>>>> Will do.
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Vladimir
>> >>>>>>>
>> >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>> >>>>>>>> Recently there have been a couple of high priority issues with
>> >>>>>>>> regards to high bank of XMM register
>> >>>>>>>> (XMM16-XMM31) usage by C2:
>> >>>>>>>>
>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>> >>>>>>>>
>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>> >>>>>>>>
>> >>>>>>>> Please find below a patch which attempts to clean up the XMM
>> >>>>>>>> register handling by using register groups.
>> >>>>>>>>
>> >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>> >>>>>>>>
>> >>>>>>>> The patch provides a restricted set of registers to the match
>> >>>>>>>> rules in the ad file based on the underlying architecture.
>> >>>>>>>>
>> >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>> >>>>>>>>
>> >>>>>>>> By removing the special handling, the patch reduces the overall
>> >>>>>>>> code size by about 1800 lines of code.
>> >>>>>>>>
>> >>>>>>>> Your review and feedback is very welcome.
>> >>>>>>>>
>> >>>>>>>> Best Regards,
>> >>>>>>>>
>> >>>>>>>> Sandhya
>> >>>>>>>>
>>
>>
>> --
>>
>> Thanks,
>>
>> Jc
>>

From rkennke at redhat.com Sun Sep 23 18:47:53 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Sun, 23 Sep 2018 20:47:53 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods()
In-Reply-To:
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
Message-ID: <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>

Hi David,

thanks for looking at this!

> Should compiler folk be looking at this as well?

Maybe. I added them.

> I'm not familiar with the details of the NMethodSweeper but it seems to
> me that this change potentially allows multiple concurrent executions of
> NMethodSweeper::prepare_mark_active_nmethods() and that code does not
> appear to be thread-safe.

There are two scenarios now:
- TLHS enabled: NMethodSweeper::prepare_mark_active_nmethods() only gets
  called from the sweeper thread.
- TLHS disabled: NMethodSweeper::prepare_mark_active_nmethods() only gets
  called from VMThread/at-safepoint.

The structures used in NMethodSweeper::prepare_mark_active_nmethods()
are only ever accessed from the sweeper thread, or at a safepoint, and
those are exclusive, which means it should be safe. And instead of
removing the assert, we can extend it to accept the sweeper thread. I
also noticed that we need to grab the CodeCache_lock before calling into
prepare_mark_active_nmethods(), so I added that and put that into the
assertion.

Incremental webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02.diff/
Full webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02/

Better now?

Thanks,
Roman

>
> Thanks,
> David
>
> On 21/09/2018 9:57 AM, Roman Kennke wrote:
>> Please review the following change to improve and/or eliminate stop to
>> mark stacks for NMethodSweeper.
>>
>> The proposed change is two-fold:
>> - If ThreadLocalHandshake is enabled, do the stack-nmethod-marking using
>> TLHS. This completely eliminates the full safepoint. In this scenario,
>> nmethod-marking will also be skipped during safepoint-cleanup. IOW, it
>> only happens when the sweeper loop asks for it. It is also most
>> efficient because each thread scans its own stack, without needing to
>> synchronize with other threads. Everything remains free-running.
>> - Otherwise, try to use GC-safepoint-workers to do the marking at SP.
>> The infrastructure for this has been there for some time, and both
>> G1 and ZGC (and Shenandoah, when it arrives) support it. The
>> safepoint-cleanup-phase already uses it, so let's just do the same in
>> sweeper-loop-induced safepoints.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8132849
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.01/
>>
>> Testing: hotspot/jtreg:tier1 using +ThreadLocalHandshakes and
>> -ThreadLocalHandshakes
>>
>> One issue that I am not sure of is the:
>>
>> assert(SafepointSynchronize::is_at_safepoint(), "must be executed at a safepoint");
>>
>> at the start of NMethodSweeper::prepare_mark_active_nmethods().
>>
>> I couldn't see any particular reason for it. The
>> wait_for_stack_scanning() stuff is called outside safepoints anyway, and
>> the other stuff doesn't seem critical. And besides, in the scenario
>> where we'd call this outside safepoint (+ThreadLocalHandshakes) we'd
>> only ever call it from the sweeper thread anyway.
>>
>> What do you think?
>>
>>

From david.holmes at oracle.com Sun Sep 23 19:38:56 2018
From: david.holmes at oracle.com (David Holmes)
Date: Sun, 23 Sep 2018 15:38:56 -0400
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods()
In-Reply-To: <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
Message-ID: <41505140-0b14-76c7-3c0b-6cf71486c9c7@oracle.com>

Hi Roman,

Thanks for clarifying that only two possible threads are involved - and
not concurrently. That eases my concern.

I'll leave detailed reviews to others.

David

On 23/09/2018 2:47 PM, Roman Kennke wrote:
> Hi David,
>
> thanks for looking at this!
>
>> Should compiler folk be looking at this as well?
>
> Maybe. I added them.
>
>> I'm not familiar with the details of the NMethodSweeper but it seems to
>> me that this change potentially allows multiple concurrent executions of
>> NMethodSweeper::prepare_mark_active_nmethods() and that code does not
>> appear to be thread-safe.
>
> There are two scenarios now:
> - TLHS enabled: NMethodSweeper::prepare_mark_active_nmethods() only gets
> called from the sweeper thread.
> - TLHS disabled: NMethodSweeper::prepare_mark_active_nmethods() only
> gets called from VMThread/at-safepoint.
>
> The structures used in NMethodSweeper::prepare_mark_active_nmethods()
> are only ever accessed from the sweeper thread, or at a safepoint, and
> those are exclusive, which means it should be safe. And instead of
> removing the assert, we can extend it to accept the sweeper thread. I
> also noticed that we need to grab the CodeCache_lock before calling into
> prepare_mark_active_nmethods(), so I added that and put that into the
> assertion.
>
> Incremental webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02.diff/
> Full webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02/
>
> Better now?
>
> Thanks,
> Roman
>
>
>>
>> Thanks,
>> David
>>
>> On 21/09/2018 9:57 AM, Roman Kennke wrote:
>>> Please review the following change to improve and/or eliminate stop to
>>> mark stacks for NMethodSweeper.
>>>
>>> The proposed change is two-fold:
>>> - If ThreadLocalHandshake is enabled, do the stack-nmethod-marking using
>>> TLHS. This completely eliminates the full safepoint. In this scenario,
>>> nmethod-marking will also be skipped during safepoint-cleanup. IOW, it
>>> only happens when the sweeper loop asks for it. It is also most
>>> efficient because each thread scans its own stack, without needing to
>>> synchronize with other threads. Everything remains free-running.
>>> - Otherwise, try to use GC-safepoint-workers to do the marking at SP.
>>> The infrastructure for this has been there for some time, and both
>>> G1 and ZGC (and Shenandoah, when it arrives) support it. The
>>> safepoint-cleanup-phase already uses it, so let's just do the same in
>>> sweeper-loop-induced safepoints.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8132849
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.01/
>>>
>>> Testing: hotspot/jtreg:tier1 using +ThreadLocalHandshakes and
>>> -ThreadLocalHandshakes
>>>
>>> One issue that I am not sure of is the:
>>>
>>> assert(SafepointSynchronize::is_at_safepoint(), "must be executed at a safepoint");
>>>
>>> at the start of NMethodSweeper::prepare_mark_active_nmethods().
>>>
>>> I couldn't see any particular reason for it. The
>>> wait_for_stack_scanning() stuff is called outside safepoints anyway, and
>>> the other stuff doesn't seem critical. And besides, in the scenario
>>> where we'd call this outside safepoint (+ThreadLocalHandshakes) we'd
>>> only ever call it from the sweeper thread anyway.
>>>
>>> What do you think?
>>>
>>>
>
>

From erik.osterlund at oracle.com Sun Sep 23 22:08:48 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 24 Sep 2018 00:08:48 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods()
In-Reply-To: <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
Message-ID: <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>

Hi Roman,

Thank you for sorting this out. It is very helpful.

Could you change the name of ThreadToCodeBlobClosure to
NMethodMarkingThreadClosure? (Motivation: the closure filters
JavaThreads that are not the sweeper, and actually only looks at
nmethods, and not other types of CodeBlobs, e.g. AoT methods, so it does
less than I expect).

Also, the NMethodSweeper::prepare_mark_active_nmethods() was built for
safepoint cleaning and returns either hotness counting or nmethod
marking closures. However, when moving nmethod marking out to be done
concurrently with TLH, it is slightly confusing to have the same member
function called from the concurrent context despite never ever wanting a
hotness counter closure from there.

I'm thinking the prepare_mark_active_nmethods() member function could be
split into two:

One member function that returns either nmethod marking closure or NULL
(depending on whether it's needed or not).
Another member function that calls the first one, and if NULL slaps on a
hotness counter closure.

Then from concurrent contexts we would call the first method (nmethod
marking or NULL), and from STW contexts we would call the second member
function (nmethod marking or hotness counter).

Another thing worth noticing is that the VM_MarkActiveNMethods VM
operation marks the nmethods on the stack twice.
First in safepoint cleanup, and subsequently in the operation itself
(VM_MarkActiveNMethods::doit). I would argue that only one pass is
enough. Therefore, I would propose to completely remove the nmethod
marking from the safepoint cleanup, and have safepoint cleanup *only*
fiddle around with hotness counters. If we do that, then nmethod marking
is done in VM_MarkActiveNMethods::doit if TLH is off, and in your new
handshake operation when TLH is on.

Then we can have zero nmethod marking in safepoint cleanup, and
subsequently figure out how to get rid of the hotness counters.

Thanks,
/Erik

On 2018-09-23 20:47, Roman Kennke wrote:
> Hi David,
>
> thanks for looking at this!
>
>> Should compiler folk be looking at this as well?
> Maybe. I added them.
>
>> I'm not familiar with the details of the NMethodSweeper but it seems to
>> me that this change potentially allows multiple concurrent executions of
>> NMethodSweeper::prepare_mark_active_nmethods() and that code does not
>> appear to be thread-safe.
> There are two scenarios now:
> - TLHS enabled: NMethodSweeper::prepare_mark_active_nmethods() only gets
> called from the sweeper thread.
> - TLHS disabled: NMethodSweeper::prepare_mark_active_nmethods() only
> gets called from VMThread/at-safepoint.
>
> The structures used in NMethodSweeper::prepare_mark_active_nmethods()
> are only ever accessed from the sweeper thread, or at a safepoint, and
> those are exclusive, which means it should be safe. And instead of
> removing the assert, we can extend it to accept the sweeper thread. I
> also noticed that we need to grab the CodeCache_lock before calling into
> prepare_mark_active_nmethods(), so I added that and put that into the
> assertion.
>
> Incremental webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02.diff/
> Full webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02/
>
> Better now?
>
> Thanks,
> Roman
>
>
>> Thanks,
>> David
>>
>> On 21/09/2018 9:57 AM, Roman Kennke wrote:
>>> Please review the following change to improve and/or eliminate stop to
>>> mark stacks for NMethodSweeper.
>>>
>>> The proposed change is two-fold:
>>> - If ThreadLocalHandshake is enabled, do the stack-nmethod-marking using
>>> TLHS. This completely eliminates the full safepoint. In this scenario,
>>> nmethod-marking will also be skipped during safepoint-cleanup. IOW, it
>>> only happens when the sweeper loop asks for it. It is also most
>>> efficient because each thread scans its own stack, without needing to
>>> synchronize with other threads. Everything remains free-running.
>>> - Otherwise, try to use GC-safepoint-workers to do the marking at SP.
>>> The infrastructure for this has been there for some time, and both
>>> G1 and ZGC (and Shenandoah, when it arrives) support it. The
>>> safepoint-cleanup-phase already uses it, so let's just do the same in
>>> sweeper-loop-induced safepoints.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8132849
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.01/
>>>
>>> Testing: hotspot/jtreg:tier1 using +ThreadLocalHandshakes and
>>> -ThreadLocalHandshakes
>>>
>>> One issue that I am not sure of is the:
>>>
>>> assert(SafepointSynchronize::is_at_safepoint(), "must be executed at a safepoint");
>>>
>>> at the start of NMethodSweeper::prepare_mark_active_nmethods().
>>>
>>> I couldn't see any particular reason for it. The
>>> wait_for_stack_scanning() stuff is called outside safepoints anyway, and
>>> the other stuff doesn't seem critical. And besides, in the scenario
>>> where we'd call this outside safepoint (+ThreadLocalHandshakes) we'd
>>> only ever call it from the sweeper thread anyway.
>>>
>>> What do you think?
>>>
>>>
>

From kuaiwei.kw at alibaba-inc.com Mon Sep 24 06:06:11 2018
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Mon, 24 Sep 2018 14:06:11 +0800
Subject: Re: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
In-Reply-To:
References: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com>,
Message-ID:

Hi Tobias,

Thanks for your suggestion. I think your point is that the region node
may get a new path in a later parse phase, so we cannot be sure the
region node will be optimized. It's a good question and I checked it.
Now I think it may not cause trouble.

In post barrier reduction, the oop store uses the allocation node as its
base pointer. The data graph guarantees that the control of the
allocation node dominates the control of the store. If the allocation
node is a pred of the region node and there's a new path into the
region, the graph is bad because we can reach the store without the
allocation. If the allocation node is a dominating ancestor further up,
the graph shape is a little more complicated, so we cannot reach the
control of the allocation by skipping one region.

A better solution would be to know that the region node is created in
exit_map and will not be changed later. Is there any way to know that at
compile time?

Thanks,
Kevin

------------------------------------------------------------------
From: Tobias Hartmann
Sent: 2018-09-20 (Thursday) 23:22
To: Kuai Wei; hotspot compiler
Subject: Re: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects

Hi,

isn't this code executed during parsing and therefore it could happen
that more inputs are added to the region? For example, by
Parse::Block::add_new_path():
http://hg.openjdk.java.net/jdk/jdk/file/75e4ce0fa1ba/src/hotspot/share/opto/parse1.cpp#l1917

Best regards,
Tobias

On 18.09.2018 09:33, Kuai Wei wrote:
>
> Hi,
>
> I made a patch to https://bugs.openjdk.java.net/browse/JDK-8210853 . Could you help review my change?
>
> Background:
> C2 can remove the G1 post barrier for a store to a newly allocated object. But the
> just_allocated_object check is defeated by a Region node that is created when the initializer
> method of the super class is inlined. The change is to check for this pattern and skip the Region node.
>
> src/hotspot/share/opto/graphKit.cpp
>
>  // We use this to determine if an object is so "fresh" that
>  // it does not require card marks.
>  Node* GraphKit::just_allocated_object(Node* current_control) {
> -  if (C->recent_alloc_ctl() == current_control)
> +  Node * ctrl = current_control;
> +  // Object::<init> is invoked after allocation, most of invoke nodes
> +  // will be reduced, but a region node is kept in parse time, we check
> +  // the pattern and skip the region node
> +  if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
> +    ctrl = ctrl->in(1);
> +  }
> +  if (C->recent_alloc_ctl() == ctrl)
>      return C->recent_alloc_obj();
>    return NULL;
>  }

From Alan.Bateman at oracle.com Mon Sep 24 08:14:44 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 24 Sep 2018 09:14:44 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To:
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
Message-ID: <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>

On 21/09/2018 16:44, Andrew Dinn wrote:
> Hi Alan,
>
> Thanks for the response and apologies for failing to notice you had
> posted it some days ago (doh!).
>
> Jonathan Halliday has already explained how Red Hat might want to use
> this API. Well, what he said, essentially! In particular, this model
> provides a way of ensuring that raw byte data is able to be persisted
> coherently from Java with the minimal possible overhead.
> It would be up
> to client code above this layer to implement structuring mechanisms for
> how those raw bytes get populated with data and to manage any associated
> issues regarding atomicity, consistency and isolation (i.e. to provide
> the A, C and I of ACID to this API's D).
>
> The main point of the JEP is to ensure that such a performant base
> capability is available for known important cases where that is needed
> such as, for example, a transaction manager or a distributed cache. If
> equivalent middleware written in C can use persistent memory to bring
> the persistent storage tier nearer to the CPU and, hence, lower data
> durability overheads then we really need an equivalently performant
> option in Java or risk Java dropping out as a player in those middleware
> markets.
>
> I am glad to hear that other alternatives might be available and would
> be happy to consider them. However, I'm not sure that this means this
> option is not still desirable, especially if it is orthogonal to those
> other alternatives. Most importantly, this one has the advantage that we
> know it is ready to use and will provide benefits (we have already
> implemented a journaled transaction log over it with promising results
> and someone from our messaging team has already been looking into using
> it to persist log messages). Indeed, we also know we can use it to
> provide a base for supporting all the use cases addressed by Intel's
> libpmem and available to C programmers, e.g. a block store, simply by
> implementing Java client libraries that provide managed access to the
> persistent buffer along the same lines as the Intel C libraries.
>
> I'm afraid I am not familiar with Panama 'scopes' and 'pointers' so I
> can't really compare options here. Can you point me at any info that
> explains what those terms mean and how it might be possible to use them
> to access off-heap, persistent data.
>
I'm not questioning the need to support NVM; instead, I'm trying to see
whether MappedByteBuffer is the right way to expose this in the standard
API. Buffers were designed in JSR-51 with specific use-cases in mind but
they are problematic for many off-heap cases as they aren't thread safe,
are limited to 2GB, lack confinement, and only support homogeneous data
(no layout support). At the same time, Project Panama (foreign branch in
panama/dev) has the potential to provide the right API to work with
memory. I see Jonathan's mail where he seems to be using object
serialization, so the solution on the table works for his use-case, but
it may not be the right solution for more general multi-threaded access
to NVM.

There is some interest in seeing whether this part of Project Panama
could be advanced to address many of the cases where developers are
resorting to using Unsafe today. There would of course need to be some
integration with buffers too. There's no concrete proposal/JEP at this
time; I'm just pointing out that many of the complaints about buffers
are really cases where it's the wrong API and the real need is something
more fundamental.

So where does this leave us? If support for persistent memory is added
to FileChannel.map as we've been discussing then it may not be too bad
as the API surface is small. The API surface is just new map modes and a
MappedByteBuffer::isPersistent method. A force method that specifies a
range is something useful to add to MBB anyway. If (and I hope when)
there is support for memory regions or pointers then I could imagine
re-visiting this so that there are alternative ways to get a memory
region or pointer that is backed by NVM. If the timing were different
then I think we'd skip the new map modes and we would be having a
different discussion here. An alternative is of course to create the
mapped buffer via a JDK-specific API as that would be easier to
deprecate and remove in the future if needed.
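
To make the API surface mentioned above concrete, here is a minimal Java sketch of the existing mapped-buffer write path that the proposal would extend. It assumes an ordinary file-backed mapping and uses only long-standing calls (FileChannel.map with MapMode.READ_WRITE and the no-argument MappedByteBuffer.force()); the new map modes, isPersistent and the range-taking force discussed in this thread are proposals and are deliberately not used. The class and method names (MappedRecordSketch, writeRecord, readRecord, newTempFile) are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; not part of the JDK or of the proposed JEP.
class MappedRecordSketch {

    // Writes one record at the given offset through a file mapping and
    // asks the OS to write the whole mapping back before returning.
    static void writeRecord(Path file, int offset, byte[] record) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Assumption: the runtime grows the file to cover a writable
            // mapping when the file is shorter (OpenJDK does this).
            MappedByteBuffer buf = ch.map(
                    FileChannel.MapMode.READ_WRITE, 0, (long) offset + record.length);
            buf.position(offset);
            buf.put(record);
            // No range argument here: force() covers the entire mapping.
            buf.force();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a record back through a separate read-only mapping.
    static byte[] readRecord(Path file, int offset, int length) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(
                    FileChannel.MapMode.READ_ONLY, 0, (long) offset + length);
            byte[] out = new byte[length];
            buf.position(offset);
            buf.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates an empty scratch file for the demo.
    static Path newTempFile() {
        try {
            Path file = Files.createTempFile("mapped-record", ".bin");
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = newTempFile();
        byte[] record = "commit-1".getBytes(StandardCharsets.UTF_8);
        writeRecord(file, 16, record);
        byte[] back = readRecord(file, 16, record.length);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

In this sketch the force() call is the durability point. With a persistent-memory map mode, that call is presumably where a cache-line writeback would replace the page-cache flush; that is an assumption about the proposal, not something the sketch demonstrates.
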
I'm interested to see if there is other input on this topic before it
gets locked into extending the standard API.

-Alan.

From rkennke at redhat.com Mon Sep 24 08:18:21 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 24 Sep 2018 10:18:21 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods()
In-Reply-To: <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
Message-ID:

Hi Erik,

> Thank you for sorting this out. It is very helpful.
>
> Could you change the name of ThreadToCodeBlobClosure to
> NMethodMarkingThreadClosure. (Motivation: the closure filters
> JavaThreads that are not the sweeper, and actually only looks at
> nmethods, and not other types of CodeBlobs, e.g. AoT methods, so it does
> less than I expect).

Yes will do, but let's first agree on the other issues:

> Also, the NMethodSweeper::prepare_mark_active_nmethods() was built for
> safepoint cleaning and returns either hotness counting or nmethod
> marking closures. However, when moving nmethod marking out to be done
> concurrently with TLH, it is slightly confusing to have the same member
> function called from the concurrent context despite never ever wanting a
> hotness counter closure from there.

The hotness-counting-only-closure will never be used when called from
the sweeper thread because this only happens between sweeping-cycles.
E.g. when safepointing while sweeper is active, it would do only hotness
counting, when sweeper is idle, it would do nmethod marking, which
*also* does the hotness counting. With TLHS, we'd always do the full
thing, because it's only ever queried between sweeper cycles.
> I'm thinking the prepare_mark_active_nmethods() member function could be > split into two: > > One member function that returns either nmethod marking closure or NULL > (depending on whether it's needed or not). > Another member function that calls the first one, and if NULL slaps on a > hotness counter closure. We could do that, yes. > Then from concurrent contexts we would call the first method (nmethod > marking or NULL), and from STW contexts we would call the second member > function (nmethod marking or hotness counter). Right. > Another thing worth noticing is that the VM_MarkActiveNMethods VM > operation marks the nmethods on the stack twice. First in safepoint > cleanup, and subsequently in the operation itself > (VM_MarkActiveNMethods::doit). I would argue that only one pass is > enough. Right... > Therefore, I would propose to completely remove the nmethod > marking from the safepoint cleanup, and have safepoint cleanup *only* > fiddle around with hotness counters. If we do that, then nmethod marking > is done in VM_MarkActiveNMethods::doit if TLH is off, and in your new > handshake operation when TLH is on. Yeah, except that hotness counting is also done in nmethod marking pass. Would it be enough if we just kept it there? Or do we want hotness counting stuff to be done always in SP cleanup phase, and not piggy-back it on nmethod marking? > Then we can have zero nmethod marking in safepoint cleanup, and > subsequently figure out how to get rid of the hotness counters. Is there a use of doing nmethod marking more frequently than what is forced in do_stack_scanning() ? Is there a use of doing frequent hotness counters? Roman
From rkennke at redhat.com Mon Sep 24 09:36:09 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 24 Sep 2018 11:36:09 +0200 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> Message-ID: <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> > Is there a use of doing nmethod marking more frequently than what is > forced in do_stack_scanning() ? As far as I can tell, it is sufficient to mark nmethods right before sweeping. It might even be counter-productive to do more marking passes: it would result in more non-entrant nmethods marked as 'seen on stack' even if they are no longer on stack. I am not 100% sure about the hotness counter though. From what I see, it's only used for sweeper too, and it really looks like resetting the counter on nmethod-walk is enough. But I'd like confirmation from somebody who knows better than I do. If it's really good enough, we may remove the nmethod stuff completely from SP cleanup, and also remove the hotness-counter-closure, and always piggy-back the stuff on nmethod walking, either in its own VM_Op, or in its handshake. On the other hand, why is hotness counting and nmethod marking split out in sp-cleanup in the first place then? Roman
From rkennke at redhat.com Mon Sep 24 10:02:25 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 24 Sep 2018 12:02:25 +0200 Subject: RFR(M): 8210885: Convert left over loads/stores to access api In-Reply-To: References: Message-ID: <4215c8ba-3d64-a506-1f48-716e30a2fa56@redhat.com> Hi Roland, the change looks good to me. Thanks, Roman > http://cr.openjdk.java.net/~roland/8210885/webrev.00/ > > This converts some remaining loads and stores to the access API (as > preparation for shenandoah). This also cleans up the C2 access API: some > entry points get a control argument that's in practice useless because > current control() is always used. > > Roland. > From tobias.hartmann at oracle.com Mon Sep 24 10:06:43 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 24 Sep 2018 12:06:43 +0200 Subject: RFR(M): 8210885: Convert left over loads/stores to access api In-Reply-To: References: Message-ID: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com> Hi Roland, looks good to me. Best regards, Tobias On 18.09.2018 21:57, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8210885/webrev.00/ > > This converts some remaining loads and stores to the access API (as > preparation for shenandoah). This also cleans up the C2 access API: some > entry points get a control argument that's in practice useless because > current control() is always used. > > Roland.
> From tobias.hartmann at oracle.com Mon Sep 24 10:15:35 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 24 Sep 2018 12:15:35 +0200 Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy In-Reply-To: References: Message-ID: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com> Hi Roland, Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers will be used for Shenandoah? Why do you need to distinguish between ArrayCopyPhase? Best regards, Tobias On 18.09.2018 22:09, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8210887/webrev.00/ > > This extends the entry point of the c2 access api for arraycopy (in > preparation for shenandoah). This also fixes some incorrect > _adr_type's. It also modifies the ArrayCopyNode::Ideal() transform that > produces a series of loads/stores so, as a subsequent change, barriers > can be added to loads and stores: they need to produce and consume > memory state other than the slice of the array being copied. > > Roland. > From erik.osterlund at oracle.com Mon Sep 24 11:25:16 2018 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 24 Sep 2018 13:25:16 +0200 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> Message-ID: <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> Hi Roman, I think that by answering the meta question "wait why are we doing this" in this email, I will cover the questions in the previous email too.
The nmethod marking is strictly required so that after you have selected your not entrant nmethods that you want to nuke, you know that at some snapshot in time, they were not on the stack (and can't have become so afterwards because they are not entrant). As I mentioned earlier, doing this both in safepoint cleanup for every safepoint, as well as in the VM operation itself, is "questionable". Doing it in just the VM operation/handshake should be enough. The hotness counting is not strictly necessary at all. In fact, you can turn it off with the JVM flag -XX:-UseCodeAging. So the hotness counter updating is part of the code aging mechanism. This is more of a heuristic thing than a correctness thing. You can just wait until you run out of space in the code heap, and then nuke a bunch of stuff (using the nmethod marking mechanism), and you are good. But similar to how you in your GC algorithm want to avoid running into full GCs because they are expensive, you also want to avoid filling up the code heap, because the consequences of that are also very expensive. The code aging mechanism was therefore introduced as a way of figuring out if there are seemingly inactive nmethods that can be discarded before running out of code heap memory. So the way that works is that you give each nmethod a counter that you decay every now and then, but heat up again when you see said nmethods on the stack. That way, the sweeper can look for nmethods that do not seem to have been found on the stack "for a while", and select them as good candidates for being inactive. So to answer the question whether you can update hotness counters only when you mark nmethods... you can. But by doing that, it no longer serves its purpose of finding inactive nmethods, and becomes more of a piece of logic that we run occasionally for the fun of it. So we should not do that. The reason that hotness counters are in safepoint cleanup is to provide fresh stack samples to the sweeper.
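To make the decay/heat-up scheme described above a bit more concrete, here is a rough sketch. It is not HotSpot source code: the names CompiledMethod, kHotnessReset, sample_stacks and cold_candidates are made up for illustration, and the sketch folds decay and heat-up into a single pass for brevity, which is an assumption rather than a description of how the sweeper and the stack walk actually divide that work.

```cpp
// Illustrative sketch only; this is NOT HotSpot source code.
// All names (CompiledMethod, kHotnessReset, sample_stacks, cold_candidates)
// are hypothetical.
#include <string>
#include <vector>

struct CompiledMethod {
    std::string name;
    int hotness;    // heated up when seen on a stack, decayed otherwise
    bool on_stack;  // stand-in for "a stack walk found this method"
};

// Assumed value a counter is reset to when its method is seen on a stack.
const int kHotnessReset = 4;

// One sampling pass: methods found on a stack are heated up again,
// all others decay by one step.
void sample_stacks(std::vector<CompiledMethod>& methods) {
    for (CompiledMethod& m : methods) {
        if (m.on_stack) {
            m.hotness = kHotnessReset;
        } else {
            m.hotness -= 1;
        }
    }
}

// Methods whose counter has decayed below the threshold have not been seen
// on a stack "for a while" and are candidates for being inactive.
std::vector<std::string> cold_candidates(const std::vector<CompiledMethod>& methods,
                                         int threshold) {
    std::vector<std::string> result;
    for (const CompiledMethod& m : methods) {
        if (m.hotness < threshold) {
            result.push_back(m.name);
        }
    }
    return result;
}
```

The sketch also hints at why the sampling frequency matters: if sample_stacks only ran right before a sweep, every counter would reflect a single snapshot, and the "for a while" part of the heuristic would be lost.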
So my suggestion for now is: Do nmethod marking in VM operation/handshake operation. Do hotness counter updating when UseCodeAging in safepoint cleanup. And now you might be wondering if it really makes sense to walk all stacks in the system every safepoint, to provide some heuristic about whether nmethods are inactive or not. Arguably not. I have an idea about a much better way of doing this. I will get back to you in a few days about that. Hope this helps. Thanks, /Erik On 2018-09-24 11:36, Roman Kennke wrote: >> Is there a use of doing nmethod marking more frequently than what is >> forced in do_stack_scanning() ? > As far as I can tell, it is sufficient to mark nmethods right before > sweeping. It might even be counter-productive to do more marking passes: > it would result in more non-entrant nmethods marked as 'seen on stack' > even if they are no longer on stack. > > I am not 100% sure about the hotness counter though. From what I see, > it's only used for sweeper too, and it really looks like resetting the > counter on nmethod-walk is enough. But I'd like confirmation from > somebody who knows better than I do. If it's really good enough, we may > remove the nmethod stuff completely from SP cleanup, and also remove the > hotness-counter-closure, and always piggy-back the stuff on nmethod > walking, either in its own VM_Op, or in its handshake. > > On the other hand, why is hotness counting and nmethod marking split out > in sp-cleanup in the first place then? 
> > Roman > From rkennke at redhat.com Mon Sep 24 11:47:16 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 24 Sep 2018 13:47:16 +0200 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> Message-ID: <89db00b1-cadd-8f78-92b3-253e2dd1a652@redhat.com> Hi Erik, > I think that by answering the meta question "wait why are we doing this" > in this email, I will cover the questions in the previous email too. > > The nmethod marking is strictly required so that after you have selected > your not entrant nmethods that you want to nuke, you know that at some > snapshot in time, they were not on the stack (and cant have become so > afterwards because they are not entrant). As I mentioned earlier, doing > this both in safepoint cleanup for every safepoint, as well as in the VM > operation itself, is "questionable". Doing it in just the VM > operation/handshake should be enough. > > The hotness counting is not strictly necessary at all. In fact, you can > turn it off with the JVM flag -XX:-UseCodeAging. > > So the hotness counter updating is part of the code aging mechanism. > This is more of a heuristic thing than a correctness thing. You can just > wait until you run out of space in the code heap, and then nuke a bunch > of stuff (using the nmethod marking mechanism), and you are good. But > similar to how you in your GC algorithm want to avoid running into full > GCs because they are expensive, you also want to avoid filling up the > code heap, because the consequences of that are also very expensive. 
The > code aging mechanism was therefore introduced as a way of figuring out > if there are seemingly inactive nmethods that can be discarded before > running out of code heap memory. > > So the way that works is that you give each nmethod a counter that you > decay every now and then, but heat up again when you see said nmethods > on the stack. That way, the sweeper can look for nmethods that do not > seem to have been found on the stack "for a while", and select them as > good candidates for being inactive. > > So to answer the question whether you can update hotness counters only > when you mark nmethods... you can. But by doing that, it no longer > serves its purpose of finding inactive nmethods, and becomes more of a > piece of logic that we run occasionally for the fun of it. So we should > not do that. > > The reason that hotness counters are in safepoint cleanup, is to provide > fresh stack samples to the sweeper. > > So my suggestion for now is: > Do nmethod marking in VM operation/handshake operation. > Do hotness counter updating when UseCodeAging in safepoint cleanup. > > And now you might be wondering if it really makes sense to walk all > stacks in the system every safepoint, to provide some heuristic about > whether nmethods are inactive or not. Arguably not. I have an idea about > a much better way of doing this. I will get back to you in a few days > about that. Thanks for your explanations. That's more or less what I figured out from studying the code too. Couldn't we have a CodeAgeInterval (or similar) setting, so that every this many ms we do the hotness-reset-scan, either by firing (from the sweeper thread) a TLHS or a VM_Op? This should get us more regular sampling than doing this at the somewhat random safepoint-prologue. Roman
From rkennke at redhat.com Mon Sep 24 13:21:01 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 24 Sep 2018 15:21:01 +0200 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> Message-ID: <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com> > So my suggestion for now is: > Do nmethod marking in VM operation/handshake operation. > Do hotness counter updating when UseCodeAging in safepoint cleanup. Ok, this change should do that: Incremental: http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03.diff/ Full: http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03/ Note that nmethod-marking still resets hotness counters. Good now for pushing? Roman > And now you might be wondering if it really makes sense to walk all > stacks in the system every safepoint, to provide some heuristic about > whether nmethods are inactive or not. Arguably not. I have an idea about > a much better way of doing this. I will get back to you in a few days > about that. > > Hope this helps. > > Thanks, > /Erik > > On 2018-09-24 11:36, Roman Kennke wrote: >>> Is there a use of doing nmethod marking more frequently than what is >>> forced in do_stack_scanning() ? >> As far as I can tell, it is sufficient to mark nmethods right before >> sweeping. It might even be counter-productive to do more marking passes: >> it would result in more non-entrant nmethods marked as 'seen on stack' >> even if they are no longer on stack.
>> >> I am not 100% sure about the hotness counter though. From what I see, >> it's only used for sweeper too, and it really looks like resetting the >> counter on nmethod-walk is enough. But I'd like confirmation from >> somebody who knows better than I do. If it's really good enough, we may >> remove the nmethod stuff completely from SP cleanup, and also remove the >> hotness-counter-closure, and always piggy-back the stuff on nmethod >> walking, either in its own VM_Op, or in its handshake. >> >> On the other hand, why is hotness counting and nmethod marking split out >> in sp-cleanup in the first place then? >> >> Roman >> > From tobias.hartmann at oracle.com Mon Sep 24 13:34:28 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 24 Sep 2018 15:34:28 +0200 Subject: Re: Reply: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects In-Reply-To: References: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com> Message-ID: Hi Kevin, On 24.09.2018 08:06, Kuai Wei wrote: > Thanks for your suggestion. I think your point is the region node may have new path in later parse > phase, so we can not make sure the region node will be optimized. Yes, my point is that a new path to the region might be added after your optimization and that path might contain stores to the newly allocated object. > It's a good question and I checked it. Now I think it may not cause trouble. In post barrier > reduce, the oop store use allocation node as base pointer. The data graph guarantee control of > allocation node should dominate control of store. If allocation node is in pred of region node and > there's a new path into region, the graph is bad because we can reach store without allocation.
Yes but the new path might be a backedge from a loop that is dominated by the allocation. > If allocation node is in a domination ancestor, the graph shape is a little complicated, so we can not > reach control of allocation by skipping one region. Right, that's basically the implicit assumption of your patch. I'm not sure if it always holds. But I think you should at least use RegionNode::is_copy(). Let's see what other reviewers think. > The better solution is we can know the region node is created in exit_map and we will not change > it in later. Is there any way to know it in compile time? The region node is created in Parse::build_exits(). I don't think there is a way to keep track of this. Thanks, Tobias From rkennke at redhat.com Mon Sep 24 16:04:22 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 24 Sep 2018 18:04:22 +0200 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com> References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com> Message-ID: <818610b4-cf48-091a-3719-09f1f863c508@redhat.com> Zhengyu noted off-list that the !ThreadLocalHandshakes version requires to call Threads::change_thread_parity() before using Threads::possibly_parallel_threads_do(), and that we can assert is_Java_thread() instead of explicit filtering. This change does that: Incremental: http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.04.diff/ Full: http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.04/ Let me know what you think! Thanks, Roman >> So my suggestion for now is: >> Do nmethod marking in VM operation/handshake operation.
>> Do hotness counter updating when UseCodeAging in safepoint cleanup. > > Ok, this change should do that: > Incremental: > http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03.diff/ > Full: > http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03/ > > Note that nmethod-marking still resets hotness counters. > > Good now for pushing? > > Roman > > >> And now you might be wondering if it really makes sense to walk all >> stacks in the system every safepoint, to provide some heuristic about >> whether nmethods are inactive or not. Arguably not. I have an idea about >> a much better way of doing this. I will get back to you in a few days >> about that. >> >> Hope this helps. >> >> Thanks, >> /Erik >> >> On 2018-09-24 11:36, Roman Kennke wrote: >>>> Is there a use of doing nmethod marking more frequently than what is >>>> forced in do_stack_scanning() ? >>> As far as I can tell, it is sufficient to mark nmethods right before >>> sweeping. It might even be counter-productive to do more marking passes: >>> it would result in more non-entrant nmethods marked as 'seen on stack' >>> even if they are no longer on stack. >>> >>> I am not 100% sure about the hotness counter though. From what I see, >>> it's only used for sweeper too, and it really looks like resetting the >>> counter on nmethod-walk is enough. But I'd like confirmation from >>> somebody who knows better than I do. If it's really good enough, we may >>> remove the nmethod stuff completely from SP cleanup, and also remove the >>> hotness-counter-closure, and always piggy-back the stuff on nmethod >>> walking, either in its own VM_Op, or in its handshake. >>> >>> On the other hand, why is hotness counting and nmethod marking split out >>> in sp-cleanup in the first place then? >>> >>> Roman >>> >> > >
From rkennke at redhat.com Mon Sep 24 16:46:46 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 24 Sep 2018 18:46:46 +0200 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: <662e1e46-a48e-34db-da0a-df693160928d@redhat.com> References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com> <818610b4-cf48-091a-3719-09f1f863c508@redhat.com> <662e1e46-a48e-34db-da0a-df693160928d@redhat.com> Message-ID: Hi Zhengyu, >> Zhengyu noted off-list that the !ThreadLocalHandshakes version requires >> to call Threads::change_thread_parity() before using >> Threads::possibly_parallel_threads_do(), and that we can assert >> is_Java_thread() instead of explicit filtering. > > My bad suggestion on assertion for Java thread, > Threads::possibly_parallel_threads_do also walks VMThread, sorry! Yes, I noticed that, and updated/reverted the webrev accordingly: http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/ > Otherwise, looks good to me. Thanks for reviewing! Roman
From zgu at redhat.com Mon Sep 24 16:40:45 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 24 Sep 2018 12:40:45 -0400 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: <818610b4-cf48-091a-3719-09f1f863c508@redhat.com> References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com> <818610b4-cf48-091a-3719-09f1f863c508@redhat.com> Message-ID: <662e1e46-a48e-34db-da0a-df693160928d@redhat.com> Hi Roman, On 09/24/2018 12:04 PM, Roman Kennke wrote: > Zhengyu noted off-list that the !ThreadLocalHandshakes version requires > to call Threads::change_thread_parity() before using > Threads::possibly_parallel_threads_do(), and that we can assert > is_Java_thread() instead of explicit filtering. My bad suggestion on assertion for Java thread, Threads::possibly_parallel_threads_do also walks VMThread, sorry! Otherwise, looks good to me. -Zhengyu > > This change does that: > Incremental: > http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.04.diff/ > Full: > http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.04/ > > Let me know what you think! > > Thanks, > Roman > >>> So my suggestion for now is: >>> Do nmethod marking in VM operation/handshake operation. >>> Do hotness counter updating when UseCodeAging in safepoint cleanup. >> >> Ok, this change should do that: >> Incremental: >> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03.diff/ >> Full: >> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.03/ >> >> Note that nmethod-marking still resets hotness counters. >> >> Good now for pushing?
>> >> Roman >> >> >>> And now you might be wondering if it really makes sense to walk all >>> stacks in the system every safepoint, to provide some heuristic about >>> whether nmethods are inactive or not. Arguably not. I have an idea about >>> a much better way of doing this. I will get back to you in a few days >>> about that. >>> >>> Hope this helps. >>> >>> Thanks, >>> /Erik >>> >>> On 2018-09-24 11:36, Roman Kennke wrote: >>>>> Is there a use of doing nmethod marking more frequently than what is >>>>> forced in do_stack_scanning() ? >>>> As far as I can tell, it is sufficient to mark nmethods right before >>>> sweeping. It might even be counter-productive to do more marking passes: >>>> it would result in more non-entrant nmethods marked as 'seen on stack' >>>> even if they are no longer on stack. >>>> >>>> I am not 100% sure about the hotness counter though. From what I see, >>>> it's only used for sweeper too, and it really looks like resetting the >>>> counter on nmethod-walk is enough. But I'd like confirmation from >>>> somebody who knows better than I do. If it's really good enough, we may >>>> remove the nmethod stuff completely from SP cleanup, and also remove the >>>> hotness-counter-closure, and always piggy-back the stuff on nmethod >>>> walking, either in its own VM_Op, or in its handshake. >>>> >>>> On the other hand, why is hotness counting and nmethod marking split out >>>> in sp-cleanup in the first place then? 
>>>> >>>> Roman >>>> >>> >> >> > > From aph at redhat.com Mon Sep 24 17:28:57 2018 From: aph at redhat.com (Andrew Haley) Date: Mon, 24 Sep 2018 18:28:57 +0100 Subject: RFR(S): 8209544: AES encrypt performance regression in jdk11b11 In-Reply-To: References: <98587949-4bef-51ad-0652-3a74f79af233@oracle.com> <158bc0a5-c374-4860-881a-f6ab6f6fe4bb@oracle.com> <9014fdf2-9a23-9e28-e1b7-eedc4441aae1@bell-sw.com> <097e9014-db4a-0f2f-9b78-80cb5dcc7832@bell-sw.com> <9b58801d-fcdd-88ff-7d45-15c44096137f@bell-sw.com> Message-ID: <41303f32-81a2-c84a-cc9b-bfe79a6c9577@redhat.com> On 09/20/2018 03:54 PM, Roland Westrelin wrote: > >> mkay, but how, exactly? Is it simply the case that Intel is improved >> so the patch is good, even if AArch64 regresses? > > Well, no, I don't think that's an accurate description of what this > is. Dmitry reported a performance regression but the generated code is > almost identical with or without the patch (the only difference being > that in one case the generated code uses b.cc and in the other > b.eq). Dmitry also hypothesized that branch prediction may not perform > as well with the patch. That doesn't seem directly related to the patch > but more of an unfortunate side effect. So the patch simplifies the IR > so less instructions may need to be emitted. That's not x86 specific. It > just happens that aarch64 don't seem to be able to take advantage of it > but it doesn't increase the number of instructions that aarch64 needs > either or forces aarch64 to use less efficient instructions. So overall, > it seemed to me there was no reasonable reason to not push this patch. OK, I see. I agree that reasoning is sound. We already know that perfectly reasonable improvements to JITs occasionally cause regressions in some cases, but that's not a reason to reject such improvements. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Sep 24 18:03:42 2018 From: aph at redhat.com (Andrew Haley) Date: Mon, 24 Sep 2018 19:03:42 +0100 Subject: RFR: 8211064: [AArch64] Interpreter and c1 don't correctly handle jboolean results in native calls Message-ID: apetushkov sent me this little patch and I approved it offlist. I will push to jdk-jdk. http://cr.openjdk.java.net/~aph/8211064/ -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Mon Sep 24 18:06:25 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 24 Sep 2018 20:06:25 +0200 Subject: RFR: 8211064: [AArch64] Interpreter and c1 don't correctly handle jboolean results in native calls In-Reply-To: References: Message-ID: <747ac95d-48ea-0777-bfab-1bcf200073a4@redhat.com> On 09/24/2018 08:03 PM, Andrew Haley wrote: > apetushkov sent me this little patch and I approved it offlist. I > will push to jdk-jdk. > > http://cr.openjdk.java.net/~aph/8211064/ Looks good to me. -Aleksey From ekaterina.pavlova at oracle.com Mon Sep 24 19:37:53 2018 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Mon, 24 Sep 2018 12:37:53 -0700 Subject: RFR(XS) 8199885: [Graal] org.graalvm.compiler.core.test.CountedLoopTest fails with "ControlFlowAnchor should never be cloned in the same graph" In-Reply-To: References: <18a99545-72ad-1cff-c940-25d67cda24ef@oracle.com> <8fbbe891-3cce-79d1-9b78-46cfad86c8e3@oracle.com> Message-ID: <5c7d5612-dce1-4e97-8e51-4b03e58b90ef@oracle.com> Graal team looked at the test failure log and it looks quite weird and rather "impossible" to reach that state. I also did testing using latest jdk bits and no failures observed.
I would prefer to integrate the fix and file a new bug in case of new failures. thanks, -katya On 8/14/18 11:49 AM, Vladimir Kozlov wrote: > On 8/14/18 6:40 AM, Ekaterina Pavlova wrote: >> On 8/13/18 12:25 PM, Vladimir Kozlov wrote: >>> Katya, >>> >>> Did you confirm that these tests are actually run in mach5 after these changes? >> >> yes, I do confirm. >> However I started to observe intermittent failure of org.graalvm.compiler.core.test.CountedLoopTest today. >> The failure is different. Let's postpone this change till I discuss new failure with Doug. >> Please see other answers below. >> >>> I see conflicting '@requires' in test definition: >>> >>>   * @requires vm.opt.final.EnableJVMCI == true >>> + * @requires !vm.graal.enabled >> >> well, they are not conflicting because vm.graal.enabled requires UseJVMCICompiler >> >>> The only runs when EnableJVMCI is specified are runs with Graal as JIT in which case second @requires will skip tests. >> >> we do run graal unit tests in 2 configurations: >> >> 1) graal-off: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:-UseJVMCICompiler > > I forgot about this mode. Yes, @require change is fine then. >> >> 2) graal-on: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal >> >> So, these tests will be run in first configuration and skipped in second one. >> >>> May be move these test into a special group to make sure to run them without Graal JIT but with JVMCI on. >> >> we already do it, graal unit tests are run using graal-off configuration as part of tier3 testing. > > Got it. > > Thanks, > Vladimir > >> >> thanks, >> -katya >> >> >>> Thanks, >>> Vladimir >>> >>> On 8/13/18 5:26 AM, Ekaterina Pavlova wrote: >>>> Hi All, >>>> >>>> please review the change which disables org.graalvm.compiler.core.test.* tests in Graal as JIT mode.
>>>> All these tests (except org.graalvm.compiler.core.test.tutorial.GraalTutorial and org.graalvm.compiler.core.test.StaticInterfaceFieldTest) >>>> subclass GraalCompilerTest and were not designed to run in Graal as JIT mode. >>>> >>>> Doug also confirmed that disabling org.graalvm.compiler.core.test.tutorial.GraalTutorial and org.graalvm.compiler.core.test.StaticInterfaceFieldTest >>>> is also the right way. >>>> >>>> Note, the tests will need to be modified/redesigned once Graal becomes default JIT compiler. >>>> >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8199885 >>>> webrev: http://cr.openjdk.java.net/~epavlova//8199885/webrev.00/index.html >>>> testing: Run compiler/graalunit/CoreTest.java with enabled and disabled Graal. The test was skipped in case Graal was enabled and >>>> passed in case Graal was disabled. >>>> >>>> thanks, >>>> -katya >>>> >>>> >> From vladimir.kozlov at oracle.com Mon Sep 24 20:49:26 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 24 Sep 2018 13:49:26 -0700 Subject: RFR(XS) 8199885: [Graal] org.graalvm.compiler.core.test.CountedLoopTest fails with "ControlFlowAnchor should never be cloned in the same graph" In-Reply-To: <5c7d5612-dce1-4e97-8e51-4b03e58b90ef@oracle.com> References: <18a99545-72ad-1cff-c940-25d67cda24ef@oracle.com> <8fbbe891-3cce-79d1-9b78-46cfad86c8e3@oracle.com> <5c7d5612-dce1-4e97-8e51-4b03e58b90ef@oracle.com> Message-ID: Okay. Thanks, Vladimir On 9/24/18 12:37 PM, Ekaterina Pavlova wrote: > Graal team looked at the test failure log and it looks quite weird and rather "impossible" to reach that state. > I also did testing using latest jdk bits and no failures observed. > I would prefer to integrate the fix and file a new bug in case of new failures.
>
> thanks,
> -katya
>
> On 8/14/18 11:49 AM, Vladimir Kozlov wrote:
>> On 8/14/18 6:40 AM, Ekaterina Pavlova wrote:
>>> On 8/13/18 12:25 PM, Vladimir Kozlov wrote:
>>>> Katya,
>>>>
>>>> Did you confirmed that these tests are actually run in mach5 after these changes.
>>>
>>> yes, I do confirm.
>>> However I started to observe intermittent failure of org.graalvm.compiler.core.test.CountedLoopTest today.
>>> The failure is different. Let's postpone this change till I discuss new failure with Doug.
>>> Please see other answers below.
>>>
>>>> I see conflicting '@requires' in test definition:
>>>>
>>>>    * @requires vm.opt.final.EnableJVMCI == true
>>>> + * @requires !vm.graal.enabled
>>>
>>> well, they are not conflicting because vm.graal.enabled requires UseJVMCICompiler
>>>
>>>> The only runs when EnableJVMCI is specified are runs with Graal as JIT in which case second @requires will skip tests.
>>>
>>> we do run graal unit tests in 2 configurations:
>>>
>>> 1) graal-off: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:-UseJVMCICompiler
>>
>> I forgot about this mode. Yes, @require change is fine then.
>>>
>>> 2) graal-on:  -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler
>>> -Djvmci.Compiler=graal
>>>
>>> So, these tests will be run in first configuration and skipped in second one.
>>>
>>>> May be move these test into a special group to make sure to run them without Graal JIT but with JVMCI on.
>>>
>>> we already do it, graal unit tests are run using graal-off configuration as part of tier3 testing.
>>
>> Got it.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> thanks,
>>> -katya
>>>
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 8/13/18 5:26 AM, Ekaterina Pavlova wrote:
>>>>> Hi All,
>>>>>
>>>>> please review the change which disables org.graalvm.compiler.core.test.* tests in Graal as JIT mode.
>>>>> All these tests (except org.graalvm.compiler.core.test.tutorial.GraalTutorial and
>>>>> org.graalvm.compiler.core.test.StaticInterfaceFieldTest)
>>>>> subclass GraalCompilerTest and were not designed to run in Graal as JIT mode.
>>>>>
>>>>> Doug also confirmed that disabling org.graalvm.compiler.core.test.tutorial.GraalTutorial and
>>>>> org.graalvm.compiler.core.test.StaticInterfaceFieldTest
>>>>> is also the right way.
>>>>>
>>>>> Note, the tests will need to be modified/redesigned once Graal becomes default JIT compiler.
>>>>>
>>>>>
>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8199885
>>>>>   webrev: http://cr.openjdk.java.net/~epavlova//8199885/webrev.00/index.html
>>>>> testing: Run compiler/graalunit/CoreTest.java with enabled and disabled Graal. The test was skipped in case Graal
>>>>> was enabled and
>>>>>           passed in case Graal was disabled.
>>>>>
>>>>> thanks,
>>>>> -katya
>>>>>
>>>>>
>>>
>

From vladimir.kozlov at oracle.com Mon Sep 24 21:51:01 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 24 Sep 2018 14:51:01 -0700
Subject: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5FCD@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF0644@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF2E9D@FMSMSX126.amr.corp.intel.com>
 <685f297b-86ae-f270-776d-3d43b76bb792@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF3A34@FMSMSX126.amr.corp.intel.com>
 <589b5445-8315-561b-b63d-cb359162ad2c@oracle.com>
 <30a99af5-5379-4c4b-854c-37082ea51df3@oracle.com>
 <3c0fe42a-0606-a060-7435-32547676683b@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF43A8@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF4BB3@FMSMSX126.amr.corp.intel.com>
 <3cbd216b-56f9-0398-db93-4e032af85dc1@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5771@FMSMSX126.amr.corp.intel.com>
<02FCFB8477C4EF43A2AD8E0C60F3DA2B9EEF5FCD@FMSMSX126.amr.corp.intel.com>
Message-ID: <5170787d-479a-22e2-8191-0fad15ddd770@oracle.com>

Looks good. I start testing again.

I don't see your 'submit' job in a system. I asked people to look what happened.

Thanks,
Vladimir

On 9/21/18 2:30 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> Please find the updated webrev with fix for build failure on SPARC and other architectures at:
> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.04/
> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>
> Vivek submitted this webrev for testing to submit repo yesterday at around noon. We haven't received any email back so far. This is our first time with submit repo.
> http://mail.openjdk.java.net/pipermail/jdk-submit-changes/2018-September/003164.html
>
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Viswanathan, Sandhya
> Sent: Thursday, September 20, 2018 10:53 AM
> To: Vladimir Kozlov ; hotspot-compiler-dev
> Subject: RE: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>
> Hi Vladimir,
>
> In C1_LIRAssembler.hpp, when I added an additional parameter to negate, I did make sure to add it as a default parameter:
>
> src/hotspot/share/c1/c1_LIRAssembler.hpp, line 282:
> void negate(LIR_Opr left, LIR_Opr dest, LIR_Opr tmp = LIR_OprFact::illegalOpr);
>
> But I guess since the function is not just called but declared/defined in all the other architectures, I need to add an unused LIR_Opr to the negate function for them.
> This would be on similar lines as done in some other C1_LIRAssembler methods.
>
> I will make this change and work with Vivek to use the submit repo for testing it on Sparc.
>
> Best Regards,
> Sandhya
>
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 20, 2018 10:09 AM
> To: Viswanathan, Sandhya ; hotspot-compiler-dev
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>
> I hit build failure on SPARC due to shared changes in C1:
>
> workspace/open/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp", line 3027: Error: "LIR_Assembler::negate(LIR_OprDesc*,
> LIR_OprDesc*)" was previously declared "LIR_Assembler::negate(LIR_OprDesc*, LIR_OprDesc*, LIR_OprDesc*)".
> jib > 1 Error(s) detected.
>
> I assume other platforms are also affected.
>
> Vladimir
>
> On 9/19/18 9:53 AM, Vladimir Kozlov wrote:
>> Thank you, Sandhya
>>
>> I submitted new testing.
>>
>> Vladimir
>>
>> On 9/18/18 4:52 PM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> Please find below the updated webrev with fixes for the two issues:
>>>
>>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.03/
>>>
>>>
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8210764
>>>
>>> Fix for compiler/vectorization/TestNaNVector.java was to pass legVecS
>>> as the temporary register type for intrinsics instead of legVecD.
>>>
>>> This test was only failing with -XX:MaxVectorSize=4.
>>>
>>> The file modified is x86_64.ad.
>>>
>>> Fix for compiler/vectorization/TestNaNVector.java was to allow all
>>> xmm registers (xmm0-xmm31) for C1 and handle floating point abs and negate appropriately by providing a temp register.
>>>
>>> The C1 files are modified for this fix.
>>>
>>> I reran compiler jtreg tests with setting nativepath appropriately on Haswell, SKX and KNL.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>
>>> *From:*Viswanathan, Sandhya
>>> *Sent:* Tuesday, September 18, 2018 1:47 PM
>>> *To:* 'JC Beyler'
>>> *Cc:* vladimir.kozlov at oracle.com; hotspot-compiler-dev
>>>
>>> *Subject:* RE: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>> instruction
>>>
>>> Hi Jc,
>>>
>>> Thanks a lot for the steps. I am now able to verify my fix for the NativeCallTest.java.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>
>>> *From:*JC Beyler [mailto:jcbeyler at google.com]
>>> *Sent:* Monday, September 17, 2018 9:29 PM
>>> *To:* Viswanathan, Sandhya
>>> *Cc:* vladimir.kozlov at oracle.com ;
>>> hotspot-compiler-dev
>>> *Subject:* Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>> instruction
>>>
>>> Hi Sandhya,
>>>
>>> How are you invoking the test for NativeCallTest?
>>>
>>> The way I would do it using jtreg would be something like this:
>>>
>>> $ export BUILD_TYPE=release
>>>
>>> $ export JDK_PATH=wherever you have your JDK
>>>
>>> From the test subfolder:
>>>
>>> $ wherever-your-jtreg-is/bin/jtreg
>>> -nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib -jdk
>>> $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk
>>> hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
>>>
>>> Seems to pass for me.
>>>
>>> But much easier is:
>>>
>>> $ make run-test TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"
>>>
>>> That seems to pass for me as well and is easier to use :)
>>>
>>> For information, the make run-test documentation is here:
>>>
>>> http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html
>>>
>>> Let me know if that helps,
>>>
>>> Jc
>>>
>>>     Note: For NativeCallTest.java and many others I get the following message in the corresponding .jtr file:
>>>                           "test result: Error.
Use -nativepath to specify the location of native code"
>>>                   Do I need to give any additional info to jtreg to get over this problem?
>>>
>>>     Thanks a lot!
>>>     Best Regards,
>>>     Sandhya
>>>
>>>     -----Original Message-----
>>>     From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>     Sent: Monday, September 17, 2018 10:14 AM
>>>     To: Viswanathan, Sandhya;
>>>     hotspot-compiler-dev at openjdk.java.net
>>>
>>>     Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>> instruction
>>>
>>>     I finished testing on avx512 machine.
>>>     All passed except known (TestNaNVector.java) failures.
>>>
>>>     Thanks,
>>>     Vladimir
>>>
>>>     On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
>>>      > I gave incorrect link to RFE. Here is correct:
>>>      >
>>>      > https://bugs.openjdk.java.net/browse/JDK-8210764
>>>      >
>>>      > Vladimir
>>>      >
>>>      > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
>>>      >> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did not noticed.
>>>      >>
>>>      >> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on avx512 too.
>>>      >>
>>>      >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100 -XX:-TieredCompilation'
>>>     on CPU
>>>      >> with AVX1 only
>>>      >>
>>>      >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871,
>>> tid=13884
>>>      >> # Problematic frame:
>>>      >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>>      >>
>>>      >> Current CompileTask:
>>>      >> C2:    154    5             java.lang.String::equals (65
>>> bytes)
>>>      >>
>>>      >> Stack: [0x00007f3b10044000,0x00007f3b10145000],
>>> sp=0x00007f3b1013fe70,  free space=1007k
>>>      >> Native frames: (J=compiled Java code, A=aot compiled Java
>>> code, j=interpreted, Vv=VM code, C=native code)
>>>      >> V
[libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
>>>      >> V  [libjvm.so+0x882a72]
>>> PhaseChaitin::gather_lrg_masks(bool)+0x872
>>>      >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
>>>      >> V  [libjvm.so+0xd824b1]
>>> PhaseCFG::do_global_code_motion()+0x51
>>>      >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
>>>      >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*,
>>> C2Compiler*, ciMethod*, int, bool, bool, bool,
>>>     DirectiveSet*)+0xe42
>>>      >>
>>>      >>
>>> ---------------------------------------------------------------------
>>> ---------------------------
>>>      >> 2.
>>>
>>>     with '-Xcomp'
>>>      >> #  Internal Error
>>> (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646),
>>> pid=22016, tid=22073
>>>      >> #  assert(false) failed: cannot spill interval that is used
>>> in first instruction (possible reason: no register
>>>     found)
>>>      >>
>>>      >> Current CompileTask:
>>>      >> C1: 854767 13391       3 org.sunflow.math.Matrix4::multiply
>>> (692 bytes)
>>>      >>
>>>      >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],
>>> sp=0x00007f23b9e7f9d0,  free space=1014k
>>>      >> Native frames: (J=compiled Java code, A=aot compiled Java
>>> code, j=interpreted, Vv=VM code, C=native code)
>>>      >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char
>>> const*, char const*, __va_list_tag*, Thread*, unsigned
>>>      >> char*, void*, void*, char const*, int, unsigned long)+0x562
>>>      >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*,
>>> void*, char const*, int, char const*, char const*,
>>>      >> __va_list_tag*)+0x2f
>>>      >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int,
>>> char const*, char const*, ...)+0x100
>>>      >> V  [libjvm.so+0x7e0410]
>>> LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
>>>      >> V  [libjvm.so+0x7e0a20]
>>> LinearScanWalker::activate_current()+0x280
>>>      >> V  [libjvm.so+0x7e0c7d]
IntervalWalker::walk_to(int) [clone
>>> .constprop.299]+0x9d
>>>      >> V  [libjvm.so+0x7e1078]
>>> LinearScan::allocate_registers()+0x338
>>>      >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
>>>      >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
>>>      >> V  [libjvm.so+0x70caff]
>>> Compilation::compile_java_method()+0x42f
>>>      >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
>>>      >> V  [libjvm.so+0x70e547]
>>> Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int,
>>> BufferBlob*,
>>>      >> DirectiveSet*)+0x357
>>>      >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*,
>>> ciMethod*, int, DirectiveSet*)+0x14c
>>>      >> V  [libjvm.so+0xa3cf89]
>>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>>>      >>
>>>      >> Vladimir
>>>      >>
>>>      >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>>      >>>
>>>      >>> Thanks Vladimir, the below should fix this issue:
>>>      >>>
>>>      >>> ------------------------------
>>>      >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14
>>> 13:10:23.488379912 -0700
>>>      >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14
>>> 13:10:23.308379915 -0700
>>>      >>> @@ -233,22 +233,6 @@
>>>      >>>     _xmm_regs[13]  = xmm13;
>>>      >>>     _xmm_regs[14]  = xmm14;
>>>      >>>     _xmm_regs[15]  = xmm15;
>>>      >>> -  _xmm_regs[16]  = xmm16;
>>>      >>> -  _xmm_regs[17]  = xmm17;
>>>      >>> -  _xmm_regs[18]  = xmm18;
>>>      >>> -  _xmm_regs[19]  = xmm19;
>>>      >>> -  _xmm_regs[20]  = xmm20;
>>>      >>> -  _xmm_regs[21]  = xmm21;
>>>      >>> -  _xmm_regs[22]  = xmm22;
>>>      >>> -  _xmm_regs[23]  = xmm23;
>>>      >>> -  _xmm_regs[24]  = xmm24;
>>>      >>> -  _xmm_regs[25]  = xmm25;
>>>      >>> -  _xmm_regs[26]  = xmm26;
>>>      >>> -  _xmm_regs[27]  = xmm27;
>>>      >>> -  _xmm_regs[28]  = xmm28;
>>>      >>> -  _xmm_regs[29]  = xmm29;
>>>      >>> -  _xmm_regs[30]
= xmm30;
>>>      >>> -  _xmm_regs[31]  = xmm31;
>>>      >>>   #endif // _LP64
>>>      >>>
>>>      >>>     for (int i = 0; i < 8; i++) {
>>>      >>> ---------------------------------
>>>      >>>
>>>      >>> I think the gcc version on my desktop is older so didn't catch this.
>>>      >>>
>>>      >>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
>>>      >>> Patch:
>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>>>
>>>      >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>>      >>>
>>>      >>> FYI, I did notice that the default for UseAVX had been
>>> rolled back and wanted to get confirmation from you before
>>>      >>> changing it back to 3.
>>>      >>>
>>>      >>> Best Regards,
>>>      >>> Sandhya
>>>      >>>
>>>      >>>
>>>      >>> -----Original Message-----
>>>      >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>      >>> Sent: Friday, September 14, 2018 12:13 PM
>>>      >>> To: Viswanathan, Sandhya;
>>>     hotspot-compiler-dev at openjdk.java.net
>>>
>>>      >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>> instruction
>>>      >>>
>>>      >>> I got build failure:
>>>      >>>
>>>      >>>
>>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error:
>>> array index 16 is past the end of the array
>>>      >>> (which contains 16 elements) [-Werror,-Warray-bounds]
>>>      >>> jib >   _xmm_regs[16]  = xmm16;
>>>      >>>
>>>      >>> I also noticed that we don't have RFE for this work. I filed:
>>>      >>>
>>>      >>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>      >>>
>>>      >>> You did not enabled avx512 by default (8209735 change was
>>> synced from jdk 11 into 12 2 weeks ago). I added next
>>>      >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>>      >>>
>>>      >>> - product(intx, UseAVX, 2, \
>>>      >>> + product(intx, UseAVX, 3, \
>>>      >>>
>>>      >>> Thanks,
>>>
>>> Vladimir
>>>      >>>
>>>      >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>>      >>>> Looks good to me. I will start testing and let you know results.
>>>      >>>>
>>>      >>>> Thanks,
>>>      >>>> Vladimir
>>>      >>>>
>>>      >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>>      >>>>> Hi Vladimir,
>>>      >>>>>
>>>      >>>>> Please find below the updated webrev with all your comments incorporated:
>>>      >>>>>
>>>      >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>
>>>      >>>>>
>>>      >>>>> I have run the jtreg compiler tests on SKX and KNL which
>>> have two
>>>      >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms.
>>>      >>>>>
>>>      >>>>> Best Regards,
>>>      >>>>> Sandhya
>>>      >>>>>
>>>      >>>>>
>>>      >>>>> -----Original Message-----
>>>      >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>      >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>>      >>>>> To: Viswanathan, Sandhya;
>>>      >>>>> hotspot-compiler-dev at openjdk.java.net
>>>
>>>      >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>> instruction
>>>      >>>>>
>>>      >>>>> Thank you, Sandhya
>>>      >>>>>
>>>      >>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them.
>>>      >>>>>
>>>      >>>>> Vladimir
>>>      >>>>>
>>>      >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>>      >>>>>> Hi Vladimir,
>>>      >>>>>>
>>>      >>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>>      >>>>>> Please see my response in your email below marked with
>>> (Sandhya
>>>      >>>>>>>>> ). Looking forward to your advice.
>>>      >>>>>>
>>>      >>>>>> Best Regards,
>>>      >>>>>> Sandhya
>>>      >>>>>>
>>>      >>>>>>
>>>      >>>>>> -----Original Message-----
>>>
>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>      >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>>      >>>>>> To: Viswanathan, Sandhya;
>>>      >>>>>> hotspot-compiler-dev at openjdk.java.net
>>>
>>>      >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>      >>>>>> instruction
>>>      >>>>>>
>>>      >>>>>> Thank you.
>>>      >>>>>>
>>>      >>>>>> I want to discuss next issue:
>>>      >>>>>>
>>>      >>>>>>     > You did not added instructions to load these
>>> registers from
>>>      >>>>>> memory (and stack). What happens in such cases when you need to load or store?
>>>      >>>>>>     >>>> Let us take an example, e.g. for loading into
>>> rregF. First
>>>      >>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>      >>>>>>
>>>      >>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack.
>>>      >>>>>> Also we don't check that register could be the same as result you may get unneeded moves.
>>>      >>>>>>
>>>      >>>>>> I would advice add memory moves at least.
>>>      >>>>>>
>>>      >>>>>> Sandhya >>>  I had added those rules initially and
>>> removed them in
>>>      >>>>>> the final patch. I noticed that the register allocator
>>> uses the
>>>      >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg
>>> mask
>>>      >>>>>> (matcher.cpp). I would like the register allocator to get
>>> all the
>>>      >>>>>> possible register on an architecture for idealreg2reg
>>> mask. I
>>>      >>>>>> wondered that multiple instruct rules in .ad file for
>>> LoadF from
>>>      >>>>>> memory might cause problems.  I would have to have higher
>>> cost for
>>>      >>>>>> loading into restricted register set like vlReg. Then I
>>> decided that
>>>      >>>>>> the register allocator can handle this in much better way
>>> than me
>>>      >>>>>> adding rules to load from memory.
This is with the
>>> background that the regF is always all the available
>>>     registers
>>>      >>>>>> and vlRegF is the restricted register set. Likewise for
>>> VecS and legVecS. Let me know you thoughts on this
>>>     and if
>>>      >>>>>> I should still add the rules to load from memory into
>>> vlReg and legVec. The specific code from matcher.cpp
>>>     that I
>>>      >>>>>> am referring to is:
>>>      >>>>>>      MachNode *spillCP = match_tree(new
>>>      >>>>>>
>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>      >>>>>> #endif
>>>      >>>>>>      MachNode *spillI  = match_tree(new
>>>      >>>>>>
>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>      >>>>>>      MachNode *spillL  = match_tree(new
>>>      >>>>>>
>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered,
>>>      >>>>>> LoadNode::DependsOnlyOnTest, false));
>>>      >>>>>>      MachNode *spillF  = match_tree(new
>>>      >>>>>>
>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>      >>>>>>      MachNode *spillD  = match_tree(new
>>>      >>>>>>
>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>      >>>>>>      MachNode *spillP  = match_tree(new
>>>      >>>>>>
>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>      >>>>>>      ....
>>>      >>>>>>      idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>>      >>>>>>
>>>      >>>>>> An other question. You use movflt() and movdbl() which
>>> use either
>>>      >>>>>> movap[s|d] and movs[s|d]
>>>      >>>>>> instructions:
>>>      >>>>>>
>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu
>>>      >>>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions
>>> work when
>>>      >>>>>> avx512vl is not available? I see for vectors you use
>>>      >>>>>> vpxor+vinserti* combination.
>>>      >>>>>>
>>>      >>>>>> Sandhya >>> Yes the scalar floating point instructions
>>> are available
>>>
>>>>>> with AVX512 encoding when avx512vl is not available. That
>>> is why you
>>>      >>>>>> would see not just movflt, movdbl but all the other
>>> scalar
>>>      >>>>>> operations like adds, addsd etc using the entire xmm
>>> range (xmm0-31). In other words they are AVX512F
>>>     instructions.
>>>      >>>>>>
>>>      >>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad :
>>>      >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>      >>>>>>
>>>      >>>>>> Should it be (UseAVX < 3)?
>>>      >>>>>>
>>>      >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>>      >>>>>>
>>>      >>>>>> Thanks,
>>>      >>>>>> Vladimir
>>>      >>>>>>
>>>      >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>      >>>>>>> Hi Vladimir,
>>>      >>>>>>>
>>>      >>>>>>> Thanks a lot for your review and feedback. Please see my
>>> response
>>>      >>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>>      >>>>>>>
>>>      >>>>>>> Best Regards,
>>>      >>>>>>> Sandhya
>>>      >>>>>>>
>>>      >>>>>>>
>>>      >>>>>>> -----Original Message-----
>>>      >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>      >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>      >>>>>>> To: Viswanathan, Sandhya;
>>>      >>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>
>>>      >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on
>>> AVX512
>>>      >>>>>>> instruction
>>>      >>>>>>>
>>>      >>>>>>> Very nice. Thank you, Sandhya.
>>>      >>>>>>>
>>>      >>>>>>> I would like to see more meaningful naming in .ad files
>>> - instead
>>>      >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
>>>      >>>>>>>
>>>      >>>>>>>>>> Yes, accepted.
>>>      >>>>>>>
>>>      >>>>>>> New load_from_* and load_to_* instructions in .ad files
>>> should be
>>>
>>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions:
>>>      >>>>>>>
>>>      >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct
>>> MoveVL2F(regF dst,
>>>      >>>>>>> vlRegF src)
>>>      >>>>>>>>>> Yes, accepted.
>>>      >>>>>>>
>>>      >>>>>>> You did not added instructions to load these registers
>>> from memory
>>>      >>>>>>> (and stack). What happens in such cases when you need to load or store?
>>>      >>>>>>>>>> Let us take an example, e.g. for loading into rregF.
>>> First it
>>>      >>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>      >>>>>>>
>>>      >>>>>>> Also please explain why these registers are used when UseAVX == 0?:
>>>      >>>>>>>
>>>      >>>>>>> +instruct absD_reg(rregD dst) %{
>>>      >>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
>>>      >>>>>>>
>>>      >>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too:
>>>      >>>>>>>       661   if (UseAVX < 3) {
>>>      >>>>>>>       662     _features &= ~CPU_AVX512F;
>>>      >>>>>>>
>>>      >>>>>>>>>> Yes, accepted. It could be regD here.
>>>      >>>>>>>
>>>      >>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some):
>>>      >>>>>>>
>>>      >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex,
>>>      >>>>>>> +vectors_reg_legacy, %{
>>>      >>>>>>> VM_Version::supports_evex() &&
>>> VM_Version::supports_avx512bw() &&
>>>      >>>>>>> VM_Version::supports_avx512dq() &&
>>>      >>>>>>> VM_Version::supports_avx512vl() %} );
>>>      >>>>>>>
>>>      >>>>>>>>>> Yes, accepted.
>>>      >>>>>>>
>>>      >>>>>>> I would suggest to test these changes on different
>>> machines
>>>      >>>>>>> (non-avx512 and avx512) and with different UseAVX values.
>>>      >>>>>>>
>>>      >>>>>>>>>> Will do.
>>>      >>>>>>>
>>>      >>>>>>> Thanks,
>>>      >>>>>>> Vladimir
>>>
>>>>>>>
>>>      >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>      >>>>>>>> Recently there have been couple of high priority issues
>>> with
>>>      >>>>>>>> regards to high bank of XMM register
>>>      >>>>>>>> (XMM16-XMM31) usage by C2:
>>>      >>>>>>>>
>>>      >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>      >>>>>>>>
>>>      >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>      >>>>>>>>
>>>      >>>>>>>> Please find below a patch which attempts to clean up
>>> the XMM
>>>      >>>>>>>> register handling by using register groups.
>>>      >>>>>>>>
>>>      >>>>>>>>
>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>
>>>      >>>>>>>>
>>>
>>>      >>>>>>>>
>>>      >>>>>>>> The patch provides a restricted set of registers to the
>>> match
>>>      >>>>>>>> rules in the ad file based on the underlying architecture.
>>>      >>>>>>>>
>>>      >>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>      >>>>>>>>
>>>      >>>>>>>> By removing the special handling, the patch reduces the
>>> overall
>>>      >>>>>>>> code size by about 1800 lines of code.
>>>      >>>>>>>>
>>>      >>>>>>>> Your review and feedback is very welcome.
>>>      >>>>>>>>
>>>      >>>>>>>> Best Regards,
>>>      >>>>>>>>
>>>      >>>>>>>> Sandhya
>>>      >>>>>>>>
>>>
>>>
>>> --
>>>
>>> Thanks,
>>>
>>> Jc
>>>

From rkennke at redhat.com Tue Sep 25 07:19:06 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 25 Sep 2018 09:19:06 +0200
Subject: RFR: JDK-8211061: Tests fail with assert(VM_Version::supports_sse4_1()) on ThreadRipper CPU
In-Reply-To: <8470d1f1-e878-7a01-ca1b-bacbb0b845e6@redhat.com>
References: <8470d1f1-e878-7a01-ca1b-bacbb0b845e6@redhat.com>
Message-ID:

Involving hotspot-compiler-dev...
> Some tests fail with:
>
> # Internal Error
> (/home/rkennke/src/openjdk/jdk-jdk/src/hotspot/cpu/x86/assembler_x86.cpp:3819),
> pid=5051, tid=5055
> # Error: assert(VM_Version::supports_sse4_1()) failed
>
> When running hotspot/jtreg:tier1 on my ThreadRipper 1950X box. On my
> Intel box, this works fine. It looks like it attempts to generate
> fast_sha1 stubs, which use Assembler::pinsrd() but then runs into
> supports_sse4_1().
>
> The failing tier1 tests are:
> compiler/c1/Test6579789.java
> compiler/c1/Test6855215.java
> compiler/cpuflags/TestSSE4Disabled.java
>
> The failing tests seem to disable SSE4 or SSE altogether and check if it
> still compiles fine. This does not go well for the SHA1 and SHA256 stubs
> because they use SSE4.1 instructions. It seems that it compiles on my
> Intel box because that doesn't support_sha(), and thus disables those
> intrinsics altogether.
>
> The proposed fix is to check for SSE4.1 present before enabling the
> affected intrinsics.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8211061
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8211061/webrev.00/
>
> Testing: hotspot/jtreg:tier1 failed before, now passes
>
> Thanks,
> Roman
>

From tobias.hartmann at oracle.com Tue Sep 25 07:42:05 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 09:42:05 +0200
Subject: RFR: JDK-8211061: Tests fail with assert(VM_Version::supports_sse4_1()) on ThreadRipper CPU
In-Reply-To:
References: <8470d1f1-e878-7a01-ca1b-bacbb0b845e6@redhat.com>
Message-ID: <5b44471e-0d24-4d65-71d5-0ccf589cc4e6@oracle.com>

Hi Roman,

this looks good to me.

Best regards,
Tobias

On 25.09.2018 09:19, Roman Kennke wrote:
> Involving hotspot-compiler-dev...
>
>> Some tests fail with:
>>
>> # Internal Error
>> (/home/rkennke/src/openjdk/jdk-jdk/src/hotspot/cpu/x86/assembler_x86.cpp:3819),
>> pid=5051, tid=5055
>> # Error: assert(VM_Version::supports_sse4_1()) failed
>>
>> When running hotspot/jtreg:tier1 on my ThreadRipper 1950X box. On my
>> Intel box, this works fine. It looks like it attempts to generate
>> fast_sha1 stubs, which use Assembler::pinsrd() but then runs into
>> supports_sse4_1().
>>
>> The failing tier1 tests are:
>> compiler/c1/Test6579789.java
>> compiler/c1/Test6855215.java
>> compiler/cpuflags/TestSSE4Disabled.java
>>
>> The failing tests seem to disable SSE4 or SSE altogether and check if it
>> still compiles fine. This does not go well for the SHA1 and SHA256 stubs
>> because they use SSE4.1 instructions. It seems that it compiles on my
>> Intel box because that doesn't support_sha(), and thus disables those
>> intrinsics altogether.
>>
>> The proposed fix is to check for SSE4.1 present before enabling the
>> affected intrinsics.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8211061
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8211061/webrev.00/
>>
>> Testing: hotspot/jtreg:tier1 failed before, now passes
>>
>> Thanks,
>> Roman
>>
>
>

From tobias.hartmann at oracle.com Tue Sep 25 08:13:04 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 10:13:04 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods()
In-Reply-To:
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
 <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
 <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>
Message-ID:

Hi Roman,

On 24.09.2018 18:46, Roman Kennke wrote:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/

This looks good to me!

Best regards,
Tobias

From rkennke at redhat.com Tue Sep 25 08:17:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 25 Sep 2018 10:17:23 +0200
Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods()
In-Reply-To:
References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com>
 <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com>
 <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com>
 <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com>
 <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com>
 <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com>
 <818610b4-cf48-091a-3719-09f1f863c508@redhat.com>
 <662e1e46-a48e-34db-da0a-df693160928d@redhat.com>
Message-ID: <66c0754a-e93b-48b8-4528-184c636d7254@redhat.com>

Thanks Tobias for reviewing!

Erik: Is this good for you too?
Thanks, Roman > Hi Roman, > > On 24.09.2018 18:46, Roman Kennke wrote: >> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/ > > This looks good to me! > > Best regards, > Tobias > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From rkennke at redhat.com Tue Sep 25 08:23:05 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 25 Sep 2018 10:23:05 +0200 Subject: RFR: JDK-8211061: Tests fail with assert(VM_Version::supports_sse4_1()) on ThreadRipper CPU In-Reply-To: <5b44471e-0d24-4d65-71d5-0ccf589cc4e6@oracle.com> References: <8470d1f1-e878-7a01-ca1b-bacbb0b845e6@redhat.com> <5b44471e-0d24-4d65-71d5-0ccf589cc4e6@oracle.com> Message-ID: <4554cf90-c0dd-baa4-ba41-e3f46b2c8871@redhat.com> Thanks for reviewing, Tobias! Roman > Hi Roman, > > this looks good to me. > > Best regards, > Tobias > > On 25.09.2018 09:19, Roman Kennke wrote: >> Involving hotspot-compiler-dev... >> >>> Some tests fail with: >>> >>> # Internal Error >>> (/home/rkennke/src/openjdk/jdk-jdk/src/hotspot/cpu/x86/assembler_x86.cpp:3819), >>> pid=5051, tid=5055 >>> # Error: assert(VM_Version::supports_sse4_1()) failed >>> >>> When running hotspot/jtreg:tier1 on my ThreadRipper 1950X box. On my >>> Intel box, this works fine. It looks like it attempts to generate >>> fast_sha1 stubs, which use Assembler::pinsrd() but then runs into >>> supports_sse4_1(). >>> >>> The failing tier1 tests are: >>> compiler/c1/Test6579789.java >>> compiler/c1/Test6855215.java >>> compiler/cpuflags/TestSSE4Disabled.java >>> >>> The failing tests seem to disable SSE4 or SSE altogether and check if it >>> still compiles fine. This does not go well for the SHA1 and SHA256 stubs >>> because they use SSE4.1 instructions. It seems that it compiles on my >>> Intel box because that doesn't support_sha(), and thus disables those >>> intrinsics altogether. 
>>> >>> The proposed fix is to check for SSE4.1 present before enabling the >>> affected intrinsics. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8211061 >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8211061/webrev.00/ >>> >>> Testing: hotspot/jtreg:tier1 failed before, now passes >>> >>> Thanks, >>> Roman >>> >> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From rwestrel at redhat.com Tue Sep 25 08:27:03 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Sep 2018 10:27:03 +0200 Subject: RFR(M): 8210885: Convert left over loads/stores to access api In-Reply-To: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com> References: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com> Message-ID: Thanks for the reviews Roman & Tobias. Roland. From rwestrel at redhat.com Tue Sep 25 08:37:27 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Sep 2018 10:37:27 +0200 Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy In-Reply-To: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com> References: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com> Message-ID: Hi Tobias, > Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers > will be used for Shenandoah? Why you need to distinguish between ArrayCopyPhase? Thanks for the view. Yes extra arguments are to be used by shenandoah. Generating barriers once parsing is over is not supported by all gcs. The shape of the barriers is sometimes too complicated to be emitted at igvn time. Roland. 
From Pengfei.Li at arm.com Tue Sep 25 08:38:07 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 25 Sep 2018 08:38:07 +0000 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: References: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> Message-ID: Hi Vladimir, I still didn't get other comments during the past week. Do you think it is ok to push this patch? http://cr.openjdk.java.net/~njian/8210152/webrev.01/ -- Thanks, Pengfei > -----Original Message----- > > Hi Reviewers, > > Is there any other comments, objections or suggestions on the new webrev? > If no problems, could anyone help to push this commit? > > I look forward to your response. > > -- > Thanks, > Pengfei > > > -----Original Message----- > > > > Looks good. > > > > Thanks, > > Vladimir > > > > On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote: > > > Hi, > > > > > > I've updated the patch based on Vladimir's comment. I added checks > > > for > > SubI on both branches of the diamond phi. > > > Also thanks Roland for the suggestion that supporting a Phi with 3 > > > or more > > inputs. But I think the matching rule will be much more complex if we > > add this. And I'm not sure if there are any real case scenario which > > can benefit from this support. So I didn't add it in. > > > > > > New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ > > > I've run jtreg full test with the new patch and no new issues found. > > > > > > Please let me know if you have other comments or suggestions. If no > > further issues, I need your help to sponsor and push the patch. 
> > > > > > -- > > > Thanks, > > > Pengfei > > > > > > From tobias.hartmann at oracle.com Tue Sep 25 08:42:17 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Sep 2018 10:42:17 +0200 Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy In-Reply-To: References: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com> Message-ID: Hi Roland, okay, thanks for the clarifications. Best regards, Tobias On 25.09.2018 10:37, Roland Westrelin wrote: > > Hi Tobias, > >> Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers >> will be used for Shenandoah? Why you need to distinguish between ArrayCopyPhase? > > Thanks for the view. > > Yes extra arguments are to be used by shenandoah. > > Generating barriers once parsing is over is not supported by all > gcs. The shape of the barriers is sometimes too complicated to be > emitted at igvn time. > > Roland. > From aph at redhat.com Tue Sep 25 09:01:03 2018 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Sep 2018 10:01:03 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> Message-ID: On 09/24/2018 09:14 AM, Alan Bateman wrote: > I'm not questioning the need to support NVM, instead I'm trying to > see whether MappedByteBuffer is the right way to expose this in the > standard API. Buffers were designed in JSR-51 with specific > use-cases in mind but they are problematic for many off-heap cases > as they aren't thread safe, are limited to 2GB, lack confinement, > only support homogeneous data (no layout support). I'm baffled by this assertion. 
It's true that the 2GB limit is turning into a real pain, but all of the rest are nothing to do with ByteBuffers, which are just raw memory. Adding structure is something that can be done by third-party libraries or by some future OpenJDK API.

> So where does this leave us? If support for persistent memory is
> added to FileChannel.map as we've been discussing then it may not be
> too bad as the API surface is small. The API surface is just new map
> modes and a MappedByteBuffer::isPersistent method. The force method
> that specify a range is something useful to add to MBB anyway.

Yeah, that's right, it is. While something not yet planned might be an alternative, even a better one, the purpose of our faster release cadence is to "evolve the Java SE Platform and the JDK at a more rapid pace, so that new features [can] be delivered in timelier manner". This is timely; waiting for Panama to think of what might be possible, not so much.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From tobias.hartmann at oracle.com Tue Sep 25 09:28:35 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Sep 2018 11:28:35 +0200
Subject: [12] RFR(M): 8210215: C2 should optimize trichotomy calculations
Message-ID:

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8210215
http://cr.openjdk.java.net/~thartmann/8210215/webrev.00/

While analyzing performance results for the Value Types LW1 EA we got back from Doug Lea [1], I've found that C2 is very bad at optimizing simple "trichotomic" comparisons of the form

  int compare(int a, int b) {
    return (a < b) ? -1 : (a == b) ? 0 : 1;
  }

when being inlined into a caller comparing the result against -1, 0 or 1. For example, "compare(a, b) == 1" should be folded to "a > b" but currently isn't. Since this is a very common pattern for sorting algorithms and since it will be even more common with value types, it's crucial to optimize these.
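To make the intended folding concrete, here is a minimal Java sketch of the source-level equivalences only. It assumes nothing about how C2 implements the transformation, and it is not the jtreg test from the webrev; the class TrichotomyDemo and the helpers foldsAgree and allAgree are names invented for this illustration.

```java
// Illustrative only: checks that comparing the result of a trichotomic
// compare() against -1, 0 or 1 is equivalent to a single direct comparison.
// It models the semantics the optimization relies on, not the C2 code itself.
class TrichotomyDemo {

    // The comparison quoted in the mail above.
    static int compare(int a, int b) {
        return (a < b) ? -1 : (a == b) ? 0 : 1;
    }

    // Each line pairs a "compare(a, b) <op> constant" test with the single
    // comparison it could be folded to (hypothetical helper, not from the patch).
    static boolean foldsAgree(int a, int b) {
        int c = compare(a, b);
        return ((c == 1)  == (a > b))
            && ((c != 1)  == (a <= b))
            && ((c == 0)  == (a == b))
            && ((c != 0)  == (a != b))
            && ((c == -1) == (a < b))
            && ((c != -1) == (a >= b));
    }

    // Runs the check over a few sample values, including the int extremes.
    static boolean allAgree() {
        int[] samples = { Integer.MIN_VALUE, -35789, -1, 0, 1, 17, Integer.MAX_VALUE };
        for (int a : samples) {
            for (int b : samples) {
                if (!foldsAgree(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree() ? "all folded forms agree" : "mismatch");
    }
}
```

Whether a given JIT actually emits one comparison instead of two for these callers would have to be checked in the generated code; the sketch only shows that the folded forms are semantically safe.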
Out of the 66 comparisons in the jtreg test, C2 can optimize only 16. For all the other 50 tests, C2 emits two comparisons. With my patch, all comparisons are optimized. The optimization improves the performance of Doug's microbenchmark by 5 - 12%. The basic idea of the optimization is to search for two ifs that compare the same values while one projection of the first if connects to the second if and all other projections connect to the same region (potentially through an intermediate region). If two out of three projections to the region then map to the same result value or control, we can replace the two ifs by a single if. The implementation does this by first checking for one of the two shapes described in the RegionNode::optimize_trichotomy comment which ensure that the comparisons have only two result branches. We then merge the two ifs by computing the logical AND of the two tests (might be a constant if the result is always false). Thanks to John for pre-reviewing this change. Thanks, Tobias [1] http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-August/004879.html From tobias.hartmann at oracle.com Tue Sep 25 09:38:02 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Sep 2018 11:38:02 +0200 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: References: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> Message-ID: <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com> Hi Pengfei, this looks good to me but please fix the whitespacing before pushing: "if( a == b )" -> "if (a == b)" "method( param )" -> "method(param)" It's not consistently like that in old HotSpot code but we should at least fix it in new code. Thanks, Tobias On 25.09.2018 10:38, Pengfei Li (Arm Technology China) wrote: > Hi Vladimir, > > I still didn't get other comments during the past week. > Do you think it is ok to push this patch? 
> http://cr.openjdk.java.net/~njian/8210152/webrev.01/ > > -- > Thanks, > Pengfei > >> -----Original Message----- >> >> Hi Reviewers, >> >> Is there any other comments, objections or suggestions on the new webrev? >> If no problems, could anyone help to push this commit? >> >> I look forward to your response. >> >> -- >> Thanks, >> Pengfei >> >>> -----Original Message----- >>> >>> Looks good. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote: >>>> Hi, >>>> >>>> I've updated the patch based on Vladimir's comment. I added checks >>>> for >>> SubI on both branches of the diamond phi. >>>> Also thanks Roland for the suggestion that supporting a Phi with 3 >>>> or more >>> inputs. But I think the matching rule will be much more complex if we >>> add this. And I'm not sure if there are any real case scenario which >>> can benefit from this support. So I didn't add it in. >>>> >>>> New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ >>>> I've run jtreg full test with the new patch and no new issues found. >>>> >>>> Please let me know if you have other comments or suggestions. If no >>> further issues, I need your help to sponsor and push the patch. >>>> >>>> -- >>>> Thanks, >>>> Pengfei >>>> >>>> From Pengfei.Li at arm.com Tue Sep 25 10:13:56 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 25 Sep 2018 10:13:56 +0000 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com> References: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com> Message-ID: Hi Tobias, Thanks for your review comment. I've fixed the whitespaces and below is a new webrev. http://cr.openjdk.java.net/~njian/8210152/webrev.02/ Could you help push this patch since I don't have permissions to do that? 
-- Thanks, Pengfei > -----Original Message----- > > Hi Pengfei, > > this looks good to me but please fix the whitespacing before pushing: > "if( a == b )" -> "if (a == b)" > "method( param )" -> "method(param)" > > It's not consistently like that in old HotSpot code but we should at least fix it > in new code. > > Thanks, > Tobias > > On 25.09.2018 10:38, Pengfei Li (Arm Technology China) wrote: > > Hi Vladimir, > > > > I still didn't get other comments during the past week. > > Do you think it is ok to push this patch? > > http://cr.openjdk.java.net/~njian/8210152/webrev.01/ > > > > -- > > Thanks, > > Pengfei > > > >> -----Original Message----- > >> > >> Hi Reviewers, > >> > >> Is there any other comments, objections or suggestions on the new > webrev? > >> If no problems, could anyone help to push this commit? > >> > >> I look forward to your response. > >> > >> -- > >> Thanks, > >> Pengfei > >> > >>> -----Original Message----- > >>> > >>> Looks good. > >>> > >>> Thanks, > >>> Vladimir > >>> > >>> On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote: > >>>> Hi, > >>>> > >>>> I've updated the patch based on Vladimir's comment. I added checks > >>>> for > >>> SubI on both branches of the diamond phi. > >>>> Also thanks Roland for the suggestion that supporting a Phi with 3 > >>>> or more > >>> inputs. But I think the matching rule will be much more complex if > >>> we add this. And I'm not sure if there are any real case scenario > >>> which can benefit from this support. So I didn't add it in. > >>>> > >>>> New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ > >>>> I've run jtreg full test with the new patch and no new issues found. > >>>> > >>>> Please let me know if you have other comments or suggestions. If no > >>> further issues, I need your help to sponsor and push the patch. 
> >>>> > >>>> -- > >>>> Thanks, > >>>> Pengfei > >>>> > >>>> From tobias.hartmann at oracle.com Tue Sep 25 12:22:25 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Sep 2018 14:22:25 +0200 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: References: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com> Message-ID: <54cd88be-ed94-9ffa-0862-f803496577fc@oracle.com> Hi Pengfei, sure, pushed: http://hg.openjdk.java.net/jdk/jdk/rev/bc38c75eed57 Best regards, Tobias On 25.09.2018 12:13, Pengfei Li (Arm Technology China) wrote: > Hi Tobias, > > Thanks for your review comment. I've fixed the whitespaces and below is a new webrev. > http://cr.openjdk.java.net/~njian/8210152/webrev.02/ > > Could you help push this patch since I don't have permissions to do that? > > -- > Thanks, > Pengfei > > >> -----Original Message----- >> >> Hi Pengfei, >> >> this looks good to me but please fix the whitespacing before pushing: >> "if( a == b )" -> "if (a == b)" >> "method( param )" -> "method(param)" >> >> It's not consistently like that in old HotSpot code but we should at least fix it >> in new code. >> >> Thanks, >> Tobias >> >> On 25.09.2018 10:38, Pengfei Li (Arm Technology China) wrote: >>> Hi Vladimir, >>> >>> I still didn't get other comments during the past week. >>> Do you think it is ok to push this patch? >>> http://cr.openjdk.java.net/~njian/8210152/webrev.01/ >>> >>> -- >>> Thanks, >>> Pengfei >>> >>>> -----Original Message----- >>>> >>>> Hi Reviewers, >>>> >>>> Is there any other comments, objections or suggestions on the new >> webrev? >>>> If no problems, could anyone help to push this commit? >>>> >>>> I look forward to your response. >>>> >>>> -- >>>> Thanks, >>>> Pengfei >>>> >>>>> -----Original Message----- >>>>> >>>>> Looks good. 
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote: >>>>>> Hi, >>>>>> >>>>>> I've updated the patch based on Vladimir's comment. I added checks >>>>>> for >>>>> SubI on both branches of the diamond phi. >>>>>> Also thanks Roland for the suggestion that supporting a Phi with 3 >>>>>> or more >>>>> inputs. But I think the matching rule will be much more complex if >>>>> we add this. And I'm not sure if there are any real case scenario >>>>> which can benefit from this support. So I didn't add it in. >>>>>> >>>>>> New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ >>>>>> I've run jtreg full test with the new patch and no new issues found. >>>>>> >>>>>> Please let me know if you have other comments or suggestions. If no >>>>> further issues, I need your help to sponsor and push the patch. >>>>>> >>>>>> -- >>>>>> Thanks, >>>>>> Pengfei >>>>>> >>>>>> From rwestrel at redhat.com Tue Sep 25 13:10:40 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Sep 2018 15:10:40 +0200 Subject: RFR(M): 8210885: Convert left over loads/stores to access api In-Reply-To: References: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com> Message-ID: I ran this one through the submit repo and it came back with 1 failed test: runtime/XCheckJniJsig/XCheckJSig.java that I can't reproduce. Could it be a spurious failure? Roland. From rkennke at redhat.com Tue Sep 25 13:33:34 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 25 Sep 2018 15:33:34 +0200 Subject: RFR(M): 8210885: Convert left over loads/stores to access api In-Reply-To: References: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com> Message-ID: I've also seen this with JDK-8211061. Roman > I ran this one through the submit repo and it came back with 1 failed > test: > > runtime/XCheckJniJsig/XCheckJSig.java > > that I can't reproduce. Could it be a spurious failure? > > Roland. 
>

From kuaiwei.kw at alibaba-inc.com Tue Sep 25 13:50:35 2018
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Tue, 25 Sep 2018 21:50:35 +0800
Subject: Re: Re: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
In-Reply-To:
References: <37b6e1a5-6dcf-4097-baca-3f9c03d38bb1.kuaiwei.kw@alibaba-inc.com> ,
Message-ID: <4f7cdcf3-fa85-4011-85f9-fb2fc4f8d80e.kuaiwei.kw@alibaba-inc.com>

Hi Tobias,

Thanks for your comments. I will check RegionNode::is_copy to see if it can be used to detect an unnecessary region node. I will send a new review after testing.

Best Regards,
Kevin

------------------------------------------------------------------
From: Tobias Hartmann
Sent: Monday, 24 September 2018 21:34
To: ??(??) ; hotspot compiler
Cc: ???(??)
Subject: Re: Re: [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects

Hi Kevin,

On 24.09.2018 08:06, Kuai Wei wrote:
> Thanks for your suggestion. I think your point is the region node may have new path in later parse
> phase, so we can not make sure the region node will be optimized.

Yes, my point is that a new path to the region might be added after your optimization and that path might contain stores to the newly allocated object.

> It's a good question and I checked it. Now I think it may not cause trouble. In post barrier
> reduce, the oop store use allocation node as base pointer. The data graph guarantee control of
> allocation node should dominate control of store. If allocation node is in pred of region node and
> there's a new path into region, the graph is bad because we can reach store without allocation.

Yes but the new path might be a backedge from a loop that is dominated by the allocation.
> If allocation node is in a domination ancestor, the graph shape is a little complicated, so we can not > reach control of allocation by skipping one region. Right, that's basically the implicit assumption of your patch. I'm not sure if it always holds. But I think you should at least use RegionNode::is_copy(). Let's see what other reviewers think. > The better solution is we can know the region node is created in exit_map and we will not change > it in later. Is there any way to know it in compile time? The region node is created in Parse::build_exits(). I don't think there is a way to keep track of this. Thanks, Tobias -------------- next part -------------- An HTML attachment was scrubbed... URL: From Pengfei.Li at arm.com Tue Sep 25 13:54:06 2018 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 25 Sep 2018 13:54:06 +0000 Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check In-Reply-To: <54cd88be-ed94-9ffa-0862-f803496577fc@oracle.com> References: <91d1bf82-eb60-c0a1-449d-8bb9246ddedd@oracle.com> <8a5aba14-5feb-e6ca-953b-a91d12e4b5bc@oracle.com> , <54cd88be-ed94-9ffa-0862-f803496577fc@oracle.com> Message-ID: Thanks Tobias. Hi Pengfei, sure, pushed: http://hg.openjdk.java.net/jdk/jdk/rev/bc38c75eed57 Best regards, Tobias On 25.09.2018 12:13, Pengfei Li (Arm Technology China) wrote: > Hi Tobias, > > Thanks for your review comment. I've fixed the whitespaces and below is a new webrev. > http://cr.openjdk.java.net/~njian/8210152/webrev.02/ > > Could you help push this patch since I don't have permissions to do that? > > -- > Thanks, > Pengfei > > >> -----Original Message----- >> >> Hi Pengfei, >> >> this looks good to me but please fix the whitespacing before pushing: >> "if( a == b )" -> "if (a == b)" >> "method( param )" -> "method(param)" >> >> It's not consistently like that in old HotSpot code but we should at least fix it >> in new code. 
>> >> Thanks, >> Tobias >> >> On 25.09.2018 10:38, Pengfei Li (Arm Technology China) wrote: >>> Hi Vladimir, >>> >>> I still didn't get other comments during the past week. >>> Do you think it is ok to push this patch? >>> http://cr.openjdk.java.net/~njian/8210152/webrev.01/ >>> >>> -- >>> Thanks, >>> Pengfei >>> >>>> -----Original Message----- >>>> >>>> Hi Reviewers, >>>> >>>> Is there any other comments, objections or suggestions on the new >> webrev? >>>> If no problems, could anyone help to push this commit? >>>> >>>> I look forward to your response. >>>> >>>> -- >>>> Thanks, >>>> Pengfei >>>> >>>>> -----Original Message----- >>>>> >>>>> Looks good. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 9/12/18 2:50 AM, Pengfei Li (Arm Technology China) wrote: >>>>>> Hi, >>>>>> >>>>>> I've updated the patch based on Vladimir's comment. I added checks >>>>>> for >>>>> SubI on both branches of the diamond phi. >>>>>> Also thanks Roland for the suggestion that supporting a Phi with 3 >>>>>> or more >>>>> inputs. But I think the matching rule will be much more complex if >>>>> we add this. And I'm not sure if there are any real case scenario >>>>> which can benefit from this support. So I didn't add it in. >>>>>> >>>>>> New webrev: http://cr.openjdk.java.net/~njian/8210152/webrev.01/ >>>>>> I've run jtreg full test with the new patch and no new issues found. >>>>>> >>>>>> Please let me know if you have other comments or suggestions. If no >>>>> further issues, I need your help to sponsor and push the patch. >>>>>> >>>>>> -- >>>>>> Thanks, >>>>>> Pengfei >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcherepanov at azul.com Tue Sep 25 14:00:35 2018 From: dcherepanov at azul.com (Dmitry Cherepanov) Date: Tue, 25 Sep 2018 14:00:35 +0000 Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86 32-bit Message-ID: Hello, Please review a patch that resolves issue in x86 32bit builds. 
It slightly adjusts the fix for JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and using it for incrementing backedge counter. JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100 webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/ Thanks, Dmitry From tobias.hartmann at oracle.com Tue Sep 25 14:33:27 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Sep 2018 16:33:27 +0200 Subject: RFR(M): 8210885: Convert left over loads/stores to access api In-Reply-To: References: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com> Message-ID: <40296bcb-540f-693f-3431-f20dba984245@oracle.com> This is: https://bugs.openjdk.java.net/browse/JDK-8211084 Best regards, Tobias On 25.09.2018 15:33, Roman Kennke wrote: > I've also seen this with JDK-8211061. > > Roman > > >> I ran this one through the submit repo and it came back with 1 failed >> test: >> >> runtime/XCheckJniJsig/XCheckJSig.java >> >> that I can't reproduce. Could it be a spurious failure? >> >> Roland. >> > > From rwestrel at redhat.com Tue Sep 25 14:37:46 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Sep 2018 16:37:46 +0200 Subject: RFR(M): 8210885: Convert left over loads/stores to access api In-Reply-To: <40296bcb-540f-693f-3431-f20dba984245@oracle.com> References: <322e26c7-6179-f14c-3c5d-abd4a1cae76d@oracle.com> <40296bcb-540f-693f-3431-f20dba984245@oracle.com> Message-ID: > https://bugs.openjdk.java.net/browse/JDK-8211084 Thanks. I'll push this change then. Roland. 
From doug.simon at oracle.com Tue Sep 25 14:48:12 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Tue, 25 Sep 2018 16:48:12 +0200
Subject: RFR: 8208686: [AOT] JVMTI ResourceExhausted event repeated for same allocation
Message-ID: <910A4A3C-3EEF-4167-84D5-9819C83D6FC1@oracle.com>

A major design point of Graal is to treat allocations as non-side-effecting to give more freedom to the optimizer by reducing the number of distinct FrameStates that need to be managed. When failing an allocation, Graal will deoptimize to the last side-effecting instruction before the allocation. This means the VM code for heap allocation will potentially be executed twice, once from Graal-compiled code and then again in the interpreter. While this is perfectly fine according to the JVM specification, it can cause confusing behavior for JVMTI-based tools. They will receive two ResourceExhausted events for a single allocation. Furthermore, the first ResourceExhausted event (on the Graal allocation slow path) might denote a bytecode instruction that performs no allocation, making it hard to debug the memory failure.

The proposed solution is to add an extra set of JVMCI VM runtime calls for allocation. These entry points will attempt the allocation and, upon failure, skip side effects such as posting JVMTI events or handling -XX:OnOutOfMemoryError. The compiled code using these entry points is expected to deoptimize on null.

The path from these new entry points to where allocation can fail goes through quite a bit of VM code. One could modify all these paths by:
* Returning null instead of throwing an exception on failure.
* Adding a `bool null_on_fail` argument to all relevant methods.
* Adding extra null checking where necessary after each call to these methods when `null_on_fail == true`.
This represents a significant number of changes.

Instead, the proposed solution introduces a new _in_retryable_allocation thread-local.
This way, only the entry points and allocation routines that raise an exception need to be modified. Failure is communicated back to the new entry points by throwing a special pre-allocated OOME object (i.e., Universe::out_of_memory_error_retry()) which must not propagate back to Java code. Use of this object is not strictly necessary; it is introduced to highlight/document the special allocation mode. The proposed solution is at http://cr.openjdk.java.net/~dnsimon/8208686. THE JBS bug is: https://bugs.openjdk.java.net/browse/JDK-8208686 -Doug From daniel.daugherty at oracle.com Tue Sep 25 15:11:18 2018 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 25 Sep 2018 11:11:18 -0400 Subject: RFR: 8208686: [AOT] JVMTI ResourceExhausted event repeated for same allocation In-Reply-To: <910A4A3C-3EEF-4167-84D5-9819C83D6FC1@oracle.com> References: <910A4A3C-3EEF-4167-84D5-9819C83D6FC1@oracle.com> Message-ID: <64cbe730-5c9a-a04d-9eee-a56abfbb8e07@oracle.com> Adding serviceability-dev at ... since this is JVM/TI... Dan On 9/25/18 10:48 AM, Doug Simon wrote: > A major design point of Graal is to treat allocations as non-side effecting to give more freedom to the optimizer by reducing the number of distinct FrameStates that need to be managed. When failing an allocation, Graal will deoptimize to the last side effecting instruction before the allocation. This mean the VM code for heap allocation will potentially be executed twice, once from Graal compiled code and then again in the interpreter. While this is perfectly fine according to the JVM specification, it can cause confusing behavior for JVMTI based tools. They will receive 2 ResourceExhausted events for a single allocation. Furthermore, the first ResourceExhausted event (on the Graal allocation slow path) might denote a bytecode instruction that performs no allocation, making it hard to debug the memory failure. > > The proposed solution is to add an extra set of JVMCI VM runtime calls for allocation. 
These entry points will attempt the allocation and upon failure, > skip side-effects such as posting JVMTI events or handling -XX:OnOutOfMemoryError. The compiled code using these entry points is expected deoptmize on null. > > The path from these new entry points to where allocation can fail goes through quite a bit of VM code. One could modify all these paths by: > * Returning null instead of throwing an exception on failure. > * Adding a `bool null_on_fail` argument to all relevant methods. > * Adding extra null checking where necessary after each call to these methods when `null_on_fail == true`. > This represents a significant number of changes. > > Instead, the proposed solution introduces a new _in_retryable_allocation thread-local. This way, only the entry points and allocation routines that raise an exception need to be modified. Failure is communicated back to the new entry points by throwing a special pre-allocated OOME object (i.e., Universe::out_of_memory_error_retry()) which must not propagate back to Java code. Use of this object is not strictly necessary; it is introduced to highlight/document the special allocation mode. > > The proposed solution is at http://cr.openjdk.java.net/~dnsimon/8208686. > THE JBS bug is: https://bugs.openjdk.java.net/browse/JDK-8208686 > > -Doug From tobias.hartmann at oracle.com Tue Sep 25 15:40:19 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Sep 2018 17:40:19 +0200 Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86 32-bit In-Reply-To: References: Message-ID: Hi Dmitry, Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64? Could you please add the regression test to the webrev? Or did this reproduce with other tests? Thanks, Tobias On 25.09.2018 16:00, Dmitry Cherepanov wrote: > Hello, > > Please review a patch that resolves issue in x86 32bit builds. 
It slightly adjusts the fix for JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and using it for incrementing backedge counter.
>
> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100
> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/
>
> Thanks,
>
> Dmitry
>

From gromero at linux.vnet.ibm.com Tue Sep 25 18:29:04 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 25 Sep 2018 15:29:04 -0300
Subject: [8u] RFR for backport of JDK-8164920: ppc: enhancement of CRC32 intrinsic to jdk8u-dev (v2)
In-Reply-To: <31e036a0-a7b7-70f2-422f-addd4049436f@linux.vnet.ibm.com>
References: <31e036a0-a7b7-70f2-422f-addd4049436f@linux.vnet.ibm.com>
Message-ID: <5d059529-6048-ea79-f661-aae05b754dcc@linux.vnet.ibm.com>

Hi,

May I please get reviews for the following changes (v2)?

http://cr.openjdk.java.net/~gromero/crc32_jdk8u/v2/8131048/
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/v2/8164920/

Change JDK-8131048 is a dependency for backporting JDK-8164920 to 8u.

Goetz reviewed the first version of this backport and pointed out necessary changes and fixes that are present in this v2. Thank you, Goetz.

There is no code change except to adapt the file paths, to add has_vpmsumb() feature detection, and to adapt the signature before doing the runtime call to the CRC32 intrinsic by casting T_INTs as T_LONGs, because PPC64 c_calling_convention() requires T_LONGs on 8u; otherwise a proper assert() for that is hit.

Change JDK-8131048 touches shared code but, since it checks for the 'CCallingConventionRequiresIntsAsLongs' flag, which is currently set only on PPC64, that change is in effect PPC64-only.

JDK-8164920 is important for PPC64 because it helps some Apache Cassandra workloads on Power a lot.

Thank you.
Best regards, Gustavo From sandhya.viswanathan at intel.com Tue Sep 25 17:31:23 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 25 Sep 2018 17:31:23 +0000 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF0F805@FMSMSX126.amr.corp.intel.com> Hi Alan, It looks like Apache HBASE also uses FileChannel map and MappedByteBuffer mechanism. The feature proposed by Andrew Dinn and Jonathan Halliday will be very useful for Big Data frameworks as well and help them to use NVM without a need to go to JNI. Copying HBASE experts Anoop and Ram to this thread. Apache has API layers to overcome the 2GB limitation through MultiByteBuff class: https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/nio/MultiByteBuff.html https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/ByteBufferArray.html Some example uses of ByteBuffer in HBASE today: https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/bucket/FileMmapEngine.html https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/ByteBuffInputStream.html https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/ipc/ServerRpcConnection.ByteBuffByteInput.html Best Regards, Sandhya -----Original Message----- From: Alan Bateman [mailto:Alan.Bateman at oracle.com] Sent: Monday, September 24, 2018 1:15 AM To: Andrew Dinn ; core-libs-dev at openjdk.java.net; hotspot compiler ; Aundhe, Shirish ; Dohrmann, Steve ; Viswanathan, Sandhya ; Deshpande, Vivek R ; Jonathan Halliday Subject: Re: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory On 21/09/2018 
16:44, Andrew Dinn wrote: > Hi Alan, > > Thanks for the response and apologies for failing to notice you had > posted it some days ago (doh!). > > Jonathan Halliday has already explained how Red Hat might want to use > this API. Well, what he said, essentially! In particular, this model > provides a way of ensuring that raw byte data is able to be persisted > coherently from Java with the minimal possible overhead. It would be up > to client code above this layer to implement structuring mechanisms for > how those raw bytes get populated with data and to manage any associated > issues regarding atomicity, consistency and isolation (i.e. to provide > the A, C and I of ACID to this API's D). > > The main point of the JEP is to ensure that this such a performant base > capability is available for known important cases where that is needed > such as, for example, a transaction manager or a distributed cache. If > equivalent middleware written in C can use persistent memory to bring > the persistent storage tier nearer to the CPU and, hence, lower data > durability overheads then we really need an equivalently performant > option in Java or risk Java dropping out as a player in those middleware > markets. > > I am glad to hear that other alternatives might be available and would > be happy to consider them. However, I'm not sure that this means this > option is not still desirable, especially if it is orthogonal to those > other alternatives. Most importantly, this one has the advantage that we > know it is ready to use and will provide benefits (we have already > implemented a journaled transaction log over it with promising results > and someone from our messaging team has already been looking into using > it to persist log messages). Indeed, we also know we can use it to > provide a base for supporting all the use cases addressed by Intel's > libpmem and available to C programmers, e.g. 
a block store, simply by > implementing Java client libraries that provide managed access to the > persistent buffer along the same lines as the Intel C libraries. > > I'm afraid I am not familiar with Panama 'scopes' and 'pointers' so I > can't really compare options here. Can you point me at any info that > explains what those terms mean and how it might be possible to use them > to access off-heap, persistent data? > I'm not questioning the need to support NVM; instead I'm trying to see whether MappedByteBuffer is the right way to expose this in the standard API. Buffers were designed in JSR-51 with specific use-cases in mind but they are problematic for many off-heap cases as they aren't thread safe, are limited to 2GB, lack confinement, and only support homogeneous data (no layout support). At the same time, Project Panama (foreign branch in panama/dev) has the potential to provide the right API to work with memory. I see Jonathan's mail where he seems to be using object serialization so the solution on the table works for his use-case but it may not be the right solution for more general multi-threaded access to NVM. There is some interest in seeing whether this part of Project Panama could be advanced to address many of the cases where developers are resorting to using Unsafe today. There would of course need to be some integration with buffers too. There's no concrete proposal/JEP at this time, I'm just pointing out that many of the complaints about buffers are really cases where it's the wrong API and the real need is something more fundamental. So where does this leave us? If support for persistent memory is added to FileChannel.map as we've been discussing then it may not be too bad as the API surface is small. The API surface is just new map modes and a MappedByteBuffer::isPersistent method. A force method that specifies a range is something useful to add to MBB anyway.
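For reference, a minimal sketch of the existing FileChannel.map path that the proposal would extend. It uses only the standard READ_WRITE mode and the no-argument force(); the persistent map modes and MappedByteBuffer::isPersistent discussed here are still proposals and are deliberately not used. The class name MapForceSketch and the helper roundTrip are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapForceSketch {

    // Hypothetical helper: writes text through a mapped buffer, forces it
    // to the backing file, then reads the file back via ordinary file I/O.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("map-force-sketch", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Existing standard map mode; mapping past the current end
                // of the file grows the file to the mapped size.
                MappedByteBuffer mbb =
                        ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
                mbb.put(bytes);
                // Whole-buffer writeback; a ranged variant is what the
                // discussion above suggests adding.
                mbb.force();
            }
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

With an ordinary file system the force() call goes through the usual writeback path; the point of the proposed modes is to let the same calling pattern target NVM-backed mappings.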
If (and I hope when) there is support for memory regions or pointers then I could imagine re-visiting this so that there are alternative ways to get a memory region or pointer that is backed by NVM. If the timing were different then I think we'd skip the new map modes and we would be having a different discussion here. An alternative is of course to create the mapped buffer via a JDK-specific API as that would be easier to deprecate and remove in the future if needed. I'm interested to see if there is other input on this topic before it gets locked into extending the standard API. -Alan. From david.holmes at oracle.com Tue Sep 25 21:27:47 2018 From: david.holmes at oracle.com (David Holmes) Date: Tue, 25 Sep 2018 17:27:47 -0400 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: <66c0754a-e93b-48b8-4528-184c636d7254@redhat.com> References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com> <818610b4-cf48-091a-3719-09f1f863c508@redhat.com> <662e1e46-a48e-34db-da0a-df693160928d@redhat.com> <66c0754a-e93b-48b8-4528-184c636d7254@redhat.com> Message-ID: Hi Roman, This change seems to have broken a test: compiler/whitebox/ForceNMethodSweepTest.java on exception 'sweep after deoptimization should decrease usage: expected that 1477504 < 1477504': I'm assuming the test is assuming (it is a whitebox test after all) that the cleanup is happening synchronously. No bug filed yet. Thanks, David On 25/09/2018 4:17 AM, Roman Kennke wrote: > Thanks Tobias for reviewing! > > Erik: Is this good for you too?
> > Thanks, > Roman > >> Hi Roman, >> >> On 24.09.2018 18:46, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/ >> >> This looks good to me! >> >> Best regards, >> Tobias >> > > From david.holmes at oracle.com Tue Sep 25 21:32:57 2018 From: david.holmes at oracle.com (David Holmes) Date: Tue, 25 Sep 2018 17:32:57 -0400 Subject: RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods() In-Reply-To: References: <2224c4e0-9ffd-d094-a9d1-5e265239912f@redhat.com> <87a686e9-2d7a-58f2-f09f-8dcd2f8b9f3c@redhat.com> <0ebf559b-f480-34e3-9939-d820a398f16c@oracle.com> <109f95e2-2096-a1cf-c0d9-a0714cec0dba@redhat.com> <15330507-2ffc-702e-f008-6aa47e6e4288@oracle.com> <6c97cf4c-517f-767c-6723-1b2f41080f7f@redhat.com> <818610b4-cf48-091a-3719-09f1f863c508@redhat.com> <662e1e46-a48e-34db-da0a-df693160928d@redhat.com> <66c0754a-e93b-48b8-4528-184c636d7254@redhat.com> Message-ID: <2163fb80-e8de-c604-9547-2790c15558af@oracle.com> Filed: JDK-8211129 David On 25/09/2018 5:27 PM, David Holmes wrote: > Hi Roman, > > This change seems to have broken a test: > > compiler/whitebox/ForceNMethodSweepTest.java > > on exception 'sweep after deoptimization should decrease usage: expected > that 1477504 < 1477504': > > I'm assuming the test is assuming (it is a whitebox test afterall) that > the cleanup is happening synchronously. > > No bug filed yet. > > Thanks, > David > > On 25/09/2018 4:17 AM, Roman Kennke wrote: >> Thanks Tobias for reviewing! >> >> Erik: Is this good for you too? >> >> Thanks, >> Roman >> >>> Hi Roman, >>> >>> On 24.09.2018 18:46, Roman Kennke wrote: >>>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.05/ >>> >>> This looks good to me! 
>>> >>> Best regards, >>> Tobias >>> >> >> From john.r.rose at oracle.com Tue Sep 25 23:57:26 2018 From: john.r.rose at oracle.com (John Rose) Date: Tue, 25 Sep 2018 16:57:26 -0700 Subject: [12] RFR(M): 8210215: C2 should optimize trichotomy calculations In-Reply-To: References: Message-ID: <8DC6DEBA-9D31-4DFB-97D7-83474E69A5E3@oracle.com> On Sep 25, 2018, at 2:28 AM, Tobias Hartmann wrote: > > Thanks to John for pre-reviewing this change. Based on a partial inspection, two comments: `res[9][9]` should be `res[illegal+1][illegal+1]` and should have rows and columns for `never` (code smell: `9` is a naked constant; makes it hard to tell your table is out of date) In the test cases `compare1` has `(a < b) ? -1 : (a == b) ? 0 : 1`. Shouldn't you also test `(a < b) ? -1 : (a <= b) ? 0 : 1`? And similarly, for other cases where the second test overlaps with the first. -- John From stuart.marks at oracle.com Wed Sep 26 02:19:02 2018 From: stuart.marks at oracle.com (Stuart Marks) Date: Tue, 25 Sep 2018 19:19:02 -0700 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> Message-ID: Hi Andrew, I've been starting to look at some of the buffer-related issues and I've been discussing this issue with Alan. On 9/25/18 2:01 AM, Andrew Haley wrote: > On 09/24/2018 09:14 AM, Alan Bateman wrote: > >> I'm not questioning the need to support NVM, instead I'm trying to >> see whether MappedByteBuffer is the right way to expose this in the >> standard API.
Buffers were designed in JSR-51 with specific >> use-cases in mind but they are problematic for many off-heap cases >> as they aren't thread safe, are limited to 2GB, lack confinement, >> only support homogeneous data (no layout support). > > I'm baffled by this assertion. It's true that the 2Gb limit is turning > into a real pain, but all of the rest are nothing to do with > ByteBuffers, which are just raw memory. Adding structure is something > that can be done by third-party libraries or by some future OpenJDK > API. If you look around Java SE for a public API to represent raw memory, then MappedByteBuffer is the obvious choice; there isn't any realistic alternative right now. By asking whether MBB is "the right way to expose" raw memory, I think Alan is really saying, is MBB the best API to use to expose raw memory in the long run? I think the answer is clearly No. However, that's not an argument against proceeding with MBB. Rather, it's setting expectations that MBB has limitations that impose some pain in the short term, that possibly can be worked around, but which in the long term may prove to be insurmountable. For an example of something that might be "insurmountable", I'll use the 2GB limitation. Doing something to raise the limit is certainly possible. The question is, after retrofitting this into the API, whether the result will be something that people want to program with, and whether it will perform well enough. It might not. Another example would be a library layered on top that provides structured access. It's certainly possible to have such a library that will get the job done. However, the Buffer API necessarily requires certain operations to be implemented using multiple method calls, or it might require copying in some cases. Either of these might impose unacceptable overhead. There are, however, certain things that can be done with buffers in the short term to make things work better. For example, JDK-5029431 absolute bulk put/get methods.
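To illustrate the gap that JDK-5029431 refers to, here is a minimal sketch of how an absolute bulk put/get can be emulated with only the relative bulk methods, by going through a duplicate() view so the caller's position is left alone. The class AbsoluteBulkSketch and the helpers putAt and getAt are hypothetical names, not part of any JDK API; the extra view object and the multiple calls per transfer are the kind of overhead mentioned above.

```java
import java.nio.ByteBuffer;

public class AbsoluteBulkSketch {

    // Hypothetical helper: copy src into buf starting at index without
    // changing buf's own position. Honors buf's current limit.
    static void putAt(ByteBuffer buf, int index, byte[] src) {
        ByteBuffer view = buf.duplicate(); // shared content, independent position
        view.position(index);
        view.put(src);                     // relative bulk put on the view
    }

    // Hypothetical helper: read length bytes starting at index without
    // changing buf's own position.
    static byte[] getAt(ByteBuffer buf, int index, int length) {
        byte[] dst = new byte[length];
        ByteBuffer view = buf.duplicate();
        view.position(index);
        view.get(dst);                     // relative bulk get on the view
        return dst;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        putAt(buf, 4, new byte[] {10, 20, 30});
        byte[] back = getAt(buf, 4, 3);
        System.out.println(back[0] + " " + back[1] + " " + back[2]);
        System.out.println("position still " + buf.position());
    }
}
```

The same helpers work unchanged on a MappedByteBuffer, since duplicate() on a mapped buffer shares the mapping.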
I suspect this will be quite helpful for the NVM case, and indeed it's been something that's been asked for repeatedly for quite some time. If you (collectively) are aware of this and other limitations, then sure, let's proceed with this JEP. >> So where does this leave us? If support for persistent memory is >> added to FileChannel.map as we've been discussing then it may not be >> too bad as the API surface is small. The API surface is just new map >> modes and a MappedByteBuffer::isPersistent method. The force method >> that specify a range is something useful to add to MBB anyway. > > Yeah, that's right, it is. While something not yet planned might be an > alternative, even a better one, the purpose of our faster release > cadence is to "evolve the Java SE Platform and the JDK at a more rapid > pace, so that new features [can] be delivered in timelier > manner". This is timely; waiting for Panama to think of what might be > possible, not so much. Agree, "waiting for Panama" is certainly not something that anyone wants to do. s'marks From felix.yang at huawei.com Wed Sep 26 03:01:51 2018 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 26 Sep 2018 03:01:51 +0000 Subject: [aarch64-port-dev ] RFR(S): 8210413: AArch64: Optimize div/rem by constant in C1 In-Reply-To: References: <9645c210-3d87-52fa-8051-54dc60629866@redhat.com> <9d454f5b-475b-5713-7155-c6946f378c3e@redhat.com> Message-ID: Just eyeballed the changes, looks good. Pushed: http://hg.openjdk.java.net/jdk/jdk/rev/e1368526699d Thanks, Felix > > Thanks for your code review. Could you help push this patch? > > -- > Thanks, > Pengfei > > > > -----Original Message----- > > > > On 09/20/2018 05:15 AM, Pengfei Li (Arm Technology China) wrote: > > > Please find below new patch that added the same optimization for longs as > > well as ints and also fixed an issue. > > > http://cr.openjdk.java.net/~yzhang/8210413/webrev.01/ > > > > > > Could you help look at it again? > > > > That's fine. 
I'm not exactly delighted by the amount of duplicated code for > > long and int, but it's very hard to avoid in this case. > > The patch is good for JDK/JDK. > > > > -- > > Andrew Haley > > Java Platform Lead Engineer > > Red Hat UK Ltd. > > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dcherepanov at azul.com Wed Sep 26 07:04:18 2018 From: dcherepanov at azul.com (Dmitry Cherepanov) Date: Wed, 26 Sep 2018 07:04:18 +0000 Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86 32-bit In-Reply-To: References: Message-ID: Hi Tobias, Thanks for the review, updated patch avoids the additional move on x86_64 and includes the regression test. http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.01/ Dmitry On Sep 25, 2018, at 6:40 PM, Tobias Hartmann wrote: Hi Dmitry, Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64? Could you please add the regression test to the webrev? Or did this reproduce with other tests? Thanks, Tobias On 25.09.2018 16:00, Dmitry Cherepanov wrote: Hello, Please review a patch that resolves issue in x86 32bit builds. It slightly adjusts the fix for JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and using it for incrementing backedge counter. JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100 webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/ Thanks, Dmitry From tobias.hartmann at oracle.com Wed Sep 26 08:25:17 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 26 Sep 2018 10:25:17 +0200 Subject: [12] RFR(M): 8210215: C2 should optimize trichotomy calculations In-Reply-To: <8DC6DEBA-9D31-4DFB-97D7-83474E69A5E3@oracle.com> References: <8DC6DEBA-9D31-4DFB-97D7-83474E69A5E3@oracle.com> Message-ID: <4991b788-ed40-ffd0-ff0c-85a4ef246df2@oracle.com> Hi John, thanks for looking at this again!
On 26.09.2018 01:57, John Rose wrote: > `res[9][9]` should be `res[illegal+1][illegal+1]` and should have rows and columns for `never` > (code smell: `9` is a naked constant; makes it hard to tell your table is out of date) Right, I've updated the table. > In the test cases `compare1` has `(a < b) ? -1 : (a == b) ? 0 : 1`. > Shouldn't you also test `(a < b) ? -1 : (a <= b) ? 0 : 1`? > And similarly, for other cases where the second test overlaps > with the first. I did not add tests for all the 6² operator combinations but I think more overlapping tests won't hurt. I've added (a < b) ? -1 : (a <= b) ? 0 : 1; (a > b) ? 1 : (a >= b) ? 0 : -1; (a == b) ? 0 : (a <= b) ? -1 : 1; (a == b) ? 0 : (a >= b) ? 1 : -1; and verified that all inlined comparisons fold. Here's the new webrev: http://cr.openjdk.java.net/~thartmann/8210215/webrev.01/ Thanks, Tobias From tobias.hartmann at oracle.com Wed Sep 26 08:29:42 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 26 Sep 2018 10:29:42 +0200 Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86 32-bit In-Reply-To: References: Message-ID: Hi Dmitry, this looks good to me but Igor (who implemented 8201447) should have a look as well. Best regards, Tobias On 26.09.2018 09:04, Dmitry Cherepanov wrote: > Hi Tobias, > > Thanks for the review, updated patch avoids the additional move on x86_64 and includes the > regression test. > > http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.01/ > > > Dmitry > >> On Sep 25, 2018, at 6:40 PM, Tobias Hartmann wrote: >> >> Hi Dmitry, >> >> Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64? >> >> Could you please add the regression test to the webrev? Or did this reproduce with other tests? >> >> Thanks, >> Tobias >> >> On 25.09.2018 16:00, Dmitry Cherepanov wrote: >>> Hello, >>> >>> Please review a patch that resolves issue in x86 32bit builds.
It slightly adjusts the fix for >>> JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and >>> using it for incrementing backedge counter. >>> >>> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100 >>> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/ >>> >>> >>> Thanks, >>> >>> Dmitry >>> > From rkennke at redhat.com Wed Sep 26 08:30:17 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 26 Sep 2018 10:30:17 +0200 Subject: RFR: JDK-8211129: [Testbug] compiler/whitebox/ForceNMethodSweepTest.java fails after JDK-8132849 Message-ID: Please review the following change: Several tests fail because after forcing nmethod sweep via Whitebox API, the sweeper doesn't actually kick in. The reason is the changed heuristic in NMethodSweeper: before JDK-8132849, we would scan stacks and mark nmethods at every safepoint, during safepoint cleanup phase. This would subsequently trigger a sweep cycle via _should_sweep. If no stack-scanning is performed, the sweeper would skip sweeping because the CompiledMethodIterator _current has not been reset. I propose to change the following: - In the sweep-loop, call into do_stack_scanning() whenever it's forced (via WhiteBox API) or if should_sweep has been determined by other heuristics (code-cache-change, time-since-last-sweep,..) - Instead let do_stack_scanning() not set _should_sweep anymore. Bug: https://bugs.openjdk.java.net/browse/JDK-8211129 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/ Testing: Fixes previously failing: compiler/whitebox/ForceNMethodSweepTest.java jdk/jfr/event/compiler/TestCodeSweeperStats.java Passes: hotspot/jtreg:tier1
From Alan.Bateman at oracle.com Wed Sep 26 10:35:52 2018 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 26 Sep 2018 11:35:52 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> Message-ID: <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> On 26/09/2018 09:44, Andrew Dinn wrote: > : >> If you (collectively) are aware of this and other limitations, then >> sure, let's proceed with this JEP. > Well, I'm very happy to proceed if Alan is in agreement. One thing he > suggested in an earlier post was splitting off the functionality to > create a persistent ByteBuffer into a separate method so as to avoid any > issues if we have to deprecate this model at a later date. I think > that's a quite reasonable precaution and I'd be happy to propose an > alternative API or let Alan suggest one. Perhaps Alan can comment? > I'm reasonably happy with the approach that we converged on to introduce new map modes and use the existing FileChannel.map method. Ideally new map modes wouldn't need to be exposed but you've looked into that (to my satisfaction at least). One detail that I think may need another iteration or two on is whether we need one or two modes. This will become clearer once the javadoc is fleshed out a bit further. It may be that one new map mode, "SYNC" for example, that works with the existing READ_WRITE mode may be clearer and easier to specify. I think that would be consistent with how copy-on-write mappings are exposed with the PRIVATE mode.
It also provides a 1-1 mapping to the underlying MAP_SYNC flag. As regards the bigger topic on what the right API is for "memory" then I don't think ByteBuffer is the right answer. You've touched on a few of the issues in your mail but there are bigger issues around thread safety and confinement, also the issue of the buffer position/limit that get in the way and the reason why several libraries use Unsafe. There isn't any concrete proposal or discussion to point at around splitting out this aspect of Project Panama. Stuart and I are just pointing out that a better solution could emerge which could lead to an alternative API to map a region of NVM as "memory" rather than a mapped byte buffer. If I were developing a file system backed by NVM then that is probably the raw API that I would want, not MBB. As regards introducing an API that we could deprecate then that musing was about introducing a JDK-specific API. If MapMode were an interface then we could have introduced a JDK-specific map mode that wouldn't have required rev'ing the standard API. Introducing a completely separate map method in a JDK-specific module doesn't seem to be worth it as I think we can be confident that the proposed and possible future approaches will not conflict. -Alan From adinn at redhat.com Wed Sep 26 08:44:04 2018 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 26 Sep 2018 09:44:04 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> Message-ID: <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> Hi Stuart, On 26/09/18 03:19, Stuart Marks wrote: > I've been starting to look at some of the buffer-related issues and I've > been discussing this issue with Alan.
I'd be interested to hear more details if the discussion has gone far enough for any of it to be aired online. > On 9/25/18 2:01 AM, Andrew Haley wrote: >> . . . >> I'm baffled by this assertion. It's true that the 2Gb limit is turning >> into a real pain, but all of the rest are nothing to do with >> ByteBuffers, which are just raw memory. Adding structure is something >> that can be done by third-party libraries or by some future OpenJDK >> API. > > If you look around Java SE for a public API to represent raw memory, > then MappedByteBuffer is the obvious choice; there isn't any realistic > alternative right now. By asking whether MBB is "the right way to > expose" raw memory, I think Alan is really saying, is MBB the best API > to use to expose raw memory in the long run? I think the answer is > clearly No. Sorry, it may well be my fault but it's not really clear to me. You mention two issues below, buffer size limit and API verbosity. I acknowledge the former is a problem but i) there is a proposal (JDK-8180628, referred to in the JEP) to deal with this limitation by adding extra methods that support the creation of larger buffers and use of long indices and ii) there are existing Java libraries built over ByteBuffer that overcome this issue (as Sandhya pointed out in a note somewhere near this one). Sure, both of these remedies have limitations which /might/ lead to problems but I don't see (yet, at least) that they are manifestly unworkable. As regards the latter issue, I am not really sure what you are suggesting would be a better alternative to using ByteBuffer get and put methods? Are you perhaps thinking of some way of overlaying a record (or object?) structure over the mapped memory that might allow a compiler to provide an equivalent to these ByteBuffer method calls as direct memory loads and stores? 
Of course, a Java library built on top of this proposal could provide a similar abstraction, although perhaps not with as firm guarantees for compiler optimization and certainly not with the possibility of direct language integration. Copying might indeed be an issue but surely that depends on the type of data being written, the library design and the way the client needs to operate in order to use it (essentially on whether it can size in advance a data area in which to write the contents direct vs build a separate copy as distinct pieces and then serialize them). Anyway, I hope the above explains why I'm not sure about your use of the words 'clearly' or (in a later comment) 'insurmountable'. Perhaps more details of your conversation with Alan would help. > There are, however, certain things that can be done with buffers in the > short term to make things work better. For example, JDK-5029431 absolute > bulk put/get methods. I suspect this will be quite helpful for the NVM > case, and indeed it's been something that's been asked for repeatedly > for quite some time. Would it be enough to add a comment to the Risks and Assumptions of the JEP to point out this limitation and the potential need to address it, mentioning this specific JDK issue -- much as was done with JDK-8180628? > If you (collectively) are aware of this and other limitations, then > sure, let's proceed with this JEP. Well, I'm very happy to proceed if Alan is in agreement. One thing he suggested in an earlier post was splitting off the functionality to create a persistent ByteBuffer into a separate method so as to avoid any issues if we have to deprecate this model at a later date. I think that's a quite reasonable precaution and I'd be happy to propose an alternative API or let Alan suggest one. Perhaps Alan can comment? > Agree, "waiting for Panama" is certainly not something that anyone wants > to do. Yes, indeed, there are already several important use cases waiting in the wings.
regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Wed Sep 26 09:53:23 2018 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Sep 2018 10:53:23 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> Message-ID: On 09/26/2018 09:44 AM, Andrew Dinn wrote: > As regards the latter issue, I am not really sure what you are > suggesting would be a better alternative to using ByteBuffer get and put > methods? Are you perhaps thinking of some way of overlaying a record (or > object?) structure over the mapped memory that might allow a compiler to > provide an equivalent to these ByteBuffer method calls as direct memory > loads and stores? Of course, a Java library built on top of this > proposal could provide a similar abstraction, although perhaps not with > as firm guarantees for compiler optimization and certainly not with the > possibility of direct language integration. Thinking about it some more, I guess that being able to say something like aFoo.bar = n; rather than aFoo.setBar(n); is preferable, although common Java practice (and indeed good OOP practice) is to provide getters and setters rather than directly expose fields. I suppose one advantage of being able to use an object structure is that the compiler can do better (type-based) alias analysis, can track dead stores, etc. 
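To make the comparison concrete, here is a minimal sketch of the accessor style layered over a ByteBuffer with absolute getInt/putInt calls. Everything named here (FooView, the fields bar and baz, and the byte offsets) is hypothetical and stands in for whatever layout a client library would define; it is not taken from the JEP or from any existing library.

```java
import java.nio.ByteBuffer;

public class FooView {
    // Assumed layout for illustration: int bar at +0, long baz at +4.
    private static final int BAR_OFFSET = 0;
    private static final int BAZ_OFFSET = 4;
    static final int SIZE = 12;

    private final ByteBuffer buf; // could equally be a MappedByteBuffer
    private final int base;       // byte offset of this record in buf

    FooView(ByteBuffer buf, int base) {
        if (base < 0 || base + SIZE > buf.capacity()) {
            throw new IndexOutOfBoundsException("record does not fit at " + base);
        }
        this.buf = buf;
        this.base = base;
    }

    // aFoo.setBar(n) becomes an absolute store into the backing buffer.
    void setBar(int n) { buf.putInt(base + BAR_OFFSET, n); }
    int getBar()       { return buf.getInt(base + BAR_OFFSET); }

    void setBaz(long v) { buf.putLong(base + BAZ_OFFSET, v); }
    long getBaz()       { return buf.getLong(base + BAZ_OFFSET); }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(2 * SIZE);
        FooView second = new FooView(buf, SIZE); // second record in the buffer
        second.setBar(42);
        second.setBaz(7L);
        System.out.println(second.getBar() + " " + second.getBaz());
    }
}
```

Every access here is a method call plus a bounds-checked buffer operation, which is the contrast with a plain field store such as aFoo.bar = n.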
But from a language design perspective, the fact that classes internally use direct field accesses but expose a completely different get/set notation is something of a linguistic wart. [ The BETA language used a single notation, the pattern, for assignment, method calls, and argument passing. Therefore, in BETA there would be no API difference between the two examples above. They'd both be something like n -> aFoo.bar Curiously, the first commercial licences for BETA were acquired by Bill Joy and James Gosling, so they knew about this, but I suppose a more C-like notation for Java was the right decision. The BETA notation would have been too frightening for the target audience. :-) ] -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From adinn at redhat.com Wed Sep 26 13:27:31 2018 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 26 Sep 2018 14:27:31 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> Message-ID: Hi Alan, On 26/09/18 11:35, Alan Bateman wrote: > I'm reasonably happy with the approach that we converged on to introduce > new map modes and use the existing FileChannel.map method. Ideally new > map modes wouldn't need to be exposed but you've looked into that (to my > satisfaction at least). One detail that I think may need another > iteration or two on is whether we need one or two modes. This will > become clearer once the javadoc is fleshed out a bit further.
It maybe > that one new map mode, "SYNC" for example, that works with the existing > READ_WRITE mode may be clearer and easier to specify. I think that would > be consistent with how copy-on-write mappings are exposed with the > PRIVATE mode. It also provides a 1-1 mapping to the underlying MAP_SYNC > flag too. I'm not clear why we should only use one flag. The two flags I specified reflect two independent use cases, one where data stored in an NVM device is accessed read-only and another where it is accessed read-write. Are you suggesting that the read-only case is redundant? I'm not sure I agree. For example, a utility which might want to review the state of persistent data while a service is off-line would really want to pass flag READ_ONLY_PERSISTENT. Of course, it could employ READ_WRITE_PERSISTENT (or equivalently, SYNC) and just not write the data but, mutatis mutandis, that same argument would remove the case for flag READ_ONLY. > As regards the bigger topic on what the right API is for "memory" then I > don't think ByteBuffer is the right answer. You've touched on a few of > the issues in your mail but there are bigger issues around thread safety > and confinement, also the issue of the buffer position/limit that get in > the way and the reason why several libraries use Unsafe. There isn't any > concrete proposal or discussion to point at around splitting out this > aspect of Project Panama. Stuart and I just pointing out that a better > solution could emerge which could lead to have an alternative API to map > a region of NVM as "memory" rather than a mapped byte buffer. If I were > developing a file system backed by NVM then that is probably the raw API > that I would want, not MBB. I really don't understand how thread safety comes into the argument here. How is some other mechanism going to avoid the need for client threads -- or, rather, the applications which create them them -- to manage concurrent updates of NVM? 
Are you perhaps thinking of some form of software transactional memory? I'm really struggling to understand why you keep raising this point without any further detail to explain how the lack of exclusion and synchronization primitives constitutes a problem for this API that can be bypassed by rolling some equivalent alternative into another API. Also, can you explain what you mean by confinement? (thread confinement?). Also, I don't think I would label this API an attempt to develop a file system. I think that's rather an overblown characterisation of what it does. This is an attempt to provide an intermediate storage tier somewhere between a file system and volatile memory to create/access/update data across program runs, without incurring the costs associated with implementing that sort of capability on top of existing file system implementations. The use of a byte array layout at the base level is indeed, as the success of Unix/Linux/MS Windows file systems makes clear, a helpful way of enabling a variety of application-defined data layouts to be implemented on top of this storage tier. But I don't really see how that makes this a file system. > As regards introducing an API that we could deprecate then that musing > was about introducing a JDK-specific API. If MapMode were an interface > then we could have introduced a JDK-specific map mode that wouldn't have > required rev'ing the standard API. Introducing a completely separate map > method in a JDK-specific module doesn't seem to be worth it as I think > we can be confident that the proposed and possible-new.future approaches > will not conflict. Ok, so no need for a change there then I guess. I'm still not quite sure where this reply leaves the JEP though. Shall I update the Risks and Assumptions section to include mention of JDK-5029431 as suggested to Stuart? Is there anything else I can do to progress things? 
regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From adinn at redhat.com Wed Sep 26 15:26:44 2018 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 26 Sep 2018 16:26:44 +0100 Subject: RFR: JDK-8211105: AArch64: Disable cos/sin and log intrinsics in jdk11u pending fix Message-ID: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com> Can I please get a review for this trivial fix for jdk11u which is intended to disable the broken, generated AArch64 trig and log intrinsics. This is a stop-gap to avoid the breakage until we are ok to backport upstream fixes. JIRA: https://bugs.openjdk.java.net/browse/JDK-8211105 webrev: http://cr.openjdk.java.net/~adinn/8211105/webrev.00/ regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Wed Sep 26 15:31:14 2018 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Sep 2018 16:31:14 +0100 Subject: RFR: JDK-8211105: AArch64: Disable cos/sin and log intrinsics in jdk11u pending fix In-Reply-To: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com> References: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com> Message-ID: On 09/26/2018 04:26 PM, Andrew Dinn wrote: > Can I please get a review for this trivial fix for jdk11u which is > intended to disable the broken, generated AArch64 trig and log > intrinsics. This is a stop-gap to avoid the breakage until we are ok to > backport upstream fixes. > > JIRA: https://bugs.openjdk.java.net/browse/JDK-8211105 > webrev: http://cr.openjdk.java.net/~adinn/8211105/webrev.00/ OK -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Wed Sep 26 15:33:44 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Sep 2018 17:33:44 +0200 Subject: RFR: JDK-8211105: AArch64: Disable cos/sin and log intrinsics in jdk11u pending fix In-Reply-To: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com> References: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com> Message-ID: On 09/26/2018 05:26 PM, Andrew Dinn wrote: > Can I please get a review for this trivial fix for jdk11u which is > intended to disable the broken, generated AArch64 trig and log > intrinsics. This is a stop-gap to avoid the breakage until we are ok to > backport upstream fixes. > > JIRA: https://bugs.openjdk.java.net/browse/JDK-8211105 > webrev: http://cr.openjdk.java.net/~adinn/8211105/webrev.00/ Looks good. Please mention in the comments that we are waiting on JDK-8210858 and JDK-8210461. -Aleksey From dmitrij.pochepko at bell-sw.com Wed Sep 26 15:40:33 2018 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Wed, 26 Sep 2018 18:40:33 +0300 Subject: RFR: JDK-8211105: AArch64: Disable cos/sin and log intrinsics in jdk11u pending fix In-Reply-To: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com> References: <3ae53e62-75c1-0bc6-b0be-fa07c183efe2@redhat.com> Message-ID: Looks good to me. (Sorry for the trouble) On 26/09/18 18:26, Andrew Dinn wrote: > Can I please get a review for this trivial fix for jdk11u which is > intended to disable the broken, generated AArch64 trig and log > intrinsics. This is a stop-gap to avoid the breakage until we are ok to > backport upstream fixes. 
> > JIRA: https://bugs.openjdk.java.net/browse/JDK-8211105 > webrev: http://cr.openjdk.java.net/~adinn/8211105/webrev.00/ > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From Alan.Bateman at oracle.com Wed Sep 26 16:00:43 2018 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 26 Sep 2018 17:00:43 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> Message-ID: <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com> On 26/09/2018 14:27, Andrew Dinn wrote: > : > I really don't understand how thread safety comes into the argument > here. How is some other mechanism going to avoid the need for client > threads -- or, rather, the applications which create them them -- to > manage concurrent updates of NVM? Are you perhaps thinking of some form > of software transactional memory? I'm really struggling to understand > why you keep raising this point without any further detail to explain > how the lack of exclusion and synchronization primitives constitutes a > problem this API that can be bypassed by rolling some equivalent > alternative into another API. The reason that we've mentioned it a few times is because it's a significant issue. If you have a byte buffer then you can't have different threads accessing different parts of the buffer at the same time, at least not with any of the relative get/put methods as they depend on the buffer position. 
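The dependence on the buffer position described here can be seen in a small sketch. It uses only long-standing ByteBuffer methods (relative put, absolute put, duplicate); the class and method names are hypothetical, and a heap buffer stands in for a mapped one.

```java
import java.nio.ByteBuffer;

// Hypothetical helper names; only the ByteBuffer calls are real API.
final class PositionDemo {
    // Relative put: writes at the current position and advances it,
    // so every caller sharing this buffer observes the change.
    static int positionAfterRelativePut(ByteBuffer shared) {
        shared.put((byte) 1);
        return shared.position();
    }

    // Absolute put: writes at an explicit index and leaves the
    // shared position untouched.
    static int positionAfterAbsolutePut(ByteBuffer shared, int index) {
        shared.put(index, (byte) 2);
        return shared.position();
    }

    // duplicate() shares the content but has its own position, limit
    // and mark, so relative operations on the view stay private.
    static ByteBuffer privateView(ByteBuffer shared) {
        return shared.duplicate();
    }
}
```

A private view only isolates the cursor state. It does nothing to order or exclude overlapping writes to the same bytes, which remains the application's job.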
Sure you can globally synchronize all operations but you'll likely want much finer granularity. This bugbear comes up periodically, particularly when using buffers for cases that they weren't really designed for. Stuart pointed out the lack of absolute bulk get/put operations which is something that I think will help some of these cases. > > Also, can you explain what you mean by confinement? (thread confinement?). Yes, thread vs. global. I haven't been following Panama close enough to say how this is exposed in the API. > > Also, I don't think I would label this API an attempt to develop a file > system. I think that's rather an overblown characterisation of what it > does. I think you may have mis-read my mail as I was just picking another example where MBB would be problematic. > : > I'm still not quite sure where this reply leaves the JEP though. Shall I > update the Risks and Assumptions section to include mention of > JDK-5029431 as suggested to Stuart? Is there anything else I can do to > progress things? > It wouldn't do any harm to have this section mention that an alternative that exposes a more memory centric API may be possible in the future. -Alan From igor.veresov at oracle.com Wed Sep 26 16:35:18 2018 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 26 Sep 2018 09:35:18 -0700 Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86 32-bit In-Reply-To: References: Message-ID: <659DF4FF-71B9-472D-A064-038ADF2A50FF@oracle.com> It doesn't seem to me like the proper way to fix it. The problem is that the cmp is destroying opr1 without telling the register allocator about it. One possible solution would be to make opr1 also a temp (see LIR_OpVisitState::visit(LIR_Op* op) in c1_LIR.cpp), only for x86 32bit and only if the operand type is T_LONG. Another solution is to maintain a temporary register for lir_cmp and use it to save/restore opr1 when emitting the code in LIR_Assembler::comp_op(). 
Again, the temporary register has to be there only for x86 32bit and T_LONG. igor > On Sep 26, 2018, at 1:29 AM, Tobias Hartmann wrote: > > Hi Dmitry, > > this looks good to me but Igor (who implemented 8201447) should have a look as well. > > Best regards, > Tobias > > On 26.09.2018 09:04, Dmitry Cherepanov wrote: >> Hi Tobias, >> >> Thanks for the review, updated patch avoids the additional move on x86_64 and includes the >> regression test. >> >> http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.01/ >> >> Dmitry >> >>> On Sep 25, 2018, at 6:40 PM, Tobias Hartmann wrote: >>> >>> Hi Dmitry, >>> >>> Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64? >>> >>> Could you please add the regression test to the webrev? Or did this reproduce with other tests? >>> >>> Thanks, >>> Tobias >>> >>> On 25.09.2018 16:00, Dmitry Cherepanov wrote: >>>> Hello, >>>> >>>> Please review a patch that resolves issue in x86 32bit builds. It slightly adjusts the fix for >>>> JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and >>>> using it for incrementing backedge counter. >>>> >>>> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100 >>>> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/ >>>> >>>> Thanks, >>>> >>>> Dmitry From rkennke at redhat.com Wed Sep 26 17:26:29 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 26 Sep 2018 19:26:29 +0200 Subject: RFR: JDK-8211129: compiler/whitebox/ForceNMethodSweepTest.java fails after JDK-8132849 In-Reply-To: References: Message-ID: Ping! This fixes two failing tests... (also changed subject to remove [Testbug]) Thanks, Roman On 26.09.18 at 10:30, Roman Kennke wrote: > Please review the following change: > > Several tests fail because after forcing nmethod sweep via Whitebox API, > the sweeper doesn't actually kick in. 
> > The reason is the changed heuristic in NMethodSweeper: before > JDK-8132849, we would scan stacks and mark nmethods at every safepoint, > during safepoint cleanup phase. This would subsequently trigger a sweep > cycle via _should_sweep. If no stack-scanning is performed, the sweeper > would skip sweeping because the CompiledMethodIterator _current has not > been reset. > > I propose to change the following: > > - In the sweep-loop, call into do_stack_scanning() whenever it's forced > (via WhiteBox API) or if should_sweep has been determined by other > heuristics (code-cache-change, time-since-last-sweep,..) > > - Instead let do_stack_scanning() not set _should_sweep anymore. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8211129 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/ > > Testing: Fixes previously failing: > compiler/whitebox/ForceNMethodSweepTest.java > jdk/jfr/event/compiler/TestCodeSweeperStats.java > > Passes: hotspot/jtreg:tier1 > From vladimir.kozlov at oracle.com Wed Sep 26 19:25:43 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Sep 2018 12:25:43 -0700 Subject: [12] RFR(M): 8210215: C2 should optimize trichotomy calculations In-Reply-To: <4991b788-ed40-ffd0-ff0c-85a4ef246df2@oracle.com> References: <8DC6DEBA-9D31-4DFB-97D7-83474E69A5E3@oracle.com> <4991b788-ed40-ffd0-ff0c-85a4ef246df2@oracle.com> Message-ID: Hi Tobias, In the head comment of RegionNode::optimize_trichotomy() add code examples you optimizing. I think you need to return bool value if modification happens and || with 'modified' value. In shape 1 check you check for region->outcnt() != 2. Does it mean this Region node does not have Phi node attached? 1 - is itself, 2 - is this Region node. 
Next optimization seems wrong for case 1 where you don't check Phi inputs - it could be normal diamond shape with different Phi node values on each branch: + if (iff1 == iff2) { + igvn->replace_input_of(region, idx1, iff1->in(0)); + igvn->replace_input_of(region, idx2, igvn->C->top()); + return; // Remove useless if (both projections map to the same control/value) + } I think you need to check control flow and Phi inputs for both cases to make sure you got only expected shapes before transforming graph. Thanks, Vladimir On 9/26/18 1:25 AM, Tobias Hartmann wrote: > Hi John, > > thanks for looking at this again! > > On 26.09.2018 01:57, John Rose wrote: >> `res[9][9]` should be `res[illegal+1][illegal+1]` and should have rows and columns for `never` >> (code smell: `9` is a naked constant; makes it hard to tell your table is out of date) > > Right, I've updated the table. > >> In the test cases `compare1` has `(a < b) ? -1 : (a == b) ? 0 : 1`. >> Shouldn't you also test `(a < b) ? -1 : (a <= b) ? 0 : 1`? >> And similarly, for other cases where the second test overlaps >> with the first. > > I did not add tests for all the 6² operator combinations but I think more overlapping tests won't > hurt. I've added > > (a < b) ? -1 : (a <= b) ? 0 : 1; > (a > b) ? 1 : (a >= b) ? 0 : -1; > (a == b) ? 0 : (a <= b) ? -1 : 1; > (a == b) ? 0 : (a >= b) ? 1 : -1; > > and verified that all inlined comparisons fold. > > Here's the new webrev: > http://cr.openjdk.java.net/~thartmann/8210215/webrev.01/ > > Thanks, > Tobias > From vladimir.kozlov at oracle.com Wed Sep 26 19:54:20 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Sep 2018 12:54:20 -0700 Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy In-Reply-To: References: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com> Message-ID: <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com> Looks good but we need to test it hard. I submitted testing. 
Thanks, Vladimir On 9/25/18 1:42 AM, Tobias Hartmann wrote: > Hi Roland, > > okay, thanks for the clarifications. > > Best regards, > Tobias > > On 25.09.2018 10:37, Roland Westrelin wrote: >> >> Hi Tobias, >> >>> Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers >>> will be used for Shenandoah? Why you need to distinguish between ArrayCopyPhase? >> >> Thanks for the view. >> >> Yes extra arguments are to be used by shenandoah. >> >> Generating barriers once parsing is over is not supported by all >> gcs. The shape of the barriers is sometimes too complicated to be >> emitted at igvn time. >> >> Roland. >> From vladimir.kozlov at oracle.com Wed Sep 26 21:58:22 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Sep 2018 14:58:22 -0700 Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy In-Reply-To: <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com> References: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com> <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com> Message-ID: <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com> A LOT tests failed. workspace/open/src/hotspot/share/opto/graphKit.cpp:1848), pid=24736, tid=15619 jib > # assert(C->alias_type(call->adr_type()) == C->alias_type(hook_mem)) failed: call node must be constructed correctly Vladimir On 9/26/18 12:54 PM, Vladimir Kozlov wrote: > Looks good but we need to test it hard. I submitted testing. > > Thanks, > Vladimir > > On 9/25/18 1:42 AM, Tobias Hartmann wrote: >> Hi Roland, >> >> okay, thanks for the clarifications. >> >> Best regards, >> Tobias >> >> On 25.09.2018 10:37, Roland Westrelin wrote: >>> >>> Hi Tobias, >>> >>>> Looks good to me. I'm assuming the currently unused arguments of array_copy_requires_gc_barriers >>>> will be used for Shenandoah? Why you need to distinguish between ArrayCopyPhase? >>> >>> Thanks for the view. >>> >>> Yes extra arguments are to be used by shenandoah. 
>>> >>> Generating barriers once parsing is over is not supported by all >>> gcs. The shape of the barriers is sometimes too complicated to be >>> emitted at igvn time. >>> >>> Roland. >>> From igor.veresov at oracle.com Wed Sep 26 23:18:58 2018 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 26 Sep 2018 16:18:58 -0700 Subject: RFR: 8211100: hotspot C1 issue with comparing long numbers on x86 32-bit In-Reply-To: <659DF4FF-71B9-472D-A064-038ADF2A50FF@oracle.com> References: <659DF4FF-71B9-472D-A064-038ADF2A50FF@oracle.com> Message-ID: Edit: It may be more consistent to check for is_double_cpu() instead of T_LONG. Although that's semantically equivalent. > On Sep 26, 2018, at 9:35 AM, Igor Veresov wrote: > > It doesn't seem to me like the proper way to fix it. The problem is that the cmp is destroying opr1 without telling the register allocator about it. > > One possible solution would be to make opr1 also a temp (see LIR_OpVisitState::visit(LIR_Op* op) in c1_LIR.cpp), only for x86 32bit and only if the operand type is T_LONG. > Another solution is to maintain a temporary register for lir_cmp and use it to save/restore opr1 when emitting the code in LIR_Assembler::comp_op(). Again, the temporary register has to be there only for x86 32bit and T_LONG. > > igor > > >> On Sep 26, 2018, at 1:29 AM, Tobias Hartmann wrote: >> >> Hi Dmitry, >> >> this looks good to me but Igor (who implemented 8201447) should have a look as well. >> >> Best regards, >> Tobias >> >> On 26.09.2018 09:04, Dmitry Cherepanov wrote: >>> Hi Tobias, >>> >>> Thanks for the review, updated patch avoids the additional move on x86_64 and includes the >>> regression test. >>> >>> http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.01/ >>> >>> Dmitry >>> >>>> On Sep 25, 2018, at 6:40 PM, Tobias Hartmann wrote: >>>> >>>> Hi Dmitry, >>>> >>>> Shouldn't this at least be guarded by an #ifndef _LP64 to avoid the additional move on x86_64? 
>>>> >>>> Could you please add the regression test to the webrev? Or did this reproduce with other tests? >>>> >>>> Thanks, >>>> Tobias >>>> >>>> On 25.09.2018 16:00, Dmitry Cherepanov wrote: >>>>> Hello, >>>>> >>>>> Please review a patch that resolves issue in x86 32bit builds. It slightly adjusts the fix for >>>>> JDK-8201447 (C1 does backedge profiling incorrectly) by creating a copy of the left operand and >>>>> using it for incrementing backedge counter. >>>>> >>>>> JBS issue: https://bugs.openjdk.java.net/browse/JDK-8211100 >>>>> webrev: http://cr.openjdk.java.net/~dcherepanov/8211100/webrev.00/ >>>>> >>>>> Thanks, >>>>> >>>>> Dmitry > From erik.osterlund at oracle.com Thu Sep 27 01:36:38 2018 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 26 Sep 2018 21:36:38 -0400 Subject: RFR: JDK-8211129: compiler/whitebox/ForceNMethodSweepTest.java fails after JDK-8132849 In-Reply-To: References: Message-ID: <88708510-05E6-494F-937D-D9B91BB70E11@oracle.com> Hi Roman, Looks reasonable. Thanks, /Erik > On 26 Sep 2018, at 13:26, Roman Kennke wrote: > > Ping! This fixes two failing tests... > > (also changed subject to remove [Testbug]) > > Thanks, > Roman > > >> On 26.09.18 at 10:30, Roman Kennke wrote: >> Please review the following change: >> >> Several tests fail because after forcing nmethod sweep via Whitebox API, >> the sweeper doesn't actually kick in. >> >> The reason is the changed heuristic in NMethodSweeper: before >> JDK-8132849, we would scan stacks and mark nmethods at every safepoint, >> during safepoint cleanup phase. This would subsequently trigger a sweep >> cycle via _should_sweep. If no stack-scanning is performed, the sweeper >> would skip sweeping because the CompiledMethodIterator _current has not >> been reset. 
>> >> I propose to change the following: >> >> - In the sweep-loop, call into do_stack_scanning() whenever it's forced >> (via WhiteBox API) or if should_sweep has been determined by other >> heuristics (code-cache-change, time-since-last-sweep,..) >> >> - Instead let do_stack_scanning() not set _should_sweep anymore. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8211129 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/ >> >> Testing: Fixes previously failing: >> compiler/whitebox/ForceNMethodSweepTest.java >> jdk/jfr/event/compiler/TestCodeSweeperStats.java >> >> Passes: hotspot/jtreg:tier1 >> > From tobias.hartmann at oracle.com Thu Sep 27 09:05:50 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 27 Sep 2018 11:05:50 +0200 Subject: RFR: JDK-8211129: compiler/whitebox/ForceNMethodSweepTest.java fails after JDK-8132849 In-Reply-To: References: Message-ID: Hi Roman, this looks reasonable to me as well. Please verify that all failing tests now pass (TestCodeSweeperStats.java, ForceNMethodSweepTest.java). Thanks, Tobias On 26.09.2018 19:26, Roman Kennke wrote: > Ping! This fixes two failing tests... > > (also changed subject to remove [Testbug]) > > Thanks, > Roman > > > Am 26.09.18 um 10:30 schrieb Roman Kennke: >> Please review the following change: >> >> Several tests fail because after forcing nmethod sweep via Whitebox API, >> the sweeper doesn't actually kick in. >> >> The reason is the changed heuristic in NMethodSweeper: before >> JDK-8132849, we would scan stacks and mark nmethods at every safepoint, >> during safepoint cleanup phase. This would subsequently trigger a sweep >> cycle via _should_sweep. If no stack-scanning is performed, the sweeper >> would skip sweeping because the CompiledMethodIterator _current has not >> been reset. 
>> >> I propose to change the following: >> >> - In the sweep-loop, call into do_stack_scanning() whenever it's forced >> (via WhiteBox API) or if should_sweep has been determined by other >> heuristics (code-cache-change, time-since-last-sweep,..) >> >> - Instead let do_stack_scanning() not set _should_sweep anymore. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8211129 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/ >> >> Testing: Fixes previously failing: >> compiler/whitebox/ForceNMethodSweepTest.java >> jdk/jfr/event/compiler/TestCodeSweeperStats.java >> >> Passes: hotspot/jtreg:tier1 >> > From rkennke at redhat.com Thu Sep 27 09:07:08 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 27 Sep 2018 11:07:08 +0200 Subject: RFR: JDK-8211129: compiler/whitebox/ForceNMethodSweepTest.java fails after JDK-8132849 In-Reply-To: References: Message-ID: Hi Tobias and Erik, the two tests pass now. I'm submitting it to jdk/submit and will push if it comes back clean. Thanks, Roman Am 27.09.18 um 11:05 schrieb Tobias Hartmann: > Hi Roman, > > this looks reasonable to me as well. Please verify that all failing tests now pass > (TestCodeSweeperStats.java, ForceNMethodSweepTest.java). > > Thanks, > Tobias > > On 26.09.2018 19:26, Roman Kennke wrote: >> Ping! This fixes two failing tests... >> >> (also changed subject to remove [Testbug]) >> >> Thanks, >> Roman >> >> >> Am 26.09.18 um 10:30 schrieb Roman Kennke: >>> Please review the following change: >>> >>> Several tests fail because after forcing nmethod sweep via Whitebox API, >>> the sweeper doesn't actually kick in. >>> >>> The reason is the changed heuristic in NMethodSweeper: before >>> JDK-8132849, we would scan stacks and mark nmethods at every safepoint, >>> during safepoint cleanup phase. This would subsequently trigger a sweep >>> cycle via _should_sweep. 
If no stack-scanning is performed, the sweeper >>> would skip sweeping because the CompiledMethodIterator _current has not >>> been reset. >>> >>> I propose to change the following: >>> >>> - In the sweep-loop, call into do_stack_scanning() whenever it's forced >>> (via WhiteBox API) or if should_sweep has been determined by other >>> heuristics (code-cache-change, time-since-last-sweep,..) >>> >>> - Instead let do_stack_scanning() not set _should_sweep anymore. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8211129 >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8211129/webrev.00/ >>> >>> Testing: Fixes previously failing: >>> compiler/whitebox/ForceNMethodSweepTest.java >>> jdk/jfr/event/compiler/TestCodeSweeperStats.java >>> >>> Passes: hotspot/jtreg:tier1 >>> >> From adinn at redhat.com Thu Sep 27 09:23:05 2018 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 27 Sep 2018 10:23:05 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com> Message-ID: <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com> On 26/09/18 17:00, Alan Bateman wrote: > The reason that we've mentioned it a few times is because it's a > significant issue. 
If you have a byte buffer then you can't have > different threads accessing different parts of the buffer at the same > time, at least not with any of the relative get/put methods as they > depend on the buffer position. Sure you can globally synchronize all > operations but you'll likely want much finer granularity. This bugbear > comes up periodically, particularly when using buffers for cases that > they weren't really designed for. Stuart pointed out the lack of > absolute bulk get/put operations which is something that I think will > help some of these cases. Ok, I see that there is an issue here where only byte puts at absolute positions can be performed concurrently (assuming threads know how to avoid overlapping writes) while, by contrast, cursor-based byte[] stores require synchronization. Is that the problem in full? Or is there still more that I have missed? I certainly agree that a retro-fit to ByteBuffer which provided for byte[] puts at absolute positions would be of benefit for this proposal. However, such a retro-fix would be equally as useful for volatile memory buffers. I am not clear why this omission suggests to you that we should look at a new, alternative model for managing this particular type of mapped memory rather than just fixing the current one properly for all buffers. >> Also, can you explain what you mean by confinement? (thread >> confinement?). > Yes, thread vs. global. I haven't been following Panama close enough to > say how this is exposed in the API. Well, my vague stab was obviously in the right ballpark but I'm afraid I still don't know what baseball is. Could you explain what you mean by confinement? >> Also, I don't think I would label this API an attempt to develop a file >> system. I think that's rather and overblown characterisation of what it >> does. > I think you may have mis-read my mail as was just picking another > example where MBB would be problematic. Apologies for my very evident confusion here. 
I'd be very grateful if you could talk down a notch or two and/or amplify a bit more to help the hard of thinking. >> I'm still not quite sure where this reply leaves the JEP though. Shall I >> update the Risks and Assumptions section to include mention of >> JDK-5029431 as suggested to Stuart? Is there anything else I can do to >> progress things? >> > It wouldn't do any harm to have this section mention that an alternative > that exposes a more memory centric API may be possible in the future. Ok, I'll certainly add that. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From peter.levart at gmail.com Thu Sep 27 10:28:21 2018 From: peter.levart at gmail.com (Peter Levart) Date: Thu, 27 Sep 2018 12:28:21 +0200 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com> <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com> Message-ID: <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com> Hi Andrew, On 09/27/2018 11:23 AM, Andrew Dinn wrote: > On 26/09/18 17:00, Alan Bateman wrote: >> The reason that we've mentioned it a few times is because it's a >> significant issue. If you have a byte buffer then you can't have >> different threads accessing different parts of the buffer at the same >> time, at least not with any of the relative get/put methods as they >> depend on the buffer position. 
Sure you can globally synchronize all >> operations but you'll likely want much finer granularity. This bugbear >> comes up periodically, particularly when using buffers for cases that >> they weren't really designed for. Stuart pointed out the lack of >> absolute bulk get/put operations which is something that I think will >> help some of these cases. > Ok, I see that there is an issue here where only byte puts at absolute > positions can be performed concurrently (assuming threads know how to > avoid overlapping writes) while, by contrast, cursor-based byte[] stores > require synchronization. Is that the problem in full? Or is there still > more that I have missed? > > I certainly agree that a retro-fit to ByteBuffer which provided for > byte[] puts at absolute positions would be of benefit for this proposal. > However, such a retro-fix would be equally as useful for volatile memory > buffers. I am not clear why this omission suggests to you that we should > look at a new, alternative model for managing this particular type of > mapped memory rather than just fixing the current one properly for all > buffers.

May I just note that multithreaded bulk operations are kind of possible without external synchronization (i.e. locks) if you follow a simple protocol:

- never use relative operations on the shared ByteBuffer instance
- never use operations that change internal mark/position/limit/byteOrder on the shared ByteBuffer instance
- a concurrent bulk operation on 'bb' consists of:

    ByteBuffer myBb = bb.slice(0, bb.capacity());
    // use myBb to perform concurrent bulk operation (any operations are allowed) and then throw it away or cache it in ThreadLocal

If you combine this with explicit fences and/or atomic 16, 32 and 64 bit operations via VarHandles (see MethodHandles.byteBufferViewVarHandle(Class, ByteOrder)), concurrent programming with ByteBuffer(s) is entirely possible.

Regards, Peter

> >>> Also, can you explain what you mean by confinement? 
(thread >>> confinement?). >> Yes, thread vs. global. I haven't been following Panama closely enough to >> say how this is exposed in the API. > Well, my vague stab was obviously in the right ballpark but I'm afraid I > still don't know what baseball is. Could you explain what you mean by > confinement? > >>> Also, I don't think I would label this API an attempt to develop a file >>> system. I think that's rather an overblown characterisation of what it >>> does. >> I think you may have mis-read my mail as I was just picking another >> example where MBB would be problematic. > Apologies for my very evident confusion here. I'd be very grateful if > you could talk down a notch or two and/or amplify a bit more to help the > hard of thinking. > >>> I'm still not quite sure where this reply leaves the JEP though. Shall I >>> update the Risks and Assumptions section to include mention of >>> JDK-5029431 as suggested to Stuart? Is there anything else I can do to >>> progress things? >>> >> It wouldn't do any harm to have this section mention that an alternative >> that exposes a more memory centric API may be possible in the future. > Ok, I'll certainly add that. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From adinn at redhat.com Thu Sep 27 10:41:11 2018 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 27 Sep 2018 11:41:11 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com> <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com> <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com> Message-ID: <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com> Hi Peter, On 27/09/18 11:28, Peter Levart wrote: > May I just note that multithreaded bulk operations are kind of possible > without external synchronization (i.e. locks) if you follow a simple > protocol: > > - never use relative operations on the shared ByteBuffer instance > - never use operations that change internal > mark/position/limit/byteOrder on the shared ByteBuffer instance > - a concurrent bulk operation on 'bb' consists of: > > ByteBuffer myBb = bb.slice(0, bb.capacity()); > // use myBb to perform concurrent bulk operation (any operations are > allowed) and then throw it away or cache it in ThreadLocal > > If you combine this with explicit fences and/or atomic 16, 32 and 64 bit > operations via VarHandles (see > MethodHandles.byteBufferViewVarHandle(Class, ByteOrder)), concurrent > programming with ByteBuffer(s) is entirely possible. Thank you for the usual expert advice. I am sure it will be of great help in implementing a persistent data management library over this JEP's base capability. 
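
For readers following the thread, here is a minimal sketch of the protocol quoted above. It assumes Java 13 or later (where ByteBuffer.slice(int, int) is available) and uses a direct buffer as a stand-in for a mapped one. The class name, the two offsets and the "length word" layout are hypothetical and chosen only for illustration; MethodHandles.byteBufferViewVarHandle is the method named in the quoted mail. The shared buffer is only ever touched through slice() and through absolute VarHandle accesses, so its own position and limit never change.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SharedBufferSketch {
    // Views the buffer's bytes as ints; access coordinates are (ByteBuffer, int byteIndex).
    static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Hypothetical layout: a 4-byte length word at offset 0, payload starting at offset 8.
    static final int LENGTH_OFFSET = 0;
    static final int DATA_OFFSET = 8;

    // Writer side: bulk-copy through a private slice, then publish the length
    // with a release store so a reader that sees it also sees the payload.
    static void publish(ByteBuffer shared, byte[] payload) {
        ByteBuffer mine = shared.slice(DATA_OFFSET, payload.length); // Java 13+
        mine.put(payload); // relative put moves only the slice's position
        INT_VIEW.setRelease(shared, LENGTH_OFFSET, payload.length);
    }

    // Reader side: returns null until a non-zero length has been published.
    static byte[] tryRead(ByteBuffer shared) {
        int length = (int) INT_VIEW.getAcquire(shared, LENGTH_OFFSET);
        if (length == 0) {
            return null;
        }
        byte[] out = new byte[length];
        shared.slice(DATA_OFFSET, length).get(out);
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        // Direct buffers are zero-filled and suitably aligned for the int view at offset 0.
        ByteBuffer shared = ByteBuffer.allocateDirect(64);
        Thread writer = new Thread(() -> publish(shared, new byte[] {1, 2, 3}));
        writer.start();
        byte[] seen;
        while ((seen = tryRead(shared)) == null) {
            Thread.onSpinWait();
        }
        writer.join();
        System.out.println(seen.length + " bytes read, shared position = " + shared.position());
    }
}
```

A direct buffer is used here because the atomic access modes of the view VarHandle require aligned access, which is not guaranteed for heap buffers. Whether release/acquire ordering is sufficient for a given persistent data structure is a separate question that this sketch does not try to answer.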
regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From rkennke at redhat.com Thu Sep 27 12:03:51 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 27 Sep 2018 14:03:51 +0200 Subject: RFR: JDK-8211219: Type inconsistency in LIRGenerator::atomic_cmpxchg(..) Message-ID: We spotted this in Shenandoah land. Doesn't seem to be catastrophic, but would be nice to fix: In c1_LIRGenerator_x86.cpp, towards the end of LIRGenerator::atomic_cmpxchg(..) there's this cmove: __ cmove(lir_cond_equal, LIR_OprFact::intConst(1), LIR_OprFact::intConst(0), result, type); which should use T_INT instead of the passed-in type. Bug: https://bugs.openjdk.java.net/browse/JDK-8211219 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8211219/webrev.00/ Testing: hotspot/jtreg:tier1 ok Ok? Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From igor.veresov at oracle.com Thu Sep 27 13:33:54 2018 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 27 Sep 2018 06:33:54 -0700 Subject: RFR: JDK-8211219: Type inconsistency in LIRGenerator::atomic_cmpxchg(..) In-Reply-To: References: Message-ID: Looks good. igor > On Sep 27, 2018, at 5:03 AM, Roman Kennke wrote: > > We spotted this in Shenandoah land. Doesn't seem to be catastrophic, but > would be nice to fix: > > In c1_LIRGenerator_x86.cpp, towards the end of > LIRGenerator::atomic_cmpxchg(..) there's this cmove: > > __ cmove(lir_cond_equal, LIR_OprFact::intConst(1), > LIR_OprFact::intConst(0), > result, type); > > which should use T_INT instead of the passed-in type. 
> > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8211219 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8211219/webrev.00/ > > Testing: hotspot/jtreg:tier1 ok > > Ok? > > Roman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Thu Sep 27 13:58:51 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Sep 2018 15:58:51 +0200 Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy In-Reply-To: <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com> References: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com> <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com> <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com> Message-ID: Hi Vladimir, Thanks for the review and the testing. > A LOT tests failed. I did some last minute code refactoring after running tests and managed to break something. Sorry about that. The fix is:

diff --git a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
--- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
+++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
@@ -598,6 +598,7 @@
     ac->set_clonebasic();
     Node* n = kit->gvn().transform(ac);
     if (n == ac) {
+      ac->_adr_type = TypeRawPtr::BOTTOM;
       kit->set_predefined_output_for_runtime_call(ac, ac->in(TypeFunc::Memory), raw_adr_type);
     } else {
       kit->set_all_memory(n);

New webrev: http://cr.openjdk.java.net/~roland/8210887/webrev.01/ Roland. From rwestrel at redhat.com Thu Sep 27 14:36:29 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Sep 2018 16:36:29 +0200 Subject: RFR(S): 8211231: BarrierSetC1::generate_referent_check() confuses register allocator Message-ID: http://cr.openjdk.java.net/~roland/8211231/webrev.00/ With Shenandoah, we had a crash in compiled code because a value was restored from a spill in a branch that's not always executed in BarrierSetC1::generate_referent_check(). That method generates code with control flow within a basic block. 
The register allocator is not aware of that control flow. So if a value that was spilled before is needed in a branch, the register allocator may decide to restore it and then assume it's live in a register from there. The fix I propose is to assign a temp register to that value and load it before any control flow. Details (intermediate representation and generated code) are here: http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-September/007605.html Roland. From igor.veresov at oracle.com Thu Sep 27 15:40:13 2018 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 27 Sep 2018 08:40:13 -0700 Subject: RFR(S): 8211231: BarrierSetC1::generate_referent_check() confuses register allocator In-Reply-To: References: Message-ID: <66CC2DB1-DF8B-4098-8C58-2C4A88DB82E4@oracle.com> Looks good to me. igor > On Sep 27, 2018, at 7:36 AM, Roland Westrelin wrote: > > > http://cr.openjdk.java.net/~roland/8211231/webrev.00/ > > With Shenandoah, we had a crash in compiled code because a value was > restored from a spill in a branch that's not always executed in > BarrierSetC1::generate_referent_check(). That method generates code with > control flow within a basic block. The register allocator is not aware > of that control flow. So if a value that was spilled before is needed in > a branch, the register allocator may decide to restore it and then > assume it's live in a register from there. The fix I propose is to > assign a temp register to that value and load it before any control > flow. > > Details (intermediate representation and generated code) are here: > > http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-September/007605.html > > Roland. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rwestrel at redhat.com Thu Sep 27 15:52:32 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Sep 2018 17:52:32 +0200 Subject: RFR(S): 8211232: GraphKit::make_runtime_call() sometimes attaches wrong memory state to call Message-ID: http://cr.openjdk.java.net/~roland/8211232/webrev.00/

This came up in Shenandoah testing with -XX:+ExtendedDTraceProbes.

make_runtime_call() is called through make_dtrace_method_exit() from Parse::return_current(). Memory state at this point is:

137 Phi === 135 _ _ 91 [[ 74 141 145 150 152 162 166 168 179 182 187 193 202 211 216 225 228 237 242 258 266 274 282 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24
141 MergeMem === _ 1 137 1 1 279 1 275 282 [[ 142 ]] { - - N279:java/lang/Object+-8 * - N275:narrowoop: java/lang/Object *[int:>=0]+-8 * N282:narrowoop: java/lang/Object *[int:>=0]+any * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24

The Phi is a loop phi so not all its inputs are set yet. The following code in GraphKit::make_runtime_call():

    assert(!wide_out, "narrow in => narrow out");
    Node* narrow_mem = memory(adr_type);
    prev_mem = reset_memory();
    map()->set_memory(narrow_mem);

sets the entire memory state to the phi. Next in GraphKit::set_predefined_input_for_runtime_call():

    Node* memory = reset_memory();

causes the current memory state (the Phi) to be transformed, which the GVN transforms to:

91 Phi === 89 _ _ 73 [[ 137 100 103 105 113 116 118 126 129 131 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24

the out-of-loop memory state and so the wrong state.

Roland. 
From rwestrel at redhat.com Thu Sep 27 16:05:28 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Sep 2018 18:05:28 +0200 Subject: RFR(S): 8211233: MemBarNode::trailing_membar() and MemBarNode::leading_membar() need to handle dying subgraphs better Message-ID: http://cr.openjdk.java.net/~roland/8211233/webrev.00/ I hit a bug where MemBarNode::leading_membar() doesn't return the right result because a dying part of the graph between a trailing and a leading membars is not properly handled. Roland. From vladimir.kozlov at oracle.com Thu Sep 27 19:15:27 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Sep 2018 12:15:27 -0700 Subject: RFR(S): 8211231: BarrierSetC1::generate_referent_check() confuses register allocator In-Reply-To: References: Message-ID: <2d430192-9a88-d3f3-40e1-e6d57f32e5f6@oracle.com> Good. thanks, Vladimir On 9/27/18 7:36 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8211231/webrev.00/ > > With Shenandoah, we had a crash in compiled code because a value was > restored from a spill in a branch that's not always executed in > BarrierSetC1::generate_referent_check(). That method generates code with > control flow within a basic block. The register allocator is not aware > of that control flow. So if a value that was spilled before is needed in > a branch, the register allocator may decide to restore it and then > assume it's live in a register from there. The fix I propose is to > assign a temp register to that value and load it before any control > flow. > > Details (intermediate representation and generated code) are here: > > http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-September/007605.html > > Roland. 
> From rkennke at redhat.com Thu Sep 27 20:07:19 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 27 Sep 2018 22:07:19 +0200 Subject: RFR: JDK-8211241: Missing obj equals in TemplateTable::fast_aldc Message-ID: TemplateTable::fast_aldc compares the just-loaded reference with Universe::the_null_sentinel. If it really is that null-sentinel, we may get a false negative (with GCs like Shenandoah that allow both from-space and to-space copies of an object to be around), and thus skip NULL-ing the ref. In other words, it would allow to get the-null-sentinel out into the wild as oop which can cause subtle and not-so-subtle bugs. Fix is easy, call cmpoop() which re-routes through GC-interface for GCs that need it: http://cr.openjdk.java.net/~rkennke/JDK-8211241/webrev.00/ Testing: hotspot/jtreg:tier1 Ok? Roman From daniel.daugherty at oracle.com Thu Sep 27 20:34:05 2018 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 27 Sep 2018 16:34:05 -0400 Subject: RFR: JDK-8211241: Missing obj equals in TemplateTable::fast_aldc In-Reply-To: References: Message-ID: <5492869d-c359-0a72-ebcd-fed24072ccc9@oracle.com> On 9/27/18 4:07 PM, Roman Kennke wrote: > TemplateTable::fast_aldc compares the just-loaded reference with > Universe::the_null_sentinel. If it really is that null-sentinel, we may > get a false negative (with GCs like Shenandoah that allow both > from-space and to-space copies of an object to be around), and thus skip > NULL-ing the ref. In other words, it would allow to get > the-null-sentinel out into the wild as oop which can cause subtle and > not-so-subtle bugs. 
> > Fix is easy, call cmpoop() which re-routes through GC-interface for GCs > that need it: > > http://cr.openjdk.java.net/~rkennke/JDK-8211241/webrev.00/ src/hotspot/cpu/aarch64/templateTable_aarch64.cpp No comments. src/hotspot/cpu/x86/templateTable_x86.cpp No comments. Thumbs up (on the change)! > Testing: hotspot/jtreg:tier1 Did you use jdk_submit or local testing? I don't expect build problems but the templateTable_x86.cpp will affect all X86/X64 platforms right? Dan > > Ok? > > Roman > From rkennke at redhat.com Thu Sep 27 20:37:42 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 27 Sep 2018 22:37:42 +0200 Subject: RFR: JDK-8211241: Missing obj equals in TemplateTable::fast_aldc In-Reply-To: <5492869d-c359-0a72-ebcd-fed24072ccc9@oracle.com> References: <5492869d-c359-0a72-ebcd-fed24072ccc9@oracle.com> Message-ID: <4b57bab3-723a-eedb-5654-5ce893c8573f@redhat.com> > On 9/27/18 4:07 PM, Roman Kennke wrote: >> TemplateTable::fast_aldc compares the just-loaded reference with >> Universe::the_null_sentinel. If it really is that null-sentinel, we may >> get a false negative (with GCs like Shenandoah that allow both >> from-space and to-space copies of an object to be around), and thus skip >> NULL-ing the ref. In other words, it would allow to get >> the-null-sentinel out into the wild as oop which can cause subtle and >> not-so-subtle bugs. >> >> Fix is easy, call cmpoop() which re-routes through GC-interface for GCs >> that need it: >> >> http://cr.openjdk.java.net/~rkennke/JDK-8211241/webrev.00/ > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp > No comments. > > src/hotspot/cpu/x86/templateTable_x86.cpp > No comments. > > Thumbs up (on the change)! > >> Testing: hotspot/jtreg:tier1 > > Did you use jdk_submit or local testing? I don't expect build > problems but the templateTable_x86.cpp will affect all X86/X64 > platforms right? I tested locally on x86_64 and aarch64. 
I always push my stuff through jdk/submit before pushing to jdk/jdk, usually after or during reviews. Thanks for reviewing! Roman From daniel.daugherty at oracle.com Thu Sep 27 20:39:01 2018 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 27 Sep 2018 16:39:01 -0400 Subject: RFR: JDK-8211241: Missing obj equals in TemplateTable::fast_aldc In-Reply-To: <4b57bab3-723a-eedb-5654-5ce893c8573f@redhat.com> References: <5492869d-c359-0a72-ebcd-fed24072ccc9@oracle.com> <4b57bab3-723a-eedb-5654-5ce893c8573f@redhat.com> Message-ID: On 9/27/18 4:37 PM, Roman Kennke wrote: >> On 9/27/18 4:07 PM, Roman Kennke wrote: >>> TemplateTable::fast_aldc compares the just-loaded reference with >>> Universe::the_null_sentinel. If it really is that null-sentinel, we may >>> get a false negative (with GCs like Shenandoah that allow both >>> from-space and to-space copies of an object to be around), and thus skip >>> NULL-ing the ref. In other words, it would allow to get >>> the-null-sentinel out into the wild as oop which can cause subtle and >>> not-so-subtle bugs. >>> >>> Fix is easy, call cmpoop() which re-routes through GC-interface for GCs >>> that need it: >>> >>> http://cr.openjdk.java.net/~rkennke/JDK-8211241/webrev.00/ >> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp >> No comments. >> >> src/hotspot/cpu/x86/templateTable_x86.cpp >> No comments. >> >> Thumbs up (on the change)! >> >>> Testing: hotspot/jtreg:tier1 >> Did you use jdk_submit or local testing? I don't expect build >> problems but the templateTable_x86.cpp will affect all X86/X64 >> platforms right? > I tested locally on x86_64 and aarch64. I always push my stuff through > jdk/submit before pushing to jdk/jdk, usually after or during reviews. Thanks for confirming the testing. Dan > > Thanks for reviewing! 
> Roman > From vladimir.kozlov at oracle.com Thu Sep 27 21:13:30 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Sep 2018 14:13:30 -0700 Subject: RFR(S): 8211232: GraphKit::make_runtime_call() sometimes attaches wrong memory state to call In-Reply-To: References: Message-ID: Hi Roland, I understand that you want to avoid second reset_memory() and I agree. But I concern about your code for setting input memory for call. Why not to pass narrow_mem from memory(adr_type) to set_predefined_input_for_runtime_call() in this case and NULL in others and check for NULL to select which memory to set. memory(adr_type) will check for merge_mem. Thanks, Vladimir On 9/27/18 8:52 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8211232/webrev.00/ > > This came up in shenandoah testing with XX:+ExtendedDTraceProbes. > > make_runtime_call() is called through make_dtrace_method_exit() from > Parse::return_current(). Memory state at this point is: > > 137 Phi === 135 _ _ 91 [[ 74 141 145 150 152 162 166 168 179 182 187 193 202 211 216 225 228 237 242 258 266 274 282 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24 > 141 MergeMem === _ 1 137 1 1 279 1 275 282 [[ 142 ]] { - - N279:java/lang/Object+-8 * - N275:narrowoop: java/lang/Object *[int:>=0]+-8 * N282:narrowoop: java/lang/Object *[int:>=0]+any * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24 > > The Phi is a loop phi so not all its inputs are set yet. The following > code in GraphKit::make_runtime_call(): > > assert(!wide_out, "narrow in => narrow out"); > Node* narrow_mem = memory(adr_type); > prev_mem = reset_memory(); > map()->set_memory(narrow_mem); > > set the entire memory state to the phi. 
Next in > GraphKit::set_predefined_input_for_runtime_call(): > > Node* memory = reset_memory(); > > causes the current memory state (the Phi) to be transformed which the > GVN transforms to: > > 91 Phi === 89 _ _ 73 [[ 137 100 103 105 113 116 118 126 129 131 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: IdentityHashMap::put @ bci:24 > > the out of loop memory state and so the wrong state. > > Roland. > From vladimir.kozlov at oracle.com Thu Sep 27 21:22:31 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Sep 2018 14:22:31 -0700 Subject: RFR(S): 8210389: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc In-Reply-To: References: Message-ID: <99bc7d08-614c-9a45-0fdd-2dac729122d9@oracle.com> Why you are not using subsume_by()? Thanks, Vladimir On 9/18/18 12:47 PM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8210389/webrev.00/ > > With volatile loads, the trailing membar has an edge to the load. After > optimizations, that edge can point to a chain of Phis and the membar can > be the one use that keeps the phis alive. After matching, that required > edge is converted to a precedence edge. Liveness analysis ignores > precedence edges, the chain of phis is killed and register allocation > finds a node with no use. > > As a fix, I propose that, at the end of optimizations, the edge between > the volatile load's membar and the phis be removed and all dead phis be > killed. As I understand, that edge is not required for correctness > because anti dependencies detection code adds a precedence edge between > a volatile load and its membar if needed. I ran full jcstress on x86 and > aarch64 with this patch successfully. > > Roland. 
> From vladimir.kozlov at oracle.com Thu Sep 27 21:25:19 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Sep 2018 14:25:19 -0700 Subject: RFR(S): 8211233: MemBarNode::trailing_membar() and MemBarNode::leading_membar() need to handle dying subgraphs better In-Reply-To: References: Message-ID: Looks good. Thanks, Vladimir On 9/27/18 9:05 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8211233/webrev.00/ > > I hit a bug where MemBarNode::leading_membar() doesn't return the right > result because a dying part of the graph between a trailing and a > leading membars is not properly handled. > > Roland. > From vladimir.kozlov at oracle.com Thu Sep 27 21:28:28 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Sep 2018 14:28:28 -0700 Subject: RFR(S): JDK-8191339: [JVMCI] BigInteger compiler intrinsics on Graal. In-Reply-To: <661f70d5-7a09-d181-5669-9841b590c7a3@oracle.com> References: <28011331-bd43-2c32-dba4-e41879ffe28a@oracle.com> <02f34a26-2a97-6a30-384f-115327781aac@oracle.com> <661f70d5-7a09-d181-5669-9841b590c7a3@oracle.com> Message-ID: <343cef9d-15f2-a01d-d3e3-0d4e2bc4cb06@oracle.com> Good. Thanks, Vladimir On 9/20/18 2:53 AM, Patric Hedlin wrote: > Hi Vladimir, Andrew, > > Sorry for dropping this after vacation. The testing is a simplistic benchmark (soon to be... I hope) > added to Graal (and some directed, a bit too ad hoc, testing not meant for up-streaming to Graal). I > also used a simplified version of a more general JVMCI/VM test case for these options only, but it > really does only exercise the JVMCI (not the option propagation in Graal or some other JVMCI > "client"), making it less useful. > > But in essence, Graal is the test-case. > > > On 2018-06-22 18:04, Vladimir Kozlov wrote: >> Hi Patric, >> >> Do you need Graal changes for this? Or it already has these intrinsics and the only problem is >> these flags were not set in vm_version_x86.cpp? 
> > No further changes have been made to Graal. > >> >> Small note. In vm_version_x86.cpp previous code has already COMPILER2_OR_JVMCI check. You can >> remove previous #endif and new #ifdef. Also change comment for closing #endif at line 1080 to // >> COMPILER2_OR_JVMCI >> >> 1080 #endif // COMPILER2 > > You are right (actually the intended webrev) and it should look correct now (just a tad old). > > Best regards, > Patric >> >> What testing did you do? >> >> Thanks, >> Vladimir >> >> On 6/21/18 8:26 AM, Patric Hedlin wrote: >>> Dear all, >>> >>> I would like to ask for help to review the following change/update: >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8191339 >>> >>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8191339/ >>> >>> >>> 8191339: [JVMCI] BigInteger compiler intrinsics on Graal. >>> >>> Enabling BigInteger intrinsics via JVMCI. >>> >>> >>> >>> Best regards, >>> Patric > From sandhya.viswanathan at intel.com Thu Sep 27 21:37:15 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Thu, 27 Sep 2018 21:37:15 +0000 Subject: RFR(S):8211251:Default mask register for avx512 instructions Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions. Currently unmasked instructions are encoded using k1 register which requires k1 register to be initialized properly and also reinitialized across JNI and Runtime calls. This patch encodes AVX 512 instructions as unmasked instruction with K0 encoding where the explicit mask register is not specified. RFE: https://bugs.openjdk.java.net/browse/JDK-8211251 Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/ Best Regards, Sandhya -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.kozlov at oracle.com Thu Sep 27 22:24:03 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Sep 2018 15:24:03 -0700 Subject: RFR(S):8211251:Default mask register for avx512 instructions In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com> Message-ID: Looks good except PostLoopMultiversioning flag guarded changes in macroAssembler_x86.cpp which should be explained too. Thanks, Vladimir On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote: > Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions. > > Currently unmasked instructions are encoded using k1 register which requires k1 register to be > initialized properly and also reinitialized across JNI and Runtime calls. > > This patch encodes AVX 512 instructions as unmasked instruction with K0 encoding where the explicit > mask register is not specified. > > RFE: https://bugs.openjdk.java.net/browse/JDK-8211251 > > Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/ > > > Best Regards, > > Sandhya > From sandhya.viswanathan at intel.com Thu Sep 27 23:40:19 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Thu, 27 Sep 2018 23:40:19 +0000 Subject: RFR(S):8211251:Default mask register for avx512 instructions In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com> Hi Vladimir, As you know the PostLoopMultiversioning needs to be fully redesigned and is currently disabled. That particular code should only be exercised when PostLoopMultiversioning is on. I could change it with an assert statement if that looks ok to you. Please let me know. 
Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, September 27, 2018 3:24 PM To: Viswanathan, Sandhya ; hotspot compiler Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions Looks good except PostLoopMultiversioning flag guarded changes in macroAssembler_x86.cpp which should be explained too. Thanks, Vladimir On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote: > Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions. > > Currently unmasked instructions are encoded using k1 register which requires k1 register to be > initialized properly and also reinitialized across JNI and Runtime calls. > > This patch encodes AVX 512 instructions as unmasked instruction with K0 encoding where the explicit > mask register is not specified. > > RFE: https://bugs.openjdk.java.net/browse/JDK-8211251 > > Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/ > > > Best Regards, > > Sandhya > From vladimir.kozlov at oracle.com Thu Sep 27 23:49:45 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Sep 2018 16:49:45 -0700 Subject: RFR(S):8211251:Default mask register for avx512 instructions In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com> Message-ID: <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com> Use guarantee() instead of assert so if someone to try to use it with product JDK it will fail. Thanks, Vladimir On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > As you know the PostLoopMultiversioning needs to be fully redesigned and is currently disabled. > That particular code should only be exercised when PostLoopMultiversioning is on. > I could change it with an assert statement if that looks ok to you. 
>
> Please let me know.
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 27, 2018 3:24 PM
> To: Viswanathan, Sandhya ; hotspot compiler
> Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions
>
> Looks good except PostLoopMultiversioning flag guarded changes in macroAssembler_x86.cpp which
> should be explained too.
>
> Thanks,
> Vladimir
>
> On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote:
>> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions.
>>
>> Currently unmasked instructions are encoded using k1 register which requires k1 register to be
>> initialized properly and also reinitialized across JNI and Runtime calls.
>>
>> This patch encodes AVX 512 instructions as unmasked instruction with K0 encoding where the explicit
>> mask register is not specified.
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251
>>
>> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/
>>
>>
>> Best Regards,
>>
>> Sandhya
>>

From sandhya.viswanathan at intel.com Fri Sep 28 00:46:48 2018
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 28 Sep 2018 00:46:48 +0000
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>
 <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Please find the updated webrev with this change at:
http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.01/

Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Thursday, September 27, 2018 4:50 PM
To: Viswanathan, Sandhya ; hotspot compiler
Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions

Use guarantee() instead of assert so that it fails if someone tries to use it with a product JDK.

Thanks,
Vladimir

On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> As you know the PostLoopMultiversioning needs to be fully redesigned and is currently disabled.
> That particular code should only be exercised when PostLoopMultiversioning is on.
> I could change it with an assert statement if that looks ok to you.
>
> Please let me know.
>
> Best Regards,
> Sandhya

From erik.osterlund at oracle.com Fri Sep 28 01:21:26 2018
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 28 Sep 2018 03:21:26 +0200
Subject: RFR: JDK-8211219: Type inconsistency in LIRGenerator::atomic_cmpxchg(..)
In-Reply-To:
References:
Message-ID:

Hi Roman,

Looks good.
Thanks,
/Erik

On 2018-09-27 14:03, Roman Kennke wrote:
> We spotted this in Shenandoah land. Doesn't seem to be catastrophic, but
> would be nice to fix:
>
> In c1_LIRGenerator_x86.cpp, towards the end of
> LIRGenerator::atomic_cmpxchg(..) there's this cmove:
>
>     __ cmove(lir_cond_equal, LIR_OprFact::intConst(1),
>              LIR_OprFact::intConst(0),
>              result, type);
>
> which should use T_INT instead of the passed-in type.
>
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8211219
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8211219/webrev.00/
>
> Testing: hotspot/jtreg:tier1 ok
>
> Ok?
>
> Roman
>

From vladimir.kozlov at oracle.com Fri Sep 28 01:37:48 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 18:37:48 -0700
Subject: RFR(S):8211251:Default mask register for avx512 instructions
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com>
 <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com>
Message-ID:

This looks fine. I assume you did testing. It only affects avx512 machines - right?

Thanks,
Vladimir

On 9/27/18 5:46 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> Please find the updated webrev with this change at:
> http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.01/
>
> Best Regards,
> Sandhya

From vladimir.kozlov at oracle.com Fri Sep 28 01:50:34 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Sep 2018 18:50:34 -0700
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To:
References: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
 <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>
Message-ID: <39b8c3c2-d412-06fa-9162-1d80f067ed11@oracle.com>

gc/epsilon/TestManyThreads.java test failed on SPARC
I add information and replay file to bug report.

Vladimir

On 9/27/18 6:58 AM, Roland Westrelin wrote:
>
> Hi Vladimir,
>
> Thanks for the review and the testing.
>
>> A LOT tests failed.
>
> I did some last minute code refactoring after running tests and managed
> to break something. Sorry about that.
>
> The fix is:
>
> diff --git a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> --- a/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> +++ b/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp
> @@ -598,6 +598,7 @@
>    ac->set_clonebasic();
>    Node* n = kit->gvn().transform(ac);
>    if (n == ac) {
> +    ac->_adr_type = TypeRawPtr::BOTTOM;
>      kit->set_predefined_output_for_runtime_call(ac, ac->in(TypeFunc::Memory), raw_adr_type);
>    } else {
>      kit->set_all_memory(n);
>
>
> New webrev:
>
> http://cr.openjdk.java.net/~roland/8210887/webrev.01/
>
> Roland.
>

From stuart.marks at oracle.com Fri Sep 28 05:51:45 2018
From: stuart.marks at oracle.com (Stuart Marks)
Date: Thu, 27 Sep 2018 22:51:45 -0700
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To: <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
Message-ID:

Hi Andrew,

Let me first say that this issue of "ByteBuffer might not be the right answer"
is something of a digression from the JEP discussion. I think the JEP should
proceed forward using MBB with the API that you and Alan had discussed
previously. At most, the discussion of the "right thing" issue might affect a
side note in the JEP text about possible limitations and future directions of
this effort. However, it's not a blocker to the JEP making progress as far as
I'm concerned.

With that in mind, I'll discuss the issue of multithreaded access to
ByteBuffers and how this bears on whether buffers are or aren't the "right
answer." There are actually several issues that figure into the "right answer"
analysis. In this message, though, I'll just focus on the issue of
multithreaded access.

To recap (possibly for the benefit of other readers) the Buffer class doc has
the following statement:

    Buffers are not safe for use by multiple concurrent threads. If a buffer
    is to be used by more than one thread then access to the buffer should be
    controlled by appropriate synchronization.

Buffers are primarily designed for sequential operations such as I/O or codeset
conversion. Typical buffer operations set the mark, position, and limit before
initiating the operation. If the operation completes partially -- not uncommon
with I/O or codeset conversion -- the position is updated so that the operation
can be resumed easily from where it left off.

The fact that buffers not only contain the data being operated upon but also
mutable state information such as mark/position/limit makes it difficult to
have multiple threads operate on different parts of the same buffer. Each
thread would have to lock around setting the position and limit and performing
the operation, preventing any parallelism. The typical way to deal with this is
to create multiple buffer slices, one per thread. Each slice has its own
mark/position/limit values but shares the same backing data.

We can avoid the need for this by adding absolute bulk operations, right?

Let's suppose we were to add something like this (considering ByteBuffer only,
setting the buffer views aside):

    get(int srcOff, byte[] dst, int dstOff, int length)
    put(int dstOff, byte[] src, int srcOff, int length)

Each thread can perform its operations on a different part of the buffer, in
parallel, without interference from the others. Presumably these operations
don't read or write the mark and position. Oh, wait. The existing absolute put
and get overloads *do* respect the buffer's limit, so the absolute bulk
operations ought to as well. This means they do depend on shared state. (I
guess we could make the absolute bulk ops not respect the limit, but that seems
inconsistent.)
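
For readers who want to see the slicing idiom mentioned above, here is a
minimal sketch. It is an illustration only, not code from this thread: the
names SlicePerThread and fillWithSlices are hypothetical, a small heap buffer
stands in for a mapped one, and the size is assumed to be a multiple of the
worker count. Each worker gets its own slice with an independent
position/limit, so its relative puts never touch shared cursor state.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration: one slice per worker thread over shared backing data.
class SlicePerThread {

    // Fills a buffer of 'size' bytes using 'workers' threads; worker w writes the value w + 1.
    // Assumes size is a multiple of workers (any remainder bytes would stay zero).
    static byte[] fillWithSlices(int size, int workers) {
        ByteBuffer shared = ByteBuffer.allocate(size);
        int chunk = size / workers;
        Thread[] threads = new Thread[workers];

        for (int w = 0; w < workers; w++) {
            // Carve out the slice on the coordinating thread, before the worker starts.
            ByteBuffer view = shared.duplicate();
            view.position(w * chunk);
            view.limit((w + 1) * chunk);
            final ByteBuffer slice = view.slice(); // own mark/position/limit, same backing data
            final byte id = (byte) (w + 1);

            threads[w] = new Thread(() -> {
                while (slice.hasRemaining()) {
                    slice.put(id); // relative put advances only this slice's position
                }
            });
            threads[w].start();
        }

        for (Thread t : threads) {
            try {
                t.join(); // join makes the worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillWithSlices(8, 4)));
    }
}
```

The shared buffer's own position and limit are never modified here; only the
per-thread slices move their cursors.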

OK, let's adopt an approach similar to what was described by Peter Levart a
couple messages upthread, where a) there is an initialization step where
various things including the limit are set properly; b) the buffer is published
to the worker threads properly, e.g., using a lock or other suitable memory
operation; and c) all worker threads agree only to use absolute operations and
to avoid relative operations.

Now suppose the threads have completed their work and you want to, say, write
the buffer's contents to a channel. You have to carefully make sure the threads
are all finished and properly publish their results back to some central
thread, have that central thread receive the results, set the position and
limit, after which the central thread can initiate the I/O operation.

This can certainly be made to work.

But note what we just did. We now have an API where:

 - there are different "phases", where in one phase all the methods work, but
   in another phase only certain methods work (otherwise it breaks silently);

 - you have to carefully control all the code to ensure that the wrong methods
   aren't called when the buffer is in the wrong phase (otherwise it breaks
   silently); and

 - you can't hand off the buffer to a library (3rd party or JDK) without
   carefully orchestrating a transition into the right phase (otherwise it
   breaks silently).

Frankly, this is pretty crappy. It's certainly possible to work around it.
People do, and it is painful, and they complain about it up and down all day
long (and rightfully so).

Note that this discussion is based primarily on looking at the ByteBuffer API.
I have not done extensive investigation of the impact of the various buffer
views (IntBuffer, LongBuffer, etc.), nor have I looked thoroughly at the
implementations. I have no doubt that we will run into additional issues when
we do those investigations.
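
The phased protocol described above might look roughly like the following
sketch. It is an illustration under stated assumptions, not code from this
thread: PhasedBuffer and fillThenWrite are hypothetical names, a heap buffer
stands in for a mapped one, an in-memory channel stands in for a real one, and
Thread.start()/Thread.join() serve as the publication and hand-back steps.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;

// Hypothetical illustration of the setup / parallel / I/O phases.
class PhasedBuffer {

    static byte[] fillThenWrite(int size, int workers) {
        // Phase (a): single-threaded setup; the limit is fixed before any worker runs.
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.clear(); // position = 0, limit = capacity

        int chunk = size / workers; // assumes size is a multiple of workers
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int start = w * chunk;
            final int end = start + chunk;
            final byte id = (byte) (w + 1);
            // Phase (b): Thread.start() publishes the buffer to the worker.
            threads[w] = new Thread(() -> {
                // Phase (c): by convention only, workers use absolute puts,
                // which read the limit but never change position or limit.
                for (int i = start; i < end; i++) {
                    buf.put(i, id);
                }
            });
            threads[w].start();
        }

        // Hand-back: the central thread waits for every worker before touching the cursor.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        // I/O phase: only now is it safe to move position/limit and use relative operations.
        buf.position(0);
        buf.limit(chunk * workers);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel channel = Channels.newChannel(sink)) {
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillThenWrite(8, 4)));
    }
}
```

Nothing in the types stops a worker from calling a relative method or the
central thread from moving the limit too early; the phases exist only as a
convention in the calling code, which is the "breaks silently" hazard listed
above.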

If we were designing an API to support multi-threaded access to memory regions,
it would almost certainly look nothing like the buffer API. This is what Alan
means by "buffers might not be the right answer." As things stand, it appears
quite difficult to me to fix the multi-threaded access problem without turning
buffers into something they aren't, or fragmenting the API in some complex and
uncomfortable way.

Finally, note that this is not an argument against adding bulk absolute
operations! I think we should probably go ahead and do that anyway. But let's
not fool ourselves into thinking that bulk absolute operations solve the
multi-threaded buffer access problem.

s'marks

From peter.levart at gmail.com Fri Sep 28 07:21:13 2018
From: peter.levart at gmail.com (Peter Levart)
Date: Fri, 28 Sep 2018 09:21:13 +0200
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To:
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
Message-ID:

Hi Stuart,

I mostly agree with your assessment about the suitability of the ByteBuffer API
for nice multithreaded use. What would such an API look like? I think pretty
much like ByteBuffer but without things that mutate
mark/position/limit/ByteOrder. A stripped-down ByteBuffer API therefore. That
would be in my opinion the most low-level API possible.
If you add things to such an API that coordinate multithreaded access to the
underlying memory, you are already creating a concurrent data structure for a
particular set of use cases, which might not cover all possible use cases or be
sub-optimal at some of them. So I think this is better layered on top of such
an API, not built into it. Low-level multithreaded access to memory is, in my
opinion, always going to be "unsafe" from the standpoint of coordination. It's
not only the mark/position/limit/ByteOrder that is not multithreaded-friendly
about the ByteBuffer API, but the underlying memory too. It would be nice if
mark/position/limit/ByteOrder weren't in the way though.

Regards, Peter

From rwestrel at redhat.com Fri Sep 28 07:27:17 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 09:27:17 +0200
Subject: RFR(S): 8211231: BarrierSetC1::generate_referent_check() confuses register allocator
In-Reply-To: <2d430192-9a88-d3f3-40e1-e6d57f32e5f6@oracle.com>
References: <2d430192-9a88-d3f3-40e1-e6d57f32e5f6@oracle.com>
Message-ID:

Thanks Igor and Vladimir for the reviews.

Roland.

From nigro.fra at gmail.com Fri Sep 28 07:38:48 2018
From: nigro.fra at gmail.com (Francesco Nigro)
Date: Fri, 28 Sep 2018 09:38:48 +0200
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To:
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
Message-ID:

Hi guys!

I'm one of the devs mentioned (like many others) who use external (and unsafe)
APIs to access a ByteBuffer's content concurrently, and a developer of a
messaging broker's journal that would benefit from this JEP :)

Re the concurrent access API, how does this one look?
https://github.com/real-logic/agrona/blob/master/agrona/src/main/java/org/agrona/concurrent/AtomicBuffer.java

Note: I don't know how it is regarded to join these discussions without
introducing myself first, and I hope I'm not OT, but both this JEP and the
comments around it are so interesting that I couldn't resist. I apologize if
I'm not respecting some rule here.

Thanks for the hard work,
Francesco

From rwestrel at redhat.com Fri Sep 28 08:23:15 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 10:23:15 +0200
Subject: RFR(S): 8210389: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc
In-Reply-To: <99bc7d08-614c-9a45-0fdd-2dac729122d9@oracle.com>
References: <99bc7d08-614c-9a45-0fdd-2dac729122d9@oracle.com>
Message-ID:

Hi Vladimir,

Thanks for looking at this.

> Why you are not using subsume_by()?

Instead of disconnect_inputs()? Or as a replacement for the loop? I want all
nodes that become useless as a result of the edge removal to be disconnected.
subsume_by() wouldn't do that AFAICT.

Roland.

From rwestrel at redhat.com Fri Sep 28 08:23:30 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 10:23:30 +0200
Subject: RFR(S): 8211233: MemBarNode::trailing_membar() and MemBarNode::leading_membar() need to handle dying subgraphs better
In-Reply-To:
References:
Message-ID:

Hi Vladimir,

Thanks for the review.

Roland.

From rwestrel at redhat.com Fri Sep 28 09:06:17 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Sep 2018 11:06:17 +0200
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To: <39b8c3c2-d412-06fa-9162-1d80f067ed11@oracle.com>
References: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
 <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>
 <39b8c3c2-d412-06fa-9162-1d80f067ed11@oracle.com>
Message-ID:

> gc/epsilon/TestManyThreads.java test failed on SPARC
> I add information and replay file to bug report.

Thanks for the test result. The fix is:

diff --git a/src/hotspot/share/opto/arraycopynode.cpp b/src/hotspot/share/opto/arraycopynode.cpp
--- a/src/hotspot/share/opto/arraycopynode.cpp
+++ b/src/hotspot/share/opto/arraycopynode.cpp
@@ -422,7 +422,8 @@
   Node *start_mem_dest = mm->memory_at(alias_idx_dest);
   Node* mem = start_mem_dest;
-  assert(copy_type != T_OBJECT, "only tightly coupled allocations for object arrays");
+  BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
+  assert(copy_type != T_OBJECT || !bs->array_copy_requires_gc_barriers(false, T_OBJECT, false, BarrierSetC2::Optimization), "only tightly coupled allocations for object arrays");
   bool same_alias = (alias_idx_src == alias_idx_dest);
   if (count > 0) {

New webrev:

http://cr.openjdk.java.net/~roland/8210887/webrev.02/

Roland.

From rahul.v.raghavan at oracle.com Fri Sep 28 09:10:34 2018
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Fri, 28 Sep 2018 14:40:34 +0530
Subject: RFR: 8211168: Solaris-X64 build failure with error nreg hides the same name in an outer scope
Message-ID: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com>

Hi,

Please review the following fix proposal contributed by Daniel Daugherty.

 - http://cr.openjdk.java.net/~rraghavan/8211168/webrev.00/
 - https://bugs.openjdk.java.net/browse/JDK-8211168

(Pre-integration testing in progress)

Thanks,
Rahul

From adinn at redhat.com Fri Sep 28 09:16:13 2018
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 28 Sep 2018 10:16:13 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To:
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
 <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com>
 <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com>
 <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com>
 <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com>
Message-ID: <2cdc5a22-0217-b985-9069-0fc72058f081@redhat.com>

Hi Stuart,

On 28/09/18 06:51, Stuart Marks wrote:
> Let me first say that this issue of "ByteBuffer might not be the right
> answer" is something of a digression from the JEP discussion. I think
> the JEP should proceed forward using MBB with the API that you and Alan
> had discussed previously. At most, the discussion of the "right thing"
> issue might affect a side note in the JEP text about possible
> limitations and future directions of this effort. However, it's not a
> blocker to the JEP making progress as far as I'm concerned.

Thanks for clarifying that point. I have already added a note to that effect
to the JEP.

I take your other point that these limitations make this JEP a less useful
addition than it could be. However, it's hard to see what else might usefully
be provided that does not involve a reworking of JDK core-lib (and,
potentially, JVM) functionality that has a much larger scope than is needed to
crack the specific nut the JEP addresses.
> With that in mind, I'll discuss the issue of multithreaded access to > ByteBuffers and how this bears on whether buffers are or aren't the > "right answer." There are actually several issues that figure into the > "right answer" analysis. In this message, though, I'll just focus on the > issue of multithreaded access. Thank you for a very clear and interesting summary of the limitations of the Buffer API. I have cut it from this reply for the sake of brevity but I will respond to a few points. I think the limitations you point out regarding concurrent clients' mode of operation are less severe in this specific case because there is not really a need for those client threads to reach a rendezvous point in order to execute some form of FileChannel update. The buffer content is persistent memory. So, essentially, the data writes constitute the update. If independent threads can arrange to coordinate over carving up separate regions of a persistent mapped buffer for parallel update then they can also write and flush (by which I mean force cache writeback for) those regions independently. Clearly there will also be a need for threads to write common index regions of the persistent mapped buffer in order to ensure that the associated data updates are committed. That means the writes and flushes for those common regions need to synchronize. However, that is simply business as usual for persistent data management code. A TX manager will already have code in place for this purpose, for example. Certainly, that synchronized update will not need to rely on buffer cursor (position) management. Also, I am not sure I see any problem arising from your point about absolute puts (and gets) depending on the 'limit' property. The various put operations do indeed /read/ the current limit but they do not update it. 
So, you are right to state that a persistent store management library built over this API would need to ensure that put operations were reined in via some form of rendezvous if it ever wanted to adjust the limit. However, I don't think that is going to happen with a library that manages a mapped persistent store. I would expect that any such code is never going to call clear(), flip(), truncate() -- nor make a direct call to limit() -- except as part of the initialization or reconciliation performed at startup before concurrent clients are unleashed. Anyway, thank you for a clear warning as to the precise perils faced in implementing correct client libraries over the base layer this JEP proposes. > If we were designing an API to support multi-threaded access to memory > regions, it would almost certainly look nothing like the buffer API. > This is what Alan means by "buffers might not be the right answer." As > things stand, it appears quite difficult to me to fix the multi-threaded > access problem without turning buffers into something they aren't, or > fragmenting the API in some complex and uncomfortable way. Agreed. > Finally, note that this is not an argument against adding bulk absolute > operations! I think we should probably go ahead and do that anyway. But > let's not fool ourselves into thinking that bulk absolute operations > solve the multi-threaded buffer access problem. Also agreed. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From shade at redhat.com Fri Sep 28 10:24:58 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Sep 2018 12:24:58 +0200 Subject: RFR: 8211168: Solaris-X64 build failure with error nreg hides the same name in an outer scope In-Reply-To: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com> References: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com> Message-ID: <8b945716-b537-95f8-bbd7-7979a82c92b1@redhat.com> On 09/28/2018 11:10 AM, Rahul Raghavan wrote: > Please review the following fix proposal contributed by Daniel Daugherty. > > - http://cr.openjdk.java.net/~rraghavan/8211168/webrev.00/ This looks good and trivially correct. I would just move the "int nreg" statement at the beginning of the method before all the blocks, but this build fix is fine too. -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From rwestrel at redhat.com Fri Sep 28 11:06:56 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 28 Sep 2018 13:06:56 +0200 Subject: RFR(S): 8211232: GraphKit::make_runtime_call() sometimes attaches wrong memory state to call In-Reply-To: References: Message-ID: Hi Vladimir, Thanks for looking at this. > I understand that you want to avoid second reset_memory() and I agree. > But I concern about your code for setting input memory for call. Why not to pass narrow_mem from > memory(adr_type) to set_predefined_input_for_runtime_call() in this case and NULL in others and > check for NULL to select which memory to set. memory(adr_type) will check for merge_mem. Is this what you're suggesting? http://cr.openjdk.java.net/~roland/8211232/webrev.00/ Roland. 
From shade at redhat.com Fri Sep 28 11:16:50 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Sep 2018 13:16:50 +0200 Subject: RFR 8211272: x86_32 build failures after JDK-8210764 (Update avx512 implementation) Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8211272 It is a trivial mistake: the braces got unbalanced when _LP64 is not defined, which is the case for x86_32 build. I would not bother to run it through jdk-submit. Fix: diff -r eb3e72f181af src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp --- a/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp Thu Sep 27 10:24:12 2018 +0200 +++ b/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp Fri Sep 28 13:08:20 2018 +0200 @@ -2401,12 +2401,13 @@ { #ifdef _LP64 if (UseAVX > 2 && !VM_Version::supports_avx512vl()) { assert(tmp->is_valid(), "need temporary"); __ vpandn(dest->as_xmm_double_reg(), tmp->as_xmm_double_reg(), value->as_xmm_double_reg(), 2); - } else { + } else #endif + { if (dest->as_xmm_double_reg() != value->as_xmm_double_reg()) { __ movdbl(dest->as_xmm_double_reg(), value->as_xmm_double_reg()); } assert(!tmp->is_valid(), "do not need temporary"); __ andpd(dest->as_xmm_double_reg(), Testing: x86_32 build, x86_64 build Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From daniel.daugherty at oracle.com Fri Sep 28 14:39:26 2018 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 28 Sep 2018 10:39:26 -0400 Subject: RFR: 8211168: Solaris-X64 build failure with error nreg hides the same name in an outer scope In-Reply-To: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com> References: <18885331-d9de-6d52-d28f-c876849b76f8@oracle.com> Message-ID: Thumbs up. Dan On 9/28/18 5:10 AM, Rahul Raghavan wrote: > Hi, > > Please review the following fix proposal contributed by Daniel Daugherty. 
> > - http://cr.openjdk.java.net/~rraghavan/8211168/webrev.00/ > > - https://bugs.openjdk.java.net/browse/JDK-8211168 > > (Pre-integration testing in progress) > > Thanks, > Rahul From rkennke at redhat.com Fri Sep 28 15:01:50 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Sep 2018 17:01:50 +0200 Subject: RFR 8211272: x86_32 build failures after JDK-8210764 (Update avx512 implementation) In-Reply-To: References: Message-ID: <817d02b0-31e8-e488-b374-6021aa475190@redhat.com> Ok fine. > Bug: > https://bugs.openjdk.java.net/browse/JDK-8211272 > > It is a trivial mistake: the braces got unbalanced when _LP64 is not defined, which is the case for > x86_32 build. I would not bother to run it through jdk-submit. jdk/submit seems unresponsive anyway since at least yesterday. Dunno what's up? Did it go up in flames because of the warnings stuff? Thanks, Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From vladimir.kozlov at oracle.com Fri Sep 28 16:10:56 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 28 Sep 2018 09:10:56 -0700 Subject: RFR 8211272: x86_32 build failures after JDK-8210764 (Update avx512 implementation) In-Reply-To: References: Message-ID: <76cca0b1-e4be-e0bc-d89c-1c8e3c389879@oracle.com> Good. You can push it since it is trivial. Thanks, Vladimir On 9/28/18 4:16 AM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8211272 > > It is a trivial mistake: the braces got unbalanced when _LP64 is not defined, which is the case for > x86_32 build. I would not bother to run it through jdk-submit. 
> > Fix: > > diff -r eb3e72f181af src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp > --- a/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp Thu Sep 27 10:24:12 2018 +0200 > +++ b/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp Fri Sep 28 13:08:20 2018 +0200 > @@ -2401,12 +2401,13 @@ > { > #ifdef _LP64 > if (UseAVX > 2 && !VM_Version::supports_avx512vl()) { > assert(tmp->is_valid(), "need temporary"); > __ vpandn(dest->as_xmm_double_reg(), tmp->as_xmm_double_reg(), > value->as_xmm_double_reg(), 2); > - } else { > + } else > #endif > + { > if (dest->as_xmm_double_reg() != value->as_xmm_double_reg()) { > __ movdbl(dest->as_xmm_double_reg(), value->as_xmm_double_reg()); > } > assert(!tmp->is_valid(), "do not need temporary"); > __ andpd(dest->as_xmm_double_reg(), > > Testing: x86_32 build, x86_64 build > > Thanks, > -Aleksey > From vladimir.kozlov at oracle.com Fri Sep 28 16:43:41 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 28 Sep 2018 09:43:41 -0700 Subject: RFR(S): 8211232: GraphKit::make_runtime_call() sometimes attaches wrong memory state to call In-Reply-To: References: Message-ID: On 9/28/18 4:06 AM, Roland Westrelin wrote: > > Hi Vladimir, > > Thanks for looking at this. > >> I understand that you want to avoid second reset_memory() and I agree. >> But I concern about your code for setting input memory for call. Why not to pass narrow_mem from >> memory(adr_type) to set_predefined_input_for_runtime_call() in this case and NULL in others and >> check for NULL to select which memory to set. memory(adr_type) will check for merge_mem. > > Is this what you're suggesting? > > http://cr.openjdk.java.net/~roland/8211232/webrev.00/ Yes. Does it work for you? Thanks, Vladimir > > Roland. 
> From shade at redhat.com Fri Sep 28 16:47:59 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Sep 2018 18:47:59 +0200 Subject: RFR 8211272: x86_32 build failures after JDK-8210764 (Update avx512 implementation) In-Reply-To: <76cca0b1-e4be-e0bc-d89c-1c8e3c389879@oracle.com> References: <76cca0b1-e4be-e0bc-d89c-1c8e3c389879@oracle.com> Message-ID: Right on. Pushed. -Aleksey On 09/28/2018 06:10 PM, Vladimir Kozlov wrote: > Good. You can push it since it is trivial. > > Thanks, > Vladimir > > On 9/28/18 4:16 AM, Aleksey Shipilev wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8211272 >> >> It is a trivial mistake: the braces got unbalanced when _LP64 is not defined, which is the case for >> x86_32 build. I would not bother to run it through jdk-submit. >> >> Fix: >> >> diff -r eb3e72f181af src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp >> --- a/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp Thu Sep 27 10:24:12 2018 +0200 >> +++ b/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp Fri Sep 28 13:08:20 2018 +0200 >> @@ -2401,12 +2401,13 @@ >> { >> #ifdef _LP64 >> if (UseAVX > 2 && !VM_Version::supports_avx512vl()) { >> assert(tmp->is_valid(), "need temporary"); >> __ vpandn(dest->as_xmm_double_reg(), tmp->as_xmm_double_reg(), >> value->as_xmm_double_reg(), 2); >> - } else { >> + } else >> #endif >> + { >> if (dest->as_xmm_double_reg() != value->as_xmm_double_reg()) { >> __ movdbl(dest->as_xmm_double_reg(), value->as_xmm_double_reg()); >> } >> assert(!tmp->is_valid(), "do not need temporary"); >> __ andpd(dest->as_xmm_double_reg(), >> >> Testing: x86_32 build, x86_64 build >> >> Thanks, >> -Aleksey >> -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From sandhya.viswanathan at intel.com Fri Sep 28 17:04:03 2018 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 28 Sep 2018 17:04:03 +0000 Subject: RFR(S):8211251:Default mask register for avx512 instructions In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com> <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12FCB@FMSMSX126.amr.corp.intel.com> Hi Vladimir, Yes, it only affects avx512 machines with UseAVX=3. I have run jtreg compiler tests on SKX, KNL and Haswell. Also ran SPECjvm2008. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, September 27, 2018 6:38 PM To: Viswanathan, Sandhya ; hotspot compiler Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions This looks fine. I assume you did testing. It only affects avx512 machines - right? Thanks, Vladimir On 9/27/18 5:46 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Please find the updated webrev with this change at: > http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.01/ > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, September 27, 2018 4:50 PM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions > > Use guarantee() instead of assert so if someone to try to use it with product JDK it will fail. > > Thanks, > Vladimir > > On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> As you know the PostLoopMultiversioning needs to be fully redesigned and is currently disabled. 
>> That particular code should only be exercised when PostLoopMultiversioning is on. >> I could change it with an assert statement if that looks ok to you. >> >> Please let me know. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, September 27, 2018 3:24 PM >> To: Viswanathan, Sandhya ; hotspot >> compiler >> Subject: Re: RFR(S):8211251:Default mask register for avx512 >> instructions >> >> Looks good except PostLoopMultiversioning flag guarded changes in >> macroAssembler_x86.cpp which should be explained too. >> >> Thanks, >> Vladimir >> >> On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote: >>> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions. >>> >>> Currently unmasked instructions are encoded using k1 register which >>> requires k1 register to be initialized properly and also reinitialized across JNI and Runtime calls. >>> >>> This patch encodes AVX 512 instructions as unmasked instruction with >>> K0 encoding where the explicit mask register is not specified. 
>>> >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251 >>> >>> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/ >>> >>> >>> Best Regards, >>> >>> Sandhya >>> From vladimir.kozlov at oracle.com Fri Sep 28 17:20:25 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 28 Sep 2018 10:20:25 -0700 Subject: RFR(S):8211251:Default mask register for avx512 instructions In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12FCB@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12894@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF129C6@FMSMSX126.amr.corp.intel.com> <1a675691-9178-eb7e-f517-72a2a9e0519c@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12CE1@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B9EF12FCB@FMSMSX126.amr.corp.intel.com> Message-ID: <0439a3ae-1b18-f38a-6832-e44c2c2b8b05@oracle.com> Okay. Thanks. I submitted testing on avx512 machine too. Vladimir On 9/28/18 10:04 AM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Yes, it only affects avx512 machines with UseAVX=3. I have run jtreg compiler tests on SKX, KNL and Haswell. Also ran SPECjvm2008. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, September 27, 2018 6:38 PM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions > > This looks fine. I assume you did testing. It only affects avx512 machines - right? 
> > Thanks, > Vladimir > > On 9/27/18 5:46 PM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> Please find the updated webrev with this change at: >> http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.01/ >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, September 27, 2018 4:50 PM >> To: Viswanathan, Sandhya ; hotspot compiler >> Subject: Re: RFR(S):8211251:Default mask register for avx512 instructions >> >> Use guarantee() instead of assert so if someone to try to use it with product JDK it will fail. >> >> Thanks, >> Vladimir >> >> On 9/27/18 4:40 PM, Viswanathan, Sandhya wrote: >>> Hi Vladimir, >>> >>> As you know the PostLoopMultiversioning needs to be fully redesigned and is currently disabled. >>> That particular code should only be exercised when PostLoopMultiversioning is on. >>> I could change it with an assert statement if that looks ok to you. >>> >>> Please let me know. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, September 27, 2018 3:24 PM >>> To: Viswanathan, Sandhya ; hotspot >>> compiler >>> Subject: Re: RFR(S):8211251:Default mask register for avx512 >>> instructions >>> >>> Looks good except PostLoopMultiversioning flag guarded changes in >>> macroAssembler_x86.cpp which should be explained too. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/27/18 2:37 PM, Viswanathan, Sandhya wrote: >>>> Please find below a patch which cleans up K1 mask register handling for AVX 512 instructions. >>>> >>>> Currently unmasked instructions are encoded using k1 register which >>>> requires k1 register to be initialized properly and also reinitialized across JNI and Runtime calls. >>>> >>>> This patch encodes AVX 512 instructions as unmasked instruction with >>>> K0 encoding where the explicit mask register is not specified. 
>>>> >>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8211251 >>>> >>>> Webrev: http://cr.openjdk.java.net/~vdeshpande/k_register/webrev.00/ >>>> >>>> >>>> Best Regards, >>>> >>>> Sandhya >>>> From vladimir.kozlov at oracle.com Fri Sep 28 17:51:35 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 28 Sep 2018 10:51:35 -0700 Subject: RFR: 8208686: [AOT] JVMTI ResourceExhausted event repeated for same allocation In-Reply-To: <64cbe730-5c9a-a04d-9eee-a56abfbb8e07@oracle.com> References: <910A4A3C-3EEF-4167-84D5-9819C83D6FC1@oracle.com> <64cbe730-5c9a-a04d-9eee-a56abfbb8e07@oracle.com> Message-ID: <015e8416-a948-fdc5-46db-8b5d80ba52e8@oracle.com> To let you know, me and Tom R. did review these changes and agreed that it is the least intrusive changes for Hotspot shared code. Thanks, Vladimir On 9/25/18 8:11 AM, Daniel D. Daugherty wrote: > Adding serviceability-dev at ... since this is JVM/TI... > > Dan > > > On 9/25/18 10:48 AM, Doug Simon wrote: >> A major design point of Graal is to treat allocations as non-side effecting to give more freedom >> to the optimizer by reducing the number of distinct FrameStates that need to be managed. When >> failing an allocation, Graal will deoptimize to the last side effecting instruction before the >> allocation. This mean the VM code for heap allocation will potentially be executed twice, once >> from Graal compiled code and then again in the interpreter. While this is perfectly fine according >> to the JVM specification, it can cause confusing behavior for JVMTI based tools. They will receive >> 2 ResourceExhausted events for a single allocation. Furthermore, the first ResourceExhausted event >> (on the Graal allocation slow path) might denote a bytecode instruction that performs no >> allocation, making it hard to debug the memory failure. >> >> The proposed solution is to add an extra set of JVMCI VM runtime calls for allocation. 
These entry >> points will attempt the allocation and upon failure, >> skip side-effects such as posting JVMTI events or handling -XX:OnOutOfMemoryError. The compiled >> code using these entry points is expected to deoptimize on null. >> >> The path from these new entry points to where allocation can fail goes through quite a bit of VM >> code. One could modify all these paths by: >> * Returning null instead of throwing an exception on failure. >> * Adding a `bool null_on_fail` argument to all relevant methods. >> * Adding extra null checking where necessary after each call to these methods when `null_on_fail >> == true`. >> This represents a significant number of changes. >> >> Instead, the proposed solution introduces a new _in_retryable_allocation thread-local. This way, >> only the entry points and allocation routines that raise an exception need to be modified. Failure >> is communicated back to the new entry points by throwing a special pre-allocated OOME object >> (i.e., Universe::out_of_memory_error_retry()) which must not propagate back to Java code. Use of >> this object is not strictly necessary; it is introduced to highlight/document the special >> allocation mode. >> >> The proposed solution is at http://cr.openjdk.java.net/~dnsimon/8208686. >> The JBS bug is: https://bugs.openjdk.java.net/browse/JDK-8208686 >> >> -Doug > From vladimir.kozlov at oracle.com Fri Sep 28 18:24:57 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 28 Sep 2018 11:24:57 -0700 Subject: RFR(S): 8210389: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc In-Reply-To: References: <99bc7d08-614c-9a45-0fdd-2dac729122d9@oracle.com> Message-ID: subsume_by() + disconnect_inputs() are used by other code in final_graph_reshaping_impl(). If it does not work in your case it may not work for other cases too and should be solved in general.
Maybe we can modify final_graph_reshaping_walk() and final_graph_reshaping_impl() to remove dead code. Or do a separate pass over the graph, like PhaseRemoveUseless. One thing to point out is that the verify_graph_edges() call after Optimize() should have no_dead_code = true to catch all the cases we are missing. Thanks, Vladimir On 9/28/18 1:23 AM, Roland Westrelin wrote: > > Hi Vladimir, > > Thanks for looking at this. > >> Why you are not using subsume_by()? > > Instead of disconnect_inputs()? Or as a replacement for the loop? I want > all nodes that become useless as a result of the edge removal to be > disconnected. subsume_by() wouldn't do that AFAICT. > > Roland. > From stuart.marks at oracle.com Fri Sep 28 20:12:44 2018 From: stuart.marks at oracle.com (Stuart Marks) Date: Fri, 28 Sep 2018 13:12:44 -0700 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <2cdc5a22-0217-b985-9069-0fc72058f081@redhat.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com> <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com> <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com> <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com> <2cdc5a22-0217-b985-9069-0fc72058f081@redhat.com> Message-ID: <51f1741a-7854-a8fb-3d25-b4cd7fcfd7bf@oracle.com> On 9/28/18 2:16 AM, Andrew Dinn wrote: > Thanks for clarifying that point. I have already added a note to that > effect to the JEP. I take your other point that these limitations make > this JEP a less useful addition than it could be.
However, it's hard to > see what else might usefully be provided that does not involve a > reworking of JDK core-lib (and, potentially, JVM) functionality that has > a much larger scope than is needed to crack the specific nut the JEP > addresses. I'm not sure I'd put it quite that way, "less useful than it could be." I guess it depends on what you think the JEP is about. If the JEP is about MBB, and MBB is at some point superseded by something else, then yes, I suppose that means this JEP is less useful than it might be. On the other hand, suppose that this JEP is primarily about NVM, including access, operations, API, architecture, life cycle issues, etc., and these happen to be surfaced through MBB today. If something supersedes MBB, then the concepts developed by this JEP can be retargeted to that other thing at the appropriate time. Or are the concepts developed by this JEP so closely intertwined with MBB that this idea of "retargeting" doesn't make sense? I don't know. > Thank you for a very clear and interesting summary of the limitations of > the Buffer API. I have cut it from this reply for the sake of brevity > but I will respond to a few points. Great, I'm glad this helped. I'm never quite sure whether writing these big essays is helpful. (Note also that there are OTHER limitations of the buffer API that I didn't cover, since the message was getting too long as it was. Example: 2GB limit.) > I think the limitations you point out regarding concurrent clients' mode > of operation are less severe in this specific case because there is not > really a need for those client threads to reach a rendezvous point in > order to execute some form of FileChannel update. The buffer content is > persistent memory. So, essentially, the data writes constitute the update. Sure. It may be that the use cases for NVM aren't particularly affected by limitations of the Buffer APIs. If so, so much the better! 
But there are other systems where the limitations imposed by buffers are so onerous that they've had to go directly to Unsafe. > Anyway, thank you for a clear warning as to the precise perils faced in > implementing correct client libraries over the base layer this JEP proposes. Yes, this is essentially it. When you run into a problem -- as every project does -- think about whether it's inherent to NVM, or whether it's incidental to NVM and is rooted in the use of Buffers. s'marks From stuart.marks at oracle.com Fri Sep 28 20:50:44 2018 From: stuart.marks at oracle.com (Stuart Marks) Date: Fri, 28 Sep 2018 13:50:44 -0700 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com> <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com> <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com> <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com> Message-ID: On 9/28/18 12:21 AM, Peter Levart wrote: > I mostly agree with your assessment about the suitability of the ByteBuffer API > for nice multithreaded use. What would such API look like? I think pretty much > like ByteBuffer but without things that mutate mark/position/limit/ByteOrder. A > stripped-down ByteBuffer API therefore. That would be in my opinion the most > low-level API possible. If you add things to such API that coordinate > multithreaded access to the underlying memory, you are already creating a > concurrent data structure for a particular set of use cases, which might not > cover all possible use cases or be sub-optimal at some of them. 
So I think this > is better layered on top of such API not built into it. Low-level multithreaded > access to memory is, in my opinion, always going to be "unsafe" from the > standpoint of coordination. It's not only the mark/position/limit/ByteOrder that > is not multithreaded-friendly about ByteBuffer API, but the underlying memory > too. It would be nice if mark/position/limit/ByteOrder weren't in the way though. Right, getting mark/position/limit/ByteOrder out of the way would be a good first step. (I just realized that ByteOrder is mutable too!) I also think you're right that proceeding down a "classic" thread-safe object design won't be fruitful. We don't know what the right set of operations is yet, so it'll be difficult to know how to deal with thread safety. One complicating factor is timely deallocation. This is an existing problem with direct buffers and MappedByteBuffer (see JDK-4724038). If a "buffer" were confined to a single thread, it could be deallocated safely when that thread is finished. I don't know how to guarantee thread confinement though. On the other hand, if a "buffer" is exposed to multiple threads, deallocation requires that proper synchronization and checking be done so that subsequent operations are properly checked (so that they do something reasonable, like throw an exception) instead of accessing unmapped or repurposed memory. If checking is done, this pushes operations to be coarser-grained (bulk) so that the checking overhead is amortized over a more expensive operation. I know there has been some thought put into this in the Panama project, but I don't know exactly where it stands at the moment. See the MemoryRegion and Scope stuff. 
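As a small illustration of why getting position and limit out of the way helps, here is a minimal sketch in which two threads fill disjoint regions of one buffer using only absolute puts. It assumes a plain direct ByteBuffer as a stand-in for a mapped one; the class name DisjointRegions and the region size are made up for this example, and it does not address the deallocation problem at all.

```java
import java.nio.ByteBuffer;

// Hypothetical example class; not part of the JEP or of any JDK API.
class DisjointRegions {
    static final int REGION = 1024; // assumed region size in bytes

    // Fill one region using absolute puts only. Absolute put(int, byte)
    // reads the limit for bounds checking but never writes position or limit.
    static void fill(ByteBuffer buf, int region, byte value) {
        int base = region * REGION;
        for (int i = 0; i < REGION; i++) {
            buf.put(base + i, value);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ByteBuffer buf = ByteBuffer.allocateDirect(2 * REGION);

        // Each thread owns its own region; the regions do not overlap.
        Thread t0 = new Thread(() -> fill(buf, 0, (byte) 1));
        Thread t1 = new Thread(() -> fill(buf, 1, (byte) 2));
        t0.start();
        t1.start();
        t0.join(); // join() makes each writer's stores visible to this thread
        t1.join();

        // Position is still 0 because nothing used relative operations.
        System.out.println(buf.position() + " " + buf.get(0) + " " + buf.get(2 * REGION - 1));
        // prints "0 1 2"
    }
}
```

The coordination here (who owns which region, and the final join) lives entirely outside the buffer, which is the layering suggested above; a relative put() in either thread would race on the shared position.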
s'marks From stuart.marks at oracle.com Fri Sep 28 21:14:00 2018 From: stuart.marks at oracle.com (Stuart Marks) Date: Fri, 28 Sep 2018 14:14:00 -0700 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <50ed4716-b76e-6557-1146-03084776c160@redhat.com> <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <4c8ae5fb-dec1-0685-1b7c-85170d55e6cd@oracle.com> <6b29fde7-c147-a346-6bee-c410c57c498d@redhat.com> <92bfe8a0-e4d9-a988-387a-fe66b530a260@gmail.com> <65f5660a-807b-426a-7af5-d5f95a6089ba@redhat.com> Message-ID: <3fe9410d-2d20-c26c-a62e-cf9ad47b529a@oracle.com> Hi Francesco, Thanks for the pointer to AtomicBuffer stuff. It's quite interesting. I don't know how directly relevant this JEP is your work. I guess that's really up to you and possibly Andrew Dinn. However, in my thinking, if you have useful comments and relevant questions, you're certainly welcome to participate in the discussion. Looking at the AtomicBuffer interface, I see that it supports reading and writing of a variety of data items, with a few different memory access modes. That reminds me of the VarHandles API. [1] This enables quite a number of different operations on a data item somewhere in memory, with a variety of memory access modes. What would AtomicBuffer look like if it were to use VarHandles? Or would AtomicBuffer be necessary at all if the rest of the library were to use VarHandles? Note that a VarHandle can be used to access an arbitrary item within a region of memory, such as an array or a ByteBuffer.[2] An obvious extension to VarHandle is to allow a long offset, not just an int offset. Note also that while many VarHandle methods return Object and take a varargs parameter of Object..., this does not imply that primitives are boxed! 
This is a bit of VM magic called "signature polymorphism"; see JVMS 2.9.3 [3].

s'marks

[1] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/invoke/VarHandle.html

[2] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/invoke/MethodHandles.html#byteBufferViewVarHandle(java.lang.Class,java.nio.ByteOrder)

[3] https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-2.html#jvms-2.9.3

On 9/28/18 12:38 AM, Francesco Nigro wrote:
> Hi guys!
>
> I'm one of the mentioned devs (like many others) that are using external
> (and unsafe) APIs to concurrently access ByteBuffer's content and a
> developer of a messaging broker's journal that would benefit by this JEP :)
> Re concurrent access API, how it looks
> https://github.com/real-logic/agrona/blob/master/agrona/src/main/java/org/agrona/concurrent/AtomicBuffer.java?
>
> note:
> I don't know how's considered to appear in these discussions without
> presenting myself and I hope to not be OT, but both this JEP and the
> comments around are so interesting that I couldn't resist: I apologize if
> I'm not respecting some rule on it
>
> Thanks for the hard work,
> Francesco
>
> On Fri, 28 Sep 2018 at 09:21, Peter Levart wrote:
>
> Hi Stuart,
>
> I mostly agree with your assessment about the suitability of the
> ByteBuffer API for nice multithreaded use. What would such API look like?
> I think pretty much like ByteBuffer but without things that mutate
> mark/position/limit/ByteOrder. A stripped-down ByteBuffer API therefore.
> That would be in my opinion the most low-level API possible. If you add
> things to such API that coordinate multithreaded access to the underlying
> memory, you are already creating a concurrent data structure for a
> particular set of use cases, which might not cover all possible use cases
> or be sub-optimal at some of them. So I think this is better layered on
> top of such API not built into it.
> Low-level multithreaded access to memory is, in my opinion, always going
> to be "unsafe" from the standpoint of coordination. It's not only the
> mark/position/limit/ByteOrder that is not multithreaded-friendly about
> ByteBuffer API, but the underlying memory too. It would be nice if
> mark/position/limit/ByteOrder weren't in the way though.
>
> Regards, Peter
>
>
> On 09/28/2018 07:51 AM, Stuart Marks wrote:
>> Hi Andrew,
>>
>> Let me first say that this issue of "ByteBuffer might not be the right
>> answer" is something of a digression from the JEP discussion. I think the
>> JEP should proceed forward using MBB with the API that you and Alan had
>> discussed previously. At most, the discussion of the "right thing" issue
>> might affect a side note in the JEP text about possible limitations and
>> future directions of this effort. However, it's not a blocker to the JEP
>> making progress as far as I'm concerned.
>>
>> With that in mind, I'll discuss the issue of multithreaded access to
>> ByteBuffers and how this bears on whether buffers are or aren't the
>> "right answer." There are actually several issues that figure into the
>> "right answer" analysis. In this message, though, I'll just focus on the
>> issue of multithreaded access.
>>
>> To recap (possibly for the benefit of other readers) the Buffer class doc
>> has the following statement:
>>
>>     Buffers are not safe for use by multiple concurrent threads. If a
>>     buffer is to be used by more than one thread then access to the
>>     buffer should be controlled by appropriate synchronization.
>>
>> Buffers are primarily designed for sequential operations such as I/O or
>> codeset conversion. Typical buffer operations set the mark, position, and
>> limit before initiating the operation. If the operation completes
>> partially -- not uncommon with I/O or codeset conversion -- the position
>> is updated so that the operation can be resumed easily from where it left
>> off.
>>
>> The fact that buffers not only contain the data being operated upon but
>> also mutable state information such as mark/position/limit makes it
>> difficult to have multiple threads operate on different parts of the same
>> buffer. Each thread would have to lock around setting the position and
>> limit and performing the operation, preventing any parallelism. The
>> typical way to deal with this is to create multiple buffer slices, one
>> per thread. Each slice has its own mark/position/limit values but shares
>> the same backing data.
>>
>> We can avoid the need for this by adding absolute bulk operations, right?
>>
>> Let's suppose we were to add something like this (considering ByteBuffer
>> only, setting the buffer views aside):
>>
>>     get(int srcOff, byte[] dst, int dstOff, int length)
>>     put(int dstOff, byte[] src, int srcOff, int length)
>>
>> Each thread can perform its operations on a different part of the buffer,
>> in parallel, without interference from the others. Presumably these
>> operations don't read or write the mark and position. Oh, wait. The
>> existing absolute put and get overloads *do* respect the buffer's limit,
>> so the absolute bulk operations ought to as well. This means they do
>> depend on shared state. (I guess we could make the absolute bulk ops not
>> respect the limit, but that seems inconsistent.)
>>
>> OK, let's adopt an approach similar to what was described by Peter Levart
>> a couple messages upthread, where a) there is an initialization step
>> where various things including the limit are set properly; b) the buffer
>> is published to the worker threads properly, e.g., using a lock or other
>> suitable memory operation; and c) all worker threads agree only to use
>> absolute operations and to avoid relative operations.
>>
>> Now suppose the threads have completed their work and you want to, say,
>> write the buffer's contents to a channel.
>> You have to carefully make sure the threads are all finished and properly
>> publish their results back to some central thread, have that central
>> thread receive the results, set the position and limit, after which the
>> central thread can initiate the I/O operation.
>>
>> This can certainly be made to work.
>>
>> But note what we just did. We now have an API where:
>>
>>  - there are different "phases", where in one phase all the methods work,
>>    but in another phase only certain methods work (otherwise it breaks
>>    silently);
>>
>>  - you have to carefully control all the code to ensure that the wrong
>>    methods aren't called when the buffer is in the wrong phase (otherwise
>>    it breaks silently); and
>>
>>  - you can't hand off the buffer to a library (3rd party or JDK) without
>>    carefully orchestrating a transition into the right phase (otherwise
>>    it breaks silently).
>>
>> Frankly, this is pretty crappy. It's certainly possible to work around
>> it. People do, and it is painful, and they complain about it up and down
>> all day long (and rightfully so).
>>
>> Note that this discussion is based primarily on looking at the ByteBuffer
>> API. I have not done extensive investigation of the impact of the various
>> buffer views (IntBuffer, LongBuffer, etc.), nor have I looked thoroughly
>> at the implementations. I have no doubt that we will run into additional
>> issues when we do those investigations.
>>
>> If we were designing an API to support multi-threaded access to memory
>> regions, it would almost certainly look nothing like the buffer API. This
>> is what Alan means by "buffers might not be the right answer." As things
>> stand, it appears quite difficult to me to fix the multi-threaded access
>> problem without turning buffers into something they aren't, or
>> fragmenting the API in some complex and uncomfortable way.
>>
>> Finally, note that this is not an argument against adding bulk absolute
>> operations!
>> I think we should probably go ahead and do that anyway. But let's not
>> fool ourselves into thinking that bulk absolute operations solve the
>> multi-threaded buffer access problem.
>>
>> s'marks
>>
>

From vladimir.kozlov at oracle.com  Fri Sep 28 23:22:08 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 28 Sep 2018 16:22:08 -0700
Subject: RFR(M): 8210887: Tweak C2 gc api for arraycopy
In-Reply-To:
References: <0d622902-6f1e-e501-4b21-6bea60c93105@oracle.com>
 <3524b9d8-f387-a3c4-49c3-eace83126bb7@oracle.com>
 <27315645-ff46-0e05-a22d-89c166f5bd8b@oracle.com>
 <39b8c3c2-d412-06fa-9162-1d80f067ed11@oracle.com>
Message-ID:

This version passed clean!

Thanks,
Vladimir

On 9/28/18 2:06 AM, Roland Westrelin wrote:
>
>> gc/epsilon/TestManyThreads.java test failed on SPARC
>> I add information and replay file to bug report.
>
> Thanks for the test result. The fix is:
>
> diff --git a/src/hotspot/share/opto/arraycopynode.cpp b/src/hotspot/share/opto/arraycopynode.cpp
> --- a/src/hotspot/share/opto/arraycopynode.cpp
> +++ b/src/hotspot/share/opto/arraycopynode.cpp
> @@ -422,7 +422,8 @@
>      Node *start_mem_dest = mm->memory_at(alias_idx_dest);
>      Node* mem = start_mem_dest;
>
> -    assert(copy_type != T_OBJECT, "only tightly coupled allocations for object arrays");
> +    BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
> +    assert(copy_type != T_OBJECT || !bs->array_copy_requires_gc_barriers(false, T_OBJECT, false, BarrierSetC2::Optimization), "only tightly coupled allocations for object arrays");
>      bool same_alias = (alias_idx_src == alias_idx_dest);
>
>      if (count > 0) {
>
>
> New webrev:
>
> http://cr.openjdk.java.net/~roland/8210887/webrev.02/
>
> Roland.
>

From Alan.Bateman at oracle.com  Sun Sep 30 15:31:03 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Sun, 30 Sep 2018 16:31:03 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To:
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com>
 <8e1fe763-c631-1060-e714-9f9b9481cfe2@redhat.com>
 <50ed4716-b76e-6557-1146-03084776c160@redhat.com>
 <7ab7fb00-92aa-b952-0c63-9b1ab2479def@oracle.com>
 <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com>
 <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com>
 <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com>
Message-ID: <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com>

On 26/09/2018 14:27, Andrew Dinn wrote:
> :
> I'm not clear why we should only use one flag. The two flags I specified
> reflect two independent use cases, one where data stored in an NVM
> device is accessed read-only and another where it is accessed
> read-write. Are you suggesting that the read-only case is redundant? I'm
> not sure I agree. For example, a utility which might want to review the
> state of persistent data while a service is off-line would really want
> to pass flag READ_ONLY_PERSISTENT. Of course, it could employ
> READ_WRITE_PERSISTENT (or equivalently, SYNC) and just not write the
> data but, mutatis mutandis, that same argument would remove the case for
> flag READ_ONLY.
>
I'm wrong on this point. The map takes a single MapMode, not a set of modes
as I was assuming, so you are right that it needs two new modes, not one. I
do think we should re-visit the name though as the native flag is MAP_SYNC.

-Alan
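
For readers following along, the point about map taking a single MapMode can
be seen in a small sketch using only the existing standard modes. The class
MapModeDemo and the helper mapWith are hypothetical names; the persistent
modes under discussion are left out because their names are still open in
this thread. Since the mode is one value rather than a set of flags, each
combination of access and behaviour needs its own constant.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MapModeDemo {
    // Maps the first `size` bytes of `file` using exactly one MapMode.
    static MappedByteBuffer mapWith(Path file, MapMode mode, long size)
            throws IOException {
        // READ_ONLY needs a readable channel; READ_WRITE and PRIVATE
        // need a channel opened for both reading and writing.
        try (FileChannel ch = (mode == MapMode.READ_ONLY)
                ? FileChannel.open(file, StandardOpenOption.READ)
                : FileChannel.open(file, StandardOpenOption.READ,
                                   StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(mode, 0, size);
        }
    }
}
```

Calling mapWith(path, MapMode.READ_ONLY, n) yields a buffer that rejects
writes, while MapMode.READ_WRITE yields a writable one; there is no way to
pass both a base mode and an extra flag in the same call.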