From dean.long at oracle.com  Fri Nov  1 08:05:27 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 1 Nov 2019 01:05:27 -0700
Subject: RFR: 8231955: ARM32: Address displacement is 0 for volatile field
 access because of Unsafe field access.
In-Reply-To: <2vyvfdgdqk-1@aserp2050.oracle.com>
References: <20191010143426.BA4B6319F46@aojmv0009>
 <20191015073212.7FCCA319074@aojmv0009>
 <f40cbf84-aef3-f235-4861-403ce30dc03d@oracle.com>
 <587f6363-bbdc-da12-9e50-82acc5bc5853@oracle.com>
 <2vyvfdgdqk-1@aserp2050.oracle.com>
Message-ID: <eb4ac687-c009-6692-ef66-4bceb26a7417@oracle.com>


On 10/31/19 2:12 AM, christoph.goettschkes at microdoc.com wrote:
>> I see now that BarrierSetC1::resolve_address() is calling
>> generate_address(), at least when access isn't patched.  So now I'm
>> thinking that the address passed to
>> volatile_field_load/volatile_field_store should be correct, and the call
>> to add_large_constant() isn't necessary.
> Yes, this is correct. The LIR_Address is created by
> LIRGenerator::generate_address and has a displacement of 0.
> I attached a backtrace of the failing assert at the end of this mail.
>
> Do you think the patch makes sense and can be pushed?
> The HotSpot tier1 JTreg tests are passing with this and other patches I am
> working on applied with a debug VM.

Yes, it looks fine now that I am reminded that arm32 is using 
ldmia/stmia in volatile_move_op, which means the displacement must be 0.
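
To make the constraint concrete: ARM's ldmia/stmia instructions take only a base register and no immediate offset, so an address handed to volatile_move_op cannot carry a non-zero displacement. The Java sketch below is a toy model of that rule, not HotSpot code. `Addr`, `LdmStmModel`, `effectiveAddress` and `foldDisplacement` are hypothetical names, and folding the displacement into the base only approximates what a helper such as add_large_constant would do with real registers.

```java
// Toy model (hypothetical names): an address is a base value plus a constant displacement.
final class Addr {
    final long base;
    final int disp;

    Addr(long base, int disp) {
        this.base = base;
        this.disp = disp;
    }
}

final class LdmStmModel {
    // ldmia/stmia encode only a base register, so any non-zero displacement is rejected.
    static long effectiveAddress(Addr a) {
        if (a.disp != 0) {
            throw new IllegalArgumentException("ldmia/stmia need displacement 0, got " + a.disp);
        }
        return a.base;
    }

    // A non-zero displacement would first have to be added into a temporary base,
    // giving an equivalent address whose displacement is 0.
    static Addr foldDisplacement(Addr a) {
        return new Addr(a.base + a.disp, 0);
    }
}
```

In the case discussed here the address already arrives with displacement 0 from LIRGenerator::generate_address, which is why the extra add_large_constant call is unnecessary.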

dl

> -- Christoph
>
> #0  0x7636b860 in LIRGenerator::add_large_constant
>      (this=0x641ae2f0, src=0xe500b, c=0, dest=0xe900b)
>      at src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp:166
> #1  0x7636f266 in LIRGenerator::volatile_field_load
>      (this=0x641ae2f0, address=0x6429c970, result=0xdd093, info=0x0)
>      at src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp:1326
> #2  0x762d9806 in BarrierSetC1::load_at_resolved
>      (this=0x7602b1f0, access=..., result=0xdd093)
>      at src/hotspot/share/gc/shared/c1/barrierSetC1.cpp:183
> #3  0x762d929a in BarrierSetC1::load_at
>      (this=0x7602b1f0, access=..., result=0xdd093)
>      at src/hotspot/share/gc/shared/c1/barrierSetC1.cpp:94
> #4  0x7635f6cc in LIRGenerator::access_load_at
>      (this=0x641ae2f0, decorators=9127331840, type=T_LONG, base=...,
>       offset=0xd900b, result=0xdd093, patch_info=0x0, load_emit_info=0x0)
>      at src/hotspot/share/c1/c1_LIRGenerator.cpp:1618
> #5  0x7636133e in LIRGenerator::do_UnsafeGetObject
>      (this=0x641ae2f0, x=0x6429a0d0)
>      at src/hotspot/share/c1/c1_LIRGenerator.cpp:2173
> #6  0x76328bdc in UnsafeGetObject::visit
>      (this=0x6429a0d0, v=0x641ae2f0)
>      at src/hotspot/share/c1/c1_Instruction.hpp:2407
> #7  0x7635b2d2 in LIRGenerator::do_root
>      (this=0x641ae2f0, instr=0x6429a0d0)
>      at src/hotspot/share/c1/c1_LIRGenerator.cpp:373
> #8  0x7635b1f2 in LIRGenerator::block_do
>      (this=0x641ae2f0, block=0x64299788)
>      at src/hotspot/share/c1/c1_LIRGenerator.cpp:354
> #9  0x76337d5a in BlockList::iterate_forward
>      (this=0x6429bf00, closure=0x641ae2f4)
>      at src/hotspot/share/c1/c1_Instruction.cpp:921
> #10 0x76332936 in IR::iterate_linear_scan_order
>      (this=0x642994d0, closure=0x641ae2f4)
>      at src/hotspot/share/c1/c1_IR.cpp:1221
> #11 0x7630ed10 in Compilation::emit_lir
>      (this=0x641ae5c0)
>      at src/hotspot/share/c1/c1_Compilation.cpp:259
> #12 0x7630f2be in Compilation::compile_java_method
>      (this=0x641ae5c0)
>      at src/hotspot/share/c1/c1_Compilation.cpp:398
> #13 0x7630f566 in Compilation::compile_method
>      (this=0x641ae5c0)
>      at src/hotspot/share/c1/c1_Compilation.cpp:460
> #14 0x7630fabc in Compilation::Compilation
>      (this=0x641ae5c0, compiler=0x760eb610, env=0x641ae848,
>       method=0x63d2edc8, osr_bci=-1, buffer_blob=0x73eb7448,
>       directive=0x760cf858)
>      at src/hotspot/share/c1/c1_Compilation.cpp:583
> #15 0x76312d6e in Compiler::compile_method
>      (this=0x760eb610, env=0x641ae848, method=0x63d2edc8, entry_bci=-1,
>       directive=0x760cf858)
>      at src/hotspot/share/c1/c1_Compiler.cpp:247
> #16 0x76453704 in CompileBroker::invoke_compiler_on_method
>      (task=0x642cfa50)
>      at src/hotspot/share/compiler/compileBroker.cpp:2115
> #17 0x764529ba in CompileBroker::compiler_thread_loop
>      ()
>      at src/hotspot/share/compiler/compileBroker.cpp:1800
> #18 0x7693548c in compiler_thread_entry
>      (thread=0x6423b400, __the_thread__=0x6423b400)
>      at src/hotspot/share/runtime/thread.cpp:3401
> #19 0x769315d4 in JavaThread::thread_main_inner
>      (this=0x6423b400)
>      at src/hotspot/share/runtime/thread.cpp:1917
> #20 0x769314ac in JavaThread::run
>      (this=0x6423b400)
>      at src/hotspot/share/runtime/thread.cpp:1900
> #21 0x7692e884 in Thread::call_run
>      (this=0x6423b400)
>      at src/hotspot/share/runtime/thread.cpp:398
> #22 0x768285ce in thread_native_entry
>      (thread=0x6423b400)
>      at src/hotspot/os/linux/os_linux.cpp:790
> #23 0x76f84568 in start_thread() from target:/usr/lib/libpthread.so.0
> #24 0x76ef8ac8 in ?? () from target:/usr/lib/libc.so.6
>


From per.liden at oracle.com  Fri Nov  1 11:50:07 2019
From: per.liden at oracle.com (Per Liden)
Date: Fri, 1 Nov 2019 12:50:07 +0100
Subject: RFR(M): 8232896: ZGC: Enable C2 clone intrinsic
In-Reply-To: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com>
References: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com>
Message-ID: <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>

Hi Nils,

On 10/31/19 5:37 PM, Nils Eliasson wrote:
> Hi,
> 
> This patch fixes and enables the clone intrinsic for C2 with ZGC.
> 
> The main thing added is an implementation of the 
> ZBarrierSetC2::clone_at_expansion method. This method handles clone 
> expansion for regular objects and primitive arrays that haven't already 
> been reduced by optimizations. (Oop array clones don't go through this 
> path.)
> 
> The code switches on the type of the source to either make a leaf call 
> to a runtime clone for regular objects, or delegate to 
> BarrierSetC2::clone_at_expansion for primitive arrays.
> 
> The updated micro benchmark shows great gains, especially for small objects 
> that will now be reduced to inlined move-store sequences.
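
The dispatch described above can be pictured with a small Java model. It is only a sketch: the enum, class and method names are hypothetical, it is not the ZBarrierSetC2 code, and the comments give our reading of why each kind of source takes its path (instance fields may hold oops that ZGC's barriers must handle, while primitive arrays hold none).

```java
// Hypothetical classification of the clone source, as seen at expansion time.
enum SrcKind { INSTANCE, PRIMITIVE_ARRAY, OOP_ARRAY }

// Hypothetical expansion strategies.
enum CloneStrategy { RUNTIME_LEAF_CALL, GENERIC_EXPANSION, NOT_HANDLED_HERE }

final class CloneExpansionModel {
    static CloneStrategy choose(SrcKind kind) {
        switch (kind) {
            case INSTANCE:
                // May contain oops, so the copy is delegated to a runtime function.
                return CloneStrategy.RUNTIME_LEAF_CALL;
            case PRIMITIVE_ARRAY:
                // No oops inside, so the shared BarrierSetC2 expansion is sufficient.
                return CloneStrategy.GENERIC_EXPANSION;
            default:
                // Oop array clones do not reach this expansion path.
                return CloneStrategy.NOT_HANDLED_HERE;
        }
    }
}
```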

Sweet! The speedup of micro:Clone is impressive!

> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232896
> 
> Webrev: http://cr.openjdk.java.net/~neliasso/8232896/webrev.02/

My review comments summarized in a patch on top of yours. Let me know if 
there's something you don't agree with. The changes are:
* Changed the type of the runtime function's size arguments from 
TypeInt::INT to TypeLong::LONG, to make it 64 bits, rather than 32.
* Changed the new runtime function to call HeapAccess::clone() rather 
than ZBarrier::clone_oop(), since ZBarrier is the backend for the access 
layer, not the frontend.
* Renamed clone_oop() to clone(), as we're cloning an object rather than 
an oop.
* Added const where applicable.
* Adjusted an include line.
* Moved the clone_at_expansion() function up in the file.
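
Why the width of the size argument matters can be shown with plain arithmetic. The Java sketch below assumes, for illustration, that the runtime function's size parameter is a 64-bit type (such as size_t on a 64-bit VM); if such a value is described to C2 as a 32-bit int, only its low 32 bits are guaranteed to arrive intact. `SizeArgDemo` and `throughInt32` are hypothetical names.

```java
final class SizeArgDemo {
    // Models passing a 64-bit size through a 32-bit argument slot: the high bits are lost.
    static long throughInt32(long size) {
        // Narrowing keeps the low 32 bits, then the int is sign-extended back to long.
        return (int) size;
    }
}
```

For example, a size of exactly 2^32 comes out as 0, and 3,000,000,000 comes out as -1,294,967,296.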

http://cr.openjdk.java.net/~pliden/8232896/webrev.pliden_review.0

I ran micro:Clone to verify my changes.

cheers,
Per

> 
> 
> Please review,
> 
> Nils Eliasson
> 

From jorn.vernee at oracle.com  Fri Nov  1 15:09:51 2019
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Fri, 1 Nov 2019 16:09:51 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
Message-ID: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>

Hi,

I'd like to add PrintIdeal as a compiler directive in order to enable 
PrintIdeal for only a single method when combining it with the 'match' 
directive.

Please review the following:

Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
(Testing = tier1, manual)

As a heads-up: I'm not a committer on the jdk project, so if this sounds 
like a good idea, I would need a sponsor to push the changes.

Thanks,
Jorn


From Nikola.Grcevski at microsoft.com  Fri Nov  1 18:26:57 2019
From: Nikola.Grcevski at microsoft.com (Nikola Grcevski)
Date: Fri, 1 Nov 2019 18:26:57 +0000
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID: <MW2PR2101MB10180C983D791BA740C8705CF5620@MW2PR2101MB1018.namprd21.prod.outlook.com>

Hi Jorn and hotspot-compiler-dev!

I'm a VM engineer at Microsoft, and we recently signed the OCA so we can properly contribute to OpenJDK. I was also recently in need of this option, as I have been educating myself on the Ideal Graph and how the C2 optimizations work. I initially added a very similar change to my local build to be able to log only certain methods in larger applications; however, I later discovered that a similar effect can be achieved by using the following combination of options:

1. On the main java command line you would need to add: -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=0

2. With the compile command (or through the .hotspot_compiler file) you can then use IGVPrintLevel to increase the level to greater than 0 on the methods you'd like, e.g.:
option path.class_name::method_name,intx,IGVPrintLevel,2

The idea is to enable the Ideal Graph printing globally, set the global level to 0, and then control which methods would get logged by setting their level through the compiler directive.

This is not as good as having a direct option properly documented as suggested by Jorn, but it works.

Thank you, and please let me know if you think this is a valid approach and if I should add this comment to the bug report.
Nikola



From dean.long at oracle.com  Fri Nov  1 22:10:24 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 1 Nov 2019 15:10:24 -0700
Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns
 wrong result if Klass* is aligned to 32bit
In-Reply-To: <CAA-vtUz6P7C3=xkfwBYG7MoGFY37r2XS87Am89Pu9GMEsNSWTw@mail.gmail.com>
References: <CAA-vtUzdW000k20ZUEH03PbCA5Zt2S6MUGupt_F842fL36Uwvg@mail.gmail.com>
 <VI1PR0201MB247991750418A9641C45209F9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <CAA-vtUz6P7C3=xkfwBYG7MoGFY37r2XS87Am89Pu9GMEsNSWTw@mail.gmail.com>
Message-ID: <274c3691-8e12-3425-0ee9-94da295e203f@oracle.com>

The shared changes look good to me, and I don't see any obvious problems 
with the cpu changes.

dl

On 10/31/19 12:01 AM, Thomas Stüfe wrote:
> Hi Martin,
>
> thanks for the review!
>
> New version:
> http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/
>
> pls find remarks inline.
>
> On Wed, Oct 30, 2019 at 1:09 PM Doerr, Martin <martin.doerr at sap.com> wrote:
>
>> Hi Thomas,
>>
>>
>>
>> thank you for finding and fixing this issue.
>>
>>
>>
>> Shared code part and regression test look good to me.
>>
>> But I have a few requests and questions related to the platform code.
>>
>>
>>
>> Arm32:
>>
>> Breaks build. The local variable p needs a scope. Can be fixed by:
>>
>> -        case T_METADATA:
>>
>> +        case T_METADATA: {
>>
>>             // We only need, for now, comparison with NULL for metadata.
>>
>> ...
>>
>>             break;
>>
>> +        }
>>
>>
> Ouch. Fixed. Actually, I discarded my code completely and took over the
> code from T_OBJECT to stay in line with the rest of the code here.
>
>
>>
>> S390:
>>
>> Using z_cgfi is correct, but there are equivalent instructions with
>> shorter opcode.
>>
>> For comparing to 0, z_ltgr(reg1, reg1) or z_cghi(reg1, 0) may be
>> preferred. But that's not a big deal.
>>
>>
>>
> Okay, I switched to cghi. ltgr sounds cool, but it would be difficult to
> integrate into the shared part, since that does a move first and then a
> compare.
>
>
>> I wonder why you have added includes to some platform files. Isn't that
>> redundant?
>>
>> "utilities/debug.hpp" comes via shared assembler.hpp.
>>
> Did this because I added asserts and the rule is that every file should
> include what it needs and not rely on other includes including it (save for
> umbrella includes like globalDefinitions.hpp). But okay, I removed the
> added includes again to keep the patch small.
>
>
>>
>> I'd probably choose Unimplemented() instead of ShouldNotReachHere() for
>> non-null cases because it's not bad in general, it's just currently not
>> used and therefore not yet implemented.
>>
>> But you can keep that as it is. I'm ok with that, too.
>>
>>
> I'd rather keep it, to stay in line with the rest of the code (see e.g. the
> default: branches).
>
>
>>
>> Best regards,
>>
>> Martin
>>
>>
>>
> Thanks Martin!
>
> ..Thomas
>
>
>>
>> *From:* Thomas Stüfe <thomas.stuefe at gmail.com>
>> *Sent:* Mittwoch, 30. Oktober 2019 11:48
>> *To:* hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> *Cc:* Doerr, Martin <martin.doerr at sap.com>; Schmidt, Lutz <
>> lutz.schmidt at sap.com>
>> *Subject:* RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns
>> wrong result if Klass* is aligned to 32bit
>>
>>
>>
>> Hi all,
>>
>>
>>
>> second attempt at a fix (please find first review thread here:
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-October/035608.html
>> )
>>
>>
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8233019
>>
>> webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.00/webrev/
>>
>>
>>
>> In short, the C1 intrinsic for jlC::isPrimitive compares the Klass*
>> pointer of the class with NULL to find out if it's NULL and hence a primitive type.
>> That compare is done using a 32-bit cmp, so it gives wrong results when the
>> Klass* pointer is aligned to 32 bits (i.e. its low 32 bits are all zero).
>>
>>
>>
>> In the generator I changed the comparison constant type from intConst(0)
>> to metadataConst(0) and implemented the missing code paths for all CPUs.
>> Since on most architectures we do not seem to have a comparison with a
>> 64bit immediate (at least I could not find one) I kept the change simple
>> and only implemented comparison with NULL for now.
>>
>>
>>
>> I tested the fix in our nightlies (jtreg tier1, jck and others) as well as
>> manually testing it.
>>
>>
>>
>> I did not test on aarch64 and arm though, and would be thankful if someone
>> knowledgeable about these platforms could take a look.
>>
>>
>>
>> Thanks to Martin and Lutz for eyeballing the ppc and s390 parts.
>>
>>
>>
>> Thanks, Thomas
>>
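
The bug described in this thread can be reproduced in miniature. Here "aligned to 32bit" means the Klass* value is a multiple of 2^32, so its low 32 bits are all zero. A 32-bit compare against 0 then reports NULL (and so isPrimitive() answers true) even though the pointer is non-NULL; comparing the full 64-bit value, which is what switching the constant to metadataConst(0) is meant to achieve, gives the right answer. This Java model uses a `long` to stand in for the pointer; the class and method names are hypothetical.

```java
final class IsPrimitiveModel {
    // Buggy variant: only the low 32 bits of the Klass* value are compared with 0.
    static boolean isPrimitive32(long klassPtr) {
        return (int) klassPtr == 0;
    }

    // Fixed variant: the full 64-bit value is compared with NULL (0).
    static boolean isPrimitive64(long klassPtr) {
        return klassPtr == 0L;
    }
}
```

With a pointer value of 0x0000_0008_0000_0000, the 32-bit variant wrongly reports a primitive type while the 64-bit variant does not.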


From jorn.vernee at oracle.com  Fri Nov  1 23:03:16 2019
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Sat, 2 Nov 2019 00:03:16 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <MW2PR2101MB10180C983D791BA740C8705CF5620@MW2PR2101MB1018.namprd21.prod.outlook.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
 <MW2PR2101MB10180C983D791BA740C8705CF5620@MW2PR2101MB1018.namprd21.prod.outlook.com>
Message-ID: <6e40a291-2628-9787-8b1b-92d6950cec27@oracle.com>

Hi Nikola,

Thanks for the suggestion, and welcome to OpenJDK!

The PrintIdeal option is slightly different from PrintIdealGraph + 
IGVPrintLevel. The latter outputs the graph in XML format (or directly 
to the visualizer tool), while the former calls Node::dump on the root 
node of the graph, which outputs a plain text representation of the 
graph to the console instead. IGVPrintLevel is already supported as a 
compiler directive, but I'd like to add PrintIdeal as well, since that 
doesn't require using the visualizer tool :)

In case you were unaware of the feature: I'm using the compiler 
directives JSON file support that was added in Java 9 by JEP 165 
(https://openjdk.java.net/jeps/165). This allows me to use 
"-XX:CompilerDirectivesFile=compile.txt" and then have a compile.txt 
file with something like:

```
[
    {
        match: "main.Main::invoke",
        c2: {
            inline: "-main.Main::invoke",
            Log: true,
            PrintAssembly: true,
            PrintInlining: true,
            PrintIdeal: true
        }
    }
]
```

This is a more structured way of defining the compile commands I need.
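
To try such a directives file, a small program with a hot method is enough. The Java class below is a hypothetical target: since it has no package, the match pattern would be "DirectiveTarget::invoke" rather than "main.Main::invoke" (and the inline entry would change accordingly). It could then be run with something like `java -XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=compile.txt DirectiveTarget`, assuming a VM build in which the listed print options are available; PrintIdeal as a directive only exists once this patch is in.

```java
// Hypothetical target class for a directive matching "DirectiveTarget::invoke".
final class DirectiveTarget {
    // Small method; called often enough below that C2 is likely to compile it.
    static long invoke(long x) {
        return x * 31 + (x >>> 3);
    }

    public static void main(String[] args) {
        long acc = 0;
        // Many iterations so that invoke() becomes hot under tiered compilation.
        for (int i = 0; i < 1_000_000; i++) {
            acc += invoke(i);
        }
        // Use the result so the loop is not dead code.
        System.out.println(acc);
    }
}
```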

Cheers,
Jorn

On 01/11/2019 19:26, Nikola Grcevski wrote:
> Hi Jorn and hotspot-compiler-dev!
>
> I'm a VM engineer at Microsoft, and we recently signed the OCA so we can properly contribute to OpenJDK. I was also recently in need of this option, as I have been educating myself on the Ideal Graph and how the C2 optimizations work. I initially added a very similar change to my local build to be able to log only certain methods in larger applications; however, I later discovered that a similar effect can be achieved by using the following combination of options:
>
> 1. On the main java command line you would need to add: -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=0
>
> 2. With the compile command (or through the .hotspot_compiler file) you can then use IGVPrintLevel to increase the level to greater than 0 on the methods you'd like, e.g.:
> option path.class_name::method_name,intx,IGVPrintLevel,2
>
> The idea is to enable the Ideal Graph printing globally, set the global level to 0, and then control which methods would get logged by setting their level through the compiler directive.
>
> This is not as good as having a direct option properly documented as suggested by Jorn, but it works.
>
> Thank you, and please let me know if you think this is a valid approach and if I should add this comment to the bug report.
> Nikola
>

From Nikola.Grcevski at microsoft.com  Fri Nov  1 23:39:14 2019
From: Nikola.Grcevski at microsoft.com (Nikola Grcevski)
Date: Fri, 1 Nov 2019 23:39:14 +0000
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <6e40a291-2628-9787-8b1b-92d6950cec27@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
 <MW2PR2101MB10180C983D791BA740C8705CF5620@MW2PR2101MB1018.namprd21.prod.outlook.com>
 <6e40a291-2628-9787-8b1b-92d6950cec27@oracle.com>
Message-ID: <BL0PR2101MB1009A40AFDF3D9F8F38C42D5F5620@BL0PR2101MB1009.namprd21.prod.outlook.com>

Hi Jorn,

This is great, thanks so much for the explanation. I wasn't aware of the new directives format; I'll check it out.

Cheers,
Nikola


From thomas.stuefe at gmail.com  Sat Nov  2 05:28:25 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Sat, 2 Nov 2019 06:28:25 +0100
Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns
 wrong result if Klass* is aligned to 32bit
In-Reply-To: <274c3691-8e12-3425-0ee9-94da295e203f@oracle.com>
References: <CAA-vtUzdW000k20ZUEH03PbCA5Zt2S6MUGupt_F842fL36Uwvg@mail.gmail.com>
 <VI1PR0201MB247991750418A9641C45209F9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <CAA-vtUz6P7C3=xkfwBYG7MoGFY37r2XS87Am89Pu9GMEsNSWTw@mail.gmail.com>
 <274c3691-8e12-3425-0ee9-94da295e203f@oracle.com>
Message-ID: <CAA-vtUzoEezVbkd61EEKGpc2RiQyz+k19+ZgThXtNjkUsWVkHA@mail.gmail.com>

Thank you, Dean.

On Fri 1. Nov 2019 at 23:10, <dean.long at oracle.com> wrote:

> The shared changes look good to me, and I don't see any obvious problems
> with the cpu changes.
>
> dl
>
> On 10/31/19 12:01 AM, Thomas St?fe wrote:
> > Hi Martin,
> >
> > thanks for the review!
> >
> > New version:
> >
> http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/
> >
> > pls find remarks inline.
> >
> > On Wed, Oct 30, 2019 at 1:09 PM Doerr, Martin <martin.doerr at sap.com>
> wrote:
> >
> >> Hi Thomas,
> >>
> >>
> >>
> >> thank you for finding and fixing this issue.
> >>
> >>
> >>
> >> Shared code part and regression test look good to me.
> >>
> >> But I have a few requests and questions related to the platform code.
> >>
> >>
> >>
> >> Arm32:
> >>
> >> Breaks build. The local variable p needs a scope. Can be fixed by:
> >>
> >> -        case T_METADATA:
> >>
> >> +        case T_METADATA: {
> >>
> >>             // We only need, for now, comparison with NULL for metadata.
> >>
> >> ?
> >>
> >>             break;
> >>
> >> +        }
> >>
> >>
> > Ouch. Fixed. Actually, I discarded my coding completely and took over the
> > code from T_OBJECT to keep in line with the rest of the coding here.
> >
> >
> >>
> >> S390:
> >>
> >> Using z_cgfi is correct, but there are equivalent instructions with
> >> shorter opcode.
> >>
> >> For comparing to 0, z_ltgr(reg1, reg1) or z_cghi(reg1, 0) may be
> >> preferred. But that?s not a big deal.
> >>
> >>
> >>
> > Okay I switched to cghi. ltgr sounds cool but would be difficult to
> > integrate into the shared part since that one does first a move, then a
> > compare.
> >
> >
> >> I wonder why you have added includes to some platform files. Isn?t that
> >> redundant?
> >>
> >> "utilities/debug.hpp" comes via shared assembler.hpp.
> >>
> > Did this because I added asserts and the rule is that every file should
> > include what it needs and not rely on other includes including it (save
> for
> > umbrella includes like globalDefinitions.hpp). But okay, I removed the
> > added includes again to keep the patch small.
> >
> >
> >>
> >> I?d probably choose Unimplemented() instead of ShouldNotReachHere() for
> >> non-null cases because it?s not bad in general, it?s just currently not
> >> used and therefore not yet implemented.
> >>
> >> But you can keep that as it is. I?m ok with that, too.
> >>
> >>
> > I rather keep it to keep in line with the rest of the code (see e.g. the
> > default: branches).
> >
> >
> >>
> >> Best regards,
> >>
> >> Martin
> >>
> >>
> >>
> > Thanks Martin!
> >
> > ..Thomas
> >
> >
> >>
> >> *From:* Thomas St?fe <thomas.stuefe at gmail.com>
> >> *Sent:* Mittwoch, 30. Oktober 2019 11:48
> >> *To:* hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> >> *Cc:* Doerr, Martin <martin.doerr at sap.com>; Schmidt, Lutz <
> >> lutz.schmidt at sap.com>
> >> *Subject:* RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns
> >> wrong result if Klass* is aligned to 32bit
> >>
> >>
> >>
> >> Hi all,
> >>
> >>
> >>
> >> second attempt at a fix (please find first review thread here:
> >>
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-October/035608.html
> >> )
> >>
> >>
> >>
> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8233019
> >>
> >> webrev:
> >>
> http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.00/webrev/
> >>
> >>
> >>
> >> In short, C1 intrinsic for jlC::isPrimitive does a compare with the
> Klass*
> >> pointer for the class to find out if its NULL and hence a primitive
> type.
> >> That compare is done using 32bit cmp and so it gives wrong results when
> the
> >> Klass* pointer is aligned to 32bit.
> >>
> >>
> >>
> >> In the generator I changed the comparison constant type from intConst(0)
> >> to metadataConst(0) and implemented the missing code paths for all CPUs.
> >> Since on most architectures we do not seem to have a comparison with a
> >> 64-bit immediate (at least I could not find one), I kept the change simple
> >> and only implemented comparison with NULL for now.
> >>
> >>
> >>
> >> I tested the fix in our nightlies (jtreg tier1, jck and others) as well as
> >> manually.
> >>
> >>
> >>
> >> I did not test on aarch64 and arm, though, and would be thankful if someone
> >> knowledgeable about these platforms could take a look.
> >>
> >>
> >>
> >> Thanks to Martin and Lutz for eyeballing the ppc and s390 parts.
> >>
> >>
> >>
> >> Thanks, Thomas
> >>
>
>
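A minimal Java sketch of the failure mode described above (not HotSpot code): the Klass* is modeled as a 64-bit `long`, a 32-bit `cmp` against zero is modeled by testing only the low 32 bits, and the pointer values are made up for illustration. The 64-bit variant stands for the pointer-width compare that comparing against metadataConst(0) is meant to produce.

```java
public class Klass32BitCompare {
    // Models a 32-bit cmp against 0: only the low 32 bits of the pointer are tested.
    static boolean isNullWith32BitCompare(long klassPtr) {
        return (int) klassPtr == 0;
    }

    // Models a full pointer-width compare against NULL.
    static boolean isNullWith64BitCompare(long klassPtr) {
        return klassPtr == 0L;
    }

    public static void main(String[] args) {
        long nullKlass = 0L;                        // primitive mirror: Klass* is NULL
        long alignedKlass = 0x0000_0008_0000_0000L; // hypothetical Klass* with low 32 bits all zero
        long otherKlass = 0x0000_0008_0000_1000L;   // hypothetical Klass* with some low bits set

        System.out.println(isNullWith32BitCompare(nullKlass));    // true
        System.out.println(isNullWith32BitCompare(alignedKlass)); // true (wrong: class is not primitive)
        System.out.println(isNullWith32BitCompare(otherKlass));   // false
        System.out.println(isNullWith64BitCompare(alignedKlass)); // false (correct)
    }
}
```

Only a pointer whose low 32 bits happen to be zero is misclassified, which is why the bug shows up only for particular Klass* placements.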

From dean.long at oracle.com  Sat Nov  2 07:35:46 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Sat, 2 Nov 2019 00:35:46 -0700
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB247908A9BE69E75AAFE3B3639A920@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <3f80fc0b-2388-d4be-3c84-4af516e9635f@oracle.com>
 <VI1PR0201MB2479A25B5B2E26D79ADDE7BF9A690@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>

Hi Martin,

On 10/30/19 3:18 AM, Doerr, Martin wrote:
> Hi David,
>
>> I don't think factoring out CompileBroker::clear_compiler2_object when
>> it is only used once was warranted, but that's a call for the compiler team
>> to make.
> I did that because _compiler2_objects is private and there's currently no setter available.
> But let's see what the compiler folks think.

how about changing can_remove() to CompileBroker::can_remove()? Then you 
can access _compiler2_objects directly, right?

dl
>> Otherwise changes seem fine and I have noted the use of the
>> MutexUnlocker as per your direct email.
> Thanks a lot for reviewing. It was not a trivial one.
>
> You had noticed an incorrect usage of the CHECK macro. I've created a new bug for that:
> https://bugs.openjdk.java.net/browse/JDK-8233193
> It would be great if you could check whether that's what you meant and make adjustments if needed.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Wednesday, 30 October 2019 05:47
>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>> <kim.barrett at oracle.com>
>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
>> David Holmes <David.Holmes at oracle.com>
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> Hi Martin,
>>
>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
>>> Hi David and Kim,
>>>
>>> I think it's easier to talk about code. So here's a new webrev:
>>>
>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
>>
>> I don't think factoring out CompileBroker::clear_compiler2_object when
>> it is only used once was warranted, but that's a call for the compiler team
>> to make. Otherwise the changes seem fine and I have noted the use of the
>> MutexUnlocker as per your direct email.
>>
>> Thanks,
>> David
>> -----
>>
>>> @Kim:
>>> Thanks for looking at the handle-related parts. It's ok if you don't want to
>>> be a reviewer of the whole change.
>>>> I think it's weird that can_remove() is a predicate with optional side
>>>> effects.  I think it would be simpler to have it be a pure predicate,
>>>> and have the one caller with do_it = true perform the updates.  That
>>>> should include NULLing out the handle pointer (perhaps debug-only, but
>>>> it doesn't cost much to cleanly maintain the data structure).
>>> Nevertheless, it has the advantage that it enforces a consistent update:
>>> otherwise a caller could use it without holding the lock or mess it up in
>>> some other way.
>>> In addition, I don't want to change that as part of this fix.
>>>
>>>> So far as I can tell, THREAD == NULL here.
>>> This is a very tricky part (not my invention):
>>> EXCEPTION_MARK contains an ExceptionMark constructor call which sets
>>> __the_thread__ to Thread::current().
>>> I don't want to publish my opinion about this.
>>>
>>> @David:
>>> It seems this option is preferred over option 3 (the
>>> possibly_add_compiler_threads part of webrev.02, leaving the
>>> initialization as is).
>>> So if you're ok with it, I'll request a 2nd review from the compiler folks
>>> (I should change the subject to contain RFR).
>>> Thanks,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Monday, 28 October 2019 05:04
>>>> To: Kim Barrett <kim.barrett at oracle.com>
>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
>>>> hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>
>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes <david.holmes at oracle.com> wrote:
>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
>>>>>>> Hi Kim,
>>>>>>> I didn't like using the OopStorage stuff directly, either. I just hadn't
>>>>>>> seen how to allocate a global handle and add the oop later.
>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of that.
>>>>>>> So I can imagine 3 ways to implement it:
>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just added
>>>>>>> that to
>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
>>>>>>> We may want to improve that further by setting the handle pointer to
>>>>>>> NULL and asserting that it is NULL before adding the new one.
>>>>>>> I had been concerned about NULLs in the array, but it looks like the
>>>>>>> existing code can deal with that.
>>>>>> I think it would be cleaner to both destroy the global handle and NULL it
>>>>>> in the array at the same time.
>>>>>> This comment
>>>>>>
>>>>>> 325         // Old j.l.Thread object can die here.
>>>>>>
>>>>>> isn't quite accurate. The j.l.Thread is still referenced via ct->threadObj(),
>>>>>> so it can't "die" until that is also cleared during the actual termination process.
>>>>> I think if there is such a thread here that it can't die, because the
>>>>> death predicate (the can_remove stuff) won't see that old thread as
>>>>> the last thread in _compiler2_objects.  That's what I meant by this:
>>>>>
>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>>>>> I also think that here:
>>>>>>
>>>>>> 947         jobject thread_handle = JNIHandles::make_global(thread_oop);
>>>>>> 948         _compiler2_objects[i] = thread_handle;
>>>>>>
>>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a valid
>>>>>> assertion then I think there are other problems.
>>>>> I think either that comment about an old thread is wrong (and the NULL
>>>>> assertion I suggested is okay), or I think the whole mechanism here
>>>>> has problems.  Or at least I was unable to figure out how it could work...
>>>>>
>>>> I'm not following, sorry. You can't assert NULL unless it's actually set
>>>> to NULL, which it presently isn't. But it could be set to NULL as Martin
>>>> suggested:
>>>>
>>>> "We may want to improve that further by setting the handle pointer to
>>>> NULL and asserting that it is NULL before adding the new one."
>>>>
>>>> and which I also supported. But that aside, once destroy_global has
>>>> been called, that JNIHandle no longer references the j.l.Thread that it
>>>> did, at which point the thread is only reachable via the threadObj() of
>>>> the CompilerThread.
>>>>
>>>> David
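The invariant discussed in this thread (destroy the global handle and clear its array slot together, and require the slot to be empty before a new handle is stored) can be sketched as a toy Java model. `HandleSlots`, `release()` and `install()` are hypothetical names, a `null` entry stands in for a NULL `jobject`, and an explicit check replaces HotSpot's `assert`.

```java
import java.util.Objects;

/** Toy model of a fixed-size table of handles such as _compiler2_objects. */
public class HandleSlots {
    private final Object[] slots;

    public HandleSlots(int size) {
        slots = new Object[size];
    }

    /** Destroys the handle and clears its slot in one step, so the two cannot get out of sync. */
    public void release(int i) {
        // In HotSpot this would pair JNIHandles::destroy_global() with the NULL store;
        // in this model dropping the reference plays both roles.
        slots[i] = null;
    }

    /** Installs a new handle; the slot must have been released first. */
    public void install(int i, Object handle) {
        Objects.requireNonNull(handle, "handle");
        if (slots[i] != null) {
            throw new IllegalStateException("slot " + i + " still holds a live handle");
        }
        slots[i] = handle;
    }

    public boolean isEmpty(int i) {
        return slots[i] == null;
    }
}
```

Keeping both steps in one method is the design point: a caller cannot destroy a handle and forget to clear the slot, so the emptiness check in `install()` remains meaningful.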


From fujie at loongson.cn  Sat Nov  2 09:29:01 2019
From: fujie at loongson.cn (Jie Fu)
Date: Sat, 2 Nov 2019 17:29:01 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
Message-ID: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>

Hi all,

May I get reviews for this small fix?

JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/

Thanks a lot.
Best regards,
Jie


From bsrbnd at gmail.com  Sat Nov  2 17:18:29 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Sat, 2 Nov 2019 18:18:29 +0100
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting
 long vector bits
Message-ID: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>

Hi,

I experimented, some time ago, with an optimization of several common
flag patterns (see also JBS) using BTR/BTS instead of AND/OR
instructions on an x86_64 Xeon:

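package org.openjdk.bench.vm.compiler;

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)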
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class BitSetAndReset {
    private static final int COUNT = 10_000;

    private static final long MASK63 = 0x8000_0000_0000_0000L;
    private static final long MASK31 = 0x0000_0000_8000_0000L;
    private static final long MASK15 = 0x0000_0000_0000_8000L;
    private static final long MASK00 = 0x0000_0000_0000_0001L;

    private long andq, orq;
    private boolean success = true;

    @TearDown(Level.Iteration)
    public void finish() {
        if (!success)
            throw new AssertionError("Failure while setting or clearing long vector bits!");
    }

    @Benchmark
    public void bitSet(Blackhole bh) {
        for (int i=0; i<COUNT; i++) {
            andq = MASK63 | MASK31 | MASK15 | MASK00;
            orq = 0;
            bh.consume(test63());
            bh.consume(test31());
            bh.consume(test15());
            bh.consume(test00());
            success &= andq == 0 && orq == (MASK63 | MASK31 | MASK15 | MASK00);
        }
    }

    private long test63() {
        andq &= ~MASK63;
        orq |= MASK63;
        return 0L;
    }
    private long test31() {
        andq &= ~MASK31;
        orq |= MASK31;
        return 0L;
    }
    private long test15() {
        andq &= ~MASK15;
        orq |= MASK15;
        return 0L;
    }
    private long test00() {
        andq &= ~MASK00;
        orq |= MASK00;
        return 0L;
    }
}

Running the benchmark this way:

$ make test TEST="micro:vm.compiler.BitSetAndReset" \
    MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/BitSetAndReset.*test*';FORK=3;WARMUP_ITER=1;ITER=3"

Before the patch, we had:

03e       movq    R10, #9223372036854775807    # long
048       andq    [RSI + #16 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
04c       movq    R10, #-9223372036854775808    # long
056       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
05a       ...

=> 28 bytes

03c       xorl    RAX, RAX    # long
03e       movq    R10, #-2147483649    # long
048       andq    [RSI + #16 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
04c       movl    R10, #2147483648    # long (unsigned 32-bit)
052       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
056       ...

=> 26 bytes

03c       andq    [RSI + #16 (8-bit)], #-32769    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
044       orq     [RSI + #24 (8-bit)], #32768    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
04c       ...

03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
041       orq     [RSI + #24 (8-bit)], #1    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
046       ...

Benchmark              Mode  Cnt      Score      Error  Units
BitSetAndReset.bitSet  avgt    9  78083.773 ± 2182.692  ns/op

And after the patch, we would have:

03c       btrq    [RSI + #16 (8-bit)], log2(not(#9223372036854775807))    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
042       btsq    [RSI + #24 (8-bit)], log2(#-9223372036854775808)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
048       ...

=> 12 bytes

03c       btrq    [RSI + #16 (8-bit)], log2(not(#-2147483649))    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
042       xorl    RAX, RAX    # long
044       movl    R10, #2147483648    # long (unsigned 32-bit)
04a       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
04e       ...

=> 18 bytes

03c       andq    [RSI + #16 (8-bit)], #-32769    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
044       orq     [RSI + #24 (8-bit)], #32768    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
04c       ...

03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
041       orq     [RSI + #24 (8-bit)], #1    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
046       ...

Benchmark              Mode  Cnt      Score     Error  Units
BitSetAndReset.bitSet  avgt    9  77355.154 ± 252.503  ns/op

We see a tiny performance gain with BTR/BTS (under 1%, within the error
margin of the baseline run), but the main benefit remains the much better
encoding, saving up to 16 bytes for pure 64-bit immediates, along with
lower register consumption.
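To make the byte counts concrete, here is a Java sketch that assembles the `btrq`/`btsq m64, imm8` form the patch emits: a REX.W prefix, the `0F BA` opcode, a ModRM byte whose reg field is /6 for BTR or /5 for BTS, an 8-bit displacement, and the bit index as imm8. It only handles a base register among RAX..RDI other than RSP with a disp8, which covers the `[RSI + #16]` / `[RSI + #24]` operands above; it is an illustration, not the C2 encoder. For comparison, the listing before the patch shows `movq R10, imm64` taking 10 bytes and each `andq`/`orq` with R10 taking 4, hence 28 bytes.

```java
public class BitTestEncoder {
    static final int RSI = 6; // x86 register number of RSI

    /**
     * Encodes "bt{r,s}q [base + disp8], bit".
     * regField is the ModRM reg extension: 6 for BTR, 5 for BTS (opcode 0F BA /6 or /5).
     */
    static byte[] encode(int regField, int base, int disp8, int bit) {
        if (regField < 4 || regField > 7) {
            throw new IllegalArgumentException("0F BA uses /4../7 (BT, BTS, BTR, BTC)");
        }
        if (base < 0 || base > 7 || base == 4) {
            throw new IllegalArgumentException("sketch supports RAX..RDI except RSP only");
        }
        if (disp8 < -128 || disp8 > 127 || bit < 0 || bit > 63) {
            throw new IllegalArgumentException("disp8 or bit index out of range");
        }
        int modrm = (0b01 << 6) | (regField << 3) | base; // mod=01: [base + disp8]
        return new byte[] {
            0x48,              // REX.W: 64-bit operand size
            0x0F, (byte) 0xBA, // opcode for BT/BTS/BTR/BTC r/m64, imm8
            (byte) modrm,
            (byte) disp8,
            (byte) bit         // imm8: the bit index, i.e. log2 of the mask
        };
    }

    public static void main(String[] args) {
        byte[] btr = encode(6, RSI, 16, 63); // btrq [RSI + 16], 63
        byte[] bts = encode(5, RSI, 24, 63); // btsq [RSI + 24], 63
        System.out.println(btr.length + bts.length); // 12
    }
}
```

Two 6-byte instructions give the 12 bytes counted for the bit-63 case after the patch.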

Does the patch below look reasonable enough to eventually rebase it, push
it to jdk/submit and post an RFR soon if all goes well?

Thanks,
Bernard

diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
--- a/src/hotspot/cpu/x86/x86_64.ad
+++ b/src/hotspot/cpu/x86/x86_64.ad
@@ -2069,6 +2069,16 @@
     }
   %}

+  enc_class Log2L(immPow2L imm)
+  %{
+    emit_d8(cbuf, log2_long($imm$$constant));
+  %}
+
+  enc_class Log2NotL(immPow2NotL imm)
+  %{
+    emit_d8(cbuf, log2_long(~$imm$$constant));
+  %}
+
   enc_class opc2_reg(rRegI dst)
   %{
     // BSWAP
@@ -3131,6 +3141,28 @@
   interface(CONST_INTER);
 %}

+operand immPow2L()
+%{
+  // n should be a pure 64-bit power of 2 immediate.
+  predicate(is_power_of_2_long(n->get_long()) && log2_long(n->get_long()) > 31);
+  match(ConL);
+
+  op_cost(15);
+  format %{ %}
+  interface(CONST_INTER);
+%}
+
+operand immPow2NotL()
+%{
+  // n should be a pure 64-bit immediate given that not(n) is a power of 2.
+  predicate(is_power_of_2_long(~n->get_long()) && log2_long(~n->get_long()) > 30);
+  match(ConL);
+
+  op_cost(15);
+  format %{ %}
+  interface(CONST_INTER);
+%}
+
 // Long Immediate zero
 operand immL0()
 %{
@@ -9740,6 +9772,19 @@
   ins_pipe(ialu_mem_imm);
 %}

+instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr)
+%{
+  match(Set dst (StoreL dst (AndL (LoadL dst) src)));
+  effect(KILL cr);
+
+  ins_cost(125);
+  format %{ "btrq    $dst, log2(not($src))\t# long" %}
+  opcode(0x0F, 0xBA, 0x06);
+  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
+             RM_opc_mem(tertiary, dst), Log2NotL(src));
+  ins_pipe(ialu_mem_imm);
+%}
+
 // BMI1 instructions
 instruct andnL_rReg_rReg_mem(rRegL dst, rRegL src1, memory src2, immL_M1 minus_1, rFlagsReg cr) %{
   match(Set dst (AndL (XorL src1 minus_1) (LoadL src2)));
@@ -9933,6 +9978,19 @@
   ins_pipe(ialu_mem_imm);
 %}

+instruct btsL_mem_imm(memory dst, immPow2L src, rFlagsReg cr)
+%{
+  match(Set dst (StoreL dst (OrL (LoadL dst) src)));
+  effect(KILL cr);
+
+  ins_cost(125);
+  format %{ "btsq    $dst, log2($src)\t# long" %}
+  opcode(0x0F, 0xBA, 0x05);
+  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
+             RM_opc_mem(tertiary, dst), Log2L(src));
+  ins_pipe(ialu_mem_imm);
+%}
+
 // Xor Instructions
 // Xor Register with Register
 instruct xorL_rReg(rRegL dst, rRegL src, rFlagsReg cr)
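The two operand predicates can be mirrored in Java to see which of the benchmark's masks they accept. `isPowerOf2` and `log2` use `Long.bitCount` and `Long.numberOfTrailingZeros` as stand-ins for HotSpot's `is_power_of_2_long` and `log2_long`; this is a sketch of the predicate logic only, not of the matcher.

```java
public class BitMaskPredicates {
    static boolean isPowerOf2(long x) {
        return Long.bitCount(x) == 1; // exactly one bit set (includes 1L << 63)
    }

    static int log2(long powerOf2) {
        return Long.numberOfTrailingZeros(powerOf2);
    }

    // Mirrors immPow2L: the OR mask has a single set bit at position 32..63.
    static boolean matchesImmPow2L(long n) {
        return isPowerOf2(n) && log2(n) > 31;
    }

    // Mirrors immPow2NotL: ~n (the bit cleared by the AND) is a single bit at position 31..63.
    static boolean matchesImmPow2NotL(long n) {
        return isPowerOf2(~n) && log2(~n) > 30;
    }

    public static void main(String[] args) {
        long[] masks = {1L << 63, 1L << 31, 1L << 15, 1L};
        for (long m : masks) {
            // m is the orq mask, ~m is the andq mask, as in BitSetAndReset.
            System.out.printf("bit %2d: btsq=%b btrq=%b%n",
                    log2(m), matchesImmPow2L(m), matchesImmPow2NotL(~m));
        }
    }
}
```

This reproduces the listing above: bit 63 gets both `btsq` and `btrq`, bit 31 only `btrq` (its OR still uses `movl` plus `orq`), and bits 15 and 0 keep the `andq`/`orq` forms with 32-bit immediates.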

From tobias.hartmann at oracle.com  Mon Nov  4 07:25:20 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Nov 2019 08:25:20 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <878spbc0c8.fsf@redhat.com>
References: <878spbc0c8.fsf@redhat.com>
Message-ID: <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>

Hi Roland,

this seems reasonable to me but I'm concerned that it might cause performance regressions. I'll run
some tests in our system.

Best regards,
Tobias

On 23.10.19 10:50, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8232539/webrev.00/
> 
> I couldn't come up with a test case because node processing order during
> IGVN matters. The bug was reported against 11, but I see no reason it
> wouldn't apply to the current code as well.
> 
> At parse time, predicates are added by
> Parse::maybe_add_predicate_after_if() but no loop is actually
> created. Compile::_major_progress is cleared. On the next round of IGVN,
> one input of a region points to predicates. The same region has an if as
> use that can be split through phi during IGVN. The predicates are going
> to be removed by IGVN. But that happens in multiple steps because there
> are several predicates (for reason Deoptimization::Reason_predicate,
> Deoptimization::Reason_loop_limit_check etc.) and because for each
> predicate one IGVN iteration must first remove the Opaque1 node, then
> another must kill the IfFalse projection, and finally another must replace
> the IfTrue projection by the If control input.
> 
> Split if occurs while predicates are in the process of being removed. It
> sees predicates, tries to walk over them, encounters a predicate that's
> been half removed (false projection removed) and we hit the assert/crash.
> 
> I propose we simply not apply IGVN split if if we're splitting through a
> loop or if there's a predicate input to a region because:
> 
> - Making split if robust to dying predicates is not straightforward as
>   far as I can tell
> 
> - Loop opts split if doesn't split through loop header so why would it
>   make sense for IGVN split if?
> 
> - I'm wondering if there are other cases where handling of predicates in
>   split if could be wrong (and so more trouble ahead):
> 
>   + What if we split through a Loop region, predicates were added by
>   loop optimizations, loop opts are now over so the predicates added at
>   parse time were removed: then PhaseIdealLoop::find_predicate()
>   wouldn't report a predicate but cloning predicates would still be
>   required for correctness?
> 
>   + What if we have no loop, a region has predicates as input,
>   predicates are going to die but have not yet been processed, split if
>   uselessly duplicates predicates but one of them is control dependent
>   on the branch it is in so cloning predicates actually causes a broken
>   graph?
> 
> So overall it feels safer to me to simply bail out from split if for
> loops/predicates.
> 
> Roland.
> 

From tobias.hartmann at oracle.com  Mon Nov  4 07:33:06 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Nov 2019 08:33:06 +0100
Subject: RFR(S): 8233081: C1: PatchingStub for field access copies too much
In-Reply-To: <VI1PR0201MB24798B4FFDCAEE55104A74279A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24798B4FFDCAEE55104A74279A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <70bf20d2-a1f4-2e02-7c7a-62f986eaf7c0@oracle.com>

Hi Martin,

nice cleanup, x86/Sparc looks good to me.

Best regards,
Tobias

On 30.10.19 16:38, Doerr, Martin wrote:
> Hi,
> 
> I'd like to fix an issue in C1's PatchingStub implementation for "access_field_id".
> 
> We had noticed that the code in the template exceeded the 255-byte limit when VerifyOops
> was enabled on PPC64.
> 
> I'd like to improve the situation for all platforms.
> 
> More detailed bug description:
> 
> https://bugs.openjdk.java.net/browse/JDK-8233081
> 
> I need a function to determine how many bytes are needed for the NativeMovRegMem.
> 
> x86 has next_instruction_address(), which could in theory be used, but I noticed that it's dead code
> which is no longer correct.
> 
> Is it ok to remove it?
> 
> I'd also like to remove the constant instruction_size from NativeMovRegMem because it's not constant.
> 
> I'd prefer to introduce num_bytes_to_end_of_patch() for the purpose of determining how many bytes to
> copy for the "access_field_id" PatchingStub.
> 
> We can factor out the offset computation from offset() and set_offset() and reuse it. This enforces
> consistency.
> 
> Webrev:
> 
> http://cr.openjdk.java.net/~mdoerr/8233081_C1_access_field_patching/webrev.00/
> 
> Please review.
> 
> Best regards,
> 
> Martin
> 
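The refactoring idea (one helper that locates the patched displacement, shared by offset(), set_offset() and num_bytes_to_end_of_patch()) can be illustrated with a toy Java model. Everything here is hypothetical except the three method roles named in the mail: the instruction layout (optional REX-like prefix bytes 0x40..0x4F, one opcode byte, one ModRM byte, then a little-endian 32-bit displacement) is a deliberate simplification, and real NativeMovRegMem decoding handles many more forms.

```java
/** Toy instruction: [prefix bytes 0x40..0x4F]* opcode modrm disp32 (little-endian). */
public class ToyMovRegMem {
    private final byte[] code;

    public ToyMovRegMem(byte[] code) {
        this.code = code;
    }

    // Single place that knows where the displacement lives; offset(), setOffset()
    // and numBytesToEndOfPatch() all derive from it, so they cannot disagree.
    private int dispPosition() {
        int pos = 0;
        while (pos < code.length && (code[pos] & 0xF0) == 0x40) {
            pos++; // skip REX-like prefix bytes
        }
        return pos + 2; // skip opcode byte and ModRM byte
    }

    public int offset() {
        int p = dispPosition();
        return (code[p] & 0xFF) | (code[p + 1] & 0xFF) << 8
             | (code[p + 2] & 0xFF) << 16 | (code[p + 3] & 0xFF) << 24;
    }

    public void setOffset(int value) {
        int p = dispPosition();
        for (int k = 0; k < 4; k++) {
            code[p + k] = (byte) (value >>> (8 * k));
        }
    }

    // Bytes from the start of the instruction through the end of the patched displacement.
    public int numBytesToEndOfPatch() {
        return dispPosition() + 4;
    }
}
```

With a prefix byte the patch region is 7 bytes, without it 6, which is why a fixed instruction_size constant does not fit such an instruction.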

From tobias.hartmann at oracle.com  Mon Nov  4 07:38:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Nov 2019 08:38:41 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID: <fd672407-cf37-d07b-e61b-16bfeba5e0d4@oracle.com>

Hi Jorn,

looks good to me. I can sponsor the change once you got a second review.

Best regards,
Tobias

On 01.11.19 16:09, Jorn Vernee wrote:
> Hi,
> 
> I'd like to add PrintIdeal as a compiler directive in order to enable PrintIdeal for only a single
> method when combining it with the 'match' directive.
> 
> Please review the following:
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
> (Testing = tier1, manual)
> 
> As a heads-up: I'm not a committer on the JDK project, so if this sounds like a good idea, I will
> need a sponsor to push the changes.
> 
> Thanks,
> Jorn
> 
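For reference, a compiler directives file using the proposed option might look like the sketch below. It assumes the directive is exposed under the same name as the flag, `PrintIdeal`; the method pattern is only an example. Such a file is passed with `-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=<file>`, and since `PrintIdeal` was not a product flag at the time, the output may only appear with a debug build.

```
[
  {
    "match": "java/lang/String.indexOf",
    "c2": {
      "PrintIdeal": true
    }
  }
]
```

The benefit over the global `-XX:+PrintIdeal` is that only methods matching the pattern print their ideal graph.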

From rwestrel at redhat.com  Mon Nov  4 08:16:30 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 04 Nov 2019 09:16:30 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID: <877e4g83a9.fsf@redhat.com>


> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/

Looks good to me.

Roland.

From nils.eliasson at oracle.com  Mon Nov  4 09:16:22 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 4 Nov 2019 10:16:22 +0100
Subject: RFR(M): 8232896: ZGC: Enable C2 clone intrinsic
In-Reply-To: <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>
References: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com>
 <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>
Message-ID: <aa0e8fe1-e3af-6ebd-3189-975b8d59f4be@oracle.com>


On 2019-11-01 12:50, Per Liden wrote:
> Hi Nils,
>
> On 10/31/19 5:37 PM, Nils Eliasson wrote:
>> Hi,
>>
>> This patch fixes and enables the clone intrinsic for C2 with ZGC.
>>
>> The main thing added is an implementation of the 
>> ZBarrierSetC2::clone_at_expansion method. This method handles clone 
>> expansion for regular objects and primitive arrays that haven't 
>> already been reduced by optimizations. (Oop array clones don't go 
>> through this path.)
>>
>> The code switches on the type of the source to either make a leaf 
>> call to a runtime clone for regular objects, or delegate to 
>> BarrierSetC2::clone_at_expansion for primitive arrays.
>>
>> The updated micro benchmark shows great gains, especially for small 
>> objects, which will now be reduced to inlined move-store sequences.
>
> Sweet! The speedup of micro:Clone is impressive!
>
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232896
>>
>> Webrev: http://cr.openjdk.java.net/~neliasso/8232896/webrev.02/
>
> My review comments summarized in a patch on top of yours. Let me know 
> if there's something you don't agree with. The changes are:
> * Changed the type of the runtime function's size arguments from 
> TypeInt::INT to TypeLong::LONG, to make it 64 bits, rather than 32.
> * Changed the new runtime function to call HeapAccess::clone() rather 
> than ZBarrier::clone_oop(), since ZBarrier is the backend for the 
> access layer, not the frontend.
> * Renamed clone_oop() to clone(), as we're cloning an object rather 
> than an oop.

Overall, I like your suggestions.

One problem with using "clone" is that it suggests that it will create a 
new object too (like java.lang.Object.clone does). But this function 
only copies the contents. I am open to suggestions, and even keeping the 
clone name, but feel a slight bit of unease.

Thanks!

/Nils

> * Added const where applicable.
> * Adjusted an include line.
> * Moved the clone_at_expansion() function up in the file.
>
> http://cr.openjdk.java.net/~pliden/8232896/webrev.pliden_review.0
>
> I ran micro:Clone to verify my changes.
>
> cheers,
> Per
>
>>
>>
>> Please review,
>>
>> Nils Eliasson
>>
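The dispatch described in Nils' patch can be summarized as a Java sketch. `SourceKind`, `expansionPath` and the returned strings are hypothetical stand-ins for the C2 type checks and the real expansion code, and the rationale in the comments is an interpretation rather than a quote from the patch.

```java
public class CloneExpansionSketch {
    enum SourceKind { INSTANCE, PRIMITIVE_ARRAY, OOP_ARRAY }

    /** Returns which path a not-yet-reduced clone of the given source kind takes at expansion. */
    static String expansionPath(SourceKind kind) {
        switch (kind) {
            case INSTANCE:
                // Regular object: fields may hold oops, so presumably the copy goes through
                // the runtime (HeapAccess::clone() in Per's version), which applies barriers.
                return "leaf call to runtime clone";
            case PRIMITIVE_ARRAY:
                // No oops inside, so the generic BarrierSetC2 expansion is used.
                return "BarrierSetC2::clone_at_expansion";
            default:
                // Oop array clones do not go through this path.
                throw new IllegalArgumentException("oop array clones are not expanded here");
        }
    }
}
```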

From shade at redhat.com  Mon Nov  4 08:53:47 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 4 Nov 2019 09:53:47 +0100
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
Message-ID: <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>

On 11/2/19 10:29 AM, Jie Fu wrote:
> Hi all,
> 
> May I get reviews for this small fix?
> 
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/

Looks fine to me.

The alternative is to stub out CompilationModeFlag::*() definitions under the TIERED define, but that
would be more awkward than effectively using the "default" mode for minimal and zero VMs.

-- 
Thanks,
-Aleksey


From tobias.hartmann at oracle.com  Mon Nov  4 08:55:50 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Nov 2019 09:55:50 +0100
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
 <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
Message-ID: <20c3a0ff-3b96-3fc8-f175-73eda72885cb@oracle.com>

+1

Best regards,
Tobias

On 04.11.19 09:53, Aleksey Shipilev wrote:
> On 11/2/19 10:29 AM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for this small fix?
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
> 
> Looks fine to me.
> 
> The alternative is to stub out CompilationModeFlag::*() definitions under the TIERED define, but that
> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
> 

From per.liden at oracle.com  Mon Nov  4 09:26:14 2019
From: per.liden at oracle.com (Per Liden)
Date: Mon, 4 Nov 2019 10:26:14 +0100
Subject: RFR(M): 8232896: ZGC: Enable C2 clone intrinsic
In-Reply-To: <aa0e8fe1-e3af-6ebd-3189-975b8d59f4be@oracle.com>
References: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com>
 <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>
 <aa0e8fe1-e3af-6ebd-3189-975b8d59f4be@oracle.com>
Message-ID: <fa4bad8d-b1d6-f504-48a2-b2a9d6466471@oracle.com>

Hi Nils,

On 11/4/19 10:16 AM, Nils Eliasson wrote:
> 
> On 2019-11-01 12:50, Per Liden wrote:
>> Hi Nils,
>>
>> On 10/31/19 5:37 PM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> This patch fixes and enables the clone intrinsic for C2 with ZGC.
>>>
>>> The main thing added is an implementation of the 
>>> ZBarrierSetC2::clone_at_expansion method. This method handles clone 
>>> expansion for regular objects and primitive arrays that haven't 
>>> already been reduced by optimizations. (Oop array clones don't go 
>>> through this path.)
>>>
>>> The code switches on the type of the source to either make a leaf 
>>> call to a runtime clone for regular objects, or delegate to 
>>> BarrierSetC2::clone_at_expansion for primitive arrays.
>>>
>>> The updated micro benchmark shows great gains, especially for small 
>>> objects, which will now be reduced to inlined move-store sequences.
>>
>> Sweet! The speedup of micro:Clone is impressive!
>>
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232896
>>>
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8232896/webrev.02/
>>
>> My review comments summarized in a patch on top of yours. Let me know 
>> if there's something you don't agree with. The changes are:
>> * Changed the type of the runtime function's size arguments from 
>> TypeInt::INT to TypeLong::LONG, to make it 64 bits, rather than 32.
>> * Changed the new runtime function to call HeapAccess::clone() rather 
>> than ZBarrier::clone_oop(), since ZBarrier is the backend for the 
>> access layer, not the frontend.
>> * Renamed clone_oop() to clone(), as we're cloning an object rather 
>> than an oop.
> 
> Overall, I like your suggestions.
> 
> One problem with using "clone" is that it suggests that it will create a 
> new object too (like java.lang.Object.clone does). But this function 
> only copies the contents. I am open to suggestions, and even keeping the 
> clone name, but feel a slight bit of unease.

I see your points. However, I see two reasons to keep that name:
1) In the Access API it's called Access::clone(src, dst, size), and 
since ZBarrierRuntime::clone() is a bridge to that function it might be 
good to use the same name.
2) The clone() function has a dst argument, implying that dst already 
exists.

Objections?

cheers,
Per

> 
> Thanks!
> 
> /Nils
> 
>> * Added const where applicable.
>> * Adjusted an include line.
>> * Moved the clone_at_expansion() function up in the file.
>>
>> http://cr.openjdk.java.net/~pliden/8232896/webrev.pliden_review.0
>>
>> I ran micro:Clone to verify my changes.
>>
>> cheers,
>> Per
>>
>>>
>>>
>>> Please review,
>>>
>>> Nils Eliasson
>>>

From fujie at loongson.cn  Mon Nov  4 09:33:46 2019
From: fujie at loongson.cn (Jie Fu)
Date: Mon, 4 Nov 2019 17:33:46 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
 <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
Message-ID: <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>

Hi Aleksey and Tobias,

Thanks for your review and valuable comments.

I should mention that Igor has been teaching me, off the list, how to fix
the bug over the past few days.

What do you think of this version?
  http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/

I prefer webrev.01.

Thanks a lot.
Best regards,
Jie


On 2019/11/4 4:53 PM, Aleksey Shipilev wrote:
> On 11/2/19 10:29 AM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for this small fix?
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
> Looks fine to me.
>
> The alternative is to stub out CompilationModeFlag::*() definitions under the TIERED define, but that
> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>


From nils.eliasson at oracle.com  Mon Nov  4 09:52:15 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 4 Nov 2019 10:52:15 +0100
Subject: RFR(M): 8232896: ZGC: Enable C2 clone intrinsic
In-Reply-To: <fa4bad8d-b1d6-f504-48a2-b2a9d6466471@oracle.com>
References: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com>
 <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>
 <aa0e8fe1-e3af-6ebd-3189-975b8d59f4be@oracle.com>
 <fa4bad8d-b1d6-f504-48a2-b2a9d6466471@oracle.com>
Message-ID: <9f1fe58a-17b9-899a-ffd9-f743e8cb1551@oracle.com>


On 2019-11-04 10:26, Per Liden wrote:
> Hi Nils,
>
> On 11/4/19 10:16 AM, Nils Eliasson wrote:
>>
>> On 2019-11-01 12:50, Per Liden wrote:
>>> Hi Nils,
>>>
>>> On 10/31/19 5:37 PM, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> This patch fixes and enables the clone intrinsic for C2 with ZGC.
>>>>
>>>> The main thing added is an implementation of the 
>>>> ZBarrierSetC2::clone_at_expansion method. This method handles clone 
>>>> expansion for regular objects and primitive arrays that haven't 
>>>> already been reduced by optimizations. (Oop array clones don't go 
>>>> through this path.)
>>>>
>>>> The code switches on the type of the source to either make a leaf 
>>>> call to a runtime clone for regular objects, or delegate to 
>>>> BarrierSetC2::clone_at_expansion for primitive arrays.
>>>>
>>>> Updated micro benchmark shows great gains, especially for small 
>>>> objects that now will be reduced to inlined move-store sequences.
>>>
>>> Sweet! The speedup of micro:Clone is impressive!
>>>
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232896
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8232896/webrev.02/
>>>
>>> My review comments summarized in a patch on top of yours. Let me 
>>> know if there's something you don't agree with. The changes are:
>>> * Changed the type of the runtime function's size arguments from 
>>> TypeInt::INT to TypeLong::LONG, to make it 64 bits, rather than 32.
>>> * Changed the new runtime function to call HeapAccess::clone() 
>>> rather than ZBarrier::clone_oop(), since ZBarrier is the backend for 
>>> the access layer, not the frontend.
>>> * Renamed clone_oop() to clone(), as we're cloning an object rather 
>>> than an oop.
>>
>> Overall, I like your suggestions.
>>
>> One problem with using "clone" is that it suggests that it will create 
>> a new Object too (like how java.lang.Object.clone does). But this 
>> function only copies the contents. I am open to suggestions, and even 
>> keeping the clone name, but feel a slight bit of unease.
>
> I see your points. However, I see two reasons to keep that name:
> 1) In the Access API it's called Access::clone(src, dst, size), and 
> since ZBarrierRuntime::clone() is a bridge to that function it might 
> be good to use the same name.
> 2) The clone() function has a dst argument, implying that dst already 
> exists.
>
> Objections?
>
> cheers,
> Per

No,

Let's go with this.

Thanks!

// Nils

>
>>
>> Thanks!
>>
>> /Nils
>>
>>> * Added const where applicable.
>>> * Adjusted an include line.
>>> * Moved the clone_at_expansion() function up in the file.
>>>
>>> http://cr.openjdk.java.net/~pliden/8232896/webrev.pliden_review.0
>>>
>>> I ran micro:Clone to verify my changes.
>>>
>>> cheers,
>>> Per
>>>
>>>>
>>>>
>>>> Please review,
>>>>
>>>> Nils Eliasson
>>>>

From adinn at redhat.com  Mon Nov  4 10:08:33 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 4 Nov 2019 10:08:33 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
Message-ID: <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>

Hi Lutz,

I'll summarize my thoughts here rather than answer point by point.

The patch successfully addresses the worst case performance but it seems
to me extremely unlikely that we will see anything that approaches that
case in real applications. So, that doesn't argue for pushing the patch.

The patch does not seem to make a significant difference to the stress
test. This test is also not necessarily 'representative' of real cases
but it is much more likely to be so than the worst case test. That
suggests to me that the current patch is perhaps not worth pursuing (it
ain't really broke so ...).  Especially so given that it is not possible
to distinguish any benefit when running the Spec benchmark apps. One
could argue that the patch looks like it will do no harm and may do good
in pathological cases but that's not really a good enough reason to make
a change. We really need evidence that this is worth doing.

The free list 'search bottleneck' certainly looks like a more promising
problem to tackle than the 'merge problem'. However, once again this
'problem' may just be an artefact of running this specific test rather
than anything that might happen in real life.

I think the only way to find out for sure whether the current patch or a
patch that addresses the 'search bottleneck' is going to be beneficial
is to instrument the JVM to record traces for code-cache use from real
apps and then replay allocations/frees based on those traces to see what
difference a patch makes and how much this might help the overall
execution time.

regards,


Andrew Dinn
-----------

On 31/10/2019 16:55, Schmidt, Lutz wrote:
> Hi Andrew, (and hi to the interested crowd),
> 
> Please accept my apologies for taking so long to get back.
> 
> These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing me quite some headaches. Some layer between me and the test prevents the VM (in particular, the VMThread) from terminating normally. The final output from my time measurements is therefore either not generated or thrown away. Adding to that were some periods of test machine unavailability and a bug in my measurement code that caused crashes.
> 
> Anyway, I added some on-the-fly output, printing the timer values after 10k measurement intervals. This reveals some interesting, additional facts about the tests and the CodeHeap management methods. For detailed numbers, refer to the files attached to the bug (https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail, I can provide the jtr files on request. 
> 
> 
> OverflowCodeCacheTest
> =====================
> This test runs (in my setup) with a 1GB CodeCache.
> 
> For this test, CodeHeap::mark_segmap_as_used() is THE performance hog. 40% of all calls have to mark more than 16k segment map entries (in the unoptimized case). Basically all of these calls convert to len=1 calls with the optimization turned on. Note that during FreeBlock joining, the segment count is forced to 1 (one). No wonder the time spent in CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test runtime) to <100msec.
> 
> CodeHeap::add_to_freelist(), on the other hand, is barely observable. The average free list length is two elements, making even a linear search really quick.
> 
> 
> StressCodeCacheTest
> ===================
> With a 1GB CodeCache, this test runs into a 12 min timeout, set by our internal test environment. Scaling back to 300MB prevents the test from timing out.
> 
> For this test, CodeHeap::mark_segmap_as_used() is not a factor. Of 200,000 calls, only a few (less than 3%) had to process a block consisting of more than 16 segments. Note that during FreeBlock joining, the segment count is forced to 1 (one).
> 
> Another method is popping up as performance hog instead: CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime (before optimization) are spent in this method, for just 160,000 calls. The test seems to create a long list of non-contiguous free blocks (around 5,500 on average). This list is linearly scanned to find the insert point for the free block at hand.
> 
> Suffering as well from the long free block list is CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000 calls.  
> 
> 
> SPECjvm2008 suite
> =================
> With respect to the task at hand, this is a well-behaved test suite. Timing shows some before/after difference, but nothing spectacular. The measurements do not provide evidence of a performance bottleneck.
> 
> 
> There were some minor adjustments to the code. Unused code blocks have been removed as well. I have therefore created a new webrev. You can find it here:
>    http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/ 
> 
> Thanks for investing your time!
> Lutz
> 
> 
> On 21.10.19, 15:06, "Andrew Dinn" <adinn at redhat.com> wrote:
> 
>     Hi Lutz,
>     
>     On 21/10/2019 13:37, Schmidt, Lutz wrote:
>     > I understand what you are interested in. And I was hoping to be able
>     > to provide some (first) numbers by today. Unfortunately, the
>     > measurement code I activated last Friday was buggy and blew most of
>     > the tests I had hoped to run over the weekend.
>     > 
>     > I will take your modified test and run it with and without my
>     > optimization. In parallel, I will try to generate some (non-random)
>     > numbers for other tests.
>     > 
>     > I'll be back as soon as I have results.
>     
>     Thanks for trying the test and also for deriving some call stats from a
>     real example. I'm keen to see how much your patch improves things.
>     
>     regards,
>     
>     
>     Andrew Dinn
>     -----------
>     Senior Principal Software Engineer
>     Red Hat UK Ltd
>     Registered in England and Wales under Company Registration No. 03798903
>     Directors: Michael Cunningham, Michael ("Mike") O'Neill
>     
>     
> 
> 


From martin.doerr at sap.com  Mon Nov  4 11:12:42 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 4 Nov 2019 11:12:42 +0000
Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns
 wrong result if Klass* is aligned to 32bit
In-Reply-To: <CAA-vtUz6P7C3=xkfwBYG7MoGFY37r2XS87Am89Pu9GMEsNSWTw@mail.gmail.com>
References: <CAA-vtUzdW000k20ZUEH03PbCA5Zt2S6MUGupt_F842fL36Uwvg@mail.gmail.com>
 <VI1PR0201MB247991750418A9641C45209F9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <CAA-vtUz6P7C3=xkfwBYG7MoGFY37r2XS87Am89Pu9GMEsNSWTw@mail.gmail.com>
Message-ID: <VI1PR0201MB247921C887DC1A88CE99BFD49A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Thomas,

looks good.

Thanks,
Martin


From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Thursday, 31 October 2019 08:02
To: Doerr, Martin <martin.doerr at sap.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; Schmidt, Lutz <lutz.schmidt at sap.com>
Subject: Re: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit

Hi Martin,

thanks for the review!

New version: http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/

pls find remarks inline.

On Wed, Oct 30, 2019 at 1:09 PM Doerr, Martin <martin.doerr at sap.com> wrote:
Hi Thomas,

thank you for finding and fixing this issue.

Shared code part and regression test look good to me.
But I have a few requests and questions related to the platform code.

Arm32:
Breaks build. The local variable p needs a scope. Can be fixed by:
-        case T_METADATA:
+        case T_METADATA: {
           // We only need, for now, comparison with NULL for metadata.
...
           break;
+        }

Ouch. Fixed. Actually, I discarded my code completely and took over the code from T_OBJECT, to stay in line with the rest of the code here.


S390:
Using z_cgfi is correct, but there are equivalent instructions with shorter opcode.
For comparing to 0, z_ltgr(reg1, reg1) or z_cghi(reg1, 0) may be preferred. But that's not a big deal.


Okay, I switched to cghi. ltgr sounds cool but would be difficult to integrate into the shared part, since that part first does a move, then a compare.

I wonder why you have added includes to some platform files. Isn't that redundant?
"utilities/debug.hpp" comes via shared assembler.hpp.

Did this because I added asserts and the rule is that every file should include what it needs and not rely on other includes including it (save for umbrella includes like globalDefinitions.hpp). But okay, I removed the added includes again to keep the patch small.


I'd probably choose Unimplemented() instead of ShouldNotReachHere() for non-null cases because it's not bad in general, it's just currently not used and therefore not yet implemented.
But you can keep that as it is. I'm ok with that, too.

I'd rather keep it, to stay in line with the rest of the code (see e.g. the default: branches).


Best regards,
Martin


Thanks Martin!

..Thomas


From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Wednesday, 30 October 2019 11:48
To: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Cc: Doerr, Martin <martin.doerr at sap.com>; Schmidt, Lutz <lutz.schmidt at sap.com>
Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit

Hi all,

second attempt at a fix (please find first review thread here: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-October/035608.html)

Issue: https://bugs.openjdk.java.net/browse/JDK-8233019
webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.00/webrev/

In short, the C1 intrinsic for jlC::isPrimitive compares the Klass* pointer of the class with NULL; a NULL Klass* means a primitive type. That compare is done using a 32-bit cmp, so it gives wrong results when the low 32 bits of the Klass* pointer are all zero (i.e. the pointer is aligned to a 4 GB boundary).

In the generator I changed the comparison constant type from intConst(0) to metadataConst(0) and implemented the missing code paths for all CPUs. Since on most architectures we do not seem to have a comparison with a 64bit immediate (at least I could not find one) I kept the change simple and only implemented comparison with NULL for now.

I tested the fix in our nightlies (jtreg tier1, jck and others) as well as manually testing it.

I did not test on aarch64 and arm, though, and would be thankful if someone knowledgeable about these platforms could take a look.

Thanks to Martin and Lutz for eyeballing the ppc and s390 parts.

Thanks, Thomas

From martin.doerr at sap.com  Mon Nov  4 11:12:45 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 4 Nov 2019 11:12:45 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB247908A9BE69E75AAFE3B3639A920@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <3f80fc0b-2388-d4be-3c84-4af516e9635f@oracle.com>
 <VI1PR0201MB2479A25B5B2E26D79ADDE7BF9A690@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
Message-ID: <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi all,

@Dean
> changing can_remove() to CompileBroker::can_remove()?
Yes. That would be an option.

@Kim, David
I think there's another problem with this implementation.
It introduces a use-after-free pattern due to concurrency.
Compiler threads may still read the oops from the handles until the next safepoint after one of them has called destroy_global. It doesn't matter which values they get in this case, but the VM should not crash. I believe that OopStorage allows freeing storage without safepoints, so this may be unsafe. Right?

If so, I think replacing the oops in the handles (and keeping the handles alive) would be better. And also much simpler.

Best regards,
Martin


> -----Original Message-----
> From: dean.long at oracle.com <dean.long at oracle.com>
> Sent: Saturday, 2 November 2019 08:36
> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> Hi Martin,
> 
> On 10/30/19 3:18 AM, Doerr, Martin wrote:
> > Hi David,
> >
> >> I don't think factoring out CompileBroker::clear_compiler2_object when
> >> it is only used once was warranted, but that's a call for the compiler team to
> >> make.
> > I did that because _compiler2_objects is private and there's currently no setter available.
> > But let's see what the compiler folks think.
> 
> how about changing can_remove() to CompileBroker::can_remove()? Then you
> can access _compiler2_objects directly, right?
> 
> dl
> >> Otherwise changes seem fine and I have noted the use of the
> >> MutexUnlocker as per your direct email.
> > Thanks a lot for reviewing. It was not a trivial one.
> >
> > You had noticed an incorrect usage of the CHECK macro. I've created a new bug for that:
> > https://bugs.openjdk.java.net/browse/JDK-8233193
> > It would be great if you could take a look to see if that's what you meant and make adaptations if needed.
> >
> > Best regards,
> > Martin
> >
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Wednesday, 30 October 2019 05:47
> >> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> >> <kim.barrett at oracle.com>
> >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
> >> David Holmes <David.Holmes at oracle.com>
> >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>
> >> Hi Martin,
> >>
> >> On 29/10/2019 12:06 am, Doerr, Martin wrote:
> >>> Hi David and Kim,
> >>>
> >>> I think it's easier to talk about code. So here's a new webrev:
> >>>
> >>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
> >>
> >> I don't think factoring out CompileBroker::clear_compiler2_object when
> >> it is only used once was warranted, but that's a call for the compiler team to
> >> make. Otherwise changes seem fine and I have noted the use of the
> >> MutexUnlocker as per your direct email.
> >>
> >> Thanks,
> >> David
> >> -----
> >>
> >>> @Kim:
> >>> Thanks for looking at the handle related parts. It's ok if you don't want to be a reviewer of the whole change.
> >>>> I think it's weird that can_remove() is a predicate with optional side
> >>>> effects.  I think it would be simpler to have it be a pure predicate,
> >>>> and have the one caller with do_it = true perform the updates.  That
> >>>> should include NULLing out the handle pointer (perhaps debug-only, but
> >>>> it doesn't cost much to cleanly maintain the data structure).
> >>> Nevertheless, it has the advantage that it enforces the update to be consistent.
> >>> A caller could use it without holding the lock or mess it up otherwise.
> >>> In addition, I don't want to change that as part of this fix.
> >>>
> >>>> So far as I can tell, THREAD == NULL here.
> >>> This is a very tricky part (not my invention):
> >>> EXCEPTION_MARK contains an ExceptionMark constructor call which sets __the_thread__ to Thread::current().
> >>> I don't want to publish my opinion about this.
> >>>
> >>> @David:
> >>> Seems like this option is preferred over option 3 (possibly_add_compiler_threads part of webrev.02 and leave the initialization as is).
> >>> So when you're ok with it, I'll request a 2nd review from the compiler folks (I should change the subject to contain RFR).
> >>> Thanks,
> >>> Martin
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Monday, 28 October 2019 05:04
> >>>> To: Kim Barrett <kim.barrett at oracle.com>
> >>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
> >>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> >>>> hotspot-compiler-dev at openjdk.java.net
> >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>>>
> >>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
> >>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes <david.holmes at oracle.com> wrote:
> >>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
> >>>>>>> Hi Kim,
> >>>>>>> I didn't like using the OopStorage stuff directly, either. I just have not seen how to allocate a global handle and add the oop later.
> >>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of that.
> >>>>>>> So I can imagine 3 ways to implement it:
> >>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just added that to http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
> >>>>>>> We may want to improve that further by setting the handle pointer to NULL and asserting that it is NULL before adding the new one.
> >>>>>>> I had been concerned about NULLs in the array, but looks like the existing code can deal with that.
> >>>>>> I think it would be cleaner to both destroy the global handle and NULL it in the array at the same time.
> >>>>>> This comment
> >>>>>>
> >>>>>> 325         // Old j.l.Thread object can die here.
> >>>>>>
> >>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct->threadObj() so can't "die" until that is also cleared during the actual termination process.
> >>>>> I think if there is such a thread here that it can't die, because the
> >>>>> death predicate (the can_remove stuff) won't see that old thread as
> >>>>> the last thread in _compiler2_objects.  That's what I meant by this:
> >>>>>
> >>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
> >>>>>> I also think that here:
> >>>>>>
> >>>>>> 947         jobject thread_handle = JNIHandles::make_global(thread_oop);
> >>>>>> 948         _compiler2_objects[i] = thread_handle;
> >>>>>>
> >>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a valid
> >>>>>> assertion then I think there are other problems.
> >>>>> I think either that comment about an old thread is wrong (and the NULL
> >>>>> assertion I suggested is okay), or I think the whole mechanism here
> >>>>> has problems.  Or at least I was unable to figure out how it could work...
> >>>>>
> >>>> I'm not following sorry. You can't assert NULL unless it's actually set
> >>>> to NULL which it presently isn't. But it could be set NULL as Martin
> >>>> suggested:
> >>>>
> >>>> "We may want to improve that further by setting the handle pointer to
> >>>> NULL and asserting that it is NULL before adding the new one."
> >>>>
> >>>> and which I also supported. But that aside once the delete_global has
> >>>> been called that JNIHandle no longer references the j.l.Thread that it
> >>>> did, at which point it is only reachable via the threadObj() of the
> >>>> CompilerThread.
> >>>>
> >>>> David


From thomas.stuefe at gmail.com  Mon Nov  4 11:17:27 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 4 Nov 2019 12:17:27 +0100
Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns
 wrong result if Klass* is aligned to 32bit
In-Reply-To: <VI1PR0201MB247921C887DC1A88CE99BFD49A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <CAA-vtUzdW000k20ZUEH03PbCA5Zt2S6MUGupt_F842fL36Uwvg@mail.gmail.com>
 <VI1PR0201MB247991750418A9641C45209F9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <CAA-vtUz6P7C3=xkfwBYG7MoGFY37r2XS87Am89Pu9GMEsNSWTw@mail.gmail.com>
 <VI1PR0201MB247921C887DC1A88CE99BFD49A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <CAA-vtUy92MJhGiN9uKE4zCyvJyBdnv85876Znb2QPm0zFKZZ1Q@mail.gmail.com>

Thanks, Martin!



From martin.doerr at sap.com  Mon Nov  4 11:20:05 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 4 Nov 2019 11:20:05 +0000
Subject: RFR(S): 8233081: C1: PatchingStub for field access copies too much
In-Reply-To: <70bf20d2-a1f4-2e02-7c7a-62f986eaf7c0@oracle.com>
References: <VI1PR0201MB24798B4FFDCAEE55104A74279A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <70bf20d2-a1f4-2e02-7c7a-62f986eaf7c0@oracle.com>
Message-ID: <VI1PR0201MB2479BD24EA4216514B7B13399A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Tobias,

thanks for the review.

Best regards,
Martin


> -----Original Message-----
> From: Tobias Hartmann <tobias.hartmann at oracle.com>
> Sent: Monday, 4 November 2019 08:33
> To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: Re: RFR(S): 8233081: C1: PatchingStub for field access copies too
> much
> 
> Hi Martin,
> 
> nice cleanup, x86/Sparc looks good to me.
> 
> Best regards,
> Tobias
> 
> On 30.10.19 16:38, Doerr, Martin wrote:
> > Hi,
> >
> >
> >
> > I'd like to fix an issue in C1's PatchingStub implementation for "access_field_id".
> >
> > We had noticed that the code in the template exceeded the 255 byte limitation when switching on
> > VerifyOops on PPC64.
> >
> > I'd like to improve the situation for all platforms.
> >
> >
> >
> > More detailed bug description:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8233081
> >
> >
> >
> > I need a function to determine how many bytes are needed for the NativeMovRegMem.
> >
> > x86 has next_instruction_address() which could in theory be used, but I noticed that it's dead code
> > which is no longer correct.
> >
> > Is it ok to remove it?
> >
> > I'd also like to remove the constant instruction_size from NativeMovRegMem because it's not constant.
> >
> > I'd prefer to introduce num_bytes_to_end_of_patch() for the purpose of determining how many bytes to
> > copy for the "access_field_id" PatchingStub.
> >
> > We can factor out the offset computation from offset() and set_offset() and reuse it. This enforces
> > consistency.
> >
> >
> >
> > Webrev:
> >
> >
> > http://cr.openjdk.java.net/~mdoerr/8233081_C1_access_field_patching/webrev.00/
> >
> >
> >
> > Please review.
> >
> >
> >
> > Best regards,
> >
> > Martin
> >
> >
> >

From nils.eliasson at oracle.com  Mon Nov  4 12:54:55 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 4 Nov 2019 13:54:55 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID: <54ce54dd-5b68-311a-e055-3db815c081bd@oracle.com>

Hi Jorn,

One nitpick - in compilerDirectives.hpp: you should surround the line 
with NOT_PRODUCT(...) similar to how it's done for TraceOptoOutput. You 
have already made sure that the use is guarded, which is good, but it 
should show in the list too.

In the future I hope we can make this flag diagnostic.

Otherwise good.

Regards,

Nils

On 2019-11-01 16:09, Jorn Vernee wrote:
> Hi,
>
> I'd like to add PrintIdeal as a compiler directive in order to enable 
> PrintIdeal for only a single method when combining it with the 'match' 
> directive.
>
> Please review the following:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
> (Testing = tier1, manual)
>
> As a heads-up; I'm not a committer on the jdk project, so if this 
> sounds like a good idea, I would require a sponsor to push the changes.
>
> Thanks,
> Jorn
>

From patric.hedlin at oracle.com  Mon Nov  4 13:19:25 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Mon, 4 Nov 2019 14:19:25 +0100
Subject: RFR(S/T): 8233498: [SPARC] Remove dead code
Message-ID: <da59af7c-c792-72f5-a091-7996ab7aefb4@oracle.com>

Dear all,

I would like to ask for help to review the following change/update:

Issue:  https://bugs.openjdk.java.net/browse/JDK-8233498
Webrev: http://cr.openjdk.java.net/~phedlin/tr8233498/

8233498: [SPARC] Remove dead code

    Remove a number of dead/unused/non-implemented methods in the
    SPARC version of the MacroAssembler.


Testing: SPARC build on Solaris and Linux (thanks to Adrian G. [1])
    hs-tier1-3 (sparcv9)

[1]. John Paul Adrian Glaubitz <glaubitz at physik.fu-berlin.de>


Best regards,
Patric


From nils.eliasson at oracle.com  Mon Nov  4 13:52:44 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 4 Nov 2019 14:52:44 +0100
Subject: RFR(S/T): 8233498: [SPARC] Remove dead code
In-Reply-To: <da59af7c-c792-72f5-a091-7996ab7aefb4@oracle.com>
References: <da59af7c-c792-72f5-a091-7996ab7aefb4@oracle.com>
Message-ID: <da58af92-6d23-656a-1f23-847e06b5cd12@oracle.com>

Looks good!

Regards,

Nils

On 2019-11-04 14:19, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8233498
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8233498/
>
> 8233498: [SPARC] Remove dead code
>
>     Remove a number of dead/unused/non-implemented methods in the
>     SPARC version of the MacroAssembler.
>
>
> Testing: SPARC build on Solaris and Linux (thanks to Adrian G. [1])
>     hs-tier1-3 (sparcv9)
>
> [1]. John Paul Adrian Glaubitz <glaubitz at physik.fu-berlin.de>
>
>
> Best regards,
> Patric
>

From vladimir.x.ivanov at oracle.com  Mon Nov  4 14:45:03 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 4 Nov 2019 17:45:03 +0300
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <54ce54dd-5b68-311a-e055-3db815c081bd@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
 <54ce54dd-5b68-311a-e055-3db815c081bd@oracle.com>
Message-ID: <7deb65c5-822c-36db-bb97-8f2e3edbac3b@oracle.com>


On 04.11.2019 15:54, Nils Eliasson wrote:
> Hi Jorn,
> 
> One nitpick - in compilerDirectives.hpp: you should surround the line 
> with NOT_PRODUCT(...) similar to how it's done for TraceOptoOutput. You 
> have already made sure that the use is guarded, which is good, but it 
> should show in the list too.

FTR the same applies to IGVPrintLevel:

src/hotspot/share/compiler/compilerDirectives.hpp:
     cflags(IGVPrintLevel,           intx, PrintIdealGraphLevel, IGVPrintLevel) \

src/hotspot/share/opto/c2_globals.hpp:
   notproduct(intx, PrintIdealGraphLevel, 0, 


Best regards,
Vladimir Ivanov

> 
> In the future I hope we can make this flag diagnostic.
> 
> Otherwise good.
> 
> Regards,
> 
> Nils
> 
> On 2019-11-01 16:09, Jorn Vernee wrote:
>> Hi,
>>
>> I'd like to add PrintIdeal as a compiler directive in order to enable 
>> PrintIdeal for only a single method when combining it with the 'match' 
>> directive.
>>
>> Please review the following:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
>> (Testing = tier1, manual)
>>
>> As a heads-up; I'm not a committer on the jdk project, so if this 
>> sounds like a good idea, I would require a sponsor to push the changes.
>>
>> Thanks,
>> Jorn
>>

From vladimir.x.ivanov at oracle.com  Mon Nov  4 14:57:06 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 4 Nov 2019 17:57:06 +0300
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID: <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com>

Hi Jorn,

src\hotspot\share\opto\compile.hpp:
+   bool                  _print_ideal;           // True if we should dump node IR for this compilation

Since the only usage is in non-product code, I suggest putting 
_print_ideal into #ifndef PRODUCT, so you don't need to initialize it in 
the product build.

Also, it'll allow you to just put it on the initializer list instead of 
doing it in the ctor body (akin to how _trace_opto_output is handled):

src\hotspot\share\opto\compile.cpp:

Compile::Compile( ciEnv* ci_env,
...
   : Phase(Compiler),
...
     _has_reserved_stack_access(false),
#ifndef PRODUCT
     _trace_opto_output(directive->TraceOptoOutputOption),
#endif
     _has_method_handle_invokes(false),
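A self-contained sketch of that pattern, for illustration only: DirectiveSet, PrintIdealOption and the accessors are hypothetical stand-ins, and only _trace_opto_output mirrors the excerpt above. The point is that a member declared under #ifndef PRODUCT is initialized under the same guard, in declaration order:

```cpp
#include <cassert>

// Hypothetical stand-in for the per-method directive (not HotSpot's DirectiveSet).
struct DirectiveSet {
  bool TraceOptoOutputOption = false;
  bool PrintIdealOption = false;
};

class Compile {
 public:
  explicit Compile(const DirectiveSet& directive)
      : _has_method_handle_invokes(false)
#ifndef PRODUCT
      , _trace_opto_output(directive.TraceOptoOutputOption)
      , _print_ideal(directive.PrintIdealOption)
#endif
  {
    (void)directive;  // only read in non-product builds
  }

  bool has_method_handle_invokes() const { return _has_method_handle_invokes; }
#ifndef PRODUCT
  bool trace_opto_output() const { return _trace_opto_output; }
  bool print_ideal() const { return _print_ideal; }
#endif

 private:
  // Members are initialized in declaration order, so the initializer list
  // above lists the guarded members in the same order.
  bool _has_method_handle_invokes;
#ifndef PRODUCT
  bool _trace_opto_output;
  bool _print_ideal;  // true if we should dump node IR for this compilation
#endif
};
```

In a product build the two guarded members do not exist, so nothing has to be initialized in the ctor body.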


Overall, I don't see much value in PrintIdeal: PrintIdealGraph provides 
much more detailed information (even though in XML format) and 
IdealGraphVisualizer is better at browsing the graph. The only thing I'm 
usually missing is full text dump output on individual nodes (they are 
shown pruned in IGV; not sure whether it's IGV fault or the info is 
missing in the dump).

Best regards,
Vladimir Ivanov

On 01.11.2019 18:09, Jorn Vernee wrote:
> Hi,
> 
> I'd like to add PrintIdeal as a compiler directive in order to enable 
> PrintIdeal for only a single method when combining it with the 'match' 
> directive.
> 
> Please review the following:
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
> (Testing = tier1, manual)
> 
> As a heads-up; I'm not a committer on the jdk project, so if this sounds 
> like a good idea, I would require a sponsor to push the changes.
> 
> Thanks,
> Jorn
> 

From lutz.schmidt at sap.com  Mon Nov  4 15:35:30 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 4 Nov 2019 15:35:30 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
Message-ID: <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>

Hi Andrew, 

thank you for your thoughts. I do not agree with your conclusion, though. 

There are two bottlenecks in the CodeHeap management code. One is in CodeHeap::mark_segmap_as_used(), uncovered by OverflowCodeCacheTest.java. The other is in CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java.

Both bottlenecks are tackled by the recommended changeset. 

CodeHeap::mark_segmap_as_used() is no longer O(n*n) for the critical "FreeBlock-join" case. It actually is O(1) now. The time reduction from > 80 seconds to just a few milliseconds is proof of that statement.
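To make the "FreeBlock-join" argument concrete, here is a simplified model of a hop-count segment map. The encoding, the hop cap and all names are assumptions for illustration, not the layout used in the webrev: each byte says how far to hop back toward the block start, so merging a free block into its predecessor only has to rewrite the follower's first entry, which corresponds to the len=1 calls mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cap on a single hop; the real encoding may differ.
constexpr size_t kMaxHop = 0xFD;

// Simplified segment map model (an assumption, not HotSpot's code): one byte
// per segment; 0 marks the first segment of a block, any other value v means
// "hop back v segments and look again".
class SegMap {
 public:
  explicit SegMap(size_t segments) : map_(segments, 0) {}

  // Mark [beg, end) as one block: O(end - beg) byte stores.
  void mark_block(size_t beg, size_t end) {
    assert(beg < end && end <= map_.size());
    for (size_t i = beg; i < end; i++) {
      size_t d = i - beg;
      map_[i] = static_cast<uint8_t>(d < kMaxHop ? d : kMaxHop);
    }
  }

  // Follow hops from any segment back to the first segment of its block.
  size_t block_start(size_t i) const {
    while (map_[i] != 0) i -= map_[i];
    return i;
  }

  // O(1) join of the block starting at fbeg onto the block just below it:
  // only fbeg's entry changes. Every other follower entry still leads to
  // fbeg, which now hops one segment back into the leader.
  void join(size_t fbeg) {
    assert(fbeg > 0 && fbeg < map_.size() && map_[fbeg] == 0);
    map_[fbeg] = 1;
    splices_++;  // hop chains got longer; a real heap might re-mark at some threshold
  }

  size_t splices() const { return splices_; }

 private:
  std::vector<uint8_t> map_;
  size_t splices_ = 0;
};
```

The trade-off in this model is that lookups after many joins walk longer hop chains, which is why some form of fragmentation tracking is needed.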

CodeHeap::add_to_freelist() is still O(n*n), with n being the free list length. But the kick-in point of the non-linearity could be significantly shifted towards larger n. The time reduction from approx. 8 seconds to 160 milliseconds supports this statement.
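The add_to_freelist() cost can be pictured with a minimal address-ordered free list. This sketch uses hypothetical names and does not merge adjacent blocks. It shows why a linear scan per insertion adds up to O(n*n) over n insertions, and one way to shift the kick-in point: remember the last insertion point and start scanning there when the new block lies above it. Such a hint is an assumption here, not necessarily what the webrev does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical free block descriptor; the real FreeBlock lives inside the heap.
struct FreeBlock {
  size_t addr = 0;
  size_t len = 0;
  FreeBlock* next = nullptr;
};

// Address-ordered singly linked free list with an insertion hint (a sketch).
class FreeList {
 public:
  // Inserts b in address order and returns the number of list nodes walked.
  // Without the hint every insertion scans from the head: O(n) each.
  size_t insert(FreeBlock* b) {
    size_t steps = 0;
    FreeBlock* prev = nullptr;
    FreeBlock* cur = head_;
    if (hint_ != nullptr && hint_->addr < b->addr) {
      prev = hint_;  // the list is sorted, so everything up to the hint is lower
      cur = hint_->next;
    }
    while (cur != nullptr && cur->addr < b->addr) {
      prev = cur;
      cur = cur->next;
      steps++;
    }
    b->next = cur;
    if (prev == nullptr) {
      head_ = b;
    } else {
      prev->next = b;
    }
    hint_ = b;
    return steps;
  }

  // Unlinks b. The hint must be dropped if it points at b, otherwise it would
  // refer to a block that is no longer free.
  void remove(FreeBlock* b) {
    FreeBlock** link = &head_;
    while (*link != nullptr && *link != b) link = &(*link)->next;
    assert(*link == b && "block must be on the free list");
    *link = b->next;
    b->next = nullptr;
    if (hint_ == b) hint_ = nullptr;
  }

  FreeBlock* head() const { return head_; }

 private:
  FreeBlock* head_ = nullptr;
  FreeBlock* hint_ = nullptr;
};
```

With the hint, a run of frees in ascending address order needs no scanning at all, while a free below the hint falls back to a scan from the head. The worst case therefore stays O(n) per insertion, which fits the description of shifting the kick-in point rather than removing the non-linearity.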

I agree it would be helpful to have a "real-world" example showing some improvement. Providing such evidence is hard, though. I could instrument the code and print some values from time to time. This additional output will certainly mess up success/failure decisions in our test environment. Not sure everybody likes that. But I will give it a try and take the hits. This will be a multi-day effort.
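For collecting such numbers without relying on a clean VM shutdown, a periodic reporter is enough. A minimal sketch with hypothetical names, using only <chrono> and stderr: a ScopedCallTimer at the top of the measured function accumulates elapsed time, and the shared CallTimer prints a line every `interval` calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical instrumentation helper: accumulates time spent in one function
// and prints a line to stderr every `interval` calls, so intermediate numbers
// survive even if the process never reaches a normal shutdown.
class CallTimer {
 public:
  CallTimer(const char* name, uint64_t interval) : name_(name), interval_(interval) {
    assert(interval > 0);
  }

  void record(std::chrono::nanoseconds elapsed) {
    calls_++;
    total_ += elapsed;
    if (calls_ % interval_ == 0) {
      long long us = std::chrono::duration_cast<std::chrono::microseconds>(total_).count();
      std::fprintf(stderr, "%s: %llu calls, %lld us total\n", name_,
                   static_cast<unsigned long long>(calls_), us);
      reports_++;
    }
  }

  uint64_t calls() const { return calls_; }
  uint64_t reports() const { return reports_; }

 private:
  const char* name_;
  uint64_t interval_;
  uint64_t calls_ = 0;
  uint64_t reports_ = 0;
  std::chrono::nanoseconds total_{0};
};

// RAII guard: measures the enclosing scope and records it on destruction.
class ScopedCallTimer {
 public:
  explicit ScopedCallTimer(CallTimer& timer)
      : timer_(timer), start_(std::chrono::steady_clock::now()) {}
  ~ScopedCallTimer() {
    timer_.record(std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start_));
  }
  ScopedCallTimer(const ScopedCallTimer&) = delete;
  ScopedCallTimer& operator=(const ScopedCallTimer&) = delete;

 private:
  CallTimer& timer_;
  std::chrono::steady_clock::time_point start_;
};
```

Usage would be one static CallTimer per instrumented function; this sketch is not thread-safe, so concurrent callers would need atomics or a lock.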

On a general note, I am always uncomfortable knowing of an O(n*n) effort, in particular when it could be removed or at least tamed considerably. Experience tells (at least me) that, at some point in time, n will be large enough to hurt. 

I'll be back.

Thanks, 
Lutz


On 04.11.19, 11:08, "Andrew Dinn" <adinn at redhat.com> wrote:

    Hi Lutz,
    
    I'll summarize my thoughts here rather than answer point by point.
    
    The patch successfully addresses the worst case performance but it seems
    to me extremely unlikely that we will see anything that approaches that
    case in real applications. So, that doesn't argue for pushing the patch.
    
    The patch does not seem to make a significant difference to the stress
    test. This test is also not necessarily 'representative' of real cases
    but it is much more likely to be so than the worst case test. That
    suggests to me that the current patch is perhaps not worth pursuing (it
    ain't really broke so ...).  Especially so given that it is not possible
    to distinguish any benefit when running the Spec benchmark apps. One
    could argue that the patch looks like it will do no harm and may do good
    in pathological  cases but that's not really good enough reason to make
    a change. We really need evidence that this is worth doing.
    
    The free list 'search bottleneck' certainly looks like a more promising
    problem to tackle than the 'merge problem'. However, once again this
    'problem' may just be an artefact of running this specific test rather
    than anything that might happen in real life.
    
    I think the only way to find out for sure whether the current patch or a
    patch that addresses the 'search bottleneck' is going to be beneficial
    is to instrument the JVM to record traces for code-cache use from real
    apps and then replay allocations/frees based on those traces to see what
    difference a patch makes and how much this might help the overall
    execution time.
    
    regards,
    
    
    Andrew Dinn
    -----------
    
    On 31/10/2019 16:55, Schmidt, Lutz wrote:
    > Hi Andrew, (and hi to the interested crowd),
    > 
    > Please accept my apologies for taking so long to get back.
    > 
    > These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing me quite some headaches. Some layer between me and the test prevents the vm (in particular: the VMThread) from terminating normally. The final output from my time measurements is therefore not generated or thrown away. Adding to that were some test machine unavailabilities and a bug in my measurement code, causing crashes. 
    > 
    > Anyway, I added some on-the-fly output, printing the timer values after 10k measurement intervals. This reveals some interesting, additional facts about the tests and the CodeHeap management methods. For detailed numbers, refer to the files attached to the bug (https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail, I can provide the jtr files on request. 
    > 
    > 
    > OverflowCodeCacheTest
    > =====================
    > This test runs (in my setup) with a 1GB CodeCache.
    > 
    > For this test, CodeHeap::mark_segmap_as_used() is THE performance hog. 40% of all calls have to mark more than 16k segment map entries (in the unoptimized case). Basically all of these calls convert to len=1 calls with the optimization turned on. Note that during FreeBlock joining, the segment count is forced to 1 (one). No wonder the time spent in CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test runtime) to <100msec. 
    > 
    > CodeHeap::add_to_freelist(), on the other hand, is barely observable. The average free list length is two elements, making even linear search really quick. 
    > 
    > 
    > StressCodeCacheTest
    > ===================
    > With a 1GB CodeCache, this test runs into a 12 min timeout, set by our internal test environment. Scaling back to 300MB prevents the test from timing out.
    > 
    > For this test, CodeHeap::mark_segmap_as_used() is not a factor. Of 200,000 calls, only a few (fewer than 3%) had to process a block consisting of more than 16 segments. Note that during FreeBlock joining, the segment count is forced to 1 (one). 
    > 
    > Another method is popping up as performance hog instead: CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime (before optimization) are spent in this method, for just 160,000 calls. The test seems to create a long list of non-contiguous free blocks (around 5,500 on average). This list is linearly scanned to find the insert point for the free block at hand.
    > 
    > Suffering as well from the long free block list is CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000 calls.  
    > 
    > 
    > SPECjvm2008 suite
    > =================
    > With respect to the task at hand, this is a well-behaved test suite. Timing shows some before/after difference, but nothing spectacular. The measurements do not provide evidence of a performance bottleneck. 
    > 
    > 
    > There were some minor adjustments to the code. Unused code blocks have been removed as well. I have therefore created a new webrev. You can find it here:
    >    http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/ 
    > 
    > Thanks for investing your time!
    > Lutz
    > 
    > 
    > On 21.10.19, 15:06, "Andrew Dinn" <adinn at redhat.com> wrote:
    > 
    >     Hi Lutz,
    >     
    >     On 21/10/2019 13:37, Schmidt, Lutz wrote:
    >     > I understand what you are interested in. And I was hoping to be able
    >     > to provide some (first) numbers by today. Unfortunately, the
    >     > measurement code I activated last Friday was buggy and blew most of
    >     > the tests I had hoped to run over the weekend.
    >     > 
    >     > I will take your modified test and run it with and without my
    >     > optimization. In parallel, I will try to generate some (non-random)
    >     > numbers for other tests.
    >     > 
    >     > I'll be back as soon as I have results.
    >     
    >     Thanks for trying the test and also for deriving some call stats from a
    >     real example. I'm keen to see how much your patch improves things.
    >     
    >     regards,
    >     
    >     
    >     Andrew Dinn
    >     -----------
    >     Senior Principal Software Engineer
    >     Red Hat UK Ltd
    >     Registered in England and Wales under Company Registration No. 03798903
    >     Directors: Michael Cunningham, Michael ("Mike") O'Neill
    >     
    >     
    > 
    > 
    
    
From martin.doerr at sap.com  Mon Nov  4 16:43:01 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 4 Nov 2019 16:43:01 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB247908A9BE69E75AAFE3B3639A920@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <3f80fc0b-2388-d4be-3c84-4af516e9635f@oracle.com>
 <VI1PR0201MB2479A25B5B2E26D79ADDE7BF9A690@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <VI1PR0201MB247904681A73E752B26B6D439A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi again,

it is possible to release the handles, but it comes with a much higher complexity:
http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/

If we can replace oops in handles like
NativeAccess<>::oop_store((oop*)compiler2_object(i), thread_oop());
we only need to change one place (possibly_add_compiler_threads) and that's it.
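The "replace instead of release" idea can be modeled in portable C++. This is a sketch under stated assumptions, not HotSpot's OopStorage or NativeAccess API: the slots are allocated once and never freed, and an update atomically swaps the referenced object, so a thread racing with the update reads either the old or the new pointer but never released storage. Reclaiming the replaced object is left to its owner (in the VM, presumably the GC once the old j.l.Thread is unreachable).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Model of "keep the handle, replace its referent" (hypothetical; not OopStorage).
// Slots are never freed, so a concurrent reader can never touch released storage;
// it sees either the old or the new object.
template <typename T, size_t N>
class StableSlots {
 public:
  StableSlots() {
    for (auto& s : slots_) s.store(nullptr, std::memory_order_relaxed);
  }

  // Publish a new referent; the release store pairs with the acquire in get().
  void replace(size_t i, T* obj) {
    assert(i < N);
    slots_[i].store(obj, std::memory_order_release);
  }

  T* get(size_t i) const {
    assert(i < N);
    return slots_[i].load(std::memory_order_acquire);
  }

 private:
  std::array<std::atomic<T*>, N> slots_;
};
```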

Best regards,
Martin


> -----Original Message-----
> From: Doerr, Martin
> Sent: Montag, 4. November 2019 12:13
> To: 'dean.long at oracle.com' <dean.long at oracle.com>; David Holmes
> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> Hi all,
> 
> @Dean
> > changing can_remove() to CompileBroker::can_remove()?
> Yes. That would be an option.
> 
> @Kim, David
> I think there's another problem with this implementation.
> It introduces a use-after-free pattern due to concurrency.
> Compiler threads may still read the oops from the handles after one of them
> has called destroy_global until next safepoint. It doesn't matter which values
> they get in this case, but the VM should not crash. I believe that OopStorage
> allows freeing storage without safepoints, so this may be unsafe. Right?
> 
> If so, I think replacing the oops in the handles (and keeping the handles alive)
> would be better. And also much more simple.
> 
> Best regards,
> Martin
> 
> 
> > -----Original Message-----
> > From: dean.long at oracle.com <dean.long at oracle.com>
> > Sent: Samstag, 2. November 2019 08:36
> > To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> > <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >
> > Hi Martin,
> >
> > On 10/30/19 3:18 AM, Doerr, Martin wrote:
> > > Hi David,
> > >
> > >> I don't think factoring out CompileBroker::clear_compiler2_object when
> > >> it is only used once was warranted, but that's a call for the compiler
> > >> team to make.
> > > I did that because _compiler2_objects is private and there's currently no
> > > setter available.
> > > But let's see what the compiler folks think.
> >
> > How about changing can_remove() to CompileBroker::can_remove()? Then
> > you can access _compiler2_objects directly, right?
> >
> > dl
> > >> Otherwise changes seem fine and I have noted the use of the
> > >> MutexUnlocker as per your direct email.
> > > Thanks a lot for reviewing. It was not a trivial one.
> > >
> > > You had noticed an incorrect usage of the CHECK macro. I've created a
> > > new bug for that:
> > > https://bugs.openjdk.java.net/browse/JDK-8233193
> > > It would be great if you could check whether that's what you meant and
> > > make adaptations if needed.
> > >
> > > Best regards,
> > > Martin
> > >
> > >
> > >> -----Original Message-----
> > >> From: David Holmes <david.holmes at oracle.com>
> > >> Sent: Mittwoch, 30. Oktober 2019 05:47
> > >> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> > >> <kim.barrett at oracle.com>
> > >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > >> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
> > >> David Holmes <David.Holmes at oracle.com>
> > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> > >>
> > >> Hi Martin,
> > >>
> > >> On 29/10/2019 12:06 am, Doerr, Martin wrote:
> > >>> Hi David and Kim,
> > >>>
> > >>> I think it's easier to talk about code. So here's a new webrev:
> > >>>
> > >>
> > >> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
> > >>
> > >> I don't think factoring out CompileBroker::clear_compiler2_object when
> > >> it is only used once was warranted, but that's a call for the compiler
> > >> team to make. Otherwise changes seem fine and I have noted the use of the
> > >> MutexUnlocker as per your direct email.
> > >>
> > >> Thanks,
> > >> David
> > >> -----
> > >>
> > >>> @Kim:
> > >>> Thanks for looking at the handle related parts. It's ok if you don't want
> > >>> to be a reviewer of the whole change.
> > >>>> I think it's weird that can_remove() is a predicate with optional side
> > >>>> effects.  I think it would be simpler to have it be a pure predicate,
> > >>>> and have the one caller with do_it = true perform the updates.  That
> > >>>> should include NULLing out the handle pointer (perhaps debug-only, but
> > >>>> it doesn't cost much to cleanly maintain the data structure).
> > >>> Nevertheless, it has the advantage that it enforces the update to be
> > >>> consistent.
> > >>> A caller could use it without holding the lock or mess it up otherwise.
> > >>> In addition, I don't want to change that as part of this fix.
> > >>>
> > >>>> So far as I can tell, THREAD == NULL here.
> > >>> This is a very tricky part (not my invention):
> > >>> EXCEPTION_MARK contains an ExceptionMark constructor call which
> > >>> sets __the_thread__ to Thread::current().
> > >>> I don't want to publish my opinion about this.
> > >>>
> > >>> @David:
> > >>> Seems like this option is preferred over option 3
> > >>> (possibly_add_compiler_threads part of webrev.02 and leave the
> > >>> initialization as is).
> > >>> So when you're ok with it, I'll request a 2nd review from the compiler
> > >>> folks (I should change the subject to contain RFR).
> > >>> Thanks,
> > >>> Martin
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: David Holmes <david.holmes at oracle.com>
> > >>>> Sent: Montag, 28. Oktober 2019 05:04
> > >>>> To: Kim Barrett <kim.barrett at oracle.com>
> > >>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
> > >>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> > >>>> hotspot-compiler-dev at openjdk.java.net
> > >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> > >>>>
> > >>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
> > >>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes <david.holmes at oracle.com> wrote:
> > >>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
> > >>>>>>> Hi Kim,
> > >>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
> > >>>>>>> not seen how to allocate a global handle and add the oop later.
> > >>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware
> > >>>>>>> of that.
> > >>>>>>> So I can imagine 3 ways to implement it:
> > >>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
> > >>>>>>> added that to
> > >>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
> > >>>>>>> We may want to improve that further by setting the handle pointer
> > >>>>>>> to NULL and asserting that it is NULL before adding the new one.
> > >>>>>>> I had been concerned about NULLs in the array, but it looks like the
> > >>>>>>> existing code can deal with that.
> > >>>>>> I think it would be cleaner to both destroy the global handle and
> > >>>>>> NULL it in the array at the same time.
> > >>>>>> This comment
> > >>>>>>
> > >>>>>> 325         // Old j.l.Thread object can die here.
> > >>>>>>
> > >>>>>> Isn't quite accurate. The j.l.Thread is still referenced via
> > >>>>>> ct->threadObj() so can't "die" until that is also cleared during the
> > >>>>>> actual termination process.
> > >>>>> I think if there is such a thread here that it can't die, because the
> > >>>>> death predicate (the can_remove stuff) won't see that old thread as
> > >>>>> the last thread in _compiler2_objects.  That's what I meant by this:
> > >>>>>
> > >>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
> > >>>>>> I also think that here:
> > >>>>>>
> > >>>>>> 947         jobject thread_handle = JNIHandles::make_global(thread_oop);
> > >>>>>> 948         _compiler2_objects[i] = thread_handle;
> > >>>>>>
> > >>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a valid
> > >>>>>> assertion then I think there are other problems.
> > >>>>> I think either that comment about an old thread is wrong (and the
> > >>>>> NULL assertion I suggested is okay), or I think the whole mechanism
> > >>>>> here has problems.  Or at least I was unable to figure out how it
> > >>>>> could work...
> > >>>>>
> > >>>> I'm not following sorry. You can't assert NULL unless it's actually set
> > >>>> to NULL which it presently isn't. But it could be set NULL as Martin
> > >>>> suggested:
> > >>>>
> > >>>> "We may want to improve that further by setting the handle pointer
> > >>>> to NULL and asserting that it is NULL before adding the new one."
> > >>>>
> > >>>> and which I also supported. But that aside once the delete_global has
> > >>>> been called that JNIHandle no longer references the j.l.Thread that it
> > >>>> did, at which point it is only reachable via the threadObj() of the
> > >>>> CompilerThread.
> > >>>>
> > >>>> David


From thomas.stuefe at gmail.com  Mon Nov  4 17:02:26 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Mon, 4 Nov 2019 18:02:26 +0100
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
Message-ID: <CAA-vtUyny1TkJXvqT0smwhQeB0omJ_jPPJyYoSO9Cv4NV4Zd2Q@mail.gmail.com>

Hi Lutz,

I like this patch and have a number of remarks:

Segment map splicing:

- If the last byte of the leading block happens to contain 0xFD, you do not
need to increment the fragmentation counter after splicing since the
leading hop would be fully formed.

- If patch complexity is a concern, I would maybe leave out the
segmap_template. That would simplify the coding a bit and I am not sure how
much this actually brings. I really am curious. Is it better to let the
compiler store immediates or to memcpy the template? Would the latter not
mean I pay for the loads too?

- In ::allocate(), I like that you re-use block. This is just bikeshedding,
but you also could reshape this function a tiny bit for a clean single exit
at the end of the function.

- Comment nits:

- * code block (e.g. nmethod) when only a pointer to somewhere inside the
+ * code block (e.g. nmethod) when only a pointer to a location inside the

- *  - Segment addresses should be aligned to be multiples of CodeCacheSegmentSize.
+ *  - Segment start addresses should be aligned to be multiples of CodeCacheSegmentSize.

- *  - Allocation in the code cache can only happen at segment boundaries.
+ *  - Allocation in the code cache can only happen at segment start addresses.

- I would like comments for the newly added functions in heap.hpp, not in
the cpp file, because my IDE expands those comments from the hpp. But that's
just me :=)

- block_at() could be implemented now using address_for and a cast

- fragmentation_limit, freelist_limit: I thought using enums for constants
is not done anymore, and the preferred way is to use static const. See e.g.
markWord.hpp.

- +  // Boundaries of committed space.
  +  // Boundaries of reserved space.
   Thanks for the comments. Out of curiosity, can the lower part be
uncommitted? Or would low() == low_boundary() always be true?

- +  bool contains(const void* p) const             { return low() <= p && p < high(); }
  Is comparing with void* legal C++?

- CodeHeap::merge_right():
    - Nits, I would maybe rename "follower" to "beg_follower", or,
alternatively, "beg" to "leader".
    - I wonder whether we can have a "wipe header function" which only
wipes the FreeBlock header of the follower, instead of wiping the whole
segment.

- CodeHeap::add_to_freelist():
   I am astonished that this brings such a performance increase. I would
naively have thought the deallocation to be pretty random, and therefore
this technique to have an improvement of factor 2 in general, but we also
need to find the start of the last_insert_free_block which may take some
hops, and then, the block might not even be free...

--

General remarks, not necessarily for your patch:

- This code could really gain readability if the segment id were its own
type, e.g. "typedef int segid_t". APIs like "mark_segmap_as_used(segid_t
beg, segid_t end)" are immediately clearer than when size_t is used.

- In CodeHeap::verify(), do we actually check that the segment map is
correctly formed? That from every point in the map, we reach the correct
start of the associated code blob? I could not find this but I may be blind.
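Regarding the CodeHeap::verify() question, a check of that kind could look like the following sketch. It uses a simplified, assumed encoding (0 marks a block start, any other byte v means hop back v segments) and hypothetical names; it is not existing HotSpot code. It walks from every segment to its block start, so it costs O(segments * hops) and would only be suitable for debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Follow hop entries from segment i back to a 0 entry (assumed encoding).
// Returns SIZE_MAX if a hop would run off the front of the map.
static size_t hop_to_start(const std::vector<uint8_t>& map, size_t i) {
  while (map[i] != 0) {
    if (map[i] > i) return SIZE_MAX;
    i -= map[i];
  }
  return i;
}

// Hypothetical debug check: every segment of block k, i.e. the range
// [starts[k], starts[k+1]), must lead back to starts[k].
// starts must be strictly ascending and within the map.
bool verify_segmap(const std::vector<uint8_t>& map, const std::vector<size_t>& starts) {
  for (size_t k = 0; k < starts.size(); k++) {
    size_t beg = starts[k];
    size_t end = (k + 1 < starts.size()) ? starts[k + 1] : map.size();
    assert(beg < end && end <= map.size());
    for (size_t i = beg; i < end; i++) {
      if (hop_to_start(map, i) != beg) return false;
    }
  }
  return true;
}
```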

Thanks, Thomas


On Thu, Oct 31, 2019 at 5:55 PM Schmidt, Lutz <lutz.schmidt at sap.com> wrote:

> Hi Andrew, (and hi to the interested crowd),
>
> Please accept my apologies for taking so long to get back.
>
> These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing
> me quite some headaches. Some layer between me and the test prevents the vm
> (in particular: the VMThread) from terminating normally. The final output
> from my time measurements is therefore not generated or thrown away. Adding
> to that were some test machine unavailabilities and a bug in my measurement
> code, causing crashes.
>
> Anyway, I added some on-the-fly output, printing the timer values after
> 10k measurement intervals. This reveals some interesting, additional facts
> about the tests and the CodeHeap management methods. For detailed numbers,
> refer to the files attached to the bug (
> https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail,
> I can provide the jtr files on request.
>
>
> OverflowCodeCacheTest
> =====================
> This test runs (in my setup) with a 1GB CodeCache.
>
> For this test, CodeHeap::mark_segmap_as_used() is THE performance hog. 40%
> of all calls have to mark more than 16k segment map entries (in the
> unoptimized case). Basically all of these calls convert to len=1 calls with
> the optimization turned on. Note that during FreeBlock joining, the segment
> count is forced to 1(one). No wonder the time spent in
> CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test
> runtime) to <100msec.
>
> CodeHeap::add_to_freelist(), on the other hand, is barely observable.
> The average free list length is two elements, making even linear search
> really quick.
>
>
> StressCodeCacheTest
> ===================
> With a 1GB CodeCache, this test runs into a 12 min timeout, set by our
> internal test environment. Scaling back to 300MB prevents the test from
> timing out.
>
> For this test, CodeHeap::mark_segmap_as_used() is not a factor. Of
> 200,000 calls, only a few (fewer than 3%) had to process a block consisting
> of more than 16 segments. Note that during FreeBlock joining, the segment
> count is forced to 1 (one).
>
> Another method is popping up as performance hog instead:
> CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime
> (before optimization) are spent in this method, for just 160,000 calls. The
> test seems to create a long list of non-contiguous free blocks (around
> 5,500 on average). This list is linearly scanned to find the insert point
> for the free block at hand.
>
> Suffering as well from the long free block list is
> CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000
> calls.
>
>
> SPECjvm2008 suite
> =================
> With respect to the task at hand, this is a well-behaved test suite.
> Timing shows some before/after difference, but nothing spectacular. The
> measurements do not provide evidence of a performance bottleneck.
>
>
> There were some minor adjustments to the code. Unused code blocks have
> been removed as well. I have therefore created a new webrev. You can find
> it here:
>    http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/
>
> Thanks for investing your time!
> Lutz
>
>
> On 21.10.19, 15:06, "Andrew Dinn" <adinn at redhat.com> wrote:
>
>     Hi Lutz,
>
>     On 21/10/2019 13:37, Schmidt, Lutz wrote:
>     > I understand what you are interested in. And I was hoping to be able
>     > to provide some (first) numbers by today. Unfortunately, the
>     > measurement code I activated last Friday was buggy and blew most of
>     > the tests I had hoped to run over the weekend.
>     >
>     > I will take your modified test and run it with and without my
>     > optimization. In parallel, I will try to generate some (non-random)
>     > numbers for other tests.
>     >
>     > I'll be back as soon as I have results.
>
>     Thanks for trying the test and also for deriving some call stats from a
>     real example. I'm keen to see how much your patch improves things.
>
>     regards,
>
>
>     Andrew Dinn
>     -----------
>     Senior Principal Software Engineer
>     Red Hat UK Ltd
>     Registered in England and Wales under Company Registration No. 03798903
>     Directors: Michael Cunningham, Michael ("Mike") O'Neill
>
>
>
>
>
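
A simplified model can show why forcing the segment count to 1 during FreeBlock joining is enough to keep lookups correct. It assumes, as a hypothetical simplification rather than HotSpot's actual CodeHeap code, that each segment map entry stores how many segments to hop back toward its block header (0 marks the header, 0xFF an unused segment). The names SegMap, mark_block, find_block_start and join_with_previous are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified segment map (not HotSpot's CodeHeap code).
// Entry 0 marks the first segment of a block; a value k in 1..kMaxHop means
// "hop back k segments"; kFree marks a segment that belongs to no block.
struct SegMap {
  static constexpr uint8_t kFree = 0xFF;
  static constexpr uint8_t kMaxHop = 0xFE;
  std::vector<uint8_t> map;

  explicit SegMap(size_t segments) : map(segments, kFree) {}

  // Mark segments [beg, end) as one block. Cost is O(end - beg).
  void mark_block(size_t beg, size_t end) {
    for (size_t i = beg; i < end; ++i) {
      size_t d = i - beg;  // distance to the block header
      map[i] = static_cast<uint8_t>(d < kMaxHop ? d : kMaxHop);
    }
  }

  // Follow hop distances back until the header entry (value 0) is reached.
  size_t find_block_start(size_t seg) const {
    assert(map[seg] != kFree);
    while (map[seg] > 0) {
      seg -= map[seg];
    }
    return seg;
  }

  // Merge the block whose header is at `second` into the block ending right
  // before it. Only one entry changes: the old header now hops back one
  // segment into the preceding block, whose entries lead to its own header.
  void join_with_previous(size_t second) {
    assert(second > 0 && map[second] == 0 && map[second - 1] != kFree);
    map[second] = 1;
  }
};
```

In this sketch the join touches a single entry instead of the whole merged block; the price is that later lookups may take a few extra hops through the former header.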

From thomas.stuefe at gmail.com  Mon Nov  4 17:04:41 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Mon, 4 Nov 2019 18:04:41 +0100
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
 <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
Message-ID: <CAA-vtUzkWGRjFw7WFPX7N8sH6_pTWvyMegwV=LQn-7eAi5Ve5Q@mail.gmail.com>

Hi Andrew, Lutz,

I agree with Lutz in this case. I think the patch complexity could be
reduced if needed (see my mail to Lutz) if complexity is a concern, but I
like most of these changes and the comment improvements are nice.

Just my 5 cent.

Cheers, Thomas


On Mon, Nov 4, 2019 at 4:36 PM Schmidt, Lutz <lutz.schmidt at sap.com> wrote:

> Hi Andrew,
>
> thank you for your thoughts. I do not agree with your conclusion, though.
>
> There are two bottlenecks in the CodeHeap management code. One is in
> CodeHeap::mark_segmap_as_used(), uncovered by OverflowCodeCacheTest.java.
> The other is in CodeHeap::add_to_freelist(), uncovered by
> StressCodeCacheTest.java.
>
> Both bottlenecks are tackled by the recommended changeset.
>
> CodeHeap::mark_segmap_as_used() is no longer O(n*n) for the critical
> "FreeBlock-join" case. It actually is O(1) now. The time reduction from >80
> seconds to just a few milliseconds is proof of that statement.
>
> CodeHeap::add_to_freelist() is still O(n*n), with n being the free list
> length. But the kick-in point of the non-linearity could be significantly
> shifted towards larger n. The time reduction from approx. 8 seconds to 160
> milliseconds supports this statement.
>
> I agree it would be helpful to have a "real-world" example showing some
> improvement. Providing such evidence is hard, though. I could instrument
> the code and print some values from time to time. It's certain this
> additional output will mess up success/failure decisions in our test
> environment. Not sure everybody likes that. But I will give it a try and
> take the hits. This will be a multi-day effort.
>
> On a general note, I am always uncomfortable knowing of a O(n*n) effort,
> in particular when it could be removed or at least tamed considerably.
> Experience tells (at least to me) that, at some point in time, n will be
> large enough to hurt.
>
> I'll be back.
>
> Thanks,
> Lutz
>
>
> On 04.11.19, 11:08, "Andrew Dinn" <adinn at redhat.com> wrote:
>
>     Hi Lutz,
>
>     I'll summarize my thoughts here rather than answer point by point.
>
>     The patch successfully addresses the worst case performance but it
>     seems to me extremely unlikely that we will see anything that
>     approaches that case in real applications. So, that doesn't argue for
>     pushing the patch.
>
>     The patch does not seem to make a significant difference to the stress
>     test. This test is also not necessarily 'representative' of real cases
>     but it is much more likely to be so than the worst case test. That
>     suggests to me that the current patch is perhaps not worth pursuing (it
>     ain't really broke so ...).  Especially so given that it is not
>     possible to distinguish any benefit when running the Spec benchmark
>     apps. One could argue that the patch looks like it will do no harm and
>     may do good in pathological cases but that's not really good enough
>     reason to make a change. We really need evidence that this is worth
>     doing.
>
>     The free list 'search bottleneck' certainly looks like a more promising
>     problem to tackle than the 'merge problem'. However, once again this
>     'problem' may just be an artefact of running this specific test rather
>     than anything that might happen in real life.
>
>     I think the only way to find out for sure whether the current patch or
>     a patch that addresses the 'search bottleneck' is going to be
>     beneficial is to instrument the JVM to record traces for code-cache
>     use from real apps and then replay allocations/frees based on those
>     traces to see what difference a patch makes and how much this might
>     help the overall execution time.
>
>     regards,
>
>
>     Andrew Dinn
>     -----------
>
>     On 31/10/2019 16:55, Schmidt, Lutz wrote:
>     > Hi Andrew, (and hi to the interested crowd),
>     >
>     > Please accept my apologies for taking so long to get back.
>     >
>     > These tests (OverflowCodeCacheTest and StressCodeCacheTest) were
>     > causing me quite some headaches. Some layer between me and the test
>     > prevents the vm (in particular: the VMThread) from terminating
>     > normally. The final output from my time measurements is therefore not
>     > generated or thrown away. Adding to that were some test machine
>     > unavailabilities and a bug in my measurement code, causing crashes.
>     >
>     > Anyway, I added some on-the-fly output, printing the timer values
>     > after 10k measurement intervals. This reveals some interesting,
>     > additional facts about the tests and the CodeHeap management methods.
>     > For detailed numbers, refer to the files attached to the bug
>     > (https://bugs.openjdk.java.net/browse/JDK-8231460). For even more
>     > detail, I can provide the jtr files on request.
>     >
>     >
>     > OverflowCodeCacheTest
>     > =====================
>     > This test runs (in my setup) with a 1GB CodeCache.
>     >
>     > For this test, CodeHeap::mark_segmap_as_used() is THE performance
>     > hog. 40% of all calls have to mark more than 16k segment map entries
>     > (in the not optimized case). Basically all of these calls convert to
>     > len=1 calls with the optimization turned on. Note that during
>     > FreeBlock joining, the segment count is forced to 1(one). No wonder
>     > the time spent in CodeHeap::mark_segmap_as_used() collapses from >80sec
>     > (half of the test runtime) to <100msec.
>     >
>     > CodeHeap::add_to_freelist() on the other hand, is almost not
>     > observable. Average free list length is at two elements, making even
>     > linear search really quick.
>     >
>     >
>     > StressCodeCacheTest
>     > ===================
>     > With a 1GB CodeCache, this test runs into a 12 min timeout, set by
>     > our internal test environment. Scaling back to 300MB prevents the
>     > test from timing out.
>     >
>     > For this test, CodeHeap::mark_segmap_as_used() is not a factor. From
>     > 200,000 calls, only a few (less than 3%) had to process a block
>     > consisting of more than 16 segments. Note that during FreeBlock
>     > joining, the segment count is forced to 1(one).
>     >
>     > Another method is popping up as performance hog instead:
>     > CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test
>     > runtime (before optimization) are spent in this method, for just
>     > 160,000 calls. The test seems to create a long list of non-contiguous
>     > free blocks (around 5,500 on average). This list is linearly scanned
>     > to find the insert point for the free block at hand.
>     >
>     > Suffering as well from the long free block list is
>     > CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000
>     > calls.
>     >
>     >
>     > SPECjvm2008 suite
>     > =================
>     > With respect to the task at hand, this is a well-behaved test suite.
>     > Timing shows some before/after difference, but nothing spectacular.
>     > The measurements do not provide evidence of a performance bottleneck.
>     >
>     >
>     > There were some minor adjustments to the code. Unused code blocks
>     > have been removed as well. I have therefore created a new webrev. You
>     > can find it here:
>     >    http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/
>     >
>     > Thanks for investing your time!
>     > Lutz
>     >
>     >
>     > On 21.10.19, 15:06, "Andrew Dinn" <adinn at redhat.com> wrote:
>     >
>     >     Hi Lutz,
>     >
>     >     On 21/10/2019 13:37, Schmidt, Lutz wrote:
>     >     > I understand what you are interested in. And I was hoping to be
>     >     > able to provide some (first) numbers by today. Unfortunately, the
>     >     > measurement code I activated last Friday was buggy and blew most
>     >     > of the tests I had hoped to run over the weekend.
>     >     >
>     >     > I will take your modified test and run it with and without my
>     >     > optimization. In parallel, I will try to generate some
>     >     > (non-random) numbers for other tests.
>     >     >
>     >     > I'll be back as soon as I have results.
>     >
>     >     Thanks for trying the test and also for deriving some call stats
>     >     from a real example. I'm keen to see how much your patch improves
>     >     things.
>     >
>     >     regards,
>     >
>     >
>     >     Andrew Dinn
>     >     -----------
>     >
>     >
>     >
>     >
>
>
>
>
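
The linear scan described for CodeHeap::add_to_freelist() can be illustrated with a simplified, hypothetical model of an address-ordered free list. It is not the HotSpot implementation; the FreeBlock struct (blocks identified by first segment index and length) and the function below are illustrative. Each insertion walks the list until it reaches the first block at a higher address, so with thousands of non-contiguous free blocks a single call may visit thousands of nodes, and n insertions cost O(n*n) in total.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical free block: first segment index, length in segments, next link.
struct FreeBlock {
  size_t beg;
  size_t len;
  FreeBlock* next;
};

// Insert `b` into the address-sorted list headed by `*head`, merging it with
// a contiguous successor and/or predecessor. Returns the number of list nodes
// walked past, which makes the O(n) scan cost per call visible.
// Merged-away nodes are simply unlinked; the caller owns their storage.
size_t add_to_freelist(FreeBlock** head, FreeBlock* b) {
  assert(b != nullptr && b->len > 0);
  FreeBlock* prev = nullptr;
  FreeBlock* cur = *head;
  size_t visited = 0;
  while (cur != nullptr && cur->beg < b->beg) {  // linear search for insert point
    prev = cur;
    cur = cur->next;
    ++visited;
  }
  // Link b between prev and cur.
  b->next = cur;
  if (prev == nullptr) {
    *head = b;
  } else {
    prev->next = b;
  }
  // Merge with the successor if the two blocks are contiguous.
  if (cur != nullptr && b->beg + b->len == cur->beg) {
    b->len += cur->len;
    b->next = cur->next;
  }
  // Merge with the predecessor if the two blocks are contiguous.
  if (prev != nullptr && prev->beg + prev->len == b->beg) {
    prev->len += b->len;
    prev->next = b->next;
  }
  return visited;
}
```

Merging keeps the list short only when freed blocks happen to be adjacent; a workload that leaves many small gaps, as StressCodeCacheTest apparently does, keeps every call paying for the walk.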

From igor.veresov at oracle.com  Mon Nov  4 17:57:13 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 4 Nov 2019 09:57:13 -0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
 <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
 <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>
Message-ID: <DDB215A6-AFA2-4D06-A0A1-37DC3B748BB9@oracle.com>

This looks good to me.

igor


> On Nov 4, 2019, at 1:33 AM, Jie Fu <fujie at loongson.cn> wrote:
> 
> Hi Aleksey and Tobias,
> 
> Thanks for your review and valuable comments.
> 
> I'm sorry to mention that Igor is teaching me how to fix the bug off the list these days.
> 
> What do you think of this version?
>   http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
> 
> I prefer webrev.01.
> 
> Thanks a lot.
> Best regards,
> Jie
> 
> 
> On 2019/11/4 4:53, Aleksey Shipilev wrote:
>> On 11/2/19 10:29 AM, Jie Fu wrote:
>>> Hi all,
>>> 
>>> May I get reviews for this small fix?
>>> 
>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>> Looks fine to me.
>> 
>> The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
>> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>> 
> 


From shade at redhat.com  Mon Nov  4 18:02:15 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 4 Nov 2019 19:02:15 +0100
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
 <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
 <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>
Message-ID: <268b6cd8-7b66-5e5a-4933-e67197ae34ec@redhat.com>

On 11/4/19 10:33 AM, Jie Fu wrote:
> What do you think of this version?
>   http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
Alright, this looks fine too.

-- 
Thanks,
-Aleksey


From dean.long at oracle.com  Mon Nov  4 18:58:50 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 4 Nov 2019 10:58:50 -0800
Subject: RFR(S): 8233081: C1: PatchingStub for field access copies too much
In-Reply-To: <VI1PR0201MB24798B4FFDCAEE55104A74279A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24798B4FFDCAEE55104A74279A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <db231bce-9e5d-bccb-61c8-5f098b028293@oracle.com>

Looks good.  If I am reading it right, you also fixed a (latent?) bug in
Sparc by changing the max size return from 7 words to 8 words.

dl

On 10/30/19 8:38 AM, Doerr, Martin wrote:
> Hi,
>
> I'd like to fix an issue in C1's PatchingStub implementation for "access_field_id".
> We had noticed that the code in the template exceeded the 255 byte limitation when switching on VerifyOops on PPC64.
> I'd like to improve the situation for all platforms.
>
> More detailed bug description:
> https://bugs.openjdk.java.net/browse/JDK-8233081
>
> I need a function to determine how many bytes are needed for the NativeMovRegMem.
> x86 has next_instruction_address() which could in theory be used, but I noticed that it's dead code which is no longer correct.
> Is it ok to remove it?
> I'd also like to remove the constant instruction_size from NativeMovRegMem because it's not constant.
> I'd prefer to introduce num_bytes_to_end_of_patch() for the purpose of determining how many bytes to copy for the "access_field_id" PatchingStub.
> We can factor out the offset computation from offset() and set_offset() and reuse it. This enforces consistency.
>
> Webrev:
> http://cr.openjdk.java.net/~mdoerr/8233081_C1_access_field_patching/webrev.00/
>
> Please review.
>
> Best regards,
> Martin
>


From igor.ignatyev at oracle.com  Mon Nov  4 21:33:31 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 4 Nov 2019 13:33:31 -0800
Subject: RFR(S) : 8233496 : AOT tests failures with
 'java.lang.RuntimeException: Failed to find sun/hotspot/WhiteBox.class'
Message-ID: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
> 42 lines changed: 0 ins; 6 del; 36 mod;

Hi all,

could you please review this small patch for compiler/aot tests? the tests run 'ClassFileInstaller sun.hotspot.WhiteBox' w/o having any preceding actions which build s.h.WhiteBox class, the fix adds sun.hotspot.WhiteBox to the explicit build action and also removes unneeded classes (compiler.aot.AotCompiler and compiler.calls.common.InvokeDynamicPatcher as they are built implicitly by @run) from it.

JBS: https://bugs.openjdk.java.net/browse/JDK-8233496
webrev: http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
testing: 
 - all compiler/aot tests together on Oracle platforms
 - each changed test separately on linux-x64

Thanks,
-- Igor
  
  
From vladimir.kozlov at oracle.com  Mon Nov  4 21:45:00 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 4 Nov 2019 13:45:00 -0800
Subject: RFR(S) : 8233496 : AOT tests failures with
 'java.lang.RuntimeException: Failed to find sun/hotspot/WhiteBox.class'
In-Reply-To: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com>
References: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com>
Message-ID: <3845ADEC-2657-421F-B35F-C12962A31452@oracle.com>

Looks good.

Thanks
Vladimir

> On Nov 4, 2019, at 1:33 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
>> 42 lines changed: 0 ins; 6 del; 36 mod;
> 
> Hi all,
> 
> could you please review this small patch for compiler/aot tests? the tests run 'ClassFileInstaller sun.hotspot.WhiteBox' w/o having any preceding actions which build s.h.WhiteBox class, the fix adds sun.hotspot.WhiteBox to the explicit build action and also removes unneeded classes (compiler.aot.AotCompiler and compiler.calls.common.InvokeDynamicPatcher as they are built implicitly by @run) from it.
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8233496
> webrev: http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
> testing: 
> - all compiler/aot tests together on Oracle platforms
> - each changed test separately on linux-x64
> 
> Thanks,
> -- Igor
> 
> 
> 


From gromero at linux.vnet.ibm.com  Mon Nov  4 22:32:39 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 4 Nov 2019 19:32:39 -0300
Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More
 generic vector CRC implementation (v2)
In-Reply-To: <VI1PR0201MB247951F074AA4A598CBF8B8B9A6A0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com>
 <VI1PR0201MB2479B9E369C33E4D46B8F9149A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com>
 <VI1PR0201MB247951F074AA4A598CBF8B8B9A6A0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <b5e148b7-1493-24f2-ecf2-89ce72f6df38@linux.vnet.ibm.com>

Hello Martin,

On 10/24/2019 07:17 AM, Doerr, Martin wrote:
> Hi Gustavo,
> 
> I think removing invertCRC is an unnecessary manual change.
> We should minimize that as far as possible. They may create merge conflicts for future backports.

Thanks a lot for the review.

I agree I should minimize the changes as far as possible. I added back invertCRC
and tried to follow your advice, so the final clean-up patch is almost similar
to the one found on jdk/jdk, for instance.

Please find v2 for the patchset below. v2 changes affect only 3/4 and 4/4.


[PPC64] More generic vector CRC implementation                           (1/4)
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/

v2:
- Adapt file names to OpenJDK 8u
- Remove CRC32C part, leaving only CRC32 part, since OpenJDK 8u has no CRC32C
- Add Assembler::add_const_optimized() from "8077838: Recent developments for ppc" [0]
- Fix vpermxor() opcode, replacing VPMSUMW_OPCODE by VPERMXOR_OPCODE,
   according to the fix in "8190781: ppc64 + s390: Fix CriticalJNINatives" [1]
- Adapt signatures for the following functions and their callers, according to
   "8175369: [ppc] Provide intrinsic implementation for CRC32C" [2]:
    a. MacroAssembler::update_byteLoop_crc32(), removing 'invertCRC' parameter
    b. MacroAssembler::kernel_crc32_1word(), adding 'invertCRC' parameter


[PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4)
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/

v2:
- Adapt file names to OpenJDK 8u
- Remove CRC32C code


[PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/   (3/4)

v2:
- Remove CRC32C code, keeping is_crc32c in crc32(), code related to is_crc32c
   and invertCRC, like code in kernel_crc32_vpmsum(), and not touching stub code
   mark in generate_CRC32_updateBytes() to avoid merge conflicts in future
   backports.


[PPC64] Cleanup non-vector version of CRC32
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/   (4/4)

v2:
- Add {BIG,LITTLE}_ENDIAN_ONLY to src/share/vm/utilities/macros.hpp
- Add kernel_crc32_singleByteReg from change 8175369 [2] as the clean-up uses it
   in InterpreterGenerator::generate_CRC32_update_entry().


--

Best regards,
Gustavo

[0] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/88847a1b3718
[1] http://hg.openjdk.java.net/jdk/jdk/rev/5a69ba3a4fd1#l1.7
[2] https://bugs.openjdk.java.net/browse/JDK-8175369
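
For readers unfamiliar with the invertCRC parameter mentioned above: assuming it denotes the customary CRC-32 pre- and post-conditioning (complementing the running value on entry and exit), the following bitwise, table-less sketch shows what such a flag changes. It is not the PPC64 vector kernel, and crc32_update is a hypothetical name. It uses the reflected polynomial 0xEDB88320, the standard CRC-32 also computed by java.util.zip.CRC32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise reflected CRC-32 (polynomial 0xEDB88320). When `invert` is true,
// the running value is complemented on entry and on exit, so a finished CRC
// can be passed back in as `crc` to continue over more data.
uint32_t crc32_update(uint32_t crc, const uint8_t* buf, size_t len, bool invert) {
  if (invert) crc = ~crc;
  for (size_t i = 0; i < len; ++i) {
    crc ^= buf[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial if the bit shifted out was set.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  if (invert) crc = ~crc;
  return crc;
}
```

Without the inversion, the caller has to start from 0xFFFFFFFF and complement the final result itself; with it, partial results compose directly.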

From david.holmes at oracle.com  Mon Nov  4 23:18:48 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 09:18:48 +1000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB2479A25B5B2E26D79ADDE7BF9A690@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>

On 4/11/2019 9:12 pm, Doerr, Martin wrote:
> Hi all,
> 
> @Dean
>> changing can_remove() to CompileBroker::can_remove()?
> Yes. That would be an option.
> 
> @Kim, David
> I think there's another problem with this implementation.
> It introduces a use-after-free pattern due to concurrency.
> Compiler threads may still read the oops from the handles after one of them has called destroy_global until next safepoint. It doesn't matter which values they get in this case, but the VM should not crash. I believe that OopStorage allows freeing storage without safepoints, so this may be unsafe. Right?

I don't understand what you mean. If a compiler thread holds an oop, any 
oop, it must hold it in a Handle to ensure it can't be gc'd.

David

> If so, I think replacing the oops in the handles (and keeping the handles alive) would be better. And also much more simple.
> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: dean.long at oracle.com <dean.long at oracle.com>
>> Sent: Samstag, 2. November 2019 08:36
>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> Hi Martin,
>>
>> On 10/30/19 3:18 AM, Doerr, Martin wrote:
>>> Hi David,
>>>
>>>> I don't think factoring out CompileBroker::clear_compiler2_object when
>>>> it is only used once was warranted, but that's call for compiler team to
>>>> make.
>>> I did that because _compiler2_objects is private and there's currently no
>>> setter available.
>>> But let's see what the compiler folks think.
>>
>> how about changing can_remove() to CompileBroker::can_remove()? Then you
>> can access _compiler2_objects directly, right?
>>
>> dl
>>>> Otherwise changes seem fine and I have noted the use of the
>>>> MutexUnlocker as per your direct email.
>>> Thanks a lot for reviewing. It was not a trivial one.
>>>
>>> You had noticed an incorrect usage of the CHECK macro. I've created a new
>>> bug for that:
>>> https://bugs.openjdk.java.net/browse/JDK-8233193
>>> Would be great if you could take a look if that's what you meant and make
>>> adaptations if needed.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Mittwoch, 30. Oktober 2019 05:47
>>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>>>> <kim.barrett at oracle.com>
>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
>>>> David Holmes <David.Holmes at oracle.com>
>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>
>>>> Hi Martin,
>>>>
>>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
>>>>> Hi David and Kim,
>>>>>
>>>>> I think it's easier to talk about code. So here's a new webrev:
>>>>>
>>>>
>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
>>>>
>>>> I don't think factoring out CompileBroker::clear_compiler2_object when
>>>> it is only used once was warranted, but that's call for compiler team to
>>>> make. Otherwise changes seem fine and I have noted the use of the
>>>> MutexUnlocker as per your direct email.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> @Kim:
>>>>> Thanks for looking at the handle related parts. It's ok if you don't want to
>>>>> be a reviewer of the whole change.
>>>>>> I think it's weird that can_remove() is a predicate with optional side
>>>>>> effects.  I think it would be simpler to have it be a pure predicate,
>>>>>> and have the one caller with do_it = true perform the updates.  That
>>>>>> should include NULLing out the handle pointer (perhaps debug-only, but
>>>>>> it doesn't cost much to cleanly maintain the data structure).
>>>>> Nevertheless, it has the advantage that it enforces the update to be
>>>>> consistent.
>>>>> A caller could use it without holding the lock or mess it up otherwise.
>>>>> In addition, I don't want to change that as part of this fix.
>>>>>
>>>>>> So far as I can tell, THREAD == NULL here.
>>>>> This is a very tricky part (not my invention):
>>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which sets
>>>>> __the_thread__ to Thread::current().
>>>>> I don't want to publish my opinion about this.
>>>>>
>>>>> @David:
>>>>> Seems like this option is preferred over option 3
>>>>> (possibly_add_compiler_threads part of webrev.02 and leave the
>>>>> initialization as is).
>>>>> So when you're ok with it, I'll request a 2nd review from the compiler
>>>>> folks (I should change the subject to contain RFR).
>>>>> Thanks,
>>>>> Martin
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>> Sent: Montag, 28. Oktober 2019 05:04
>>>>>> To: Kim Barrett <kim.barrett at oracle.com>
>>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
>>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>
>>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
>>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes <david.holmes at oracle.com> wrote:
>>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
>>>>>>>>> Hi Kim,
>>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
>>>>>>>>> not seen how to allocate a global handle and add the oop later.
>>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of that.
>>>>>>>>> So I can imagine 3 ways to implement it:
>>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
>>>>>>>>> added that to
>>>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
>>>>>>>>> We may want to improve that further by setting the handle pointer to
>>>>>>>>> NULL and asserting that it is NULL before adding the new one.
>>>>>>>>> I had been concerned about NULLs in the array, but looks like the
>>>>>>>>> existing code can deal with that.
>>>>>>>> I think it would be cleaner to both destroy the global handle and
>>>>>>>> NULL it in the array at the same time.
>>>>>>>> This comment
>>>>>>>>
>>>>>>>> 325         // Old j.l.Thread object can die here.
>>>>>>>>
>>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via
>>>>>>>> ct->threadObj() so can't "die" until that is also cleared during the
>>>>>>>> actual termination process.
>>>>>>> I think if there is such a thread here that it can't die, because the
>>>>>>> death predicate (the can_remove stuff) won't see that old thread as
>>>>>>> the last thread in _compiler2_objects.  That's what I meant by this:
>>>>>>>
>>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>>>>>>> I also think that here:
>>>>>>>>
>>>>>>>> 947         jobject thread_handle = JNIHandles::make_global(thread_oop);
>>>>>>>> 948         _compiler2_objects[i] = thread_handle;
>>>>>>>>
>>>>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a valid
>>>>>>>> assertion then I think there are other problems.
>>>>>>> I think either that comment about an old thread is wrong (and the NULL
>>>>>>> assertion I suggested is okay), or I think the whole mechanism here
>>>>>>> has problems.  Or at least I was unable to figure out how it could
>>>>>>> work...
>>>>>>>
>>>>>> I'm not following sorry. You can't assert NULL unless it's actually set
>>>>>> to NULL which it presently isn't. But it could be set NULL as Martin
>>>>>> suggested:
>>>>>>
>>>>>> "We may want to improve that further by setting the handle pointer to
>>>>>> NULL and asserting that it is NULL before adding the new one."
>>>>>>
>>>>>> and which I also supported. But that aside once the delete_global has
>>>>>> been called that JNIHandle no longer references the j.l.Thread that it
>>>>>> did, at which point it is only reachable via the threadObj() of the
>>>>>> CompilerThread.
>>>>>>
>>>>>> David
> 
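
The suggestion discussed above, to destroy a global handle and NULL its array slot at the same time and to assert that the slot is NULL before a new handle is stored, can be modeled in a single-threaded sketch. The Handle type and the CompilerObjectSlots class are hypothetical stand-ins, not the JNIHandles or CompileBroker API, and the sketch does not address the concurrent-reader concern raised in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a global handle that keeps an object alive.
struct Handle {
  int referent;
};

// Hypothetical table of per-compiler-thread handles. release() destroys the
// handle and clears the slot in one step, so no stale pointer is left behind;
// install() asserts that the slot was released first.
class CompilerObjectSlots {
 public:
  explicit CompilerObjectSlots(size_t n) : slots_(n) {}  // all slots start empty

  void install(size_t i, int referent) {
    assert(slots_[i] == nullptr && "slot must be released before reuse");
    slots_[i].reset(new Handle{referent});
  }

  void release(size_t i) {
    slots_[i].reset();  // destroys the handle and leaves nullptr in the slot
  }

  bool is_empty(size_t i) const { return slots_[i] == nullptr; }

 private:
  std::vector<std::unique_ptr<Handle>> slots_;
};
```

Pairing destruction with clearing in one function means the NULL assertion in install() stays valid by construction, which is the invariant Kim's suggested assert relies on.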

From fujie at loongson.cn  Tue Nov  5 01:43:02 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 5 Nov 2019 09:43:02 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <DDB215A6-AFA2-4D06-A0A1-37DC3B748BB9@oracle.com>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
 <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
 <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>
 <DDB215A6-AFA2-4D06-A0A1-37DC3B748BB9@oracle.com>
Message-ID: <28b9a36b-b2ea-8186-0750-65685238cc36@loongson.cn>

Hi Igor,

Thanks for your help and review.

Updated: http://cr.openjdk.java.net/~jiefu/8233429/webrev.02/
 - Added the reviewers to it.

Hope you can sponsor it.

Thanks a lot.
Best regards,
Jie

On 2019/11/5 1:57, Igor Veresov wrote:
> This looks good to me.
>
> igor
>
>
>
>> On Nov 4, 2019, at 1:33 AM, Jie Fu <fujie at loongson.cn 
>> <mailto:fujie at loongson.cn>> wrote:
>>
>> Hi Aleksey and Tobias,
>>
>> Thanks for your review and valuable comments.
>>
>> I'm sorry to mention that Igor is teaching me how to fix the bug off 
>> the list these days.
>>
>> What do you think of this version?
>> http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
>>
>> I prefer webrev.01.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>>
>> On 2019/11/4 4:53, Aleksey Shipilev wrote:
>>> On 11/2/19 10:29 AM, Jie Fu wrote:
>>>> Hi all,
>>>>
>>>> May I get reviews for this small fix?
>>>>
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233429
>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>>> Looks fine to me.
>>>
>>> The alternative is to stub out CompilationModeFlag::*() definitions 
>>> under TIERED define, but that
>>> would be more awkward than effectively using the "default" mode for 
>>> minimal and zero VMs.
>>>
>>
>

From fujie at loongson.cn  Tue Nov  5 01:44:33 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 5 Nov 2019 09:44:33 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <268b6cd8-7b66-5e5a-4933-e67197ae34ec@redhat.com>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
 <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
 <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>
 <268b6cd8-7b66-5e5a-4933-e67197ae34ec@redhat.com>
Message-ID: <dc935d7a-d191-4ab0-5f6c-b8eb0df03ef7@loongson.cn>

Thanks Aleksey for your review.

On 2019/11/5 2:02 AM, Aleksey Shipilev wrote:
> On 11/4/19 10:33 AM, Jie Fu wrote:
>> What do you think of this version?
>>   http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
> Alright, this looks fine too.
>


From martin.doerr at sap.com  Tue Nov  5 08:40:02 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Nov 2019 08:40:02 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB2479A25B5B2E26D79ADDE7BF9A690@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
Message-ID: <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi David

> I don't understand what you mean. If a compiler thread holds an oop, any
> oop, it must hold it in a Handle to ensure it can't be gc'd.

The problem is not related to GC.
My change introduces destroy_global for the handles. This means that the OopStorage entry which held the oop can get freed.
However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread. Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
I can't see how OopStorage supports reading from handles which were freed by destroy_global.

I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.

Best regards,
Martin
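
A minimal C++ sketch of the alternative Martin proposes in this thread: keep each handle alive and replace the oop stored in it, instead of calling destroy_global and allocating a new handle. It is a toy model, not HotSpot code: Object, Oop, GlobalSlot, g_compiler_objects, read_compiler_object() and replace_compiler_object() are hypothetical names, and neither GC nor OopStorage is modeled. It only shows why concurrent readers stay safe: the slot memory they dereference is never freed, so they observe an old value, a new value or nullptr, never released storage.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Toy model (hypothetical, not HotSpot code): an "oop" is a plain pointer and
// a global handle is a long-lived slot that stores one.
struct Object { int id; };
using Oop = Object*;

struct GlobalSlot {
  std::atomic<Oop> value{nullptr};
};

// Hypothetical stand-in for _compiler2_objects. The slots are allocated once
// and never freed, so any thread may dereference them at any time.
constexpr std::size_t kMaxCompilerThreads = 4;
GlobalSlot g_compiler_objects[kMaxCompilerThreads];

// Reader side: another compiler thread inspects the current object. It may
// observe an old value or nullptr, but never freed slot memory.
Oop read_compiler_object(std::size_t i) {
  return g_compiler_objects[i].value.load(std::memory_order_acquire);
}

// Writer side: instead of destroying the handle and creating a new one,
// overwrite the oop inside the existing slot.
void replace_compiler_object(std::size_t i, Oop new_obj) {
  g_compiler_objects[i].value.store(new_obj, std::memory_order_release);
}
```

In real HotSpot code the store into a handle would presumably have to go through the GC-aware Access API rather than a plain std::atomic; that part is deliberately left out of the sketch.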


> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Tuesday, 5 November 2019 00:19
> To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
> Barrett <kim.barrett at oracle.com>
> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> On 4/11/2019 9:12 pm, Doerr, Martin wrote:
> > Hi all,
> >
> > @Dean
> >> changing can_remove() to CompileBroker::can_remove()?
> > Yes. That would be an option.
> >
> > @Kim, David
> > I think there's another problem with this implementation.
> > It introduces a use-after-free pattern due to concurrency.
> > Compiler threads may still read the oops from the handles after one of
> > them has called destroy_global, until the next safepoint. It doesn't
> > matter which values they get in this case, but the VM should not crash.
> > I believe that OopStorage allows freeing storage without safepoints, so
> > this may be unsafe. Right?
> 
> I don't understand what you mean. If a compiler thread holds an oop, any
> oop, it must hold it in a Handle to ensure it can't be gc'd.
> 
> David
> 
> > If so, I think replacing the oops in the handles (and keeping the handles
> > alive) would be better, and also much simpler.
> >
> > Best regards,
> > Martin
> >
> >
> >> -----Original Message-----
> >> From: dean.long at oracle.com <dean.long at oracle.com>
> >> Sent: Saturday, 2 November 2019 08:36
> >> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> >> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>
> >> Hi Martin,
> >>
> >> On 10/30/19 3:18 AM, Doerr, Martin wrote:
> >>> Hi David,
> >>>
> >>>> I don't think factoring out CompileBroker::clear_compiler2_object when
> >>>> it is only used once was warranted, but that's a call for the compiler
> >>>> team to make.
> >>> I did that because _compiler2_objects is private and there's currently no
> >>> setter available.
> >>> But let's see what the compiler folks think.
> >>
> >> How about changing can_remove() to CompileBroker::can_remove()? Then
> >> you can access _compiler2_objects directly, right?
> >>
> >> dl
> >>>> Otherwise changes seem fine and I have noted the use of the
> >>>> MutexUnlocker as per your direct email.
> >>> Thanks a lot for reviewing. It was not a trivial one.
> >>>
> >>> You had noticed an incorrect usage of the CHECK macro. I've created a new
> >>> bug for that:
> >>> https://bugs.openjdk.java.net/browse/JDK-8233193
> >>> It would be great if you could take a look to check that's what you meant
> >>> and make adaptations if needed.
> >>>
> >>> Best regards,
> >>> Martin
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Wednesday, 30 October 2019 05:47
> >>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> >>>> <kim.barrett at oracle.com>
> >>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
> >>>> David Holmes <David.Holmes at oracle.com>
> >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>>>
> >>>> Hi Martin,
> >>>>
> >>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
> >>>>> Hi David and Kim,
> >>>>>
> >>>>> I think it's easier to talk about code. So here's a new webrev:
> >>>>>
> >>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
> >>>>
> >>>> I don't think factoring out CompileBroker::clear_compiler2_object when
> >>>> it is only used once was warranted, but that's a call for the compiler
> >>>> team to make. Otherwise the changes seem fine and I have noted the use
> >>>> of the MutexUnlocker as per your direct email.
> >>>>
> >>>> Thanks,
> >>>> David
> >>>> -----
> >>>>
> >>>>> @Kim:
> >>>>> Thanks for looking at the handle-related parts. It's ok if you don't want
> >>>>> to be a reviewer of the whole change.
> >>>>>> I think it's weird that can_remove() is a predicate with optional side
> >>>>>> effects.  I think it would be simpler to have it be a pure predicate,
> >>>>>> and have the one caller with do_it = true perform the updates.  That
> >>>>>> should include NULLing out the handle pointer (perhaps debug-only, but
> >>>>>> it doesn't cost much to cleanly maintain the data structure).
> >>>>> Nevertheless, it has the advantage that it enforces a consistent update.
> >>>>> With a pure predicate, a caller could perform the update without holding
> >>>>> the lock or mess it up otherwise.
> >>>>> In addition, I don't want to change that as part of this fix.
> >>>>>
> >>>>>> So far as I can tell, THREAD == NULL here.
> >>>>> This is a very tricky part (not my invention):
> >>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which sets
> >>>>> __the_thread__ to Thread::current().
> >>>>> I don't want to publish my opinion about this.
> >>>>>
> >>>>> @David:
> >>>>> It seems this option is preferred over option 3
> >>>>> (possibly_add_compiler_threads part of webrev.02 and leave the
> >>>>> initialization as is).
> >>>>> So if you're ok with it, I'll request a 2nd review from the compiler
> >>>>> folks (I should change the subject to contain RFR).
> >>>>> Thanks,
> >>>>> Martin
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: David Holmes <david.holmes at oracle.com>
> >>>>>> Sent: Monday, 28 October 2019 05:04
> >>>>>> To: Kim Barrett <kim.barrett at oracle.com>
> >>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
> >>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> >>>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>>>>>
> >>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
> >>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes <david.holmes at oracle.com> wrote:
> >>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
> >>>>>>>>> Hi Kim,
> >>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
> >>>>>>>>> not seen how to allocate a global handle and add the oop later.
> >>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of that.
> >>>>>>>>> So I can imagine 3 ways to implement it:
> >>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
> >>>>>>>>> added that to
> >>>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
> >>>>>>>>> We may want to improve that further by setting the handle pointer
> >>>>>>>>> to NULL and asserting that it is NULL before adding the new one.
> >>>>>>>>> I had been concerned about NULLs in the array, but it looks like
> >>>>>>>>> the existing code can deal with that.
> >>>>>>>> I think it would be cleaner to both destroy the global handle and
> >>>>>>>> NULL it in the array at the same time.
> >>>>>>>> This comment
> >>>>>>>>
> >>>>>>>> 325         // Old j.l.Thread object can die here.
> >>>>>>>>
> >>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via
> >>>>>>>> ct->threadObj() so can't "die" until that is also cleared during the
> >>>>>>>> actual termination process.
> >>>>>>> I think that if there is such a thread here, it can't die, because the
> >>>>>>> death predicate (the can_remove stuff) won't see that old thread as
> >>>>>>> the last thread in _compiler2_objects.  That's what I meant by this:
> >>>>>>>
> >>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
> >>>>>>>> I also think that here:
> >>>>>>>>
> >>>>>>>> 947         jobject thread_handle = JNIHandles::make_global(thread_oop);
> >>>>>>>> 948         _compiler2_objects[i] = thread_handle;
> >>>>>>>>
> >>>>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a valid
> >>>>>>>> assertion then I think there are other problems.
> >>>>>>> I think either that comment about an old thread is wrong (and the NULL
> >>>>>>> assertion I suggested is okay), or I think the whole mechanism here
> >>>>>>> has problems.  Or at least I was unable to figure out how it could
> >>>>>>> work...
> >>>>>>>
> >>>>>> I'm not following, sorry. You can't assert NULL unless it's actually
> >>>>>> set to NULL, which it presently isn't. But it could be set to NULL as
> >>>>>> Martin suggested:
> >>>>>>
> >>>>>> "We may want to improve that further by setting the handle pointer to
> >>>>>> NULL and asserting that it is NULL before adding the new one."
> >>>>>>
> >>>>>> and which I also supported. But that aside, once the delete_global has
> >>>>>> been called, that JNIHandle no longer references the j.l.Thread that it
> >>>>>> did, at which point it is only reachable via the threadObj() of the
> >>>>>> CompilerThread.
> >>>>>>
> >>>>>> David
> >
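
The invariant Kim and David describe (destroy the global handle and NULL the array slot at the same time, keep can_remove() a pure predicate, and assert that a slot is NULL before a new handle is stored) can be sketched as a small single-threaded C++ model. All names are hypothetical stand-ins: g_objects plays the role of _compiler2_objects, destroy_handle() the role of JNIHandles::destroy_global, and the removal rule (only the highest-indexed active thread may go, and at least one must remain) is an assumption made for illustration, not the actual CompileBroker logic.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (hypothetical names): a handle is an opaque pointer and the array
// mirrors the role of _compiler2_objects.
using Handle = void*;
constexpr std::size_t kMax = 4;
Handle g_objects[kMax] = {};
std::size_t g_active = 0;  // number of currently active compiler threads

int g_destroyed = 0;  // counts releases so the sketch is observable
// Stand-in for JNIHandles::destroy_global (assumption for illustration).
void destroy_handle(Handle) { ++g_destroyed; }

// Pure predicate with no side effects. Assumed rule: only the last active
// thread may be removed, and at least one thread must remain.
bool can_remove(std::size_t idx) {
  return g_active > 1 && idx == g_active - 1;
}

// The single caller that removes performs all updates together:
// release the handle and clear the slot in the same step.
bool try_remove(std::size_t idx) {
  if (!can_remove(idx)) return false;
  destroy_handle(g_objects[idx]);
  g_objects[idx] = nullptr;
  --g_active;
  return true;
}

// Adding a thread asserts that the slot was cleaned up first.
void add(std::size_t idx, Handle h) {
  assert(g_objects[idx] == nullptr && "stale handle left in slot");
  g_objects[idx] = h;
  if (idx + 1 > g_active) g_active = idx + 1;
}
```

Because try_remove() is the only place that clears a slot, and it does so together with releasing the handle, add() can assert the slot is empty instead of silently overwriting a stale handle. In the real code these updates would happen while holding the relevant lock.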

From tobias.hartmann at oracle.com  Tue Nov  5 08:43:45 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Nov 2019 09:43:45 +0100
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <28b9a36b-b2ea-8186-0750-65685238cc36@loongson.cn>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
 <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
 <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>
 <DDB215A6-AFA2-4D06-A0A1-37DC3B748BB9@oracle.com>
 <28b9a36b-b2ea-8186-0750-65685238cc36@loongson.cn>
Message-ID: <b8e8fb10-1f6f-3d9d-88b7-f1d50f78d415@oracle.com>

Looks good to me too. Pushed.

Best regards,
Tobias

On 05.11.19 02:43, Jie Fu wrote:
> Hi Igor,
> 
> Thanks for your help and review.
> 
> Updated: http://cr.openjdk.java.net/~jiefu/8233429/webrev.02/
>  - Added the reviewers in it.
> 
> Hope you can sponsor it.
> 
> Thanks a lot.
> Best regards,
> Jie
> 
> On 2019/11/5 1:57 AM, Igor Veresov wrote:
>> This looks good to me.
>>
>> igor
>>
>>
>>
>>> On Nov 4, 2019, at 1:33 AM, Jie Fu <fujie at loongson.cn <mailto:fujie at loongson.cn>> wrote:
>>>
>>> Hi Aleksey and Tobias,
>>>
>>> Thanks for your review and valuable comments.
>>>
>>> I should mention that Igor has been teaching me how to fix the bug off the list these days.
>>>
>>> What do you think of this version?
>>> http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
>>>
>>> I prefer webrev.01.
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>>
>>> On 2019/11/4 4:53, Aleksey Shipilev wrote:
>>>> On 11/2/19 10:29 AM, Jie Fu wrote:
>>>>> Hi all,
>>>>>
>>>>> May I get reviews for this small fix?
>>>>>
>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233429
>>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>>>> Looks fine to me.
>>>>
>>>> The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
>>>> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>>>>
>>>
>>

From fujie at loongson.cn  Tue Nov  5 08:48:40 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 5 Nov 2019 16:48:40 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <b8e8fb10-1f6f-3d9d-88b7-f1d50f78d415@oracle.com>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
 <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
 <cf97774a-98e7-fef4-dab7-5e1472b08612@loongson.cn>
 <DDB215A6-AFA2-4D06-A0A1-37DC3B748BB9@oracle.com>
 <28b9a36b-b2ea-8186-0750-65685238cc36@loongson.cn>
 <b8e8fb10-1f6f-3d9d-88b7-f1d50f78d415@oracle.com>
Message-ID: <110c9573-f9b2-0bdd-04d3-7086d7c2ad6c@loongson.cn>

Thank you so much, Tobias.

On 2019/11/5 4:43 PM, Tobias Hartmann wrote:
> Looks good to me too. Pushed.
>
> Best regards,
> Tobias
>
> On 05.11.19 02:43, Jie Fu wrote:
>> Hi Igor,
>>
>> Thanks for your help and review.
>>
>> Updated: http://cr.openjdk.java.net/~jiefu/8233429/webrev.02/
>>   - Added the reviewers in it.
>>
>> Hope you can sponsor it.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> On 2019/11/5 1:57 AM, Igor Veresov wrote:
>>> This looks good to me.
>>>
>>> igor
>>>
>>>
>>>
>>>> On Nov 4, 2019, at 1:33 AM, Jie Fu <fujie at loongson.cn <mailto:fujie at loongson.cn>> wrote:
>>>>
>>>> Hi Aleksey and Tobias,
>>>>
>>>> Thanks for your review and valuable comments.
>>>>
>>>> I should mention that Igor has been teaching me how to fix the bug off the list these days.
>>>>
>>>> What do you think of this version?
>>>> http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
>>>>
>>>> I prefer webrev.01.
>>>>
>>>> Thanks a lot.
>>>> Best regards,
>>>> Jie
>>>>
>>>>
>>>> On 2019/11/4 4:53, Aleksey Shipilev wrote:
>>>>> On 11/2/19 10:29 AM, Jie Fu wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> May I get reviews for this small fix?
>>>>>>
>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233429
>>>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>>>>> Looks fine to me.
>>>>>
>>>>> The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
>>>>> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>>>>>


From tobias.hartmann at oracle.com  Tue Nov  5 09:01:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Nov 2019 10:01:41 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
Message-ID: <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>

Performance results look good.

Best regards,
Tobias

On 04.11.19 08:25, Tobias Hartmann wrote:
> Hi Roland,
> 
> this seems reasonable to me but I'm concerned that it might cause performance regressions. I'll run
> some tests in our system.
> 
> Best regards,
> Tobias
> 
> On 23.10.19 10:50, Roland Westrelin wrote:
>>
>> http://cr.openjdk.java.net/~roland/8232539/webrev.00/
>>
>> I couldn't come up with a test case because node processing order during
>> IGVN matters. Bug was reported against 11 but I see no reason it
>> wouldn't apply to current code as well.
>>
>> At parse time, predicates are added by
>> Parse::maybe_add_predicate_after_if() but no loop is actually
>> created. Compile::_major_progress is cleared. On the next round of IGVN,
>> one input of a region points to predicates. The same region has an if as
>> use that can be split through phi during IGVN. The predicates are going
>> to be removed by IGVN. But that happens in multiple steps because there
>> are several predicates (for reason Deoptimization::Reason_predicate,
>> Deoptimization::Reason_loop_limit_check etc.) and because for each
>> predicate one IGVN iteration must first remove the Opaque1 node, then
>> another kill the IfFalse projection, finally another replace the IfTrue
>> projection by the If control input.
>>
>> Split if occurs while predicates are in the process of being removed. It
>> sees predicates, tries to walk over them, encounters a predicate that's
>> been half removed (false projection removed) and we hit the assert/crash.
>>
>> I propose we simply not apply IGVN split if if we're splitting through a
>> loop or if there's a predicate input to a region because:
>>
>> - Making split if robust to dying predicates is not straightforward as
>>   far as I can tell
>>
>> - Loop opts split if doesn't split through loop header so why would it
>>   make sense for IGVN split if?
>>
>> - I'm wondering if there are other cases where handling of predicates in
>>   split if could be wrong (and so more trouble ahead):
>>
>>   + What if we split through a Loop region, predicates were added by
>>   loop optimizations, loop opts are now over so the predicates added at
>>   parse time were removed: then PhaseIdealLoop::find_predicate()
>>   wouldn't report a predicate but cloning predicates would still be
>>   required for correctness?
>>
>>   + What if we have no loop, a region has predicates as input,
>>   predicates are going to die but have not yet been processed, split if
>>   uselessly duplicates predicates but one of them is control dependent
>>   on the branch it is in so cloning predicates actually causes a broken
>>   graph?
>>
>> So overall it feels safer to me to simply bail out from split if for
>> loops/predicates.
>>
>> Roland.
>>
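
Roland's proposed bail-out can be illustrated with a toy control-flow model in C++. Kind, Node and igvn_split_if_allowed() are hypothetical; real C2 code would operate on RegionNode/LoopNode and identify predicates with its own helpers (such as PhaseIdealLoop::find_predicate(), mentioned above), none of which is modeled here. The sketch only shows the shape of the guard: refuse IGVN split-if when the region is a loop head or when any of its control inputs comes from a predicate.

```cpp
#include <cassert>
#include <vector>

// Toy IR model (hypothetical, not C2's Node classes).
enum class Kind { Region, Loop, If, IfProj, PredicateProj, Other };

struct Node {
  Kind kind;
  std::vector<const Node*> in;  // control inputs
};

// Sketch of the proposed guard: IGVN split-if gives up when the region is a
// loop head or when any of its control inputs comes from a predicate, even a
// predicate that is only half removed.
bool igvn_split_if_allowed(const Node& region) {
  if (region.kind == Kind::Loop) return false;  // never split through a loop
  for (const Node* input : region.in) {
    if (input != nullptr && input->kind == Kind::PredicateProj) return false;
  }
  return true;
}
```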

From rwestrel at redhat.com  Tue Nov  5 09:09:56 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 05 Nov 2019 10:09:56 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
 <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>
Message-ID: <87y2wu7kpn.fsf@redhat.com>


Hi Tobias,

Thanks for the review and for performance testing.

Roland.

From martin.doerr at sap.com  Tue Nov  5 09:12:56 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Nov 2019 09:12:56 +0000
Subject: RFR(S): 8233081: C1: PatchingStub for field access copies too much
In-Reply-To: <db231bce-9e5d-bccb-61c8-5f098b028293@oracle.com>
References: <VI1PR0201MB24798B4FFDCAEE55104A74279A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <db231bce-9e5d-bccb-61c8-5f098b028293@oracle.com>
Message-ID: <VI1PR0201MB2479E7390C93F3BE2A8188AB9A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Dean,

thank you for the review.

You can call it a bug fix for SPARC, but I'd rather call it removal of dead and incorrect code.

Best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of dean.long at oracle.com
> Sent: Monday, 4 November 2019 19:59
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(S): 8233081: C1: PatchingStub for field access copies too much
> 
> Looks good.  If I am reading it right, you also fixed a (latent?) bug in
> SPARC by changing the max size returned from 7 words to 8 words.
> 
> dl
> 
> On 10/30/19 8:38 AM, Doerr, Martin wrote:
> > Hi,
> >
> > I'd like to fix an issue in C1's PatchingStub implementation for
> > "access_field_id".
> > We had noticed that the code in the template exceeded the 255-byte
> > limit when switching on VerifyOops on PPC64.
> > I'd like to improve the situation for all platforms.
> >
> > More detailed bug description:
> > https://bugs.openjdk.java.net/browse/JDK-8233081
> >
> > I need a function to determine how many bytes are needed for the
> > NativeMovRegMem.
> > x86 has next_instruction_address() which could in theory be used, but I
> > noticed that it's dead code which is no longer correct.
> > Is it ok to remove it?
> > I'd also like to remove the constant instruction_size from
> > NativeMovRegMem because it's not constant.
> > I'd prefer to introduce num_bytes_to_end_of_patch() for the purpose of
> > determining how many bytes to copy for the "access_field_id" PatchingStub.
> > We can factor out the offset computation from offset() and set_offset()
> > and reuse it. This enforces consistency.
> >
> > Webrev:
> > http://cr.openjdk.java.net/~mdoerr/8233081_C1_access_field_patching/webrev.00/
> >
> > Please review.
> >
> > Best regards,
> > Martin
> >
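
The refactoring Martin describes, one shared computation of where the patched displacement starts, reused by offset(), set_offset() and a new num_bytes_to_end_of_patch(), can be sketched with a toy C++ instruction model. MovRegMemModel and its encoding (N prefix bytes, one opcode byte, one ModRM byte, then a 4-byte displacement) are simplifying assumptions, not the real x86 NativeMovRegMem layout, which must also handle cases such as SIB bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Toy model (hypothetical): a variable-length "mov reg, [base+disp32]" encoded
// as N prefix bytes, one opcode byte, one ModRM byte, then a 4-byte displacement.
class MovRegMemModel {
 public:
  MovRegMemModel(std::vector<std::uint8_t> code, int num_prefixes)
      : code_(std::move(code)), num_prefixes_(num_prefixes) {
    assert(code_.size() >= static_cast<std::size_t>(num_bytes_to_end_of_patch()));
  }

  std::int32_t offset() const {
    std::int32_t v;
    std::memcpy(&v, code_.data() + patch_offset_position(), sizeof v);
    return v;
  }

  void set_offset(std::int32_t v) {
    std::memcpy(code_.data() + patch_offset_position(), &v, sizeof v);
  }

  // Bytes from the start of the instruction up to and including the patched
  // displacement: what a patching stub would need to copy.
  int num_bytes_to_end_of_patch() const {
    return patch_offset_position() + static_cast<int>(sizeof(std::int32_t));
  }

 private:
  // Shared by all three accessors so they cannot disagree.
  int patch_offset_position() const { return num_prefixes_ + 2; }  // + opcode + ModRM

  std::vector<std::uint8_t> code_;
  int num_prefixes_;
};
```

Since all three accessors call the same private helper, the number of bytes a stub would copy cannot drift out of sync with the position that offset() and set_offset() patch.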


From martin.doerr at sap.com  Tue Nov  5 11:07:17 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Nov 2019 11:07:17 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB2479A25B5B2E26D79ADDE7BF9A690@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <VI1PR0201MB2479DD48B58D0A770FC7B56E9A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Btw. this seems to be the issue Vladimir has found by running more tests:
https://bugs.openjdk.java.net/secure/attachment/85306/hs_err_pid93932.log


> -----Original Message-----
> From: Doerr, Martin
> Sent: Tuesday, 5 November 2019 09:40
> To: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com; Kim
> Barrett <kim.barrett at oracle.com>
> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> Hi David
> 
> > I don't understand what you mean. If a compiler thread holds an oop, any
> > oop, it must hold it in a Handle to ensure it can't be gc'd.
> 
> The problem is not related to gc.
> My change introduces destroy_global for the handles. This means that the
> OopStorage portion which has held the oop can get freed.
> However, other compiler threads are running concurrently. They may
> execute code which reads the oop from the handle which is freed by this
> thread. Reading stale data is not a problem here, but reading freed memory
> may assert or even crash in general.
> I can't see how OopStorage supports reading from handles which were freed
> by destroy_global.
> 
> I think it would be safe if the freeing only occurred at safepoints, but I don't
> think this is the case.
> 
> Best regards,
> Martin
> 
> 
> > -----Original Message-----
> > From: David Holmes <david.holmes at oracle.com>
> > Sent: Tuesday, 5 November 2019 00:19
> > To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
> > Barrett <kim.barrett at oracle.com>
> > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >
> > On 4/11/2019 9:12 pm, Doerr, Martin wrote:
> > > Hi all,
> > >
> > > @Dean
> > >> changing can_remove() to CompileBroker::can_remove()?
> > > Yes. That would be an option.
> > >
> > > @Kim, David
> > > I think there's another problem with this implementation.
> > > It introduces a use-after-free pattern due to concurrency.
> > > Compiler threads may still read the oops from the handles after one of
> > > them has called destroy_global, until the next safepoint. It doesn't
> > > matter which values they get in this case, but the VM should not crash.
> > > I believe that OopStorage allows freeing storage without safepoints, so
> > > this may be unsafe. Right?
> >
> > I don't understand what you mean. If a compiler thread holds an oop, any
> > oop, it must hold it in a Handle to ensure it can't be gc'd.
> >
> > David
> >
> > > If so, I think replacing the oops in the handles (and keeping the handles
> > > alive) would be better, and also much simpler.
> > >
> > > Best regards,
> > > Martin
> > >
> > >
> > >> -----Original Message-----
> > >> From: dean.long at oracle.com <dean.long at oracle.com>
> > >> Sent: Saturday, 2 November 2019 08:36
> > >> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> > >> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> > >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > >> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> > >>
> > >> Hi Martin,
> > >>
> > >> On 10/30/19 3:18 AM, Doerr, Martin wrote:
> > >>> Hi David,
> > >>>
> > >>>> I don't think factoring out CompileBroker::clear_compiler2_object when
> > >>>> it is only used once was warranted, but that's a call for the compiler
> > >>>> team to make.
> > >>> I did that because _compiler2_objects is private and there's currently
> > >>> no setter available.
> > >>> But let's see what the compiler folks think.
> > >>
> > >> How about changing can_remove() to CompileBroker::can_remove()? Then
> > >> you can access _compiler2_objects directly, right?
> > >>
> > >> dl
> > >>>> Otherwise changes seem fine and I have noted the use of the
> > >>>> MutexUnlocker as per your direct email.
> > >>> Thanks a lot for reviewing. It was not a trivial one.
> > >>>
> > >>> You had noticed an incorrect usage of the CHECK macro. I've created a
> > >>> new bug for that:
> > >>> https://bugs.openjdk.java.net/browse/JDK-8233193
> > >>> It would be great if you could take a look to check that's what you
> > >>> meant and make adaptations if needed.
> > >>>
> > >>> Best regards,
> > >>> Martin
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: David Holmes <david.holmes at oracle.com>
> > >>>> Sent: Wednesday, 30 October 2019 05:47
> > >>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> > >>>> <kim.barrett at oracle.com>
> > >>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > >>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
> > >>>> David Holmes <David.Holmes at oracle.com>
> > >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> > >>>>
> > >>>> Hi Martin,
> > >>>>
> > >>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
> > >>>>> Hi David and Kim,
> > >>>>>
> > >>>>> I think it's easier to talk about code. So here's a new webrev:
> > >>>>>
> > >>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
> > >>>>
> > >>>> I don't think factoring out CompileBroker::clear_compiler2_object when
> > >>>> it is only used once was warranted, but that's a call for the compiler
> > >>>> team to make. Otherwise the changes seem fine and I have noted the use
> > >>>> of the MutexUnlocker as per your direct email.
> > >>>>
> > >>>> Thanks,
> > >>>> David
> > >>>> -----
> > >>>>
> > >>>>> @Kim:
> > >>>>> Thanks for looking at the handle-related parts. It's ok if you don't
> > >>>>> want to be a reviewer of the whole change.
> > >>>>>> I think it's weird that can_remove() is a predicate with optional side
> > >>>>>> effects.  I think it would be simpler to have it be a pure predicate,
> > >>>>>> and have the one caller with do_it = true perform the updates.  That
> > >>>>>> should include NULLing out the handle pointer (perhaps debug-only, but
> > >>>>>> it doesn't cost much to cleanly maintain the data structure).
> > >>>>> Nevertheless, it has the advantage that it enforces a consistent update.
> > >>>>> With a pure predicate, a caller could perform the update without
> > >>>>> holding the lock or mess it up otherwise.
> > >>>>> In addition, I don't want to change that as part of this fix.
> > >>>>>
> > >>>>>> So far as I can tell, THREAD == NULL here.
> > >>>>> This is a very tricky part (not my invention):
> > >>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which sets
> > >>>>> __the_thread__ to Thread::current().
> > >>>>> I don't want to publish my opinion about this.
> > >>>>>
> > >>>>> @David:
> > >>>>> It seems this option is preferred over option 3
> > >>>>> (possibly_add_compiler_threads part of webrev.02 and leave the
> > >>>>> initialization as is).
> > >>>>> So if you're ok with it, I'll request a 2nd review from the compiler
> > >>>>> folks (I should change the subject to contain RFR).
> > >>>>> Thanks,
> > >>>>> Martin
> > >>>>>
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: David Holmes <david.holmes at oracle.com>
> > >>>>>> Sent: Monday, 28 October 2019 05:04
> > >>>>>> To: Kim Barrett <kim.barrett at oracle.com>
> > >>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
> > >>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> > >>>>>> hotspot-compiler-dev at openjdk.java.net
> > >>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> > >>>>>>
> > >>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
> > >>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes <david.holmes at oracle.com> wrote:
> > >>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
> > >>>>>>>>> Hi Kim,
> > >>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
> > >>>>>>>>> not seen how to allocate a global handle and add the oop later.
> > >>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of
> > >>>>>>>>> that.
> > >>>>>>>>> So I can imagine 3 ways to implement it:
> > >>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
> > >>>>>>>>> added that to
> > >>>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
> > >>>>>>>>> We may want to improve that further by setting the handle pointer
> > >>>>>>>>> to NULL and asserting that it is NULL before adding the new one.
> > >>>>>>>>> I had been concerned about NULLs in the array, but looks like the
> > >>>>>>>>> existing code can deal with that.
> > >>>>>>>> I think it would be cleaner to both destroy the global handle and
> > >>>>>>>> NULL it in the array at the same time.
> > >>>>>>>> This comment
> > >>>>>>>>
> > >>>>>>>> 325         // Old j.l.Thread object can die here.
> > >>>>>>>>
> > >>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via
> > >>>>>>>> ct->threadObj() so can't "die" until that is also cleared during
> > >>>>>>>> the actual termination process.
> > >>>>>>> I think if there is such a thread here that it can't die, because the
> > >>>>>>> death predicate (the can_remove stuff) won't see that old thread as
> > >>>>>>> the last thread in _compiler2_objects.  That's what I meant by this:
> > >>>>>>>
> > >>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com>
> > >>>>>>>> wrote:
> > >>>>>>>> I also think that here:
> > >>>>>>>>
> > >>>>>>>> 947         jobject thread_handle = JNIHandles::make_global(thread_oop);
> > >>>>>>>> 948         _compiler2_objects[i] = thread_handle;
> > >>>>>>>>
> > >>>>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a
> > >>>>>>>> valid assertion then I think there are other problems.
> > >>>>>>> I think either that comment about an old thread is wrong (and the
> > >>>>>>> NULL assertion I suggested is okay), or I think the whole mechanism
> > >>>>>>> here has problems.  Or at least I was unable to figure out how it
> > >>>>>>> could work...
> > >>>>>>>
> > >>>>>> I'm not following sorry. You can't assert NULL unless it's actually set
> > >>>>>> to NULL which it presently isn't. But it could be set NULL as Martin
> > >>>>>> suggested:
> > >>>>>>
> > >>>>>> "We may want to improve that further by setting the handle pointer
> > >>>>>> to NULL and asserting that it is NULL before adding the new one."
> > >>>>>>
> > >>>>>> and which I also supported. But that aside once the delete_global
> > >>>>>> has been called that JNIHandle no longer references the j.l.Thread
> > >>>>>> that it did, at which point it is only reachable via the threadObj()
> > >>>>>> of the CompilerThread.
> > >>>>>>
> > >>>>>> David
> > >

From jorn.vernee at oracle.com  Tue Nov  5 12:03:20 2019
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Tue, 5 Nov 2019 13:03:20 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
 <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com>
Message-ID: <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com>

Hi,

I've updated the patch per your suggestions: 
http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.02/
(Testing = tier1, manual)

- Moved the PrintIdeal flag (as well as IGVPrintLevel) into a 
NOT_PRODUCT in compilerDirectives.hpp.
- Moved _print_ideal field in Compile into an #ifndef PRODUCT block and 
moved the initialization to the initializer list.
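For illustration, a minimal C++ sketch of the pattern described in the two points above: a field that exists only in non-product builds and is initialized in the constructor's initializer list, the same way _trace_opto_output is handled. DirectiveSketch, CompileSketch, PrintIdealOption and should_print_ideal() are hypothetical stand-ins, not the actual declarations from the webrev.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-method directive; the member names mirror
// the directive->TraceOptoOutputOption pattern quoted in this thread.
struct DirectiveSketch {
  bool PrintIdealOption;
  bool TraceOptoOutputOption;
};

// Hypothetical, heavily reduced model of the compilation object.
class CompileSketch {
  bool _has_reserved_stack_access;
#ifndef PRODUCT
  // Present only in non-product builds, so product builds neither
  // store nor initialize these fields.
  bool _trace_opto_output;
  bool _print_ideal;
#endif
  bool _has_method_handle_invokes;

 public:
  explicit CompileSketch(const DirectiveSketch& directive)
    : _has_reserved_stack_access(false),
#ifndef PRODUCT
      // Listed in declaration order to avoid -Wreorder warnings.
      _trace_opto_output(directive.TraceOptoOutputOption),
      _print_ideal(directive.PrintIdealOption),
#endif
      _has_method_handle_invokes(false) {
    (void)directive;  // silences "unused parameter" in PRODUCT builds
  }

#ifndef PRODUCT
  bool should_print_ideal() const { return _print_ideal; }
#endif
};
```

With PRODUCT defined, both fields and the accessor disappear, so any caller of should_print_ideal() must itself be guarded by #ifndef PRODUCT (or NOT_PRODUCT).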

W.r.t. usefulness of PrintIdeal vs PrintIdealGraph: the obvious thing is 
that PrintIdeal doesn't require IGV, which might be more useful if 
PrintIdeal were a diagnostic flag instead (as suggested by Nils), so it 
could be used from a standard JDK build which doesn't come with IGV. 
Another advantage that comes to mind is that PrintIdeal output is easier 
to share as text; you can just copy a few lines into the body of an 
email. That said, I haven't used either of these flags extensively, so I 
find it hard to judge whether one is clearly better than the other. But
it seems at least unfortunate that we have the PrintIdeal flag and cannot
use it in compiler directives to filter the output.

Jorn

On 04/11/2019 15:57, Vladimir Ivanov wrote:
> Hi Jorn,
>
> src\hotspot\share\opto\compile.hpp:
> +  bool                  _print_ideal;           // True if we should dump node IR for this compilation
>
> Since the only usage is in non-product code, I suggest putting 
> _print_ideal into #ifndef PRODUCT, so you don't need to initialize it 
> in product build.
>
> Also, it'll allow you to just put it on initializer list instead of 
> doing it in the ctor body (akin to how _trace_opto_output is handled):
>
> src\hotspot\share\opto\compile.cpp:
>
> Compile::Compile( ciEnv* ci_env,
> ...
>   : Phase(Compiler),
> ...
>     _has_reserved_stack_access(false),
> #ifndef PRODUCT
>     _trace_opto_output(directive->TraceOptoOutputOption),
> #endif
>     _has_method_handle_invokes(false),
>
>
> Overall, I don't see much value in PrintIdeal: PrintIdealGraph 
> provides much more detailed information (even though in XML format) 
> and IdealGraphVisualizer is better at browsing the graph. The only 
> thing I'm usually missing is full text dump output on individual nodes 
> (they are shown pruned in IGV; not sure whether it's IGV's fault or the 
> info is missing in the dump).
>
> Best regards,
> Vladimir Ivanov
>
> On 01.11.2019 18:09, Jorn Vernee wrote:
>> Hi,
>>
>> I'd like to add PrintIdeal as a compiler directive in order to enable 
>> PrintIdeal for only a single method when combining it with the 
>> 'match' directive.
>>
>> Please review the following:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
>> (Testing = tier1, manual)
>>
>> As a heads-up; I'm not a committer on the jdk project, so if this 
>> sounds like a good idea, I would require a sponsor to push the changes.
>>
>> Thanks,
>> Jorn
>>

From david.holmes at oracle.com  Tue Nov  5 12:33:21 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 22:33:21 +1000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <6ac1f31e-61a6-fa92-75f6-cae0732915e3@oracle.com>

On 5/11/2019 6:40 pm, Doerr, Martin wrote:
> Hi David
> 
>> I don't understand what you mean. If a compiler thread holds an oop, any
>> oop, it must hold it in a Handle to ensure it can't be gc'd.
> 
> The problem is not related to gc.
> My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
> However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread. Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
> I can't see how OopStorage supports reading from handles which were freed by destroy_global.

With JVMCI compiler threads, each getting a new j.l.Thread oop that 
lasts for the lifetime of that compiler thread (just like a regular 
JavaThread) do we even actually need these arrays? I'm unclear what 
purpose they serve when we are not trying to reuse the oops stored in 
the array. ??

David
-----

> I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.
> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Tuesday, 5 November 2019 00:19
>> To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
>> Barrett <kim.barrett at oracle.com>
>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> On 4/11/2019 9:12 pm, Doerr, Martin wrote:
>>> Hi all,
>>>
>>> @Dean
>>>> changing can_remove() to CompileBroker::can_remove()?
>>> Yes. That would be an option.
>>>
>>> @Kim, David
>>> I think there's another problem with this implementation.
>>> It introduces a use-after-free pattern due to concurrency.
>>> Compiler threads may still read the oops from the handles after one of
>>> them has called destroy_global until next safepoint. It doesn't matter which
>>> values they get in this case, but the VM should not crash. I believe that
>>> OopStorage allows freeing storage without safepoints, so this may be
>>> unsafe. Right?
>>
>> I don't understand what you mean. If a compiler thread holds an oop, any
>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>
>> David
>>
>>> If so, I think replacing the oops in the handles (and keeping the handles
>>> alive) would be better. And also much simpler.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: dean.long at oracle.com <dean.long at oracle.com>
>>>> Sent: Saturday, 2 November 2019 08:36
>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>
>>>> Hi Martin,
>>>>
>>>> On 10/30/19 3:18 AM, Doerr, Martin wrote:
>>>>> Hi David,
>>>>>
>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object when
>>>>>> it is only used once was warranted, but that's a call for the compiler
>>>>>> team to make.
>>>>> I did that because _compiler2_objects is private and there's currently no
>>>>> setter available.
>>>>> But let's see what the compiler folks think.
>>>>
>>>> how about changing can_remove() to CompileBroker::can_remove()? Then
>>>> you can access _compiler2_objects directly, right?
>>>>
>>>> dl
>>>>>> Otherwise changes seem fine and I have noted the use of the
>>>>>> MutexUnlocker as per your direct email.
>>>>> Thanks a lot for reviewing. It was not a trivial one.
>>>>>
>>>>> You had noticed an incorrect usage of the CHECK macro. I've created a new
>>>>> bug for that:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8233193
>>>>> Would be great if you could take a look if that's what you meant and
>>>>> make adaptations if needed.
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>> Sent: Wednesday, 30 October 2019 05:47
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>>>>>> <kim.barrett at oracle.com>
>>>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
>>>>>> David Holmes <David.Holmes at oracle.com>
>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
>>>>>>> Hi David and Kim,
>>>>>>>
>>>>>>> I think it's easier to talk about code. So here's a new webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
>>>>>>
>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object when
>>>>>> it is only used once was warranted, but that's a call for the compiler
>>>>>> team to make. Otherwise changes seem fine and I have noted the use of the
>>>>>> MutexUnlocker as per your direct email.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> @Kim:
>>>>>>> Thanks for looking at the handle related parts. It's ok if you don't want
>>>>>>> to be a reviewer of the whole change.
>>>>>>>> I think it's weird that can_remove() is a predicate with optional side
>>>>>>>> effects.  I think it would be simpler to have it be a pure predicate,
>>>>>>>> and have the one caller with do_it = true perform the updates.  That
>>>>>>>> should include NULLing out the handle pointer (perhaps debug-only,
>>>>>>>> but it doesn't cost much to cleanly maintain the data structure).
>>>>>>> Nevertheless, it has the advantage that it enforces the update to be
>>>>>>> consistent.
>>>>>>> A caller could use it without holding the lock or mess it up otherwise.
>>>>>>> In addition, I don't want to change that as part of this fix.
>>>>>>>
>>>>>>>> So far as I can tell, THREAD == NULL here.
>>>>>>> This is a very tricky part (not my invention):
>>>>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which
>>>>>>> sets __the_thread__ to Thread::current().
>>>>>>> I don't want to publish my opinion about this.
>>>>>>>
>>>>>>> @David:
>>>>>>> Seems like this option is preferred over option 3
>>>>>>> (possibly_add_compiler_threads part of webrev.02 and leave the
>>>>>>> initialization as is).
>>>>>>> So when you're ok with it, I'll request a 2nd review from the compiler
>>>>>>> folks (I should change the subject to contain RFR).
>>>>>>> Thanks,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>> Sent: Monday, 28 October 2019 05:04
>>>>>>>> To: Kim Barrett <kim.barrett at oracle.com>
>>>>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
>>>>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
>>>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>>>
>>>>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
>>>>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes <david.holmes at oracle.com>
>>>>>>>>>> wrote:
>>>>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
>>>>>>>>>>> Hi Kim,
>>>>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
>>>>>>>>>>> not seen how to allocate a global handle and add the oop later.
>>>>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of
>>>>>>>>>>> that.
>>>>>>>>>>> So I can imagine 3 ways to implement it:
>>>>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
>>>>>>>>>>> added that to
>>>>>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
>>>>>>>>>>> We may want to improve that further by setting the handle pointer
>>>>>>>>>>> to NULL and asserting that it is NULL before adding the new one.
>>>>>>>>>>> I had been concerned about NULLs in the array, but looks like the
>>>>>>>>>>> existing code can deal with that.
>>>>>>>>>> I think it would be cleaner to both destroy the global handle and
>>>>>>>>>> NULL it in the array at the same time.
>>>>>>>>>> This comment
>>>>>>>>>>
>>>>>>>>>> 325         // Old j.l.Thread object can die here.
>>>>>>>>>>
>>>>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via
>>>>>>>>>> ct->threadObj() so can't "die" until that is also cleared during
>>>>>>>>>> the actual termination process.
>>>>>>>>> I think if there is such a thread here that it can't die, because the
>>>>>>>>> death predicate (the can_remove stuff) won't see that old thread as
>>>>>>>>> the last thread in _compiler2_objects.  That's what I meant by this:
>>>>>>>>>
>>>>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com>
>>>>>>>>>> wrote:
>>>>>>>>>> I also think that here:
>>>>>>>>>>
>>>>>>>>>> 947         jobject thread_handle = JNIHandles::make_global(thread_oop);
>>>>>>>>>> 948         _compiler2_objects[i] = thread_handle;
>>>>>>>>>>
>>>>>>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a
>>>>>>>>>> valid assertion then I think there are other problems.
>>>>>>>>> I think either that comment about an old thread is wrong (and the
>>>>>>>>> NULL assertion I suggested is okay), or I think the whole mechanism here
>>>>>>>>> has problems.  Or at least I was unable to figure out how it could
>>>>>>>>> work...
>>>>>>>>>
>>>>>>>> I'm not following sorry. You can't assert NULL unless it's actually set
>>>>>>>> to NULL which it presently isn't. But it could be set NULL as Martin
>>>>>>>> suggested:
>>>>>>>>
>>>>>>>> "We may want to improve that further by setting the handle pointer
>>>>>>>> to NULL and asserting that it is NULL before adding the new one."
>>>>>>>>
>>>>>>>> and which I also supported. But that aside once the delete_global
>>>>>>>> has been called that JNIHandle no longer references the j.l.Thread that it
>>>>>>>> did, at which point it is only reachable via the threadObj() of the
>>>>>>>> CompilerThread.
>>>>>>>>
>>>>>>>> David
>>>

From martin.doerr at sap.com  Tue Nov  5 13:37:09 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Nov 2019 13:37:09 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <6ac1f31e-61a6-fa92-75f6-cae0732915e3@oracle.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <6ac1f31e-61a6-fa92-75f6-cae0732915e3@oracle.com>
Message-ID: <VI1PR0201MB247905B810D7CFE5A15CD6EC9A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi David

> With JVMCI compiler threads, each getting a new j.l.Thread oop that 
> lasts for the lifetime of that compiler thread (just like a regular 
> JavaThread) do we even actually need these arrays? I'm unclear what 
> purpose they serve when we are not trying to reuse the oops stored in 
> the array. ??

Compiler threads can look up j.l.Thread objects of live compilers by iterating over the arrays.
That's used to find the last compiler alive or to find a log instance for a compiler.
It could be designed differently, but that would make the change even bigger.
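To illustrate how such an array lookup can stay safe when the handles are kept alive and only their contents are replaced (the alternative proposed earlier in this thread), here is a small C++ model. It is a hedged sketch only: std::atomic pointers stand in for global handles (no OopStorage or JNIHandles involved), CompilerSlots, ThreadObjSketch, set() and is_last_alive() are hypothetical names, and treating "no occupied slot with a higher index" as "last compiler alive" is a simplifying assumption, not CompileBroker's actual logic.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a java.lang.Thread object.
struct ThreadObjSketch { int id; };

constexpr std::size_t kMaxCompilers = 4;

// Each slot models a global handle that is created once and never freed while
// compiler threads may still read it. Only the slot's contents change, so a
// concurrent reader can see a stale pointer but never freed handle memory.
struct CompilerSlots {
  std::atomic<ThreadObjSketch*> slot[kMaxCompilers]{};  // all start as nullptr

  // Writer side: replace the contents (nullptr when a compiler exits)
  // instead of destroying the slot itself.
  void set(std::size_t i, ThreadObjSketch* obj) {
    slot[i].store(obj, std::memory_order_release);
  }

  // Reader side: iterate over the array to decide whether compiler i is the
  // last live one (assumption: no occupied slot with a higher index).
  bool is_last_alive(std::size_t i) const {
    for (std::size_t j = i + 1; j < kMaxCompilers; ++j) {
      if (slot[j].load(std::memory_order_acquire) != nullptr) return false;
    }
    return slot[i].load(std::memory_order_acquire) != nullptr;
  }
};
```

In this model, when compiler 1 exits its slot is set to nullptr rather than freed, after which is_last_alive(0) becomes true while readers that raced with the update still dereference valid slot memory.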

Best regards,
Martin



From rwestrel at redhat.com  Tue Nov  5 13:42:20 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 05 Nov 2019 14:42:20 +0100
Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() ||
 phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no
 mismatched stores, except on raw memory
In-Reply-To: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
Message-ID: <87tv7ie8xv.fsf@redhat.com>


Hi Severin,

Thanks for taking care of this.

> Could I please get a review of this 8u only issue? The reason a
> fastdebug build of latest OpenJDK 8u asserts for the dec-tree benchmark
> of the renaissance suite is because the 8u backport of JDK-8140309 was
> missing this hunk from JDK 9[1]:
>
> +             (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode
> +             (is_mismatched_access() || st->as_Store()->is_mismatched_access()),
>
> I had a closer look and there doesn't seem to be missing anything else.
> The proposed fix is to amend the assert condition in the appropriate
> place, which brings 8u in line with JDK 9 code where the failure isn't
> observed.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233023
> webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/01/webrev/

Isn't this:

@@ -3213,6 +3221,9 @@
 // within the initialized memory.
 intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) {
   const int FAIL = 0;
+  if (st->is_unaligned_access()) {
+    return FAIL;
+  }
   if (st->req() != MemNode::ValueIn + 1)
     return FAIL;                // an inscrutable StoreNode (card mark?)
   Node* ctl = st->in(MemNode::Control);

also missing from the 8140309?

It must be harmless because nothing sets _unaligned_access AFAICT but
given the unaligned access part of the patch was backported I think we
should keep it consistent.

Also, 

+             (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode

is not from 8140309 but from 8080289. We're not going to backport it and
that line is unrelated to the change so backporting it sounds good. But
while doing this, we should backport the other changes to that assert
from 8080289 as well.

+             st->Opcode() == Op_StoreVector ||
+             Opcode() == Op_StoreVector ||

Roland.

From lutz.schmidt at sap.com  Tue Nov  5 14:54:25 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 5 Nov 2019 14:54:25 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <CAA-vtUyny1TkJXvqT0smwhQeB0omJ_jPPJyYoSO9Cv4NV4Zd2Q@mail.gmail.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <CAA-vtUyny1TkJXvqT0smwhQeB0omJ_jPPJyYoSO9Cv4NV4Zd2Q@mail.gmail.com>
Message-ID: <09A11702-3B0B-4FA2-A79E-EE61779D311F@sap.com>

Hi Thomas,

thank you very much for your elaborate comments and suggestions! Please find my comments inline below.

Regards,
Lutz


On 04.11.19, 18:02, "Thomas Stüfe" <thomas.stuefe at gmail.com> wrote:

    Hi Lutz,
    
    I like this patch and have a number of remarks:
    
    Segment map splicing:
    
    - If the last byte of the leading block happens to contain 0xFD, you do not need to increment the fragmentation counter after splicing since the leading hop would be fully formed.

R: You are right. I'm not sure if this extra check will pay off performance-wise, though.
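For readers new to the segment map, here is a simplified, hypothetical model of the idea discussed here. Each segment has one map byte holding a backwards hop distance towards its block's start, capped at kMaxHop so it fits in a byte; a lookup follows hops until it reaches an entry of 0. The constants and function names (kFreeSentinel, kMaxHop, mark_used, find_block_start, splice) are assumptions for illustration, not HotSpot's actual encoding in heap.cpp. The splice function shows why joining two adjacent blocks can get away with rewriting a single entry: later lookups simply take extra hops, which is presumably the kind of cost the fragmentation counter keeps track of.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: 0xFF marks a free segment, 0..0xFE are hop distances.
constexpr uint8_t kFreeSentinel = 0xFF;
constexpr uint8_t kMaxHop = 0xFE;

// Mark segments [beg, end) as one used block: each entry stores its distance
// to the block start, capped at kMaxHop.
void mark_used(std::vector<uint8_t>& map, size_t beg, size_t end) {
  assert(beg < end && end <= map.size());
  for (size_t i = beg; i < end; i++) {
    size_t dist = i - beg;
    map[i] = static_cast<uint8_t>(dist < kMaxHop ? dist : kMaxHop);
  }
}

// Follow hops backwards until an entry of 0 marks the block start.
size_t find_block_start(const std::vector<uint8_t>& map, size_t ix) {
  assert(map[ix] != kFreeSentinel);
  while (map[ix] > 0) {
    ix -= map[ix];
  }
  return ix;
}

// Join the block starting at follower_beg onto the block starting at
// leader_beg by rewriting only the follower's first entry. The follower's
// other entries still lead back to follower_beg, from where the lookup
// continues into the leader.
void splice(std::vector<uint8_t>& map, size_t leader_beg, size_t follower_beg) {
  assert(leader_beg < follower_beg);
  size_t dist = follower_beg - leader_beg;
  map[follower_beg] = static_cast<uint8_t>(dist < kMaxHop ? dist : kMaxHop);
}
```

For example, after mark_used(map, 10, 600), find_block_start(map, 599) returns 10 after three hops.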
    
    - If patch complexity is a concern, I would maybe leave out the segmap_template. That would simplify the coding a bit and I am not sure how much this actually brings. I really am curious. Is it better to let the compiler store immediates or to memcpy the template? Would the latter not mean I pay for the loads too?

R: Introducing the segmap_template was my first attempt to boost performance. Improvement was visible, but of course in no way comparable with the effect we see now. Unfortunately, I did not archive the numbers. The improvement over a byte loop depends on the compiler, the memcpy implementation, and CPU capabilities. On s390x, for example, there exists special hardware support for mem->mem moves of up to 256 bytes. 
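To make Thomas's question concrete, here is a hypothetical sketch of both variants filling the same simplified encoding (entry = distance to block start, capped at kMaxHop): a plain byte loop, and a memcpy from a precomputed template followed by a memset for the capped tail. The encoding and all names are assumptions; the patch's real segmap_template may be laid out differently. As Lutz says, which variant wins depends on the compiler, the memcpy implementation and the CPU, so only measurement can settle it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint8_t kMaxHop = 0xFE;

// Precomputed template: 0, 1, ..., kMaxHop-1.
static const std::array<uint8_t, kMaxHop> kTemplate = [] {
  std::array<uint8_t, kMaxHop> t{};
  for (size_t i = 0; i < t.size(); i++) {
    t[i] = static_cast<uint8_t>(i);
  }
  return t;
}();

// Variant 1: byte loop, one store per segment.
void mark_used_loop(uint8_t* map, size_t beg, size_t end) {
  for (size_t i = beg; i < end; i++) {
    size_t dist = i - beg;
    map[i] = static_cast<uint8_t>(dist < kMaxHop ? dist : kMaxHop);
  }
}

// Variant 2: copy the ascending head from the template, then fill the
// remaining (capped) entries with a single memset.
void mark_used_template(uint8_t* map, size_t beg, size_t end) {
  assert(beg <= end);
  size_t len = end - beg;
  size_t head = len < kMaxHop ? len : kMaxHop;
  std::memcpy(map + beg, kTemplate.data(), head);
  if (len > head) {
    std::memset(map + beg + head, kMaxHop, len - head);
  }
}
```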
    
    - In ::allocate(), I like that you re-use block. This is just bikeshedding, but you also could reshape this function a tiny bit for a clean single exit at the end of the function.

R: Where to exit a function is personal taste. Or is there an OpenJDK coding guideline for that? In this particular case, an additional local variable would be needed to hold the function result (block->allocated_space() or NULL). I do not really like that.
    
    - Comment nits:
    
    - * code block (e.g. nmethod) when only a pointer to somewhere inside the
    + * code block (e.g. nmethod) when only a pointer to a location inside the
    
    - *  - Segment addresses should be aligned to be multiples of CodeCacheSegmentSize.
    + *  - Segment start addresses should be aligned to be multiples of CodeCacheSegmentSize.
    
    - *  - Allocation in the code cache can only happen at segment boundaries.
    + *  - Allocation in the code cache can only happen at segment start addresses.

R: agreed. Will be in next webrev.
    
    - I would like comments for the newly added functions in heap.hpp, not in the cpp file, because my IDE expands those comments from the hpp. But that's just me :=)

R: I prefer interface documentation in *.hpp (where you look up what's available), functional description in *.cpp (where the code is).
    
    - block_at() could be implemented now using address_for and a cast

R: I tried to keep the diff in check. 
    
    - fragmentation_limit, freelist_limit: I thought using enums for constants is not done anymore, and the preferred way is to use static const. See e.g. markWord.hpp.

R: If static const is the way to go, I'll go that way. 
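Here is a small sketch of the two styles. The class names and limit values are placeholders, not the values in the webrev. One practical difference: if such a constant is ever odr-used (for instance bound to a const int&), C++ before C++17 requires an out-of-class definition for the static const member, which the enum form never needs.

```cpp
#include <cassert>

// Style A: unnamed enum used only as a source of integral constants.
class CodeHeapEnumStyle {
 public:
  enum { fragmentation_limit = 10000, freelist_limit = 100 };  // placeholder values
};

// Style B (as suggested, cf. markWord.hpp): typed static const members.
class CodeHeapConstStyle {
 public:
  static const int fragmentation_limit = 10000;  // placeholder value
  static const int freelist_limit      = 100;    // placeholder value
};
```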
    
    - +  // Boundaries of committed space.
      +  // Boundaries of reserved space.
       Thanks for the comments. Out of curiosity, can the lower part be uncommitted? Or would low() == low_boundary() always be true?

R: The interface distinguishes low() and low_boundary(). I did not dive into VirtualSpace to find out if the two values are always the same. Instead of relying on implementation details, I prefer to adhere to the API.
    
    - +  bool contains(const void* p) const             { return low() <= p && p < high(); }
      Is comparing with void* legal C++?

R: None of our various compilers complained...
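On the void* question: comparing a char* with a const void* is well-formed, because the char* is implicitly converted to const void* before the built-in relational comparison. The result is only specified when both pointers point into the same array or object, which holds for addresses inside one reserved space. For unrelated pointers the outcome is unspecified rather than undefined, and std::less<const void*> is guaranteed to give a total order. A minimal sketch, with a hypothetical Space class standing in for the real bounds:

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a space with [low, high) bounds.
class Space {
 public:
  Space(char* low, char* high) : _low(low), _high(high) {}
  char* low() const { return _low; }
  char* high() const { return _high; }

  // As in the patch: char* converts to const void*, then the built-in
  // relational operators compare the two const void* values.
  bool contains(const void* p) const { return low() <= p && p < high(); }

  // Variant with a guaranteed total order even for unrelated pointers.
  bool contains_total_order(const void* p) const {
    std::less<const void*> lt;
    return !lt(p, low()) && lt(p, high());
  }

 private:
  char* _low;
  char* _high;
};
```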
    
    - CodeHeap::merge_right():
        - Nits, I would maybe rename "follower" to "beg_follower", or, alternatively, "beg" to "leader".
        - I wonder whether we can have a "wipe header function" which only wipes the FreeBlock header of the follower, instead of wiping the whole segment.

R: If naming is to be changed, I'd rather do something like
     size_t end_ix = segment_for(a) + a->length();
    
    - CodeHeap::add_to_freelist():
       I am astonished that this brings such a performance increase. I would naively have thought the deallocation to be pretty random, and therefore this technique to have an improvement of factor 2 in general, but we also need to find the start of the last_insert_free_block which may take some hops, and then, the block might not even be free...

R: Well, it all depends on the deallocation pattern. If you deallocate with ascending block addresses, the optimization is perfect. If you deallocate in the other direction, the optimization is not so good. But it will help as long as the deallocated block is at a higher address than the remembered free block.
    
    --
    
    General remarks, not necessarily for your patch:
    
    - This code could really gain readability if the segment id had its own type, e.g. "typedef int segid_t". APIs like "mark_segmap_as_used(segid_t beg, segid_t end)" are immediately clearer than when size_t is used.

R: Same as above: wanted to keep the diff in check.
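Here is a sketch of the suggestion; segid_t, SegId and segment_count are hypothetical names. A plain typedef documents intent but does not stop anyone from passing a byte offset where a segment index is expected. A one-member wrapper with an explicit constructor does, at the cost of touching every call site, which is the kind of diff growth Lutz wanted to avoid in this change.

```cpp
#include <cassert>
#include <cstddef>

// Option 1: plain alias (the review suggested int; size_t is used here
// because segment indices are unsigned in this sketch).
typedef size_t segid_t;

size_t segment_count(segid_t beg, segid_t end) {
  assert(beg <= end);
  return end - beg;
}

// Option 2: tiny wrapper type. Because the constructor is explicit, a raw
// size_t cannot be passed where a SegId is expected.
struct SegId {
  explicit SegId(size_t v) : value(v) {}
  size_t value;
};

size_t segment_count(SegId beg, SegId end) {
  assert(beg.value <= end.value);
  return end.value - beg.value;
}
```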
    
    - In CodeHeap::verify(), do we actually check that the segment map is correctly formed? That from every point in the map, we reach the correct start of the associated code blob? I could not find this but I may be blind.

R: I had that, but it was way too expensive, even for a fastdebug build. Check webrev version *.00

    Thanks, Thomas
    
    
    On Thu, Oct 31, 2019 at 5:55 PM Schmidt, Lutz <mailto:lutz.schmidt at sap.com> wrote:
    Hi Andrew, (and hi to the interested crowd),
    
    Please accept my apologies for taking so long to get back.
    
    These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing me quite some headaches. Some layer between me and the test prevents the vm (in particular: the VMThread) from terminating normally. The final output from my time measurements is therefore not generated or thrown away. Adding to that were some test machine unavailabilities and a bug in my measurement code, causing crashes. 
    
    Anyway, I added some on-the-fly output, printing the timer values after 10k measurement intervals. This reveals some interesting, additional facts about the tests and the CodeHeap management methods. For detailed numbers, refer to the files attached to the bug (https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail, I can provide the jtr files on request. 
    
    
    OverflowCodeCacheTest
    =====================
    This test runs (in my setup) with a 1GB CodeCache.
    
    For this test, CodeHeap::mark_segmap_as_used() is THE performance hog. 40% of all calls have to mark more than 16k segment map entries (in the not optimized case). Basically all of these calls convert to len=1 calls with the optimization turned on. Note that during FreeBlock joining, the segment count is forced to 1(one). No wonder the time spent in CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test runtime) to <100msec. 
    
    CodeHeap::add_to_freelist() on the other hand, is almost not observable. Average free list length is at two elements, making even linear search really quick. 
    
    
    StressCodeCacheTest
    ===================
    With a 1GB CodeCache, this test runs into a 12 min timeout, set by our internal test environment. Scaling back to 300MB prevents the test from timing out.
    
    For this test, CodeHeap::mark_segmap_as_used() is not a factor. Of 200,000 calls, only a few (less than 3%) had to process a block consisting of more than 16 segments. Note that during FreeBlock joining, the segment count is forced to 1(one). 
    
    Another method is popping up as performance hog instead: CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime (before optimization) are spent in this method, for just 160,000 calls. The test seems to create a long list of non-contiguous free blocks (around 5,500 on average). This list is linearly scanned to find the insert point for the free block at hand.
    
    Suffering as well from the long free block list is CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000 calls.  
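The add_to_freelist() behaviour discussed in this thread (scan from a remembered insertion point instead of the list head) can be sketched as below. FreeBlock, FreeList and all members are hypothetical stand-ins, and coalescing of adjacent blocks is left out. The sketch also shows the catch Thomas pointed out: the remembered block must be forgotten once it leaves the free list, otherwise the scan could start from a block that is no longer free.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a free block; not HotSpot's FreeBlock.
struct FreeBlock {
  FreeBlock(size_t s, size_t len) : start(s), length(len), next(nullptr) {}
  size_t start;     // first segment index of the block
  size_t length;    // block length in segments
  FreeBlock* next;  // next free block at a higher address
};

// Address-ordered singly linked free list with a "last inserted" hint.
class FreeList {
 public:
  void insert(FreeBlock* b) {
    FreeBlock* prev = nullptr;
    FreeBlock* cur = _head;
    _steps = 0;
    // If the new block lies above the remembered one, skip the list prefix.
    if (_last_insert != nullptr && _last_insert->start < b->start) {
      prev = _last_insert;
      cur = _last_insert->next;
    }
    while (cur != nullptr && cur->start < b->start) {
      prev = cur;
      cur = cur->next;
      _steps++;
    }
    b->next = cur;
    if (prev == nullptr) {
      _head = b;
    } else {
      prev->next = b;
    }
    _last_insert = b;
  }

  // Unlink a block (e.g. on allocation) and drop the hint if it points here.
  void remove(FreeBlock* b) {
    FreeBlock** link = &_head;
    while (*link != nullptr && *link != b) {
      link = &(*link)->next;
    }
    if (*link == b) {
      *link = b->next;
    }
    if (_last_insert == b) {
      _last_insert = nullptr;
    }
    b->next = nullptr;
  }

  FreeBlock* head() const { return _head; }
  size_t last_scan_steps() const { return _steps; }  // nodes walked by the last insert

 private:
  FreeBlock* _head = nullptr;
  FreeBlock* _last_insert = nullptr;
  size_t _steps = 0;
};
```

With ascending deallocation every insert walks zero nodes; an insert below the hint falls back to a scan from the head.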
    
    
    SPECjvm2008 suite
    =================
    With respect to the task at hand, this is a well-behaved test suite. Timing shows some before/after difference, but nothing spectacular. The measurements do not provide evidence of a performance bottleneck. 
    
    
    There were some minor adjustments to the code. Unused code blocks have been removed as well. I have therefore created a new webrev. You can find it here:
       http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/ 
    
    Thanks for investing your time!
    Lutz
    
    
    On 21.10.19, 15:06, "Andrew Dinn" <mailto:adinn at redhat.com> wrote:
    
        Hi Lutz,
    
        On 21/10/2019 13:37, Schmidt, Lutz wrote:
        > I understand what you are interested in. And I was hoping to be able
        > to provide some (first) numbers by today. Unfortunately, the
        > measurement code I activated last Friday was buggy and blew most of
        > the tests I had hoped to run over the weekend.
        > 
        > I will take your modified test and run it with and without my
        > optimization. In parallel, I will try to generate some (non-random)
        > numbers for other tests.
        > 
        > I'll be back as soon as I have results.
    
        Thanks for trying the test and also for deriving some call stats from a
        real example. I'm keen to see how much your patch improves things.
    
        regards,
    
    
        Andrew Dinn
        -----------
        Senior Principal Software Engineer
        Red Hat UK Ltd
        Registered in England and Wales under Company Registration No. 03798903
        Directors: Michael Cunningham, Michael ("Mike") O'Neill
    
    
From tobias.hartmann at oracle.com  Tue Nov  5 16:00:40 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Nov 2019 17:00:40 +0100
Subject: RFR(S) : 8233496 : AOT tests failures with
 'java.lang.RuntimeException: Failed to find sun/hotspot/WhiteBox.class'
In-Reply-To: <3845ADEC-2657-421F-B35F-C12962A31452@oracle.com>
References: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com>
 <3845ADEC-2657-421F-B35F-C12962A31452@oracle.com>
Message-ID: <5ca28de0-2a9b-3762-e9a8-12cde97b828c@oracle.com>

+1

Best regards,
Tobias

On 04.11.19 22:45, Vladimir Kozlov wrote:
> Looks good.
> 
> Thanks
> Vladimir
> 
>> On Nov 4, 2019, at 1:33 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
>>
>> http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
>>> 42 lines changed: 0 ins; 6 del; 36 mod;
>>
>> Hi all,
>>
>> could you please review this small patch for compiler/aot tests? the tests run 'ClassFileInstaller sun.hotspot.WhiteBox' w/o having any preceding actions which build s.h.WhiteBox class, the fix adds sun.hotspot.WhiteBox to the explicit build action and also removes unneeded classes (compiler.aot.AotCompiler and compiler.calls.common.InvokeDynamicPatcher as they are built implicitly by @run) from it.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233496
>> webrev: http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
>> testing: 
>> - all compiler/aot tests together on Oracle platforms
>> - each changed test separately on linux-x64
>>
>> Thanks,
>> -- Igor
>>
>>
>>
> 

From vladimir.x.ivanov at oracle.com  Tue Nov  5 16:54:30 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 5 Nov 2019 19:54:30 +0300
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
 <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com>
 <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com>
Message-ID: <975bbd8f-8021-8a5c-544b-123b2a2e08d7@oracle.com>


> http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.02/

Looks good.

> W.r.t. usefulness of PrintIdeal vs PrintIdealGraph; The obvious thing is 
> that PrintIdeal doesn't require IGV, which might be more useful if 
> PrintIdeal were a diagnostic flag instead (as suggested by Nils), so it 
> could be used from a standard JDK build which doesn't come with IGV. 
> Another advantage that comes to mind is that PrintIdeal output is easier 
> to share as text; you can just copy a few lines into the body of an 
> email. That said, I haven't used either of these flags extensively, so I 
> find it hard to judge whether one is clearly better than the other. But, 
> it seems at least unfortunate that we have the PrintIdeal flag, but can 
> not use it in compiler directives to filter the output.

It's a long-standing issue with tracing functionality in C2: both 
options depend on the ability to dump Node and Type instances in textual 
form, which is absent in product binaries. It would be nice to bundle 
that code in product binaries as well and turn both options into 
diagnostic ones, but nobody has taken care of it yet.

Also, at some point, IGV had a text view of the graph (which was pretty 
close to PrintIdeal output), but I can't find it there anymore.

Best regards,
Vladimir Ivanov
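Vladimir's suggestion in the quoted review below (declare _print_ideal only in non-product builds and initialize it on the initializer list, like _trace_opto_output) could look roughly like this. The class is a heavily reduced, hypothetical stand-in for C2's Compile, taking a plain bool instead of the directive option.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-in for C2's Compile class.
class Compile {
 public:
  explicit Compile(bool print_ideal_option)
    : _has_reserved_stack_access(false),
#ifndef PRODUCT
      _print_ideal(print_ideal_option),
#endif
      _has_method_handle_invokes(false) {
    (void)print_ideal_option;  // silences unused-parameter warnings in PRODUCT builds
  }

  // In product builds the field does not exist, so report false.
  bool print_ideal() const {
#ifndef PRODUCT
    return _print_ideal;
#else
    return false;
#endif
  }

 private:
  bool _has_reserved_stack_access;
#ifndef PRODUCT
  bool _print_ideal;  // true if node IR should be dumped for this compilation
#endif
  bool _has_method_handle_invokes;
};
```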

> On 04/11/2019 15:57, Vladimir Ivanov wrote:
>> Hi Jorn,
>>
>> src\hotspot\share\opto\compile.hpp:
>> +  bool                  _print_ideal;           // True if we should dump node IR for this compilation
>>
>> Since the only usage is in non-product code, I suggest to put 
>> _print_ideal into #ifndef PRODUCT, so you don't need to initialize it 
>> in product build.
>>
>> Also, it'll allow you to just put it on initializer list instead of 
>> doing it in the ctor body (akin to how _trace_opto_output is handled):
>>
>> src\hotspot\share\opto\compile.cpp:
>>
>> Compile::Compile( ciEnv* ci_env,
>> ...
>>   : Phase(Compiler),
>> ...
>>     _has_reserved_stack_access(false),
>> #ifndef PRODUCT
>>     _trace_opto_output(directive->TraceOptoOutputOption),
>> #endif
>>     _has_method_handle_invokes(false),
>>
>>
>> Overall, I don't see much value in PrintIdeal: PrintIdealGraph 
>> provides much more detailed information (even though in XML format) 
>> and IdealGraphVisualizer is better at browsing the graph. The only 
>> thing I'm usually missing is full text dump output on individual nodes 
>> (they are shown pruned in IGV; not sure whether it's IGV fault or the 
>> info is missing in the dump).
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 01.11.2019 18:09, Jorn Vernee wrote:
>>> Hi,
>>>
>>> I'd like to add PrintIdeal as a compiler directive in order to enable 
>>> PrintIdeal for only a single method when combining it with the 
>>> 'match' directive.
>>>
>>> Please review the following:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
>>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
>>> (Testing = tier1, manual)
>>>
>>> As a heads-up; I'm not a committer on the jdk project, so if this 
>>> sounds like a good idea, I would require a sponsor to push the changes.
>>>
>>> Thanks,
>>> Jorn
>>>

From igor.ignatyev at oracle.com  Tue Nov  5 16:58:47 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 5 Nov 2019 08:58:47 -0800
Subject: RFR(S) : 8233496 : AOT tests failures with
 'java.lang.RuntimeException: Failed to find sun/hotspot/WhiteBox.class'
In-Reply-To: <5ca28de0-2a9b-3762-e9a8-12cde97b828c@oracle.com>
References: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com>
 <3845ADEC-2657-421F-B35F-C12962A31452@oracle.com>
 <5ca28de0-2a9b-3762-e9a8-12cde97b828c@oracle.com>
Message-ID: <920D44DE-9AE1-4304-9682-782F585E4913@oracle.com>

Vladimir, Tobias,

thanks for your review, pushed.

-- Igor

> On Nov 5, 2019, at 8:00 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> +1
> 
> Best regards,
> Tobias
> 
> On 04.11.19 22:45, Vladimir Kozlov wrote:
>> Looks good.
>> 
>> Thanks
>> Vladimir
>> 
>>> On Nov 4, 2019, at 1:33 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
>>> 
>>> http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
>>>> 42 lines changed: 0 ins; 6 del; 36 mod;
>>> 
>>> Hi all,
>>> 
>>> could you please review this small patch for compiler/aot tests? the tests run 'ClassFileInstaller sun.hotspot.WhiteBox' w/o having any preceding actions which build s.h.WhiteBox class, the fix adds sun.hotspot.WhiteBox to the explicit build action and also removes unneeded classes (compiler.aot.AotCompiler and compiler.calls.common.InvokeDynamicPatcher as they are built implicitly by @run) from it.
>>> 
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233496
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
>>> testing: 
>>> - all compiler/aot tests together on Oracle platforms
>>> - each changed test separately on linux-x64
>>> 
>>> Thanks,
>>> -- Igor
>>> 
>>> 
>>> 
>> 


From igor.veresov at oracle.com  Tue Nov  5 17:32:39 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Nov 2019 09:32:39 -0800
Subject: RFR(XS) 8233590: Compiler thread creation fails with assert(_c2_count
 > 0 || _c1_count > 0) failed: No compilers?
Message-ID: <C8611090-2ECD-49A2-ADCA-0ECE3A003F39@oracle.com>

This fixes a regression introduced by JDK-8233429.


JBS: https://bugs.openjdk.java.net/browse/JDK-8233590
Webrev: http://cr.openjdk.java.net/~iveresov/8233590/webrev.00/

igor


From shade at redhat.com  Tue Nov  5 17:38:10 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 5 Nov 2019 18:38:10 +0100
Subject: RFR(XS) 8233590: Compiler thread creation fails with
 assert(_c2_count > 0 || _c1_count > 0) failed: No compilers?
In-Reply-To: <C8611090-2ECD-49A2-ADCA-0ECE3A003F39@oracle.com>
References: <C8611090-2ECD-49A2-ADCA-0ECE3A003F39@oracle.com>
Message-ID: <294cb7fb-8277-1f06-ef68-5a6a9b5f348f@redhat.com>

On 11/5/19 6:32 PM, Igor Veresov wrote:
> JBS: https://bugs.openjdk.java.net/browse/JDK-8233590
> Webrev: http://cr.openjdk.java.net/~iveresov/8233590/webrev.00/

Awwww, took me a while to see how the original code was broken. Looks good!

-- 
Thanks,
-Aleksey


From tobias.hartmann at oracle.com  Tue Nov  5 17:56:08 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Nov 2019 18:56:08 +0100
Subject: RFR(XS) 8233590: Compiler thread creation fails with
 assert(_c2_count > 0 || _c1_count > 0) failed: No compilers?
In-Reply-To: <294cb7fb-8277-1f06-ef68-5a6a9b5f348f@redhat.com>
References: <C8611090-2ECD-49A2-ADCA-0ECE3A003F39@oracle.com>
 <294cb7fb-8277-1f06-ef68-5a6a9b5f348f@redhat.com>
Message-ID: <8013806e-b863-a82b-93bd-4f25870b7de8@oracle.com>

+1, ship it!

Best regards,
Tobias

On 05.11.19 18:38, Aleksey Shipilev wrote:
> On 11/5/19 6:32 PM, Igor Veresov wrote:
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233590
>> Webrev: http://cr.openjdk.java.net/~iveresov/8233590/webrev.00/
> 
> Awwww, took me a while to see how the original code was broken. Looks good!
> 

From igor.veresov at oracle.com  Tue Nov  5 17:57:24 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Nov 2019 09:57:24 -0800
Subject: RFR(XS) 8233590: Compiler thread creation fails with
 assert(_c2_count > 0 || _c1_count > 0) failed: No compilers?
In-Reply-To: <8013806e-b863-a82b-93bd-4f25870b7de8@oracle.com>
References: <C8611090-2ECD-49A2-ADCA-0ECE3A003F39@oracle.com>
 <294cb7fb-8277-1f06-ef68-5a6a9b5f348f@redhat.com>
 <8013806e-b863-a82b-93bd-4f25870b7de8@oracle.com>
Message-ID: <33788B72-540B-4F41-BEEB-A38FC005DAA6@oracle.com>

Thanks Aleksey and Tobias!

igor


> On Nov 5, 2019, at 9:56 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> +1, ship it!
> 
> Best regards,
> Tobias
> 
> On 05.11.19 18:38, Aleksey Shipilev wrote:
>> On 11/5/19 6:32 PM, Igor Veresov wrote:
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233590
>>> Webrev: http://cr.openjdk.java.net/~iveresov/8233590/webrev.00/
>> 
>> Awwww, took me a while to see how the original code was broken. Looks good!
>> 


From sgehwolf at redhat.com  Tue Nov  5 19:18:06 2019
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Tue, 05 Nov 2019 20:18:06 +0100
Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() ||
 phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no
 mismatched stores, except on raw memory
In-Reply-To: <87tv7ie8xv.fsf@redhat.com>
References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
 <87tv7ie8xv.fsf@redhat.com>
Message-ID: <66e485457444840d563685516765e31304691932.camel@redhat.com>

Hi Roland,

On Tue, 2019-11-05 at 14:42 +0100, Roland Westrelin wrote:
> Hi Severin,
> 
> Thanks for taking care of this.

Thanks for the review!

> > Could I please get a review of this 8u only issue? The reason a
> > fastdebug build of latest OpenJDK 8u asserts for the dec-tree benchmark
> > of the renaissance suite is because the 8u backport of JDK-8140309 was
> > missing this hunk from JDK 9[1]:
> > 
> > +             (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode
> > +             (is_mismatched_access() || st->as_Store()->is_mismatched_access()),
> > 
> > I had a closer look and there doesn't seem to be missing anything else.
> > The proposed fix is to amend the assert condition in the appropriate
> > place, which brings 8u in line with JDK 9 code where the failure isn't
> > observed.
> > 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8233023
> > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/01/webrev/
> 
> Isn't this:
> 
> @@ -3213,6 +3221,9 @@
>  // within the initialized memory.
>  intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) {
>    const int FAIL = 0;
> +  if (st->is_unaligned_access()) {
> +    return FAIL;
> +  }
>    if (st->req() != MemNode::ValueIn + 1)
>      return FAIL;                // an inscrutable StoreNode (card mark?)
>    Node* ctl = st->in(MemNode::Control);
> 
> also missing from the 8140309?

It wasn't missing from the 8u backport of 8140309. See:
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/0ffee573412b

It was removed later by JDK-8202414.

> It must be harmless because nothing sets _unaligned_access AFAICT but
> given the unaligned access part of the patch was backported I think we
> should keep it consistent.
> 
> Also, 
> 
> +             (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode
> 
> is not from 8140309 but from 8080289.

Aah, good catch. I didn't mean to include part of 8080289. Updated
webrev:
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/02/webrev/

> We're not going to backport it and
> that line is unrelated to the change so backporting it sounds good. But
> while doing this, we should backport the other changes to that assert
> from 8080289 as well.
> 
> +             st->Opcode() == Op_StoreVector ||
> +             Opcode() == Op_StoreVector ||

Right. I've opted for not including any parts of 8080289, so we should
be consistent.

Thoughts?

Thanks,
Severin


From david.holmes at oracle.com  Tue Nov  5 23:48:13 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 09:48:13 +1000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB247905B810D7CFE5A15CD6EC9A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <6ac1f31e-61a6-fa92-75f6-cae0732915e3@oracle.com>
 <VI1PR0201MB247905B810D7CFE5A15CD6EC9A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <7f792953-112d-40bc-70c9-02323fa8f892@oracle.com>

Hi Martin,

On 5/11/2019 11:37 pm, Doerr, Martin wrote:
> Hi David
> 
>> With JVMCI compiler threads, each getting a new j.l.Thread oop that
>> lasts for the lifetime of that compiler thread (just like a regular
>> JavaThread) do we even actually need these arrays? I'm unclear what
>> purpose they serve when we are not trying to reuse the oops stored in
>> the array.
> 
> Compiler threads can lookup j.l.Thread objects of live compilers by iterating over the arrays.
> That's used to find the last compiler alive or to find a log instance for a compiler.
> Could get designed differently, but that would make the change even bigger.

Yes but given we don't have a clean working fix for current arrangement ...

Is hacking into oopStorage internals the only way forward?

Thanks,
David
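The alternative Martin proposes in the quoted thread (keep the handles alive and replace the oops they refer to, instead of calling destroy_global) can be illustrated with a generic sketch. HandleTable and Object are hypothetical; this is not the OopStorage or JNIHandles API, and it ignores GC visibility entirely. It only shows why a racing reader is safe when cells are overwritten rather than freed: the reader observes either the old or the new referent, never freed memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct Object { int id; };  // hypothetical stand-in for a j.l.Thread oop

class HandleTable {
 public:
  static const size_t kSlots = 4;

  // Overwrite the referent in place; the cell itself stays allocated.
  void replace(size_t i, Object* obj) {
    assert(i < kSlots);
    _cells[i].store(obj, std::memory_order_release);
  }
  void clear(size_t i) { replace(i, nullptr); }

  // Safe to call concurrently with replace(): the cell is never freed.
  Object* read(size_t i) const {
    assert(i < kSlots);
    return _cells[i].load(std::memory_order_acquire);
  }

 private:
  std::atomic<Object*> _cells[kSlots]{};  // value-initialized to nullptr
};
```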

> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Dienstag, 5. November 2019 13:33
>> To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
>> Barrett <kim.barrett at oracle.com>
>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> On 5/11/2019 6:40 pm, Doerr, Martin wrote:
>>> Hi David
>>>
>>>> I don't understand what you mean. If a compiler thread holds an oop, any
>>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>>
>>> The problem is not related to gc.
>>> My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
>>> However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread. Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
>>> I can't see how OopStorage supports reading from handles which were freed by destroy_global.
>>
>> With JVMCI compiler threads, each getting a new j.l.Thread oop that
>> lasts for the lifetime of that compiler thread (just like a regular
>> JavaThread) do we even actually need these arrays? I'm unclear what
>> purpose they serve when we are not trying to reuse the oops stored in
>> the array.
>>
>> David
>> -----
>>
>>> I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Dienstag, 5. November 2019 00:19
>>>> To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
>>>> Barrett <kim.barrett at oracle.com>
>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>
>>>> On 4/11/2019 9:12 pm, Doerr, Martin wrote:
>>>>> Hi all,
>>>>>
>>>>> @Dean
>>>>>> changing can_remove() to CompileBroker::can_remove()?
>>>>> Yes. That would be an option.
>>>>>
>>>>> @Kim, David
>>>>> I think there's another problem with this implementation.
>>>>> It introduces a use-after-free pattern due to concurrency.
>>>>> Compiler threads may still read the oops from the handles after one of them has called destroy_global until next safepoint. It doesn't matter which values they get in this case, but the VM should not crash. I believe that OopStorage allows freeing storage without safepoints, so this may be unsafe. Right?
>>>>
>>>> I don't understand what you mean. If a compiler thread holds an oop, any
>>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>>>
>>>> David
>>>>
>>>>> If so, I think replacing the oops in the handles (and keeping the handles alive) would be better. And also much more simple.
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dean.long at oracle.com <dean.long at oracle.com>
>>>>>> Sent: Samstag, 2. November 2019 08:36
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>>> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
>>>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 10/30/19 3:18 AM, Doerr, Martin wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object when
>>>>>>>> it is only used once was warranted, but that's call for compiler team to
>>>>>>>> make.
>>>>>>> I did that because _compiler2_objects is private and there's currently no setter available.
>>>>>>> But let's see what the compiler folks think.
>>>>>>
>>>>>> how about changing can_remove() to CompileBroker::can_remove()? Then you
>>>>>> can access _compiler2_objects directly, right?
>>>>>>
>>>>>> dl
>>>>>>>> Otherwise changes seem fine and I have noted the use of the
>>>>>>>> MutexUnlocker as per your direct email.
>>>>>>> Thanks a lot for reviewing. It was not a trivial one.
>>>>>>>
>>>>>>> You had noticed an incorrect usage of the CHECK macro. I've created a
>>>>>>> new bug for that:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8233193
>>>>>>> It would be great if you could take a look to check that it's what you
>>>>>>> meant, and make adaptations if needed.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>> Sent: Mittwoch, 30. Oktober 2019 05:47
>>>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>>>>>>>> <kim.barrett at oracle.com>
>>>>>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
>>>>>>>> David Holmes <David.Holmes at oracle.com>
>>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>>>
>>>>>>>> Hi Martin,
>>>>>>>>
>>>>>>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
>>>>>>>>> Hi David and Kim,
>>>>>>>>>
>>>>>>>>> I think it's easier to talk about code. So here's a new webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
>>>>>>>>
>>>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object when
>>>>>>>> it is only used once was warranted, but that's a call for the compiler
>>>>>>>> team to make. Otherwise changes seem fine and I have noted the use of
>>>>>>>> the MutexUnlocker as per your direct email.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> @Kim:
>>>>>>>>> Thanks for looking at the handle-related parts. It's ok if you don't
>>>>>>>>> want to be a reviewer of the whole change.
>>>>>>>>>> I think it's weird that can_remove() is a predicate with optional side
>>>>>>>>>> effects.  I think it would be simpler to have it be a pure predicate,
>>>>>>>>>> and have the one caller with do_it = true perform the updates. That
>>>>>>>>>> should include NULLing out the handle pointer (perhaps debug-only, but
>>>>>>>>>> it doesn't cost much to cleanly maintain the data structure).
>>>>>>>>> Nevertheless, it has the advantage that it enforces the update to be
>>>>>>>>> consistent. Otherwise, a caller could use it without holding the lock,
>>>>>>>>> or mess it up in some other way.
>>>>>>>>> In addition, I don't want to change that as part of this fix.
>>>>>>>>>
>>>>>>>>>> So far as I can tell, THREAD == NULL here.
>>>>>>>>> This is a very tricky part (not my invention):
>>>>>>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which sets
>>>>>>>>> __the_thread__ to Thread::current().
>>>>>>>>> I don't want to publish my opinion about this.
>>>>>>>>>
>>>>>>>>> @David:
>>>>>>>>> It seems like this option is preferred over option 3
>>>>>>>>> (possibly_add_compiler_threads part of webrev.02 and leave the
>>>>>>>>> initialization as is).
>>>>>>>>> So when you're ok with it, I'll request a 2nd review from the compiler
>>>>>>>>> folks (I should change the subject to contain RFR).
>>>>>>>>> Thanks,
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>>>> Sent: Montag, 28. Oktober 2019 05:04
>>>>>>>>>> To: Kim Barrett <kim.barrett at oracle.com>
>>>>>>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
>>>>>>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
>>>>>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>>>>>
>>>>>>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
>>>>>>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes <david.holmes at oracle.com> wrote:
>>>>>>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
>>>>>>>>>>>>> Hi Kim,
>>>>>>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have not
>>>>>>>>>>>>> seen how to allocate a global handle and add the oop later.
>>>>>>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of that.
>>>>>>>>>>>>> So I can imagine 3 ways to implement it:
>>>>>>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just added that to
>>>>>>>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
>>>>>>>>>>>>> We may want to improve that further by setting the handle pointer to
>>>>>>>>>>>>> NULL and asserting that it is NULL before adding the new one.
>>>>>>>>>>>>> I had been concerned about NULLs in the array, but it looks like the
>>>>>>>>>>>>> existing code can deal with that.
>>>>>>>>>>>> I think it would be cleaner to both destroy the global handle and NULL it in
>>>>>>>>>>>> the array at the same time.
>>>>>>>>>>>> This comment
>>>>>>>>>>>>
>>>>>>>>>>>> 325         // Old j.l.Thread object can die here.
>>>>>>>>>>>>
>>>>>>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct->threadObj() so
>>>>>>>>>>>> can't "die" until that is also cleared during the actual termination process.
>>>>>>>>>>> I think if there is such a thread here that it can't die, because the
>>>>>>>>>>> death predicate (the can_remove stuff) won't see that old thread as
>>>>>>>>>>> the last thread in _compiler2_objects.  That's what I meant by this:
>>>>>>>>>>>
>>>>>>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>>>>>>>>>>> I also think that here:
>>>>>>>>>>>>
>>>>>>>>>>>> 947         jobject thread_handle = JNIHandles::make_global(thread_oop);
>>>>>>>>>>>> 948         _compiler2_objects[i] = thread_handle;
>>>>>>>>>>>>
>>>>>>>>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a valid
>>>>>>>>>>>> assertion then I think there are other problems.
>>>>>>>>>>> I think either that comment about an old thread is wrong (and the NULL
>>>>>>>>>>> assertion I suggested is okay), or I think the whole mechanism here
>>>>>>>>>>> has problems.  Or at least I was unable to figure out how it could work...
>>>>>>>>>>>
>>>>>>>>>> I'm not following, sorry. You can't assert NULL unless it's actually set
>>>>>>>>>> to NULL, which it presently isn't. But it could be set to NULL as Martin
>>>>>>>>>> suggested:
>>>>>>>>>>
>>>>>>>>>> "We may want to improve that further by setting the handle pointer to
>>>>>>>>>> NULL and asserting that it is NULL before adding the new one."
>>>>>>>>>>
>>>>>>>>>> and which I also supported. But that aside, once the delete_global has
>>>>>>>>>> been called that JNIHandle no longer references the j.l.Thread that it
>>>>>>>>>> did, at which point it is only reachable via the threadObj() of the
>>>>>>>>>> CompilerThread.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>

From kim.barrett at oracle.com  Wed Nov  6 03:09:01 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 5 Nov 2019 22:09:01 -0500
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB2479A25B5B2E26D79ADDE7BF9A690@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>

> On Nov 5, 2019, at 3:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:

Coming back in, because this seems to be going off into the weeds again.

>> I don't understand what you mean. If a compiler thread holds an oop, any
>> oop, it must hold it in a Handle to ensure it can't be gc'd.
> 
> The problem is not related to gc.
> My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
> However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread.
> Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
> I can't see how OopStorage supports reading from handles which were freed by destroy_global.

So don't do that!

OopStorage isn't magic. If you are going to look at an OopStorage
handle, you have to ensure there won't be concurrent deletion. Use
locks or some safe memory reclamation protocol. (GlobalCounter might
be used here, but it depends a lot on what the iterations are doing. A
reference counting mechanism is another possibility.) This is no
different from any other resource management.

> I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.

Assuming the iteration didn't happen at safepoints (which is just a way to make the iteration and
deletion not concurrent).  And I agree that isn't the case with the current code.
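
A minimal, hypothetical C++ sketch of the lock-based option Kim mentions: a slot array standing in for `_compiler2_objects`, where destroying a handle and clearing its slot happen in the same critical section that readers use, so a reader sees either the old value or an empty slot and never freed memory. It uses `std::mutex` rather than HotSpot's Mutex or OopStorage, and `HandleCell` / `GuardedHandleSlots` are invented names; it is not code from any of the webrevs in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a global handle: a heap cell that is freed
// when the handle is destroyed (loosely analogous to an OopStorage entry).
struct HandleCell {
  std::string value;
};

// Hypothetical registry standing in for the _compiler2_objects array.
// Readers and the destroying thread take the same mutex, so a reader can
// never dereference a cell while another thread is freeing it.
class GuardedHandleSlots {
 public:
  explicit GuardedHandleSlots(std::size_t n) : slots_(n) {}

  // Install a new handle; the slot must have been cleared first.
  void install(std::size_t i, const std::string& value) {
    std::lock_guard<std::mutex> guard(lock_);
    assert(slots_[i] == nullptr && "destroy the old handle before reusing the slot");
    slots_[i] = std::make_unique<HandleCell>(HandleCell{value});
  }

  // Destroy the handle and clear the slot in one critical section.
  void destroy(std::size_t i) {
    std::lock_guard<std::mutex> guard(lock_);
    slots_[i].reset();  // frees the cell and leaves nullptr in the slot
  }

  // Copy the value out under the lock; a cleared slot reads as empty.
  std::string read(std::size_t i) {
    std::lock_guard<std::mutex> guard(lock_);
    return slots_[i] ? slots_[i]->value : std::string();
  }

 private:
  std::mutex lock_;
  std::vector<std::unique_ptr<HandleCell>> slots_;
};
```

The trade-off is that every read takes the lock; a GlobalCounter-style reclamation protocol or reference counting, the other options Kim lists, would let readers avoid that at the cost of more machinery.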


From rwestrel at redhat.com  Wed Nov  6 08:42:48 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 06 Nov 2019 09:42:48 +0100
Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() ||
 phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no
 mismatched stores, except on raw memory
In-Reply-To: <66e485457444840d563685516765e31304691932.camel@redhat.com>
References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
 <87tv7ie8xv.fsf@redhat.com>
 <66e485457444840d563685516765e31304691932.camel@redhat.com>
Message-ID: <87r22le6pj.fsf@redhat.com>


>> Isn't this:
>> 
>> @@ -3213,6 +3221,9 @@
>>  // within the initialized memory.
>>  intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) {
>>    const int FAIL = 0;
>> +  if (st->is_unaligned_access()) {
>> +    return FAIL;
>> +  }
>>    if (st->req() != MemNode::ValueIn + 1)
>>      return FAIL;                // an inscrutable StoreNode (card mark?)
>>    Node* ctl = st->in(MemNode::Control);
>> 
>> also missing from the 8140309?
>
> It wasn't missing from the 8u backport of 8140309. See:
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/0ffee573412b
>
> It was removed later by JDK-8202414.

Ok.

> Aah, good catch. I didn't mean to include part of 8080289. Updated
> webrev:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/02/webrev/

Looks good to me.

Roland.

From martin.doerr at sap.com  Wed Nov  6 09:12:28 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 6 Nov 2019 09:12:28 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <VI1PR0201MB2479A25B5B2E26D79ADDE7BF9A690@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com>
 <VI1PR0201MB2479D25DB14E79BBF16666769A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>
Message-ID: <HE1PR0201MB2475B62202B8D5187D6C3CAC9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>

Hi Kim,

thanks for confirming.

http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/
already avoids access to freed handles.

I don't really like the complexity of this code.
Replacing oops in handles would have been much more simple.
But I can live with either version.

Best regards,
Martin


> -----Original Message-----
> From: Kim Barrett <kim.barrett at oracle.com>
> Sent: Mittwoch, 6. November 2019 04:09
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com;
> Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> > On Nov 5, 2019, at 3:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> 
> Coming back in, because this seems to be going off into the weeds again.
> 
> >> I don't understand what you mean. If a compiler thread holds an oop, any
> >> oop, it must hold it in a Handle to ensure it can't be gc'd.
> >
> > The problem is not related to gc.
> > My change introduces destroy_global for the handles. This means that the
> > OopStorage portion which has held the oop can get freed.
> > However, other compiler threads are running concurrently. They may
> > execute code which reads the oop from the handle which is freed by this
> > thread.
> > Reading stale data is not a problem here, but reading freed memory may
> > assert or even crash in general.
> > I can't see how OopStorage supports reading from handles which were
> > freed by destroy_global.
> 
> So don't do that!
> 
> OopStorage isn't magic. If you are going to look at an OopStorage
> handle, you have to ensure there won't be concurrent deletion. Use
> locks or some safe memory reclamation protocol. (GlobalCounter might
> be used here, but it depends a lot on what the iterations are doing. A
> reference counting mechanism is another possibility.) This is no
> different from any other resource management.
> 
> > I think it would be safe if the freeing only occurred at safepoints, but I
> > don't think this is the case.
>
> Assuming the iteration didn't happen at safepoints (which is just a way to
> make the iteration and deletion not concurrent).  And I agree that isn't
> the case with the current code.


From sgehwolf at redhat.com  Wed Nov  6 09:28:19 2019
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Wed, 06 Nov 2019 10:28:19 +0100
Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() ||
 phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no
 mismatched stores, except on raw memory
In-Reply-To: <87r22le6pj.fsf@redhat.com>
References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
 <87tv7ie8xv.fsf@redhat.com>
 <66e485457444840d563685516765e31304691932.camel@redhat.com>
 <87r22le6pj.fsf@redhat.com>
Message-ID: <2e6ec654637b3ceac2af2d7cf02c72398bda0d11.camel@redhat.com>

On Wed, 2019-11-06 at 09:42 +0100, Roland Westrelin wrote:
> > > Isn't this:
> > > 
> > > @@ -3213,6 +3221,9 @@
> > >  // within the initialized memory.
> > >  intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) {
> > >    const int FAIL = 0;
> > > +  if (st->is_unaligned_access()) {
> > > +    return FAIL;
> > > +  }
> > >    if (st->req() != MemNode::ValueIn + 1)
> > >      return FAIL;                // an inscrutable StoreNode (card mark?)
> > >    Node* ctl = st->in(MemNode::Control);
> > > 
> > > also missing from the 8140309?
> > 
> > It wasn't missing from the 8u backport of 8140309. See:
> > http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/0ffee573412b
> > 
> > It was removed later by JDK-8202414.
> 
> Ok.
> 
> > Aah, good catch. I didn't mean to include part of 8080289. Updated
> > webrev:
> > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/02/webrev/
> 
> Looks good to me.

Thanks again for the review, Roland.

Cheers,
Severin


From christian.hagedorn at oracle.com  Wed Nov  6 10:13:03 2019
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 6 Nov 2019 11:13:03 +0100
Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR compilation
Message-ID: <ac04b5f1-a652-e638-cd40-04fc01fca203@oracle.com>

Hi

Please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8229694
http://cr.openjdk.java.net/~chagedorn/8229694/webrev.00/

The JVM crashes in the testcase when trying to dereference mem at [1] 
which is NULL. This happens when converting packs into vector nodes in 
SuperWord::output() by reading an unexpected NULL value with 
SuperWord::align_to_ref() [2]. The corresponding field _align_to_ref is 
set to NULL at [3] since best_align_to_mem_ref was assigned NULL just 
before at [4]. _packset only contained one pack and there were no memory 
operations left to be processed (memops is empty). As a result 
SuperWord::find_align_to_ref() will return NULL since it needs at least 
two operations to find an alignment.

The fix is straightforward: directly use the alignment of the only
remaining pack if there are no memory operations left in memops to find
another alignment.
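
The shape of the fix can be sketched with a deliberately simplified, hypothetical model in which memory operations are plain integers and a pack is a vector of them. `find_align_to_ref_model` and `choose_align_ref` are invented names, not C2 functions, and taking the pack's first memory operation as the reference is an assumption about what "the alignment of the only remaining pack" means here; the webrev has the actual change.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified model: a memory operation is an int id
// and a pack is a list of such ids. This is not C2's Node or Node_List API.
using MemOp = int;
using Pack = std::vector<MemOp>;
constexpr MemOp kNoRef = -1;  // plays the role of NULL

// Stand-in for SuperWord::find_align_to_ref(): it needs at least two memory
// operations to pick an alignment reference, otherwise it yields "NULL".
MemOp find_align_to_ref_model(const std::vector<MemOp>& memops) {
  if (memops.size() < 2) return kNoRef;
  return memops.front();  // placeholder for the real selection heuristic
}

// Sketch of the described fix: when memops is empty and exactly one pack
// remains, align to that pack (here: its first memory operation) instead of
// falling through to a NULL reference.
MemOp choose_align_ref(const std::vector<MemOp>& memops,
                       const std::vector<Pack>& packset) {
  if (memops.empty() && packset.size() == 1 && !packset.front().empty()) {
    return packset.front().front();
  }
  return find_align_to_ref_model(memops);
}
```

The normal path is unchanged; only the "one pack, no memops left" case, which previously produced a NULL reference, gets a fallback.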


The testcase creates such a situation where only one pack remains at [4] 
when the loop is unrolled four times. When calling 
SuperWord::find_adjacent_refs() there are:
- 4 StoreI for intArr[j-1] = 400
- 4 StoreC for shortArr[j] = 30
- 2 StoreI for intArr[7] = 260 // Initially 4 but 2 are removed by IGVN in Ideal()
- 2 StoreC for shortArr[10] = 10 // Initially 4 but 2 are removed by IGVN in Ideal()
- 2 LoadI (and 2 StoreI) for iFld = intArr[j] // Initially 4 each but 2 of each are removed by IGVN in Ideal()

The field stores are obviously ignored for the superword algorithm. 
intArr[j-1] aligns with intArr[7] and therefore create_pack is true. The 
only pack created is one with two immediately following stores for 
intArr[j-1]. The IGVN algorithm is not able to remove the first 
redundant store to intArr[7] when the loop is unrolled the first time. 
Only when unrolling it again the second time, it is able to remove the 
two newly created redundant stores to intArr[7]. This leaves us with the 
following dependencies of stores: "intArr[j-1] -> intArr[j-1] -> 
intArr[7] -> intArr[j-1] -> intArr[7] -> intArr[j-1]" from which only 
the first two operations can be used to create a pack.

The very same applies to the StoreC nodes. As a result, one pack for 
StoreI and one for StoreC are created in total. There are now only the 
two LoadI nodes of intArr[j] left which are not aligned with 
intArr[j-1]. Therefore, all StoreI packs are removed at [5]. This leaves 
us with exactly one StoreC pack and an empty memops list, which sets the 
alignment to NULL and eventually lets the JVM crash at [1].

We might want to file an RFE to investigate further why IGVN cannot 
remove the first redundant stores to intArr[7], shortArr[10], and iFld, 
respectively (even though it's quite useless to keep setting the same 
values in a loop). This problem can also be observed if the loop only 
contains the statement "iFld = intArr[j]". But I think that even if those
redundant stores were optimized away, we should still have this fix to
handle the situation with only one pack and no memory operations
remaining.


Thank you!

Best regards,
Christian


[1] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l3608
[2] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l2328
[3] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l732
[4] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l708
[5] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l688

From david.holmes at oracle.com  Wed Nov  6 10:14:30 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 20:14:30 +1000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <HE1PR0201MB2475B62202B8D5187D6C3CAC9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>
 <HE1PR0201MB2475B62202B8D5187D6C3CAC9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
Message-ID: <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>

Hi Martin,

On 6/11/2019 7:12 pm, Doerr, Martin wrote:
> Hi Kim,
> 
> thanks for confirming.
> 
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/
> already avoids access to freed handles.

Sorry I missed your earlier reference to this version.

So the expectation here is that all accesses to these arrays are guarded 
by the CompileThread_lock, but that doesn't seem to hold for get_log?

Thanks,
David
-----

> I don't really like the complexity of this code.
> Replacing oops in handles would have been much more simple.
> But I can live with either version.
> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: Kim Barrett <kim.barrett at oracle.com>
>> Sent: Mittwoch, 6. November 2019 04:09
>> To: Doerr, Martin <martin.doerr at sap.com>
>> Cc: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com;
>> Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>
>> Coming back in, because this seems to be going off into the weeds again.
>>
>>>> I don't understand what you mean. If a compiler thread holds an oop, any
>>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>>
>>> The problem is not related to gc.
>>> My change introduces destroy_global for the handles. This means that the
>>> OopStorage portion which has held the oop can get freed.
>>> However, other compiler threads are running concurrently. They may
>>> execute code which reads the oop from the handle which is freed by this
>>> thread.
>>> Reading stale data is not a problem here, but reading freed memory may
>>> assert or even crash in general.
>>> I can't see how OopStorage supports reading from handles which were
>>> freed by destroy_global.
>>
>> So don't do that!
>>
>> OopStorage isn't magic. If you are going to look at an OopStorage
>> handle, you have to ensure there won't be concurrent deletion. Use
>> locks or some safe memory reclamation protocol. (GlobalCounter might
>> be used here, but it depends a lot on what the iterations are doing. A
>> reference counting mechanism is another possibility.) This is no
>> different from any other resource management.
>>
>>> I think it would be safe if the freeing only occurred at safepoints, but I
>>> don't think this is the case.
>>
>> Assuming the iteration didn't happen at safepoints (which is just a way to
>> make the iteration and deletion not concurrent).  And I agree that isn't
>> the case with the current code.
> 

From tobias.hartmann at oracle.com  Wed Nov  6 13:34:15 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 6 Nov 2019 14:34:15 +0100
Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter with
 CDS due to code cache exhaustion
Message-ID: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8233491
http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/

When running a stress test with CDS, we fail to create adapters when linking a method from a shared
class because the code cache is full. This case is not properly handled by the CDS specific code and
instead of throwing a VirtualMachineError, we crash because "entry" is NULL.
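
A hypothetical, self-contained C++ model of the failure mode and the defensive pattern: an allocation that can return null when a fixed-size cache is full, and a caller that checks for that and raises an error instead of dereferencing the null entry. All names (`CodeCacheModel`, `AdapterEntryModel`, `VirtualMachineErrorModel`, `link_shared_method`) are invented, and a C++ exception stands in for the Java VirtualMachineError; this is not the code in the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>

// Hypothetical model of an adapter entry created in the code cache.
struct AdapterEntryModel {
  std::string signature;
};

// Hypothetical fixed-capacity cache standing in for the code cache.
class CodeCacheModel {
 public:
  explicit CodeCacheModel(std::size_t capacity) : capacity_(capacity) {}

  // Returns nullptr when no space is left, like a failed code cache allocation.
  AdapterEntryModel* allocate(const std::string& signature) {
    if (entries_.size() == capacity_) return nullptr;
    entries_.push_back(AdapterEntryModel{signature});
    return &entries_.back();  // std::deque keeps existing elements in place
  }

 private:
  std::size_t capacity_;
  std::deque<AdapterEntryModel> entries_;
};

// Stand-in for the VirtualMachineError the VM is expected to raise.
struct VirtualMachineErrorModel : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// The defensive pattern: check the allocation result before using it,
// instead of dereferencing a NULL entry.
AdapterEntryModel& link_shared_method(CodeCacheModel& cache,
                                      const std::string& signature) {
  AdapterEntryModel* entry = cache.allocate(signature);
  if (entry == nullptr) {
    throw VirtualMachineErrorModel("code cache full: cannot create adapter");
  }
  return *entry;
}
```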

I'm able to intermittently reproduce this with a test (see [1]), but since the problem depends on the
class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg
test. However, I've verified that the patch fixes the problem.

Thanks,
Tobias

[1]
https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462

From vitalyd at gmail.com  Wed Nov  6 13:45:35 2019
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 6 Nov 2019 08:45:35 -0500
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <87y2wu7kpn.fsf@redhat.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
 <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com>
Message-ID: <CAHjP37GtPX0p984_7Q-akN5gSGDJEgWDhJitWn2dHc8DRZFAdQ@mail.gmail.com>

Hi Roland,

Thanks for fixing this! :) Perhaps a bit too premature to ask but: any
chance this will get backported to 11?

Thanks

On Tue, Nov 5, 2019 at 4:10 AM Roland Westrelin <rwestrel at redhat.com> wrote:

>
> Hi Tobias,
>
> Thanks for the review and for performance testing.
>
> Roland.
>

From claes.redestad at oracle.com  Wed Nov  6 14:42:50 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 6 Nov 2019 15:42:50 +0100
Subject: RFR: 8233708: VectorSet cleanup
Message-ID: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>

Hi,

VectorSet needs a cleanup:

- Extravagant use of operator overloading (<<= adds items to the set,
 >>= removes them :eyeroll:)
- Plenty of unused methods
- Since VectorSet is the only implementation in the code base, the
abstract base class Set is unnecessary
- Various method names in conflict with HotSpot code "standard"

Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/
Bug:    https://bugs.openjdk.java.net/browse/JDK-8233708

Testing: tier1-3, built and sanity tested with Shenandoah due to changes
in some Shenandoah C2 support

Thanks!

/Claes

From rwestrel at redhat.com  Wed Nov  6 14:54:34 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 06 Nov 2019 15:54:34 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <CAHjP37GtPX0p984_7Q-akN5gSGDJEgWDhJitWn2dHc8DRZFAdQ@mail.gmail.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
 <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com>
 <CAHjP37GtPX0p984_7Q-akN5gSGDJEgWDhJitWn2dHc8DRZFAdQ@mail.gmail.com>
Message-ID: <87o8xpdphx.fsf@redhat.com>


Hi Vitaly,

> Thanks for fixing this! :) Perhaps a bit too premature to ask but: any
> chance this will get backported to 11?

(I still need an extra review in order to push this)

I'll get it backported to 11u (that is openjdk 11u, I can't comment
about Oracle 11u).

Roland.

From vitalyd at gmail.com  Wed Nov  6 15:04:37 2019
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 6 Nov 2019 10:04:37 -0500
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <87o8xpdphx.fsf@redhat.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
 <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com>
 <CAHjP37GtPX0p984_7Q-akN5gSGDJEgWDhJitWn2dHc8DRZFAdQ@mail.gmail.com>
 <87o8xpdphx.fsf@redhat.com>
Message-ID: <CAHjP37HErcda76QWToMS7wkciVGuKXc2CU3Ar-QZO4TgXq5Orw@mail.gmail.com>

Hey Roland,

On Wed, Nov 6, 2019 at 9:54 AM Roland Westrelin <rwestrel at redhat.com> wrote:

>
> Hi Vitaly,
>
> > Thanks for fixing this! :) Perhaps a bit too premature to ask but: any
> > chance this will get backported to 11?
>
> (I still need an extra review in order to push this)
>
> I'll get it backported to 11u (that is openjdk 11u, I can't comment
> about Oracle 11u).
>
Perfect! I'm actually interested in openjdk11 - should've made that clear,
sorry.  Thanks again for tackling this so quickly!

>
> Roland.
>

From nils.eliasson at oracle.com  Wed Nov  6 15:08:51 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 6 Nov 2019 16:08:51 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
Message-ID: <fa62038c-f636-3a60-2947-83da88ccfa9e@oracle.com>

Hi Claes,

Excellent cleanup!

// Nils

On 2019-11-06 15:42, Claes Redestad wrote:
> Hi,
>
> VectorSet needs a cleanup:
>
> - Extravagant use of operator overloading (<<= adds items to the set,
> >>= removes them :eyeroll:)
> - Plenty of unused methods
> - Since VectorSet is the only implementation in the code base, the
> abstract base class Set is unnecessary
> - Various method names in conflict with HotSpot code "standard"
>
> Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233708
>
> Testing: tier1-3, built and sanity tested with Shenandoah due to changes
> in some Shenandoah C2 support
>
> Thanks!
>
> /Claes

From martin.doerr at sap.com  Wed Nov  6 15:15:30 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 6 Nov 2019 15:15:30 +0000
Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More
 generic vector CRC implementation (v2)
In-Reply-To: <b5e148b7-1493-24f2-ecf2-89ce72f6df38@linux.vnet.ibm.com>
References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com>
 <VI1PR0201MB2479B9E369C33E4D46B8F9149A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com>
 <VI1PR0201MB247951F074AA4A598CBF8B8B9A6A0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <b5e148b7-1493-24f2-ecf2-89ce72f6df38@linux.vnet.ibm.com>
Message-ID: <HE1PR0201MB24752CF818A0E90380EB7C7E9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>

Hi Gustavo,

> [PPC64] More generic vector CRC implementation                           (1/4)
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/

This seems to be the version I had already reviewed. I'm still ok with it.


> [PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4)
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/

Normally, backports should get handled in separate backport RFRs, but the manual change in this version could be considered trivial, so I don't insist on a separate RFR.
Looks good, now.


> [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/   (3/4)

Similar to the above. I'd leave at least the CRC32C defines in the code (stubRoutines_ppc). They don't hurt. Why introduce additional diffs?


> [PPC64] Cleanup non-vector version of CRC32
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/   (4/4)

This one contains part of another shared-code change, so I can't review it as part of this RFR; it needs to be reviewed separately.
A backport of the original change that introduced LITTLE_ENDIAN_ONLY etc. should be evaluated.


Best regards,
Martin


> -----Original Message-----
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> Sent: Montag, 4. November 2019 23:33
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-
> dev at openjdk.java.net
> Cc: jdk8u-dev at openjdk.java.net
> Subject: Re: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More
> generic vector CRC implementation (v2)
> 
> Hello Martin,
> 
> On 10/24/2019 07:17 AM, Doerr, Martin wrote:
> > Hi Gustavo,
> >
> > I think removing invertCRC is an unnecessary manual change.
> > We should minimize such changes as far as possible. They may create merge
> > conflicts for future backports.
> 
> Thanks a lot for the review.
> 
> I agree I should minimize the changes as far as possible. I added back
> invertCRC and tried to follow your advice, so the final clean-up patch is
> very similar to the one found in jdk/jdk, for instance.
> 
> Please find v2 for the patchset below. v2 changes affect only 3/4 and 4/4.
> 
> 
> [PPC64] More generic vector CRC implementation                           (1/4)
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/
> 
> v2:
> - Adapt file names to OpenJDK 8u
> - Remove CRC32C part, leaving only CRC32 part, since OpenJDK 8u has no
>   CRC32C
> - Add Assembler::add_const_optimized() from "8077838: Recent
>   developments for ppc" [0]
> - Fix vpermxor() opcode, replacing VPMSUMW_OPCODE with VPERMXOR_OPCODE,
>   according to the fix in "8190781: ppc64 + s390: Fix CriticalJNINatives" [1]
> - Adapt signatures for the following functions and their callers, according to
>   "8175369: [ppc] Provide intrinsic implementation for CRC32C" [2]:
>     a. MacroAssembler::update_byteLoop_crc32(), removing 'invertCRC' parameter
>     b. MacroAssembler::kernel_crc32_1word(), adding 'invertCRC' parameter
> 
> 
> [PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4)
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/
> 
> v2:
> - Adapt file names to OpenJDK 8u
> - Remove CRC32C code
> 
> 
> [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/   (3/4)
> 
> v2:
> - Remove CRC32C code, keeping is_crc32c in crc32(), code related to is_crc32c
>   and invertCRC (like the code in kernel_crc32_vpmsum()), and not touching the
>   stub code mark in generate_CRC32_updateBytes(), to avoid merge conflicts in
>   future backports.
> 
> 
> [PPC64] Cleanup non-vector version of CRC32
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/   (4/4)
> 
> v2:
> - Add {BIG,LITTLE}_ENDIAN_ONLY to src/share/vm/utilities/macros.hpp
> - Add kernel_crc32_singleByteReg from change 8175369 [2], as the clean-up uses
>   it in InterpreterGenerator::generate_CRC32_update_entry().
> 
> 
> --
> 
> Best regards,
> Gustavo
> 
> [0] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/88847a1b3718
> [1] http://hg.openjdk.java.net/jdk/jdk/rev/5a69ba3a4fd1#l1.7
> [2] https://bugs.openjdk.java.net/browse/JDK-8175369
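Several of the v2 notes above revolve around the `invertCRC` parameter. Assuming, as the name suggests, that it selects whether a kernel applies the conventional pre- and post-inversion of the running CRC value, the generic bit-at-a-time CRC-32 sketch below shows what that inversion does. It is not the PPC64 vpmsum kernel or any HotSpot routine; the function name `crc32_update` and its `invert` flag are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-at-a-time reflected CRC-32 (polynomial 0xEDB88320), the checksum that
// java.util.zip.CRC32 computes. The hypothetical `invert` flag models the
// conventional pre-/post-inversion of the running value.
uint32_t crc32_update(uint32_t crc, const uint8_t* buf, size_t len, bool invert) {
  assert(buf != nullptr || len == 0);
  if (invert) {
    crc = ~crc;  // pre-inversion: a starting value of 0 becomes 0xFFFFFFFF
  }
  for (size_t i = 0; i < len; i++) {
    crc ^= buf[i];
    for (int bit = 0; bit < 8; bit++) {
      // Shift right; XOR in the polynomial if the bit shifted out was 1.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  if (invert) {
    crc = ~crc;  // post-inversion
  }
  return crc;
}
```

With inversion enabled and a starting value of 0, the result for the ASCII string "123456789" is the standard CRC-32 check value 0xCBF43926. Because each call's pre-inversion undoes the previous call's post-inversion, updating in two chunks gives the same result as a single call, which is why the inversion has to be done consistently in exactly one place.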

From shade at redhat.com  Wed Nov  6 15:49:26 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 6 Nov 2019 16:49:26 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
Message-ID: <336853d7-3047-0194-4e36-9ee968e07a86@redhat.com>

On 11/6/19 3:42 PM, Claes Redestad wrote:
> Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/

Shenandoah part looks fine.

The rest looks fine too.

-- 
Thanks,
-Aleksey


From tobias.hartmann at oracle.com  Wed Nov  6 15:57:13 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 6 Nov 2019 16:57:13 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
Message-ID: <c679ab18-ed3b-5696-ba89-6814959a1c0c@oracle.com>

Hi Claes,

very nice. While you're at it, could you add brackets to the for-loops/ifs in vectset.cpp:70,
vectset.hpp:89, 98 and fix the whitespacing? No new webrev required.

Thanks,
Tobias

On 06.11.19 15:42, Claes Redestad wrote:
> Hi,
> 
> VectorSet needs a cleanup:
> 
> - Extravagant use of operator overloading (<<= adds items to the set,
> >>= removes them :eyeroll:)
> - Plenty of unused methods
> - Since VectorSet is the only implementation in the code base, the
> abstract base class Set is unnecessary
> - Various method names in conflict with HotSpot code "standard"
> 
> Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233708
> 
> Testing: tier1-3, built and sanity tested with Shenandoah due to changes
> in some Shenandoah C2 support
> 
> Thanks!
> 
> /Claes

From gnu.andrew at redhat.com  Wed Nov  6 16:51:47 2019
From: gnu.andrew at redhat.com (Andrew John Hughes)
Date: Wed, 6 Nov 2019 16:51:47 +0000
Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() ||
 phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no
 mismatched stores, except on raw memory
In-Reply-To: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
Message-ID: <dff209ce-4356-25f0-1b2b-dab8a5f16dd5@redhat.com>

On 30/10/2019 09:41, Severin Gehwolf wrote:
> Hi,
> 
> Could I please get a review of this 8u-only issue? The reason a
> fastdebug build of the latest OpenJDK 8u asserts on the dec-tree benchmark
> of the Renaissance suite is that the 8u backport of JDK-8140309 was
> missing this hunk from JDK 9 [1]:
> 
> +             (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode
> +             (is_mismatched_access() || st->as_Store()->is_mismatched_access()),
> 
> I had a closer look and there doesn't seem to be anything else missing.
> The proposed fix is to amend the assert condition in the appropriate
> place, which brings 8u in line with the JDK 9 code, where the failure isn't
> observed.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233023
> webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/01/webrev/
> 
> Testing: 8u tier1 test set with fastdebug build on x86_64 Linux. No new
> failures. dec-tree benchmark now runs successfully on an 8u fastdebug
> build.
> 
> Thoughts?
> 
> Thanks,
> Severin
> 
> [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4bee38ba018c
> 

I compared the two patches and this missing hunk does stand out. So the
patch looks fine in that respect.

I notice they also didn't backport the testcase to 8u. Any thoughts on
including that?

Thanks,
-- 
Andrew :)

Senior Free Java Software Engineer
Red Hat, Inc. (http://www.redhat.com)

PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net)
Fingerprint = 5132 579D D154 0ED2 3E04  C5A0 CFDA 0F9B 3596 4222
https://keybase.io/gnu_andrew


From vladimir.kozlov at oracle.com  Wed Nov  6 18:01:44 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 6 Nov 2019 10:01:44 -0800
Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter
 with CDS due to code cache exhaustion
In-Reply-To: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>
References: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>
Message-ID: <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>

CC to runtime group too.

Looks good to me.

Thanks,
Vladimir

On 11/6/19 5:34 AM, Tobias Hartmann wrote:
> Hi,
> 
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8233491
> http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/
> 
> When running a stress test with CDS, we fail to create adapters when linking a method from a shared
> class because the code cache is full. This case is not properly handled by the CDS specific code and
> instead of throwing a VirtualMachineError, we crash because "entry" is NULL.
> 
> I'm able to reproduce this intermittently with a test (see [1]), but since the problem depends on the
> class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg
> test. However, I've verified that the patch fixes the problem.
> 
> Thanks,
> Tobias
> 
> [1]
> https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462
> 

From vladimir.kozlov at oracle.com  Wed Nov  6 18:05:40 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 6 Nov 2019 10:05:40 -0800
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
 <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>
Message-ID: <f86c6ba8-b8b7-dc46-c29a-86c10cb04861@oracle.com>

I am fine with this conservative fix (bailout optimization). It is good that performance is not affected.

Thanks,
Vladimir

On 11/5/19 1:01 AM, Tobias Hartmann wrote:
> Performance results look good.
> 
> Best regards,
> Tobias
> 
> On 04.11.19 08:25, Tobias Hartmann wrote:
>> Hi Roland,
>>
>> this seems reasonable to me but I'm concerned that it might cause performance regressions. I'll run
>> some tests in our system.
>>
>> Best regards,
>> Tobias
>>
>> On 23.10.19 10:50, Roland Westrelin wrote:
>>>
>>> http://cr.openjdk.java.net/~roland/8232539/webrev.00/
>>>
>>> I couldn't come up with a test case because node processing order during
>>> IGVN matters. Bug was reported against 11 but I see no reason it
>>> wouldn't apply to current code as well.
>>>
>>> At parse time, predicates are added by
>>> Parse::maybe_add_predicate_after_if() but no loop is actually
>>> created. Compile::_major_progress is cleared. On the next round of IGVN,
>>> one input of a region points to predicates. The same region has an If as
>>> a use that can be split through phi during IGVN. The predicates are going
>>> to be removed by IGVN, but that happens in multiple steps because there
>>> are several predicates (for reasons Deoptimization::Reason_predicate,
>>> Deoptimization::Reason_loop_limit_check, etc.) and because for each
>>> predicate one IGVN iteration must first remove the Opaque1 node, another
>>> must then kill the IfFalse projection, and finally another must replace
>>> the IfTrue projection with the If's control input.
>>>
>>> Split if occurs while predicates are in the process of being removed. It
>>> sees predicates, tries to walk over them, encounters a predicate that has
>>> been half removed (false projection removed), and we hit the assert/crash.
>>>
>>> I propose we simply not apply IGVN split if if we're splitting through a
>>> loop or if there's a predicate input to a region because:
>>>
>>> - Making split if robust to dying predicates is not straightforward as
>>>    far as I can tell
>>>
>>> - Loop opts split if doesn't split through loop header so why would it
>>>    make sense for IGVN split if?
>>>
>>> - I'm wondering if there are other cases where handling of predicates in
>>>    split if could be wrong (and so more trouble ahead):
>>>
>>>    + What if we split through a Loop region, predicates were added by
>>>    loop optimizations, loop opts are now over so the predicates added at
>>>    parse time were removed: then PhaseIdealLoop::find_predicate()
>>>    wouldn't report a predicate but cloning predicates would still be
>>>    required for correctness?
>>>
>>>    + What if we have no loop, a region has predicates as input, the
>>>    predicates are going to die but have not yet been processed, and split
>>>    if uselessly duplicates the predicates, but one of them is control
>>>    dependent on the branch it is in, so cloning the predicates actually
>>>    causes a broken graph?
>>>
>>> So overall it feels safer to me to simply bail out from split if for
>>> loops/predicates.
>>>
>>> Roland.
>>>

From vladimir.kozlov at oracle.com  Wed Nov  6 18:44:47 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 6 Nov 2019 10:44:47 -0800
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
Message-ID: <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>

Hi Bernard,

It is an interesting suggestion. I don't see that we use BTR and BTS currently. Sandhya, do these instructions have any
limitations/restrictions?

Regarding the changes: for new code we prefer .ad files to use MacroAssembler instructions for encoding instead of
raw opcodes [1]. This way we make sure the correct encoding is used on different CPUs.

Thanks,
Vladimir

[1] http://hg.openjdk.java.net/jdk/jdk/file/38d4202154f2/src/hotspot/cpu/x86/x86_64.ad#l10051

On 11/2/19 10:18 AM, B. Blaser wrote:
> Hi,
> 
> I experimented, some time ago, with an optimization of several common
> flag patterns (see also JBS) using BTR/BTS instead of AND/OR
> instructions on x86_64 Xeon:
> 
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @State(Scope.Thread)
> public class BitSetAndReset {
>      private static final int COUNT = 10_000;
> 
>      private static final long MASK63 = 0x8000_0000_0000_0000L;
>      private static final long MASK31 = 0x0000_0000_8000_0000L;
>      private static final long MASK15 = 0x0000_0000_0000_8000L;
>      private static final long MASK00 = 0x0000_0000_0000_0001L;
> 
>      private long andq, orq;
>      private boolean success = true;
> 
>      @TearDown(Level.Iteration)
>      public void finish() {
>          if (!success)
>              throw new AssertionError("Failure while setting or clearing long vector bits!");
>      }
> 
>      @Benchmark
>      public void bitSet(Blackhole bh) {
>          for (int i=0; i<COUNT; i++) {
>              andq = MASK63 | MASK31 | MASK15 | MASK00;
>              orq = 0;
>              bh.consume(test63());
>              bh.consume(test31());
>              bh.consume(test15());
>              bh.consume(test00());
>              success &= andq == 0 && orq == (MASK63 | MASK31 | MASK15 | MASK00);
>          }
>      }
> 
>      private long test63() {
>          andq &= ~MASK63;
>          orq |= MASK63;
>          return 0L;
>      }
>      private long test31() {
>          andq &= ~MASK31;
>          orq |= MASK31;
>          return 0L;
>      }
>      private long test15() {
>          andq &= ~MASK15;
>          orq |= MASK15;
>          return 0L;
>      }
>      private long test00() {
>          andq &= ~MASK00;
>          orq |= MASK00;
>          return 0L;
>      }
> }
> 
> Running the benchmark this way:
> 
> $ make test TEST="micro:vm.compiler.BitSetAndReset" \
>     MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/BitSetAndReset.*test*';FORK=3;WARMUP_ITER=1;ITER=3"
> 
> We had before:
> 
> 03e       movq    R10, #9223372036854775807    # long
> 048       andq    [RSI + #16 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 04c       movq    R10, #-9223372036854775808    # long
> 056       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 05a       ...
>
> => 28 bytes
>
> 03c       xorl    RAX, RAX    # long
> 03e       movq    R10, #-2147483649    # long
> 048       andq    [RSI + #16 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 04c       movl    R10, #2147483648    # long (unsigned 32-bit)
> 052       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 056       ...
>
> => 26 bytes
>
> 03c       andq    [RSI + #16 (8-bit)], #-32769    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 044       orq     [RSI + #24 (8-bit)], #32768    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 04c       ...
>
> 03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 041       orq     [RSI + #24 (8-bit)], #1    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 046       ...
> 
> Benchmark              Mode  Cnt      Score      Error  Units
> BitSetAndReset.bitSet  avgt    9  78083.773 ± 2182.692  ns/op
> 
> And we would have after:
> 
> 03c       btrq    [RSI + #16 (8-bit)], log2(not(#9223372036854775807))    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 042       btsq    [RSI + #24 (8-bit)], log2(#-9223372036854775808)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 048       ...
>
> => 12 bytes
>
> 03c       btrq    [RSI + #16 (8-bit)], log2(not(#-2147483649))    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 042       xorl    RAX, RAX    # long
> 044       movl    R10, #2147483648    # long (unsigned 32-bit)
> 04a       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 04e       ...
>
> => 18 bytes
>
> 03c       andq    [RSI + #16 (8-bit)], #-32769    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 044       orq     [RSI + #24 (8-bit)], #32768    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 04c       ...
>
> 03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 041       orq     [RSI + #24 (8-bit)], #1    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 046       ...
> 
> Benchmark              Mode  Cnt      Score     Error  Units
> BitSetAndReset.bitSet  avgt    9  77355.154 ± 252.503  ns/op
> 
> We see a tiny performance gain with BTR/BTS, but the main benefit
> remains the much more compact encoding, saving up to 16 bytes (28 vs. 12)
> for pure 64-bit immediates, along with lower register consumption.
> 
> Does the patch below look reasonable enough to rebase, push to
> jdk/submit, and post an RFR soon if all goes well?
> 
> Thanks,
> Bernard
> 
> diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
> --- a/src/hotspot/cpu/x86/x86_64.ad
> +++ b/src/hotspot/cpu/x86/x86_64.ad
> @@ -2069,6 +2069,16 @@
>       }
>     %}
> 
> +  enc_class Log2L(immPow2L imm)
> +  %{
> +    emit_d8(cbuf, log2_long($imm$$constant));
> +  %}
> +
> +  enc_class Log2NotL(immPow2NotL imm)
> +  %{
> +    emit_d8(cbuf, log2_long(~$imm$$constant));
> +  %}
> +
>     enc_class opc2_reg(rRegI dst)
>     %{
>       // BSWAP
> @@ -3131,6 +3141,28 @@
>     interface(CONST_INTER);
>   %}
> 
> +operand immPow2L()
> +%{
> +  // n should be a pure 64-bit power of 2 immediate.
> +  predicate(is_power_of_2_long(n->get_long()) && log2_long(n->get_long()) > 31);
> +  match(ConL);
> +
> +  op_cost(15);
> +  format %{ %}
> +  interface(CONST_INTER);
> +%}
> +
> +operand immPow2NotL()
> +%{
> +  // n should be a pure 64-bit immediate given that not(n) is a power of 2.
> +  predicate(is_power_of_2_long(~n->get_long()) && log2_long(~n->get_long()) > 30);
> +  match(ConL);
> +
> +  op_cost(15);
> +  format %{ %}
> +  interface(CONST_INTER);
> +%}
> +
>   // Long Immediate zero
>   operand immL0()
>   %{
> @@ -9740,6 +9772,19 @@
>     ins_pipe(ialu_mem_imm);
>   %}
> 
> +instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr)
> +%{
> +  match(Set dst (StoreL dst (AndL (LoadL dst) src)));
> +  effect(KILL cr);
> +
> +  ins_cost(125);
> +  format %{ "btrq    $dst, log2(not($src))\t# long" %}
> +  opcode(0x0F, 0xBA, 0x06);
> +  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
> +             RM_opc_mem(tertiary, dst), Log2NotL(src));
> +  ins_pipe(ialu_mem_imm);
> +%}
> +
>   // BMI1 instructions
>   instruct andnL_rReg_rReg_mem(rRegL dst, rRegL src1, memory src2, immL_M1 minus_1, rFlagsReg cr) %{
>     match(Set dst (AndL (XorL src1 minus_1) (LoadL src2)));
> @@ -9933,6 +9978,19 @@
>     ins_pipe(ialu_mem_imm);
>   %}
> 
> +instruct btsL_mem_imm(memory dst, immPow2L src, rFlagsReg cr)
> +%{
> +  match(Set dst (StoreL dst (OrL (LoadL dst) src)));
> +  effect(KILL cr);
> +
> +  ins_cost(125);
> +  format %{ "btsq    $dst, log2($src)\t# long" %}
> +  opcode(0x0F, 0xBA, 0x05);
> +  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
> +             RM_opc_mem(tertiary, dst), Log2L(src));
> +  ins_pipe(ialu_mem_imm);
> +%}
> +
>   // Xor Instructions
>   // Xor Register with Register
>   instruct xorL_rReg(rRegL dst, rRegL src, rFlagsReg cr)
> 
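The byte savings in the listings above come from the fact that 64-bit ALU instructions such as `andq`/`orq` accept at most a sign-extended 32-bit immediate. A mask that is not the sign extension of its low 32 bits (for the single-bit masks here, anything involving bit 31 or higher) must first be materialized in a register (the `movq`/`movl R10` lines), whereas BTR/BTS take the bit index as an 8-bit immediate. The C++ sketch below mirrors the `immPow2L` and `immPow2NotL` predicates from the patch; the helper names are hypothetical stand-ins for `is_power_of_2_long()`/`log2_long()`, not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// True if v can be encoded as a sign-extended 32-bit immediate, the widest
// immediate that 64-bit andq/orq accept.
bool fits_simm32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// True if exactly one bit of v is set (stand-in for is_power_of_2_long).
bool is_single_bit(uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Index of the only set bit (stand-in for log2_long); requires is_single_bit(v).
int bit_index(uint64_t v) {
  assert(is_single_bit(v));
  int idx = 0;
  while (v > 1) {
    v >>= 1;
    idx++;
  }
  return idx;
}

// Mirrors the immPow2L predicate: OR with a single bit above bit 31 -> btsq.
bool matches_bts(int64_t mask) {
  uint64_t m = static_cast<uint64_t>(mask);
  return is_single_bit(m) && bit_index(m) > 31;
}

// Mirrors the immPow2NotL predicate: AND with a mask whose complement is a
// single bit above bit 30 -> btrq.
bool matches_btr(int64_t mask) {
  uint64_t m = ~static_cast<uint64_t>(mask);
  return is_single_bit(m) && bit_index(m) > 30;
}
```

Applied to the benchmark's masks, bit 63 qualifies for both patterns, bit 31 qualifies only for the BTR pattern (the `> 31` bound of `immPow2L` excludes it), and bits 15 and 0 match neither because `andq`/`orq` can already encode them as imm32. This is consistent with the "after" listings, where `test31` still uses `movl` + `orq`.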

From claes.redestad at oracle.com  Wed Nov  6 19:17:16 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 6 Nov 2019 20:17:16 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To: <336853d7-3047-0194-4e36-9ee968e07a86@redhat.com>
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
 <336853d7-3047-0194-4e36-9ee968e07a86@redhat.com>
Message-ID: <5af59bb9-6d1a-1a69-0693-a27ee9a2d56c@oracle.com>

Hi Aleksey,

On 2019-11-06 16:49, Aleksey Shipilev wrote:
> On 11/6/19 3:42 PM, Claes Redestad wrote:
>> Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/
> 
> Shenandoah part looks fine.

thanks for checking!

> 
> The rest looks fine too.

Thanks!

/Claes


From claes.redestad at oracle.com  Wed Nov  6 19:17:39 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 6 Nov 2019 20:17:39 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To: <fa62038c-f636-3a60-2947-83da88ccfa9e@oracle.com>
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
 <fa62038c-f636-3a60-2947-83da88ccfa9e@oracle.com>
Message-ID: <531ad263-331c-dcca-b63d-f99648d64618@oracle.com>

Hi Nils,

On 2019-11-06 16:08, Nils Eliasson wrote:
> Hi Claes,
> 
> Excellent cleanup!

thanks for reviewing!

/Claes

From claes.redestad at oracle.com  Wed Nov  6 19:18:01 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 6 Nov 2019 20:18:01 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To: <c679ab18-ed3b-5696-ba89-6814959a1c0c@oracle.com>
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
 <c679ab18-ed3b-5696-ba89-6814959a1c0c@oracle.com>
Message-ID: <7cf8fffc-4198-11af-86a8-3db575524e79@oracle.com>

Hi Tobias,

On 2019-11-06 16:57, Tobias Hartmann wrote:
> Hi Claes,
> 
> very nice. While you're at it, could you add brackets to the for-loops/ifs in vectset.cpp:70,
> vectset.hpp:89, 98 and fix the whitespacing? No new webrev required.

will do!

/Claes

From sandhya.viswanathan at intel.com  Thu Nov  7 00:34:32 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Thu, 7 Nov 2019 00:34:32 +0000
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
 <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
Message-ID: <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>

Hi Vladimir/Bernard,


I don't see any restrictions/limitations on these instructions other than the fact that the "long" operation is only supported in 64-bit mode, as usual, so it should be restricted to the 64-bit JVM only.

The code size improvement that Bernard demonstrates is significant for operations on longs.

It looks like the throughput of AND/OR is better than that of BTR/BTS (0.25 vs 0.5 cycles), though. Please refer to Table C-17 in the document below:

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Best Regards,

Sandhya
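On the code-size point: with `opcode(0x0F, 0xBA, 0x06)` (BTR) or `0x05` (BTS), the patch's `ins_encode` emits a REX.W prefix, the two opcode bytes, a ModRM byte carrying the `/6` or `/5` opcode extension, the memory displacement, and the bit index as an imm8. The hypothetical C++ helper below reproduces that byte layout for the simple `[base + disp8]` form seen in the listings. It is an illustration only, not HotSpot's Assembler, and it ignores REX.B (registers R8 to R15), SIB-based addressing and 32-bit displacements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not HotSpot's Assembler) that encodes
//   btsq/btrq qword ptr [base + disp8], imm8
// for base registers 0-7 except RSP (4), which would need a SIB byte.
std::vector<uint8_t> encode_bt_mem_disp8(uint8_t ext, uint8_t base, int8_t disp8, uint8_t bit) {
  assert(ext == 5 || ext == 6);  // /5 = BTS, /6 = BTR
  assert(base < 8 && base != 4);
  assert(bit < 64);              // bit index of a 64-bit operand
  // ModRM: mod = 01 (disp8 follows), reg = opcode extension, rm = base.
  uint8_t modrm = static_cast<uint8_t>((1 << 6) | (ext << 3) | base);
  return {0x48,        // REX.W: 64-bit operand size
          0x0F, 0xBA,  // BT* r/m64, imm8 opcode
          modrm,
          static_cast<uint8_t>(disp8),
          bit};        // bit index as imm8
}
```

For `[RSI + #16]` and `[RSI + #24]` (RSI is register 6) with bit 63, each instruction is 6 bytes, which matches the 0x03c, 0x042 and 0x048 offsets (12 bytes total) in Bernard's listing.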


-----Original Message-----
From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
Sent: Wednesday, November 06, 2019 10:45 AM
To: B. Blaser <bsrbnd at gmail.com>; hotspot-compiler-dev at openjdk.java.net
Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Subject: Re: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits

Hi Bernard,

It is an interesting suggestion. I don't see that we use BTR and BTS currently. Sandhya, do these instructions have any limitations/restrictions?

Regarding the changes: for new code we prefer .ad files to use MacroAssembler instructions for encoding instead of raw opcodes [1]. This way we make sure the correct encoding is used on different CPUs.

Thanks,
Vladimir

[1] http://hg.openjdk.java.net/jdk/jdk/file/38d4202154f2/src/hotspot/cpu/x86/x86_64.ad#l10051


On 11/2/19 10:18 AM, B. Blaser wrote:

> Hi,

>

> I experimented, some time ago, with an optimization of several common

> flag patterns (see also JBS) using BTR/BTS instead of AND/OR

> instructions on x86_64 xeon:

>

> @BenchmarkMode(Mode.AverageTime)

> @OutputTimeUnit(TimeUnit.NANOSECONDS)

> @State(Scope.Thread)

> public class BitSetAndReset {

>      private static final int COUNT = 10_000;

>

>      private static final long MASK63 = 0x8000_0000_0000_0000L;

>      private static final long MASK31 = 0x0000_0000_8000_0000L;

>      private static final long MASK15 = 0x0000_0000_0000_8000L;

>      private static final long MASK00 = 0x0000_0000_0000_0001L;

>

>      private long andq, orq;

>      private boolean success = true;

>

>      @TearDown(Level.Iteration)

>      public void finish() {

>          if (!success)

>              throw new AssertionError("Failure while setting or

> clearing long vector bits!");

>      }

>

>      @Benchmark

>      public void bitSet(Blackhole bh) {

>          for (int i=0; i<COUNT; i++) {

>              andq = MASK63 | MASK31 | MASK15 | MASK00;

>              orq = 0;

>              bh.consume(test63());

>              bh.consume(test31());

>              bh.consume(test15());

>              bh.consume(test00());

>              success &= andq == 0 && orq == (MASK63 | MASK31 | MASK15 | MASK00);

>          }

>      }

>

>      private long test63() {

>          andq &= ~MASK63;

>          orq |= MASK63;

>          return 0L;

>      }

>      private long test31() {

>          andq &= ~MASK31;

>          orq |= MASK31;

>          return 0L;

>      }

>      private long test15() {

>          andq &= ~MASK15;

>          orq |= MASK15;

>          return 0L;

>      }

>      private long test00() {

>          andq &= ~MASK00;

>          orq |= MASK00;

>          return 0L;

>      }

> }

>

> Running the benchmark this way:

>

> $ make test TEST="micro:vm.compiler.BitSetAndReset"

> MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/BitSetAndReset.*test*';FORK=3;WARMUP_ITER=1;ITER=3"

>

> We had before:

>

> 03e       movq    R10, #9223372036854775807    # long

> 048       andq    [RSI + #16 (8-bit)], R10    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.andq

> 04c       movq    R10, #-9223372036854775808    # long

> 056       orq     [RSI + #24 (8-bit)], R10    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.orq

> 05a       ...

>

> => 28 bytes

>

> 03c       xorl    RAX, RAX    # long

> 03e       movq    R10, #-2147483649    # long

> 048       andq    [RSI + #16 (8-bit)], R10    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.andq

> 04c       movl    R10, #2147483648    # long (unsigned 32-bit)

> 052       orq     [RSI + #24 (8-bit)], R10    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.orq

> 056       ...

>

> => 26 bytes

>

> 03c       andq    [RSI + #16 (8-bit)], #-32769    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.andq

> 044       orq     [RSI + #24 (8-bit)], #32768    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.orq

> 04c       ...

>

> 03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.andq

> 041       orq     [RSI + #24 (8-bit)], #1    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.orq

> 046       ...

>

> Benchmark              Mode  Cnt      Score      Error  Units

> BitSetAndReset.bitSet  avgt    9  78083.773 ? 2182.692  ns/op

>

> And we would have after:

>

> 03c       btrq    [RSI + #16 (8-bit)], log2(not(#9223372036854775807))

>     # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq

> 042       btsq    [RSI + #24 (8-bit)], log2(#-9223372036854775808)

> # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq

> 048       ...

>

> => 12 bytes

>

> 03c       btrq    [RSI + #16 (8-bit)], log2(not(#-2147483649))    #

> long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq

> 042       xorl    RAX, RAX    # long

> 044       movl    R10, #2147483648    # long (unsigned 32-bit)

> 04a       orq     [RSI + #24 (8-bit)], R10    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.orq

> 04e       ...

>

> => 18 bytes

>

> 03c       andq    [RSI + #16 (8-bit)], #-32769    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.andq

> 044       orq     [RSI + #24 (8-bit)], #32768    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.orq

> 04c       ...

>

> 03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.andq

> 041       orq     [RSI + #24 (8-bit)], #1    # long ! Field:

> org/openjdk/bench/vm/compiler/BitSetAndReset.orq

> 046       ...

>

> Benchmark              Mode  Cnt      Score     Error  Units
> BitSetAndReset.bitSet  avgt    9  77355.154 ± 252.503  ns/op
>
> We see a tiny performance gain with BTR/BTS but the major interest
> remains the much better encoding with up to 16 bytes saving for pure
> 64-bit immediates along with a lower register consumption.
>
> Does the patch below look reasonable enough to eventually rebase and
> push it to jdk/submit and to post a RFR maybe soon if all goes well?
>
> Thanks,
> Bernard
>
> diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
> --- a/src/hotspot/cpu/x86/x86_64.ad
> +++ b/src/hotspot/cpu/x86/x86_64.ad
> @@ -2069,6 +2069,16 @@
>       }
>     %}
>
> +  enc_class Log2L(immPow2L imm)
> +  %{
> +    emit_d8(cbuf, log2_long($imm$$constant));  %}
> +
> +  enc_class Log2NotL(immPow2NotL imm)  %{
> +    emit_d8(cbuf, log2_long(~$imm$$constant));  %}
> +
>     enc_class opc2_reg(rRegI dst)
>     %{
>       // BSWAP
> @@ -3131,6 +3141,28 @@
>     interface(CONST_INTER);
>   %}
>
> +operand immPow2L()
> +%{
> +  // n should be a pure 64-bit power of 2 immediate.
> +  predicate(is_power_of_2_long(n->get_long()) && log2_long(n->get_long()) > 31);
> +  match(ConL);
> +
> +  op_cost(15);
> +  format %{ %}
> +  interface(CONST_INTER);
> +%}
> +
> +operand immPow2NotL()
> +%{
> +  // n should be a pure 64-bit immediate given that not(n) is a power of 2.
> +  predicate(is_power_of_2_long(~n->get_long()) && log2_long(~n->get_long()) > 30);
> +  match(ConL);
> +
> +  op_cost(15);
> +  format %{ %}
> +  interface(CONST_INTER);
> +%}
> +
>   // Long Immediate zero
>   operand immL0()
>   %{
> @@ -9740,6 +9772,19 @@
>     ins_pipe(ialu_mem_imm);
>   %}
>
> +instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr) %{
> +  match(Set dst (StoreL dst (AndL (LoadL dst) src)));
> +  effect(KILL cr);
> +
> +  ins_cost(125);
> +  format %{ "btrq    $dst, log2(not($src))\t# long" %}
> +  opcode(0x0F, 0xBA, 0x06);
> +  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
> +             RM_opc_mem(tertiary, dst), Log2NotL(src));
> +  ins_pipe(ialu_mem_imm);
> +%}
> +
>   // BMI1 instructions
>   instruct andnL_rReg_rReg_mem(rRegL dst, rRegL src1, memory src2, immL_M1 minus_1, rFlagsReg cr) %{
>     match(Set dst (AndL (XorL src1 minus_1) (LoadL src2)));
> @@ -9933,6 +9978,19 @@
>     ins_pipe(ialu_mem_imm);
>   %}
>
> +instruct btsL_mem_imm(memory dst, immPow2L src, rFlagsReg cr) %{
> +  match(Set dst (StoreL dst (OrL (LoadL dst) src)));
> +  effect(KILL cr);
> +
> +  ins_cost(125);
> +  format %{ "btsq    $dst, log2($src)\t# long" %}
> +  opcode(0x0F, 0xBA, 0x05);
> +  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
> +             RM_opc_mem(tertiary, dst), Log2L(src));
> +  ins_pipe(ialu_mem_imm);
> +%}
> +
>   // Xor Instructions
>   // Xor Register with Register
>   instruct xorL_rReg(rRegL dst, rRegL src, rFlagsReg cr)
>
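The two operand predicates and the bit index emitted by Log2L/Log2NotL can be checked outside of ADL with a small standalone C++ sketch. The thresholds are copied from the quoted patch as written; the helper names (is_pow2_u64, log2_u64, matches_immPow2L, matches_immPow2NotL, set_bit, clear_bit) are hypothetical and are not HotSpot functions.

```cpp
#include <cstdint>

using std::int64_t;
using std::uint64_t;

// Hypothetical helpers: they mirror is_power_of_2_long/log2_long as used in
// the patch, but they are not HotSpot's implementations.
constexpr bool is_pow2_u64(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Index of the highest set bit (the only set bit when x is a power of two).
constexpr int log2_u64(uint64_t x) {
  int k = 0;
  while (x > 1) {
    x >>= 1;
    ++k;
  }
  return k;
}

// Same condition as the immPow2L predicate: c == 2^k with k > 31.
constexpr bool matches_immPow2L(int64_t c) {
  const uint64_t u = static_cast<uint64_t>(c);
  return is_pow2_u64(u) && log2_u64(u) > 31;
}

// Same condition as the immPow2NotL predicate: ~c == 2^k with k > 30.
constexpr bool matches_immPow2NotL(int64_t c) {
  const uint64_t inv = ~static_cast<uint64_t>(c);
  return is_pow2_u64(inv) && log2_u64(inv) > 30;
}

// Source-level meaning of the two new rules on the loaded 64-bit word:
//   StoreL dst (OrL  (LoadL dst) src)  ~  btsq [dst], log2(src)
//   StoreL dst (AndL (LoadL dst) src)  ~  btrq [dst], log2(~src)
constexpr int64_t set_bit(int64_t w, int k) {
  return static_cast<int64_t>(static_cast<uint64_t>(w) | (uint64_t{1} << k));
}
constexpr int64_t clear_bit(int64_t w, int k) {
  return static_cast<int64_t>(static_cast<uint64_t>(w) & ~(uint64_t{1} << k));
}
```

A constant rejected by both predicates simply falls through to whatever rule matched it before the patch.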

From gromero at linux.vnet.ibm.com  Thu Nov  7 00:53:32 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 6 Nov 2019 21:53:32 -0300
Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More
 generic vector CRC implementation (v2)
In-Reply-To: <HE1PR0201MB24752CF818A0E90380EB7C7E9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com>
 <VI1PR0201MB2479B9E369C33E4D46B8F9149A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com>
 <VI1PR0201MB247951F074AA4A598CBF8B8B9A6A0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <b5e148b7-1493-24f2-ecf2-89ce72f6df38@linux.vnet.ibm.com>
 <HE1PR0201MB24752CF818A0E90380EB7C7E9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
Message-ID: <db56f7b6-af33-de6b-3ecf-86447e42f228@linux.vnet.ibm.com>

Hi Martin,

On 11/06/2019 12:15 PM, Doerr, Martin wrote:
> Hi Gustavo,
> 
>> [PPC64] More generic vector CRC implementation                           (1/4)
>> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/
> 
> This seems to be the version I had already reviewed. I'm still ok with it.

Yes, nothing changed. I posted it again for completeness, but I see now
it can cause confusion. I'll avoid doing it in the future.


>> [PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4)
>> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/
> 
> Normally, backports should get handled in separate backport RFRs, but the manual change in this version could be considered trivial, so I don't insist on separate RFR.
> Looks good, now.

I see, initially I posted the patches separately. I should have kept them
separated.


>> [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
>> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/   (3/4)
> 
> Almost like above. I'd leave at least the CRC32C defines in the code (stubRoutines_ppc). They don't disturb. Why should we introduce additional diffs?
Got it. I'll send a separate RFR.


>> [PPC64] Cleanup non-vector version of CRC32
>> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/   (4/4)
> 
> This one contains a part of another shared code change. So I can't review it as part of this RFR.
> Needs to get reviewed separately. Backport of the original change which introduced LITTLE_ENDIAN_ONLY etc. should get evaluated.

Good point, Martin. I overlooked the original change. It's a fix from Volker to
consider big-endian function descriptors when walking the stack:

https://bugs.openjdk.java.net/browse/JDK-8206173

Fix applies cleanly (except for the path adjustment). It's shared code but in
effect it's PPC64-only. I'll take care of backporting it separately.

And a nit in the test pointed out by Volker:

http://hg.openjdk.java.net/jdk/jdk/file/5bc2e9c9604d/test/hotspot/jtreg/runtime/ElfDecoder/TestElfDirectRead.java#l39

It should read "function descriptors" instead of "file descriptors", right?

I'll send a patch to jdk/jdk fixing that comment too.

Thank you!

Best regards,
Gustavo

From john.r.rose at oracle.com  Thu Nov  7 01:01:34 2019
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 6 Nov 2019 17:01:34 -0800
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
 <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
 <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>
Message-ID: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>

I recently saw LLVM compile a classification switch into a really tidy BTR instruction,
something like this:

  switch (ch) {
  case ';': case '/': case '.': case '[':  return 0;
  default: return 1;
  }
=>
  … range check …
  movabsq	$0x200000002003, %rcx
  btq	%rdi, %rcx

It made me wish for this change, plus some more to switch itself.
Given Sandhya's report, though, BTR may only be helpful in limited
cases.  In the case above, it subsumes a shift instruction.

Bernard's JMH experiment suggests something else is going on besides
the throughput difference which Sandhya cites.  Maybe it's a benchmark
artifact, or maybe it's a good effect from smaller code.  I suggest jamming
more back-to-back BTRs together, to see if the throughput effect appears.

-- John

On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya <sandhya.viswanathan at intel.com> wrote:
> 
> Hi Vladimir/Bernard,
> 
> 
> 
> I don't see any restrictions/limitations on these instructions other than the fact that the "long" operation is only supported on 64-bit format as usual, so it should be restricted to the 64-bit JVM only.
> 
> The code size improvement that Bernard demonstrates is significant for operation on longs.
> 
> It looks like the throughput for AND/OR is better than BTR/BTS (0.25 vs 0.5) though. Please refer to Table C-17 in the document below:


From igor.ignatyev at oracle.com  Thu Nov  7 04:23:24 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 6 Nov 2019 20:23:24 -0800
Subject: RFR(S) : 8230364 : [JVMCI] a number of JVMCI tests are not jtreg
 enabled
Message-ID: <82DF702D-19C9-486B-8B5D-82C4F94D0A95@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/
> 102 lines changed: 72 ins; 9 del; 21 mod;

Hi all,

could you please review this small patch which adds jtreg test descriptions to all tests in compiler/jvmci/jdk.vm.ci.hotspot.test/src?
to make it work, the patch also:
 - replaces junit ceremonies w/ testng ceremonies;
 - changes TestHotSpotJVMCIRuntime to use platform classLoader instead of ext. loader b/c ext. loader (and internal classes used by the test)  got removed in jdk9;
 - temporarily excludes TestTranslatedException. the test fails b/c decoded exception doesn't have information about modules. 8233745 is going to update jdk.vm.ci.hotspot.TranslatedException and remove @ignore from the test.

JBS: https://bugs.openjdk.java.net/browse/JDK-8230364
webrev: http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/
testing: "added" tests (test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/)

Thanks,
-- Igor

From vladimir.kozlov at oracle.com  Thu Nov  7 05:20:40 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 6 Nov 2019 21:20:40 -0800
Subject: RFR(S) : 8230364 : [JVMCI] a number of JVMCI tests are not jtreg
 enabled
In-Reply-To: <82DF702D-19C9-486B-8B5D-82C4F94D0A95@oracle.com>
References: <82DF702D-19C9-486B-8B5D-82C4F94D0A95@oracle.com>
Message-ID: <09AB4909-D013-44C3-8A80-4334A5F52902@oracle.com>

Looks good.

Thanks
Vladimir

> On Nov 6, 2019, at 8:23 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/
>> 102 lines changed: 72 ins; 9 del; 21 mod;
> 
> Hi all,
> 
> could you please review this small patch which adds jtreg test descriptions to all tests in compiler/jvmci/jdk.vm.ci.hotspot.test/src?
> to make it work, the patch also:
> - replaces junit ceremonies w/ testng ceremonies;
> - changes TestHotSpotJVMCIRuntime to use platform classLoader instead of ext. loader b/c ext. loader (and internal classes used by the test)  got removed in jdk9;
> - temporarily excludes TestTranslatedException. the test fails b/c decoded exception doesn't have information about modules. 8233745 is going to update jdk.vm.ci.hotspot.TranslatedException and remove @ignore from the test.
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8230364
> webrev: http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/
> testing: "added" tests (test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/)
> 
> Thanks,
> -- Igor


From tobias.hartmann at oracle.com  Thu Nov  7 05:54:45 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 7 Nov 2019 06:54:45 +0100
Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter
 with CDS due to code cache exhaustion
In-Reply-To: <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>
References: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>
 <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>
Message-ID: <3e45e244-31a0-3bd2-4b6c-acd1478ace5f@oracle.com>

Thanks Vladimir.

Best regards,
Tobias

On 06.11.19 19:01, Vladimir Kozlov wrote:
> CC to runtime group too.
> 
> Looks good to me.
> 
> Thanks,
> Vladimir
> 
> On 11/6/19 5:34 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8233491
>> http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/
>>
>> When running a stress test with CDS, we fail to create adapters when linking a method from a shared
>> class because the code cache is full. This case is not properly handled by the CDS specific code and
>> instead of throwing a VirtualMachineError, we crash because "entry" is NULL.
>>
>> I'm able to spuriously reproduce this with a test (see [1]) but since the problem depends on the
>> class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg
>> test. However, I've verified that the patch fixes the problem.
>>
>> Thanks,
>> Tobias
>>
>> [1]
>> https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462
>>
>>

From martin.doerr at sap.com  Thu Nov  7 09:03:50 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 7 Nov 2019 09:03:50 +0000
Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More
 generic vector CRC implementation (v2)
In-Reply-To: <db56f7b6-af33-de6b-3ecf-86447e42f228@linux.vnet.ibm.com>
References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com>
 <VI1PR0201MB2479B9E369C33E4D46B8F9149A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com>
 <VI1PR0201MB247951F074AA4A598CBF8B8B9A6A0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <b5e148b7-1493-24f2-ecf2-89ce72f6df38@linux.vnet.ibm.com>
 <HE1PR0201MB24752CF818A0E90380EB7C7E9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <db56f7b6-af33-de6b-3ecf-86447e42f228@linux.vnet.ibm.com>
Message-ID: <VI1PR0201MB24793F06D0989FB829AC09579A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Gustavo,

> Good point, Martin. I overlooked the original change. It's a fix from Volker to
> consider big-endian function descriptors when walking the stack:
> 
> https://bugs.openjdk.java.net/browse/JDK-8206173
> 
> Fix applies cleanly (except for the path adjustment). It's shared code but in
> effect it's PPC64-only. I'll take care of backporting it separately.
> 
> And a nit in the test pointed out by Volker:
> 
> http://hg.openjdk.java.net/jdk/jdk/file/5bc2e9c9604d/test/hotspot/jtreg/runtime/ElfDecoder/TestElfDirectRead.java#l39
> 
> It should read "function descriptors" instead of "file descriptors", right?

Comment fixes should not be done in backports. If you would like to fix it, it should get fixed in jdk/jdk and then backported.

Please backport as it is. You won't need a review if you don't change anything except the trivial path adaptations.

Best regards,
Martin


> -----Original Message-----
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> Sent: Donnerstag, 7. November 2019 01:54
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
> Cc: jdk8u-dev at openjdk.java.net
> Subject: Re: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More
> generic vector CRC implementation (v2)
> 
> Hi Martin,
> 
> On 11/06/2019 12:15 PM, Doerr, Martin wrote:
> > Hi Gustavo,
> >
> >> [PPC64] More generic vector CRC implementation                           (1/4)
> >> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/
> >
> > This seems to be the version I had already reviewed. I'm still ok with it.
> 
> Yes, nothing changed. I posted it again for completeness, but I see now
> it can cause confusion. I'll avoid doing it in the future.
> 
> 
> >> [PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4)
> >> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/
> >
> > Normally, backports should get handled in separate backport RFRs, but the manual change in this version could be considered trivial, so I don't insist on separate RFR.
> > Looks good, now.
> 
> I see, initially I posted the patches separately. I should have kept them
> separated.
> 
> 
> >> [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
> >> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/   (3/4)
> >
> > Almost like above. I'd leave at least the CRC32C defines in the code (stubRoutines_ppc). They don't disturb. Why should we introduce additional diffs?
> Got it. I'll send a separate RFR.
> 
> 
> >> [PPC64] Cleanup non-vector version of CRC32
> >> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/   (4/4)
> >
> > This one contains a part of another shared code change. So I can't review it as part of this RFR.
> > Needs to get reviewed separately. Backport of the original change which introduced LITTLE_ENDIAN_ONLY etc. should get evaluated.
> 
> Good point, Martin. I overlooked the original change. It's a fix from Volker to
> consider big-endian function descriptors when walking the stack:
> 
> https://bugs.openjdk.java.net/browse/JDK-8206173
> 
> Fix applies cleanly (except for the path adjustment). It's shared code but in
> effect it's PPC64-only. I'll take care of backporting it separately.
> 
> And a nit in the test pointed out by Volker:
> 
> http://hg.openjdk.java.net/jdk/jdk/file/5bc2e9c9604d/test/hotspot/jtreg/runtime/ElfDecoder/TestElfDirectRead.java#l39
> 
> It should read "function descriptors" instead of "file descriptors", right?
> 
> I'll send a patch to jdk/jdk fixing that comment too.
> 
> Thank you!
> 
> Best regards,
> Gustavo

From martin.doerr at sap.com  Thu Nov  7 09:08:38 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 7 Nov 2019 09:08:38 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <HE1PR0201MB24755374CE7618AA8545A2309A6B0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>
 <HE1PR0201MB2475B62202B8D5187D6C3CAC9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>
Message-ID: <VI1PR0201MB2479F3BD3483ABF14F0A36109A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi David,

get_log only accesses the executing thread's own oop and the ones before it. So it's ensured by the algorithm that all accessed oops are in live handles.
The problem is in can_remove when not holding the lock. For that, webrev.04 avoids accessing the oop of the last compiler thread in the case in which the lock is not held.

Best regards,
Martin


> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Mittwoch, 6. November 2019 11:15
> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> <kim.barrett at oracle.com>
> Cc: dean.long at oracle.com; Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> Hi Martin,
> 
> On 6/11/2019 7:12 pm, Doerr, Martin wrote:
> > Hi Kim,
> >
> > thanks for confirming.
> >
> > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/
> > already avoids access to freed handles.
> 
> Sorry I missed your earlier reference to this version.
> 
> So the expectation here is that all accesses to these arrays are guarded
> by the CompileThread_lock, but that doesn't seem to hold for get_log ?
> 
> Thanks,
> David
> -----
> 
> > I don't really like the complexity of this code.
> > Replacing oops in handles would have been much more simple.
> > But I can live with either version.
> >
> > Best regards,
> > Martin
> >
> >
> >> -----Original Message-----
> >> From: Kim Barrett <kim.barrett at oracle.com>
> >> Sent: Mittwoch, 6. November 2019 04:09
> >> To: Doerr, Martin <martin.doerr at sap.com>
> >> Cc: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com;
> >> Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>
> >>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> >>
> >> Coming back in, because this seems to be going off into the weeds again.
> >>
> >>>> I don't understand what you mean. If a compiler thread holds an oop, any
> >>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
> >>>
> >>> The problem is not related to gc.
> >>> My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
> >>> However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread.
> >>> Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
> >>> I can't see how OopStorage supports reading from handles which were freed by destroy_global.
> >>
> >> So don't do that!
> >>
> >> OopStorage isn't magic. If you are going to look at an OopStorage
> >> handle, you have to ensure there won't be concurrent deletion. Use
> >> locks or some safe memory reclamation protocol. (GlobalCounter might
> >> be used here, but it depends a lot on what the iterations are doing. A
> >> reference counting mechanism is another possibility.) This is no
> >> different from any other resource management.
> >>
> >>> I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.
> >>
> >> Assuming the iteration didn't happen at safepoints (which is just a way to
> >> make the iteration and deletion not concurrent).  And I agree that isn't
> >> the case with the current code.
> >
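Kim's first suggestion, making readers and the thread that frees a handle mutually exclusive, can be sketched in plain C++. This is a hypothetical model, not HotSpot's OopStorage, JNIHandles or CompileBroker code: HandleTable, set, destroy and read are invented names, and an int stands in for the oop. GlobalCounter-style reclamation and reference counting, the other options Kim mentions, are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical per-thread handle table; an int stands in for the oop.
class HandleTable {
 public:
  explicit HandleTable(std::size_t n) : slots_(n) {}

  void set(std::size_t i, int value) {
    std::lock_guard<std::mutex> guard(lock_);
    assert(i < slots_.size());
    slots_[i] = std::make_unique<int>(value);
  }

  // Frees the storage behind slot i. Readers take the same lock, so the
  // memory is never released while another thread is dereferencing it.
  void destroy(std::size_t i) {
    std::lock_guard<std::mutex> guard(lock_);
    assert(i < slots_.size());
    slots_[i].reset();
  }

  // Copies the value out under the lock; returns false if already destroyed.
  bool read(std::size_t i, int* out) const {
    std::lock_guard<std::mutex> guard(lock_);
    assert(i < slots_.size());
    if (slots_[i] == nullptr) return false;
    *out = *slots_[i];
    return true;
  }

 private:
  mutable std::mutex lock_;
  std::vector<std::unique_ptr<int>> slots_;
};
```

The price is that every read takes the lock; according to Martin's reply above, webrev.04 instead relies on which slots a thread may read when the lock is not held.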

From rwestrel at redhat.com  Thu Nov  7 09:46:26 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 07 Nov 2019 10:46:26 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <f86c6ba8-b8b7-dc46-c29a-86c10cb04861@oracle.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
 <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>
 <f86c6ba8-b8b7-dc46-c29a-86c10cb04861@oracle.com>
Message-ID: <87imnwdnnx.fsf@redhat.com>


> I am fine with this conservative fix (bailout optimization). It is good that performance is not affected.

Thanks for the review.

Roland.

From david.holmes at oracle.com  Thu Nov  7 09:51:10 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 7 Nov 2019 19:51:10 +1000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB2479F3BD3483ABF14F0A36109A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <A0870A55-0738-4CC6-AF3C-50B6E2324D3F@oracle.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>
 <HE1PR0201MB2475B62202B8D5187D6C3CAC9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>
 <VI1PR0201MB2479F3BD3483ABF14F0A36109A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <d147c777-b05f-3c65-90eb-d35cea0beed1@oracle.com>

On 7/11/2019 7:08 pm, Doerr, Martin wrote:
> Hi David,
> 
> get_log only accesses the executing thread's own oop and the ones before it. So it's ensured by the algorithm that all accessed oops are in live handles.

Okay I see that now.

Thanks,
David

> The problem is in can_remove when not holding the lock. For that, webrev.04 avoids accessing the oop of the last compiler thread in the case in which the lock is not held.
> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Mittwoch, 6. November 2019 11:15
>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>> <kim.barrett at oracle.com>
>> Cc: dean.long at oracle.com; Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> Hi Martin,
>>
>> On 6/11/2019 7:12 pm, Doerr, Martin wrote:
>>> Hi Kim,
>>>
>>> thanks for confirming.
>>>
>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/
>>> already avoids access to freed handles.
>>
>> Sorry I missed your earlier reference to this version.
>>
>> So the expectation here is that all accesses to these arrays are guarded
>> by the CompileThread_lock, but that doesn't seem to hold for get_log ?
>>
>> Thanks,
>> David
>> -----
>>
>>> I don't really like the complexity of this code.
>>> Replacing oops in handles would have been much more simple.
>>> But I can live with either version.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: Kim Barrett <kim.barrett at oracle.com>
>>>> Sent: Mittwoch, 6. November 2019 04:09
>>>> To: Doerr, Martin <martin.doerr at sap.com>
>>>> Cc: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com;
>>>> Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>
>>>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>>
>>>> Coming back in, because this seems to be going off into the weeds again.
>>>>
>>>>>> I don't understand what you mean. If a compiler thread holds an oop, any
>>>>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>>>>
>>>>> The problem is not related to gc.
>>>>> My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
>>>>> However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread.
>>>>> Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
>>>>> I can't see how OopStorage supports reading from handles which were freed by destroy_global.
>>>>
>>>> So don't do that!
>>>>
>>>> OopStorage isn't magic. If you are going to look at an OopStorage
>>>> handle, you have to ensure there won't be concurrent deletion. Use
>>>> locks or some safe memory reclamation protocol. (GlobalCounter might
>>>> be used here, but it depends a lot on what the iterations are doing. A
>>>> reference counting mechanism is another possibility.) This is no
>>>> different from any other resource management.
>>>>
>>>>> I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.
>>>>
>>>> Assuming the iteration didn't happen at safepoints (which is just a way to
>>>> make the iteration and deletion not concurrent).  And I agree that isn't
>>>> the case with the current code.
>>>

From tobias.hartmann at oracle.com  Thu Nov  7 11:34:40 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 7 Nov 2019 12:34:40 +0100
Subject: [14] RFR(S): 8233788: Remove useless asserts in
 PhaseCFG::insert_anti_dependences
Message-ID: <b04cdfba-50da-227c-b0b4-7a35460977c1@oracle.com>

Hi,

please review the following cleanup:
https://bugs.openjdk.java.net/browse/JDK-8233788
http://cr.openjdk.java.net/~thartmann/8233788/webrev.00/

load_alias_idx can never be 0 and even if it could be, one of the asserts would fail because the
opcode check handles only a single type. In fact, all these intrinsic nodes have adr_type() == NULL
which maps to AliasIdxTop.

Thanks,
Tobias

From erik.osterlund at oracle.com  Thu Nov  7 13:49:00 2019
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Thu, 7 Nov 2019 14:49:00 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a
 load for strong refs
Message-ID: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>

Hi,

We have noticed problems with the one single place where we let C2 touch 
the graph while injecting our load barriers. Right before tagging a load 
as needing a load barrier, it is GVN transformed. The problem with this 
is that if we emit a strong field load, the GVN transformation can see 
through that load, by following it through the store that set its value, 
and find that the field value really came from the load of a weak 
Reference.get intrinsic. In this scenario, the load we get out from the 
GVN transformation needs weak barriers, yet we override it with strong 
barriers, as that was the semantics of the access being parsed. Sigh. We 
already have code that tries to determine if the load we got out from 
the GVN transformation looks like a load that was created in the 
BarrierSetC2 factory function, so one way of solving this is to refine 
that logic that tries to determine if this was the load we created 
before the transformation or not. But I felt like a better solution is 
to finish constructing the access with all the intended properties 
*before* transformation.

I massaged the code so that the GC barrier data of accesses with load 
barriers gets passed in to the factory functions that create the access, 
right before the transformation. This way, we construct the access with 
the intended semantics where it is being created (parser or macro 
expansion for field accesses in clone intrinsics). Then we do not have 
to touch it after the GVN transformation.
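The ordering problem can be illustrated with a deliberately tiny C++ model. Everything here is hypothetical (Barrier, Node, gvn_transform and the two make_access_* functions are not C2's PhaseGVN or BarrierSetC2 API); the toy transform simply returns an earlier node when it can forward a stored value, which is enough to show why tagging the result after transformation can rewrite the barrier of a different, weak load.

```cpp
#include <cassert>

// Toy model only: a node carries barrier data, and a load may be folded
// by "GVN" into the value an earlier store put in the field.
enum class Barrier { None, Strong, Weak };

struct Node {
  Barrier barrier = Barrier::None;
  Node* known_value = nullptr;  // set when store->load forwarding applies
};

// Toy transform: returns the forwarded value if one is known, else the load.
Node* gvn_transform(Node* load) {
  assert(load != nullptr);
  return load->known_value != nullptr ? load->known_value : load;
}

// Problematic ordering: tag whatever node the transform hands back.
Node* make_access_tag_after(Node* load, Barrier b) {
  Node* result = gvn_transform(load);
  result->barrier = b;  // may overwrite the barrier of an earlier, different load
  return result;
}

// Ordering described in the fix: build the access completely, then
// transform it and leave the result alone.
Node* make_access_tag_before(Node* load, Barrier b) {
  load->barrier = b;
  return gvn_transform(load);
}
```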

It does seem like there could be similar problems from other GCs, but in 
e.g. G1, the consequences are weird suboptimal code instead of anything 
dangerous happening. For example, we can generate SATB buffering code 
required by G1 Reference.get() intrinsics for strong accesses, due to 
GVN handing out earlier accesses with different semantics. Perhaps that 
should be looked into separately as well. But that investigation is 
outside of the scope of this bug fix.

Webrev:
http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/

Bug:
https://bugs.openjdk.java.net/browse/JDK-8233506

Thanks,
/Erik

From sgehwolf at redhat.com  Thu Nov  7 15:02:36 2019
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Thu, 07 Nov 2019 16:02:36 +0100
Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() ||
 phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no
 mismatched stores, except on raw memory
In-Reply-To: <dff209ce-4356-25f0-1b2b-dab8a5f16dd5@redhat.com>
References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
 <dff209ce-4356-25f0-1b2b-dab8a5f16dd5@redhat.com>
Message-ID: <7aaffb34ba9af2e5508f423f7f3dddf75e5ea9cc.camel@redhat.com>

On Wed, 2019-11-06 at 16:51 +0000, Andrew John Hughes wrote:
> On 30/10/2019 09:41, Severin Gehwolf wrote:
> > Hi,
> > 
> > Could I please get a review of this 8u only issue? The reason a
> > fastdebug build of latest OpenJDK 8u asserts for the dec-tree benchmark
> > of the renaissance suite is because the 8u backport of JDK-8140309 was
> > missing this hunk from JDK 9[1]:
> > 
> > +             (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode
> > +             (is_mismatched_access() || st->as_Store()->is_mismatched_access()),
> > 
> > I had a closer look and there doesn't seem to be missing anything else.
> > The proposed fix is to amend the assert condition in the appropriate
> > place, which brings 8u in line with JDK 9 code where the failure isn't
> > observed.
> > 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8233023
> > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/01/webrev/
> > 
> > Testing: 8u tier1 test set with fastdebug build on x86_64 Linux. No new
> > failures. dec-tree benchmark now runs successfully on an 8u fastdebug
> > build.
> > 
> > Thoughts?
> > 
> > Thanks,
> > Severin
> > 
> > [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4bee38ba018c
> > 
> 
> I compared the two patches and this missing hunk does stand out. So the
> patch looks fine in that respect.

Thanks.

> I notice they also didn't backport the testcase to 8u. Any thoughts on
> including that?

The test uses Unsafe.putXXXUnaligned() which aren't available in JDK 8.
So I'm not sure how the test would work. More thoughts?

Thanks,
Severin


From adinn at redhat.com  Thu Nov  7 16:34:15 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 7 Nov 2019 16:34:15 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
 <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
Message-ID: <bfc79f43-c00d-3f0c-8025-dbe19bc582e1@redhat.com>

On 04/11/2019 15:35, Schmidt, Lutz wrote:
> thank you for your thoughts. I do not agree with your conclusion, 
> though.
> 
> There are two bottlenecks in the CodeHeap management code. One is in
> CodeHeap::mark_segmap_as_used(), uncovered by
> OverflowCodeCacheTest.java.  The other is in 
> CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java.
> 
> Both bottlenecks are tackled by the recommended changeset.
  . . .
> CodeHeap::add_to_freelist() is still O(n*n), with n being the free 
> list length. But the kick-in point of the non-linearity could be 
> significantly shifted towards larger n. The time reduction from 
> approx. 8 seconds to 160 milliseconds supports this statement.

Ah sorry, I was not clear from your original post that the proposed
change had significantly improved the time spent in free list management
in the second test by significantly cutting down the free list size. As
you say, a reduction factor of 1/K in list size will give a 1/(K*K)
reduction in execution time. Since this test is a lot nearer to reality
than the overflow test I think the current result is perhaps enough to
justify its value.
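A back-of-the-envelope reading of the quoted numbers, under two simplifying assumptions that were not stated in the thread: the time spent in add_to_freelist() is purely quadratic in the free-list length n, and the whole drop in StressCodeCacheTest (approx. 8 seconds to 160 milliseconds) comes from a shorter list.

```latex
T(n) = c\,n^{2}
\;\Longrightarrow\;
\frac{T(n)}{T(n/K)} = \frac{c\,n^{2}}{c\,(n/K)^{2}} = K^{2},
\qquad
K^{2} \approx \frac{8\,\mathrm{s}}{0.16\,\mathrm{s}} = 50
\;\Longrightarrow\;
K \approx \sqrt{50} \approx 7.1
```

Under those assumptions the free list would only need to be roughly seven times shorter to explain the speed-up. The actual list lengths were not reported, so this is a consistency check rather than a measurement.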

> I agree it would be helpful to have a "real-world" example showing 
> some improvement. Providing such evidence is hard, though. I could 
> instrument the code and print some values form time to time. It's 
> certain this additional output will mess up success/failure decisions
> in our test environment. Not sure everybody likes that. But I will
> give it a try and take the hits. This will be a multi-day effort.

Well, that would be nice to have but not if it stops other work. The one
thing about the Stress test that I fear may be 'unreal' is the
potentially over-high probability of generating long(ish) runs of
adjacent free segments. That might be giving an artificial win that we
will not in fact see. However, given the current numbers I'd be happy to
risk that and let this patch go in as is.

> On a general note, I am always uncomfortable knowing of a O(n*n) 
> effort, in particular when it could be removed or at least tamed 
> considerably. Experience tells (at least to me) that, at some point 
> in time, n will be large enough to hurt.

Well, yes, although salesmen do travel /and/ make money ... ;-)

> I'll be back.

Sure, thanks for following up. This is all very interesting.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From gromero at linux.vnet.ibm.com  Thu Nov  7 18:36:46 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Thu, 7 Nov 2019 15:36:46 -0300
Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More
 generic vector CRC implementation (v2)
In-Reply-To: <VI1PR0201MB24793F06D0989FB829AC09579A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com>
 <VI1PR0201MB2479B9E369C33E4D46B8F9149A680@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com>
 <VI1PR0201MB247951F074AA4A598CBF8B8B9A6A0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <b5e148b7-1493-24f2-ecf2-89ce72f6df38@linux.vnet.ibm.com>
 <HE1PR0201MB24752CF818A0E90380EB7C7E9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <db56f7b6-af33-de6b-3ecf-86447e42f228@linux.vnet.ibm.com>
 <VI1PR0201MB24793F06D0989FB829AC09579A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <26fba4a1-0d53-bf0c-42cd-09c724bf833b@linux.vnet.ibm.com>

Hello Martin,

On 11/07/2019 06:03 AM, Doerr, Martin wrote:
> Hi Gustavo,
> 
>> Good point, Martin. I overlooked the original change. It's a fix from Volker to
>> consider big-endian function descriptors when walking the stack:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8206173
>>
>> Fix applies cleanly (except for the path adjustment). It's shared code but in
>> effect it's PPC64-only. I'll take care of backporting it separately.
>>
>> And a nit in the test pointed out by Volker:
>>
>> http://hg.openjdk.java.net/jdk/jdk/file/5bc2e9c9604d/test/hotspot/jtreg/runtime/ElfDecoder/TestElfDirectRead.java#l39
>>
>> It should read "function descriptors" instead of "file descriptors", right?
> 
> Comment fixes should not be done in backports. If you would like to fix it, it should get fixed in jdk/jdk and then backported.

Yeah, that's what I meant (implicitly at least); I'm aware of that. I plan to
fix it only in jdk/jdk. The test is not part of Volker's fix anyway, and I
don't think that nit is worth a backport.


> Please backport as it is. You won't need a review if you don't change anything except the trivial path adaptations.

Yeah, that's why I mentioned it applied cleanly, except for path adaptations :)


Thanks.

Best regards,
Gustavo

From vladimir.kozlov at oracle.com  Thu Nov  7 19:28:13 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Nov 2019 11:28:13 -0800
Subject: [14] RFR(S): 8233788: Remove useless asserts in
 PhaseCFG::insert_anti_dependences
In-Reply-To: <b04cdfba-50da-227c-b0b4-7a35460977c1@oracle.com>
References: <b04cdfba-50da-227c-b0b4-7a35460977c1@oracle.com>
Message-ID: <f4dcd527-9d74-31e8-afa3-b01da6746d21@oracle.com>

Looks good.

Thanks,
Vladimir

On 11/7/19 3:34 AM, Tobias Hartmann wrote:
> Hi,
> 
> please review the following cleanup:
> https://bugs.openjdk.java.net/browse/JDK-8233788
> http://cr.openjdk.java.net/~thartmann/8233788/webrev.00/
> 
> load_alias_idx can never be 0 and even if it could be, one of the asserts would fail because the
> opcode check handles only a single type. In fact, all these intrinsic nodes have adr_type() == NULL
> which maps to AliasIdxTop.
> 
> Thanks,
> Tobias
> 

From bsrbnd at gmail.com  Thu Nov  7 19:30:09 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Thu, 7 Nov 2019 20:30:09 +0100
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
 <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
 <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>
 <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>
Message-ID: <CAEgw74DKy-2kQqd-PyK62DR5baWund-F60jM5DjpAoJtKUe_TA@mail.gmail.com>

Hi Vladimir, Sandhya and John,

Thanks for your respective answers.

The suggested fix focuses on x86_64 and pure 64-bit immediates which
means that all other cases are left unchanged as shown by the initial
benchmark, for example:

andq &= ~MASK00;
orq |= MASK00;

would still give:

03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
041       orq     [RSI + #24 (8-bit)], #1    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
046       ...

Now, the interesting point is that pure 64-bit immediates (which
cannot be treated as sign-extended 8/32-bit values) are assembled
using two instructions (not one) because AND/OR cannot be used
directly in such cases, for example:

andq &= ~MASK63;
orq |= MASK63;

gives:

03e       movq    R10, #9223372036854775807    # long
048       andq    [RSI + #16 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
04c       movq    R10, #-9223372036854775808    # long
056       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
05a       ...

So, even though Sandhya mentioned a better throughput for AND/OR, the
additional MOV cost (I didn't find it in table C-17 but I assume
something close to MOVS/Z with latency=1/throughput=0.25) seems to be
in favor of a sole BTR/BTS instruction as shown by the initial
benchmark.
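The selection rule described above can be sketched in Java as a predicate over the constant mask. This is a hypothetical illustration, not the x86_64.ad matcher code: the class and helper names are made up, and the values MASK00 = 1L << 0 and MASK63 = 1L << 63 are inferred from the listings (#1 / #-2 and #-9223372036854775808 / #9223372036854775807).

```java
// Hypothetical helpers, not HotSpot code: they mirror the criterion of using
// BTS/BTR only for single-bit masks that are "pure" 64-bit immediates.
public class ImmSelectSketch {

    // x86_64 ALU instructions take at most a 32-bit immediate that the CPU
    // sign-extends to 64 bits; v is encodable iff that round-trips.
    // (An 8-bit immediate is a special case of this.)
    static boolean fitsImm32(long v) {
        return v == (long) (int) v;
    }

    // x |= mask: BTS candidate if mask has exactly one bit set and does NOT
    // fit imm32 (otherwise a plain orq with an immediate already works).
    // Returns the bit index (the "log2" shown in the listings) or -1.
    static int btsIndex(long mask) {
        if (Long.bitCount(mask) != 1 || fitsImm32(mask)) {
            return -1;
        }
        return Long.numberOfTrailingZeros(mask);
    }

    // x &= mask where mask == ~(1L << k): BTR candidate under the same rule.
    static int btrIndex(long mask) {
        long bit = ~mask;
        if (Long.bitCount(bit) != 1 || fitsImm32(mask)) {
            return -1;
        }
        return Long.numberOfTrailingZeros(bit);
    }

    public static void main(String[] args) {
        System.out.println(btsIndex(1L));             // MASK00: -1, stays "orq #1"
        System.out.println(btrIndex(~1L));            // ~MASK00: -1, stays "andq #-2"
        System.out.println(btsIndex(Long.MIN_VALUE)); // MASK63: 63
        System.out.println(btrIndex(Long.MAX_VALUE)); // ~MASK63: 63
        System.out.println(btsIndex(1L << 31));       // 0x80000000 does not sign-extend: 31
    }
}
```

Note that bit 31 already qualifies: 0x80000000 would be sign-extended to 0xFFFFFFFF80000000, so it cannot be an imm32 operand either.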

However, as John suggested, I tried another benchmark which focuses on
the throughput to make sure there isn't any regression in such
situations:

    private long orq63, orq62, orq61, orq60;

    @Benchmark
    public void throughput(Blackhole bh) {
        for (int i=0; i<COUNT; i++) {
            orq63 = orq62 = orq61 = orq60 = 0;
            bh.consume(testTp());
        }
    }

    private long testTp() {
        orq63 |= MASK63;
        orq62 |= MASK62;
        orq61 |= MASK61;
        orq60 |= MASK60;
        return 0L;
    }

Before, we had:

03e       movq    R10, #-9223372036854775808    # long
048       orq     [RSI + #32 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63
04c       movq    R10, #4611686018427387904    # long
056       orq     [RSI + #40 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62
05a       movq    R10, #2305843009213693952    # long
064       orq     [RSI + #48 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61
068       movq    R10, #1152921504606846976    # long
072       orq     [RSI + #56 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60

Benchmark                  Mode  Cnt      Score      Error  Units
BitSetAndReset.throughput  avgt    9  25912.455 ± 2527.041  ns/op

And after, we would have:

03c       btsq    [RSI + #32 (8-bit)], log2(#-9223372036854775808)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63
042       btsq    [RSI + #40 (8-bit)], log2(#4611686018427387904)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62
048       btsq    [RSI + #48 (8-bit)], log2(#2305843009213693952)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61
04e       btsq    [RSI + #56 (8-bit)], log2(#1152921504606846976)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60

Benchmark                  Mode  Cnt      Score      Error  Units
BitSetAndReset.throughput  avgt    9  25803.195 ± 2434.009  ns/op

Fortunately, we still see a tiny performance gain along with the large
size reduction and register saving.
Should we go ahead with this optimization? If so, I'll post a RFR with
Vladimir's requested changes soon.

Thanks,
Bernard

On Thu, 7 Nov 2019 at 02:02, John Rose <john.r.rose at oracle.com> wrote:
>
> I recently saw LLVM compile a classification switch into a really tidy BTR instruction,
> something like this:
>
>   switch (ch) {
>   case ';': case '/': case '.': case '[':  return 0;
>   default: return 1;
>   }
> =>
>   ... range check ...
>   movabsq       0x200000002003, %rcx
>   btq   %rdi, %rcx
>
> It made me wish for this change, plus some more to switch itself.
> Given Sandhya's report, though, BTR may only be helpful in limited
> cases.  In the case above, it subsumes a shift instruction.
>
> Bernard's JMH experiment suggests something else is going on besides
> the throughput difference which Sandhya cites.  Maybe it's a benchmark
> artifact, or maybe it's a good effect from smaller code.  I suggest jamming
> more back-to-back BTRs together, to see if the throughput effect appears.
>
> -- John
>
> On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya <sandhya.viswanathan at intel.com> wrote:
> >
> > Hi Vladimir/Bernard,
> >
> >
> >
> > I don't see any restrictions/limitations on these instructions other than the fact that the 'long' operation is only supported in 64-bit mode as usual, so it should be restricted to the 64-bit JVM only.
> >
> > The code size improvement that Bernard demonstrates is significant for operation on longs.
> >
> > It looks like the throughput for AND/OR is better than BTR/BTS  (0.25 vs 0.5) though. Please refer to Table C-17 in the document below:
>
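John's switch example can be rendered in Java to show why a single bit test suffices. The mask 0x200000002003 has bits 0, 1, 13 and 45 set, which are exactly '.', '/', ';' and '[' minus the base character '.' (46); the sketch assumes the elided range check subtracts that base and bounds the result to 0..63. The class and method names are hypothetical.

```java
// Hypothetical Java rendering of the "range check + bt" trick; not LLVM output.
public class BitTestSwitchSketch {

    // Bits 0, 1, 13, 45 <=> '.', '/', ';', '[' after subtracting '.' (46).
    static final long MASK = 0x200000002003L;

    // Same result as:
    //   switch (ch) { case ';': case '/': case '.': case '[': return 0; default: return 1; }
    static int classify(char ch) {
        int d = ch - '.';            // assumed bias: '.' maps to bit 0
        if (d < 0 || d >= 64) {      // assumed shape of the elided range check
            return 1;
        }
        // (MASK >>> d) & 1 is the bit that "btq %rdi, %rcx" copies into CF.
        return ((MASK >>> d) & 1L) != 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        for (char c : ";/.[a-0".toCharArray()) {
            System.out.println(c + " -> " + classify(c));
        }
    }
}
```

The test replaces a compare chain (or a table load) with one bit extraction from a register-resident constant, which is where a BT-style instruction subsumes the shift-and-mask.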

From vladimir.kozlov at oracle.com  Thu Nov  7 19:51:08 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Nov 2019 11:51:08 -0800
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74DKy-2kQqd-PyK62DR5baWund-F60jM5DjpAoJtKUe_TA@mail.gmail.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
 <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
 <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>
 <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>
 <CAEgw74DKy-2kQqd-PyK62DR5baWund-F60jM5DjpAoJtKUe_TA@mail.gmail.com>
Message-ID: <5fac20bf-d5d6-3380-894e-0cb488fb0dca@oracle.com>

I agree with you, Bernard.

I think throughput is limited by the memory accesses, which are the same in both cases. But
the code size reduction is a very nice improvement: we can squeeze more code into the CPU
buffer, which is very good for small loops.

Please send an official RFR and submit it to testing. It would also be nice to have a test
which verifies the results of these operations.
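A minimal sketch of such a result check, in plain Java rather than as a jtreg test. The class and method names, the loop count, and the choice of bits (63, 62 and 31, i.e. masks that do not fit a sign-extended imm32) are illustrative assumptions, not the test that accompanied the RFR. Whether C2 actually emits BTS/BTR for these setters depends on the patch and the JIT flags in use.

```java
import java.util.Random;

// Hypothetical verification sketch: bit operations run on a field (as in the
// BitSetAndReset benchmark) and are compared against an arithmetic reference
// that does not itself use OR/AND with a constant mask.
public class BitSetResetCheck {
    static final long MASK63 = 1L << 63;
    static final long MASK62 = 1L << 62;
    static final long MASK31 = 1L << 31;

    long f;

    void set63()   { f |= MASK63; }
    void clear63() { f &= ~MASK63; }
    void set62()   { f |= MASK62; }
    void clear62() { f &= ~MASK62; }
    void set31()   { f |= MASK31; }
    void clear31() { f &= ~MASK31; }

    // Reference: add/subtract the bit's weight only when needed. No carry or
    // borrow can occur, and two's-complement wrap-around covers bit 63.
    static long refSet(long x, int bit) {
        long m = 1L << bit;
        return (x & m) == 0 ? x + m : x;
    }

    static long refClear(long x, int bit) {
        long m = 1L << bit;
        return (x & m) != 0 ? x - m : x;
    }

    static void check(long got, long want, String what) {
        if (got != want) {
            throw new AssertionError(what + ": got 0x" + Long.toHexString(got)
                    + ", want 0x" + Long.toHexString(want));
        }
    }

    // Many iterations so that C2 is likely (not guaranteed) to compile the
    // setters; a real jtreg test would typically force this with VM flags.
    static void run(int iterations) {
        BitSetResetCheck t = new BitSetResetCheck();
        Random rnd = new Random(42);
        for (int i = 0; i < iterations; i++) {
            long x = rnd.nextLong();
            t.f = x; t.set63();   check(t.f, refSet(x, 63), "set63");
            t.f = x; t.clear63(); check(t.f, refClear(x, 63), "clear63");
            t.f = x; t.set62();   check(t.f, refSet(x, 62), "set62");
            t.f = x; t.clear62(); check(t.f, refClear(x, 62), "clear62");
            t.f = x; t.set31();   check(t.f, refSet(x, 31), "set31");
            t.f = x; t.clear31(); check(t.f, refClear(x, 31), "clear31");
        }
    }

    public static void main(String[] args) {
        run(200_000);
        System.out.println("OK");
    }
}
```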

Thanks,
Vladimir


From smita.kamath at intel.com  Thu Nov  7 20:12:18 2019
From: smita.kamath at intel.com (Kamath, Smita)
Date: Thu, 7 Nov 2019 20:12:18 +0000
Subject: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using
 AVX512 + VAES instructions
Message-ID: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com>

Hi Vladimir,


As per the Intel Architecture Instruction Set Reference [1], Vector AES (VAES) operations will be supported in future Intel ISA. I would like to contribute an optimization for the AES-CTR algorithm using AVX512+VAES instructions. This optimization is for x86_64 processors that have AVX512-VAES enabled. I ran the jtreg test suite with the algorithm on Intel SDE [2] to confirm that the encoding and semantics are correctly implemented.


Regev Shemy (regev.shemy at intel.com), Shay Gueron (shay.gueron at intel.com) and I (smita.kamath at intel.com) are contributors to this code.

Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8233741

Link to webrev: https://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.01


[1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf  (Pages 156 - 159)

[2] https://software.intel.com/en-us/articles/intel-software-development-emulator


Regards,
Smita Kamath
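For readers less familiar with counter mode, a minimal Java sketch of what AES-CTR computes: each 16-byte keystream block is the AES encryption of an incrementing counter, and it is XORed into the data. Because no keystream block depends on another, several counter blocks can be encrypted independently, which is presumably what a wide AVX512+VAES stub exploits (VAES applies AES rounds to multiple 128-bit lanes per instruction). The stub itself is in the webrev and is not reproduced here. The class and method names are hypothetical; the full 128-bit big-endian counter increment is assumed to match SunJCE's "AES/CTR/NoPadding" behavior.

```java
import java.util.Arrays;
import java.util.Random;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustration of the mode, not of the stub.
public class CtrSketch {

    // Increment a 128-bit big-endian counter in place, with carry.
    static void increment(byte[] ctr) {
        for (int i = ctr.length - 1; i >= 0; i--) {
            if (++ctr[i] != 0) {
                break; // no carry out of this byte
            }
        }
    }

    // Counter mode "by hand" on top of the raw block cipher, one block at a time.
    static byte[] manualCtr(byte[] key, byte[] iv, byte[] in) throws Exception {
        Cipher ecb = Cipher.getInstance("AES/ECB/NoPadding");
        ecb.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] ctr = iv.clone();
        byte[] out = new byte[in.length];
        for (int off = 0; off < in.length; off += 16) {
            byte[] keystream = ecb.doFinal(ctr);  // depends only on the counter value
            int n = Math.min(16, in.length - off); // last block may be partial
            for (int j = 0; j < n; j++) {
                out[off + j] = (byte) (in[off + j] ^ keystream[j]);
            }
            increment(ctr);
        }
        return out;
    }

    // The standard JCE transformation for the same computation.
    static byte[] jceCtr(byte[] key, byte[] iv, byte[] in) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(in);
    }

    public static void main(String[] args) throws Exception {
        Random rnd = new Random(1);
        byte[] key = new byte[16], iv = new byte[16], msg = new byte[100];
        rnd.nextBytes(key);
        rnd.nextBytes(iv);
        rnd.nextBytes(msg);
        System.out.println(Arrays.equals(manualCtr(key, iv, msg), jceCtr(key, iv, msg)));
    }
}
```

On a JVM where such an intrinsic is available and enabled, the jceCtr path is the one that could be accelerated; the manual path serves only as a reference.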


From vladimir.kozlov at oracle.com  Thu Nov  7 21:09:29 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Nov 2019 13:09:29 -0800
Subject: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using
 AVX512 + VAES instructions
In-Reply-To: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com>
References: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com>
Message-ID: <c549b5d5-97a4-ebf3-36ae-854f9dd2b394@oracle.com>

Hi Smita,

You don't need #ifdef _LP64 in stubGenerator_x86_64.cpp. This file is compiled only for 64-bit JVM.
You also have trailing spaces - please remove them.

Changes seem fine otherwise. I submit tier1 testing to make sure it builds.

Thanks,
Vladimir

From lutz.schmidt at sap.com  Thu Nov  7 21:33:49 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Thu, 7 Nov 2019 21:33:49 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <bfc79f43-c00d-3f0c-8025-dbe19bc582e1@redhat.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
 <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
 <bfc79f43-c00d-3f0c-8025-dbe19bc582e1@redhat.com>
Message-ID: <D146649B-38B0-448A-9322-69F41BE227A3@sap.com>

Hi Andrew,

thanks for giving this matter more thought, and for updating
your opinion.

The instrumentation and measurement of other tests will take longer than expected. It got delayed by JDK-8233787. The fix for this bug will let my timing code run more smoothly.

Side note: this timing code I have mentioned now several times is nothing secret. It's just not suitable to contribute, among other reasons because it's only available for ppc and s390. I can give you more information in case you are interested - no problem if you say "ahhh, never mind...".

Thanks,
Lutz

From igor.ignatyev at oracle.com  Thu Nov  7 21:41:10 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 7 Nov 2019 13:41:10 -0800
Subject: RFR(S) : 8230364 : [JVMCI] a number of JVMCI tests are not jtreg
 enabled
In-Reply-To: <09AB4909-D013-44C3-8A80-4334A5F52902@oracle.com>
References: <82DF702D-19C9-486B-8B5D-82C4F94D0A95@oracle.com>
 <09AB4909-D013-44C3-8A80-4334A5F52902@oracle.com>
Message-ID: <F9CE959C-4B3A-4578-B9E1-4186E262EF53@oracle.com>

Thanks Vladimir, pushed.
-- Igor

> On Nov 6, 2019, at 9:20 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Looks good.
> 
> Thanks
> Vladimir
> 
>> On Nov 6, 2019, at 8:23 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
>> 
>> http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/
>>> 102 lines changed: 72 ins; 9 del; 21 mod;
>> 
>> Hi all,
>> 
>> could you please review this small patch which adds jtreg test descriptions to all tests in compiler/jvmci/jdk.vm.ci.hotspot.test/src?
>> to make it work, the patch also:
>> - replaces junit ceremonies w/ testng ceremonies;
>> - changes TestHotSpotJVMCIRuntime to use platform classLoader instead of ext. loader b/c ext. loader (and internal classes used by the test)  got removed in jdk9;
>> - temporarily excludes TestTranslatedException. the test fails b/c the decoded exception doesn't have information about modules. 8233745 is going to update jdk.vm.ci.hotspot.TranslatedException and remove @ignore from the test.
>> 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8230364
>> webrev: http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/
>> testing: "added" tests (test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/)
>> 
>> Thanks,
>> -- Igor
> 


From john.r.rose at oracle.com  Thu Nov  7 21:58:45 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 7 Nov 2019 13:58:45 -0800
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74DKy-2kQqd-PyK62DR5baWund-F60jM5DjpAoJtKUe_TA@mail.gmail.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
 <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
 <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>
 <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>
 <CAEgw74DKy-2kQqd-PyK62DR5baWund-F60jM5DjpAoJtKUe_TA@mail.gmail.com>
Message-ID: <A531AF2F-FF79-40F1-8463-309A97C85BB9@oracle.com>

Would you consider adding patterns for non-constant masks also?
It would be something like (And (LShift n) x), etc.
It could be in this set or in a follow-on.
Thanks (says John who always wants more).
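As a hedged illustration of the non-constant case, the Java expressions below are the kind of source that would presumably reach C2 as something like (OrL x (LShiftL 1 n)) or (AndL x (XorL (LShiftL 1 n) -1)), since javac compiles ~m to m ^ -1. The class and method names are hypothetical. One constraint any such pattern has to respect: Java uses only the low 6 bits of a long shift count, which matches the register forms of BT/BTS/BTR with a 64-bit operand (they also take the bit offset modulo 64). The memory-destination forms with a register bit offset, however, can address bits outside the 64-bit field, so the index would need masking there.

```java
// Hypothetical source shapes for variable-mask bit set/clear/test.
public class VarMaskSketch {

    static long setBit(long x, int n)     { return x | (1L << n); }
    static long clearBit(long x, int n)   { return x & ~(1L << n); }
    static boolean testBit(long x, int n) { return ((x >>> n) & 1L) != 0; }

    public static void main(String[] args) {
        // Java masks a long shift count to 6 bits, so n = 64 acts like n = 0.
        System.out.println(setBit(0L, 64));                      // 1
        System.out.println(clearBit(-1L, 63) == Long.MAX_VALUE); // true
        System.out.println(testBit(Long.MIN_VALUE, 63));         // true
    }
}
```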

> On Nov 7, 2019, at 11:30 AM, B. Blaser <bsrbnd at gmail.com> wrote:
> 
> Hi Vladimir, Sandhya and John,
> 
> Thanks for your respective answers.
> 
> The suggested fix focuses on x86_64 and pure 64-bit immediates which
> means that all other cases are left unchanged as shown by the initial
> benchmark, for example:
> 
> andq &= ~MASK00;
> orq |= MASK00;
> 
> would still give:
> 
> 03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field:
> org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 041       orq     [RSI + #24 (8-bit)], #1    # long ! Field:
> org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 046       ...
> 
> Now, the interesting point is that pure 64-bit immediates (which
> cannot be treated as sign-extended 8/32-bit values) are assembled
> using two instructions (not one) because AND/OR cannot be used
> directly in such cases, for example:
> 
> andq &= ~MASK63;
> orq |= MASK63;
> 
> gives:
> 
> 03e       movq    R10, #9223372036854775807    # long
> 048       andq    [RSI + #16 (8-bit)], R10    # long ! Field:
> org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 04c       movq    R10, #-9223372036854775808    # long
> 056       orq     [RSI + #24 (8-bit)], R10    # long ! Field:
> org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 05a       ...
> 
> So, even though Sandhya mentioned a better throughput for AND/OR, the
> additional MOV cost (I didn't find it in table C-17 but I assume
> something close to MOVS/Z with latency=1/throughput=0.25) seems to be
> in favor of a sole BTR/BTS instruction as shown by the initial
> benchmark.
> 
> However, as John suggested, I tried another benchmark which focuses on
> the throughput to make sure there isn't any regression in such
> situations:
> 
>    private long orq63, orq62, orq61, orq60;
> 
>    @Benchmark
>    public void throughput(Blackhole bh) {
>        for (int i=0; i<COUNT; i++) {
>            orq63 = orq62 = orq61 = orq60 = 0;
>            bh.consume(testTp());
>        }
>    }
> 
>    private long testTp() {
>        orq63 |= MASK63;
>        orq62 |= MASK62;
>        orq61 |= MASK61;
>        orq60 |= MASK60;
>        return 0L;
>    }
> 
> Before, we had:
> 
> 03e       movq    R10, #-9223372036854775808    # long
> 048       orq     [RSI + #32 (8-bit)], R10    # long ! Field:
> org/openjdk/bench/vm/compiler/BitSetAndReset.orq63
> 04c       movq    R10, #4611686018427387904    # long
> 056       orq     [RSI + #40 (8-bit)], R10    # long ! Field:
> org/openjdk/bench/vm/compiler/BitSetAndReset.orq62
> 05a       movq    R10, #2305843009213693952    # long
> 064       orq     [RSI + #48 (8-bit)], R10    # long ! Field:
> org/openjdk/bench/vm/compiler/BitSetAndReset.orq61
> 068       movq    R10, #1152921504606846976    # long
> 072       orq     [RSI + #56 (8-bit)], R10    # long ! Field:
> org/openjdk/bench/vm/compiler/BitSetAndReset.orq60
> 
> Benchmark                  Mode  Cnt      Score      Error  Units
> BitSetAndReset.throughput  avgt    9  25912.455 ? 2527.041  ns/op
> 
> And after, we would have:
> 
> 03c       btsq    [RSI + #32 (8-bit)], log2(#-9223372036854775808)
> # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63
> 042       btsq    [RSI + #40 (8-bit)], log2(#4611686018427387904)    #
> long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62
> 048       btsq    [RSI + #48 (8-bit)], log2(#2305843009213693952)    #
> long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61
> 04e       btsq    [RSI + #56 (8-bit)], log2(#1152921504606846976)    #
> long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60
> 
> Benchmark                  Mode  Cnt      Score      Error  Units
> BitSetAndReset.throughput  avgt    9  25803.195 ± 2434.009  ns/op
> 
> Fortunately, we still see a tiny performance gain along with the large
> size reduction and register saving.
> Should we go ahead with this optimization? If so, I'll post an RFR with
> Vladimir's requested changes soon.
> 
> Thanks,
> Bernard
> 
> On Thu, 7 Nov 2019 at 02:02, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> I recently saw LLVM compile a classification switch into a really tidy BTR instruction,
>> something like this:
>> 
>>  switch (ch) {
>>  case ';': case '/': case '.': case '[':  return 0;
>>  default: return 1;
>>  }
>> =>
>>  … range check …
>>  movabsq       0x200000002003, %rcx
>>  btq   %rdi, %rcx
>> 
>> It made me wish for this change, plus some more to switch itself.
>> Given Sandhya's report, though, BTR may only be helpful in limited
>> cases.  In the case above, it subsumes a shift instruction.
>>
>> Bernard's JMH experiment suggests something else is going on besides
>> the throughput difference which Sandhya cites.  Maybe it's a benchmark
>> artifact, or maybe it's a good effect from smaller code.  I suggest jamming
>> more back-to-back BTRs together, to see if the throughput effect appears.
>> 
>> -- John
>> 
>> On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya <sandhya.viswanathan at intel.com> wrote:
>>> 
>>> Hi Vladimir/Bernard,
>>> 
>>> 
>>> 
>>> I don't see any restrictions/limitations on these instructions other than the fact that the 'long' operation is only supported in 64-bit mode as usual, so it should be restricted to the 64-bit JVM only.
>>> 
>>> The code size improvement that Bernard demonstrates is significant for operation on longs.
>>> 
>>> It looks like the throughput for AND/OR is better than BTR/BTS  (0.25 vs 0.5) though. Please refer to Table C-17 in the document below:
>> 
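John's switch example can be modelled in Java as a range check followed by a test of one bit in a 64-bit mask. The snippet elides the range check, so the bias used here is an inference from the constant: 0x200000002003 has bits 0, 1, 13 and 45 set, which are exactly '.', '/', ';' and '[' once '.' (46) is subtracted. The `>>>`-and-mask step is the shift-plus-test that a single `bt` instruction performs. `SwitchBitmap` and its methods are hypothetical names used only for this sketch.

```java
public class SwitchBitmap {
    // Bits 0, 1, 13 and 45 set: '.', '/', ';' and '[' relative to '.' (46).
    static final long MASK = 0x200000002003L;

    // Reference version: the switch from John's example.
    static int classifySwitch(char ch) {
        switch (ch) {
            case ';': case '/': case '.': case '[':
                return 0;
            default:
                return 1;
        }
    }

    // Bitmap version: bias, range check, then test a single bit of MASK.
    static int classifyBitmap(char ch) {
        int idx = ch - '.';               // '.' maps to bit 0
        if (idx < 0 || idx >= 64) {       // the elided range check
            return 1;
        }
        // Shift and mask: the work a single bt instruction does in hardware.
        return ((MASK >>> idx) & 1L) != 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        for (char c = 0; c < 256; c++) {
            if (classifySwitch(c) != classifyBitmap(c)) {
                throw new AssertionError("mismatch at " + (int) c);
            }
        }
        System.out.println("switch and bitmap agree for chars 0..255");
    }
}
```

A JIT that recognizes this shape can replace a chain of compares with one constant load and one bit test, which is the kind of switch improvement John alludes to.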


From vladimir.kozlov at oracle.com  Thu Nov  7 22:14:35 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Nov 2019 14:14:35 -0800
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <d147c777-b05f-3c65-90eb-d35cea0beed1@oracle.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>
 <HE1PR0201MB2475B62202B8D5187D6C3CAC9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>
 <VI1PR0201MB2479F3BD3483ABF14F0A36109A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d147c777-b05f-3c65-90eb-d35cea0beed1@oracle.com>
Message-ID: <e87a02b4-aa32-bfb9-ce76-5c875b006d24@oracle.com>

I resubmitted testing.

Vladimir

On 11/7/19 1:51 AM, David Holmes wrote:
> On 7/11/2019 7:08 pm, Doerr, Martin wrote:
>> Hi David,
>>
>> get_log only accesses the executing thread's own oop and the ones before it. So it's ensured by 
>> the algorithm that all accessed oops are in live handles.
> 
> Okay I see that now.
> 
> Thanks,
> David
> 
>> The problem is in can_remove when not holding the lock. For that, webrev.04 avoids accessing the 
>> oop of the last compiler thread in the case in which the lock is not held.
>>
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: David Holmes <david.holmes at oracle.com>
>>> Sent: Wednesday, 6 November 2019 11:15
>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>>> <kim.barrett at oracle.com>
>>> Cc: dean.long at oracle.com; Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>
>>> Hi Martin,
>>>
>>> On 6/11/2019 7:12 pm, Doerr, Martin wrote:
>>>> Hi Kim,
>>>>
>>>> thanks for confirming.
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/
>>>> already avoids access to freed handles.
>>>
>>> Sorry I missed your earlier reference to this version.
>>>
>>> So the expectation here is that all accesses to these arrays are guarded
>>> by the CompileThread_lock, but that doesn't seem to hold for get_log?
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> I don't really like the complexity of this code.
>>>> Replacing oops in handles would have been much simpler.
>>>> But I can live with either version.
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Kim Barrett <kim.barrett at oracle.com>
>>>>> Sent: Wednesday, 6 November 2019 04:09
>>>>> To: Doerr, Martin <martin.doerr at sap.com>
>>>>> Cc: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com;
>>>>> Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>
>>>>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>>>
>>>>> Coming back in, because this seems to be going off into the weeds again.
>>>>>
>>>>>>> I don't understand what you mean. If a compiler thread holds an oop, any
>>>>>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>>>>>
>>>>>> The problem is not related to gc.
>>>>>> My change introduces destroy_global for the handles. This means that the
>>>>>> OopStorage portion which has held the oop can get freed.
>>>>>> However, other compiler threads are running concurrently. They may
>>>>>> execute code which reads the oop from the handle which is freed by this
>>>>>> thread.
>>>>>> Reading stale data is not a problem here, but reading freed memory may
>>>>>> assert or even crash in general.
>>>>>> I can't see how OopStorage supports reading from handles which were
>>>>>> freed by destroy_global.
>>>>>
>>>>> So don't do that!
>>>>>
>>>>> OopStorage isn't magic. If you are going to look at an OopStorage
>>>>> handle, you have to ensure there won't be concurrent deletion. Use
>>>>> locks or some safe memory reclamation protocol. (GlobalCounter might
>>>>> be used here, but it depends a lot on what the iterations are doing. A
>>>>> reference counting mechanism is another possibility.) This is no
>>>>> different from any other resource management.
>>>>>
>>>>>> I think it would be safe if the freeing only occurred at safepoints, but I
>>>>>> don't think this is the case.
>>>>>
>>>>> Assuming the iteration didn't happen at safepoints (which is just a way to
>>>>> make the iteration and deletion not concurrent). And I agree that isn't the
>>>>> case with the current code.
>>>>
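Kim's point is that a reader of an OopStorage handle must be protected against a concurrent destroy_global. As a language-neutral illustration only, the reference-counting option he mentions can be sketched in Java. `CountedHandle`, `tryAcquire` and `release` are hypothetical names, not a HotSpot or OopStorage API, and this is not what webrev.04 does (it avoids the unsafe access instead). The owner holds one reference, readers must acquire before touching the slot, and the slot is cleared ("freed") only when the last reference is dropped.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CountedHandleDemo {
    // Hypothetical reference-counted handle; NOT a HotSpot/OopStorage API.
    static final class CountedHandle<T> {
        private final AtomicInteger refs = new AtomicInteger(1); // owner's reference
        private volatile T value;

        CountedHandle(T value) {
            this.value = value;
        }

        // A reader must succeed here before touching the handle. Once the
        // count has reached zero the handle is gone and cannot be revived.
        boolean tryAcquire() {
            while (true) {
                int n = refs.get();
                if (n == 0) {
                    return false;
                }
                if (refs.compareAndSet(n, n + 1)) {
                    return true;
                }
            }
        }

        // Only meaningful between a successful tryAcquire() and release().
        T get() {
            return value;
        }

        // Drops one reference; whoever drops the last one "frees" the slot.
        void release() {
            if (refs.decrementAndGet() == 0) {
                value = null; // stands in for destroy_global() releasing storage
            }
        }
    }

    public static void main(String[] args) {
        CountedHandle<String> handle = new CountedHandle<>("compiler thread oop");

        // Reader (e.g. another compiler thread): acquire, use, release.
        if (handle.tryAcquire()) {
            try {
                System.out.println("read: " + handle.get());
            } finally {
                handle.release();
            }
        }

        // Owner tears the handle down; later readers are refused.
        handle.release();
        System.out.println("acquire after destroy: " + handle.tryAcquire());
    }
}
```

The compare-and-set loop is what prevents a reader from touching a handle whose count has already hit zero. Kim's other suggestions (a lock held around both reads and deletion, or read-side critical sections such as GlobalCounter) trade this per-access cost for different overheads.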

From igor.ignatyev at oracle.com  Thu Nov  7 23:15:24 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 7 Nov 2019 15:15:24 -0800
Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize
 classloader and module info
Message-ID: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html
> 71 lines changed: 50 ins; 14 del; 7 mod; 

Hi all,

could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)?

I wasn't able to make StackTraceElement::toString of the deserialized elements return the same string representation as the original ones, because StackTraceElement::declaringClassObject won't be set. As a result, the JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either, and StackTraceElement::toString will include class loader names even for built-in loaders (omitted in the original because dropClassLoaderName() is true) and the versions of system modules (omitted in the original because dropModuleVersion() is true). So I changed how TestTranslatedException compares the original and decoded exceptions.

webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00
JBS: https://bugs.openjdk.java.net/browse/JDK-8233745
testing: compiler/jvmci/ + graal tiers

Thanks,
-- Igor
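The JDK 9 fields Igor mentions are exposed through `StackTraceElement.getClassLoaderName()`, `getModuleName()` and `getModuleVersion()`, and can be restored with the seven-argument `StackTraceElement` constructor. The sketch below round-trips them through `DataOutputStream`; the wire format and the `StackTraceCodec` name are assumptions for illustration, not the encoding TranslatedException actually uses. Because a decoded element has no declaring `Class`, the sketch compares individual fields rather than `toString()` output, in line with the test change Igor describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Objects;

public class StackTraceCodec {
    // A nullable string is written as a presence flag followed by its UTF data.
    private static void writeNullable(DataOutputStream out, String s) throws IOException {
        out.writeBoolean(s != null);
        if (s != null) {
            out.writeUTF(s);
        }
    }

    private static String readNullable(DataInputStream in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    static byte[] encode(StackTraceElement e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeNullable(out, e.getClassLoaderName()); // added in JDK 9
            writeNullable(out, e.getModuleName());      // added in JDK 9
            writeNullable(out, e.getModuleVersion());   // added in JDK 9
            out.writeUTF(e.getClassName());
            out.writeUTF(e.getMethodName());
            writeNullable(out, e.getFileName());
            out.writeInt(e.getLineNumber());
        }
        return bytes.toByteArray();
    }

    static StackTraceElement decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String loader = readNullable(in);
            String module = readNullable(in);
            String version = readNullable(in);
            String className = in.readUTF();
            String methodName = in.readUTF();
            String fileName = readNullable(in);
            int line = in.readInt();
            // JDK 9+ constructor; the declaring Class object cannot be restored.
            return new StackTraceElement(loader, module, version,
                                         className, methodName, fileName, line);
        }
    }

    // Compares fields rather than toString(), which may differ after decoding.
    static boolean sameFields(StackTraceElement a, StackTraceElement b) {
        return Objects.equals(a.getClassLoaderName(), b.getClassLoaderName())
            && Objects.equals(a.getModuleName(), b.getModuleName())
            && Objects.equals(a.getModuleVersion(), b.getModuleVersion())
            && a.getClassName().equals(b.getClassName())
            && a.getMethodName().equals(b.getMethodName())
            && Objects.equals(a.getFileName(), b.getFileName())
            && a.getLineNumber() == b.getLineNumber();
    }

    public static void main(String[] args) throws IOException {
        StackTraceElement original = new Throwable().getStackTrace()[0];
        StackTraceElement decoded = decode(encode(original));
        System.out.println("fields equal: " + sameFields(original, decoded));
        System.out.println("original: " + original);
        System.out.println("decoded:  " + decoded);
    }
}
```

Printing both elements is a quick way to see the `toString()` difference Igor describes (for example, a built-in loader name appearing only in the decoded form).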


From vladimir.kozlov at oracle.com  Thu Nov  7 23:33:21 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Nov 2019 15:33:21 -0800
Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize
 classloader and module info
In-Reply-To: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
Message-ID: <db7cbca8-9644-ae6d-a0a3-a808ecd13f92@oracle.com>

Good.

Tom and Doug should look at this.

Thanks,
Vladimir

On 11/7/19 3:15 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html
>> 71 lines changed: 50 ins; 14 del; 7 mod;
> 
> Hi all,
> 
> could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)?
> 
> I wasn't able to make StackTraceElement::toString of the deserialized elements return the same string representation as the original ones, because StackTraceElement::declaringClassObject won't be set. As a result, the JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either, and StackTraceElement::toString will include class loader names even for built-in loaders (omitted in the original because dropClassLoaderName() is true) and the versions of system modules (omitted in the original because dropModuleVersion() is true). So I changed how TestTranslatedException compares the original and decoded exceptions.
> 
> webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00
> JBS: https://bugs.openjdk.java.net/browse/JDK-8233745
> testing: compiler/jvmci/ + graal tiers
> 
> Thanks,
> -- Igor
> 
> 

From sandhya.viswanathan at intel.com  Fri Nov  8 00:57:28 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 8 Nov 2019 00:57:28 +0000
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <5fac20bf-d5d6-3380-894e-0cb488fb0dca@oracle.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
 <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
 <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>
 <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>
 <CAEgw74DKy-2kQqd-PyK62DR5baWund-F60jM5DjpAoJtKUe_TA@mail.gmail.com>
 <5fac20bf-d5d6-3380-894e-0cb488fb0dca@oracle.com>
Message-ID: <CH2PR11MB42807634E8C54F82F99EEB94EF7B0@CH2PR11MB4280.namprd11.prod.outlook.com>

Thanks a lot Bernard, for identifying these optimizations. 

Best Regards,
Sandhya


-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov
Sent: Thursday, November 07, 2019 11:51 AM
To: B. Blaser <bsrbnd at gmail.com>; John Rose <john.r.rose at oracle.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits

I agree with you, Bernard.

I think throughput performance is limited by memory accesses, which are the same in both cases. But the code reduction is a very nice improvement: we can squeeze more code into CPU buffers, which is very good for small loops.

Please send an official RFR and submit it for testing. It would also be nice to have a test which verifies the results of these operations.

Thanks,
Vladimir


On 11/7/19 11:30 AM, B. Blaser wrote:
> Hi Vladimir, Sandhya and John,
> 
> Thanks for your respective answers.
> 
> The suggested fix focuses on x86_64 and pure 64-bit immediates which 
> means that all other cases are left unchanged as shown by the initial 
> benchmark, for example:
> 
> andq &= ~MASK00;
> orq |= MASK00;
> 
> would still give:
> 
> 03c       andq    [RSI + #16 (8-bit)], #-2    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 041       orq     [RSI + #24 (8-bit)], #1    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 046       ...
> 
> Now, the interesting point is that pure 64-bit immediates (which 
> cannot be treated as sign-extended 8/32-bit values) are assembled 
> using two instructions (not one) because AND/OR cannot be used 
> directly in such cases, for example:
> 
> andq &= ~MASK63;
> orq |= MASK63;
> 
> gives:
> 
> 03e       movq    R10, #9223372036854775807    # long
> 048       andq    [RSI + #16 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
> 04c       movq    R10, #-9223372036854775808    # long
> 056       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
> 05a       ...
> 
> So, even though Sandhya mentioned a better throughput for AND/OR, the
> additional MOV cost (I didn't find it in table C-17 but I assume
> something close to MOVS/Z with latency=1/throughput=0.25) seems to
> favor a single BTR/BTS instruction, as shown by the initial
> benchmark.
> 
> However, as John suggested, I tried another benchmark which focuses on 
> the throughput to make sure there isn't any regression in such
> situations:
> 
>      private long orq63, orq62, orq61, orq60;
> 
>      @Benchmark
>      public void throughput(Blackhole bh) {
>          for (int i=0; i<COUNT; i++) {
>              orq63 = orq62 = orq61 = orq60 = 0;
>              bh.consume(testTp());
>          }
>      }
> 
>      private long testTp() {
>          orq63 |= MASK63;
>          orq62 |= MASK62;
>          orq61 |= MASK61;
>          orq60 |= MASK60;
>          return 0L;
>      }
> 
> Before, we had:
> 
> 03e       movq    R10, #-9223372036854775808    # long
> 048       orq     [RSI + #32 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63
> 04c       movq    R10, #4611686018427387904    # long
> 056       orq     [RSI + #40 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62
> 05a       movq    R10, #2305843009213693952    # long
> 064       orq     [RSI + #48 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61
> 068       movq    R10, #1152921504606846976    # long
> 072       orq     [RSI + #56 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60
> 
> Benchmark                  Mode  Cnt      Score      Error  Units
> BitSetAndReset.throughput  avgt    9  25912.455 ± 2527.041  ns/op
> 
> And after, we would have:
> 
> 03c       btsq    [RSI + #32 (8-bit)], log2(#-9223372036854775808)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63
> 042       btsq    [RSI + #40 (8-bit)], log2(#4611686018427387904)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62
> 048       btsq    [RSI + #48 (8-bit)], log2(#2305843009213693952)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61
> 04e       btsq    [RSI + #56 (8-bit)], log2(#1152921504606846976)    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60
> 
> Benchmark                  Mode  Cnt      Score      Error  Units
> BitSetAndReset.throughput  avgt    9  25803.195 ± 2434.009  ns/op
> 
> Fortunately, we still see a tiny performance gain along with the large 
> size reduction and register saving.
> Should we go ahead with this optimization? If so, I'll post an RFR with
> Vladimir's requested changes soon.
> 
> Thanks,
> Bernard
> 
> On Thu, 7 Nov 2019 at 02:02, John Rose <john.r.rose at oracle.com> wrote:
>>
>> I recently saw LLVM compile a classification switch into a really 
>> tidy BTR instruction, something like this:
>>
>>    switch (ch) {
>>    case ';': case '/': case '.': case '[':  return 0;
>>    default: return 1;
>>    }
>> =>
>>    … range check …
>>    movabsq       0x200000002003, %rcx
>>    btq   %rdi, %rcx
>>
>> It made me wish for this change, plus some more to switch itself.
>> Given Sandhya's report, though, BTR may only be helpful in limited
>> cases.  In the case above, it subsumes a shift instruction.
>>
>> Bernard's JMH experiment suggests something else is going on besides
>> the throughput difference which Sandhya cites.  Maybe it's a
>> benchmark artifact, or maybe it's a good effect from smaller code.  I
>> suggest jamming more back-to-back BTRs together, to see if the throughput effect appears.
>>
>> -- John
>>
>> On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya <sandhya.viswanathan at intel.com> wrote:
>>>
>>> Hi Vladimir/Bernard,
>>>
>>>
>>>
>>> I don't see any restrictions/limitations on these instructions other than the fact that the 'long' operation is only supported in 64-bit mode as usual, so it should be restricted to the 64-bit JVM only.
>>>
>>> The code size improvement that Bernard demonstrates is significant for operation on longs.
>>>
>>> It looks like the throughput for AND/OR is better than BTR/BTS  (0.25 vs 0.5) though. Please refer to Table C-17 in the document below:
>>
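Bernard's argument hinges on which 64-bit immediates x86_64 can encode directly: `andq`/`orq` with a memory destination accept at most a sign-extended 32-bit immediate, so a mask such as MASK63 first needs a `movq` into a scratch register, while an OR mask with exactly one bit set (or an AND mask with exactly one bit cleared) can become `btsq`/`btrq` with the bit index as a small immediate. The Java sketch below only models that selection decision and prints pseudo-assembly; `BitOpSelector`, `selectOr` and `selectAnd` are hypothetical names and do not mirror the actual x86_64.ad predicates.

```java
public class BitOpSelector {
    // True if v is encodable as a sign-extended 32-bit immediate (imm8 included).
    static boolean fitsSimm32(long v) {
        return v == (int) v;
    }

    // Sketch of the choice for "field |= mask".
    static String selectOr(long mask) {
        if (fitsSimm32(mask)) {
            return "orq [mem], #" + mask;                 // unchanged case
        }
        if (Long.bitCount(mask) == 1) {
            // bit index = log2(mask), as in the btsq listings above
            return "btsq [mem], #" + Long.numberOfTrailingZeros(mask);
        }
        return "movq R10, #" + mask + "; orq [mem], R10"; // two instructions
    }

    // Sketch of the choice for "field &= mask": BTR when exactly one bit is cleared.
    static String selectAnd(long mask) {
        if (fitsSimm32(mask)) {
            return "andq [mem], #" + mask;
        }
        if (Long.bitCount(~mask) == 1) {
            return "btrq [mem], #" + Long.numberOfTrailingZeros(~mask);
        }
        return "movq R10, #" + mask + "; andq [mem], R10";
    }

    public static void main(String[] args) {
        System.out.println(selectAnd(~1L));         // andq [mem], #-2
        System.out.println(selectOr(1L));           // orq [mem], #1
        System.out.println(selectAnd(~(1L << 63))); // btrq [mem], #63
        System.out.println(selectOr(1L << 63));     // btsq [mem], #63
        System.out.println(selectOr(3L << 62));     // movq R10, #-4611686018427387904; orq [mem], R10
    }
}
```

The first two cases correspond to the MASK00 listing (left unchanged), the middle two to the MASK63 listing, and the last shows a multi-bit 64-bit mask, which still needs the scratch register.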

From smita.kamath at intel.com  Fri Nov  8 01:34:52 2019
From: smita.kamath at intel.com (Kamath, Smita)
Date: Fri, 8 Nov 2019 01:34:52 +0000
Subject: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization
 using AVX512 + VAES instructions
In-Reply-To: <c549b5d5-97a4-ebf3-36ae-854f9dd2b394@oracle.com>
References: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com>
 <c549b5d5-97a4-ebf3-36ae-854f9dd2b394@oracle.com>
Message-ID: <6563F381B547594081EF9DE181D07912B2D4495E@fmsmsx121.amr.corp.intel.com>

Hi Vladimir, 

I have made the changes. Please find the updated webrev at: 
http://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.02/

Thanks and Regards,
Smita Kamath


-----Original Message-----
From: Vladimir Kozlov <vladimir.kozlov at oracle.com> 
Sent: Thursday, November 07, 2019 1:09 PM
To: Kamath, Smita <smita.kamath at intel.com>
Cc: 'hotspot compiler' <hotspot-compiler-dev at openjdk.java.net>; Shemy, Regev <regev.shemy at intel.com>
Subject: Re: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using AVX512 + VAES instructions

Hi Smita,

You don't need #ifdef _LP64 in stubGenerator_x86_64.cpp. This file is compiled only for 64-bit JVM.
You also have trailing spaces - please remove them.

Changes seem fine otherwise. I'm submitting tier1 testing to make sure it builds.

Thanks,
Vladimir

On 11/7/19 12:12 PM, Kamath, Smita wrote:
> Hi Vladimir,
> 
> 
> As per the Intel Architecture Instruction Set Reference [1], Vector AES (VAES) operations will be supported in a future Intel ISA. I would like to contribute an optimization for the AES-CTR algorithm using AVX512+VAES instructions. This optimization is for x86_64 processors that have AVX512 and VAES enabled. I ran the jtreg test suite with the algorithm on Intel SDE [2] to confirm that the encoding and semantics are correctly implemented.
> 
> 
> I (smita.kamath at intel.com), Regev Shemy (regev.shemy at intel.com) and Shay Gueron (shay.gueron at intel.com) are contributors to this code.
> 
> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8233741
> 
> Link to webrev: https://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.01
> 
> 
> [1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf  (Pages 156 - 159)
> 
> [2] https://software.intel.com/en-us/articles/intel-software-development-emulator
> 
> 
> Regards,
> Smita Kamath
> 
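For readers unfamiliar with counter mode: keystream block i is the AES encryption of the counter block for i, XORed with data block i, and the counter is incremented once per block. Because the blocks are independent, several can be encrypted at once, which is presumably what a wide AVX512+VAES stub exploits. The sketch below rebuilds CTR on top of the JDK's `AES/ECB/NoPadding` cipher and checks it against `AES/CTR/NoPadding`, the transformation whose SunJCE implementation HotSpot can intrinsify (see `-XX:UseAESCTRIntrinsics`). It assumes the SunJCE convention of incrementing the full 16-byte counter as a big-endian integer and does not model the stub itself; `ManualCtr` is a hypothetical name.

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ManualCtr {
    // Increments a 16-byte counter block as one big-endian 128-bit integer.
    static void increment(byte[] counter) {
        for (int i = counter.length - 1; i >= 0; i--) {
            if (++counter[i] != 0) {
                break; // no carry into the next (more significant) byte
            }
        }
    }

    // CTR by hand: keystream = AES_k(counter), output = input XOR keystream.
    static byte[] ctr(byte[] key, byte[] iv, byte[] input) throws Exception {
        Cipher ecb = Cipher.getInstance("AES/ECB/NoPadding");
        ecb.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] counter = iv.clone(); // do not mutate the caller's IV
        byte[] out = new byte[input.length];
        for (int off = 0; off < input.length; off += 16) {
            byte[] keystream = ecb.doFinal(counter);
            int n = Math.min(16, input.length - off); // last block may be partial
            for (int i = 0; i < n; i++) {
                out[off + i] = (byte) (input[off + i] ^ keystream[i]);
            }
            increment(counter);
        }
        return out;
    }

    // The JDK's own CTR implementation, for comparison.
    static byte[] jceCtr(int mode, byte[] key, byte[] iv, byte[] input) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(input);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];  // all-zero demo key, illustration only
        byte[] iv = new byte[16];
        iv[15] = (byte) 0xfe;       // the third block carries into byte 14
        byte[] msg = new byte[100]; // deliberately not a multiple of 16
        for (int i = 0; i < msg.length; i++) {
            msg[i] = (byte) i;
        }
        byte[] manual = ctr(key, iv, msg);
        byte[] jce = jceCtr(Cipher.ENCRYPT_MODE, key, iv, msg);
        System.out.println("manual == JCE: " + Arrays.equals(manual, jce));
        byte[] back = jceCtr(Cipher.DECRYPT_MODE, key, iv, jce);
        System.out.println("round trip ok: " + Arrays.equals(msg, back));
    }
}
```

A check like this, run with the intrinsic on and off, is one cheap way to confirm that a new stub preserves counter-carry semantics.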

From vladimir.kozlov at oracle.com  Fri Nov  8 01:51:01 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Nov 2019 17:51:01 -0800
Subject: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using
 AVX512 + VAES instructions
In-Reply-To: <6563F381B547594081EF9DE181D07912B2D4495E@fmsmsx121.amr.corp.intel.com>
References: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com>
 <c549b5d5-97a4-ebf3-36ae-854f9dd2b394@oracle.com>
 <6563F381B547594081EF9DE181D07912B2D4495E@fmsmsx121.amr.corp.intel.com>
Message-ID: <522c1d75-c0ce-146f-9838-3445d6638a91@oracle.com>

Looks good.

I pushed the 02 version since tests passed with the same changes: I had removed the #ifdef _LP64 in
stubGenerator_x86_64.cpp before testing. I also compared the patches; in the new one the trailing spaces
were removed as asked. This should not affect the test results.

thanks,
Vladimir

On 11/7/19 5:34 PM, Kamath, Smita wrote:
> Hi Vladimir,
> 
> I have made the changes. Please find the updated webrev at:
> http://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.02/
> 
> Thanks and Regards,
> Smita Kamath
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Sent: Thursday, November 07, 2019 1:09 PM
> To: Kamath, Smita <smita.kamath at intel.com>
> Cc: 'hotspot compiler' <hotspot-compiler-dev at openjdk.java.net>; Shemy, Regev <regev.shemy at intel.com>
> Subject: Re: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using AVX512 + VAES instructions
> 
> Hi Smita,
> 
> You don't need #ifdef _LP64 in stubGenerator_x86_64.cpp. This file is compiled only for 64-bit JVM.
> You also have trailing spaces - please remove them.
> 
> Changes seem fine otherwise. I'm submitting tier1 testing to make sure it builds.
> 
> Thanks,
> Vladimir
> 
> On 11/7/19 12:12 PM, Kamath, Smita wrote:
>> Hi Vladimir,
>>
>>
>> As per the Intel Architecture Instruction Set Reference [1], Vector AES (VAES) operations will be supported in a future Intel ISA. I would like to contribute an optimization for the AES-CTR algorithm using AVX512+VAES instructions. This optimization is for x86_64 processors that have AVX512 and VAES enabled. I ran the jtreg test suite with the algorithm on Intel SDE [2] to confirm that the encoding and semantics are correctly implemented.
>>
>>
>> I (smita.kamath at intel.com), Regev Shemy (regev.shemy at intel.com) and Shay Gueron (shay.gueron at intel.com) are contributors to this code.
>>
>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8233741
>>
>> Link to webrev: https://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.01
>>
>>
>> [1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf  (Pages 156 - 159)
>>
>> [2] https://software.intel.com/en-us/articles/intel-software-development-emulator
>>
>>
>> Regards,
>> Smita Kamath
>>

From tobias.hartmann at oracle.com  Fri Nov  8 07:56:10 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 8 Nov 2019 08:56:10 +0100
Subject: [14] RFR(S): 8233788: Remove useless asserts in
 PhaseCFG::insert_anti_dependences
In-Reply-To: <f4dcd527-9d74-31e8-afa3-b01da6746d21@oracle.com>
References: <b04cdfba-50da-227c-b0b4-7a35460977c1@oracle.com>
 <f4dcd527-9d74-31e8-afa3-b01da6746d21@oracle.com>
Message-ID: <88db889c-4601-e654-ec03-973b1f381f98@oracle.com>

Thanks Vladimir.

Best regards,
Tobias

On 07.11.19 20:28, Vladimir Kozlov wrote:
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 11/7/19 3:34 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following cleanup:
>> https://bugs.openjdk.java.net/browse/JDK-8233788
>> http://cr.openjdk.java.net/~thartmann/8233788/webrev.00/
>>
>> load_alias_idx can never be 0 and even if it could be, one of the asserts would fail because the
>> opcode check handles only a single type. In fact, all these intrinsic nodes have adr_type() == NULL
>> which maps to AliasIdxTop.
>>
>> Thanks,
>> Tobias
>>

From tobias.hartmann at oracle.com  Fri Nov  8 08:38:06 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 8 Nov 2019 09:38:06 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
References: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
Message-ID: <575000ef-fc06-c33e-07e3-d8046db1ea43@oracle.com>

Hi Erik,

On 07.11.19 14:49, Erik Österlund wrote:
> We have noticed problems with the one single place where we let C2 touch the graph while injecting
> our load barriers. Right before tagging a load as needing a load barrier, it is GVN transformed. The
> problem with this is that if we emit a strong field load, the GVN transformation can see through
> that load, by following it through the store that set its value, and find that the field value
> really came from the load of a weak Reference.get intrinsic. In this scenario, the load we get out
> from the GVN transformation needs weak barriers, yet we override it with strong barriers, as that
> was the semantics of the access being parsed. Sigh. We already have code that tries to determine if
> the load we got out from the GVN transformation looks like a load that was created in the
> BarrierSetC2 factory function, so one way of solving this is to refine that logic that tries to
> determine if this was the load we created before the transformation or not. 

Do we still need that logic with your change?

> But I felt like a better
> solution is to finish constructing the access with all the intended properties *before* transformation
Yes, that's much better.

> I massaged the code so that the GC barrier data of accesses with load barriers gets passed in to the
> factory functions that create the access, right before the transformation. This way, we construct
> the access with the intended semantics where it is being created (parser or macro expansion for
> field accesses in clone intrinsics). Then we do not have to touch it after the GVN transformation.
> 
> It does seem like there could be similar problems from other GCs, but in e.g. G1, the consequences
> are weird suboptimal code instead of anything dangerous happening. For example, we can generate SATB
> buffering code required by G1 Reference.get() intrinsics for strong accesses, due to GVN handing out
> earlier accesses with different semantics. Perhaps that should be looked into separately as well.
> But that investigation is outside of the scope of this bug fix.

Could you please file an RFE for that? I also wonder if we could assert that the barrier data is set
when GVN is performed? That way we would catch problems like the one you've described above early.

> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/

Looks good to me!

Best regards,
Tobias
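The problematic shape Erik describes can be written in a few lines of Java: a value obtained from `Reference.get()` (which needs weak barriers) is stored into an ordinary field and immediately loaded back. C2's GVN may fold that field load to the stored value, so the "strong" load handed back by the transformation is really the result of the `Reference.get` intrinsic. This is only an illustrative sketch of the pattern; `RefGetThroughField`, `Holder` and `storeThenLoad` are hypothetical names, it is not a regression test from the webrev, and running it does not by itself demonstrate the barrier problem.

```java
import java.lang.ref.WeakReference;

public class RefGetThroughField {
    static final class Holder {
        Object f; // an ordinary (strong) field
    }

    // The load of h.f is parsed as a strong field access, but GVN can see
    // through the preceding store and find that the value came from ref.get().
    static Object storeThenLoad(WeakReference<Object> ref, Holder h) {
        h.f = ref.get();
        return h.f;
    }

    public static void main(String[] args) {
        Object target = new Object();
        WeakReference<Object> ref = new WeakReference<>(target);
        Holder h = new Holder();
        // Warm up so that C2 is likely to compile storeThenLoad.
        for (int i = 0; i < 100_000; i++) {
            if (storeThenLoad(ref, h) != target) { // keeps target strongly reachable
                throw new AssertionError("unexpected value");
            }
        }
        System.out.println("ok");
    }
}
```

Attaching the barrier data when the access is created, as the webrev does, means the node that GVN returns already carries the semantics of its own access instead of being re-tagged afterwards.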

From tobias.hartmann at oracle.com  Fri Nov  8 09:46:30 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 8 Nov 2019 10:46:30 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must
 have CFG nodes
Message-ID: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8233656
http://cr.openjdk.java.net/~thartmann/8233656/webrev.00/

During IGVN, we process a CastII node that carries a non-zero dependency from
GraphKit::cast_not_null [1]. ConstraintCastNode::dominating_cast then finds another CastII and
checks if it's dominating. We assert in PhaseGVN::is_dominator_helper because the other CastII has a
ProjNode as control input that has !is_CFG() because its input is TOP [2]. The input has been
replaced in the same round of IGVN and the projection is already on the IGVN worklist but hasn't
been processed yet (it will go away).

I propose to simply check the control inputs for is_CFG().

I can reproduce the issue with a complex Javafuzzer generated test (attached to the bug) but minimal
changes/simplifications to the test cause the issue to not reproduce anymore because it depends on
the order in which nodes are processed by IGVN. So I don't think it makes sense to include that
fragile test.

This has been triggered by my fix for 8229496 [3] which added additional Cast nodes but I believe it
can also happen without these changes.

Thanks,
Tobias

[1] https://hg.openjdk.java.net/jdk/jdk/rev/86b95fc6ca32#l12.40
[2] https://hg.openjdk.java.net/jdk/jdk/file/47c20fc6a517/src/hotspot/share/opto/multnode.cpp#l83
[3] https://bugs.openjdk.java.net/browse/JDK-8229496

From tobias.hartmann at oracle.com  Fri Nov  8 09:54:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 8 Nov 2019 10:54:41 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <975bbd8f-8021-8a5c-544b-123b2a2e08d7@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
 <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com>
 <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com>
 <975bbd8f-8021-8a5c-544b-123b2a2e08d7@oracle.com>
Message-ID: <19f1d1e3-74c5-7228-6f5d-47f616323d03@oracle.com>

Hi Jorn,

looks good to me too. I'll sponsor.

Best regards,
Tobias

On 05.11.19 17:54, Vladimir Ivanov wrote:
> 
>> http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.02/
> 
> Looks good.
> 
>> W.r.t. usefulness of PrintIdeal vs PrintIdealGraph; The obvious thing is that PrintIdeal doesn't
>> require IGV, which might be more useful if PrintIdeal were a diagnostic flag instead (as suggested
>> by Nils), so it could be used from a standard JDK build which doesn't come with IGV. Another
>> advantage that comes to mind is that PrintIdeal output is easier to share as text; you can just
>> copy a few lines into the body of an email. That said, I haven't used either of these flags
>> extensively, so I find it hard to judge whether one is clearly better than the other. But, it
>> seems at least unfortunate that we have the PrintIdeal flag, but cannot use it in compiler
>> directives to filter the output.
> 
> It's a long-standing issue with tracing functionality in C2: both options depend on the ability to dump
> Node and Type instances in textual form, which is absent in product binaries. It would be nice to
> bundle that code in product binaries as well and turn both options into diagnostic ones, but nobody
> has taken care of it yet.
> 
> Also, at some point, IGV had a text view of the graph (which was pretty close to PrintIdeal output),
> but I can't find it there anymore.
> 
> Best regards,
> Vladimir Ivanov
> 
>> On 04/11/2019 15:57, Vladimir Ivanov wrote:
>>> Hi Jorn,
>>>
>>> src\hotspot\share\opto\compile.hpp:
>>> +   bool                  _print_ideal;           // True if we should dump node IR for this compilation
>>>
>>> Since the only usage is in non-product code, I suggest putting _print_ideal into #ifndef PRODUCT,
>>> so you don't need to initialize it in product build.
>>>
>>> Also, it'll allow you to just put it on initializer list instead of doing it in the ctor body
>>> (akin to how _trace_opto_output is handled):
>>>
>>> src\hotspot\share\opto\compile.cpp:
>>>
>>> Compile::Compile( ciEnv* ci_env,
>>> ...
>>>   : Phase(Compiler),
>>> ...
>>>     _has_reserved_stack_access(false),
>>> #ifndef PRODUCT
>>>     _trace_opto_output(directive->TraceOptoOutputOption),
>>> #endif
>>>     _has_method_handle_invokes(false),
>>>
>>>
>>> Overall, I don't see much value in PrintIdeal: PrintIdealGraph provides much more detailed
>>> information (even though in XML format) and IdealGraphVisualizer is better at browsing the graph.
>>> The only thing I'm usually missing is full text dump output on individual nodes (they are shown
>>> pruned in IGV; not sure whether it's IGV fault or the info is missing in the dump).
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 01.11.2019 18:09, Jorn Vernee wrote:
>>>> Hi,
>>>>
>>>> I'd like to add PrintIdeal as a compiler directive in order to enable PrintIdeal for only a
>>>> single method when combining it with the 'match' directive.
>>>>
>>>> Please review the following:
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
>>>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
>>>> (Testing = tier1, manual)
>>>>
>>>> As a heads-up: I'm not a committer on the JDK project, so if this sounds like a good idea, I
>>>> would need a sponsor to push the changes.
>>>>
>>>> Thanks,
>>>> Jorn
>>>>

From jorn.vernee at oracle.com  Fri Nov  8 10:05:15 2019
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Fri, 8 Nov 2019 11:05:15 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <19f1d1e3-74c5-7228-6f5d-47f616323d03@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
 <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com>
 <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com>
 <975bbd8f-8021-8a5c-544b-123b2a2e08d7@oracle.com>
 <19f1d1e3-74c5-7228-6f5d-47f616323d03@oracle.com>
Message-ID: <4b014e01-2233-77da-f3d3-7407f226833b@oracle.com>

Thanks Tobias!

Jorn

On 08/11/2019 10:54, Tobias Hartmann wrote:
> Hi Jorn,
>
> looks good to me too. I'll sponsor.
>
> Best regards,
> Tobias
>
> On 05.11.19 17:54, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.02/
>> Looks good.
>>
>>> W.r.t. usefulness of PrintIdeal vs PrintIdealGraph; The obvious thing is that PrintIdeal doesn't
>>> require IGV, which might be more useful if PrintIdeal were a diagnostic flag instead (as suggested
>>> by Nils), so it could be used from a standard JDK build which doesn't come with IGV. Another
>>> advantage that comes to mind is that PrintIdeal output is easier to share as text; you can just
>>> copy a few lines into the body of an email. That said, I haven't used either of these flags
>>> extensively, so I find it hard to judge whether one is clearly better than the other. But, it
>>> seems at least unfortunate that we have the PrintIdeal flag but cannot use it in compiler
>>> directives to filter the output.
>> It's a long-standing issue with tracing functionality in C2: both options depend on the ability to dump
>> Node and Type instances in textual form, which is absent in product binaries. It would be nice to
>> bundle that code in product binaries as well and turn both options into diagnostic ones, but nobody
>> has taken care of it yet.
>>
>> Also, at some point, IGV had a text view of the graph (which was pretty close to PrintIdeal output),
>> but I can't find it there anymore.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>> On 04/11/2019 15:57, Vladimir Ivanov wrote:
>>>> Hi Jorn,
>>>>
>>>> src\hotspot\share\opto\compile.hpp:
>>>> +   bool                  _print_ideal;           // True if we should dump node IR for this compilation
>>>>
>>>> Since the only usage is in non-product code, I suggest putting _print_ideal under #ifndef PRODUCT,
>>>> so you don't need to initialize it in the product build.
>>>>
>>>> Also, it'll allow you to just put it on the initializer list instead of doing it in the ctor body
>>>> (akin to how _trace_opto_output is handled):
>>>>
>>>> src\hotspot\share\opto\compile.cpp:
>>>>
>>>> Compile::Compile( ciEnv* ci_env,
>>>> ...
>>>>   : Phase(Compiler),
>>>> ...
>>>>     _has_reserved_stack_access(false),
>>>> #ifndef PRODUCT
>>>>     _trace_opto_output(directive->TraceOptoOutputOption),
>>>> #endif
>>>>     _has_method_handle_invokes(false),
>>>>
>>>>
>>>> Overall, I don't see much value in PrintIdeal: PrintIdealGraph provides much more detailed
>>>> information (even though in XML format) and IdealGraphVisualizer is better at browsing the graph.
>>>> The only thing I'm usually missing is a full text dump of individual nodes (they are shown
>>>> pruned in IGV; not sure whether it's IGV's fault or the info is missing in the dump).
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> On 01.11.2019 18:09, Jorn Vernee wrote:
>>>>> Hi,
>>>>>
>>>>> I'd like to add PrintIdeal as a compiler directive in order to enable PrintIdeal for only a
>>>>> single method when combining it with the 'match' directive.
>>>>>
>>>>> Please review the following:
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
>>>>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
>>>>> (Testing = tier1, manual)
>>>>>
>>>>> As a heads-up: I'm not a committer on the JDK project, so if this sounds like a good idea, I
>>>>> would need a sponsor to push the changes.
>>>>>
>>>>> Thanks,
>>>>> Jorn
>>>>>

From doug.simon at oracle.com  Fri Nov  8 11:18:40 2019
From: doug.simon at oracle.com (Doug Simon)
Date: Fri, 8 Nov 2019 12:18:40 +0100
Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize
 classloader and module info
In-Reply-To: <db7cbca8-9644-ae6d-a0a3-a808ecd13f92@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
 <db7cbca8-9644-ae6d-a0a3-a808ecd13f92@oracle.com>
Message-ID: <327E76EB-4257-449F-8648-0546203D8216@oracle.com>

Hi Igor,

To understand the bits lost in the translation that you describe below, can you please paste here or in the issue a before-and-after example of a translated exception that loses info in the translation?

-Doug

> On 8 Nov 2019, at 00:33, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Good.
> 
> Tom and Doug should look at this.
> 
> Thanks,
> Vladimir
> 
> On 11/7/19 3:15 PM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html
>>> 71 lines changed: 50 ins; 14 del; 7 mod;
>> Hi all,
>> could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)?
>> I wasn't able to make StackTraceElement::toString of a deserialized element return the same string representation as the original, because StackTraceElement::declaringClassObject won't be set. As a result, the JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either, and StackTraceElement::toString will include the classloader name even for built-in loaders (absent in the original because dropClassLoaderName() is true) and the version of system modules (absent in the original because dropModuleVersion() is true). So I changed how TestTranslatedException compares original and decoded exceptions.
>> webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233745
>> testing: compiler/jvmci/ + graal tiers
>> Thanks,
>> -- Igor


From bsrbnd at gmail.com  Fri Nov  8 11:40:49 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Fri, 8 Nov 2019 12:40:49 +0100
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <A531AF2F-FF79-40F1-8463-309A97C85BB9@oracle.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
 <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
 <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>
 <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>
 <CAEgw74DKy-2kQqd-PyK62DR5baWund-F60jM5DjpAoJtKUe_TA@mail.gmail.com>
 <A531AF2F-FF79-40F1-8463-309A97C85BB9@oracle.com>
Message-ID: <CAEgw74DO1nGPem+irirTwkzLoJzJ9+Z2VHfaihtaKJ=_1_o+rg@mail.gmail.com>

Yes, I'll also consider adding patterns for non-constant masks, which
might help in collections like BitSet or EnumSet (see [1] & [2]) by
subsuming the shift operation, provided that our experiments are
successful.
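For illustration, a minimal Java sketch of the source-level idioms that such non-constant-mask patterns, e.g. (And x (LShift 1 n)), would target, in the same shape as the BitSet and RegularEnumSet code referenced in [1] and [2]. The class and method names are hypothetical, and whether C2 ends up matching these shapes to BTS/BTR/BT depends on the patterns under discussion. One detail that makes the mapping plausible: x86_64 BT/BTS/BTR with a register bit offset on a register operand take the offset modulo 64, which matches Java's masking of a long shift count to its low six bits.

```java
// Hypothetical helpers mirroring the shapes used in BitSet/RegularEnumSet.
// Java masks a long shift count to its low 6 bits (n & 63), the same
// modulo-64 behavior x86_64 BTS/BTR/BT show for a register bit offset
// on a register operand.
class BitIdioms {
    // word | (1L << n): candidate for BTS (set bit n)
    static long setBit(long word, int n) {
        return word | (1L << n);
    }

    // word & ~(1L << n): candidate for BTR (clear bit n)
    static long clearBit(long word, int n) {
        return word & ~(1L << n);
    }

    // (word & (1L << n)) != 0: candidate for BT (test bit n)
    static boolean testBit(long word, int n) {
        return (word & (1L << n)) != 0;
    }

    public static void main(String[] args) {
        System.out.println(setBit(0L, 3));       // 8
        System.out.println(clearBit(0xFFL, 0));  // 254
        System.out.println(setBit(0L, 64));      // 1, since 64 & 63 == 0
        System.out.println(testBit(8L, 3));      // true
    }
}
```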

The range check of LLVM's classification switch is somewhat different
in that it uses BT (bit test), which unfortunately sets an uncommon
flag (CF) instead of the regular AND/TEST flags (ZF, ...); this might
require some additional work to replace the existing operations.

However, the current patch being almost ready to be pushed, I'll look
at these questions in separate issues.

Thanks for these suggestions,
Bernard

[1] http://hg.openjdk.java.net/jdk/jdk/file/c709424ad48f/src/java.base/share/classes/java/util/BitSet.java#l452
[2] http://hg.openjdk.java.net/jdk/jdk/file/c709424ad48f/src/java.base/share/classes/java/util/RegularEnumSet.java#l165

On Thu, 7 Nov 2019 at 22:59, John Rose <john.r.rose at oracle.com> wrote:
>
> Would you consider adding patterns for non-constant masks also?
> It would be something like (And (LShift n) x), etc.
> It could be in this set or in a follow-on.
> Thanks (says John who always wants more).

From tobias.hartmann at oracle.com  Fri Nov  8 12:52:04 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 8 Nov 2019 13:52:04 +0100
Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error:
 assert(p_f->Opcode() == Op_IfFalse) failed
Message-ID: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8233529
http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/

We have two loops (see TestRemoveMainPostLoops.java): Loop A with an inner loop followed by Loop B.

(1) OSR compilation is triggered in loop B.
(2) Pre-/main-/post loops are created for loop B.
(3) Main and post loops of B are found empty and are removed.
(4) Inner loop A is fully unrolled and removed.
(5) Only main and post loops are created for A (no pre loop -> "PeelMainPost") and main is unrolled.
(6) Pre loop of A is found empty, attempt to remove main and post loop then incorrectly selects main
loop from A.

The loop layout looks like this:
  Loop: N0/N0  has_sfpt
    Loop: N383/N718  limit_check sfpts={ 160 }
      Loop: N512/N517  counted [int,int),+1 (4 iters)  pre has_sfpt   <- belongs to A
      Loop: N760/N338  counted [1,100),+2 (102 iters)  main has_sfpt  <- belongs to B
      Loop: N713/N716  counted [int,101),+1 (4 iters)  post has_sfpt  <- belongs to B

Please note that the order of the two loops differs from the Java code because this is an OSR
compilation that starts execution in the second loop.

I've strengthened the asserts in locate_pre_from_main() and added a check for is_main_no_pre_loop()
in the caller.

The code has been introduced by JDK-8085832 [1].

Thanks,
Tobias

[1] https://bugs.openjdk.java.net/browse/JDK-8085832

From nils.eliasson at oracle.com  Fri Nov  8 14:58:06 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 8 Nov 2019 15:58:06 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
References: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
Message-ID: <ab0332fb-c088-37a6-cb56-a3ed5d81c924@oracle.com>

Hi Erik,

Looks good!

Regards,

Nils

On 2019-11-07 14:49, Erik Österlund wrote:
> Hi,
>
> We have noticed problems with the one single place where we let C2 
> touch the graph while injecting our load barriers. Right before 
> tagging a load as needing a load barrier, it is GVN transformed. The 
> problem with this is that if we emit a strong field load, the GVN 
> transformation can see through that load, by following it through the 
> store that set its value, and find that the field value really came 
> from the load of a weak Reference.get intrinsic. In this scenario, the 
> load we get out from the GVN transformation needs weak barriers, yet 
> we override it with strong barriers, as that was the semantics of the 
> access being parsed. Sigh. We already have code that tries to 
> determine if the load we got out from the GVN transformation looks 
> like a load that was created in the BarrierSetC2 factory function, so 
> one way of solving this is to refine that logic that tries to 
> determine if this was the load we created before the transformation or 
> not. But I felt like a better solution is to finish constructing the 
> access with all the intended properties *before* transformation.
>
> I massaged the code so that the GC barrier data of accesses with load 
> barriers gets passed in to the factory functions that create the 
> access, right before the transformation. This way, we construct the 
> access with the intended semantics where it is being created (parser 
> or macro expansion for field accesses in clone intrinsics). Then we do 
> not have to touch it after the GVN transformation.
>
> It does seem like there could be similar problems from other GCs, but 
> in e.g. G1, the consequences are weird suboptimal code instead of 
> anything dangerous happening. For example, we can generate SATB 
> buffering code required by G1 Reference.get() intrinsics for strong 
> accesses, due to GVN handing out earlier accesses with different 
> semantics. Perhaps that should be looked into separately as well. But 
> that investigation is outside of the scope of this bug fix.
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8233506
>
> Thanks,
> /Erik

From erik.osterlund at oracle.com  Fri Nov  8 15:00:42 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Fri, 8 Nov 2019 16:00:42 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <ab0332fb-c088-37a6-cb56-a3ed5d81c924@oracle.com>
References: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
 <ab0332fb-c088-37a6-cb56-a3ed5d81c924@oracle.com>
Message-ID: <9f458896-caea-cb93-6b64-f300cf116d0f@oracle.com>

Hi Nils,

Thanks!

/Erik

On 11/8/19 3:58 PM, Nils Eliasson wrote:
> Hi Erik,
>
> Looks good!
>
> Regards,
>
> Nils
>
> On 2019-11-07 14:49, Erik Österlund wrote:
>> Hi,
>>
>> We have noticed problems with the one single place where we let C2 
>> touch the graph while injecting our load barriers. Right before 
>> tagging a load as needing a load barrier, it is GVN transformed. The 
>> problem with this is that if we emit a strong field load, the GVN 
>> transformation can see through that load, by following it through the 
>> store that set its value, and find that the field value really came 
>> from the load of a weak Reference.get intrinsic. In this scenario, 
>> the load we get out from the GVN transformation needs weak barriers, 
>> yet we override it with strong barriers, as that was the semantics of 
>> the access being parsed. Sigh. We already have code that tries to 
>> determine if the load we got out from the GVN transformation looks 
>> like a load that was created in the BarrierSetC2 factory function, so 
>> one way of solving this is to refine that logic that tries to 
>> determine if this was the load we created before the transformation 
>> or not. But I felt like a better solution is to finish constructing 
>> the access with all the intended properties *before* transformation.
>>
>> I massaged the code so that the GC barrier data of accesses with load 
>> barriers gets passed in to the factory functions that create the 
>> access, right before the transformation. This way, we construct the 
>> access with the intended semantics where it is being created (parser 
>> or macro expansion for field accesses in clone intrinsics). Then we 
>> do not have to touch it after the GVN transformation.
>>
>> It does seem like there could be similar problems from other GCs, but 
>> in e.g. G1, the consequences are weird suboptimal code instead of 
>> anything dangerous happening. For example, we can generate SATB 
>> buffering code required by G1 Reference.get() intrinsics for strong 
>> accesses, due to GVN handing out earlier accesses with different 
>> semantics. Perhaps that should be looked into separately as well. But 
>> that investigation is outside of the scope of this bug fix.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8233506
>>
>> Thanks,
>> /Erik


From erik.osterlund at oracle.com  Fri Nov  8 15:15:48 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Fri, 8 Nov 2019 16:15:48 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <575000ef-fc06-c33e-07e3-d8046db1ea43@oracle.com>
References: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
 <575000ef-fc06-c33e-07e3-d8046db1ea43@oracle.com>
Message-ID: <1e3fd82d-2235-e5c6-3e41-4d5b5242ced4@oracle.com>

Hi Tobias,

On 11/8/19 9:38 AM, Tobias Hartmann wrote:
> Hi Erik,
>
> On 07.11.19 14:49, Erik Österlund wrote:
>> We have noticed problems with the one single place where we let C2 touch the graph while injecting
>> our load barriers. Right before tagging a load as needing a load barrier, it is GVN transformed. The
>> problem with this is that if we emit a strong field load, the GVN transformation can see through
>> that load, by following it through the store that set its value, and find that the field value
>> really came from the load of a weak Reference.get intrinsic. In this scenario, the load we get out
>> from the GVN transformation needs weak barriers, yet we override it with strong barriers, as that
>> was the semantics of the access being parsed. Sigh. We already have code that tries to determine if
>> the load we got out from the GVN transformation looks like a load that was created in the
>> BarrierSetC2 factory function, so one way of solving this is to refine that logic that tries to
>> determine if this was the load we created before the transformation or not.
> Do we still need that logic with your change?

Nope! :) Because the access nodes have their barrier data populated 
before transformation, describing the semantics of the produced access, 
we don't care what GVN gives us after transformation. Whatever it gives 
us has the correct semantics for the corresponding access that produced it.
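A toy Java model of why populating the barrier data up front removes the need for that logic. All names here are hypothetical and none of it is HotSpot code: the "GVN" below may hand back an existing node instead of the one passed in, standing in for C2 seeing through a store to the weak Reference.get load. Tagging after transformation clobbers that node's barrier data; constructing the access fully before transformation leaves whatever comes back untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not HotSpot code: names and structure are hypothetical.
class BarrierDataDemo {
    static final int NONE = 0;
    static final int STRONG = 1;
    static final int WEAK = 2;

    static final class Load {
        final String value;  // which value this load produces
        int barrierData;     // GC barrier semantics attached to the access

        Load(String value, int barrierData) {
            this.value = value;
            this.barrierData = barrierData;
        }
    }

    // Stand-in for GVN: returns a previously seen node producing the same
    // value, if any (think: seeing through a store to the original load).
    static final class Gvn {
        private final Map<String, Load> known = new HashMap<>();

        Load transform(Load n) {
            return known.computeIfAbsent(n.value, k -> n);
        }
    }

    // Old scheme: tag the access after transformation.
    static Load tagAfter(Gvn gvn, String value) {
        Load result = gvn.transform(new Load(value, NONE));
        result.barrierData = STRONG;  // may overwrite a node GVN handed back
        return result;
    }

    // New scheme: fully construct the access before transformation.
    static Load tagBefore(Gvn gvn, String value) {
        return gvn.transform(new Load(value, STRONG));  // result left as-is
    }

    public static void main(String[] args) {
        Gvn oldGvn = new Gvn();
        Load weakOld = oldGvn.transform(new Load("referent", WEAK));
        tagAfter(oldGvn, "referent");
        System.out.println(weakOld.barrierData);  // 1 (STRONG): weak load clobbered

        Gvn newGvn = new Gvn();
        Load weakNew = newGvn.transform(new Load("referent", WEAK));
        tagBefore(newGvn, "referent");
        System.out.println(weakNew.barrierData);  // 2 (WEAK): untouched
    }
}
```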

>> But I felt like a better
>> solution is to finish constructing the access with all the intended properties *before* transformation
> Yes, that's much better.
>
>> I massaged the code so that the GC barrier data of accesses with load barriers gets passed in to the
>> factory functions that create the access, right before the transformation. This way, we construct
>> the access with the intended semantics where it is being created (parser or macro expansion for
>> field accesses in clone intrinsics). Then we do not have to touch it after the GVN transformation.
>>
>> It does seem like there could be similar problems from other GCs, but in e.g. G1, the consequences
>> are weird suboptimal code instead of anything dangerous happening. For example, we can generate SATB
>> buffering code required by G1 Reference.get() intrinsics for strong accesses, due to GVN handing out
>> earlier accesses with different semantics. Perhaps that should be looked into separately as well.
>> But that investigation is outside of the scope of this bug fix.
> Could you please file an RFE for that? I also wonder if we could assert that the barrier data is set
> when GVN is performed? That way we would catch problems like the one you've described above early.

Sure, will file another RFE for that.
Regarding the assert, it's not obvious what it would look like, since 
the assert in the transformation code has to know exactly what nodes are 
expected to have GC data. For example, a LoadP might be used to read any 
pointer, including but not limited to oops. And some oops, like the
threadOop, don't need barriers because they are processed at safepoints.
So given a LoadP node, for example, I don't know whether we can determine
if it should have GC data or not.

>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/
> Looks good to me!

Thanks Tobias!

/Erik

> Best regards,
> Tobias


From goetz.lindenmaier at sap.com  Fri Nov  8 15:32:48 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 8 Nov 2019 15:32:48 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
Message-ID: <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi,

I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
which makes one of the fixes unnecessary.
Also, I had to fix the argument of verify_oop_helper 
from oop to oopDesc* for the fastdebug build.

New webrev:
http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/

Best regards,
  Goetz.

> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Freitag, 18. Oktober 2019 01:38
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> compiler-dev at openjdk.java.net>
> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> 
> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
> > Hi David,
> >
> > you are right, thanks for pointing me to that!
> > Doing one test for vm.bits=64 and one for 32 should fix it:
> > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> 
> s/01/02/ :)
> 
> For the 32-bit case you can delete the line:
> 
>     * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
> 
> For the 64-bit case you can delete the "sparc" check from the same line.
> 
> Thanks,
> David
> 
> >
> > Best regards,
> >    Goetz.
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Donnerstag, 17. Oktober 2019 13:18
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> >> compiler-dev at openjdk.java.net>
> >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>
> >> Hi Goetz,
> >>
> >> UseCompressedOops is a 64-bit flag only so your change will break the
> >> test on 32-bit systems.
> >>
> >> David
> >>
> >> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
> >>> Hi,
> >>>
> >>> 8231058 introduced a test that enables +VerifyOops.
> >>> This fails on ppc because VerifyOops had not been used there
> >>> in a very long time.
> >>>
> >>> The crash is caused by passing compressed oops from
> >>> LIR_Assembler::store() to the checker routine.
> >>> I fix this by implementing a checker routine verify_coop
> >>> that first decompresses the coop.  This makes the new
> >>> test pass.
> >>>
> >>> Further testing showed that the additional checker
> >>> code makes PatchingStubs overflow. These
> >>> cannot be increased in size to fit the code, so I
> >>> disable generating verify_oop code in LIR_Assembler::load(),
> >>> which fixes the issue.
> >>>
> >>> Further I extended the message printed when verification
> >>> of an oop failed. First, I print the location in the source
> >>> code where the checker code was generated. Second,
> >>> I print the faulty oop.
> >>>
> >>> I also improved the message printed when PatchingStubs
> >>> overflow.
> >>>
> >>> Finally, I improve the test to run with and without compressed
> >>> Oops.
> >>>
> >>> Please review:
> >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>>
> >>> @runtime as I modify the test introduced there
> >>> @compiler as the error is in C1.
> >>>
> >>> Best regards,
> >>>     Goetz.
> >>>
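To illustrate the crash mechanism described in the original mail (a compressed oop handed to a checker that expects a full pointer), here is a toy Java model of compressed-oop decoding. The heap base, heap size, and helper names are assumptions for illustration only. The decoding rule (base plus the unsigned 32-bit narrow value shifted by the object-alignment shift, with 0 mapping to null) is the usual one, but this is not the ppc verify_coop implementation.

```java
// Toy model of compressed-oop verification; HEAP_BASE, HEAP_SIZE and the
// method names are hypothetical, not taken from the ppc port.
class VerifyCoopDemo {
    static final long HEAP_BASE = 0x0000000800000000L; // assumed heap base
    static final long HEAP_SIZE = 1L << 30;            // assumed 1 GB heap
    static final int SHIFT = 3;                        // 8-byte object alignment

    // Decode: treat the narrow oop as unsigned 32-bit, scale, add the base.
    static long decode(int narrow) {
        return narrow == 0 ? 0L : HEAP_BASE + ((narrow & 0xFFFFFFFFL) << SHIFT);
    }

    // Checker that expects a full (uncompressed) pointer.
    static boolean verifyOop(long addr) {
        return addr == 0
                || (addr >= HEAP_BASE && addr < HEAP_BASE + HEAP_SIZE && (addr & 7) == 0);
    }

    // Checker for compressed oops: decompress first, then verify.
    static boolean verifyCoop(int narrow) {
        return verifyOop(decode(narrow));
    }

    public static void main(String[] args) {
        int narrow = 0x1000;
        System.out.println(verifyOop(narrow));   // false: raw narrow value is not an address
        System.out.println(verifyCoop(narrow));  // true: decoded address lies in the heap
    }
}
```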

From vladimir.kozlov at oracle.com  Fri Nov  8 16:50:50 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Nov 2019 08:50:50 -0800
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
Message-ID: <a95b6f2f-4c88-f964-84f4-e7e8aad342df@oracle.com>

Good.

thanks,
Vladimir

On 11/8/19 1:46 AM, Tobias Hartmann wrote:
> Hi,
> 
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8233656
> http://cr.openjdk.java.net/~thartmann/8233656/webrev.00/
> 
> During IGVN, we process a CastII node that carries a non-zero dependency from
> GraphKit::cast_not_null [1]. ConstraintCastNode::dominating_cast then finds another CastII and
> checks if it's dominating. We assert in PhaseGVN::is_dominator_helper because the other CastII has a
> ProjNode as control input that has !is_CFG() because its input is TOP [2]. The input has been
> replaced in the same round of IGVN and the projection is already on the IGVN worklist but hasn't
> been processed yet (it will go away).
> 
> I propose to simply check the control inputs for is_CFG().
> 
> I can reproduce the issue with a complex Javafuzzer generated test (attached to the bug) but minimal
> changes/simplifications to the test cause the issue to not reproduce anymore because it depends on
> the order in which nodes are processed by IGVN. So I don't think it makes sense to include that
> fragile test.
> 
> This has been triggered by my fix for 8229496 [3] which added additional Cast nodes but I believe it
> can also happen without these changes.
> 
> Thanks,
> Tobias
> 
> [1] https://hg.openjdk.java.net/jdk/jdk/rev/86b95fc6ca32#l12.40
> [2] https://hg.openjdk.java.net/jdk/jdk/file/47c20fc6a517/src/hotspot/share/opto/multnode.cpp#l83
> [3] https://bugs.openjdk.java.net/browse/JDK-8229496
> 

From vladimir.kozlov at oracle.com  Fri Nov  8 17:00:42 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Nov 2019 09:00:42 -0800
Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error:
 assert(p_f->Opcode() == Op_IfFalse) failed
In-Reply-To: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com>
References: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com>
Message-ID: <1de496aa-f57b-1ce1-5fa2-5d3488217887@oracle.com>

Looks good. How long does the test take to run with Graal (you switched off TieredCompilation)?

Vladimir

On 11/8/19 4:52 AM, Tobias Hartmann wrote:
> Hi,
> 
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8233529
> http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/
> 
> We have two loops (see TestRemoveMainPostLoops.java): Loop A with an inner loop followed by Loop B.
> 
> (1) OSR compilation is triggered in loop B.
> (2) Pre-/main-/post loops are created for loop B.
> (3) Main and post loops of B are found empty and are removed.
> (4) Inner loop A is fully unrolled and removed.
> (5) Only main and post loops are created for A (no pre loop -> "PeelMainPost") and main is unrolled.
> (6) Pre loop of A is found empty, attempt to remove main and post loop then incorrectly selects main
> loop from A.
> 
> The loop layout looks like this:
>    Loop: N0/N0  has_sfpt
>      Loop: N383/N718  limit_check sfpts={ 160 }
>        Loop: N512/N517  counted [int,int),+1 (4 iters)  pre has_sfpt   <- belongs to A
>        Loop: N760/N338  counted [1,100),+2 (102 iters)  main has_sfpt  <- belongs to B
>        Loop: N713/N716  counted [int,101),+1 (4 iters)  post has_sfpt  <- belongs to B
> 
> Please note that the order of the two loops differs from the Java code because this is an OSR
> compilation that starts execution in the second loop.
> 
> I've strengthened the asserts in locate_pre_from_main() and added a check for is_main_no_pre_loop()
> in the caller.
> 
> The code has been introduced by JDK-8085832 [1].
> 
> Thanks,
> Tobias
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8085832
> 

From vladimir.kozlov at oracle.com  Fri Nov  8 17:16:01 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 8 Nov 2019 09:16:01 -0800
Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR
 compilation
In-Reply-To: <ac04b5f1-a652-e638-cd40-04fc01fca203@oracle.com>
References: <ac04b5f1-a652-e638-cd40-04fc01fca203@oracle.com>
Message-ID: <c7d7453f-318b-33ee-8ff6-6c511670a4bc@oracle.com>

Looks good. It looks like you now know our vector optimizations ;)
Please file an RFE, as you suggested, to clean this up later.

Thanks,
Vladimir

On 11/6/19 2:13 AM, Christian Hagedorn wrote:
> Hi
> 
> Please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8229694
> http://cr.openjdk.java.net/~chagedorn/8229694/webrev.00/
> 
> The JVM crashes in the testcase when trying to dereference mem at [1] which is NULL. This happens 
> when converting packs into vector nodes in SuperWord::output() by reading an unexpected NULL value 
> with SuperWord::align_to_ref() [2]. The corresponding field _align_to_ref is set to NULL at [3] 
> since best_align_to_mem_ref was assigned NULL just before at [4]. _packset only contained one pack 
> and there were no memory operations left to be processed (memops is empty). As a result 
> SuperWord::find_align_to_ref() will return NULL since it needs at least two operations to find an 
> alignment.
> 
> The fix is straightforward: directly use the alignment of the only remaining pack if there are no
> memory operations left in memops to find another alignment.
> 
> 
> The testcase creates such a situation where only one pack remains at [4] when the loop is unrolled 
> four times. When calling SuperWord::find_adjacent_refs() there are:
> - 4 StoreI for intArr[j-1] = 400
> - 4 StoreC for shortArr[j] = 30
> - 2 StoreI for intArr[7] = 260 // Initially 4 but 2 are removed by IGVN in Ideal()
> - 2 StoreC for shortArr[10] = 10 // Initially 4 but 2 are removed by IGVN in Ideal()
> - 2 LoadI (and 2 StoreI) for iFld = intArr[j] // Initially 4 each but 2 of each are removed by IGVN 
> in Ideal()
> 
> The field stores are obviously ignored for the superword algorithm. intArr[j-1] aligns with 
> intArr[7] and therefore create_pack is true. The only pack created is one with two immediately 
> following stores for intArr[j-1]. The IGVN algorithm is not able to remove the first redundant store 
> to intArr[7] when the loop is unrolled the first time. Only when unrolling it again the second time, 
> it is able to remove the two newly created redundant stores to intArr[7]. This leaves us with the 
> following dependencies of stores: "intArr[j-1] -> intArr[j-1] -> intArr[7] -> intArr[j-1] -> 
> intArr[7] -> intArr[j-1]" from which only the first two operations can be used to create a pack.
> 
> The very same applies to the StoreC nodes. As a result, one pack for StoreI and one for StoreC are 
> created in total. There are now only the two LoadI nodes of intArr[j] left which are not aligned 
> with intArr[j-1]. Therefore, all StoreI packs are removed at [5]. This leaves us with exactly one 
> StoreC pack and an empty memops list, which sets the alignment to NULL and eventually lets the JVM 
> crash at [1].
> 
> We might want to file an RFE to investigate further why IGVN cannot remove the first redundant 
> stores to intArr[7], shortArr[10], and iFld, respectively (even though it's quite useless to keep 
> setting the same values in a loop). This problem can also be observed if the loop only contains the 
> statement "iFld = intArr[j]". But I think even if those redundant stores would have been optimized 
> away we should have this fix to handle the situation with only one pack and no memory operations 
> remaining.
> 
> 
> Thank you!
> 
> Best regards,
> Christian
> 
> 
> [1] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l3608
> [2] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l2328
> [3] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l732
> [4] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l708
> [5] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l688
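A hypothetical Java model of the fallback described in this review. The method names echo SuperWord, but this is a simplified sketch under stated assumptions, not the superword.cpp change: the placeholder search needs at least two memory operations, so when memops is empty and exactly one pack remains, the first memory operation of that pack is used as the alignment reference.

```java
import java.util.List;

// Hypothetical model; method names echo SuperWord, the logic is simplified.
class AlignFallbackDemo {
    // Placeholder for the real search, which needs at least two memory
    // operations to pick a reference; here it just returns the first one.
    static String findAlignToRef(List<String> memops) {
        return memops.size() < 2 ? null : memops.get(0);
    }

    static String chooseAlignToRef(List<List<String>> packset, List<String> memops) {
        String best = findAlignToRef(memops);
        if (best == null && memops.isEmpty() && packset.size() == 1) {
            // No memory operations left: align to the only remaining pack.
            best = packset.get(0).get(0);
        }
        return best;  // may still be null in cases this sketch does not cover
    }

    public static void main(String[] args) {
        List<List<String>> packset = List.of(List.of("StoreC#1", "StoreC#2"));
        System.out.println(chooseAlignToRef(packset, List.of()));  // StoreC#1
    }
}
```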

From igor.ignatyev at oracle.com  Fri Nov  8 19:33:32 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 8 Nov 2019 11:33:32 -0800
Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize
 classloader and module info
In-Reply-To: <327E76EB-4257-449F-8648-0546203D8216@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
 <db7cbca8-9644-ae6d-a0a3-a808ecd13f92@oracle.com>
 <327E76EB-4257-449F-8648-0546203D8216@oracle.com>
Message-ID: <5716B217-5CAE-4512-A063-CC416336B064@oracle.com>

Hi Doug,

I've added the difference in string representations to the bug report; for convenience, here is part of the diff:
> < at app//jdk.vm.ci.hotspot.test.TestTranslatedException.encodeDecodeTest(TestTranslatedException.java:73) 
> < at java.base@14-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
<...>
> ---
> > at jdk.vm.ci.hotspot.test.TestTranslatedException.encodeDecodeTest(TestTranslatedException.java:73) 
> > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 

As you can see, there is 'app/' for classes loaded by the application loader (plus another '/' because the tests are in the unnamed module) and '@14-internal' for classes from system modules.
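The extra 'app/' and '@14-internal' components come from how `StackTraceElement.toString()` (JDK 9+) renders the class loader name, module name and module version. A minimal sketch, assuming only the public seven-argument `StackTraceElement` constructor: an element created this way carries none of the internal format flags (the `BUILTIN_CLASS_LOADER`/`JDK_NON_UPGRADEABLE_MODULE` bits mentioned in the RFR), so both components are printed whenever they are present. The helper class name is made up for illustration.

```java
// Sketch: how StackTraceElement.toString() renders the JDK 9+ fields for
// elements built with the public constructor (no internal format bits set).
class StackTraceFormatDemo {
    // Application class loader "app", unnamed module (null module name):
    // renders as "app/" + "" + "/" + class name, hence the double slash.
    static StackTraceElement appFrame() {
        return new StackTraceElement("app", null, null,
                "jdk.vm.ci.hotspot.test.TestTranslatedException", "encodeDecodeTest",
                "TestTranslatedException.java", 73);
    }

    // System module with a version; a line number of -2 marks a native method.
    static StackTraceElement systemFrame() {
        return new StackTraceElement(null, "java.base", "14-internal",
                "jdk.internal.reflect.NativeMethodAccessorImpl", "invoke0",
                null, -2);
    }
}
```

Calling `toString()` on these two elements reproduces the two `<` lines of the diff (without the leading "at").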

Thanks,
-- Igor

> On Nov 8, 2019, at 3:18 AM, Doug Simon <doug.simon at oracle.com> wrote:
> 
> Hi Igor,
> 
> To understand the bits lost in the translation as you describe below, can you please paste here or in the issue an example of before and after of a translated exception that loses info in the translation.
> 
> -Doug
> 
>> On 8 Nov 2019, at 00:33, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>> 
>> Good.
>> 
>> Tom and Doug should look at this.
>> 
>> Thanks,
>> Vladimir
>> 
>> On 11/7/19 3:15 PM, Igor Ignatyev wrote:
>>> http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html
>>>> 71 lines changed: 50 ins; 14 del; 7 mod;
>>> Hi all,
>>> could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)?
>>> I wasn't able to make StackTraceElement::toString of the deserialized elements return the same string representation as the original ones, because StackTraceElement::declaringClassObject won't be set. As a result, the JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either, and StackTraceElement::toString will include class loader names even for built-in loaders (omitted in the original because dropClassLoaderName() is true) and versions of system modules (omitted in the original because dropModuleVersion() is true). So I changed how TestTranslatedException compares original and decoded exceptions.
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233745
>>> testing: compiler/jvmci/ + graal tiers
>>> Thanks,
>>> -- Igor
> 
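The idea of carrying the JDK 9+ fields through an encode/decode round trip, and then comparing elements field by field instead of via `toString()`, can be sketched as follows. The binary layout here is hypothetical and is not the encoding used by `jdk.vm.ci.hotspot.TranslatedException`; only standard `java.io` streams and the public `StackTraceElement` API are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical encoding, NOT the JVMCI format: it just shows the three JDK 9+
// fields (class loader name, module name, module version) surviving a round trip.
class StackTraceCodec {
    static byte[] encode(StackTraceElement e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeNullable(out, e.getClassLoaderName()); // JDK 9+ field
            writeNullable(out, e.getModuleName());      // JDK 9+ field
            writeNullable(out, e.getModuleVersion());   // JDK 9+ field
            out.writeUTF(e.getClassName());
            out.writeUTF(e.getMethodName());
            writeNullable(out, e.getFileName());
            out.writeInt(e.getLineNumber());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static StackTraceElement decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String loader = readNullable(in);
            String module = readNullable(in);
            String version = readNullable(in);
            String className = in.readUTF();
            String methodName = in.readUTF();
            String fileName = readNullable(in);
            int line = in.readInt();
            // The decoded element has no declaring Class object, so its toString()
            // may print components that the original element's toString() omits.
            return new StackTraceElement(loader, module, version,
                    className, methodName, fileName, line);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    private static void writeNullable(DataOutputStream out, String s) throws IOException {
        out.writeBoolean(s != null);
        if (s != null) {
            out.writeUTF(s);
        }
    }

    private static String readNullable(DataInputStream in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }
}
```

`StackTraceElement.equals` compares the loader name, module name and version, class, method, file and line, so `original.equals(decoded)` can hold even when the two `toString()` results differ, which is the kind of comparison the reworked test needs.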


From doug.simon at oracle.com  Fri Nov  8 22:38:58 2019
From: doug.simon at oracle.com (Doug Simon)
Date: Fri, 8 Nov 2019 23:38:58 +0100
Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize
 classloader and module info
In-Reply-To: <5716B217-5CAE-4512-A063-CC416336B064@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
 <db7cbca8-9644-ae6d-a0a3-a808ecd13f92@oracle.com>
 <327E76EB-4257-449F-8648-0546203D8216@oracle.com>
 <5716B217-5CAE-4512-A063-CC416336B064@oracle.com>
Message-ID: <416DB2FA-EEC4-4AB4-8265-C166F68EBEB9@oracle.com>

Ok, I think the translated exceptions still convert the most important information. Looks good to me.

-Doug



From per.liden at oracle.com  Fri Nov  8 22:45:41 2019
From: per.liden at oracle.com (Per Liden)
Date: Fri, 8 Nov 2019 23:45:41 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
References: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
Message-ID: <3d08b7f4-68a4-5e69-1e39-cf3725adc8da@oracle.com>

On 11/7/19 2:49 PM, Erik Österlund wrote:
> Hi,
> 
> We have noticed problems with the one single place where we let C2 touch 
> the graph while injecting our load barriers. Right before tagging a load 
> as needing a load barrier, it is GVN transformed. The problem with this 
> is that if we emit a strong field load, the GVN transformation can see 
> through that load, by following it through the store that set its value, 
> and find that the field value really came from the load of a weak 
> Reference.get intrinsic. In this scenario, the load we get out from the 
> GVN transformation needs weak barriers, yet we override it with strong 
> barriers, as that was the semantics of the access being parsed. Sigh. We 
> already have code that tries to determine if the load we got out from 
> the GVN transformation looks like a load that was created in the 
> BarrierSetC2 factory function, so one way of solving this is to refine 
> that logic that tries to determine if this was the load we created 
> before the transformation or not. But I felt like a better solution is 
> to finish constructing the access with all the intended properties 
> *before* transformation.
> 
> I massaged the code so that the GC barrier data of accesses with load 
> barriers gets passed in to the factory functions that create the access, 
> right before the transformation. This way, we construct the access with 
> the intended semantics where it is being created (parser or macro 
> expansion for field accesses in clone intrinsics). Then we do not have 
> to touch it after the GVN transformation.
> 
> It does seem like there could be similar problems from other GCs, but in 
> e.g. G1, the consequences are weird suboptimal code instead of anything 
> dangerous happening. For example, we can generate SATB buffering code 
> required by G1 Reference.get() intrinsics for strong accesses, due to 
> GVN handing out earlier accesses with different semantics. Perhaps that 
> should be looked into separately as well. But that investigation is 
> outside of the scope of this bug fix.
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/

Looks good!

As discussed off-line, even if it's not a problem for ZGC, we should fix 
so that we never call access.set_raw_access() *after* GVN 
transformation. But let's do that as a separate fix.

/Per

> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8233506
> 
> Thanks,
> /Erik
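The shape Erik describes can be written down in a few lines of Java. This is a hypothetical illustration, not the regression test from JDK-8233506: a value obtained from `Reference.get()` (which ZGC must treat with weak-reference barrier semantics) is stored into a field and read back with an ordinary strong field load. A GVN pass that sees through the store/load pair may hand back the `Reference.get()` load itself, which is why the barrier data has to be fixed when the access is constructed rather than overridden afterwards.

```java
import java.lang.ref.WeakReference;

// Hypothetical shape of the problematic pattern (not the JDK-8233506 test).
class WeakThenStrong {
    static final class Holder {
        Object f;
    }

    static Object roundTrip(WeakReference<Object> ref, Holder h) {
        h.f = ref.get(); // load through the Reference.get() intrinsic: weak barrier semantics
        return h.f;      // strong field load; GVN may fold it to the value stored above
    }
}
```

At the Java level the method simply returns the referent; the interesting part is only which barrier the compiled load ends up with.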

From john.r.rose at oracle.com  Fri Nov  8 23:00:52 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 8 Nov 2019 15:00:52 -0800
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74DO1nGPem+irirTwkzLoJzJ9+Z2VHfaihtaKJ=_1_o+rg@mail.gmail.com>
References: <CAEgw74CWKFCH9c744sJcAEf4_9-33vk9HBGosNH4QwgEH4nKYw@mail.gmail.com>
 <dbdbd1cd-c564-20de-a7ac-f31807b02cc5@oracle.com>
 <CH2PR11MB4280661F4597DEF39127D45FEF780@CH2PR11MB4280.namprd11.prod.outlook.com>
 <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>
 <CAEgw74DKy-2kQqd-PyK62DR5baWund-F60jM5DjpAoJtKUe_TA@mail.gmail.com>
 <A531AF2F-FF79-40F1-8463-309A97C85BB9@oracle.com>
 <CAEgw74DO1nGPem+irirTwkzLoJzJ9+Z2VHfaihtaKJ=_1_o+rg@mail.gmail.com>
Message-ID: <74901E7C-D45E-47FF-A61B-913F91CC7EEC@oracle.com>

On Nov 8, 2019, at 3:40 AM, B. Blaser <bsrbnd at gmail.com> wrote:
> 
> Yes, I'll also consider adding patterns for non-constant masks which
> might help in collections like BitSet or EnumSet (see [1] & [2]) by
> subsuming the shift operation provided that we do successful
> experiments.

Yes.  There's lots of code like that out there.  If you make a micro
of enum performance, please be aware of this missing bit:
  https://bugs.openjdk.java.net/browse/JDK-8161245

I added you as a watcher.  The problem is that, although an
enum *object* often constant folds, its *ordinal field* fails to constant
fold (last time I looked).  The above is a point fix, but we also need
a more comprehensive fix.

> The range check of LLVM classification switch is somewhat different in
> the sense that it uses BT (bit test) which unfortunately sets uncommon
> flags (CF) instead of regular AND/TEST flags (ZF,...) which might
> require some additional work to replace the existing operations...

That may be.  Flag handling is a tricky part of C2.  I think there are
pre-existing instructions that work with CF which can serve as examples.

FWIW I noticed this as a possibly relevant change to LLVM:
  https://reviews.llvm.org/D48606

> However, the current patch being almost ready to be pushed, I'll look
> at these questions in separate issues.

That makes perfect sense; don't delay what you have.  If you get stalled on
the follow-up work, please do post your learnings on JBS (like JDK-8214239).

> Thanks for these suggestions,
> Bernard

Thank you for taking this on!

-- John

> 
> [1] http://hg.openjdk.java.net/jdk/jdk/file/c709424ad48f/src/java.base/share/classes/java/util/BitSet.java#l452
> [2] http://hg.openjdk.java.net/jdk/jdk/file/c709424ad48f/src/java.base/share/classes/java/util/RegularEnumSet.java#l165
> 
> On Thu, 7 Nov 2019 at 22:59, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> Would you consider adding patterns for non-constant masks also?
>> It would be something like (And (LShift n) x), etc.
>> It could be in this set or in a follow-on.
>> Thanks (says John who always wants more).
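For reference, the non-constant-mask shapes under discussion look like this at the Java source level, in the style of `java.util.BitSet` and `java.util.RegularEnumSet`. The class and method names are made up, and whether C2 actually selects BT/BTS/BTR for them depends on the matcher rules being proposed; nothing in this sketch forces it. Java's long shifts use only the low six bits of the shift count, which lines up with how BT/BTS/BTR treat a register bit offset (modulo 64).

```java
// Source-level shapes of "non-constant mask" bit operations (hypothetical names).
class BitMaskShapes {
    // roughly (AndL x (LShiftL 1 n)) compared against 0: test bit n
    static boolean testBit(long word, int n) {
        return (word & (1L << n)) != 0;
    }

    // roughly (OrL x (LShiftL 1 n)): set bit n
    static long setBit(long word, int n) {
        return word | (1L << n);
    }

    // roughly (AndL x (XorL (LShiftL 1 n) -1)): clear bit n
    static long clearBit(long word, int n) {
        return word & ~(1L << n);
    }

    // EnumSet-style membership: the mask depends on the ordinal, which
    // (per JDK-8161245) may not constant-fold even when the enum object does.
    static <E extends Enum<E>> boolean contains(long elements, E e) {
        return (elements & (1L << e.ordinal())) != 0;
    }
}
```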


From john.r.rose at oracle.com  Sat Nov  9 01:24:10 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 8 Nov 2019 17:24:10 -0800
Subject: final field values should be trusted as constant (filed as
 JDK-8233873)
Message-ID: <B3AE0B03-0AD0-4C1B-A5A4-67CF789A5B9D@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8233873

# Problem

The JVM JITs routinely optimize references to final fields as constant
values, when a JIT can deduce a constant containing object.  This is a
fundamental capability for producing good code.

Currently, though, only a small number of "white listed" fields are
treated in this way, since vigorously optimizing _all_ final fields is
thought to have unknown risky consequences.  The white listing logic
is defined using the function `trust_final_non_static_fields` and
similar logic setting `
as part of changes like JDK-6912065 and JDK-8140483.

# Proposal

The JVM should support an option `FoldConstantFields` which
bypasses the above "white list" and uses a "black list" instead as
needed.  Initially this option should be turned off by default.
Turning it on should, initially, also turn on a new option
`VerifyConstantFields` which detects updates to final fields and
diagnoses them with some selectable mix of warnings or errors.

(See below for discussion of how updates to final fields can occur.
The short summary is "reflection, JNI, or Unsafe". Each of these
requires a different remediation.)

This feature will not solve the problem of full optimization of
constant fields all at once, but will set the stage for finding and
fixing problems caused by such optimizations.

The support for `FoldConstantFields` should include (either initially
or as follow-on work) the following functions:

 - Dependency recording in the JIT, whenever a final field value is
   used.  At first this should be recorded per field declaration, not
   per individual field instance, on the assumption that invalidation
   will be very rare.  This assumption may need to be revised.

 - Updates to final fields via reflection must be trapped and must
   trigger deoptimization of dependent JIT.

 - Updates to final fields via JNI must be trapped similarly.

 - Updates to final fields via other users of `Unsafe` must be trapped
   similarly.  This addresses uses of `Unsafe` _that the JDK knows
   about and controls_.

 - Encourage other users of `Unsafe` to perform similar notifications,
   and document how to do so.  Perhaps there are additional `Unsafe`
   API points to notify the JIT.

 - Placing the checking logic inside `Unsafe` is the wrong answer in
   most cases, since it would penalize well-behaved users of `Unsafe`.
   Perhaps a separate flag `VerifyUnsafeUpdates` would be applicable,
   for stress tests where performance can be sacrificed.

 - Define an API for use by privileged frameworks (including those in
   the JDK) for creating objects in a "larval" state, apart from
   normal constructor invocation.  (Possibly `Unsafe.allocateInstance`
   is such an API point; see also JNI AllocObject.)  These are
   released from the constraints on final field writing, including JIT
   invalidation.  If a JIT encounters an object in the larval state,
   the JIT will simply refrain from constant-folding its fields.

 - Define an API for promoting larval objects to a normal "adult"
   state, at which point the normal JIT optimizations would apply.  If
   this isn't done, performance will be lost only regarding the larval
   objects created by old frameworks, so perhaps this isn't needed.

 - It seems likely that the larval and adult states would need to be
   reflected in a bit pattern in the object header.  As an
   optimization, normally constructed objects would probably not need
   to have this state change in their header bits, unless perhaps they
   "escape" during their constructor call.

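The dependency-recording and larval-state bullets can be pictured with a small toy model. Everything below is hypothetical (class and method names, string keys standing for field declarations); it is not HotSpot code and not a proposed API, only a sketch of per-declaration invalidation combined with a refusal to fold fields of larval objects.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not HotSpot code):
//  - dependencies are recorded per final-field *declaration*, so a trapped
//    update to that field in any instance invalidates every dependent method;
//  - fields of "larval" objects are never folded.
class FinalFieldDependencies {
    // field declaration (e.g. "Point.x") -> compiled methods that folded it
    private final Map<String, Set<String>> dependents = new HashMap<>();
    private final Set<String> invalidated = new LinkedHashSet<>();

    /** Called when the JIT wants to fold a final field; returns false if it must not. */
    boolean tryFold(String fieldDecl, String compiledMethod, boolean holderIsLarval) {
        if (holderIsLarval) {
            return false; // larval object: refrain from constant-folding its fields
        }
        dependents.computeIfAbsent(fieldDecl, k -> new LinkedHashSet<>()).add(compiledMethod);
        return true;
    }

    /** Called from a trapped update (reflection, JNI, or a cooperating Unsafe user). */
    Set<String> onFinalFieldUpdate(String fieldDecl) {
        Set<String> victims = dependents.remove(fieldDecl);
        if (victims == null) {
            return Set.of();
        }
        invalidated.addAll(victims); // stand-in for deoptimizing the dependent code
        return victims;
    }

    boolean isInvalidated(String compiledMethod) {
        return invalidated.contains(compiledMethod);
    }
}
```

Keying on the declaration rather than on each object keeps the bookkeeping small, at the cost of over-invalidating when one instance is updated; that is the trade-off the first bullet accepts on the assumption that invalidation is rare.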
# Discussion

A final field can in some cases be assigned a new value.  If a JIT has
already observed the previous value of that final field, and
incorporated it into object code as a constant, then (after the
assignment of a new value to that field), the optimized object code
will execute wrongly.  We call such wrongly executing code "invalid",
and the JVM takes great care to avoid executing invalid code in
similar cases involving speculative optimizations, such as
devirtualized method calls or uncommon traps.

The basic reason for this is that the Java Memory Model requires that
all fields (including changed final fields) must be read accurately.
An accurate read yields a value that is appropriate to the current
thread, as defined by a web of "happens-before" relations.  (It is not
entirely wrong to think of these relations as a linear set, although
concurrency and races are also part of the JMM.)

But final fields _must_ be changed when an object is initialized, and
_may rarely_ change in other circumstances.  There are a number of
ways to change the current value of a final field:

0. In a constructor, a final field may be changed from its current
value (typically initial default value) to a new (possibly
non-default) value.  The JVM (per specification) allows this to occur
_multiple times_ although most sources of bytecode are thought to
avoid such behavior.

1. When a field is reflected, and `setAccessible(true)` is called, the
value may be set.  This "hook" is intended for use by deserializers
and other low-level facilities.  It is thought to be used as a
simulation of case #0 above, when an object's constructor cannot be
conveniently invoked. In a real sense, holding this option open for
serialization frameworks harms the optimization of the entire
ecosystem.

2. JNI functions such as SetBooleanField can be used to smash new
values into fields even if they are final.

3. Good old `Unsafe.setInt` can also be used to smash new values
into fields (or parts of fields or groups of fields) even if they are
final.
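Case #1 is easy to reproduce from plain Java. A minimal sketch, assuming an ordinary (non-record, non-hidden) class with a non-static final field; the names are made up. The field is assigned in the constructor rather than from a constant initializer, to keep javac from treating it as a compile-time constant. Newer JDK releases may warn about or restrict such writes.

```java
import java.lang.reflect.Field;

// Sketch of case #1: deep reflection overwriting a final instance field.
class FinalFieldMutation {
    static final class Box {
        final int value;

        Box(int value) {
            this.value = value; // assigned in the constructor (case #0)
        }
    }

    // Returns what an ordinary field read observes after the reflective write.
    static int overwrite(Box box, int newValue) {
        try {
            Field f = Box.class.getDeclaredField("value");
            f.setAccessible(true);   // required before writing a final field
            f.setInt(box, newValue); // allowed for non-static final fields of ordinary classes
            return box.value;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If a JIT had already folded `value` for that object into compiled code, that code would keep using the old value after this write, which is exactly the invalidation problem described above.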

Although a debugger can forcibly change the value of a field from
outside the JVM, via APIs in the `jdk.jdi` module, it appears to be
impossible to use those APIs to change final fields.

It is unknown what libraries or bytecode spinners "in the wild" are
using any of the four options above in ways that would invalidate
JIT-compiled code.  Setting the JITs free to optimize fully requires a
plan for mitigating the impact of final field changes both in known
code (in the JDK) and in unknown "wild" code.

## Side note on races

Although race conditions (on non-volatile fields) allow the JVM some
latitude to return "stale" values for field references, such latitude
would usually be quite narrow, since an execution of the invalid
optimized method is likely to occur downstream of the invalidating
field update (as determined by the happens-before relation of the
JMM).  The JMM itself would have to be updated to either relax
happens-before relations pertaining to final field updates, or else
allow special race conditions that allow the JIT to use stale values
of final fields (in effect, looking backward in time, past events
visible through the relevant happens-before events).  There are no
active proposals to update the JMM in this way, and it seems easier to
take the JMM as a given, or (at most) make very small changes to it to
further specialize the treatment of final fields.


From vladimir.x.ivanov at oracle.com  Sat Nov  9 12:38:10 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Sat, 9 Nov 2019 15:38:10 +0300
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
Message-ID: <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>


> http://cr.openjdk.java.net/~thartmann/8233656/webrev.00/

> I propose to simply check the control inputs for is_CFG().

Can you just check control for TOP instead?

Also, is it worth putting an assert to ensure the node is already on 
the worklist and will eventually be eliminated?

Best regards,
Vladimir Ivanov

From vladimir.x.ivanov at oracle.com  Sat Nov  9 12:40:52 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Sat, 9 Nov 2019 15:40:52 +0300
Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error:
 assert(p_f->Opcode() == Op_IfFalse) failed
In-Reply-To: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com>
References: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com>
Message-ID: <3d3aec9d-5cfb-4664-cd67-b2f8b84f319f@oracle.com>

> http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/
Looks good.

Best regards,
Vladimir Ivanov

From vladimir.x.ivanov at oracle.com  Sat Nov  9 12:47:25 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Sat, 9 Nov 2019 15:47:25 +0300
Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR
 compilation
In-Reply-To: <ac04b5f1-a652-e638-cd40-04fc01fca203@oracle.com>
References: <ac04b5f1-a652-e638-cd40-04fc01fca203@oracle.com>
Message-ID: <48dea421-af44-047a-05d8-26c2c37358b7@oracle.com>

Nice analysis, Christian!

> http://cr.openjdk.java.net/~chagedorn/8229694/webrev.00/

Looks good.

> We might want to file an RFE to investigate further why IGVN cannot 
> remove the first redundant stores to intArr[7], shortArr[10], and iFld, 
> respectively (even though it's quite useless to keep setting the same 
> values in a loop). This problem can also be observed if the loop only 
> contains the statement "iFld = intArr[j]". But I think even if those 
> redundant stores would have been optimized away we should have this fix 
> to handle the situation with only one pack and no memory operations 
> remaining.

Agree. And, please, file the RFE.

Best regards,
Vladimir Ivanov

From igor.ignatyev at oracle.com  Sat Nov  9 18:29:20 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Sat, 9 Nov 2019 10:29:20 -0800
Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize
 classloader and module info
In-Reply-To: <416DB2FA-EEC4-4AB4-8265-C166F68EBEB9@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
 <db7cbca8-9644-ae6d-a0a3-a808ecd13f92@oracle.com>
 <327E76EB-4257-449F-8648-0546203D8216@oracle.com>
 <5716B217-5CAE-4512-A063-CC416336B064@oracle.com>
 <416DB2FA-EEC4-4AB4-8265-C166F68EBEB9@oracle.com>
Message-ID: <E812FB8D-F013-4786-B55B-CFB2BB09E3E5@oracle.com>

Doug, Vladimir,

thanks for your review, pushed.

-- Igor



From dl at cs.oswego.edu  Sat Nov  9 19:21:00 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Sat, 9 Nov 2019 14:21:00 -0500
Subject: final field values should be trusted as constant (filed as
 JDK-8233873)
In-Reply-To: <B3AE0B03-0AD0-4C1B-A5A4-67CF789A5B9D@oracle.com>
References: <B3AE0B03-0AD0-4C1B-A5A4-67CF789A5B9D@oracle.com>
Message-ID: <69de5d88-2487-850b-5388-6363d88d2b5b@cs.oswego.edu>

On 11/8/19 8:24 PM, John Rose wrote:

> ## Side note on races
> 
> Although race conditions (on non-volatile fields) allow the JVM some
> latitude to return "stale" values for field references, such latitude
> would usually be quite narrow, since an execution of the invalid
> optimized method is likely to occur downstream of the invalidating
> field update (as determined by the happens-before relation of the
> JMM). 

Ever since the initial revisions of the JLS1 version, the intent of JMM specs
(including current) is to allow compilers to believe that the value they
see in initial reads of a final field is the only value they will ever
see. So no revision is necessary on these grounds (although one of these
days there will be one that accommodates VarHandle modes etc,
formalizing http://gee.cs.oswego.edu/dl/html/j9mm.html). Some of the
spec messiness exists just to explain why compilers are allowed not to
believe this as well, because of reflection etc.

In other words, don't let JMM concerns stop you from this worthwhile effort.

-Doug


From fujie at loongson.cn  Mon Nov 11 03:56:32 2019
From: fujie at loongson.cn (Jie Fu)
Date: Mon, 11 Nov 2019 11:56:32 +0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure
 we have a compiler
Message-ID: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>

Hi all,

May I get reviews for this small change?
This bug was found by running 
compiler/profiling/spectrapredefineclass/Launcher.java with -Xcomp.
The fix sets CompLevel_initial_compile to CompLevel_full_optimization if 
CompilationMode=high-only is specified.

JBS:    https://bugs.openjdk.java.net/browse/JDK-8233885
Webrev: http://cr.openjdk.java.net/~jiefu/8233885/webrev.00/

Testing:
  - make test TEST="test/hotspot/jtreg/compiler" CONF=fastdebug on 
Linux/x64

Thanks a lot.
Best regards,
Jie
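A simplified sketch of the reasoning behind the fix, assuming HotSpot's tiered numbering (0 = interpreter, 1 to 3 = C1 variants, 4 = C2, i.e. CompLevel_none through CompLevel_full_optimization): with -Xcomp, methods are presumably compiled at CompLevel_initial_compile, and in high-only mode only C2 exists, so an initial level that maps to C1 leaves no compiler, which matches the failing assert. The enum and method are hypothetical Java stand-ins for HotSpot C++ code, and CompilationMode values other than high-only are not modeled.

```java
// Hypothetical Java stand-in for HotSpot's C++ logic (not real HotSpot code).
class InitialCompileLevel {
    enum CompLevel {
        NONE(0), SIMPLE(1), LIMITED_PROFILE(2), FULL_PROFILE(3), FULL_OPTIMIZATION(4);

        final int value;

        CompLevel(int value) {
            this.value = value;
        }

        boolean isC1() { return value >= 1 && value <= 3; }
        boolean isC2() { return value == 4; }
    }

    /** Level for initial (e.g. -Xcomp) compiles; only "high-only" is special-cased here. */
    static CompLevel initialCompileLevel(String compilationMode) {
        if ("high-only".equals(compilationMode)) {
            return CompLevel.FULL_OPTIMIZATION; // only C2 is available, so start there
        }
        return CompLevel.FULL_PROFILE;          // assumed tiered default: profiled C1 code first
    }
}
```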


From fujie at loongson.cn  Mon Nov 11 05:43:58 2019
From: fujie at loongson.cn (Jie Fu)
Date: Mon, 11 Nov 2019 13:43:58 +0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed:
 Ensure we have a compiler
In-Reply-To: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
Message-ID: <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>

Sorry, I missed the test case in webrev.00
I just added it in webrev.01

Please review this version: 
http://cr.openjdk.java.net/~jiefu/8233885/webrev.01/

Thanks a lot.
Best regards,
Jie

On 2019/11/11 11:56 AM, Jie Fu wrote:
> Hi all,
>
> May I get reviews for this small change?
> This bug was found by running 
> compiler/profiling/spectrapredefineclass/Launcher.java with -Xcomp.
> The fix sets CompLevel_initial_compile to CompLevel_full_optimization 
> if CompilationMode=high-only is specified.
>
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233885
> Webrev: http://cr.openjdk.java.net/~jiefu/8233885/webrev.00/
>
> Testing:
>   - make test TEST="test/hotspot/jtreg/compiler" CONF=fastdebug on 
> Linux/x64
>
> Thanks a lot.
> Best regards,
> Jie
>


From tobias.hartmann at oracle.com  Mon Nov 11 05:57:33 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 06:57:33 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <a95b6f2f-4c88-f964-84f4-e7e8aad342df@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <a95b6f2f-4c88-f964-84f4-e7e8aad342df@oracle.com>
Message-ID: <574c76eb-3613-13d6-bb4d-288e5f509426@oracle.com>

Thanks Vladimir.

Best regards,
Tobias

On 08.11.19 17:50, Vladimir Kozlov wrote:
> Good.
> 
> thanks,
> Vladimir
> 
> On 11/8/19 1:46 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8233656
>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.00/
>>
>> During IGVN, we process a CastII node that carries a non-zero dependency from
>> GraphKit::cast_not_null [1]. ConstraintCastNode::dominating_cast then finds another CastII and
>> checks if it's dominating. We assert in PhaseGVN::is_dominator_helper because the other CastII has a
>> ProjNode as control input that has !is_CFG() because its input is TOP [2]. The input has been
>> replaced in the same round of IGVN and the projection is already on the IGVN worklist but hasn't
>> been processed yet (it will go away).
>>
>> I propose to simply check the control inputs for is_CFG().
>>
>> I can reproduce the issue with a complex Javafuzzer generated test (attached to the bug) but minimal
>> changes/simplifications to the test cause the issue to not reproduce anymore because it depends on
>> the order in which nodes are processed by IGVN. So I don't think it makes sense to include that
>> fragile test.
>>
>> This has been triggered by my fix for 8229496 [3] which added additional Cast nodes but I believe it
>> can also happen without these changes.
>>
>> Thanks,
>> Tobias
>>
>> [1] https://hg.openjdk.java.net/jdk/jdk/rev/86b95fc6ca32#l12.40
>> [2] https://hg.openjdk.java.net/jdk/jdk/file/47c20fc6a517/src/hotspot/share/opto/multnode.cpp#l83
>> [3] https://bugs.openjdk.java.net/browse/JDK-8229496
>>

From tobias.hartmann at oracle.com  Mon Nov 11 06:10:58 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 07:10:58 +0100
Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error:
 assert(p_f->Opcode() == Op_IfFalse) failed
In-Reply-To: <1de496aa-f57b-1ce1-5fa2-5d3488217887@oracle.com>
References: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com>
 <1de496aa-f57b-1ce1-5fa2-5d3488217887@oracle.com>
Message-ID: <29541caa-f0aa-d8f6-56ea-cb350944c2e5@oracle.com>

Thanks Vladimir. On my machine, it takes 30s with Graal as JIT vs. 10s with C2.

Best regards,
Tobias

On 08.11.19 18:00, Vladimir Kozlov wrote:
> Looks good. How much time does Graal take to run the test (you switched off TieredCompilation)?
> 
> Vladimir
> 
> On 11/8/19 4:52 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8233529
>> http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/
>>
>> We have two loops (see TestRemoveMainPostLoops.java): Loop A with an inner loop followed by Loop B.
>>
>> (1) OSR compilation is triggered in loop B.
>> (2) Pre-/main-/post loops are created for loop B.
>> (3) Main and post loops of B are found empty and are removed.
>> (4) Inner loop A is fully unrolled and removed.
>> (5) Only main and post loops are created for A (no pre loop -> "PeelMainPost") and main is unrolled.
>> (6) Pre loop of A is found empty, attempt to remove main and post loop then incorrectly selects main
>> loop from A.
>>
>> The loop layout looks like this:
>> ?? Loop: N0/N0? has_sfpt
>> ???? Loop: N383/N718? limit_check sfpts={ 160 }
>> ?????? Loop: N512/N517? counted [int,int),+1 (4 iters)? pre has_sfpt?? <- belongs to A
>> ?????? Loop: N760/N338? counted [1,100),+2 (102 iters)? main has_sfpt? <- belongs to B
>> ?????? Loop: N713/N716? counted [int,101),+1 (4 iters)? post has_sfpt? <- belongs to B
>>
>> Please note that the order of the two loops is not like in the Java code because it's an OSR
>> compilation that starts execution in the second loop.
>>
>> I've strengthened the asserts in locate_pre_from_main() and added a check for is_main_no_pre_loop()
>> in the caller.
>>
>> The code has been introduced by JDK-8085832 [1].
>>
>> Thanks,
>> Tobias
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8085832
>>
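For readers less familiar with the pre-/main-/post-loop terminology above, here is a hedged, source-level C++ sketch of the shape a counted loop is split into. It is not C2 code and models neither OSR nor loop removal; the unroll factor of 2, the "even index" condition standing in for the pre loop's alignment goal, and the name `sum_split` are all illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical source-level sketch (not C2 code): sum a[init, limit)
// using the pre/main/post split that loop optimizations produce.
long sum_split(const std::vector<int>& a, std::size_t init, std::size_t limit) {
  long sum = 0;
  std::size_t i = init;
  // Pre loop: run single iterations until i is even. The "even index"
  // condition stands in for whatever alignment the main loop wants.
  for (; i < limit && (i % 2) != 0; ++i) {
    sum += a[i];
  }
  // Main loop: unrolled by 2, runs only while two iterations remain.
  for (; i + 1 < limit; i += 2) {
    sum += a[i];
    sum += a[i + 1];
  }
  // Post loop: picks up the leftover iteration, if any.
  for (; i < limit; ++i) {
    sum += a[i];
  }
  return sum;
}
```

In the main-without-pre case mentioned in step (5), only the second and third loops would exist.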

From tobias.hartmann at oracle.com  Mon Nov 11 06:13:30 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 07:13:30 +0100
Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error:
 assert(p_f->Opcode() == Op_IfFalse) failed
In-Reply-To: <3d3aec9d-5cfb-4664-cd67-b2f8b84f319f@oracle.com>
References: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com>
 <3d3aec9d-5cfb-4664-cd67-b2f8b84f319f@oracle.com>
Message-ID: <7d4ab00c-e3c6-044d-45a0-c1de60dddcd0@oracle.com>

Thanks Vladimir.

Best regards,
Tobias

On 09.11.19 13:40, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/
> Looks good.
> 
> Best regards,
> Vladimir Ivanov

From tobias.hartmann at oracle.com  Mon Nov 11 07:07:23 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 08:07:23 +0100
Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR
 compilation
In-Reply-To: <48dea421-af44-047a-05d8-26c2c37358b7@oracle.com>
References: <ac04b5f1-a652-e638-cd40-04fc01fca203@oracle.com>
 <48dea421-af44-047a-05d8-26c2c37358b7@oracle.com>
Message-ID: <bdec3244-b4bb-99e1-8fec-7e0c5f2aa299@oracle.com>


On 09.11.19 13:47, Vladimir Ivanov wrote:
> Nice analysis, Christian!

I agree, very nice.

Looks good to me too. I'll sponsor.

Best regards,
Tobias

From tobias.hartmann at oracle.com  Mon Nov 11 07:12:24 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 08:12:24 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <1e3fd82d-2235-e5c6-3e41-4d5b5242ced4@oracle.com>
References: <db3d2acd-3f06-5660-a9d6-7b04dda99204@oracle.com>
 <575000ef-fc06-c33e-07e3-d8046db1ea43@oracle.com>
 <1e3fd82d-2235-e5c6-3e41-4d5b5242ced4@oracle.com>
Message-ID: <17f9f020-f064-6d00-2f11-c71b35832611@oracle.com>

Hi Erik,

On 08.11.19 16:15, erik.osterlund at oracle.com wrote:
> On 11/8/19 9:38 AM, Tobias Hartmann wrote:
>> Do we still need that logic with your change?
> 
> Nope! :) Because the access nodes have their barrier data populated before transformation,
> describing the semantics of the produced access, we don't care what GVN gives us after
> transformation. Whatever it gives us has the correct semantics for the corresponding access that
> produced it.

Can we remove that code then? I mean the one you referred to with "We already have code that tries
to determine if the load we got out from the GVN transformation looks like a load that was created
in the BarrierSetC2 factory function".

> Sure, will file another RFE for that.
> Regarding the assert, it's not obvious what it would look like, since the assert in the
> transformation code has to know exactly what nodes are expected to have GC data. For example, a
> LoadP might be used to read any pointer, including but not limited to oops. And some oops don't need
> barriers like the threadOop due to being processed in safepoints. So given a LoadP node for example,
> I don't know if we can determine whether it should or should not have GC data.

Right, that doesn't seem feasible.

Ship it!

Best regards,
Tobias

From tobias.hartmann at oracle.com  Mon Nov 11 07:44:04 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 08:44:04 +0100
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed:
 Ensure we have a compiler
In-Reply-To: <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
 <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
Message-ID: <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>

Hi Jie,

what about the high-only-quick-internal mode?

While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
Maybe you can fix that as well with your patch.

Thanks,
Tobias

On 11.11.19 06:43, Jie Fu wrote:
> Sorry, I missed the test case in webrev.00
> I just added it in webrev.01
> 
> Please review this version: http://cr.openjdk.java.net/~jiefu/8233885/webrev.01/
> 
> Thanks a lot.
> Best regards,
> Jie
> 
> On 2019/11/11 11:56 AM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for this small change?
>> This bug was found by running compiler/profiling/spectrapredefineclass/Launcher.java with -Xcomp.
>> The fix sets CompLevel_initial_compile to CompLevel_full_optimization if CompilationMode=high-only
>> is specified.
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233885
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233885/webrev.00/
>>
>> Testing:
>>   - make test TEST="test/hotspot/jtreg/compiler" CONF=fastdebug on Linux/x64
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
> 

From erik.osterlund at oracle.com  Mon Nov 11 07:53:26 2019
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 11 Nov 2019 08:53:26 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <3d08b7f4-68a4-5e69-1e39-cf3725adc8da@oracle.com>
References: <3d08b7f4-68a4-5e69-1e39-cf3725adc8da@oracle.com>
Message-ID: <66003FC0-87B2-432F-8151-87952B3115E2@oracle.com>

Hi Per,

Thanks for the review! Will file a cleanup RFE.

Thanks,
/Erik

> On 8 Nov 2019, at 23:45, Per Liden <per.liden at oracle.com> wrote:
> 
> On 11/7/19 2:49 PM, Erik Österlund wrote:
>> Hi,
>> We have noticed problems with the one single place where we let C2 touch the graph while injecting our load barriers. Right before tagging a load as needing a load barrier, it is GVN transformed. The problem with this is that if we emit a strong field load, the GVN transformation can see through that load, by following it through the store that set its value, and find that the field value really came from the load of a weak Reference.get intrinsic. In this scenario, the load we get out from the GVN transformation needs weak barriers, yet we override it with strong barriers, as that was the semantics of the access being parsed. Sigh. We already have code that tries to determine if the load we got out from the GVN transformation looks like a load that was created in the BarrierSetC2 factory function, so one way of solving this is to refine that logic that tries to determine if this was the load we created before the transformation or not. But I felt like a better solution is to finish constructing the access with all the intended properties *before* transformation.
>> I massaged the code so that the GC barrier data of accesses with load barriers gets passed in to the factory functions that create the access, right before the transformation. This way, we construct the access with the intended semantics where it is being created (parser or macro expansion for field accesses in clone intrinsics). Then we do not have to touch it after the GVN transformation.
>> It does seem like there could be similar problems from other GCs, but in e.g. G1, the consequences are weird suboptimal code instead of anything dangerous happening. For example, we can generate SATB buffering code required by G1 Reference.get() intrinsics for strong accesses, due to GVN handing out earlier accesses with different semantics. Perhaps that should be looked into separately as well. But that investigation is outside of the scope of this bug fix.
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/
> 
> Looks good!
> 
> As discussed off-line, even if it's not a problem for ZGC, we should fix so that we never call access.set_raw_access() *after* GVN transformation. But let's do that as a separate fix.
> 
> /Per
> 
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8233506
>> Thanks,
>> /Erik
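The ordering problem described above can be shown with a toy model of a value-numbering step that may hand back an earlier, equivalent node. The C++ below is a hypothetical sketch: `ToyLoad`, `ToyGVN`, the `Barrier` enum and both helper functions are invented for illustration, and matching loads by address alone is a deliberate simplification of GVN seeing through a store to an earlier load. It contrasts tagging the access after transformation with attaching the barrier data before it.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical barrier kinds standing in for per-access GC barrier data.
enum class Barrier { None, Strong, Weak };

// Toy load node: an address input plus the barrier data of its access.
struct ToyLoad {
  int address;
  Barrier barrier;
};

// Toy value numbering: if an equivalent load (same address) already exists,
// return it instead of the new node, like GVN seeing through to an earlier load.
struct ToyGVN {
  std::vector<std::unique_ptr<ToyLoad>> nodes;
  ToyLoad* transform(std::unique_ptr<ToyLoad> n) {
    for (auto& existing : nodes) {
      if (existing->address == n->address) return existing.get();
    }
    nodes.push_back(std::move(n));
    return nodes.back().get();
  }
};

// Old ordering (sketch): tag after GVN. If GVN returned an earlier node,
// this overwrites that node's barrier data.
ToyLoad* load_tag_after(ToyGVN& gvn, int addr, Barrier b) {
  ToyLoad* n = gvn.transform(std::make_unique<ToyLoad>(ToyLoad{addr, Barrier::None}));
  n->barrier = b;
  return n;
}

// New ordering (sketch): barrier data is part of the node before GVN,
// so whatever node comes back already carries its own semantics.
ToyLoad* load_tag_before(ToyGVN& gvn, int addr, Barrier b) {
  return gvn.transform(std::make_unique<ToyLoad>(ToyLoad{addr, b}));
}
```

With the first helper, the strong tag lands on whatever node GVN returned, which may be the earlier weak load; with the second, the returned node keeps the barrier data it was created with.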


From david.holmes at oracle.com  Mon Nov 11 07:56:21 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 11 Nov 2019 17:56:21 +1000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
 <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
Message-ID: <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>

Hi Goetz,

Please note I only looked at the test initially and have not reviewed 
this overall fix as I don't know the PPC code.

The updated test seems fine.

Thanks,
David

On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> Hi,
> 
> I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> which makes one of the fixes unnecessary.
> Also, I had to fix the argument of verify_oop_helper
> from oop to oopDesc* for the fastdebug build.
> 
> New webrev:
> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
> 
> Best regards,
>    Goetz.
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Freitag, 18. Oktober 2019 01:38
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
>> compiler-dev at openjdk.java.net>
>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
>>
>> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
>>> Hi David,
>>>
>>> you are right, thanks for pointing me to that!
>>> Doing one test for vm.bits=64 and one for 32 should fix it:
>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
>>
>> s/01/02/ :)
>>
>> For the 32-bit case you can delete the line:
>>
>>      * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
>>
>> For the 64-bit case you can delete the "sparc" check from the same line.
>>
>> Thanks,
>> David
>>
>>>
>>> Best regards,
>>>     Goetz.
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Donnerstag, 17. Oktober 2019 13:18
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
>>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
>>>> compiler-dev at openjdk.java.net>
>>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
>>>>
>>>> Hi Goetz,
>>>>
>>>> UseCompressedOops is a 64-bit flag only so your change will break the
>>>> test on 32-bit systems.
>>>>
>>>> David
>>>>
>>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> 8231058 introduced a test that enables +VerifyOops.
>>>>> This fails on ppc, because it had not been used in a very
>>>>> long time.
>>>>>
>>>>> The crash is caused by passing compressed oops from
>>>>> LIR_Assembler::store() to the checker routine.
>>>>> I fix this by implementing a checker routine verify_coop
>>>>> that first decompresses the coop.  This makes the new
>>>>> test pass.
>>>>>
>>>>> Further testing showed that the additional checker
>>>>> code makes the patching stubs overflow. These
>>>>> cannot be increased in size to fit the code, so I
>>>>> disable generating verify_oop code in LIR_Assembler::load(),
>>>>> which fixes the issue.
>>>>>
>>>>> Further, I extended the message printed when verification
>>>>> of an oop failed. First, I print the location in the source
>>>>> code where the checker code was generated. Second,
>>>>> I print the faulty oop.
>>>>>
>>>>> I also improved the message printed when patching stubs
>>>>> overflow.
>>>>>
>>>>> Finally, I improve the test to run with and without compressed
>>>>> Oops.
>>>>>
>>>>> Please review:
>>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
>>>>>
>>>>> @runtime as I modify the test introduced there
>>>>> @compiler as the error is in C1.
>>>>>
>>>>> Best regards,
>>>>>      Goetz.
>>>>>
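The core of the fix, decompressing a narrow oop before handing it to the ordinary oop checker, can be illustrated with a small, self-contained C++ sketch. It is not the PPC64 code from the webrev: the base-plus-shift encoding parameters, the alignment-only "verification", and all names (`NarrowOopEncoding`, `decode`, `verify_full_oop`, `verify_coop`) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding parameters; in a real VM they depend on where the
// heap is placed and on the object alignment.
struct NarrowOopEncoding {
  std::uintptr_t base;  // heap base (0 in zero-based mode)
  unsigned shift;       // log2 of object alignment, e.g. 3 for 8 bytes
};

// Decode a 32-bit narrow oop into a full-width address.
std::uintptr_t decode(std::uint32_t narrow, const NarrowOopEncoding& enc) {
  if (narrow == 0) return 0;  // null must stay null, not become the base
  return enc.base + (static_cast<std::uintptr_t>(narrow) << enc.shift);
}

// Stand-in for the full-width oop check: null is fine, otherwise the
// address must be aligned to 2^shift bytes.
bool verify_full_oop(std::uintptr_t p, unsigned shift) {
  return p == 0 || (p & ((std::uintptr_t(1) << shift) - 1)) == 0;
}

// Sketch of a compressed-oop check: decode first, then reuse the full check.
bool verify_coop(std::uint32_t narrow, const NarrowOopEncoding& enc) {
  return verify_full_oop(decode(narrow, enc), enc.shift);
}
```

With base 0x10000000 and shift 3, the raw narrow value 0x1001 looks misaligned when checked directly as an address, while its decoded form 0x10008008 passes.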

From erik.osterlund at oracle.com  Mon Nov 11 07:58:16 2019
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 11 Nov 2019 08:58:16 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <17f9f020-f064-6d00-2f11-c71b35832611@oracle.com>
References: <17f9f020-f064-6d00-2f11-c71b35832611@oracle.com>
Message-ID: <75C2E95C-1D75-4AD6-8056-28323ED85507@oracle.com>

Hi Tobias,

> On 11 Nov 2019, at 08:12, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Erik,
> 
>> On 08.11.19 16:15, erik.osterlund at oracle.com wrote:
>>> On 11/8/19 9:38 AM, Tobias Hartmann wrote:
>>> Do we still need that logic with your change?
>> 
>> Nope! :) Because the access nodes have their barrier data populated before transformation,
>> describing the semantics of the produced access, we don't care what GVN gives us after
>> transformation. Whatever it gives us has the correct semantics for the corresponding access that
>> produced it.
> 
> Can we remove that code then? I mean the one you referred to with "We already have code that tries
> to determine if the load we got out from the GVN transformation looks like a load that was created
> in the BarrierSetC2 factory function".

Ahh. To be clear: I meant the code in the backend that first calls e.g. BarrierSetC2::load_at_resolved and then checks in ZBarrierSetC2 whether the resulting raw_access() is really a load before sprinkling barriers. This change deletes all that.

>> Sure, will file another RFE for that.
>> Regarding the assert, it's not obvious what it would look like, since the assert in the
>> transformation code has to know exactly what nodes are expected to have GC data. For example, a
>> LoadP might be used to read any pointer, including but not limited to oops. And some oops don't need
>> barriers like the threadOop due to being processed in safepoints. So given a LoadP node for example,
>> I don't know if we can determine whether it should or should not have GC data.
> 
> Right, that doesn't seem feasible.
> 
> Ship it!

Thanks Tobias!

/Erik

> Best regards,
> Tobias


From tobias.hartmann at oracle.com  Mon Nov 11 08:01:18 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 09:01:18 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted
 to a load for strong refs
In-Reply-To: <75C2E95C-1D75-4AD6-8056-28323ED85507@oracle.com>
References: <17f9f020-f064-6d00-2f11-c71b35832611@oracle.com>
 <75C2E95C-1D75-4AD6-8056-28323ED85507@oracle.com>
Message-ID: <e68d3673-0ab0-1618-a925-65f31f637f3b@oracle.com>


On 11.11.19 08:58, Erik Österlund wrote:
> Ahh. To be clear: I meant the code in the backend that first calls e.g. BarrierSetC2::load_at_resolved and then checks in ZBarrierSetC2 whether the resulting raw_access() is really a load before sprinkling barriers. This change deletes all that.

Okay, got it. I thought there was more.

Thanks,
Tobias

From tobias.hartmann at oracle.com  Mon Nov 11 08:03:35 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 09:03:35 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
Message-ID: <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>

Hi Vladimir,

thanks for the review.

On 09.11.19 13:38, Vladimir Ivanov wrote:
> Can you just check control for TOP instead?

In the failing case, the control input is not TOP but it's a ProjNode with a TOP input. We hit the
assert because the is_CFG() method returns false for these:
https://hg.openjdk.java.net/jdk/jdk/file/47c20fc6a517/src/hotspot/share/opto/multnode.cpp#l83

Or do you mean checking ctl->in(0) for TOP?

> Also, is it worth putting an assert to ensure the node is already on worklist and will be eventually
> eliminated?

I don't think it's worth it. Since 8040213 [1] we have code that ensures that all modified nodes are
added to the worklist (see Compile::record_modified_node()). I've verified that the ProjNode with
TOP input is covered by that.

Best regards,
Tobias

[1] https://bugs.openjdk.java.net/browse/JDK-8040213

From christian.hagedorn at oracle.com  Mon Nov 11 09:17:36 2019
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Mon, 11 Nov 2019 10:17:36 +0100
Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR
 compilation
In-Reply-To: <bdec3244-b4bb-99e1-8fec-7e0c5f2aa299@oracle.com>
References: <ac04b5f1-a652-e638-cd40-04fc01fca203@oracle.com>
 <48dea421-af44-047a-05d8-26c2c37358b7@oracle.com>
 <bdec3244-b4bb-99e1-8fec-7e0c5f2aa299@oracle.com>
Message-ID: <931f7ed6-5f1b-9669-a5fc-c0902dcd9b08@oracle.com>

Thank you very much Vladimir K., Vladimir I. and Tobias! I filed an RFE 
[1] and assigned it to me to further investigate.

Best regards,
Christian


[1] https://bugs.openjdk.java.net/browse/JDK-8233895

On 11.11.19 08:07, Tobias Hartmann wrote:
> 
> On 09.11.19 13:47, Vladimir Ivanov wrote:
>> Nice analysis, Christian!
> 
> I agree, very nice.
> 
> Looks good to me too. I'll sponsor.
> 
> Best regards,
> Tobias
> 

From martin.doerr at sap.com  Mon Nov 11 10:39:31 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 11 Nov 2019 10:39:31 +0000
Subject: RFR(S): Test crashed with assert(phi->operand_count() != 1 ||
 phi->subst() != phi) failed: missed trivial simplification
Message-ID: <VI1PR0201MB24793BAE30999C6DEA426E779A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi,

some C1 assertions currently don't deal correctly with illegal phi functions (phi functions with an illegal type due to a type conflict).
Since JDK-8214352 we bail out in fewer situations, so C1 needs to deal with a few more illegal phi cases.

The assertion (see headline) in PhiSimplifier doesn't support illegal phi, but the simplify call before it does (by just skipping).

Another assertion which needs to skip illegal phi is in c1_Optimizer where we merge a block with its unique successor.
If a local value of the succeeding block is coming from an illegal phi function, we drop it and take the value from the first block.
This is correct, only the assertion doesn't expect that at the moment.

So I'd like to fix these 2 assertions I found:
http://cr.openjdk.java.net/~mdoerr/8233820_C1_illegal_phi/webrev.00/

Best regards,
Martin
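To illustrate the relaxed assertion, here is a hedged toy model in C++ of phi simplification that skips illegal phis, together with a post-condition that tolerates them. `ToyPhi`, `simplify` and `trivial_phi_handled` are hypothetical and far simpler than C1's PhiSimplifier; only the condition `operand_count() != 1 || subst() != phi` is taken from the assertion in the subject line.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy phi: its operands, an "illegal" flag (type conflict),
// and the value it was replaced with (itself if not simplified).
struct ToyPhi {
  std::vector<ToyPhi*> operands;
  bool illegal = false;
  ToyPhi* subst = this;
};

// Replace a phi by its single distinct input, ignoring self-references.
// Illegal phis are skipped, as the existing simplify call already does.
void simplify(ToyPhi* phi) {
  if (phi->illegal) return;
  ToyPhi* unique = nullptr;
  for (ToyPhi* op : phi->operands) {
    if (op == phi || op == unique) continue;
    if (unique != nullptr) return;  // two distinct inputs: keep the phi
    unique = op;
  }
  if (unique != nullptr) phi->subst = unique;
}

// Post-condition in the spirit of the relaxed assert: a single-operand phi
// must have been simplified away, unless it is illegal.
bool trivial_phi_handled(const ToyPhi* phi) {
  return phi->illegal || phi->operands.size() != 1 || phi->subst != phi;
}
```

Without the `phi->illegal` term, the post-condition would fail for an illegal single-operand phi even though skipping it was intended.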


From doug.simon at oracle.com  Mon Nov 11 10:54:08 2019
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 11 Nov 2019 11:54:08 +0100
Subject: RFR(S) : 8233900: [JVMCI] improve help text for EnableJVMCIProduct
 option
In-Reply-To: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
Message-ID: <A64BA248-9194-4EE3-835D-F53B962AB9EE@oracle.com>

Hi,

Please review this change to improve the help text for EnableJVMCIProduct and related options.

https://dougxc.github.io/webrevs/8233900
https://bugs.openjdk.java.net/browse/JDK-8233900

-Doug

From vladimir.x.ivanov at oracle.com  Mon Nov 11 12:06:45 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 11 Nov 2019 15:06:45 +0300
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
Message-ID: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>

Hi Tobias,

> Or do you mean checking ctl->in(0) for TOP?

Yes, I'm curious whether it's enough to check that in(0) is NULL, TOP, 
or Proj(_, TOP). Are there any other important cases left?

My concern is that other usages of PhaseGVN::is_dominator() may be 
affected the same way as well.

It looks like PhaseGVN::is_dominator_helper() would benefit from 
additional checks:

bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
   if (d->is_top() || n->is_top()) {
     return false;
   }
   assert(d->is_CFG() && n->is_CFG(), "must have CFG nodes");
   ...


As an example, InitializeNode::detect_init_independence() does some 
control normalization first:

bool InitializeNode::detect_init_independence(Node* value, PhaseGVN* phase) {
...
     if (n->is_Proj())   n = n->in(0);
...
     if (n->is_CFG() && phase->is_dominator(n, allocation())) {
       continue;
     }

     Node* ctl = n->in(0);
     if (ctl != NULL && !ctl->is_top()) {
       if (ctl->is_Proj())  ctl = ctl->in(0);
       if (ctl == this)  return false;
...
       if (!MemNode::all_controls_dominate(n, this))
...
bool MemNode::all_controls_dominate(Node* dom, Node* sub) {
   if (dom == NULL || dom->is_top() || sub == NULL || sub->is_top())
     return false; // Conservative answer for dead code
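The suggestion above amounts to treating dead control (NULL, TOP, or a projection of TOP) as a conservative "no" before reaching the CFG assert. A hedged toy model in C++ follows; `ToyNode`, `is_dead_control` and `is_dominator_sketch` are hypothetical, the `is_cfg()` rule only mirrors the behaviour described earlier in this thread for a ProjNode with a TOP input, and the dominator walk itself is reduced to the trivial case d == n.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR node with just enough structure for the check.
struct ToyNode {
  enum Kind { Top, Region, Proj, Other } kind;
  std::vector<ToyNode*> in;  // in[0] is the control input, may be nullptr
  bool is_top() const { return kind == Top; }
  bool is_proj() const { return kind == Proj; }
  // A projection whose input is TOP (or missing) is not treated as CFG.
  bool is_cfg() const {
    if (kind == Region) return true;
    if (kind == Proj) return !in.empty() && in[0] != nullptr && !in[0]->is_top();
    return false;
  }
};

// Dead control: nullptr, TOP, or Proj(TOP).
bool is_dead_control(const ToyNode* ctl) {
  if (ctl == nullptr || ctl->is_top()) return true;
  if (ctl->is_proj()) {
    const ToyNode* src = ctl->in.empty() ? nullptr : ctl->in[0];
    return src == nullptr || src->is_top();
  }
  return false;
}

// Conservative dominance query: answer "no" for dead control instead of
// asserting. Only the trivial case (a node dominates itself) is modelled.
bool is_dominator_sketch(const ToyNode* d, const ToyNode* n) {
  if (is_dead_control(d) || is_dead_control(n)) return false;
  assert(d->is_cfg() && n->is_cfg() && "must have CFG nodes");
  return d == n;
}
```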

>> Also, is it worth putting an assert to ensure the node is already on worklist and will be eventually
>> eliminated?
> 
> I don't think it's worth it. Since 8040213 [1] we have code that ensures that all modified nodes are
> added to the worklist (see Compile::record_modified_node()). I've verified that the ProjNode with
> TOP input is covered by that.

Got it. Sounds good.

Best regards,
Vladimir Ivanov

From fujie at loongson.cn  Mon Nov 11 13:28:13 2019
From: fujie at loongson.cn (Jie Fu)
Date: Mon, 11 Nov 2019 21:28:13 +0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed:
 Ensure we have a compiler
In-Reply-To: <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
 <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
 <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
Message-ID: <df2883da-5b3e-69af-8e1e-a87a82e1e55a@loongson.cn>

Hi Tobias,

Thank you for your review and valuable comments.
Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/

Thanks a lot.
Best regards,
Jie

On 2019/11/11 3:44 PM, Tobias Hartmann wrote:
> Hi Jie,
>
> what about the high-only-quick-internal mode?
It's really a nice catch.
Fixed. Thanks.


> While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
> Maybe you can fix that as well with your patch.
Done.
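The shape of the fix discussed in this thread can be sketched as a mode-dependent choice of the initial compilation level. This is a hedged C++ illustration, not the webrev: the `CompilationMode` enum and `initial_compile_level` function are hypothetical, the numeric levels follow HotSpot's usual tier numbering, the tiered default is an assumption, and it assumes that in both high-only modes no C1 compiler handles ordinary methods, so requesting a C1 level would find no compiler.

```cpp
#include <cassert>

// Tier numbering as commonly used in HotSpot (0 = interpreter, 4 = C2).
enum CompLevel {
  CompLevel_none = 0,
  CompLevel_simple = 1,
  CompLevel_limited_profile = 2,
  CompLevel_full_profile = 3,
  CompLevel_full_optimization = 4
};

// Hypothetical model of the CompilationMode values named in this thread.
enum class CompilationMode { Default, HighOnly, HighOnlyQuickInternal };

// Hypothetical helper: choose the level for a method's first compilation.
CompLevel initial_compile_level(CompilationMode mode) {
  if (mode == CompilationMode::HighOnly ||
      mode == CompilationMode::HighOnlyQuickInternal) {
    // Only C2 compiles ordinary methods here, so start at the C2 level.
    return CompLevel_full_optimization;
  }
  // Assumed tiered default: start with a fully profiled C1 compilation.
  return CompLevel_full_profile;
}
```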


From martin.doerr at sap.com  Mon Nov 11 15:06:54 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 11 Nov 2019 15:06:54 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
 <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
Message-ID: <VI1PR0201MB247982FA923CE145853156219A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Götz,

the PPC64 code looks good, too.
Thanks for fixing and improving it.

Best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of David Holmes
> Sent: Montag, 11. November 2019 08:56
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> compiler-dev at openjdk.java.net>
> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> 
> Hi Goetz,
> 
> Please note I only looked at the test initially and have not reviewed
> this overall fix as I don't know the PPC code.
> 
> The updated test seems fine.
> 
> Thanks,
> David
> 
> On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> > Hi,
> >
> > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> > which makes one of the fixes unnecessary.
> > Also, I had to fix the argument of verify_oop_helper
> > from oop to oopDesc* for the fastdebug build.
> >
> > New webrev:
> > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
> >
> > Best regards,
> >    Goetz.
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Freitag, 18. Oktober 2019 01:38
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> <hotspot-
> >> compiler-dev at openjdk.java.net>
> >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>
> >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
> >>> Hi David,
> >>>
> >>> you are right, thanks for pointing me to that!
> >>> Doing one test for vm.bits=64 and one for 32 should fix it:
> >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>
> >> s/01/02/ :)
> >>
> >> For the 32-bit case you can delete the line:
> >>
> >>      * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
> >>
> >> For the 64-bit case you can delete the "sparc" check from the same line.
> >>
> >> Thanks,
> >> David
> >>
> >>>
> >>> Best regards,
> >>>     Goetz.
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Donnerstag, 17. Oktober 2019 13:18
> >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> runtime-
> >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> <hotspot-
> >>>> compiler-dev at openjdk.java.net>
> >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>>>
> >>>> Hi Goetz,
> >>>>
> >>>> UseCompressedOops is a 64-bit flag only so your change will break the
> >>>> test on 32-bit systems.
> >>>>
> >>>> David
> >>>>
> >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
> >>>>> Hi,
> >>>>>
> >>>>> 8231058 introduced a test that enables +VerifyOops.
> >>>>> This fails on ppc, because it had not been used in a very
> >>>>> long time.
> >>>>>
> >>>>> The crash is caused by passing compressed oops from
> >>>>> LIR_Assembler::store() to the checker routine.
> >>>>> I fix this by implementing a checker routine verify_coop
> >>>>> that first decompresses the coop.  This makes the new
> >>>>> test pass.
> >>>>>
> >>>>> Further testing showed that the additional checker
> >>>>> code makes the patching stubs overflow. These
> >>>>> cannot be increased in size to fit the code, so I
> >>>>> disable generating verify_oop code in LIR_Assembler::load(),
> >>>>> which fixes the issue.
> >>>>>
> >>>>> Further, I extended the message printed when verification
> >>>>> of an oop failed. First, I print the location in the source
> >>>>> code where the checker code was generated. Second,
> >>>>> I print the faulty oop.
> >>>>>
> >>>>> I also improved the message printed when patching stubs
> >>>>> overflow.
> >>>>>
> >>>>> Finally, I improve the test to run with and without compressed
> >>>>> Oops.
> >>>>>
> >>>>> Please review:
> >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>>>>
> >>>>> @runtime as I modify the test introduced there
> >>>>> @compiler as the error is in C1.
> >>>>>
> >>>>> Best regards,
> >>>>>      Goetz.
> >>>>>

From goetz.lindenmaier at sap.com  Mon Nov 11 15:17:23 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 11 Nov 2019 15:17:23 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
 <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
Message-ID: <AM6PR02MB534728BE141A46D33E45A298EC740@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi David,

thanks for looking again. 
Martin checked the PPC code.

Best regards,
  Goetz.

> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Montag, 11. November 2019 08:56
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> compiler-dev at openjdk.java.net>
> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> 
> Hi Goetz,
> 
> Please note I only looked at the test initially and have not reviewed
> this overall fix as I don't know the PPC code.
> 
> The updated test seems fine.
> 
> Thanks,
> David
> 
> On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> > Hi,
> >
> > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> > which makes one of the fixes unnecessary.
> > Also, I had to fix the argument of verify_oop_helper
> > from oop to oopDesc* for the fastdebug build.
> >
> > New webrev:
> > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
> >
> > Best regards,
> >    Goetz.
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Freitag, 18. Oktober 2019 01:38
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> >> compiler-dev at openjdk.java.net>
> >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>
> >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
> >>> Hi David,
> >>>
> >>> you are right, thanks for pointing me to that!
> >>> Doing one test for vm.bits=64 and one for 32 should fix it:
> >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>
> >> s/01/02/ :)
> >>
> >> For the 32-bit case you can delete the line:
> >>
> >>      * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
> >>
> >> For the 64-bit case you can delete the "sparc" check from the same line.
> >>
> >> Thanks,
> >> David
> >>
> >>>
> >>> Best regards,
> >>>     Goetz.
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Donnerstag, 17. Oktober 2019 13:18
> >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> <hotspot-
> >>>> compiler-dev at openjdk.java.net>
> >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>>>
> >>>> Hi Goetz,
> >>>>
> >>>> UseCompressedOops is a 64-bit flag only so your change will break the
> >>>> test on 32-bit systems.
> >>>>
> >>>> David
> >>>>
> >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
> >>>>> Hi,
> >>>>>
> >>>>> 8231058 introduced a test that enables +VerifyOops.
> >>>>> This fails on ppc, because it had not been used in a very
> >>>>> long time.
> >>>>>
> >>>>> The crash is caused by passing compressed oops from
> >>>>> LIR_Assembler::store() to the checker routine.
> >>>>> I fix this by implementing a checker routine verify_coop
> >>>>> that first decompresses the coop.  This makes the new
> >>>>> test pass.
> >>>>>
> >>>>> Further testing showed that the additional checker
> >>>>> code makes the patching stubs overflow. These
> >>>>> cannot be increased in size to fit the code, so I
> >>>>> disable generating verify_oop code in LIR_Assembler::load(),
> >>>>> which fixes the issue.
> >>>>>
> >>>>> Further, I extended the message printed when verification
> >>>>> of an oop failed. First, I print the location in the source
> >>>>> code where the checker code was generated. Second,
> >>>>> I print the faulty oop.
> >>>>>
> >>>>> I also improved the message printed when PatchingStubs
> >>>>> overflow.
> >>>>>
> >>>>> Finally, I improve the test to run with and without compressed
> >>>>> Oops.
> >>>>>
> >>>>> Please review:
> >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>>>>
> >>>>> @runtime as I modify the test introduced there
> >>>>> @compiler as the error is in C1.
> >>>>>
> >>>>> Best regards,
> >>>>>      Goetz.
> >>>>>
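To illustrate why a compressed oop must be decoded before it is verified, here is a minimal C++ sketch. It assumes the common heap-based encoding (decoded address = base + (narrow << shift), with 0 meaning null). The names `narrow_oop`, `CoopMode`, `decode`, `plausible_oop` and `verify_coop` are hypothetical and are not HotSpot's actual API, and the plausibility check is much weaker than the real verify_oop routine.

```cpp
#include <cstdint>

// Hypothetical, simplified model of compressed-oop ("coop") decoding;
// names are illustrative only, not HotSpot's API.
using narrow_oop = std::uint32_t;

struct CoopMode {
  std::uint64_t base;  // heap base; 0 in zero-based mode
  unsigned shift;      // log2 of object alignment, e.g. 3 for 8-byte alignment
};

// A narrow value of 0 encodes null; otherwise scale it and add the base.
constexpr std::uint64_t decode(narrow_oop v, CoopMode m) {
  return v == 0 ? 0 : m.base + (static_cast<std::uint64_t>(v) << m.shift);
}

// Stand-in for the checks applied to a full-width pointer: null, or at or
// above the heap base and correctly aligned. Real checks go much further.
constexpr bool plausible_oop(std::uint64_t p, CoopMode m) {
  return p == 0 ||
         (p >= m.base && (p & ((std::uint64_t{1} << m.shift) - 1)) == 0);
}

// verify_coop-style check: decompress first, then verify the full pointer.
constexpr bool verify_coop(narrow_oop v, CoopMode m) {
  return plausible_oop(decode(v, m), m);
}
```

Under these assumptions, handing the raw 32-bit value straight to the full-pointer check (for example `plausible_oop(0x10, m)` with a non-zero base) fails, mirroring the crash described in the message, while `verify_coop` decodes first and passes.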

From goetz.lindenmaier at sap.com  Mon Nov 11 15:18:03 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 11 Nov 2019 15:18:03 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <VI1PR0201MB247982FA923CE145853156219A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
 <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
 <VI1PR0201MB247982FA923CE145853156219A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <AM6PR02MB534721086E3BEDC657A5436BEC740@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi Martin, 

thanks for looking at this, 
and thanks for resolving the patching stub issue!

Best regards,
  Goetz.

> -----Original Message-----
> From: Doerr, Martin <martin.doerr at sap.com>
> Sent: Monday, 11 November 2019 16:07
> To: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz
> <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net;
> 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-
> dev at openjdk.java.net>
> Subject: RE: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> 
> Hi Götz,
> 
> the PPC64 code looks good, too.
> Thanks for fixing and improving it.
> 
> Best regards,
> Martin
> 
> 
> > -----Original Message-----
> > From: hotspot-compiler-dev <hotspot-compiler-dev-
> > bounces at openjdk.java.net> On Behalf Of David Holmes
> > Sent: Monday, 11 November 2019 08:56
> > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> > dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> > compiler-dev at openjdk.java.net>
> > Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >
> > Hi Goetz,
> >
> > Please note I only looked at the test initially and have not reviewed
> > this overall fix as I don't know the PPC code.
> >
> > The updated test seems fine.
> >
> > Thanks,
> > David
> >
> > On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> > > Hi,
> > >
> > > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> > > which makes one of the fixes unnecessary.
> > > Also, I had to fix the argument of verify_oop_helper
> > > from oop to oopDesc* for the fastdebug build.
> > >
> > > New webrev:
> > > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
> > >
> > > Best regards,
> > >    Goetz.
> > >
> > >> -----Original Message-----
> > >> From: David Holmes <david.holmes at oracle.com>
> > >> Sent: Friday, 18 October 2019 01:38
> > >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> > >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> > <hotspot-
> > >> compiler-dev at openjdk.java.net>
> > >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> > >>
> > >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
> > >>> Hi David,
> > >>>
> > >>> you are right, thanks for pointing me to that!
> > >>> Doing one test for vm.bits=64 and one for 32 should fix it:
> > >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> > >>
> > >> s/01/02/ :)
> > >>
> > >> For the 32-bit case you can delete the line:
> > >>
> > >>      * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
> > >>
> > >> For the 64-bit case you can delete the "sparc" check from the same line.
> > >>
> > >> Thanks,
> > >> David
> > >>
> > >>>
> > >>> Best regards,
> > >>>     Goetz.
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: David Holmes <david.holmes at oracle.com>
> > >>>> Sent: Thursday, 17 October 2019 13:18
> > >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> > runtime-
> > >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> > <hotspot-
> > >>>> compiler-dev at openjdk.java.net>
> > >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> > >>>>
> > >>>> Hi Goetz,
> > >>>>
> > >>>> UseCompressedOops is a 64-bit flag only so your change will break the
> > >>>> test on 32-bit systems.
> > >>>>
> > >>>> David
> > >>>>
> > >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> 8231058 introduced a test that enables +VerifyOops.
> > >>>>> This fails on ppc, because it has not been used in a very
> > >>>>> long time.
> > >>>>>
> > >>>>> The crash is caused by passing compressed oops from
> > >>>>> LIR_Assembler::store() to the checker routine.
> > >>>>> I fix this by implementing a checker routine verify_coop
> > >>>>> that first decompresses the coop.  This makes the new
> > >>>>> test pass.
> > >>>>>
> > >>>>> Further testing showed that the additional checker
> > >>>>> code makes PatchingStubs overflow. These
> > >>>>> cannot be increased in size to fit the code, so I
> > >>>>> disable generating verify_oop code in LIR_Assembler::load(),
> > >>>>> which fixes the issue.
> > >>>>>
> > >>>>> Furthermore, I extended the message printed when verification
> > >>>>> of an oop failed. First, I print the location in the source
> > >>>>> code where the checker code was generated. Second,
> > >>>>> I print the faulty oop.
> > >>>>>
> > >>>>> I also improved the message printed when PatchingStubs
> > >>>>> overflow.
> > >>>>>
> > >>>>> Finally, I improve the test to run with and without compressed
> > >>>>> Oops.
> > >>>>>
> > >>>>> Please review:
> > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> > >>>>>
> > >>>>> @runtime as I modify the test introduced there
> > >>>>> @compiler as the error is in C1.
> > >>>>>
> > >>>>> Best regards,
> > >>>>>      Goetz.
> > >>>>>

From richard.reingruber at sap.com  Mon Nov 11 15:29:36 2019
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Mon, 11 Nov 2019 15:29:36 +0000
Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because
 of C2 Scalar Replacement
Message-ID: <VI1PR02MB482960E862A276EF918D88B59B740@VI1PR02MB4829.eurprd02.prod.outlook.com>

Hi,

I have created https://bugs.openjdk.java.net/browse/JDK-8233915

In short, a set of live objects L is not found using JVMTI FollowReferences() if L is only reachable
from a scalar replaced object in a frame of a C2 compiled method. If L happens to be a growing leak,
then a dynamically loaded JVMTI agent (note: can_tag_objects is an always capability) for heap
diagnostics won't discover L as live and it won't be able to find root references that lead to L.

I'd like to suggest the implementation for the proposed enhancement JDK-8227745 as a bug fix.

RFE:       https://bugs.openjdk.java.net/browse/JDK-8227745
Webrev(*): http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/

Please comment on the suggestion. Do you see other solutions that allow an agent to discover the
chain of references to L?

I'd like to work on the complexity as well. One significant simplification could be, if it was
possible to reallocate scalar replaced objects at safepoints (i.e. allow the VM thread to call
Deoptimization::realloc_objects()). The GC interface does not seem to allow this.

Thanks, Richard.

(*) Not yet accepted, because it was deemed too complex for the performance gain. Note that I was able to
    reduce webrev.1 in size compared to webrev.0

From fweimer at redhat.com  Mon Nov 11 15:40:21 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 11 Nov 2019 16:40:21 +0100
Subject: adlc-generated operator= for Pipeline_Use_Cycle_Mask
Message-ID: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>

adlc generates an assignment operator for the  Pipeline_Use_Cycle_Mask
class, like this:

  Pipeline_Use_Cycle_Mask& operator=(const Pipeline_Use_Cycle_Mask &in) {
    _mask = in._mask;
    return *this;
  }

GCC 10 takes this as an indicator that objects of this class should not
be modified with memcpy or memset and issues -Wclass-memaccess warnings:

…/src/hotspot/share/opto/output.cpp: In constructor 'Scheduling::Scheduling(Arena*, Compile&)':
…/src/hotspot/share/opto/output.cpp:1745:108: error: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class Pipeline_Use_Element' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Werror=class-memaccess]
 1745 |   memcpy(_bundle_use_elements, Pipeline_Use::elaborated_elements, sizeof(Pipeline_Use::elaborated_elements));
      |                                                                                                            ^
In file included from …/src/hotspot/share/opto/ad.hpp:31,
                 from …/src/hotspot/share/opto/output.cpp:37:
ad_x86.hpp:6196:7: note: 'class Pipeline_Use_Element' declared here
…/src/hotspot/share/opto/output.cpp: In member function 'void Scheduling::step_and_clear()':
…/src/hotspot/share/opto/output.cpp:1797:51: error: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class Pipeline_Use_Element' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Werror=class-memaccess]
 1797 |          sizeof(Pipeline_Use::elaborated_elements));
      |                                                   ^

Looking at the code generator, it seems that the default assignment
operator will do the right thing in all cases.  Can we remove generation
of this C++ code fragment?

Thanks,
Florian
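The triviality argument can be checked with a minimal C++ sketch. The classes below are hypothetical, single-field stand-ins for the adlc-generated Pipeline_Use_Cycle_Mask and the Pipeline_Use_Element that embeds it; they are not the generated code itself. A user-provided copy-assignment operator is never trivial, and that non-triviality propagates to any class containing the mask, which is what GCC's -Wclass-memaccess reports. Dropping the operator leaves an implicit, member-wise one that is trivial.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the mask class, keeping the user-provided
// copy-assignment operator that adlc currently emits.
class MaskWithUserAssign {
 public:
  std::uint32_t _mask;
  MaskWithUserAssign& operator=(const MaskWithUserAssign& in) {
    _mask = in._mask;
    return *this;
  }
};

// Hypothetical element type embedding the mask; it inherits the
// non-trivial copy assignment from its member.
struct ElementWithUserMask {
  MaskWithUserAssign _mask;
};

// The same mask with the operator removed: the implicitly declared
// operator= still copies _mask member-wise, but is now trivial.
class MaskWithDefaultAssign {
 public:
  std::uint32_t _mask;
};

struct ElementWithDefaultMask {
  MaskWithDefaultAssign _mask;
};

static_assert(!std::is_trivially_copy_assignable<MaskWithUserAssign>::value,
              "user-provided operator= is not trivial");
static_assert(!std::is_trivially_copyable<ElementWithUserMask>::value,
              "the containing class loses trivial copyability too");
static_assert(std::is_trivially_copyable<ElementWithDefaultMask>::value,
              "implicit operator= keeps the element trivially copyable");

// With the implicit operator, memcpy over an element array (the pattern
// flagged in output.cpp) is well-defined and matches element-wise copying.
inline void copy_elements(ElementWithDefaultMask* dst,
                          const ElementWithDefaultMask* src, std::size_t n) {
  std::memcpy(dst, src, n * sizeof(ElementWithDefaultMask));
}
```

Because the hand-written operator only copies `_mask`, the implicit operator has the same observable behavior, which is why removing it from the generator should be safe as long as the real class holds nothing but plain mask fields.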


From patric.hedlin at oracle.com  Mon Nov 11 16:49:04 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Mon, 11 Nov 2019 17:49:04 +0100
Subject: RFR(XS): 8233918: 8233498 broke build on SPARC
Message-ID: <6fc02880-958f-1e0b-bb9a-a40cc8fb6cfa@oracle.com>

Dear all,

I would like to ask for help to review the following change/update:

Issue: https://bugs.openjdk.java.net/browse/JDK-8233918

8233918: 8233498 broke build on SPARC

    Pushed poor patch that broke debug builds.


Testing: SPARC build on Solaris (sparcv9, sparcv9-debug, sparcv9-slowdebug)

Best regards,
Patric

-----8<-----

diff -r e4d7fcab43d7 -r 0c993f1305eb src/hotspot/cpu/sparc/interp_masm_sparc.hpp
--- a/src/hotspot/cpu/sparc/interp_masm_sparc.hpp	Tue Apr 24 13:59:02 2018 +0200
+++ b/src/hotspot/cpu/sparc/interp_masm_sparc.hpp	Mon Nov 11 16:59:42 2019 +0100
@@ -321,6 +321,7 @@
   // Debugging
   void interp_verify_oop(Register reg, TosState state, const char * file, int line);    // only if +VerifyOops && state == atos
   void verify_oop_or_return_address(Register reg, Register rtmp); // for astore
+  void verify_FPU(int stack_depth, TosState state = ftos) {} // No-op.
 
   // support for JVMTI/Dtrace
   typedef enum { NotifyJVMTI, SkipNotifyJVMTI } NotifyMethodExitMode;


From erik.osterlund at oracle.com  Mon Nov 11 16:54:57 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Mon, 11 Nov 2019 17:54:57 +0100
Subject: RFR(XS): 8233918: 8233498 broke build on SPARC
In-Reply-To: <6fc02880-958f-1e0b-bb9a-a40cc8fb6cfa@oracle.com>
References: <6fc02880-958f-1e0b-bb9a-a40cc8fb6cfa@oracle.com>
Message-ID: <c19b6745-168e-23de-8955-3a0f12c98904@oracle.com>

Hi Patric,

Looks good, and trivial. Ship it!

Thanks,
/Erik

On 11/11/19 5:49 PM, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8233918
>
> 8233918: 8233498 broke build on SPARC
>
>     Pushed poor patch that broke debug builds.
>
>
> Testing: SPARC build on Solaris (sparcv9, sparcv9-debug, sparcv9-slowdebug)
>
> Best regards,
> Patric
>
> -----8<-----
>
> diff -r e4d7fcab43d7 -r 0c993f1305eb src/hotspot/cpu/sparc/interp_masm_sparc.hpp
> --- a/src/hotspot/cpu/sparc/interp_masm_sparc.hpp	Tue Apr 24 13:59:02 2018 +0200
> +++ b/src/hotspot/cpu/sparc/interp_masm_sparc.hpp	Mon Nov 11 16:59:42 2019 +0100
> @@ -321,6 +321,7 @@
>    // Debugging
>    void interp_verify_oop(Register reg, TosState state, const char * file, int line);    // only if +VerifyOops && state == atos
>    void verify_oop_or_return_address(Register reg, Register rtmp); // for astore
> +  void verify_FPU(int stack_depth, TosState state = ftos) {} // No-op.
>
>    // support for JVMTI/Dtrace
>    typedef enum { NotifyJVMTI, SkipNotifyJVMTI } NotifyMethodExitMode;
>


From vladimir.kozlov at oracle.com  Mon Nov 11 16:59:00 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 08:59:00 -0800
Subject: RFR(S) : 8233900: [JVMCI] improve help text for
 EnableJVMCIProduct option
In-Reply-To: <A64BA248-9194-4EE3-835D-F53B962AB9EE@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
 <A64BA248-9194-4EE3-835D-F53B962AB9EE@oracle.com>
Message-ID: <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com>

Looks good. Do you need this to be backported to 11u? Add the affected versions and fix version to the RFE.

Thanks
Vladimir

> On Nov 11, 2019, at 2:54 AM, Doug Simon <doug.simon at oracle.com> wrote:
> 
> Hi,
> 
> Please review this change to improve the help text for EnableJVMCIProduct and related options.
> 
> https://dougxc.github.io/webrevs/8233900
> https://bugs.openjdk.java.net/browse/JDK-8233900
> 
> -Doug


From bsrbnd at gmail.com  Mon Nov 11 17:23:32 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Mon, 11 Nov 2019 18:23:32 +0100
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting
 long vector bits
Message-ID: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>

Hi,

Please review the following fix for [1] which has been extensively
discussed in [2]:

http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/

It includes:
* Vladimir's requested changes about instruction encoding and testing
along with:
* John's suggested complementary benchmark following Sandhya's note
regarding throughput.

Thanks (hotspot:tier1 is OK on Linux/x86_64),
Bernard

[1] https://bugs.openjdk.java.net/browse/JDK-8214239
[2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
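As background for the bit-manipulation patterns, here is a minimal C++ sketch. It assumes (from the subject line, not from the webrev) that the missing patterns cover the long idioms `x & ~(1L << n)` and `x | (1L << n)` when the 64-bit mask cannot be encoded as the sign-extended 32-bit immediate that x86-64 `andq`/`orq` accept, so that a single BTR/BTS with a bit index is the natural replacement. All function names are hypothetical.

```cpp
#include <cstdint>

// Clearing and setting bit n of a 64-bit value, i.e. the Java idioms
// x & ~(1L << n) and x | (1L << n).
constexpr std::uint64_t clear_bit(std::uint64_t x, unsigned n) {
  return x & ~(std::uint64_t{1} << n);
}
constexpr std::uint64_t set_bit(std::uint64_t x, unsigned n) {
  return x | (std::uint64_t{1} << n);
}

// x86-64 andq/orq can encode at most a sign-extended 32-bit immediate.
constexpr bool fits_simm32(std::int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// Whether the AND mask (clear) or OR mask (set) for bit n fits that
// immediate form; if not, the mask would have to be loaded into a
// register first, which is where a BTR/BTS-style pattern helps.
constexpr bool clear_mask_fits_imm(unsigned n) {
  return fits_simm32(static_cast<std::int64_t>(~(std::uint64_t{1} << n)));
}
constexpr bool set_mask_fits_imm(unsigned n) {
  return fits_simm32(static_cast<std::int64_t>(std::uint64_t{1} << n));
}
```

Under that assumption, only bit positions 0 to 30 can use an immediate AND/OR mask; for positions 31 to 63 the mask no longer fits, so a bit-index instruction avoids materializing a 64-bit constant.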

From vladimir.kozlov at oracle.com  Mon Nov 11 20:14:41 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 12:14:41 -0800
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
Message-ID: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com>

Very good. What testing you did?

Thanks
Vladimir

> On Nov 11, 2019, at 9:23 AM, B. Blaser <bsrbnd at gmail.com> wrote:
> 
> Hi,
> 
> Please review the following fix for [1] which has been extensively
> discussed in [2]:
> 
> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
> 
> It includes:
> * Vladimir's requested changes about instruction encoding and testing
> along with:
> * John's suggested complementary benchmark following Sandhya's note
> regarding throughput.
> 
> Thanks (hotspot:tier1 is OK on Linux/x86_64),
> Bernard
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html


From vladimir.kozlov at oracle.com  Mon Nov 11 20:16:03 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 12:16:03 -0800
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com>
Message-ID: <b4d89d7e-f048-49a1-f5ac-1d202a78a2a4@oracle.com>

On 11/11/19 12:14 PM, Vladimir Kozlov wrote:
> Very good. What testing you did?

I mean in addition to tier1.

Vladimir

> 
> Thanks
> Vladimir
> 
>> On Nov 11, 2019, at 9:23 AM, B. Blaser <bsrbnd at gmail.com> wrote:
>>
>> Hi,
>>
>> Please review the following fix for [1] which has been extensively
>> discussed in [2]:
>>
>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>>
>> It includes:
>> * Vladimir's requested changes about instruction encoding and testing
>> along with:
>> * John's suggested complementary benchmark following Sandhya's note
>> regarding throughput.
>>
>> Thanks (hotspot:tier1 is OK on Linux/x86_64),
>> Bernard
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
>> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
> 

From kim.barrett at oracle.com  Mon Nov 11 20:20:24 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 11 Nov 2019 15:20:24 -0500
Subject: adlc-generated operator= for Pipeline_Use_Cycle_Mask
In-Reply-To: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
Message-ID: <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>

> On Nov 11, 2019, at 10:40 AM, Florian Weimer <fweimer at redhat.com> wrote:
> 
> adlc generates an assignment operator for the  Pipeline_Use_Cycle_Mask
> class, like this:
> 
>  Pipeline_Use_Cycle_Mask& operator=(const Pipeline_Use_Cycle_Mask &in) {
>    _mask = in._mask;
>    return *this;
>  }
> 
> GCC 10 takes this as an indicator that objects of this class should not
> be modified with memcpy or memset and issues -Wclass-memaccess warnings:
> 
> …/src/hotspot/share/opto/output.cpp: In constructor 'Scheduling::Scheduling(Arena*, Compile&)':
> …/src/hotspot/share/opto/output.cpp:1745:108: error: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class Pipeline_Use_Element' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Werror=class-memaccess]
> 1745 |   memcpy(_bundle_use_elements, Pipeline_Use::elaborated_elements, sizeof(Pipeline_Use::elaborated_elements));
>      |                                                                                                            ^
> In file included from …/src/hotspot/share/opto/ad.hpp:31,
>                 from …/src/hotspot/share/opto/output.cpp:37:
> ad_x86.hpp:6196:7: note: 'class Pipeline_Use_Element' declared here
> …/src/hotspot/share/opto/output.cpp: In member function 'void Scheduling::step_and_clear()':
> …/src/hotspot/share/opto/output.cpp:1797:51: error: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class Pipeline_Use_Element' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Werror=class-memaccess]
> 1797 |          sizeof(Pipeline_Use::elaborated_elements));
>      |                                                   ^
> 
> Looking at the code generator, it seems that the default assignment
> operator will do the right thing in all cases.  Can we remove generation
> of this C++ code fragment?
> 
> Thanks,
> Florian

I agree with that analysis and the suggested solution approach.  I've filed a bug for this issue:
https://bugs.openjdk.java.net/browse/JDK-8233941


From bsrbnd at gmail.com  Mon Nov 11 20:24:24 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Mon, 11 Nov 2019 21:24:24 +0100
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <b4d89d7e-f048-49a1-f5ac-1d202a78a2a4@oracle.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com>
 <b4d89d7e-f048-49a1-f5ac-1d202a78a2a4@oracle.com>
Message-ID: <CAEgw74BMG2rCDW8u+qXbmocnChq7UuGYi99r7jy0LO0s9bHDMw@mail.gmail.com>

I pushed it to jdk/submit and all seems to be OK:

http://hg.openjdk.java.net/jdk/submit/rev/cbe81ae81095

Should we run more tests?

Bernard

On Mon, 11 Nov 2019 at 21:16, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
>
> On 11/11/19 12:14 PM, Vladimir Kozlov wrote:
> > Very good. What testing you did?
>
> I mean in addition to tier1.
>
> Vladimir
>
> >
> > Thanks
> > Vladimir
> >
> >> On Nov 11, 2019, at 9:23 AM, B. Blaser <bsrbnd at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Please review the following fix for [1] which has been extensively
> >> discussed in [2]:
> >>
> >> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
> >>
> >> It includes:
> >> * Vladimir's requested changes about instruction encoding and testing
> >> along with:
> >> * John's suggested complementary benchmark following Sandhya's note
> >> regarding throughput.
> >>
> >> Thanks (hotspot:tier1 is OK on Linux/x86_64),
> >> Bernard
> >>
> >> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
> >> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
> >

From fweimer at redhat.com  Mon Nov 11 20:29:01 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 11 Nov 2019 21:29:01 +0100
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator= (was: Re: adlc-generated operator= for
 Pipeline_Use_Cycle_Mask)
In-Reply-To: <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> (Kim Barrett's
 message of "Mon, 11 Nov 2019 15:20:24 -0500")
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
Message-ID: <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>

* Kim Barrett:

> I agree with that analysis and the suggested solution approach.  I've
> filed a bug for this issue:

> https://bugs.openjdk.java.net/browse/JDK-8233941

Thanks, here's a webrev:

  <http://cr.openjdk.java.net/~fweimer/8233941/webrev.01/>

I have put this through some testing on x86-64 (although that exercises
only one branch).

Thanks,
Florian


From vladimir.kozlov at oracle.com  Mon Nov 11 20:31:37 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 12:31:37 -0800
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74BMG2rCDW8u+qXbmocnChq7UuGYi99r7jy0LO0s9bHDMw@mail.gmail.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com>
 <b4d89d7e-f048-49a1-f5ac-1d202a78a2a4@oracle.com>
 <CAEgw74BMG2rCDW8u+qXbmocnChq7UuGYi99r7jy0LO0s9bHDMw@mail.gmail.com>
Message-ID: <aec3fe39-cac8-df30-d77c-a8ca4ecf8a33@oracle.com>

It is fine. I will submit our internal testing (more tiers) and let you know results.

Vladimir

On 11/11/19 12:24 PM, B. Blaser wrote:
> I pushed it to jdk/submit and all seems to be OK:
> 
> http://hg.openjdk.java.net/jdk/submit/rev/cbe81ae81095
> 
> Should we run more tests?
> 
> Bernard
> 
> On Mon, 11 Nov 2019 at 21:16, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>>
>> On 11/11/19 12:14 PM, Vladimir Kozlov wrote:
>>> Very good. What testing you did?
>>
>> I mean in addition to tier1.
>>
>> Vladimir
>>
>>>
>>> Thanks
>>> Vladimir
>>>
>>>> On Nov 11, 2019, at 9:23 AM, B. Blaser <bsrbnd at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Please review the following fix for [1] which has been extensively
>>>> discussed in [2]:
>>>>
>>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>>>>
>>>> It includes:
>>>> * Vladimir's requested changes about instruction encoding and testing
>>>> along with:
>>>> * John's suggested complementary benchmark following Sandhya's note
>>>> regarding throughput.
>>>>
>>>> Thanks (hotspot:tier1 is OK on Linux/x86_64),
>>>> Bernard
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
>>>> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
>>>

From vladimir.kozlov at oracle.com  Mon Nov 11 20:37:18 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 12:37:18 -0800
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
 <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
Message-ID: <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>

Patch is empty.

Vladimir

On 11/11/19 12:29 PM, Florian Weimer wrote:
> * Kim Barrett:
> 
>> I agree with that analysis and the suggested solution approach.  I?ve
>> filed a bug for this issue:
> 
>> https://bugs.openjdk.java.net/browse/JDK-8233941
> 
> Thanks, here's a webrev:
> 
>    <http://cr.openjdk.java.net/~fweimer/8233941/webrev.01/>
> 
> I have put this through some testing on x86-64 (although that exercises
> only one branch).
> 
> Thanks,
> Florian
> 

From fweimer at redhat.com  Mon Nov 11 21:36:18 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 11 Nov 2019 22:36:18 +0100
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> (Vladimir
 Kozlov's message of "Mon, 11 Nov 2019 12:37:18 -0800")
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
 <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
 <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>
Message-ID: <87k186rt7x.fsf@oldenburg2.str.redhat.com>

* Vladimir Kozlov:

> Patch is empty.

Ugh.  Is it better now?

Thanks,
Florian

>
> Vladimir
>
> On 11/11/19 12:29 PM, Florian Weimer wrote:
>> * Kim Barrett:
>>
>>> I agree with that analysis and the suggested solution approach.  I've
>>> filed a bug for this issue:
>>
>>> https://bugs.openjdk.java.net/browse/JDK-8233941
>>
>> Thanks, here's a webrev:
>>
>>    <http://cr.openjdk.java.net/~fweimer/8233941/webrev.01/>
>>
>> I have put this through some testing on x86-64 (although that exercises
>> only one branch).
>>
>> Thanks,
>> Florian
>>


From doug.simon at oracle.com  Mon Nov 11 21:44:37 2019
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 11 Nov 2019 22:44:37 +0100
Subject: RFR(S) : 8233900: [JVMCI] improve help text for
 EnableJVMCIProduct option
In-Reply-To: <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
 <A64BA248-9194-4EE3-835D-F53B962AB9EE@oracle.com>
 <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com>
Message-ID: <23E9D7B9-F573-4BD7-BF7A-9434AECCE30C@oracle.com>

Thanks for the review Vladimir.

-Doug

> On 11 Nov 2019, at 17:59, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Looks good. Do you need this to be backported to 11u? Add the affected versions and fix version to the RFE.
> 
> Thanks
> Vladimir
> 
>> On Nov 11, 2019, at 2:54 AM, Doug Simon <doug.simon at oracle.com> wrote:
>> 
>> Hi,
>> 
>> Please review this change to improve the help text for EnableJVMCIProduct and related options.
>> 
>> https://dougxc.github.io/webrevs/8233900
>> https://bugs.openjdk.java.net/browse/JDK-8233900
>> 
>> -Doug
> 


From vladimir.kozlov at oracle.com  Mon Nov 11 23:35:45 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 15:35:45 -0800
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <aec3fe39-cac8-df30-d77c-a8ca4ecf8a33@oracle.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com>
 <b4d89d7e-f048-49a1-f5ac-1d202a78a2a4@oracle.com>
 <CAEgw74BMG2rCDW8u+qXbmocnChq7UuGYi99r7jy0LO0s9bHDMw@mail.gmail.com>
 <aec3fe39-cac8-df30-d77c-a8ca4ecf8a33@oracle.com>
Message-ID: <6ebb5cee-04ca-4cb8-5c6a-6dc29f52d73a@oracle.com>

Bernard,

Testing passed clean. After second review it can be pushed.

Thanks,
Vladimir

On 11/11/19 12:31 PM, Vladimir Kozlov wrote:
> It is fine. I will submit our internal testing (more tiers) and let you know results.
> 
> Vladimir
> 
> On 11/11/19 12:24 PM, B. Blaser wrote:
>> I pushed it to jdk/submit and all seems to be OK:
>>
>> http://hg.openjdk.java.net/jdk/submit/rev/cbe81ae81095
>>
>> Should we run more tests?
>>
>> Bernard
>>
>> On Mon, 11 Nov 2019 at 21:16, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>>
>>> On 11/11/19 12:14 PM, Vladimir Kozlov wrote:
>>>> Very good. What testing you did?
>>>
>>> I mean in addition to tier1.
>>>
>>> Vladimir
>>>
>>>>
>>>> Thanks
>>>> Vladimir
>>>>
>>>>> On Nov 11, 2019, at 9:23 AM, B. Blaser <bsrbnd at gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Please review the following fix for [1] which has been extensively
>>>>> discussed in [2]:
>>>>>
>>>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>>>>>
>>>>> It includes:
>>>>> * Vladimir's requested changes about instruction encoding and testing
>>>>> along with:
>>>>> * John's suggested complementary benchmark following Sandhya's note
>>>>> regarding throughput.
>>>>>
>>>>> Thanks (hotspot:tier1 is OK on Linux/x86_64),
>>>>> Bernard
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
>>>>> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
>>>>

From vladimir.kozlov at oracle.com  Mon Nov 11 23:37:10 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 15:37:10 -0800
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <87k186rt7x.fsf@oldenburg2.str.redhat.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
 <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
 <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>
 <87k186rt7x.fsf@oldenburg2.str.redhat.com>
Message-ID: <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>

Yes, it looks good.

Someone has to test it to make sure it builds and runs on all supported platforms (and compilers).

Thanks,
vladimir

On 11/11/19 1:36 PM, Florian Weimer wrote:
> * Vladimir Kozlov:
> 
>> Patch is empty.
> 
> Ugh.  Is it better now?
> 
> Thanks,
> Florian
> 
>>
>> Vladimir
>>
>> On 11/11/19 12:29 PM, Florian Weimer wrote:
>>> * Kim Barrett:
>>>
>>>> I agree with that analysis and the suggested solution approach.  I've
>>>> filed a bug for this issue:
>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8233941
>>>
>>> Thanks, here's a webrev:
>>>
>>>     <http://cr.openjdk.java.net/~fweimer/8233941/webrev.01/>
>>>
>>> I have put this through some testing on x86-64 (although that exercises
>>> only one branch).
>>>
>>> Thanks,
>>> Florian
>>>
> 

From gromero at linux.vnet.ibm.com  Tue Nov 12 02:54:17 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 11 Nov 2019 23:54:17 -0300
Subject: [8u] RFR for backport of 8216060 (CRC32 3/4): [PPC64] Vector CRC
 implementation should be used by interpreter and be faster for short
 arrays
In-Reply-To: <675a6c68-ca08-27ba-d9cb-8fa02efc5102@linux.vnet.ibm.com>
References: <675a6c68-ca08-27ba-d9cb-8fa02efc5102@linux.vnet.ibm.com>
Message-ID: <92c29733-7916-3069-b913-4c51fc5424df@linux.vnet.ibm.com>

Hi Martin,

Please find v3 of CRC32 3/4, updated according to your last review, at:

http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v3_/8216060/

I kept the at the CRC32C defines.


Thank you & best regards,
Gustavo

From gromero at linux.vnet.ibm.com  Tue Nov 12 02:58:48 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 11 Nov 2019 23:58:48 -0300
Subject: [8u] RFR for backport of 8217459 (CRC32 4/4): [PPC64] Cleanup
 non-vector version of CRC32
In-Reply-To: <06bc93e6-59b4-95ec-7a27-ef789ac51564@linux.vnet.ibm.com>
References: <06bc93e6-59b4-95ec-7a27-ef789ac51564@linux.vnet.ibm.com>
Message-ID: <e86333d3-8991-3d9d-7923-bc37c7d1f68d@linux.vnet.ibm.com>

Hi Martin,

Change 8206173 is now backported to jdk8u-dev:

http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/9148fcba5de9

So could you please review v3 CRC32 4/4, updated according to your last review? The
change now is PPC64-only since shared code part was pushed with 8206173.

Please find v3 in:

http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v3_/8217459/

Thank you & best regards,
Gustavo

From tobias.hartmann at oracle.com  Tue Nov 12 08:05:56 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 09:05:56 +0100
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
 <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
 <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>
 <87k186rt7x.fsf@oldenburg2.str.redhat.com>
 <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>
Message-ID: <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>

Looks good to me too. I'll run some testing and sponsor if everything passes.

Best regards,
Tobias

On 12.11.19 00:37, Vladimir Kozlov wrote:
> Yes, it looks good.
> 
> Someone has to test it to make sure it builds and runs on all supported platforms (and compilers).
> 
> Thanks,
> vladimir
> 
> On 11/11/19 1:36 PM, Florian Weimer wrote:
>> * Vladimir Kozlov:
>>
>>> Patch is empty.
>>
>> Ugh.  Is it better now?
>>
>> Thanks,
>> Florian
>>
>>>
>>> Vladimir
>>>
>>> On 11/11/19 12:29 PM, Florian Weimer wrote:
>>>> * Kim Barrett:
>>>>
>>>>> I agree with that analysis and the suggested solution approach.  I've
>>>>> filed a bug for this issue:
>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8233941
>>>>
>>>> Thanks, here's a webrev:
>>>>
>>>>     <http://cr.openjdk.java.net/~fweimer/8233941/webrev.01/>
>>>>
>>>> I have put this through some testing on x86-64 (although that exercises
>>>> only one branch).
>>>>
>>>> Thanks,
>>>> Florian
>>>>
>>

From tobias.hartmann at oracle.com  Tue Nov 12 08:30:28 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 09:30:28 +0100
Subject: RFR(S): Test crashed with assert(phi->operand_count() != 1 ||
 phi->subst() != phi) failed: missed trivial simplification
In-Reply-To: <VI1PR0201MB24793BAE30999C6DEA426E779A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24793BAE30999C6DEA426E779A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <9f3b6da1-9d86-5850-14b7-ea53d99e54ab@oracle.com>

Hi Martin,

looks reasonable to me.

Best regards,
Tobias

On 11.11.19 11:39, Doerr, Martin wrote:
> Hi,
> 
> some C1 assertions currently don't deal correctly with illegal phi functions (phi function with illegal type due to type conflict).
> Since JDK-8214352 we bail out in fewer situations, so C1 needs to deal with a few more illegal phi cases.
> 
> The assertion (see headline) in PhiSimplifier doesn't support illegal phi, but the simplify call before it does (by just skipping).
> 
> Another assertion which needs to skip illegal phi is in c1_Optimizer where we merge a block with its unique successor.
> If a local value of the succeeding block is coming from an illegal phi function, we drop it and take the value from the first block.
> This is correct, only the assertion doesn't expect that at the moment.
> 
> So I'd like to fix these 2 assertions I found:
> http://cr.openjdk.java.net/~mdoerr/8233820_C1_illegal_phi/webrev.00/
> 
> Best regards,
> Martin
> 

From tobias.hartmann at oracle.com  Tue Nov 12 08:36:59 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 09:36:59 +0100
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed:
 Ensure we have a compiler
In-Reply-To: <df2883da-5b3e-69af-8e1e-a87a82e1e55a@loongson.cn>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
 <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
 <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
 <df2883da-5b3e-69af-8e1e-a87a82e1e55a@loongson.cn>
Message-ID: <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com>

Hi Jie,

seems reasonable to me but Igor should have a look as well.

Thanks,
Tobias

On 11.11.19 14:28, Jie Fu wrote:
> Hi Tobias,
> 
> Thank you for your review and valuable comments.
> Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/
> 
> Thanks a lot.
> Best regards,
> Jie
> 
> On 2019/11/11 3:44 PM, Tobias Hartmann wrote:
>> Hi Jie,
>>
>> what about the high-only-quick-internal mode?
> It's really a nice catch.
> Fixed. Thanks.
> 
> 
>> While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
>> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
>> Maybe you can fix that as well with your patch.
> Done.
> 

From vladimir.x.ivanov at oracle.com  Tue Nov 12 09:00:21 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 12:00:21 +0300
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
Message-ID: <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com>

Hi Bernard,

Nice improvement!

> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/

I don't see cases for non-constant masks John suggested covered. Have 
you tried to implement them? Any problems encountered or did you just 
leave them for future improvement?

====================================================================

I'd like to see asserts in new MacroAssembler methods that imm8 fits 
into a byte, like:

+void Assembler::btsq(Address dst, int imm8) {
+  assert(isByte(imm8), "not a byte");
+  InstructionMark im(this);
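
As background for the assert request: with an immediate operand, btsq/btrq set or clear a single bit of a 64-bit operand, so they compute the same result as orq/andq with a one-bit mask (or its complement), while the bit offset travels as an 8-bit immediate. The following is a minimal plain-C++ model of that equivalence, not HotSpot code; bts64, btr64 and is_byte are hypothetical helpers (HotSpot's own range-check helpers may be defined differently), and the model simply requires an offset in 0..63.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if imm fits in an unsigned 8-bit immediate.
static bool is_byte(int imm) { return 0 <= imm && imm <= 0xFF; }

// Model of `btsq dst, imm8`: set bit imm8 of a 64-bit value.
// Equivalent to `orq dst, (1 << imm8)` for offsets 0..63.
uint64_t bts64(uint64_t dst, int imm8) {
  assert(is_byte(imm8) && "not a byte");
  assert(imm8 < 64 && "bit offset out of range for a 64-bit operand");
  return dst | (uint64_t{1} << imm8);
}

// Model of `btrq dst, imm8`: clear bit imm8 of a 64-bit value.
// Equivalent to `andq dst, ~(1 << imm8)` for offsets 0..63.
uint64_t btr64(uint64_t dst, int imm8) {
  assert(is_byte(imm8) && "not a byte");
  assert(imm8 < 64 && "bit offset out of range for a 64-bit operand");
  return dst & ~(uint64_t{1} << imm8);
}
```

In this model, bts64(x, k) equals x | (1 << k) and btr64(x, k) equals x & ~(1 << k), which is the rewrite the new memory rules perform on OrL/AndL with a one-bit constant.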


====================================================================

src/hotspot/cpu/x86/x86_64.ad:

+operand immPow2L()
+%{
+  // n should be a pure 64-bit power of 2 immediate.
+  predicate(is_power_of_2_long(n->get_long()) && log2_long(n->get_long()) > 31);

+operand immPow2NotL()
+%{
+  // n should be a pure 64-bit immediate given that not(n) is a power of 2.

Why do you limit the optimization to bits in the upper half? Is it because 
ordinary andq/orq instructions work well for the rest? If that's the 
case, it deserves a comment.

(immPow2NotL is a bit misleading: I read it as "power of 2, but not a 
long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to 
encode that it's > 2^32, but I would just skip it for now.)
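
To make the two operand predicates concrete, here is a plain-C++ model of what immPow2L and immPow2NotL accept as quoted above, plus a check of why the plain instructions are limited: x86-64 andq/orq take an immediate encoded as 32 bits and sign-extended to 64, so only some one-bit masks can be encoded directly. This is a sketch under those assumptions; is_pow2, log2_exact, imm_pow2_high, imm_not_pow2_high and fits_simm32 are hypothetical stand-ins, not HotSpot's is_power_of_2_long/log2_long.

```cpp
#include <cstdint>

// Hypothetical stand-in for is_power_of_2_long(): exactly one bit set.
static bool is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Hypothetical stand-in for log2_long() on an exact power of two.
static int log2_exact(uint64_t x) {
  int k = 0;
  while (x > 1) { x >>= 1; ++k; }
  return k;
}

// Models the quoted immPow2L predicate: a single set bit above bit 31.
bool imm_pow2_high(int64_t n) {
  uint64_t u = static_cast<uint64_t>(n);
  return is_pow2(u) && log2_exact(u) > 31;
}

// Models immPow2NotL: ~n is a single set bit above bit 31,
// i.e. n is an AND mask that clears exactly one high bit.
bool imm_not_pow2_high(int64_t n) {
  return imm_pow2_high(~n);
}

// andq/orq immediates are 32 bits sign-extended to 64 bits;
// this checks whether a 64-bit constant survives that encoding.
bool fits_simm32(int64_t n) {
  return n >= INT32_MIN && n <= INT32_MAX;
}
```

Assuming MASK32 in the test denotes 1L << 32, both it and MASK63 fall inside immPow2L, while any one-bit mask at bit 30 or below (and its complement) still fits the sign-extended immediate of a plain andq/orq.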

====================================================================

+instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr) %{
+  match(Set dst (StoreL dst (AndL (LoadL dst) src)));

+instruct btsL_mem_imm(memory dst, immPow2L    src, rFlagsReg cr) %{
+  match(Set dst (StoreL dst (OrL (LoadL dst) src)));

Does it make sense to cover 32-/16-bit cases the same way? (Relates to 
the earlier question on bits from upper half.)

Do you leave out in-register updates because they don't give any 
benefits compared to andq/orq?

Also, please, use "con" instead of "src". It's easier to read when the 
name reflects that the value is a constant.

====================================================================

test/hotspot/jtreg/compiler/c2/TestBitSetAndReset.java:

As I understand, the only test case which exercises new code is:

   64     private static void test63() {
   65         andq &= ~MASK63;
   66         orq |= MASK63;
   67     }

Please, at least, add a case for MASK32.

   29  * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions
   30  *                   -XX:-TieredCompilation -XX:CompileThresholdScaling=0.1 -XX:-Inline
   31  *                   -XX:CompileCommand=print,compiler/c2/TestBitSetAndReset.test*
   32  *                   -XX:CompileCommand=compileonly,compiler/c2/TestBitSetAndReset.test*
   33  *                   compiler.c2.TestBitSetAndReset

Since you explicitly disable tiered, you can just directly set the 
threshold instead (-XX:CompileThreshold=1000).

-XX:-Inline is redundant and you can replace compileonly directive with 
dontinline to speed up the test.
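
Applying both suggestions, the @run tag might look roughly like the following. This is an untested sketch: -XX:CompileThreshold=1000 is the value proposed above, dontinline replaces compileonly, -XX:-Inline is dropped, and everything else is kept from the original header.

```
 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions
 *                   -XX:-TieredCompilation -XX:CompileThreshold=1000
 *                   -XX:CompileCommand=print,compiler/c2/TestBitSetAndReset.test*
 *                   -XX:CompileCommand=dontinline,compiler/c2/TestBitSetAndReset.test*
 *                   compiler.c2.TestBitSetAndReset
```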

Best regards,
Vladimir Ivanov
> 
> It includes:
> * Vladimir's requested changes about instruction encoding and testing
> along with:
> * John's suggested complementary benchmark following Sandhya's note
> regarding throughput.
> 
> Thanks (hotspot:tier1 is OK on Linux/x86_64),
> Bernard
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
> 

From vladimir.x.ivanov at oracle.com  Tue Nov 12 09:02:27 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 12:02:27 +0300
Subject: RFR(S): Test crashed with assert(phi->operand_count() != 1 ||
 phi->subst() != phi) failed: missed trivial simplification
In-Reply-To: <VI1PR0201MB24793BAE30999C6DEA426E779A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24793BAE30999C6DEA426E779A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <d810eceb-ac2c-9914-be2f-a5e2b2f529fa@oracle.com>

Hi Martin,

> http://cr.openjdk.java.net/~mdoerr/8233820_C1_illegal_phi/webrev.00/

Looks good.

Best regards,
Vladimir Ivanov

From igor.veresov at oracle.com  Tue Nov 12 09:11:44 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 12 Nov 2019 01:11:44 -0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed:
 Ensure we have a compiler
In-Reply-To: <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
 <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
 <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
 <df2883da-5b3e-69af-8e1e-a87a82e1e55a@loongson.cn>
 <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com>
Message-ID: <B636F4C2-F6A2-4A67-858F-0B58F3551986@oracle.com>

Looks good to me.

igor


> On Nov 12, 2019, at 12:36 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Jie,
> 
> seems reasonable to me but Igor should have a look as well.
> 
> Thanks,
> Tobias
> 
> On 11.11.19 14:28, Jie Fu wrote:
>> Hi Tobias,
>> 
>> Thank you for your review and valuable comments.
>> Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/
>> 
>> Thanks a lot.
>> Best regards,
>> Jie
>> 
>> On 2019/11/11 3:44 PM, Tobias Hartmann wrote:
>>> Hi Jie,
>>> 
>>> what about the high-only-quick-internal mode?
>> It's really a nice catch.
>> Fixed. Thanks.
>> 
>> 
>>> While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
>>> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
>>> Maybe you can fix that as well with your patch.
>> Done.
>> 


From tobias.hartmann at oracle.com  Tue Nov 12 09:14:05 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 10:14:05 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
Message-ID: <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>

Hi Vladimir,

On 11.11.19 13:06, Vladimir Ivanov wrote:
> Yes, I'm curious whether it's enough to check that in(0) is NULL, TOP, or Proj(_, TOP). Are there
> any other important cases left?

I don't think in(0) == NULL can/should happen and I don't think there are any other cases left.

> My concern is that other usages of PhaseGVN::is_dominator() may be affected the same way as well.

The method currently has two implementations:
- PhaseIdealLoop::is_dominator
- PhaseGVN/PhaseIterGVN::is_dominator -> PhaseGVN::is_dominator_helper

Both assert is_CFG() for their arguments, so it's the caller's responsibility to ensure that.

> It looks like PhaseGVN::is_dominator_helper() would benefit from additional checks:
> 
> bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
>   if (d->is_top() || n->is_top()) {
>     return false;
>   }
>   assert(d->is_CFG() && n->is_CFG(), "must have CFG nodes");
>   ...

Do you mean simply converting the assert to a check or adding additional asserts?

> As an example, InitializeNode::detect_init_independence() does some control normalization first:

Yes but that method processes data and control edges.

Thanks,
Tobias

From fujie at loongson.cn  Tue Nov 12 09:29:57 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 12 Nov 2019 17:29:57 +0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed:
 Ensure we have a compiler
In-Reply-To: <B636F4C2-F6A2-4A67-858F-0B58F3551986@oracle.com>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
 <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
 <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
 <df2883da-5b3e-69af-8e1e-a87a82e1e55a@loongson.cn>
 <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com>
 <B636F4C2-F6A2-4A67-858F-0B58F3551986@oracle.com>
Message-ID: <86d80738-aef5-3d6d-fe40-f226575bbcb9@loongson.cn>

Thanks Igor for your review.

Hope you can sponsor it.

Thanks a lot.
Best regards,
Jie

On 2019/11/12 5:11 PM, Igor Veresov wrote:
> Looks good to me.
>
> igor
>
>
>
>> On Nov 12, 2019, at 12:36 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>
>> Hi Jie,
>>
>> seems reasonable to me but Igor should have a look as well.
>>
>> Thanks,
>> Tobias
>>
>> On 11.11.19 14:28, Jie Fu wrote:
>>> Hi Tobias,
>>>
>>> Thank you for your review and valuable comments.
>>> Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>> On 2019/11/11 3:44 PM, Tobias Hartmann wrote:
>>>> Hi Jie,
>>>>
>>>> what about the high-only-quick-internal mode?
>>> It's really a nice catch.
>>> Fixed. Thanks.
>>>
>>>
>>>> While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
>>>> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
>>>> Maybe you can fix that as well with your patch.
>>> Done.
>>>
>

From vladimir.x.ivanov at oracle.com  Tue Nov 12 09:39:31 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 12:39:31 +0300
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
Message-ID: <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>

Hi Tobias,

>> My concern is that other usages of PhaseGVN::is_dominator() may be affected the same way as well.
> 
> The method currently has two implementations:
> - PhaseIdealLoop::is_dominator
> - PhaseGVN/PhaseIterGVN::is_dominator -> PhaseGVN::is_dominator_helper
> 
> Both assert is_CFG() for their arguments, so it's the caller's responsibility to ensure that.
> 
>> It looks like PhaseGVN::is_dominator_helper() would benefit from additional checks:
>>
>> bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
>>   if (d->is_top() || n->is_top()) {
>>     return false;
>>   }
>>   assert(d->is_CFG() && n->is_CFG(), "must have CFG nodes");
>>   ...
> 
> Do you mean simply converting the assert to a check or adding additional asserts?

(Assuming in(0) == NULL or TOP or Proj(_, TOP) are the only cases we 
need to care about.)

Currently, ConstraintCastNode::dominating_cast() does NULL check, 
PhaseGVN::is_dominator_helper() does TOP check, but Proj(TOP) is left 
uncovered and it reaches d->is_CFG() check.

Your fix covers both TOP and Proj(_, TOP) on 
ConstraintCastNode::dominating_cast() by doing is_CFG() check.

What I'm in favor of is to handle Proj(TOP) case explicitly and there 
are other places in the code base which do that. (It may sound too 
subtle, but it doesn't look right when the code performs in(0)->is_CFG() 
check outside of an assert.)

I mentioned InitializeNode::detect_init_independence() as an example how 
control info processing can be done [1]. It covers NULL, TOP, and 
Proj(TOP) cases, but without is_CFG() check.

Considering current shape of ConstraintCastNode::dominating_cast() and 
that PhaseGVN::is_dominator_helper() already assumes non-NULL inputs, 
putting control normalization before TOP checks should solve the problem 
as well:

   bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
   + if (d->is_Proj())  d = d->in(0);
   + if (n->is_Proj())  n = n->in(0);
     if (d->is_top() || n->is_top()) {
       return false;
     }

Best regards,
Vladimir Ivanov

[1]

     Node* ctl = n->in(0);
     if (ctl != NULL && !ctl->is_top()) {
       if (ctl->is_Proj())  ctl = ctl->in(0);
       ...
       if (!MemNode::all_controls_dominate(n, this))

bool MemNode::all_controls_dominate(Node* dom, Node* sub) {
   if (dom == NULL || dom->is_top() || sub == NULL || sub->is_top())
     return false; // Conservative answer for dead code
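
The three dead-control shapes listed above (NULL, TOP, and Proj(_, TOP)) can be classified with a toy model outside HotSpot. This sketch only classifies a control input; it does not replace a live projection by its input. Node, Kind and is_dead_control are hypothetical names for illustration, not HotSpot's Node class or API.

```cpp
// Hypothetical toy model of a sea-of-nodes control input.
enum class Kind { Top, Start, If, Proj };

struct Node {
  Kind kind;
  const Node* in0;  // control input, as in n->in(0); may be nullptr
};

// True if ctl carries no live control: NULL, TOP, or Proj(_, TOP).
// A projection of a live node is reported as live and left untouched.
bool is_dead_control(const Node* ctl) {
  if (ctl == nullptr || ctl->kind == Kind::Top) {
    return true;
  }
  if (ctl->kind == Kind::Proj) {
    return ctl->in0 != nullptr && ctl->in0->kind == Kind::Top;
  }
  return false;
}
```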

>> As an example, InitializeNode::detect_init_independence() does some control normalization first:
> 
> Yes but that method processes data and control edges.
> 
> Thanks,
> Tobias
> 

From tobias.hartmann at oracle.com  Tue Nov 12 09:46:22 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 10:46:22 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
Message-ID: <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>


On 12.11.19 10:39, Vladimir Ivanov wrote:
> (Assuming in(0) == NULL or TOP or Proj(_, TOP) are the only cases we need to care about.)
> 
> Currently, ConstraintCastNode::dominating_cast() does NULL check, PhaseGVN::is_dominator_helper()
> does TOP check, but Proj(TOP) is left uncovered and it reaches d->is_CFG() check.
> 
> Your fix covers both TOP and Proj(_, TOP) on ConstraintCastNode::dominating_cast() by doing is_CFG()
> check.
> 
> What I'm in favor of is to handle Proj(TOP) case explicitly and there are other places in the code
> base which do that. (It may sound too subtle, but it doesn't look right when the code performs
> in(0)->is_CFG() check outside of an assert.)
> 
> I mentioned InitializeNode::detect_init_independence() as an example how control info processing can
> be done [1]. It covers NULL, TOP, and Proj(TOP) cases, but without is_CFG() check.
> 
> Considering current shape of ConstraintCastNode::dominating_cast() and that
> PhaseGVN::is_dominator_helper() already assumes non-NULL inputs, putting control normalization
> before TOP checks should solve the problem as well:
> 
>    bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
>    + if (d->is_Proj())  d = d->in(0);
>    + if (n->is_Proj())  n = n->in(0);
>      if (d->is_top() || n->is_top()) {
>        return false;
>      }

Okay, let's go with that version then:
http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/

I've verified that it still fixes the problem.

Thanks,
Tobias

From vladimir.x.ivanov at oracle.com  Tue Nov 12 09:52:08 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 12:52:08 +0300
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
 <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
Message-ID: <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>


> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/

Looks good.

Best regards,
Vladimir Ivanov

From tobias.hartmann at oracle.com  Tue Nov 12 09:52:27 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 10:52:27 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
 <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
 <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>
Message-ID: <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com>

Thanks Vladimir!

Best regards,
Tobias

On 12.11.19 10:52, Vladimir Ivanov wrote:
> 
>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/
> 
> Looks good.
> 
> Best regards,
> Vladimir Ivanov

From tobias.hartmann at oracle.com  Tue Nov 12 10:49:53 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 11:49:53 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
 <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
 <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>
 <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com>
Message-ID: <aed1473b-c4d5-fabd-431b-bae83fc8af2f@oracle.com>

Okay, this is actually not correct:
If d is an IfTrue projection, we would change d to the corresponding If node which could be a
dominator of n while the IfTrue projection is not.
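
The counterexample can be made concrete with a toy dominator check on an If diamond: If dominates both projections and everything below them, but IfTrue does not dominate a node reached only through IfFalse, so stripping the projection (d = d->in(0)) turns a correct "no" into a wrong "yes". The graph, CfgNode and both functions below are hypothetical illustrations using an immediate-dominator chain, not HotSpot's is_dominator implementation.

```cpp
// Hypothetical toy CFG node: projection flag, control input, immediate dominator.
struct CfgNode {
  bool is_proj;
  const CfgNode* in0;   // control input, as in n->in(0)
  const CfgNode* idom;  // immediate dominator
};

// d dominates n iff d appears on n's immediate-dominator chain (n included).
bool dominates(const CfgNode* d, const CfgNode* n) {
  for (const CfgNode* c = n; c != nullptr; c = c->idom) {
    if (c == d) return true;
  }
  return false;
}

// The rejected normalization: replace each projection by its input first.
bool dominates_after_stripping_proj(const CfgNode* d, const CfgNode* n) {
  if (d->is_proj) d = d->in0;
  if (n->is_proj) n = n->in0;
  return dominates(d, n);
}
```

With Start -> If -> {IfTrue, IfFalse} and a node B below IfFalse, dominates(IfTrue, B) is false, but dominates_after_stripping_proj(IfTrue, B) asks whether If dominates B and answers true.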

Best regards,
Tobias

On 12.11.19 10:52, Tobias Hartmann wrote:
> Thanks Vladimir!
> 
> Best regards,
> Tobias
> 
> On 12.11.19 10:52, Vladimir Ivanov wrote:
>>
>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/
>>
>> Looks good.
>>
>> Best regards,
>> Vladimir Ivanov

From tobias.hartmann at oracle.com  Tue Nov 12 10:58:50 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 11:58:50 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <aed1473b-c4d5-fabd-431b-bae83fc8af2f@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
 <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
 <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>
 <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com>
 <aed1473b-c4d5-fabd-431b-bae83fc8af2f@oracle.com>
Message-ID: <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com>

Here's a new version:
http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/

Thanks,
Tobias

On 12.11.19 11:49, Tobias Hartmann wrote:
> Okay, this is actually not correct:
> If d is an IfTrue projection, we would change d to the corresponding If node which could be a
> dominator of n while the IfTrue projection is not.
> 
> Best regards,
> Tobias
> 
> On 12.11.19 10:52, Tobias Hartmann wrote:
>> Thanks Vladimir!
>>
>> Best regards,
>> Tobias
>>
>> On 12.11.19 10:52, Vladimir Ivanov wrote:
>>>
>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/
>>>
>>> Looks good.
>>>
>>> Best regards,
>>> Vladimir Ivanov

From vladimir.x.ivanov at oracle.com  Tue Nov 12 11:00:31 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 14:00:31 +0300
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
 <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
 <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>
 <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com>
 <aed1473b-c4d5-fabd-431b-bae83fc8af2f@oracle.com>
 <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com>
Message-ID: <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com>


> http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/

Looks good. (And sorry for the misleading suggestion.)

Best regards,
Vladimir Ivanov

> On 12.11.19 11:49, Tobias Hartmann wrote:
>> Okay, this is actually not correct:
>> If d is an IfTrue projection, we would change d to the corresponding If node which could be a
>> dominator of n while the IfTrue projection is not.
>>
>> Best regards,
>> Tobias
>>
>> On 12.11.19 10:52, Tobias Hartmann wrote:
>>> Thanks Vladimir!
>>>
>>> Best regards,
>>> Tobias
>>>
>>> On 12.11.19 10:52, Vladimir Ivanov wrote:
>>>>
>>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/
>>>>
>>>> Looks good.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov

From tobias.hartmann at oracle.com  Tue Nov 12 11:07:40 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 12:07:40 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
 <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
 <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>
 <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com>
 <aed1473b-c4d5-fabd-431b-bae83fc8af2f@oracle.com>
 <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com>
 <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com>
Message-ID: <2bde2c43-1230-3d7b-ebd5-7befd340a801@oracle.com>

Thanks again, Vladimir.

Best regards,
Tobias

On 12.11.19 12:00, Vladimir Ivanov wrote:
> 
>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/
> 
> Looks good. (And sorry for the misleading suggestion.)
> 
> Best regards,
> Vladimir Ivanov
> 
>> On 12.11.19 11:49, Tobias Hartmann wrote:
>>> Okay, this is actually not correct:
>>> If d is an IfTrue projection, we would change d to the corresponding If node which could be a
>>> dominator of n while the IfTrue projection is not.
>>>
>>> Best regards,
>>> Tobias
>>>
>>> On 12.11.19 10:52, Tobias Hartmann wrote:
>>>> Thanks Vladimir!
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> On 12.11.19 10:52, Vladimir Ivanov wrote:
>>>>>
>>>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/
>>>>>
>>>>> Looks good.
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov

From tobias.hartmann at oracle.com  Tue Nov 12 11:13:35 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 12:13:35 +0100
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed:
 Ensure we have a compiler
In-Reply-To: <86d80738-aef5-3d6d-fe40-f226575bbcb9@loongson.cn>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
 <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
 <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
 <df2883da-5b3e-69af-8e1e-a87a82e1e55a@loongson.cn>
 <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com>
 <B636F4C2-F6A2-4A67-858F-0B58F3551986@oracle.com>
 <86d80738-aef5-3d6d-fe40-f226575bbcb9@loongson.cn>
Message-ID: <6f97cdb5-ce66-ed35-83e4-37149d2740be@oracle.com>

Hi Jie,

I've pushed the fix.

Best regards,
Tobias

On 12.11.19 10:29, Jie Fu wrote:
> Thanks Igor for your review.
> 
> Hope you can sponsor it.
> 
> Thanks a lot.
> Best regards,
> Jie
> 
> On 2019/11/12 5:11 PM, Igor Veresov wrote:
>> Looks good to me.
>>
>> igor
>>
>>
>>
>>> On Nov 12, 2019, at 12:36 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>>
>>> Hi Jie,
>>>
>>> seems reasonable to me but Igor should have a look as well.
>>>
>>> Thanks,
>>> Tobias
>>>
>>> On 11.11.19 14:28, Jie Fu wrote:
>>>> Hi Tobias,
>>>>
>>>> Thank you for your review and valuable comments.
>>>> Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/
>>>>
>>>> Thanks a lot.
>>>> Best regards,
>>>> Jie
>>>>
>>>> On 2019/11/11 3:44 PM, Tobias Hartmann wrote:
>>>>> Hi Jie,
>>>>>
>>>>> what about the high-only-quick-internal mode?
>>>> It's really a nice catch.
>>>> Fixed. Thanks.
>>>>
>>>>
>>>>> While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
>>>>> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
>>>>> Maybe you can fix that as well with your patch.
>>>> Done.
>>>>
>>

From fujie at loongson.cn  Tue Nov 12 11:18:43 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 12 Nov 2019 19:18:43 +0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed:
 Ensure we have a compiler
In-Reply-To: <6f97cdb5-ce66-ed35-83e4-37149d2740be@oracle.com>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn>
 <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
 <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
 <df2883da-5b3e-69af-8e1e-a87a82e1e55a@loongson.cn>
 <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com>
 <B636F4C2-F6A2-4A67-858F-0B58F3551986@oracle.com>
 <86d80738-aef5-3d6d-fe40-f226575bbcb9@loongson.cn>
 <6f97cdb5-ce66-ed35-83e4-37149d2740be@oracle.com>
Message-ID: <19c26acb-e5be-bba4-1ae6-bf2f2137bcd9@loongson.cn>

Thank you so much, Tobias.

On 2019/11/12 7:13 PM, Tobias Hartmann wrote:
> I've pushed the fix.


From tobias.hartmann at oracle.com  Tue Nov 12 11:20:03 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 12:20:03 +0100
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
 <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
 <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>
 <87k186rt7x.fsf@oldenburg2.str.redhat.com>
 <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>
 <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>
Message-ID: <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com>


On 12.11.19 09:05, Tobias Hartmann wrote:
> Looks good to me too. I'll run some testing and sponsor if everything passes.

All tests passed. Pushed.

Best regards,
Tobias

From christoph.goettschkes at microdoc.com  Tue Nov 12 12:07:47 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Tue, 12 Nov 2019 13:07:47 +0100
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only
 works with server VMs.
Message-ID: <mailman.6.1573560573.19479.hotspot-compiler-dev@openjdk.java.net>

Hi,

The test "compiler/codegen/TestCharVect2.java" uses the VM flag 
"MaxVectorSize" which is not defined for all supported VM configurations. 
Client VMs exit with an "Unrecognized VM option" error.

Bug: https://bugs.openjdk.java.net/browse/JDK-8231954
Webrev: https://cr.openjdk.java.net/~bulasevich/8231954/webrev.00

Igor suggested using the tag '@requires vm.flavor == "server"', but since
other tests like [1] and [2] also use my suggested approach, and I am
uncertain how the flag MaxVectorSize interacts with JVMCI, I would
currently stick with my approach and would like to get more feedback on
this topic. Also, the @requires tag would disable the test altogether,
but the first run (without the MaxVectorSize option) works in client VMs
and might be a viable test case, which is lost if the tag is added.

Thanks,
Christoph

[1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java
[2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java


From bsrbnd at gmail.com  Tue Nov 12 12:49:31 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Tue, 12 Nov 2019 13:49:31 +0100
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com>
Message-ID: <CAEgw74A4NdXGRBEyQn1VZndKg-9nPYqtyCh2pwpQktKi48rN4Q@mail.gmail.com>

Hi Vladimir Ivanov,

On Tue, 12 Nov 2019 at 10:00, Vladimir Ivanov
<vladimir.x.ivanov at oracle.com> wrote:
>
> Hi Bernard,
>
> Nice improvement!

Thanks.

> > http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>
> I don't see cases for non-constant masks John suggested covered. Have
> you tried to implement them? Any problems encountered or did you just
> leave them for future improvement?

I haven't experimented with non-constant masks yet, which is why I left
them for future improvements (as I told John).

> ====================================================================
>
> I'd like to see asserts in new MacroAssembler methods that imm8 fits
> into a byte, like:
>
> +void Assembler::btsq(Address dst, int imm8) {
> +  assert(isByte(imm8), "not a byte");
> +  InstructionMark im(this);

I'll do this.
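The assert Vladimir asks for guards the single immediate byte that carries the bit index. A minimal sketch, assuming the register-direct encodings `BTS r64, imm8` = `REX.W 0F BA /5 ib` and `BTR r64, imm8` = `REX.W 0F BA /6 ib`, and handling only `rax`..`rdi`; the names `BitOp`, `fits_in_byte` and `encode_bit_op` are hypothetical, and this is not HotSpot's `Assembler` code (the patch emits the memory-operand form):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Opcode extension placed in ModRM.reg for the 0F BA instruction group.
enum class BitOp : std::uint8_t { Bts = 5, Btr = 6 };

// The bit index is emitted as one unsigned immediate byte (ib).
inline bool fits_in_byte(int imm) { return 0 <= imm && imm <= 0xFF; }

// Encode BTS/BTR r64, imm8 for registers 0..7 (rax..rdi) only,
// so REX.B is never needed in this sketch.
std::vector<std::uint8_t> encode_bit_op(BitOp op, int reg, int imm8) {
  assert(0 <= reg && reg <= 7 && "only rax..rdi in this sketch");
  assert(fits_in_byte(imm8) && "not a byte");
  const std::uint8_t rex_w = 0x48;  // REX prefix with W=1: 64-bit operand size
  const std::uint8_t modrm = static_cast<std::uint8_t>(
      0xC0 | (static_cast<int>(op) << 3) | reg);  // mod=11: register operand
  return {rex_w, 0x0F, 0xBA, modrm, static_cast<std::uint8_t>(imm8)};
}
```

For a 64-bit operand the processor only uses the low six bits of an immediate bit index, so a stricter variant of the check could also reject values above 63.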

> ====================================================================
>
> src/hotspot/cpu/x86/x86_64.ad:
>
> +operand immPow2L()
> +%{
> +  // n should be a pure 64-bit power of 2 immediate.
> +  predicate(is_power_of_2_long(n->get_long()) && log2_long(n->get_long()) > 31);
>
> +operand immPow2NotL()
> +%{
> +  // n should be a pure 64-bit immediate given that not(n) is a power of 2.
>
> Why do you limit the optimization to bits in upper half? Is it because
> ordinary andq/orq instructions work well for the rest? If that's the
> case, it deserves a comment.

On a pure specification basis (the Intel optimization manual that Sandhya
pointed me to), AND/OR and BTR/BTS have the same latency (1 cycle), but
AND/OR has slightly better throughput. When experimenting with values up
to 32 bits, I didn't observe much difference, or only an imperceptible
one in favor of AND/OR. But with pure 64-bit values, the benefit is much
more evident because BTR/BTS replaces both a MOV and an AND/OR, which is
simply better on a specification basis (latency 1 for BTR/BTS vs.
latency 1+1 for MOV + AND/OR). So, I'll update the comments as follows:

// n should be a pure 64-bit power of 2 immediate because AND/OR works
// well enough for 8/32-bit values.
// n should be a pure 64-bit immediate given that not(n) is a power of 2
// because AND/OR works well enough for 8/32-bit values.
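The benefit only shows up for pure 64-bit constants because `AND/OR r/m64, imm32` sign-extends its 32-bit immediate. A minimal sketch of that check, assuming the simple instruction-count model from the paragraph above (one AND/OR when the mask fits, otherwise MOV + AND/OR); `fits_simm32` and `and_or_instruction_count` are illustrative names, not C2 code:

```cpp
#include <cstdint>

// True if v equals the sign extension of its low 32 bits, i.e. it can be
// used directly as the imm32 operand of a 64-bit AND/OR.
constexpr bool fits_simm32(std::int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// Simplified cost model: a mask that does not fit must first be
// materialized in a register with a MOV, then combined with AND/OR.
constexpr int and_or_instruction_count(std::int64_t mask) {
  return fits_simm32(mask) ? 1 : 2;
}

// Clearing bit 20 fits (0xffff_ffff_ffef_ffff sign-extends from an imm32),
// clearing bit 40 does not (0xffff_feff_ffff_ffff).
static_assert(fits_simm32(~(std::int64_t{1} << 20)), "bit 20 mask fits");
static_assert(!fits_simm32(~(std::int64_t{1} << 40)), "bit 40 mask needs a MOV");
```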

> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
> encode that it's > 2^32, but I would just skip it for now.)

I agree with immL_NotPow2/immL_Pow2; for the encoding, see below.

> ====================================================================
>
> +instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr) %{
> +  match(Set dst (StoreL dst (AndL (LoadL dst) src)));
>
> +instruct btsL_mem_imm(memory dst, immPow2L    src, rFlagsReg cr) %{
> +  match(Set dst (StoreL dst (OrL (LoadL dst) src)));
>
> Does it make sense to cover 32-/16-bit cases the same way? (Relates to
> the earlier question on bits from upper half.)

Given my above answer, I don't think this would make sense (but I'll
do some more experiments as part of further enhancement issues as
discussed with John).

> Do you leave out in-register updates because they don't give any
> benefits compared to andq/orq?

Same answer: I'd have to do further experiments as part of complementary
issues. Note that this one focuses on the most frequent cases, which will
likely become even more common with final fields trusted as constants.

> Also, please, use "con" instead of "src". It's easier to read when the
> name reflects that the value is a constant.

Yes, I'll do this.

> ====================================================================
>
> test/hotspot/jtreg/compiler/c2/TestBitSetAndReset.java:
>
> As I understand, the only test case which exercises new code is:
>
>     private static void test63() {
>         andq &= ~MASK63;
>         orq |= MASK63;
>     }
>
> Please, at least, add a case for MASK32.

The existing MASK31 already exercises the new code because ffff_ffff_7fff_ffff
cannot be encoded as a sign-extended 32-bit AND/OR immediate:

03c       btrq    [RSI + #16 (8-bit)], log2(not(#-2147483649))    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq

See 'immPow2NotL' predicate 'log2_long(~n->get_long()) > 30'.

Note that 8000_0000 is also a corner case, as it can be loaded with a
zero-extended 32-bit MOV, which in my experiments doesn't seem to be
worse than BTR/BTS:

044       movl    R10, #2147483648    # long (unsigned 32-bit)
04a       orq     [RSI + #24 (8-bit)], R10    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq

That's why I kept the MOV + OR sequence for it; see the 'immPow2L'
predicate 'log2_long(n->get_long()) > 31'. But I agree that a test case
for MASK32 would be nice in this situation and I'll add it.
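The two thresholds can be checked against the masks discussed above. A minimal sketch, assuming `MASKn` in the test means `1L << n` (consistent with `~MASK31` being `ffff_ffff_7fff_ffff`); `is_pow2`, `log2_exact`, `use_bts` and `use_btr` are hypothetical portable stand-ins for the predicates built on `is_power_of_2_long()`/`log2_long()`:

```cpp
#include <cassert>
#include <cstdint>

constexpr bool is_pow2(std::uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Index of the single set bit; callers must pass a power of two.
inline int log2_exact(std::uint64_t v) {
  assert(is_pow2(v));
  int n = 0;
  while (v >>= 1) n++;
  return n;
}

// OR with a single-bit mask: BTS only for bits 32..63. Bit 31
// (0x8000_0000) stays with a zero-extended MOVL + ORQ.
inline bool use_bts(std::int64_t mask) {
  const std::uint64_t u = static_cast<std::uint64_t>(mask);
  return is_pow2(u) && log2_exact(u) > 31;
}

// AND with a mask that clears a single bit: BTR for bits 31..63, since
// ~(1 << 31) = 0xffff_ffff_7fff_ffff is not a sign-extended imm32.
inline bool use_btr(std::int64_t mask) {
  const std::uint64_t inv = ~static_cast<std::uint64_t>(mask);
  return is_pow2(inv) && log2_exact(inv) > 30;
}
```

With these definitions MASK31 is covered only on the AND side, while MASK32 and MASK63 are covered on both sides.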

>  * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions
>  *                   -XX:-TieredCompilation -XX:CompileThresholdScaling=0.1 -XX:-Inline
>  *                   -XX:CompileCommand=print,compiler/c2/TestBitSetAndReset.test*
>  *                   -XX:CompileCommand=compileonly,compiler/c2/TestBitSetAndReset.test*
>  *                   compiler.c2.TestBitSetAndReset
>
> Since you explicitly disable tiered, you can just directly set the
> threshold instead (-XX:CompileThreshold=1000).
>
> -XX:-Inline is redundant and you can replace compileonly directive with
> dontinline to speed up the test.

Yes, thanks, I'll do this too.

Best regards,
Bernard

> Best regards,
> Vladimir Ivanov
> >
> > It includes:
> > * Vladimir's requested changes about instruction encoding and testing
> > along with:
> > * John's suggested complementary benchmark following Sandhya's note
> > regarding throughput.
> >
> > Thanks (hotspot:tier1 is OK on Linux/x86_64),
> > Bernard
> >
> > [1] https://bugs.openjdk.java.net/browse/JDK-8214239
> > [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
> >

From bsrbnd at gmail.com  Tue Nov 12 13:05:33 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Tue, 12 Nov 2019 14:05:33 +0100
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <6ebb5cee-04ca-4cb8-5c6a-6dc29f52d73a@oracle.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com>
 <b4d89d7e-f048-49a1-f5ac-1d202a78a2a4@oracle.com>
 <CAEgw74BMG2rCDW8u+qXbmocnChq7UuGYi99r7jy0LO0s9bHDMw@mail.gmail.com>
 <aec3fe39-cac8-df30-d77c-a8ca4ecf8a33@oracle.com>
 <6ebb5cee-04ca-4cb8-5c6a-6dc29f52d73a@oracle.com>
Message-ID: <CAEgw74BEfY2eFq8HZhXH_WzeD74WmiWHc3hjLM=vFuNh6ZkNuw@mail.gmail.com>

Thanks for testing and reviewing.

I'll do Vladimir Ivanov's non-functional changes and I'll push it
again to jdk/submit to make sure all is right.

You're welcome to do additional internal testing if necessary,
Bernard

On Tue, 12 Nov 2019 at 00:35, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
>
> Bernard,
>
> Testing passed clean. After second review it can be pushed.
>
> Thanks,
> Vladimir
>
> On 11/11/19 12:31 PM, Vladimir Kozlov wrote:
> > It is fine. I will submit our internal testing (more tiers) and let you know results.
> >
> > Vladimir
> >
> > On 11/11/19 12:24 PM, B. Blaser wrote:
> >> I pushed it to jdk/submit and all seems to be OK:
> >>
> >> http://hg.openjdk.java.net/jdk/submit/rev/cbe81ae81095
> >>
> >> Should we run more tests?
> >>
> >> Bernard

From vladimir.x.ivanov at oracle.com  Tue Nov 12 13:32:40 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 16:32:40 +0300
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74A4NdXGRBEyQn1VZndKg-9nPYqtyCh2pwpQktKi48rN4Q@mail.gmail.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com>
 <CAEgw74A4NdXGRBEyQn1VZndKg-9nPYqtyCh2pwpQktKi48rN4Q@mail.gmail.com>
Message-ID: <b2bf3707-d593-277a-7cf8-0144655ee09b@oracle.com>

Thanks for the clarifications, Bernard.

>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>>
>> I don't see cases for non-constant masks John suggested covered. Have
>> you tried to implement them? Any problems encountered or did you just
>> leave them for future improvement?
> 
> I haven't experimented with non-constant masks yet, which is why I left
> them for future improvements (as I told John).

Sounds good.


>> Why do you limit the optimization to bits in upper half? Is it because
>> ordinary andq/orq instructions work well for the rest? If that's the
>> case, it deserves a comment.
> 
> On a pure specification basis (the Intel optimization manual that Sandhya
> pointed me to), AND/OR and BTR/BTS have the same latency (1 cycle), but
> AND/OR has slightly better throughput. When experimenting with values up
> to 32 bits, I didn't observe much difference, or only an imperceptible
> one in favor of AND/OR. But with pure 64-bit values, the benefit is much
> more evident because BTR/BTS replaces both a MOV and an AND/OR, which is
> simply better on a specification basis (latency 1 for BTR/BTS vs.
> latency 1+1 for MOV + AND/OR). So, I'll update the comments as follows:
> 
> // n should be a pure 64-bit power of 2 immediate because AND/OR works
> // well enough for 8/32-bit values.
> // n should be a pure 64-bit immediate given that not(n) is a power of 2
> // because AND/OR works well enough for 8/32-bit values.

Looks good.

> 
>> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
>> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
>> encode that it's > 2^32, but I would just skip it for now.)
> 
> I agree with immL_NotPow2/immL_Pow2; for the encoding, see below.

One idea to try: you can move "log2_long(n->get_long()) > ..." check 
from operand declaration to the instruction.

operand immL_Pow2() %{
   // ...
   predicate(is_power_of_2_long(n->get_long()));
   ...

operand immL_NotPow2() %{
   // ...
   predicate(is_power_of_2_long(~n->get_long()));
   ...

instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{
   predicate(log2_long(~n->in(3)->in(2)->get_long()) > 30);
   match(Set dst (StoreL dst (AndL (LoadL dst) con)));
...

instruct btsL_mem_imm(memory dst, immL_Pow2 con, rFlagsReg cr) %{
   predicate(log2_long(n->in(3)->in(2)->get_long()) > 31);
   match(Set dst (StoreL dst (OrL (LoadL dst) con)));
...

It looks more natural (although it requires more code) to do such
operation-specific dispatching on instructions than on operands.

Best regards,
Vladimir Ivanov

From martin.doerr at sap.com  Tue Nov 12 13:54:31 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 12 Nov 2019 13:54:31 +0000
Subject: [8u] RFR for backport of 8216060 (CRC32 3/4): [PPC64] Vector CRC
 implementation should be used by interpreter and be faster for short
 arrays
In-Reply-To: <92c29733-7916-3069-b913-4c51fc5424df@linux.vnet.ibm.com>
References: <675a6c68-ca08-27ba-d9cb-8fa02efc5102@linux.vnet.ibm.com>
 <92c29733-7916-3069-b913-4c51fc5424df@linux.vnet.ibm.com>
Message-ID: <VI1PR0201MB2479A042AF81020FC61E5E8A9A770@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Gustavo,

looks good, now.

Best regards,
Martin


> -----Original Message-----
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> Sent: Tuesday, 12 November 2019 03:54
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [8u] RFR for backport of 8216060 (CRC32 3/4): [PPC64] Vector
> CRC implementation should be used by interpreter and be faster for short
> arrays
> 
> Hi Martin,
> 
> Please find v3 for CRC32 3/4 according to your last review in:
> 
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v3_/8216060/
> 
> I kept the at the CRC32C defines.
> 
> 
> Thank you & best regards,
> Gustavo

From martin.doerr at sap.com  Tue Nov 12 13:58:10 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 12 Nov 2019 13:58:10 +0000
Subject: [8u] RFR for backport of 8217459 (CRC32 4/4): [PPC64] Cleanup
 non-vector version of CRC32
In-Reply-To: <e86333d3-8991-3d9d-7923-bc37c7d1f68d@linux.vnet.ibm.com>
References: <06bc93e6-59b4-95ec-7a27-ef789ac51564@linux.vnet.ibm.com>
 <e86333d3-8991-3d9d-7923-bc37c7d1f68d@linux.vnet.ibm.com>
Message-ID: <VI1PR0201MB2479D11EE5AB51ADB1D0A3CA9A770@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Gustavo,

thanks for backporting JDK-8206173 separately.
Looks good to me.

Best regards,
Martin


> -----Original Message-----
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> Sent: Tuesday, 12 November 2019 03:59
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [8u] RFR for backport of 8217459 (CRC32 4/4): [PPC64] Cleanup
> non-vector version of CRC32
> 
> Hi Martin,
> 
> Change 8206173 is now backported to jdk8u-dev:
> 
> http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/9148fcba5de9
> 
> So could you please review v3 CRC32 4/4 according to your last review? The
> change is now PPC64-only since the shared code part was pushed with 8206173.
> 
> Please find v3 in:
> 
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v3_/8217459/
> 
> Thank you & best regards,
> Gustavo

From patric.hedlin at oracle.com  Tue Nov 12 14:16:13 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 12 Nov 2019 15:16:13 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
Message-ID: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>

Dear all,

I would like to ask for help to review the following change/update:

Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/

8220376: C2: Int >0 not recognized as !=0 for div by 0 check

    Adding a simple subsumption test to IfNode::Ideal to enable a local
    short-circuit for (obviously) redundant if-nodes.
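The subsumption idea can be illustrated outside C2. A minimal sketch, assuming both tests compare the same int value against a constant; `Op`, `Cond`, `range_of` and `subsumes` are hypothetical, and this is not the code in `IfNode::Ideal`. A dominating `x > 0` leaves only values in [1, max_jint], which excludes 0, so a dominated `x != 0` (such as the zero check before a division) is redundant:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class Op { Eq, Ne, Lt, Le, Gt, Ge };
struct Cond { Op op; std::int32_t c; };  // the test "x op c"
struct Range { std::int64_t lo, hi; };   // inclusive; empty if lo > hi

// Set of int values satisfying a condition; Ne is not an interval.
std::optional<Range> range_of(Cond k) {
  const std::int64_t mn = INT32_MIN, mx = INT32_MAX, c = k.c;
  switch (k.op) {
    case Op::Eq: return Range{c, c};
    case Op::Lt: return Range{mn, c - 1};
    case Op::Le: return Range{mn, c};
    case Op::Gt: return Range{c + 1, mx};
    case Op::Ge: return Range{c, mx};
    case Op::Ne: break;
  }
  return std::nullopt;
}

// True if knowing that `dom` holds proves that `test` holds, so the
// dominated if-node could take its true branch unconditionally.
bool subsumes(Cond dom, Cond test) {
  if (dom.op == Op::Ne) {  // conservatively: x != c only implies x != c
    return test.op == Op::Ne && test.c == dom.c;
  }
  const std::optional<Range> d = range_of(dom);
  assert(d.has_value());
  if (d->lo > d->hi) return false;  // dead dominating path: stay conservative
  if (test.op == Op::Ne) return test.c < d->lo || test.c > d->hi;
  const Range t = *range_of(test);
  return t.lo <= d->lo && d->hi <= t.hi;
}
```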

Testing: hs-tier1-4, hs-precheckin-comp


Best regards,
Patric


From bob.vandette at oracle.com  Tue Nov 12 14:33:21 2019
From: bob.vandette at oracle.com (Bob Vandette)
Date: Tue, 12 Nov 2019 09:33:21 -0500
Subject: RFR(S) : 8233900: [JVMCI] improve help text for
 EnableJVMCIProduct option
In-Reply-To: <23E9D7B9-F573-4BD7-BF7A-9434AECCE30C@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
 <A64BA248-9194-4EE3-835D-F53B962AB9EE@oracle.com>
 <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com>
 <23E9D7B9-F573-4BD7-BF7A-9434AECCE30C@oracle.com>
Message-ID: <B05679F5-BB17-4720-ADCE-4FC2461AFCEF@oracle.com>

The changes look fine.

Bob.

> On Nov 11, 2019, at 4:44 PM, Doug Simon <doug.simon at oracle.com> wrote:
> 
> Thanks for the review Vladimir.
> 
> -Doug
> 
>> On 11 Nov 2019, at 17:59, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>> 
>> Looks good. Do you need this to be backported to 11u? Add affected versions and fix version to the RFE.
>> 
>> Thanks
>> Vladimir
>> 
>>> On Nov 11, 2019, at 2:54 AM, Doug Simon <doug.simon at oracle.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Please review this change to improve the help text for EnableJVMCIProduct and related options.
>>> 
>>> https://dougxc.github.io/webrevs/8233900
>>> https://bugs.openjdk.java.net/browse/JDK-8233900
>>> 
>>> -Doug
>> 
> 


From doug.simon at oracle.com  Tue Nov 12 14:33:59 2019
From: doug.simon at oracle.com (Doug Simon)
Date: Tue, 12 Nov 2019 15:33:59 +0100
Subject: RFR(S) : 8233900: [JVMCI] improve help text for
 EnableJVMCIProduct option
In-Reply-To: <B05679F5-BB17-4720-ADCE-4FC2461AFCEF@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
 <A64BA248-9194-4EE3-835D-F53B962AB9EE@oracle.com>
 <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com>
 <23E9D7B9-F573-4BD7-BF7A-9434AECCE30C@oracle.com>
 <B05679F5-BB17-4720-ADCE-4FC2461AFCEF@oracle.com>
Message-ID: <F0563CBF-81AA-4471-B56A-FCFB7E16B2BB@oracle.com>

Thanks Bob.

-Doug

> On 12 Nov 2019, at 15:33, Bob Vandette <bob.vandette at oracle.com> wrote:
> 
> The changes look fine.
> 
> Bob.
> 
>> On Nov 11, 2019, at 4:44 PM, Doug Simon <doug.simon at oracle.com> wrote:
>> 
>> Thanks for the review Vladimir.
>> 
>> -Doug
>> 
>>> On 11 Nov 2019, at 17:59, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>> 
>>> Looks good. Do you need this to be backported to 11u? Add affected versions and fix version to the RFE.
>>> 
>>> Thanks
>>> Vladimir
>>> 
>>>> On Nov 11, 2019, at 2:54 AM, Doug Simon <doug.simon at oracle.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Please review this change to improve the help text for EnableJVMCIProduct and related options.
>>>> 
>>>> https://dougxc.github.io/webrevs/8233900
>>>> https://bugs.openjdk.java.net/browse/JDK-8233900
>>>> 
>>>> -Doug
>>> 
>> 
> 


From gromero at linux.vnet.ibm.com  Tue Nov 12 15:54:12 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 12 Nov 2019 12:54:12 -0300
Subject: [8u] RFR for backport of 8217459 (CRC32 4/4): [PPC64] Cleanup
 non-vector version of CRC32
In-Reply-To: <VI1PR0201MB2479D11EE5AB51ADB1D0A3CA9A770@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <06bc93e6-59b4-95ec-7a27-ef789ac51564@linux.vnet.ibm.com>
 <e86333d3-8991-3d9d-7923-bc37c7d1f68d@linux.vnet.ibm.com>
 <VI1PR0201MB2479D11EE5AB51ADB1D0A3CA9A770@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <ad5799b4-c3f6-10d0-098a-18b15773ed6e@linux.vnet.ibm.com>

Hi Martin,

On 11/12/2019 10:58 AM, Doerr, Martin wrote:
> Hi Gustavo,
> 
> thanks for backporting JDK-8206173 separately.
> Looks good to me.

Thanks a lot for reviewing the whole patchset. I'll proceed to get the approval
to push them, and once it's approved I'll push them all at once.


Best regards,
Gustavo

From fweimer at redhat.com  Tue Nov 12 16:02:45 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Tue, 12 Nov 2019 17:02:45 +0100
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com> (Tobias
 Hartmann's message of "Tue, 12 Nov 2019 12:20:03 +0100")
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
 <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
 <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>
 <87k186rt7x.fsf@oldenburg2.str.redhat.com>
 <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>
 <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>
 <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com>
Message-ID: <87zhh1nkuy.fsf@oldenburg2.str.redhat.com>

* Tobias Hartmann:

> On 12.11.19 09:05, Tobias Hartmann wrote:
>> Looks good to me too. I'll run some testing and sponsor if everything passes.
>
> All tests passed. Pushed.

Thanks!

Florian


From kim.barrett at oracle.com  Tue Nov 12 17:31:51 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 12 Nov 2019 12:31:51 -0500
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
 <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
 <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>
 <87k186rt7x.fsf@oldenburg2.str.redhat.com>
 <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>
 <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>
 <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com>
Message-ID: <D1D47188-F8DF-4FC2-891D-04E0FCDF92E4@oracle.com>

> On Nov 12, 2019, at 6:20 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> 
> On 12.11.19 09:05, Tobias Hartmann wrote:
>> Looks good to me too. I'll run some testing and sponsor if everything passes.
> 
> All tests passed. Pushed.
> 
> Best regards,
> Tobias

For the record, the change looked good to me too.

Did the copyright year get updated in the changed file?


From martin.doerr at sap.com  Tue Nov 12 15:40:46 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 12 Nov 2019 15:40:46 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <e87a02b4-aa32-bfb9-ce76-5c875b006d24@oracle.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
 <VI1PR0201MB24790137F2D7A941A33CE4949A660@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>
 <HE1PR0201MB2475B62202B8D5187D6C3CAC9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>
 <VI1PR0201MB2479F3BD3483ABF14F0A36109A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d147c777-b05f-3c65-90eb-d35cea0beed1@oracle.com>
 <e87a02b4-aa32-bfb9-ce76-5c875b006d24@oracle.com>
Message-ID: <VI1PR0201MB247930D368B25F07DFA2EDFE9A770@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Vladimir et al.,

thanks a lot for running these tests. I've seen that they have passed.
I haven't got any more comments, so should I push the latest version?
Are you ok with it?

Best regards,
Martin


> -----Original Message-----
> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Sent: Thursday, 7 November 2019 23:15
> To: David Holmes <david.holmes at oracle.com>; Doerr, Martin
> <martin.doerr at sap.com>; Kim Barrett <kim.barrett at oracle.com>
> Cc: dean.long at oracle.com; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> I resubmitted testing.
> 
> Vladimir
> 
> On 11/7/19 1:51 AM, David Holmes wrote:
> > On 7/11/2019 7:08 pm, Doerr, Martin wrote:
> >> Hi David,
> >>
> >> get_log only accesses the executing thread's own oop and the ones
> >> before it. So it's ensured by the algorithm that all accessed oops
> >> are in live handles.
> >
> > Okay I see that now.
> >
> > Thanks,
> > David
> >
> >> The problem is in can_remove when not holding the lock. For that,
> >> webrev.04 avoids accessing the oop of the last compiler thread in
> >> the case in which the lock is not held.
> >>
> >> Best regards,
> >> Martin
> >>
> >>
> >>> -----Original Message-----
> >>> From: David Holmes <david.holmes at oracle.com>
> >>> Sent: Wednesday, 6 November 2019 11:15
> >>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> >>> <kim.barrett at oracle.com>
> >>> Cc: dean.long at oracle.com; Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> >>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>>
> >>> Hi Martin,
> >>>
> >>> On 6/11/2019 7:12 pm, Doerr, Martin wrote:
> >>>> Hi Kim,
> >>>>
> >>>> thanks for confirming.
> >>>>
> >>>>
> >>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/
> >>>> already avoids access to freed handles.
> >>>
> >>> Sorry I missed your earlier reference to this version.
> >>>
> >>> So the expectation here is that all accesses to these arrays are guarded
> >>> by the CompileThread_lock, but that doesn't seem to hold for get_log ?
> >>>
> >>> Thanks,
> >>> David
> >>> -----
> >>>
> >>>> I don't really like the complexity of this code.
> >>>> Replacing oops in handles would have been much more simple.
> >>>> But I can live with either version.
> >>>>
> >>>> Best regards,
> >>>> Martin
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Kim Barrett <kim.barrett at oracle.com>
> >>>>> Sent: Wednesday, 6 November 2019 04:09
> >>>>> To: Doerr, Martin <martin.doerr at sap.com>
> >>>>> Cc: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com;
> >>>>> Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> >>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>>>>
> >>>>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> >>>>>
> >>>>> Coming back in, because this seems to be going off into the weeds
> >>>>> again.
> >>>>>
> >>>>>>> I don't understand what you mean. If a compiler thread holds an
> >>>>>>> oop, any oop, it must hold it in a Handle to ensure it can't be gc'd.
> >>>>>>
> >>>>>> The problem is not related to gc.
> >>>>>> My change introduces destroy_global for the handles. This means
> >>>>>> that the OopStorage portion which has held the oop can get freed.
> >>>>>> However, other compiler threads are running concurrently. They
> >>>>>> may execute code which reads the oop from the handle which is
> >>>>>> freed by this thread.
> >>>>>> Reading stale data is not a problem here, but reading freed
> >>>>>> memory may assert or even crash in general.
> >>>>>> I can't see how OopStorage supports reading from handles which
> >>>>>> were freed by destroy_global.
> >>>>>
> >>>>> So don't do that!
> >>>>>
> >>>>> OopStorage isn't magic. If you are going to look at an OopStorage
> >>>>> handle, you have to ensure there won't be concurrent deletion. Use
> >>>>> locks or some safe memory reclamation protocol. (GlobalCounter might
> >>>>> be used here, but it depends a lot on what the iterations are doing. A
> >>>>> reference counting mechanism is another possibility.) This is no
> >>>>> different from any other resource management.
> >>>>>
> >>>>>> I think it would be safe if the freeing only occurred at safepoints,
> >>>>>> but I don't think this is the case.
> >>>>>
> >>>>> Assuming the iteration didn't happen at safepoints (which is just a
> >>>>> way to make the iteration and deletion not concurrent). And I agree
> >>>>> that isn't the case with the current code.
> >>>>
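Kim's advice in the thread above (never read a handle while another thread may destroy it; make reads and deletion non-concurrent, e.g. with a lock) can be illustrated with a generic sketch. It assumes a simple table of heap-allocated slots guarded by one `std::mutex`; `GuardedSlots` and its methods are hypothetical, and this does not model `OopStorage`, `CompileThread_lock` or webrev.04. Readers copy the value out under the lock, so they can never touch a slot while another thread frees it:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

template <typename T>
class GuardedSlots {
 public:
  explicit GuardedSlots(std::size_t n) : slots_(n) {}

  // Allocate (or replace) the value held in slot i.
  void create(std::size_t i, T value) {
    std::lock_guard<std::mutex> guard(lock_);
    assert(i < slots_.size());
    slots_[i] = std::make_unique<T>(std::move(value));
  }

  // Free slot i (the analogue of destroying a handle).
  void destroy(std::size_t i) {
    std::lock_guard<std::mutex> guard(lock_);
    assert(i < slots_.size());
    slots_[i].reset();
  }

  // Copy the value out while holding the lock; an empty result means the
  // slot is not populated, which is reported instead of dereferenced.
  std::optional<T> read(std::size_t i) const {
    std::lock_guard<std::mutex> guard(lock_);
    assert(i < slots_.size());
    if (!slots_[i]) return std::nullopt;
    return *slots_[i];
  }

 private:
  mutable std::mutex lock_;
  std::vector<std::unique_ptr<T>> slots_;
};
```

A lock is the simplest choice; the thread also mentions GlobalCounter-style safe memory reclamation or reference counting, which would let readers avoid blocking at the cost of more complex deletion.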

From martin.doerr at sap.com  Tue Nov 12 15:17:58 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 12 Nov 2019 15:17:58 +0000
Subject: RFR(S): Test crashed with assert(phi->operand_count() != 1 ||
 phi->subst() != phi) failed: missed trivial simplification
In-Reply-To: <d810eceb-ac2c-9914-be2f-a5e2b2f529fa@oracle.com>
References: <VI1PR0201MB24793BAE30999C6DEA426E779A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d810eceb-ac2c-9914-be2f-a5e2b2f529fa@oracle.com>
Message-ID: <VI1PR0201MB247977001190E35A57B5A6A79A770@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Tobias and Vladimir,

thanks for the reviews. Pushed to jdk/jdk. We'll request an 11u backport later.

Best regards,
Martin


> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Tuesday, 12 November 2019 10:02
> To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-dev at openjdk.java.net'
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S): Test crashed with assert(phi->operand_count() != 1 ||
> phi->subst() != phi) failed: missed trivial simplification
> 
> Hi Martin,
> 
> > http://cr.openjdk.java.net/~mdoerr/8233820_C1_illegal_phi/webrev.00/
> 
> Looks good.
> 
> Best regards,
> Vladimir Ivanov

From vladimir.kozlov at oracle.com  Tue Nov 12 19:07:30 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 12 Nov 2019 11:07:30 -0800
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <20191112120936.1D826D285F@aojmv0009>
References: <20191112120936.1D826D285F@aojmv0009>
Message-ID: <58942EBB-CEC9-4B24-9917-0BE3C6AA96F2@oracle.com>

I agree with this fix. Reviewed.

Thanks
Vladimir

> On Nov 12, 2019, at 4:07 AM, christoph.goettschkes at microdoc.com wrote:
> 
> Hi,
> 
> The test "compiler/codegen/TestCharVect2.java" uses the VM flag 
> "MaxVectorSize" which is not defined for all supported VM configurations. 
> Client VMs exit with an "Unrecognized VM option" error.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231954
> Webrev: https://cr.openjdk.java.net/~bulasevich/8231954/webrev.00
> 
> Igor suggested using the tag '@requires vm.flavor == "server"', but since
> other tests like [1] and [2] also use my suggested approach, and I am
> uncertain how the flag MaxVectorSize interacts with JVMCI, I would
> currently stick with my approach and would like to get more feedback on
> this topic. Also, the @requires tag would disable the test altogether,
> but the first run (without the MaxVectorSize option) works in client VMs
> and might be a viable test case, which is lost if the tag is added.
> 
> Thanks,
> Christoph
> 
> [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java
> [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java
> 


From leonid.mesnik at oracle.com  Tue Nov 12 19:33:49 2019
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Tue, 12 Nov 2019 11:33:49 -0800
Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found
 because of C2 Scalar Replacement
In-Reply-To: <VI1PR02MB482960E862A276EF918D88B59B740@VI1PR02MB4829.eurprd02.prod.outlook.com>
References: <VI1PR02MB482960E862A276EF918D88B59B740@VI1PR02MB4829.eurprd02.prod.outlook.com>
Message-ID: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com>

Hi

I didn't do a complete review; I just sanity-checked your test headers. I
see a couple of potential issues with them.

1) Using -Xmx32M could cause OOME failures if the test is executed with
ZGC. I think at least 256M should be set. Could you please verify that
your tests pass with ZGC enabled?


2) I think it makes sense to add the requirement

vm.opt.TieredCompilation != true

to just skip the tests if anyone runs them with tiered compilation
disabled explicitly.

Leonid

On 11/11/19 7:29 AM, Reingruber, Richard wrote:
> Hi,
>
> I have created https://bugs.openjdk.java.net/browse/JDK-8233915
>
> In short, a set of live objects L is not found using JVMTI FollowReferences() if L is only reachable
> from a scalar replaced object in a frame of a C2 compiled method. If L happens to be a growing leak,
> then a dynamically loaded JVMTI agent (note: can_tag_objects is an always capability) for heap
> diagnostics won't discover L as live and it won't be able to find root references that lead to L.
>
> I'd like to suggest the implementation for the proposed enhancement JDK-8227745 as bug-fix.
>
> RFE:       https://bugs.openjdk.java.net/browse/JDK-8227745
> Webrev(*): http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/
>
> Please comment on the suggestion. Do you see other solutions that allow an agent to discover the
> chain of references to L?
>
> I'd like to work on the complexity as well. A significant simplification would be possible if
> scalar-replaced objects could be reallocated at safepoints (i.e. if the VM thread were allowed to call
> Deoptimization::realloc_objects()). The GC interface does not seem to allow this.
>
> Thanks, Richard.
>
> (*) Not yet accepted, because it was deemed too complex for the performance gain. Note that I was able to
>      reduce the size of webrev.1 compared to webrev.0.

From vladimir.kozlov at oracle.com  Tue Nov 12 19:35:04 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 12 Nov 2019 11:35:04 -0800
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <VI1PR0201MB247930D368B25F07DFA2EDFE9A770@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24797F385592A8C572B4064E9A900@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d8990c59-b51e-1c21-3e7c-272d287fc711@oracle.com>
 <VI1PR0201MB2479DDFCD2E55541481E65ED9A600@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
 <VI1PR0201MB247928CE867614F8272E5D719A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
 <VI1PR0201MB247921C496C30C638B0959279A7E0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com>
 <HE1PR0201MB2475B62202B8D5187D6C3CAC9A790@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>
 <VI1PR0201MB2479F3BD3483ABF14F0A36109A780@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <d147c777-b05f-3c65-90eb-d35cea0beed1@oracle.com>
 <e87a02b4-aa32-bfb9-ce76-5c875b006d24@oracle.com>
 <VI1PR0201MB247930D368B25F07DFA2EDFE9A770@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <459ded9d-f5da-bfbd-6e80-3501e1443b09@oracle.com>

On 11/12/19 7:40 AM, Doerr, Martin wrote:
> Hi Vladimir et al.,
> 
> thanks a lot for running these tests. I've seen that they have passed.
> I haven't got any more comments, so should I push the latest version?
> Are you ok with it?

Yes, please push it.

Vladimir

> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> Sent: Donnerstag, 7. November 2019 23:15
>> To: David Holmes <david.holmes at oracle.com>; Doerr, Martin
>> <martin.doerr at sap.com>; Kim Barrett <kim.barrett at oracle.com>
>> Cc: dean.long at oracle.com; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> I resubmitted testing.
>>
>> Vladimir
>>
>> On 11/7/19 1:51 AM, David Holmes wrote:
>>> On 7/11/2019 7:08 pm, Doerr, Martin wrote:
>>>> Hi David,
>>>>
>>>> get_log only accesses the executing thread's own oop and the ones before it. So it's ensured by
>>>> the algorithm that all accessed oops are in live handles.
>>>
>>> Okay I see that now.
>>>
>>> Thanks,
>>> David
>>>
>>>> The problem is in can_remove when not holding the lock. For that, webrev.04 avoids accessing the
>>>> oop of the last compiler thread in the case in which the lock is not held.
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>> Sent: Mittwoch, 6. November 2019 11:15
>>>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>>>>> <kim.barrett at oracle.com>
>>>>> Cc: dean.long at oracle.com; Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> On 6/11/2019 7:12 pm, Doerr, Martin wrote:
>>>>>> Hi Kim,
>>>>>>
>>>>>> thanks for confirming.
>>>>>>
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/
>>>>>> already avoids access to freed handles.
>>>>>
>>>>> Sorry I missed your earlier reference to this version.
>>>>>
>>>>> So the expectation here is that all accesses to these arrays are guarded
>>>>> by the CompileThread_lock, but that doesn't seem to hold for get_log ?
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> I don't really like the complexity of this code.
>>>>>> Replacing oops in handles would have been much simpler.
>>>>>> But I can live with either version.
>>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Kim Barrett <kim.barrett at oracle.com>
>>>>>>> Sent: Mittwoch, 6. November 2019 04:09
>>>>>>> To: Doerr, Martin <martin.doerr at sap.com>
>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com;
>>>>>>> Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>>
>>>>>>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>>>>>
>>>>>>> Coming back in, because this seems to be going off into the weeds again.
>>>>>>>
>>>>>>>>> I don't understand what you mean. If a compiler thread holds an oop, any
>>>>>>>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>>>>>>>
>>>>>>>> The problem is not related to gc.
>>>>>>>> My change introduces destroy_global for the handles. This means that the
>>>>>>>> OopStorage portion which has held the oop can get freed.
>>>>>>>> However, other compiler threads are running concurrently. They may
>>>>>>>> execute code which reads the oop from the handle which is freed by this
>>>>>>>> thread.
>>>>>>>> Reading stale data is not a problem here, but reading freed memory may
>>>>>>>> assert or even crash in general.
>>>>>>>> I can't see how OopStorage supports reading from handles which were
>>>>>>>> freed by destroy_global.
>>>>>>>
>>>>>>> So don't do that!
>>>>>>>
>>>>>>> OopStorage isn't magic. If you are going to look at an OopStorage
>>>>>>> handle, you have to ensure there won't be concurrent deletion. Use
>>>>>>> locks or some safe memory reclamation protocol. (GlobalCounter might
>>>>>>> be used here, but it depends a lot on what the iterations are doing. A
>>>>>>> reference counting mechanism is another possibility.) This is no
>>>>>>> different from any other resource management.
>>>>>>>
>>>>>>>> I think it would be safe if the freeing only occurred at safepoints, but I don't
>>>>>>>> think this is the case.
>>>>>>>
>>>>>>> Assuming the iteration didn't happen at safepoints (which is just a way to
>>>>>>> make the iteration and deletion not concurrent). And I agree that isn't the case with the current
>>>>>>> code.
>>>>>>
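Kim's "use locks" option can be illustrated with a minimal C++ sketch. GuardedHandles, store, release and read are hypothetical names, plain ints stand in for oops, and one mutex serializes readers and the deleter so a reader checks liveness and reads under the same lock. This is not how OopStorage or the compiler-thread code is implemented; GlobalCounter or reference counting, also mentioned above, would avoid taking a lock on the read side.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical stand-in for a set of handles (not OopStorage's API).
// release() plays the role of destroy_global(); read() is what a concurrent
// compiler thread would do. Both take the same mutex, so a reader either sees
// a live slot or learns it was released, never a slot in the middle of release.
class GuardedHandles {
 public:
  explicit GuardedHandles(std::size_t n) : values_(n, 0), live_(n, false) {}

  // Publish a value in slot i and mark it live.
  void store(std::size_t i, int value) {
    std::lock_guard<std::mutex> guard(lock_);
    values_[i] = value;
    live_[i] = true;
  }

  // Retire slot i; afterwards readers get std::nullopt.
  void release(std::size_t i) {
    std::lock_guard<std::mutex> guard(lock_);
    live_[i] = false;
    values_[i] = 0;
  }

  // Check liveness and read under the same lock.
  std::optional<int> read(std::size_t i) const {
    std::lock_guard<std::mutex> guard(lock_);
    if (!live_[i]) return std::nullopt;
    return values_[i];
  }

 private:
  mutable std::mutex lock_;
  std::vector<int> values_;
  std::vector<bool> live_;
};
```

The design choice is deliberately the simplest one: correctness comes from never letting a read overlap a release, at the cost of readers contending on the lock.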

From igor.ignatyev at oracle.com  Tue Nov 12 19:40:46 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 12 Nov 2019 11:40:46 -0800
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <20191112120936.1D826D285F@aojmv0009>
References: <20191112120936.1D826D285F@aojmv0009>
Message-ID: <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>

Hi Christoph,

we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as in most cases, it causes wasted compute time (as in this test) and can also lead to wrong/deprecated/deleted flags sneaking into the testbase, so I'd like you to reconsider your decision. below are my comments on your concerns regarding @requires approach.


> but since  other tests like [1] and [2] are also using my suggested approach

I'm sorry but this is not an argument.

> how the flag MaxVectorSize plays together with the JVMCI
as '@requires vm.flavor == "server"' filters configurations based on the VM build type, it will still allow execution on a JVM w/ JVMCI, even when the JVMCI compiler is selected, as that is still a Server VM build. So, in a sense, the test will run w/ JVMCI in the same way as w/ your approach.

> @requires tag would disable the test altogether, but the first run
this is a known limitation of jtreg/@requires; our current way to work around it is to split the test description based on @requires values, so in this case it would be:

> /**
>  * @test
>  * @bug 8001183
>  * @summary incorrect results of char vectors right shift operation
>  *
>  * @run main/othervm/timeout=400 -Xbatch -Xmx128m compiler.codegen.TestCharVect2
>  */
> /**
>  * @test
>  * @bug 8001183
>  * @summary incorrect results of char vectors right shift operation with different MaxVectorSize
>  *
>  * @comment only server VM has MaxVectorSize
>  * @requires vm.flavor == "server"
>  * @run main/othervm/timeout=400 -Xbatch -Xmx128m -XX:MaxVectorSize=8 compiler.codegen.TestCharVect2
>  * @run main/othervm/timeout=400 -Xbatch -Xmx128m -XX:MaxVectorSize=16 compiler.codegen.TestCharVect2
>  * @run main/othervm/timeout=400 -Xbatch -Xmx128m -XX:MaxVectorSize=32 compiler.codegen.TestCharVect2
>  */


Thanks,
-- Igor
 

> On Nov 12, 2019, at 4:07 AM, christoph.goettschkes at microdoc.com wrote:
> 
> Hi,
> 
> The test "compiler/codegen/TestCharVect2.java" uses the VM flag 
> "MaxVectorSize" which is not defined for all supported VM configurations. 
> Client VMs exit with an "Unrecognized VM option" error.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231954
> Webrev: https://cr.openjdk.java.net/~bulasevich/8231954/webrev.00
> 
> Igor suggested to use the tag '@requires vm.flavor == "server"', but since 
> other tests like [1] and [2] are also using my suggested approach, and I 
> am uncertain how the flag MaxVectorSize plays together with the JVMCI, I 
> would currently stick with my approach and would like to get more feedback 
> on this topic. Also, the @requires tag would disable the test altogether, 
> but the first run (without the MaxVectorSize option) works in client VMs 
> and might be a viable test case, which is lost if the tag is added.
> 
> Thanks,
> Christoph
> 
> [1] 
> https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java
> [2] 
> https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java
> 


From vladimir.kozlov at oracle.com  Tue Nov 12 19:42:59 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 12 Nov 2019 11:42:59 -0800
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
 <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
 <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>
 <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com>
 <aed1473b-c4d5-fabd-431b-bae83fc8af2f@oracle.com>
 <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com>
 <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com>
Message-ID: <cce4f864-87c0-0282-a504-95ef93948d74@oracle.com>

+1 (good)

Thanks,
Vladimir K.

On 11/12/19 3:00 AM, Vladimir Ivanov wrote:
> 
>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/
> 
> Looks good. (And sorry for the misleading suggestion.)
> 
> Best regards,
> Vladimir Ivanov
> 
>> On 12.11.19 11:49, Tobias Hartmann wrote:
>>> Okay, this is actually not correct:
>>> If d is an IfTrue projection, we would change d to the corresponding If node which could be a
>>> dominator of n while the IfTrue projection is not.
>>>
>>> Best regards,
>>> Tobias
>>>
>>> On 12.11.19 10:52, Tobias Hartmann wrote:
>>>> Thanks Vladimir!
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> On 12.11.19 10:52, Vladimir Ivanov wrote:
>>>>>
>>>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/
>>>>>
>>>>> Looks good.
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
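Tobias's objection can be checked on a toy CFG. The sketch below uses hypothetical node numbers (0 Start, 1 If, 2 IfTrue, 3 IfFalse, 4 Region, 5 n) and the textbook iterative dominator computation rather than C2's own code: the If node dominates n, but its IfTrue projection does not, so replacing d (an IfTrue) by its If could turn a correct "does not dominate" answer into a wrong "dominates".

```cpp
#include <cstddef>
#include <vector>

// Predecessor lists for a toy CFG (hypothetical numbering, not C2 nodes):
//   0 Start   -> 1 If
//   1 If      -> 2 IfTrue, 3 IfFalse
//   3 IfFalse -> 5 n
//   2 IfTrue, 5 n -> 4 Region
using Preds = std::vector<std::vector<std::size_t>>;

// Iterative dominator sets: dom[v][u] is true when u dominates v.
// Node 0 is the entry; every other node is assumed to have a predecessor.
std::vector<std::vector<bool>> dominators(const Preds& preds) {
  const std::size_t n = preds.size();
  std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
  dom[0].assign(n, false);
  dom[0][0] = true;
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t v = 1; v < n; ++v) {
      // Intersect the dominator sets of all predecessors.
      std::vector<bool> next(n, true);
      for (std::size_t p : preds[v])
        for (std::size_t u = 0; u < n; ++u) next[u] = next[u] && dom[p][u];
      next[v] = true;  // every node dominates itself
      if (next != dom[v]) {
        dom[v] = next;
        changed = true;
      }
    }
  }
  return dom;
}
```

With `Preds preds = {{}, {0}, {1}, {1}, {2, 5}, {3}};`, `dominators(preds)[5][1]` is true while `dominators(preds)[5][2]` is false.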

From fweimer at redhat.com  Tue Nov 12 20:18:32 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Tue, 12 Nov 2019 21:18:32 +0100
Subject: RFR 8233941: adlc should not generate
 Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <D1D47188-F8DF-4FC2-891D-04E0FCDF92E4@oracle.com> (Kim Barrett's
 message of "Tue, 12 Nov 2019 12:31:51 -0500")
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
 <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>
 <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
 <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>
 <87k186rt7x.fsf@oldenburg2.str.redhat.com>
 <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>
 <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>
 <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com>
 <D1D47188-F8DF-4FC2-891D-04E0FCDF92E4@oracle.com>
Message-ID: <87y2wklug7.fsf@oldenburg2.str.redhat.com>

* Kim Barrett:

>> On Nov 12, 2019, at 6:20 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>> 
>> 
>> On 12.11.19 09:05, Tobias Hartmann wrote:
>>> Looks good to me too. I'll run some testing and sponsor if everything passes.
>> 
>> All tests passed. Pushed.
>> 
>> Best regards,
>> Tobias
>
> For the record, the change looked good to me too.
>
> Did the copyright year get updated in the changed file?

No, it did not. 8-( I forgot that this is necessary in OpenJDK, and this
is a file that is only changed rarely.

What shall we do now?

Thanks,
Florian


From bsrbnd at gmail.com  Tue Nov 12 21:13:52 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Tue, 12 Nov 2019 22:13:52 +0100
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <b2bf3707-d593-277a-7cf8-0144655ee09b@oracle.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com>
 <CAEgw74A4NdXGRBEyQn1VZndKg-9nPYqtyCh2pwpQktKi48rN4Q@mail.gmail.com>
 <b2bf3707-d593-277a-7cf8-0144655ee09b@oracle.com>
Message-ID: <CAEgw74BJTRaNTq2_nzpwaZ6oS_UmDF35CVa4HapQ8+pMaHhQ_w@mail.gmail.com>

Hi Vladimir Kozlov and Ivanov,

Please review the patch, updated according to Vladimir Ivanov's comments:

http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.01/

I've pushed it to jdk/submit as second changeset on branch
"JDK-8214239" and tests are OK:

http://hg.openjdk.java.net/jdk/submit/rev/f961f7a454e4

Any feedback is welcome.

Thanks,
Bernard

On Tue, 12 Nov 2019 at 14:32, Vladimir Ivanov
<vladimir.x.ivanov at oracle.com> wrote:
>
> Thanks for the clarifications, Bernard.
>
> >>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
> >>
> >> I don't see cases for non-constant masks John suggested covered. Have
> >> you tried to implement them? Any problems encountered or did you just
> >> leave them for future improvement?
> >
> > I didn't experiment with non-constant masks yet, which is why I left
> > them for future improvements (as told to John).
>
> Sounds good.
>
>
> >> Why do you limit the optimization to bits in upper half? Is it because
> >> ordinary andq/orq instructions work well for the rest? If that's the
> >> case, it deserves a comment.
> >
> > On a pure specification basis (Intel optimization manual that Sandhya
> > pointed me to), AND/OR and BTR/BTS have the same latency=1 but a
> > slightly better throughput for the former, and when experimenting with
> > values <= 32-bit, I didn't observe much difference, or only an
> > imperceptible one in favor of AND/OR. But with pure 64-bit values, the
> > benefit is much more evident because BTR/BTS replaces both a MOV and
> > an AND/OR, which is simply better on a specification basis (latency=1 for
> > BTR/BTS vs latency=1+1 for MOV + AND/OR). So, I'll update the comments
> > as follows:
> >
> > // n should be a pure 64-bit power of 2 immediate because AND/OR works well enough for 8/32-bit values.
> > // n should be a pure 64-bit immediate given that not(n) is a power of 2 because AND/OR works well enough for 8/32-bit values.
>
> Looks good.
>
> >
> >> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
> >> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
> >> encode that it's > 2^32, but I would just skip it for now.)
> >
> > I agree with immL_NotPow2/immL_Pow2, for the encoding, see below.
>
> One idea to try: you can move the "log2_long(n->get_long()) > ..." check
> from the operand declaration to the instruction.
>
> operand immL_Pow2() %{
>    // ...
>    predicate(is_power_of_2_long(n->get_long()));
>    ...
>
> operand immL_NotPow2() %{
>    // ...
>    predicate(is_power_of_2_long(~n->get_long()));
>    ...
>
> instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{
>    predicate(log2_long(~n->in(3)->in(2)->get_long()) > 30);
>    match(Set dst (StoreL dst (AndL (LoadL dst) con)));
> ...
>
> instruct btsL_mem_imm(memory dst, immL_Pow2 con, rFlagsReg cr) %{
>    predicate(log2_long(n->in(3)->in(2)->get_long()) > 31);
>    match(Set dst (StoreL dst (OrL (LoadL dst) con)));
> ...
>
> It looks more natural (but also it requires more code) to do such
> operation-specific dispatching on instructions than on operands.
>
> Best regards,
> Vladimir Ivanov
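The operand and instruct predicates above can be exercised outside the .ad file. The following is a minimal C++ sketch, assuming hypothetical helpers is_pow2_u64, log2_u64 and fits_simm32 in place of HotSpot's is_power_of_2_long/log2_long, and reproducing the thresholds exactly as written (> 30 for BTR, > 31 for BTS). fits_simm32 shows why such constants are the interesting case: x86-64 andq/orq accept at most a sign-extended 32-bit immediate, so a 64-bit mask that does not fit would otherwise need a separate MOV.

```cpp
#include <cstdint>

// Hypothetical stand-ins for HotSpot's helpers; they work on the 64-bit
// pattern as unsigned to avoid signed overflow.
bool is_pow2_u64(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Index of the highest set bit; the caller guarantees x != 0.
int log2_u64(uint64_t x) {
  int r = -1;
  while (x != 0) {
    x >>= 1;
    ++r;
  }
  return r;
}

// True if v survives encoding as a sign-extended 32-bit immediate,
// which is what the imm32 forms of andq/orq use.
bool fits_simm32(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }

// immL_Pow2 operand plus the btsL_mem_imm threshold, as written above.
bool bts_candidate(int64_t con) {
  const uint64_t u = static_cast<uint64_t>(con);
  return is_pow2_u64(u) && log2_u64(u) > 31;
}

// immL_NotPow2 operand plus the btrL_mem_imm threshold, as written above.
bool btr_candidate(int64_t con) {
  const uint64_t u = ~static_cast<uint64_t>(con);
  return is_pow2_u64(u) && log2_u64(u) > 30;
}
```

For example, `btr_candidate(~(int64_t(1) << 31))` holds, and that mask fails `fits_simm32`, while `~(int64_t(1) << 30)` fits an imm32 and is left to the plain AND pattern.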

From vladimir.x.ivanov at oracle.com  Tue Nov 12 22:11:13 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 13 Nov 2019 01:11:13 +0300
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and
 setting long vector bits
In-Reply-To: <CAEgw74BJTRaNTq2_nzpwaZ6oS_UmDF35CVa4HapQ8+pMaHhQ_w@mail.gmail.com>
References: <CAEgw74CAZ+8uwjRoP_ti0yyyVAKQ_KMrM7mWJ4+XeqaKUbicyQ@mail.gmail.com>
 <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com>
 <CAEgw74A4NdXGRBEyQn1VZndKg-9nPYqtyCh2pwpQktKi48rN4Q@mail.gmail.com>
 <b2bf3707-d593-277a-7cf8-0144655ee09b@oracle.com>
 <CAEgw74BJTRaNTq2_nzpwaZ6oS_UmDF35CVa4HapQ8+pMaHhQ_w@mail.gmail.com>
Message-ID: <534e81e9-9eac-6631-805a-7e5616b2f940@oracle.com>


> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.01/

Looks good.

PS: it would be nice to be able to directly reference instruction
arguments from predicates (similar to how it is supported in the
ins_encode section).

Then:

+  predicate(log2_long(~n->in(3)->in(2)->get_long()) > 30);
+  predicate(log2_long( n->in(3)->in(2)->get_long()) > 31);

could be turned into:

+  predicate(log2_long(~$con->get_long()) > 30);
+  predicate(log2_long( $con->get_long()) > 31);

or even:

+  predicate(log2_long(~$con$$constant) > 30);
+  predicate(log2_long( $con$$constant) > 31);

Best regards,
Vladimir Ivanov

> 
> I've pushed it to jdk/submit as second changeset on branch
> "JDK-8214239" and tests are OK:
> 
> http://hg.openjdk.java.net/jdk/submit/rev/f961f7a454e4
> 
> Any feedback is welcome.
> 
> Thanks,
> Bernard
> 
> On Tue, 12 Nov 2019 at 14:32, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com> wrote:
>>
>> Thanks for the clarifications, Bernard.
>>
>>>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>>>>
>>>> I don't see cases for non-constant masks John suggested covered. Have
>>>> you tried to implement them? Any problems encountered or did you just
>>>> leave them for future improvement?
>>>
>>> I didn't experiment with non-constant masks yet, which is why I left
>>> them for future improvements (as told to John).
>>
>> Sounds good.
>>
>>
>>>> Why do you limit the optimization to bits in upper half? Is it because
>>>> ordinary andq/orq instructions work well for the rest? If that's the
>>>> case, it deserves a comment.
>>>
>>> On a pure specification basis (Intel optimization manual that Sandhya
>>> pointed me to), AND/OR and BTR/BTS have the same latency=1 but a
>>> slightly better throughput for the former, and when experimenting with
>>> values <= 32-bit, I didn't observe much difference, or only an
>>> imperceptible one in favor of AND/OR. But with pure 64-bit values, the
>>> benefit is much more evident because BTR/BTS replaces both a MOV and
>>> an AND/OR, which is simply better on a specification basis (latency=1 for
>>> BTR/BTS vs latency=1+1 for MOV + AND/OR). So, I'll update the comments
>>> as follows:
>>>
>>> // n should be a pure 64-bit power of 2 immediate because AND/OR works well enough for 8/32-bit values.
>>> // n should be a pure 64-bit immediate given that not(n) is a power of 2 because AND/OR works well enough for 8/32-bit values.
>>
>> Looks good.
>>
>>>
>>>> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
>>>> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
>>>> encode that it's > 2^32, but I would just skip it for now.)
>>>
>>> I agree with immL_NotPow2/immL_Pow2, for the encoding, see below.
>>
>> One idea to try: you can move the "log2_long(n->get_long()) > ..." check
>> from the operand declaration to the instruction.
>>
>> operand immL_Pow2() %{
>>     // ...
>>     predicate(is_power_of_2_long(n->get_long()));
>>     ...
>>
>> operand immL_NotPow2() %{
>>     // ...
>>     predicate(is_power_of_2_long(~n->get_long()));
>>     ...
>>
>> instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{
>>     predicate(log2_long(~n->in(3)->in(2)->get_long()) > 30);
>>     match(Set dst (StoreL dst (AndL (LoadL dst) con)));
>> ...
>>
>> instruct btsL_mem_imm(memory dst, immL_Pow2 con, rFlagsReg cr) %{
>>     predicate(log2_long(n->in(3)->in(2)->get_long()) > 31);
>>     match(Set dst (StoreL dst (OrL (LoadL dst) con)));
>> ...
>>
>> It looks more natural (but also it requires more code) to do such
>> operation-specific dispatching on instructions than on operands.
>>
>> Best regards,
>> Vladimir Ivanov

From claes.redestad at oracle.com  Tue Nov 12 23:09:44 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 13 Nov 2019 00:09:44 +0100
Subject: RFR: 8234003: Improve IndexSet iteration
Message-ID: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com>

Hi,

a significant portion of work done during register allocation in C2 is
iterating over IndexSets.

A few small optimizations show a ~4% decrease in instructions retired by
register allocation when instrumenting, and up to 3% fewer instructions
retired in total on startup tests.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8234003
Webrev: http://cr.openjdk.java.net/~redestad/8234003/open.00/

The biggest improvement comes from avoiding iterating over empty sets
altogether. A smaller improvement comes from adding a water mark to avoid
iterating over all the blocks in the IndexSet.

Testing: tier1-3, verified improvements on large and tiny startup tests,
checked that any increased inlining is footprint neutral on Linux.

Thanks!

/Claes
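The two IndexSet ideas can be illustrated with a simplified blocked bit set. This is a hypothetical sketch, not HotSpot's IndexSet: a zero element count short-circuits iteration entirely, and a high-water mark bounds the scan to blocks that were ever written. Removal is omitted for brevity, so the water mark only grows.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified blocked bit set (not HotSpot's IndexSet).
class BlockedBitSet {
 public:
  explicit BlockedBitSet(std::size_t max_elements)
      : blocks_((max_elements + 63) / 64, 0) {}

  // Add element e; tracks the element count and the highest block used.
  void insert(std::size_t e) {
    const std::size_t b = e / 64;
    const uint64_t bit = uint64_t(1) << (e % 64);
    if ((blocks_[b] & bit) == 0) {
      blocks_[b] |= bit;
      ++count_;
      if (b + 1 > watermark_) watermark_ = b + 1;
    }
  }

  // Calls f(e) for each element in increasing order and returns the
  // number of blocks scanned, to make the savings visible.
  template <typename F>
  std::size_t for_each(F f) const {
    if (count_ == 0) return 0;  // empty set: no block is touched at all
    std::size_t scanned = 0;
    for (std::size_t b = 0; b < watermark_; ++b) {  // never scan past the water mark
      ++scanned;
      const uint64_t word = blocks_[b];
      if (word == 0) continue;
      for (unsigned i = 0; i < 64; ++i)
        if (word & (uint64_t(1) << i)) f(b * 64 + i);
    }
    return scanned;
  }

 private:
  std::vector<uint64_t> blocks_;
  std::size_t count_ = 0;
  std::size_t watermark_ = 0;  // blocks at index >= watermark_ are all empty
};
```

With room for 6400 elements (100 blocks) but only elements 3 and 70 inserted, iteration scans 2 blocks instead of 100, and an empty set scans none.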


From igor.ignatyev at oracle.com  Tue Nov 12 23:24:38 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 12 Nov 2019 15:24:38 -0800
Subject: RFR(T) : 8226795 : compiler/tiered/Level2RecompilationTest.java fails
 when XX:TieredStopAtLevel=1/2/3 is set
Message-ID: <E2288872-8982-4F4F-BDB4-C3D609689471@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html
> 13 lines changed: 12 ins; 0 del; 1 mod; 

Hi all,

could you please review this small and trivial patch which adds @requires to exclude Level2RecompilationTest, OSRFailureLevel4Test, and TestTypeProfiling tests from execution w/ non default values of TieredStopAtLevel?
the tests expect to have tiered compilation enabled and level 4 compiler available, so they can't be run w/ TieredStopAtLevel < 4.

webrev: http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html
JBS: https://bugs.openjdk.java.net/browse/JDK-8226795
testing: verified that changed tests are run w/o TieredStopAtLevel, w/ TieredStopAtLevel=4 and aren't run w/ TieredStopAtLevel != 4

Thanks,
-- Igor


From claes.redestad at oracle.com  Tue Nov 12 23:47:59 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 13 Nov 2019 00:47:59 +0100
Subject: RFR(T) : 8226795 : compiler/tiered/Level2RecompilationTest.java
 fails when XX:TieredStopAtLevel=1/2/3 is set
In-Reply-To: <E2288872-8982-4F4F-BDB4-C3D609689471@oracle.com>
References: <E2288872-8982-4F4F-BDB4-C3D609689471@oracle.com>
Message-ID: <d1acf026-e22a-9c3c-2e57-abfd28a8adaa@oracle.com>

Hi Igor,

fix looks good to me.

A more open question: do tests that need to specifically compile with C2
also need to be made aware of the recently added
-XX:CompilationMode=quick-only flag? IIRC this would disable C2, but not
in exactly the same way as TieredStopAtLevel=1/2/3 would
(TieredStopAtLevel is ergonomically set to 4 in this case).

/Claes

On 2019-11-13 00:24, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html
>> 13 lines changed: 12 ins; 0 del; 1 mod;
> 
> Hi all,
> 
> could you please review this small and trivial patch which adds @requires to exclude Level2RecompilationTest, OSRFailureLevel4Test, and TestTypeProfiling tests from execution w/ non default values of TieredStopAtLevel?
> the tests expect to have tiered compilation enabled and level 4 compiler available, so they can't be run w/ TieredStopAtLevel < 4.
> 
> webrev: http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html
> JBS: https://bugs.openjdk.java.net/browse/JDK-8226795
> testing: verified that changed tests are run w/o TieredStopAtLevel, w/ TieredStopAtLevel=4 and aren't run w/ TieredStopAtLevel != 4
> 
> Thanks,
> -- Igor
> 

From igor.ignatyev at oracle.com  Tue Nov 12 23:52:55 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 12 Nov 2019 15:52:55 -0800
Subject: RFR(T) : 8226795 : compiler/tiered/Level2RecompilationTest.java
 fails when XX:TieredStopAtLevel=1/2/3 is set
In-Reply-To: <d1acf026-e22a-9c3c-2e57-abfd28a8adaa@oracle.com>
References: <E2288872-8982-4F4F-BDB4-C3D609689471@oracle.com>
 <d1acf026-e22a-9c3c-2e57-abfd28a8adaa@oracle.com>
Message-ID: <68140795-6949-4B4B-9108-13F656F504E9@oracle.com>


> On Nov 12, 2019, at 3:47 PM, Claes Redestad <claes.redestad at oracle.com> wrote:
> 
> Hi Igor,
> 
> fix looks good to me.
thanks.
> 
> A more open question: do tests that need to specifically compile with C2
> also need to be made aware of the recently added
> -XX:CompilationMode=quick-only flag? IIRC this would disable C2, but not
> in exactly the same way as TieredStopAtLevel=1/2/3 would
> (TieredStopAtLevel is ergonomically set to 4 in this case).
I think they do, I'll file an RFE to improve how we mark c2-only tests.

> 
> /Claes
> 
> On 2019-11-13 00:24, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html
>>> 13 lines changed: 12 ins; 0 del; 1 mod;
>> Hi all,
>> could you please review this small and trivial patch which adds @requires to exclude Level2RecompilationTest, OSRFailureLevel4Test, and TestTypeProfiling tests from execution w/ non default values of TieredStopAtLevel?
>> the tests expect to have tiered compilation enabled and level 4 compiler available, so they can't be run w/ TieredStopAtLevel < 4.
>> webrev: http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8226795
>> testing: verified that changed tests are run w/o TieredStopAtLevel, w/ TieredStopAtLevel=4 and aren't run w/ TieredStopAtLevel != 4
>> Thanks,
>> -- Igor


From igor.ignatyev at oracle.com  Wed Nov 13 04:18:15 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 12 Nov 2019 20:18:15 -0800
Subject: RFR(S) : 8225756 : [testbug]
 compiler/loopstripmining/CheckLoopStripMining.java sets too short a
 SafepointTimeoutDelay
Message-ID: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>

http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/
> 34 lines changed: 24 ins; 3 del; 7 mod;

Hi all,

could you please review this patch, which adjusts SafepointTimeoutDelay and GuaranteedSafepointInterval in the CheckLoopStripMining test according to the timeout factor?

webrev: http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8225756
testing: the changed test on windows-x64-debug (where all the failures were seen) 100 times

Thanks,
-- Igor 
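Scaling a delay flag by the timeout factor is simple arithmetic; a hypothetical C++ helper is sketched below. The base values used with it are made up, since the test's actual constants are not quoted in this thread; jtreg normally exposes the factor to tests as the `test.timeout.factor` system property. Both flags take values in milliseconds.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: scale a base delay in milliseconds by the jtreg
// timeout factor and format it as a -XX option for a child VM.
std::string scaled_ms_flag(const std::string& name, long base_ms, double timeout_factor) {
  const long scaled = std::lround(static_cast<double>(base_ms) * timeout_factor);
  return "-XX:" + name + "=" + std::to_string(scaled);
}
```

For instance, a made-up base of 300 ms with a factor of 4 yields `-XX:SafepointTimeoutDelay=1200`, so slow configurations such as windows-x64-debug get proportionally more headroom.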

From vladimir.kozlov at oracle.com  Wed Nov 13 05:05:34 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 12 Nov 2019 21:05:34 -0800
Subject: RFR(S) : 8225756 : [testbug]
 compiler/loopstripmining/CheckLoopStripMining.java sets too short a
 SafepointTimeoutDelay
In-Reply-To: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
Message-ID: <a33632e7-4f34-8375-91b6-09f7857c580d@oracle.com>

Good.

thanks,
Vladimir

On 11/12/19 8:18 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/
>> 34 lines changed: 24 ins; 3 del; 7 mod;
> 
> Hi all,
> 
> could you please review this patch, which adjusts SafepointTimeoutDelay and GuaranteedSafepointInterval in the CheckLoopStripMining test according to the timeout factor?
> 
> webrev: http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8225756
> testing: the changed test on windows-x64-debug (where all the failures were seen) 100 times
> 
> Thanks,
> -- Igor
> 

From ekaterina.pavlova at oracle.com  Wed Nov 13 05:54:35 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Tue, 12 Nov 2019 21:54:35 -0800
Subject: RFR(S) : 8225756 : [testbug]
 compiler/loopstripmining/CheckLoopStripMining.java sets too short a
 SafepointTimeoutDelay
In-Reply-To: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
Message-ID: <6a25864e-40c2-d8b6-e166-1ec39b81639b@oracle.com>

Looks good.

thanks,
-katya

On 11/12/19 8:18 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/
>> 34 lines changed: 24 ins; 3 del; 7 mod;
> 
> Hi all,
> 
> could you please review this patch, which adjusts SafepointTimeoutDelay and GuaranteedSafepointInterval in the CheckLoopStripMining test according to the timeout factor?
> 
> webrev: http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8225756
> testing: the changed test on windows-x64-debug (where all the failures were seen) 100 times
> 
> Thanks,
> -- Igor
> 


From tobias.hartmann at oracle.com  Wed Nov 13 07:40:51 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 13 Nov 2019 08:40:51 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed:
 must have CFG nodes
In-Reply-To: <cce4f864-87c0-0282-a504-95ef93948d74@oracle.com>
References: <b0d4f5db-1e2b-ec7a-b6e0-dced28e85a44@oracle.com>
 <bf5d6250-abcc-e820-7886-a966fae1af9e@oracle.com>
 <af8bacdd-09d8-3c11-6444-ee51a6afc0a4@oracle.com>
 <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
 <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>
 <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
 <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
 <be05a933-f91b-840e-1e0b-a1d412d5ca71@oracle.com>
 <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com>
 <aed1473b-c4d5-fabd-431b-bae83fc8af2f@oracle.com>
 <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com>
 <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com>
 <cce4f864-87c0-0282-a504-95ef93948d74@oracle.com>
Message-ID: <755ad6bd-e29a-3295-329f-452d695515d6@oracle.com>

Thanks Vladimir.

Best regards,
Tobias

On 12.11.19 20:42, Vladimir Kozlov wrote:
> +1 (good)
> 
> Thanks,
> Vladimir K.
> 
> On 11/12/19 3:00 AM, Vladimir Ivanov wrote:
>>
>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/
>>
>> Looks good. (And sorry for the misleading suggestion.)
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>> On 12.11.19 11:49, Tobias Hartmann wrote:
>>>> Okay, this is actually not correct:
>>>> If d is an IfTrue projection, we would change d to the corresponding If node which could be a
>>>> dominator of n while the IfTrue projection is not.
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> On 12.11.19 10:52, Tobias Hartmann wrote:
>>>>> Thanks Vladimir!
>>>>>
>>>>> Best regards,
>>>>> Tobias
>>>>>
>>>>> On 12.11.19 10:52, Vladimir Ivanov wrote:
>>>>>>
>>>>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/
>>>>>>
>>>>>> Looks good.
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov

From rwestrel at redhat.com  Wed Nov 13 09:19:44 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 13 Nov 2019 10:19:44 +0100
Subject: RFR(S) : 8225756 : [testbug]
 compiler/loopstripmining/CheckLoopStripMining.java sets too short a
 SafepointTimeoutDelay
In-Reply-To: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
Message-ID: <87a790dtfz.fsf@redhat.com>


Looks reasonable to me.

Thanks for fixing this, Igor.

Roland.


From Pengfei.Li at arm.com  Wed Nov 13 09:55:48 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Wed, 13 Nov 2019 09:55:48 +0000
Subject: RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
Message-ID: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi,

JBS: https://bugs.openjdk.java.net/browse/JDK-8233743
Webrev: http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.00/

This is a follow-up patch of JDK-8217909[1] to make the AArch64 register
r27 allocatable when CompressedOops and CompressedClassPointers are both
turned off.

The following changes have been made:
- Massage the RegMask(s) in reg_mask_init() at C2 initialization and
remove r27 from some of the masks conditionally to make it allocatable.
- Also make r29 conditionally reserved in a similar way.
- Make r29 allocatable for pointers as well as integers.
- Replace a use of rheapbase with rscratch1 in AArch64 ZGC.
- Revert JDK-8231754[2], which made r27 always reserved in JVMCI.

This patch aligns with the implementation in [1] which makes the x86_64
r12 register allocatable. Please let me know if I have missed anything
for AArch64.
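The conditional reservation described above can be modeled with a plain bitmask. This is a hypothetical Java sketch, not HotSpot's RegMask API. It assumes that r27 must stay reserved whenever either compressed oops or compressed class pointers are enabled, as stated in the description. It also assumes that r29 is reserved only under PreserveFramePointer, which is an assumption about what "conditionally reserved" means here. `baseMask` stands for whatever is allocatable before this step.

```java
// Hypothetical model of conditionally reserving AArch64 registers; not HotSpot's RegMask API.
final class RegMaskSketch {
    static final int R27 = 27; // heap base register (rheapbase)
    static final int R29 = 29; // frame pointer

    // Start from baseMask and remove the registers that must stay reserved
    // under the given flag combination.
    static long allocatableMask(long baseMask, boolean useCompressedOops,
                                boolean useCompressedClassPointers, boolean preserveFramePointer) {
        long mask = baseMask;
        if (useCompressedOops || useCompressedClassPointers) {
            mask &= ~(1L << R27); // r27 is still needed as the heap base
        }
        if (preserveFramePointer) {
            mask &= ~(1L << R29); // r29 must keep holding the frame pointer (assumption)
        }
        return mask;
    }

    static boolean contains(long mask, int reg) {
        return (mask & (1L << reg)) != 0;
    }
}
```

The masks are computed once, at compiler initialization, from the VM flags. The matcher can then use them unchanged for every compilation.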

Tests:
Full jtreg run with default options and with the extra options "-XX:-UseCompressedOops
-XX:+PreserveFramePointer". No new failures were found.

[1] https://hg.openjdk.java.net/jdk/jdk/rev/48b50573dee4
[2] https://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de

--
Thanks,
Pengfei


From martin.doerr at sap.com  Wed Nov 13 11:37:05 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 13 Nov 2019 11:37:05 +0000
Subject: RFR(S) : 8225756 : [testbug]
 compiler/loopstripmining/CheckLoopStripMining.java sets too short a
 SafepointTimeoutDelay
In-Reply-To: <87a790dtfz.fsf@redhat.com>
References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
 <87a790dtfz.fsf@redhat.com>
Message-ID: <VI1PR0201MB247990D2851CF64E7F0BF5FA9A760@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Igor,

thanks for improving it.

Please note that this test was derived from
test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java
JDK-8227528 added -XX:-UseBiasedLocking to that test.

Would you mind adding that to your new version, too?

Please also remove the double whitespace before Utils.adjustTimeout(500).

Thanks and best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Roland Westrelin
> Sent: Wednesday, November 13, 2019 10:20
> To: Igor Ignatyev <igor.ignatyev at oracle.com>; hotspot compiler <hotspot-
> compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S) : 8225756 : [testbug]
> compiler/loopstripmining/CheckLoopStripMining.java sets too short a
> SafepointTimeoutDelay
> 
> 
> Looks reasonable to me.
> 
> Thanks for fixing this, Igor.
> 
> Roland.


From richard.reingruber at sap.com  Wed Nov 13 12:24:41 2019
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Wed, 13 Nov 2019 12:24:41 +0000
Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found
 because of C2 Scalar Replacement
In-Reply-To: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com>
References: <VI1PR02MB482960E862A276EF918D88B59B740@VI1PR02MB4829.eurprd02.prod.outlook.com>
 <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com>
Message-ID: <VI1PR02MB4829B0C8CCBBA84E920B924D9B760@VI1PR02MB4829.eurprd02.prod.outlook.com>

Hi Leonid,

these are valid points. Thanks for making me aware of them.

I've increased the maximum heap size in my tests as suggested, and I've also run them with ZGC
enabled.

I've also added the vm.opt.TieredCompilation != true requirement.

I've done the changes in place.

Thanks, Richard.

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Leonid Mesnik
Sent: Tuesday, November 12, 2019 20:34
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement

Hi

I haven't done a complete review; I just sanity-checked your test headers. I
see a couple of potential issues with them.

1) Using -Xmx32M could cause OOME failures if the test is executed with
ZGC. I think that at least 256M should be set. Could you please verify
that your tests pass with ZGC enabled.


2) I think it makes sense to add the requirement

vm.opt.TieredCompilation != true

to just skip tests if anyone runs them with tiered compilation disabled 
explicitly.

Leonid

On 11/11/19 7:29 AM, Reingruber, Richard wrote:
> Hi,
>
> I have created https://bugs.openjdk.java.net/browse/JDK-8233915
>
> In short, a set of live objects L is not found using JVMTI FollowReferences() if L is only reachable
> from a scalar replaced object in a frame of a C2 compiled method. If L happens to be a growing leak,
> then a dynamically loaded JVMTI agent (note: can_tag_objects is an always capability) for heap
> diagnostics won't discover L as live and it won't be able to find root references that lead to L.
>
> I'd like to suggest the implementation for the proposed enhancement JDK-8227745 as bug-fix.
>
> RFE:       https://bugs.openjdk.java.net/browse/JDK-8227745
> Webrev(*): http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/
>
> Please comment on the suggestion. Do you see other solutions that allow an agent to discover the
> chain of references to L?
>
> I'd like to work on the complexity as well. One significant simplification would be possible if
> scalar replaced objects could be reallocated at safepoints (i.e. if the VM thread were allowed to call
> Deoptimization::realloc_objects()). The GC interface does not seem to allow this.
>
> Thanks, Richard.
>
> (*) Not yet accepted, because deemed too complex for the performance gain. Note that I was able to
>      reduce webrev.1 in size compared to webrev.0

From christoph.goettschkes at microdoc.com  Wed Nov 13 12:42:58 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Wed, 13 Nov 2019 13:42:58 +0100
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
Message-ID: <mailman.7.1573649079.19479.hotspot-compiler-dev@openjdk.java.net>

Hi Igor,

thanks for your explanation.

Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:

> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
> in most cases, it causes wasted compute time (as in this test) and can
> also lead to wrong/deprecated/deleted flags sneaking into the testbase

Agreed. I also wanted to discuss this, since I think that your solution
is better than mine, but at the same time, I saw possible problems with
it, see below.

> as '@requires vm.flavor == "server"' filters configurations based on vm
> build type, it will still allow execution on JVM w/ JVMCI and when JVMCI
> compiler is selected, as it will still be Server VM build. so, in a
> sense, the test will be w/ JVMCI in the same way as w/ your approach.

My concern is not about server VMs with JVMCI, but client VMs with JVMCI
enabled. Is this a valid configuration? The MaxVectorSize option is
defined in [1] as well as in [2], so for me it looks like MaxVectorSize
can be used for any VM variant as long as JVMCI is enabled. The
configure script also states that both compilers are possible (if you
configure with --with-jvm-features='jvmci'):

configure: error: Specified JVM feature 'jvmci' requires feature
'compiler2' or 'compiler1'

Should the requires tag "vm.jvmci" maybe be used as well, like:

    @requires vm.flavor == "server" | vm.jvmci

> this is the known limitation of jtreg/@requires, and our current way to
> work around it is to split a test description based on @requires values

Yes, if the @requires tag is used, splitting up the test looks like a good
idea. I didn't know that it is possible to have multiple test descriptions
in one test file.

I created a new webrev with the new ideas:

https://cr.openjdk.java.net/~cgo/8231954/webrev.01/

I tested with an amd64 client and server VM and it looks good. I am
currently unable to build a client VM with JVMCI enabled, hence no test
for that yet. I get compile errors and as soon as I resolve those,
runtime errors occur. Before I look into that, I would like to know if
client VMs with JVMCI enabled are supported or not.

Thanks,
Christoph

[1]
https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp

[2]
https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp


From nils.eliasson at oracle.com  Wed Nov 13 16:08:10 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 13 Nov 2019 17:08:10 +0100
Subject: RFR: 8234003: Improve IndexSet iteration
In-Reply-To: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com>
References: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com>
Message-ID: <7edbcedf-b24c-fd8b-9aac-f3a87aaf1134@oracle.com>

Hi Claes,

Thanks for cleaning up some of the surrounding code too.

Looks good,

Nils Eliasson

On 2019-11-13 00:09, Claes Redestad wrote:
> Hi,
>
> a significant portion of work done during register allocation in C2 is
> iterating over IndexSets.
>
> A few small optimizations show a ~4% decrease in instructions retired by
> register allocation when instrumenting, and up to 3% fewer instructions
> retired in total on startup tests.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234003
> Webrev: http://cr.openjdk.java.net/~redestad/8234003/open.00/
>
> The biggest improvement comes from avoiding iterating over empty sets
> altogether. A smaller improvement comes from adding a watermark to avoid
> iterating over all the blocks in the IndexSet.
>
> Testing: tier1-3, verified improvements on large and tiny startup tests,
> checked that any increased inlining is footprint neutral on Linux.
>
> Thanks!
>
> /Claes
>
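The two optimizations Claes describes can be illustrated with a small blocked bit set. This is a hypothetical Java model, not C2's IndexSet: the block size, field names, and layout are assumptions. An element count lets iteration over an empty set return immediately. A high-water mark records the highest block ever populated, so the scan stops there instead of walking every block. Elements are assumed to lie in `[0, maxElements)`.

```java
import java.util.function.IntConsumer;

// Hypothetical model of a sparse bit set split into fixed-size blocks; not C2's IndexSet.
final class BlockedIndexSet {
    private static final int BITS_PER_BLOCK = 256;
    private static final int WORDS_PER_BLOCK = BITS_PER_BLOCK / 64;

    private final long[][] blocks; // null means the block is empty and unallocated
    private int count;             // number of elements, used to skip empty sets
    private int maxBlock = -1;     // water mark: highest block index ever populated

    BlockedIndexSet(int maxElements) {
        blocks = new long[(maxElements + BITS_PER_BLOCK - 1) / BITS_PER_BLOCK][];
    }

    // Adds element (0 <= element < maxElements); returns false if it was already present.
    boolean insert(int element) {
        int b = element / BITS_PER_BLOCK;
        int bit = element % BITS_PER_BLOCK;
        if (blocks[b] == null) {
            blocks[b] = new long[WORDS_PER_BLOCK];
        }
        long[] words = blocks[b];
        long m = 1L << (bit & 63);
        if ((words[bit >>> 6] & m) != 0) {
            return false;
        }
        words[bit >>> 6] |= m;
        count++;
        if (b > maxBlock) {
            maxBlock = b;
        }
        return true;
    }

    // Visits elements in ascending order.
    void forEach(IntConsumer action) {
        if (count == 0) {
            return; // empty set: skip the block scan entirely
        }
        for (int b = 0; b <= maxBlock; b++) { // stop at the water mark, not at blocks.length
            long[] words = blocks[b];
            if (words == null) {
                continue;
            }
            for (int w = 0; w < WORDS_PER_BLOCK; w++) {
                long bits = words[w];
                while (bits != 0) {
                    action.accept(b * BITS_PER_BLOCK + w * 64 + Long.numberOfTrailingZeros(bits));
                    bits &= bits - 1; // clear the lowest set bit
                }
            }
        }
    }
}
```

In a register allocator, many live sets are empty or touch only low indices. Both checks therefore cut work without changing which elements are visited.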

From nils.eliasson at oracle.com  Wed Nov 13 16:12:57 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 13 Nov 2019 17:12:57 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
Message-ID: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com>

Hi Patric,

Looks good!

(I have pre-reviewed this patch offline)

Regards,

Nils

On 2019-11-12 15:16, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>
>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>     short-circuit for (obviously) redundant if-nodes.
>
> Testing: hs-tier1-4, hs-precheckin-comp
>
>
> Best regards,
> Patric
>
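The subsumption idea can be modeled as interval reasoning over a single int value. In `if (x > 0) { return y / x; }`, the dominating test `x > 0` already implies the implicit `x != 0` division check. This Java sketch is a simplified model and not the logic in IfNode::Ideal. It assumes that both tests compare the same int value x against constants, and the `Cmp` and `IfSubsumption` names are hypothetical.

```java
// Hypothetical model: signed comparisons of one int value x against constants.
enum Cmp { EQ, NE, LT, LE, GT, GE }

final class IfSubsumption {
    // Closed interval [lo, hi] of x for which "x cmp c" holds; null for NE, which is not an interval.
    static long[] range(Cmp cmp, int c) {
        long lo = Integer.MIN_VALUE;
        long hi = Integer.MAX_VALUE;
        switch (cmp) {
            case EQ: lo = c; hi = c; break;
            case LT: hi = (long) c - 1; break;
            case LE: hi = c; break;
            case GT: lo = (long) c + 1; break;
            case GE: lo = c; break;
            default: return null; // NE
        }
        return new long[] {lo, hi};
    }

    // On the true path of "x dom dc", decide "x test tc".
    // Returns TRUE or FALSE if the dominated test is redundant, or null if it must stay.
    static Boolean decide(Cmp dom, int dc, Cmp test, int tc) {
        long[] d = range(dom, dc);
        if (d == null) { // dominating test is x != dc
            if (tc == dc && test == Cmp.NE) return Boolean.TRUE;
            if (tc == dc && test == Cmp.EQ) return Boolean.FALSE;
            return null;
        }
        if (d[0] > d[1]) return null; // dominating path is infeasible; leave it alone
        long[] t = range(test, tc);
        if (t == null) { // dominated test is x != tc
            if (tc < d[0] || tc > d[1]) return Boolean.TRUE; // tc cannot occur on this path
            if (d[0] == d[1]) return Boolean.FALSE;          // x is exactly tc
            return null;
        }
        if (t[0] <= d[0] && d[1] <= t[1]) return Boolean.TRUE; // every feasible x passes
        if (d[1] < t[0] || t[1] < d[0]) return Boolean.FALSE;  // no feasible x passes
        return null;
    }
}
```

For example, `decide(Cmp.GT, 0, Cmp.NE, 0)` returns TRUE, so the zero check can be folded away. `decide(Cmp.GE, 0, Cmp.NE, 0)` returns null, because x may still be 0.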

From leonid.mesnik at oracle.com  Wed Nov 13 16:42:18 2019
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Wed, 13 Nov 2019 08:42:18 -0800
Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found
 because of C2 Scalar Replacement
In-Reply-To: <VI1PR02MB4829B0C8CCBBA84E920B924D9B760@VI1PR02MB4829.eurprd02.prod.outlook.com>
References: <VI1PR02MB482960E862A276EF918D88B59B740@VI1PR02MB4829.eurprd02.prod.outlook.com>
 <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com>
 <VI1PR02MB4829B0C8CCBBA84E920B924D9B760@VI1PR02MB4829.eurprd02.prod.outlook.com>
Message-ID: <b6ce6d77-3c9c-c978-3fc5-586717089ec8@oracle.com>

Thank you for fixing this.

Leonid

On 11/13/19 4:24 AM, Reingruber, Richard wrote:
> Hi Leonid,
>
> these are valid points. Thanks for making me aware of them.
>
> I've increased the maximum heap size in my tests as suggested, and I've also run them with ZGC
> enabled.
>
> I've also added the vm.opt.TieredCompilation != true requirement.
>
> I've done the changes in place.
>
> Thanks, Richard.
>
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Leonid Mesnik
> Sent: Tuesday, November 12, 2019 20:34
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement
>
> Hi
>
> I haven't done a complete review; I just sanity-checked your test headers. I
> see a couple of potential issues with them.
>
> 1) Using -Xmx32M could cause OOME failures if the test is executed with
> ZGC. I think that at least 256M should be set. Could you please verify
> that your tests pass with ZGC enabled.
>
>
> 2) I think it makes sense to add the requirement
>
> vm.opt.TieredCompilation != true
>
> to just skip tests if anyone runs them with tiered compilation disabled
> explicitly.
>
> Leonid
>
> On 11/11/19 7:29 AM, Reingruber, Richard wrote:
>> Hi,
>>
>> I have created https://bugs.openjdk.java.net/browse/JDK-8233915
>>
>> In short, a set of live objects L is not found using JVMTI FollowReferences() if L is only reachable
>> from a scalar replaced object in a frame of a C2 compiled method. If L happens to be a growing leak,
>> then a dynamically loaded JVMTI agent (note: can_tag_objects is an always capability) for heap
>> diagnostics won't discover L as live and it won't be able to find root references that lead to L.
>>
>> I'd like to suggest the implementation for the proposed enhancement JDK-8227745 as bug-fix.
>>
>> RFE:       https://bugs.openjdk.java.net/browse/JDK-8227745
>> Webrev(*): http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/
>>
>> Please comment on the suggestion. Do you see other solutions that allow an agent to discover the
>> chain of references to L?
>>
>> I'd like to work on the complexity as well. One significant simplification would be possible if
>> scalar replaced objects could be reallocated at safepoints (i.e. if the VM thread were allowed to call
>> Deoptimization::realloc_objects()). The GC interface does not seem to allow this.
>>
>> Thanks, Richard.
>>
>> (*) Not yet accepted, because deemed too complex for the performance gain. Note that I was able to
>>       reduce webrev.1 in size compared to webrev.0

From igor.ignatyev at oracle.com  Wed Nov 13 19:11:14 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 13 Nov 2019 11:11:14 -0800
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <2w8hcs8xd1-1@aserp2030.oracle.com>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
 <2w8hcs8xd1-1@aserp2030.oracle.com>
Message-ID: <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>

@Christoph,

webrev.01 looks good to me.
I always thought that the jvmci feature can be built only when the compiler2 feature is enabled; however, src/hotspot/share/jvmci/jvmci_globals.hpp suggests that jvmci can be used w/o compiler2. I don't think we have ever built or tested, let alone supported, this configuration.

@Vladimir,
did/do we plan to support compiler1 + jvmci w/o compiler2 configuration?

Thanks,
-- Igor

> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
> 
> Hi Igor,
> 
> thanks for your explanation.
> 
> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
> 
>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
>> in most cases, it causes wasted compute time (as in this test) and can
>> also lead to wrong/deprecated/deleted flags sneaking into the testbase
> 
> Agreed. I also wanted to discuss this, since I think that your solution
> is better than mine, but at the same time, I saw possible problems with
> it, see below.
> 
>> as '@requires vm.flavor == "server"' filters configurations based on vm
>> build type, it will still allow execution on JVM w/ JVMCI and when JVMCI
>> compiler is selected, as it will still be Server VM build. so, in a
>> sense, the test will be w/ JVMCI in the same way as w/ your approach.
> 
> My concern is not about server VMs with JVMCI, but client VMs with JVMCI
> enabled. Is this a valid configuration? The MaxVectorSize option is
> defined in [1] as well as in [2], so for me it looks like MaxVectorSize
> can be used for any VM variant as long as JVMCI is enabled. The
> configure script also states that both compilers are possible (if you
> configure with --with-jvm-features='jvmci'):
> 
> configure: error: Specified JVM feature 'jvmci' requires feature
> 'compiler2' or 'compiler1'
> 
> Should the requires tag "vm.jvmci" maybe be used as well, like:
> 
>    @requires vm.flavor == "server" | vm.jvmci
> 
>> this is the known limitation of jtreg/@requires, and our current way to
>> work around it is to split a test description based on @requires values
> 
> Yes, if the @requires tag is used, splitting up the test looks like a good
> idea. I didn't know that it is possible to have multiple test descriptions
> in one test file.
> 
> I created a new webrev with the new ideas:
> 
> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
> 
> I tested with an amd64 client and server VM and it looks good. I am
> currently unable to build a client VM with JVMCI enabled, hence no test
> for that yet. I get compile errors and as soon as I resolve those,
> runtime errors occur. Before I look into that, I would like to know if
> client VMs with JVMCI enabled are supported or not.
> 
> Thanks,
> Christoph
> 
> [1]
> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
> 
> [2]
> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
> 


From martin.doerr at sap.com  Wed Nov 13 19:26:43 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 13 Nov 2019 19:26:43 +0000
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com>
Message-ID: <VI1PR0201MB24791AD6578AAC150EA1B6B69A760@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Patric,

thanks for addressing this issue.

There seems to be a small issue with the webrev:
-  if (bol->is_Bool()) {
+  if (!bol->is_Bool()) {

I guess you have already tested it this way and something just went wrong with the webrev.

You recognize and transform a specific pattern you have described in the comment. It is appropriate for fixing this example and it looks good to me. But I'm curious how often this matches. Maybe we can see a performance improvement.

Thanks and best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Nils Eliasson
> Sent: Wednesday, November 13, 2019 17:13
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
> check
> 
> Hi Patric,
> 
> Looks good!
> 
> (I have pre-reviewed this patch offline)
> 
> Regards,
> 
> Nils
> 
> On 2019-11-12 15:16, Patric Hedlin wrote:
> > Dear all,
> >
> > I would like to ask for help to review the following change/update:
> >
> > Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
> > Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
> >
> > 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
> >
> >     Adding a simple subsumption test to IfNode::Ideal to enable a local
> >     short-circuit for (obviously) redundant if-nodes.
> >
> > Testing: hs-tier1-4, hs-precheckin-comp
> >
> >
> > Best regards,
> > Patric
> >

From igor.ignatyev at oracle.com  Wed Nov 13 19:30:44 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 13 Nov 2019 11:30:44 -0800
Subject: RFR(S) : 8225756 : [testbug]
 compiler/loopstripmining/CheckLoopStripMining.java sets too short a
 SafepointTimeoutDelay
In-Reply-To: <VI1PR0201MB247990D2851CF64E7F0BF5FA9A760@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
 <87a790dtfz.fsf@redhat.com>
 <VI1PR0201MB247990D2851CF64E7F0BF5FA9A760@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <C673A8AF-41BE-4465-B03E-B54A42A3589B@oracle.com>

@Martin,
I've updated the test according to your comments and also added 'to prevent biased locking handshakes from changing the timing' comment preceding '-XX:-UseBiasedLocking'.

@all, thanks for your review, pushed.

-- Igor


> On Nov 13, 2019, at 3:37 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> 
> Hi Igor,
> 
> thanks for improving it.
> 
> Please note that this test was derived from
> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java
> JDK-8227528 added -XX:-UseBiasedLocking to that test.
> 
> Would you mind adding that to your new version, too?
> 
> Please also remove the double whitespace before Utils.adjustTimeout(500).
> 
> Thanks and best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: hotspot-compiler-dev <hotspot-compiler-dev-
>> bounces at openjdk.java.net> On Behalf Of Roland Westrelin
>> Sent: Wednesday, November 13, 2019 10:20
>> To: Igor Ignatyev <igor.ignatyev at oracle.com>; hotspot compiler <hotspot-
>> compiler-dev at openjdk.java.net>
>> Subject: Re: RFR(S) : 8225756 : [testbug]
>> compiler/loopstripmining/CheckLoopStripMining.java sets too short a
>> SafepointTimeoutDelay
>> 
>> 
>> Looks reasonable to me.
>> 
>> Thanks for fixing this, Igor.
>> 
>> Roland.
> 


From vladimir.kozlov at oracle.com  Wed Nov 13 19:32:18 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 13 Nov 2019 11:32:18 -0800
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
 <2w8hcs8xd1-1@aserp2030.oracle.com>
 <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>
Message-ID: <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com>

On 11/13/19 11:11 AM, Igor Ignatyev wrote:
> @Christoph,
> 
> webrev.01 looks good to me.
> I always thought that the jvmci feature can be built only when the compiler2 feature is enabled; however, src/hotspot/share/jvmci/jvmci_globals.hpp suggests that jvmci can be used w/o compiler2. I don't think we have ever built or tested, let alone supported, this configuration.
> 
> @Vladimir,
> did/do we plan to support compiler1 + jvmci w/o compiler2 configuration?

Yes. It could become a configuration once we start looking at replacing C1 with Graal. I think several people were interested
in a "Client VM"-like configuration.
There is also a Server configuration without C2 (with Graal or another jvmci compiler), which would be our configuration in the future.

But I would prefer to be more explicit in these changes:

@requires vm.compiler2.enabled | vm.graal.enabled

Thanks,
Vladimir

> 
> Thanks,
> -- Igor
> 
>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
>>
>> Hi Igor,
>>
>> thanks for your explanation.
>>
>> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
>>
>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
>>> in most cases, it causes wasted compute time (as in this test) and can
>>> also lead to wrong/deprecated/deleted flags sneaking into the testbase
>>
>> Agreed. I also wanted to discuss this, since I think that your solution
>> is better than mine, but at the same time, I saw possible problems with
>> it, see below.
>>
>>> as '@requires vm.flavor == "server"' filters configurations based on vm
>>> build type, it will still allow execution on JVM w/ JVMCI and when JVMCI
>>> compiler is selected, as it will still be Server VM build. so, in a
>>> sense, the test will be w/ JVMCI in the same way as w/ your approach.
>>
>> My concern is not about server VMs with JVMCI, but client VMs with JVMCI
>> enabled. Is this a valid configuration? The MaxVectorSize option is
>> defined in [1] as well as in [2], so for me it looks like MaxVectorSize
>> can be used for any VM variant as long as JVMCI is enabled. The
>> configure script also states that both compilers are possible (if you
>> configure with --with-jvm-features='jvmci'):
>>
>> configure: error: Specified JVM feature 'jvmci' requires feature
>> 'compiler2' or 'compiler1'
>>
>> Should the requires tag "vm.jvmci" maybe be used as well, like:
>>
>>     @requires vm.flavor == "server" | vm.jvmci
>>
>>> this is the known limitation of jtreg/@requires, and our current way to
>>> work around it is to split a test description based on @requires values
>>
>> Yes, if the @requires tag is used, splitting up the test looks like a good
>> idea. I didn't know that it is possible to have multiple test descriptions
>> in one test file.
>>
>> I created a new webrev with the new ideas:
>>
>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
>>
>> I tested with an amd64 client and server VM and it looks good. I am
>> currently unable to build a client VM with JVMCI enabled, hence no test
>> for that yet. I get compile errors and as soon as I resolve those,
>> runtime errors occur. Before I look into that, I would like to know if
>> client VMs with JVMCI enabled are supported or not.
>>
>> Thanks,
>> Christoph
>>
>> [1]
>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
>>
>> [2]
>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
>>
> 

From john.r.rose at oracle.com  Wed Nov 13 21:23:35 2019
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 13 Nov 2019 13:23:35 -0800
Subject: final field values should be trusted as constant (filed as
 JDK-8233873)
In-Reply-To: <69de5d88-2487-850b-5388-6363d88d2b5b@cs.oswego.edu>
References: <B3AE0B03-0AD0-4C1B-A5A4-67CF789A5B9D@oracle.com>
 <69de5d88-2487-850b-5388-6363d88d2b5b@cs.oswego.edu>
Message-ID: <06DC8BED-60F2-438C-82FD-1578B7215D80@oracle.com>

On Nov 9, 2019, at 11:21 AM, Doug Lea <dl at cs.oswego.edu> wrote:
> 
> On 11/8/19 8:24 PM, John Rose wrote:
> 
>> ## Side note on races
>> 
>> Although race conditions (on non-volatile fields) allow the JVM some
>> latitude to return "stale" values for field references, such latitude
>> would usually be quite narrow, since an execution of the invalid
>> optimized method is likely to occur downstream of the invalidating
>> field update (as determined by the happens-before relation of the
>> JMM). 
> 
> Ever since initial revisions of JLS1 version, the intent of JMM specs
> (including current) is to allow compilers to believe that the value they
> see in initial reads of a final field is the only value they will ever
> see. So no revision is necessary on these grounds (although one of these
> days there will be one that accommodates VarHandle modes etc,
> formalizing http://gee.cs.oswego.edu/dl/html/j9mm.html). Some of the
> spec messiness exists just to explain why compilers are allowed not to
> believe this as well, because of reflection etc.
> 
> In other words, don't let JMM concerns stop you from this worthwhile effort.

Thanks for the encouragement, Doug.

The JMM probably doesn't require enhancement, but we do need some
elucidation of dark corners around settings of final fields outside of constructors.
Maybe it's already been written; I'm sure it's already been thought about,
by you at least.  Setting a final inside a constructor is not problematic, as
long as a JIT can prove statically that a (non-static) field it proposes to
constant fold is in an instance whose constructor has returned (normally).
This all by itself is tricky to do (see also B, C below), but let's set that aside.

Also tricky is smashing of finals of normally-constructed objects, which is
allowed by the JVM.  This is allowed even if heinous.  We need some
Big Hammer to turn off optimization when it happens.  If we were really
slick we might try to amend the JMM to declare that the JIT can retain
a previously folded value, even after somebody smashed it, on the grounds
that this is a valid race condition, where the JITted code is racily reading
back in time.  (Is this already true??  Probably not.  It is worth considering
as a JMM enhancement; the idea is of races which are allowed to retain
sticky values that would ordinarily be updated.  I'm thinking @Stable
and final both.)  But set that aside for now also.

Here's the hard part, I think.

When an instance is born without running a constructor, then we know
somebody did something off-label to set any non-static final fields it has.
This means the JMM makes no guarantees about safe publication of
those fields.  This in turn means somebody must do something:

A. Give up on all non-static final fields, because somebody might smash
a properly constructed object.  This is our decade-plus status quo,
which I'm very tired of.

B. Have the JVM leave enough clues to distinguish properly constructed
objects from the off-label ones.  (This is the "larval/adult" distinction.)
Then the JIT can optimize only the normal cases and avoid the others.

C. Demand that frameworks which make off-label objects be upgraded
to include a memory fence after they are done setting their finals, so
that they conform to the JMM behaviors required of regular objects.
Make the fence be checkable by the JIT so it can see when the
framework has done its duty, so the JIT can treat the object normally.

D. Have the JMM make special rules for final fields, that their values
are "stickier" or more "memorable" than normal fields, so races can
see old values *if the JVM wants to*.

E. Some balanced combination of the above.  I.e., give frameworks
a carrot for upgrading their construction sequences with better
performance, and a stick of emitting log entries or warnings when
they spew out objects in the larval state.

F. Your idea here, please!

-- John
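Option C, from the framework's side, might look roughly like the Java sketch below. The framework finishes all of its final-field stores and then issues a fence before the object can escape to other threads. `Coord` and `OffLabelFramework` are hypothetical names. `VarHandle.releaseFence()` is a real JDK 9+ API. The idea that a JIT could detect such a fence and then treat the object as normally constructed is the proposal above, not an existing HotSpot mechanism. Newer JDK releases are also moving toward warning about, and eventually restricting, reflective writes to final fields.

```java
import java.lang.invoke.VarHandle;
import java.lang.reflect.Field;

// A class whose final field a hypothetical framework rewrites "off-label" after construction.
final class Coord {
    final int x;
    Coord(int x) { this.x = x; }
}

final class OffLabelFramework {
    // Hypothetical framework hook illustrating option C: complete all final-field
    // stores, then fence before the object is published to other threads.
    static Coord rewriteX(Coord c, int newX) {
        try {
            Field f = Coord.class.getDeclaredField("x");
            f.setAccessible(true); // permitted for non-static finals of ordinary classes
            f.setInt(c, newX);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        // Orders the preceding stores before any later store that publishes c,
        // roughly what the freeze at the end of a constructor provides.
        VarHandle.releaseFence();
        return c;
    }
}
```

The fence only restores ordering for the publishing thread. It does nothing about a JIT that has already folded the old value, which is why John pairs it with a check the JIT can see.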

From ekaterina.pavlova at oracle.com  Wed Nov 13 21:28:58 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Wed, 13 Nov 2019 13:28:58 -0800
Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1
Message-ID: <ac890bc5-f864-ab7d-a53f-64fa03c52765@oracle.com>

Hi,

please review this small patch, which defines the jtreg test group 'tier1_compiler_graal' so we can run it
as part of tier1 testing. Based on bug statistics, compiler/graalunit/HotspotTest.java is the most frequently
failing test, so I included only this group of tests for now. These tests also take a reasonable amount of
time to execute.

      JBS: https://bugs.openjdk.java.net/browse/JDK-8215728
   webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html
  testing: tier1


thanks,
-katya

From igor.ignatyev at oracle.com  Wed Nov 13 21:30:59 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 13 Nov 2019 13:30:59 -0800
Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1
In-Reply-To: <ac890bc5-f864-ab7d-a53f-64fa03c52765@oracle.com>
References: <ac890bc5-f864-ab7d-a53f-64fa03c52765@oracle.com>
Message-ID: <E11D3454-18ED-4615-9067-A194E6153E70@oracle.com>

Hi Katya,

shouldn't this group also be included in the tier1_compiler group?

-- Igor

> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
> 
> Hi,
> 
> please review this small patch which defines the jtreg test group 'tier1_compiler_graal' so we can run it
> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequently failing test according to
> bug statistics, so I included only this group of tests for now. These tests also take a reasonable amount of
> time to execute.
> 
>     JBS: https://bugs.openjdk.java.net/browse/JDK-8215728
>  webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html
> testing: tier1
> 
> 
> thanks,
> -katya


From ekaterina.pavlova at oracle.com  Wed Nov 13 21:38:19 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Wed, 13 Nov 2019 13:38:19 -0800
Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1
In-Reply-To: <E11D3454-18ED-4615-9067-A194E6153E70@oracle.com>
References: <ac890bc5-f864-ab7d-a53f-64fa03c52765@oracle.com>
 <E11D3454-18ED-4615-9067-A194E6153E70@oracle.com>
Message-ID: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com>

The tests must be run with -XX:+EnableJVMCI, which is why I created a separate group.

On 11/13/19 1:30 PM, Igor Ignatyev wrote:
> Hi Katya,
> 
> shouldn't this group also be included in the tier1_compiler group?
> 
> -- Igor
> 
>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>
>> Hi,
>>
>> please review this small patch which defines the jtreg test group 'tier1_compiler_graal' so we can run it
>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequently failing test according to
>> bug statistics, so I included only this group of tests for now. These tests also take a reasonable amount of
>> time to execute.
>>
>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8215728
>>   webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html
>> testing: tier1
>>
>>
>> thanks,
>> -katya
> 


From igor.ignatyev at oracle.com  Wed Nov 13 21:45:55 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 13 Nov 2019 13:45:55 -0800
Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1
In-Reply-To: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com>
References: <ac890bc5-f864-ab7d-a53f-64fa03c52765@oracle.com>
 <E11D3454-18ED-4615-9067-A194E6153E70@oracle.com>
 <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com>
Message-ID: <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com>

all tier1 groups are expected to be runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawned JVM, and update the test descriptions (and generateTests.sh) to require a JVM w/ the jvmci feature (@requires vm.jvmci); then this will be a proper tier1 group.

-- Igor

> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
> 
> The tests must be run with -XX:+EnableJVMCI, which is why I created a separate group.
> 
> On 11/13/19 1:30 PM, Igor Ignatyev wrote:
>> Hi Katya,
>> shouldn't this group also be included in the tier1_compiler group?
>> -- Igor
>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>> 
>>> Hi,
>>> 
>>> please review this small patch which defines the jtreg test group 'tier1_compiler_graal' so we can run it
>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequently failing test according to
>>> bug statistics, so I included only this group of tests for now. These tests also take a reasonable amount of
>>> time to execute.
>>> 
>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8215728
>>>  webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html
>>> testing: tier1
>>> 
>>> 
>>> thanks,
>>> -katya
> 


From dean.long at oracle.com  Thu Nov 14 04:15:40 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 13 Nov 2019 20:15:40 -0800
Subject: RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <b4de4a1c-9b25-1921-8a27-2946e7660dc2@oracle.com>

Hi Pengfei,

I took a quick look and didn't notice any problems. Nice work! This
seems to match the x64 approach, however please get other reviews.

dl

On 11/13/19 1:55 AM, Pengfei Li (Arm Technology China) wrote:
> Hi,
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8233743
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.00/
>
> This is a follow-up patch of JDK-8217909[1] to make the AArch64 register
> r27 allocatable when CompressedOops and CompressedClassPointers are both
> turned off.
>
> The following changes have been made:
> - Massage the RegMask(s) in reg_mask_init() at C2 initialization and
> remove r27 from some of the masks conditionally to make it allocatable.
> - Also make r29 conditionally reserved in a similar way.
> - Make r29 allocatable for pointers as well as integers.
> - Replace a use of rheapbase with rscratch1 in AArch64 ZGC.
> - Revert JDK-8231754[2] which makes r27 always reserved in JVMCI.
>
> This patch aligns with the implementation in [1] which makes the x86_64
> r12 register allocatable. Please let me know if I have missed anything
> for AArch64.
>
> Tests:
> Full jtreg with default options and extra options "-XX:-UseCompressedOops
> -XX:+PreserveFramePointer". No new failures were found.
>
> [1] https://hg.openjdk.java.net/jdk/jdk/rev/48b50573dee4
> [2] https://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de
>
> --
> Thanks,
> Pengfei
>
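The conditional reservation described in this patch can be pictured with a toy bitmask. This is a sketch only:

- `GprMask`, `Flags` and `init_allocatable_mask` are hypothetical. The real change manipulates C2 RegMask objects in reg_mask_init().
- The r27 condition follows the patch summary. Andrew Haley's reply in this thread questions the compressed-class-pointer part of it.
- Tying r29 to PreserveFramePointer is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a C2 register mask: one bit per AArch64
// general-purpose register r0..r31. Not HotSpot's RegMask class.
using GprMask = uint32_t;

constexpr GprMask reg_bit(int r) { return GprMask{1} << r; }

// Hypothetical mirror of the VM flags that decide whether r27/r29 are free.
struct Flags {
  bool UseCompressedOops;
  bool UseCompressedClassPointers;
  bool PreserveFramePointer;
};

// Computed once at "C2 initialization": start from the statically
// allocatable registers and remove r27 / r29 only when they are reserved.
GprMask init_allocatable_mask(const Flags& f, GprMask base) {
  GprMask mask = base;
  // Per the patch summary, r27 becomes allocatable only when both
  // compressed oops and compressed class pointers are turned off.
  if (f.UseCompressedOops || f.UseCompressedClassPointers) {
    mask &= ~reg_bit(27);
  }
  // Assumption: r29 (frame pointer) stays reserved when PreserveFramePointer is on.
  if (f.PreserveFramePointer) {
    mask &= ~reg_bit(29);
  }
  return mask;
}
```

Computing the mask once at startup keeps the per-compilation cost at zero: the register allocator simply sees a mask with one more or one fewer register.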


From ekaterina.pavlova at oracle.com  Thu Nov 14 04:52:10 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Wed, 13 Nov 2019 20:52:10 -0800
Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1
In-Reply-To: <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com>
References: <ac890bc5-f864-ab7d-a53f-64fa03c52765@oracle.com>
 <E11D3454-18ED-4615-9067-A194E6153E70@oracle.com>
 <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com>
 <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com>
Message-ID: <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com>

-XX:+EnableJVMCI is already passed to the spawned JVM by GraalUnitTestLauncher.
However, -XX:+EnableJVMCI is also required for GraalUnitTestLauncher itself so that
the getModuleExports() function works properly for graal modules.
Also note that we can't pass '-XX:+EnableJVMCI' to GraalUnitTestLauncher in the jtreg
directive as we use '@run main compiler.graalunit.common.GraalUnitTestLauncher' to launch it.
See also discussion regarding this issue in JDK-8216551.

Anyway, I understand the point regarding tier1 and will see what can be done.

thanks,
-katya

On 11/13/19 1:45 PM, Igor Ignatyev wrote:
> all tier1 groups are expected to be runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawned JVM,
> and update the test descriptions (and generateTests.sh) to require a JVM w/ the jvmci feature (@requires vm.jvmci); then this will be a proper tier1 group.
> 
> -- Igor
> 
>> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>
>> The tests must be run with -XX:+EnableJVMCI, which is why I created a separate group.
>>
>> On 11/13/19 1:30 PM, Igor Ignatyev wrote:
>>> Hi Katya,
>>> shouldn't this group also be included in the tier1_compiler group?
>>> -- Igor
>>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> please review this small patch which defines the jtreg test group 'tier1_compiler_graal' so we can run it
>>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequently failing test according to
>>>> bug statistics, so I included only this group of tests for now. These tests also take a reasonable amount of
>>>> time to execute.
>>>>
>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8215728
>>>>   webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html
>>>> testing: tier1
>>>>
>>>>
>>>> thanks,
>>>> -katya
>>
> 


From igor.ignatyev at oracle.com  Thu Nov 14 04:59:37 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 13 Nov 2019 20:59:37 -0800
Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1
In-Reply-To: <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com>
References: <ac890bc5-f864-ab7d-a53f-64fa03c52765@oracle.com>
 <E11D3454-18ED-4615-9067-A194E6153E70@oracle.com>
 <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com>
 <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com>
 <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com>
Message-ID: <A991B777-A769-4439-8955-CE7752F7300D@oracle.com>

if GraalUnitTestLauncher has to be run w/ -XX:+EnableJVMCI, I guess we just have to switch back to othervm mode.
-- Igor 

> On Nov 13, 2019, at 8:52 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
> 
> -XX:+EnableJVMCI is already passed to the spawned JVM by GraalUnitTestLauncher.
> However, -XX:+EnableJVMCI is also required for GraalUnitTestLauncher itself so that
> the getModuleExports() function works properly for graal modules.
> Also note that we can't pass '-XX:+EnableJVMCI' to GraalUnitTestLauncher in the jtreg
> directive as we use '@run main compiler.graalunit.common.GraalUnitTestLauncher' to launch it.
> See also discussion regarding this issue in JDK-8216551.
> 
> Anyway, I understand the point regarding tier1 and will see what can be done.
> 
> thanks,
> -katya
> 
> On 11/13/19 1:45 PM, Igor Ignatyev wrote:
>> all tier1 groups are expected to be runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawned JVM, and update the test descriptions (and generateTests.sh) to require a JVM w/ the jvmci feature (@requires vm.jvmci); then this will be a proper tier1 group.
>> -- Igor
>>> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>> 
>>> The tests must be run with -XX:+EnableJVMCI, which is why I created a separate group.
>>> 
>>> On 11/13/19 1:30 PM, Igor Ignatyev wrote:
>>>> Hi Katya,
>>>> shouldn't this group also be included in the tier1_compiler group?
>>>> -- Igor
>>>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> please review this small patch which defines the jtreg test group 'tier1_compiler_graal' so we can run it
>>>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequently failing test according to
>>>>> bug statistics, so I included only this group of tests for now. These tests also take a reasonable amount of
>>>>> time to execute.
>>>>> 
>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8215728
>>>>>  webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html
>>>>> testing: tier1
>>>>> 
>>>>> 
>>>>> thanks,
>>>>> -katya
>>> 
> 


From dean.long at oracle.com  Thu Nov 14 05:12:22 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 13 Nov 2019 21:12:22 -0800
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
Message-ID: <6f1dbc0a-b1a4-43e8-65e1-1e0df2115c33@oracle.com>

Hi Patric. I was expecting the fix to allow the following existing
logic in DivINode::Ideal to work:

   // Check for excluding div-zero case
   if (in(0) && (ti->_hi < 0 || ti->_lo > 0)) {
     set_req(0, NULL);           // Yank control input
     return this;
   }

by making sure the range of "ti" has been sharpened by the previous
if-node. I was just wondering if you looked at that solution and
thought it was feasible. I see Parse::sharpen_type_after_if() is almost
doing the right thing, but only handles BoolTest::eq.

dl

On 11/12/19 6:16 AM, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>
>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>     short-circuit for (obviously) redundant if-nodes.
>
> Testing: hs-tier1-4, hs-precheckin-comp
>
>
> Best regards,
> Patric
>
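Dean's suggestion can be illustrated with a toy interval type. The sketch assumes nothing about C2 internals beyond the quoted DivINode::Ideal check. `IntRange`, `Cmp`, `sharpen` and `divisor_excludes_zero` are hypothetical names; the real Parse::sharpen_type_after_if() works on TypeInt. The point is that once sharpening handles `gt` as well as `eq`, a preceding `x > 0` test yields a range with `lo > 0`, and the existing check then removes the divide-by-zero control dependency.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Toy stand-in for C2's TypeInt: the closed range [lo, hi] an int may take.
// long long is used so the +1/-1 adjustments below cannot overflow.
struct IntRange {
  long long lo;
  long long hi;
};

enum class Cmp { eq, ne, lt, le, gt, ge };

// Narrow t to the values that satisfy "x op c", i.e. the type x has on
// the taken branch of a preceding if.
IntRange sharpen(IntRange t, Cmp op, long long c) {
  switch (op) {
    case Cmp::eq: t.lo = std::max(t.lo, c); t.hi = std::min(t.hi, c); break;
    case Cmp::gt: t.lo = std::max(t.lo, c + 1); break;
    case Cmp::ge: t.lo = std::max(t.lo, c); break;
    case Cmp::lt: t.hi = std::min(t.hi, c - 1); break;
    case Cmp::le: t.hi = std::min(t.hi, c); break;
    case Cmp::ne:
      // A single range can only exclude c when c is one of its endpoints.
      if (c == t.lo) ++t.lo;
      else if (c == t.hi) --t.hi;
      break;
  }
  return t;
}

// Same test as the quoted DivINode::Ideal code: a divisor whose range lies
// entirely below or entirely above zero cannot be zero.
bool divisor_excludes_zero(IntRange t) { return t.hi < 0 || t.lo > 0; }
```

Note that in this representation `x != 0` on its own cannot shrink a full int range, because 0 is not an endpoint. That is why handling the ordered comparisons matters for this bug.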


From vladimir.x.ivanov at oracle.com  Thu Nov 14 08:31:12 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 14 Nov 2019 11:31:12 +0300
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
Message-ID: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com>


> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/

I just briefly looked through the patch and have a quick high-level 
question:

+  // Rewrite:
+  //              cmp                   cmp
+  //              / \                    |
+  //     (r1)  bool  \                  bool (r1)
+  //            /    bool (r2)            \
+  //    (dom) if       \        ==>       if
+  //            \       )                  \
+  //    (pre)  if[TF]  /                  if[TF]X
+  //               \  /
+  //                if (this)
+  //               /  \
+  //             ifT  ifF [X]

Why do you do complex graph surgery instead of simply setting the
condition of the redundant If to a constant (0/1) and letting the existing
logic eliminate it?

Best regards,
Vladimir Ivanov

> 
> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
> 
>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>     short-circuit for (obviously) redundant if-nodes.
> 
> Testing: hs-tier1-4, hs-precheckin-comp
> 
> 
> Best regards,
> Patric
> 
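The alternative Vladimir suggests can be sketched on integer ranges. The idea is to decide whether a dominating test already settles the later one. If it does, the later condition is replaced with a constant, and the existing If folding removes it.

This is hypothetical C++17 (`after_true` and `fold` are invented names), not C2 code. C2 would reach the same answer by matching Bool/Cmp nodes in the graph.

```cpp
#include <cassert>
#include <climits>
#include <optional>

enum class Cmp { eq, ne, lt, le, gt, ge };

// Range of values x can take; long long avoids overflow at the int bounds.
struct Range {
  long long lo;
  long long hi;
};

// Values of x on the path where the dominating test "x op c" succeeded,
// starting from the full int range.
Range after_true(Cmp op, long long c) {
  Range r{INT_MIN, INT_MAX};
  switch (op) {
    case Cmp::eq: r = {c, c}; break;
    case Cmp::gt: r.lo = c + 1; break;
    case Cmp::ge: r.lo = c; break;
    case Cmp::lt: r.hi = c - 1; break;
    case Cmp::le: r.hi = c; break;
    case Cmp::ne: break;  // holes inside the range are not tracked
  }
  return r;
}

// Evaluate the dominated test "x op c" over every x in r. Returns the
// constant it folds to, or std::nullopt if the outcome depends on x.
std::optional<bool> fold(Range r, Cmp op, long long c) {
  bool all_true = false;
  bool all_false = false;
  switch (op) {
    case Cmp::eq: all_true = r.lo == c && r.hi == c; all_false = c < r.lo || c > r.hi; break;
    case Cmp::ne: all_true = c < r.lo || c > r.hi; all_false = r.lo == c && r.hi == c; break;
    case Cmp::lt: all_true = r.hi < c;  all_false = r.lo >= c; break;
    case Cmp::le: all_true = r.hi <= c; all_false = r.lo > c;  break;
    case Cmp::gt: all_true = r.lo > c;  all_false = r.hi <= c; break;
    case Cmp::ge: all_true = r.lo >= c; all_false = r.hi < c;  break;
  }
  if (all_true) return true;
  if (all_false) return false;
  return std::nullopt;
}
```

With `x > 0` dominating, `x != 0` folds to true and `x <= 0` folds to false, while `x < 5` stays undecided. Only the decided cases would have their condition replaced.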

From tobias.hartmann at oracle.com  Thu Nov 14 09:08:25 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 14 Nov 2019 10:08:25 +0100
Subject: RFR: 8234003: Improve IndexSet iteration
In-Reply-To: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com>
References: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com>
Message-ID: <48900cd6-3616-7e5c-dd26-f8475f4f0eb4@oracle.com>

Hi Claes,

nice cleanup, looks good to me!

Just noticed that you've (intentionally?) changed the indentation of the comment in live.cpp:259.

Best regards,
Tobias

On 13.11.19 00:09, Claes Redestad wrote:
> Hi,
> 
> a significant portion of work done during register allocation in C2 is
> iterating over IndexSets.
> 
> A few small optimizations show a ~4% decrease in instructions retired by
> register allocation when instrumenting, and up to 3% fewer instructions
> retired in total on startup tests.
> 
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234003
> Webrev: http://cr.openjdk.java.net/~redestad/8234003/open.00/
> 
> The biggest improvement comes from avoiding iteration over empty sets
> altogether. A smaller improvement comes from adding a watermark to avoid
> iterating over all the blocks in the IndexSet.
> 
> Testing: tier1-3, verified improvements on large and tiny startup tests,
> checked that any increased inlining is footprint neutral on Linux.
> 
> Thanks!
> 
> /Claes
> 
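Claes describes two shortcuts: skip empty sets, and stop at a watermark instead of scanning every block. Both can be shown on a toy block-structured bit set. `BlockBitSet` and its 256-bit blocks are invented for this sketch. It is not HotSpot's IndexSet and does not reproduce its layout or API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy block-structured bit set, loosely modeled on the description above.
// Not the HotSpot class; names and block size are invented.
class BlockBitSet {
 public:
  static constexpr unsigned kBitsPerBlock = 256;
  static constexpr unsigned kWordsPerBlock = kBitsPerBlock / 64;

  explicit BlockBitSet(unsigned max_elements)
      : blocks_((max_elements + kBitsPerBlock - 1) / kBitsPerBlock),
        count_(0),
        watermark_(0) {}

  void insert(unsigned e) {
    unsigned bi = e / kBitsPerBlock;
    unsigned bit = e % kBitsPerBlock;
    uint64_t& word = blocks_[bi].words[bit / 64];
    uint64_t m = uint64_t{1} << (bit % 64);
    if ((word & m) == 0) {
      word |= m;
      ++count_;
    }
    // Watermark: one past the highest block that has ever held an element.
    if (bi + 1 > watermark_) watermark_ = bi + 1;
  }

  bool is_empty() const { return count_ == 0; }

  // Calls f(e) for each element in increasing order and returns the number
  // of blocks visited, so the effect of both shortcuts can be observed.
  template <typename F>
  unsigned for_each(F f) const {
    if (is_empty()) return 0;  // shortcut 1: skip empty sets entirely
    unsigned visited = 0;
    for (unsigned bi = 0; bi < watermark_; ++bi) {  // shortcut 2: stop at watermark
      ++visited;
      for (unsigned wi = 0; wi < kWordsPerBlock; ++wi) {
        uint64_t w = blocks_[bi].words[wi];
        for (unsigned b = 0; w != 0; ++b, w >>= 1) {
          if (w & 1) f(bi * kBitsPerBlock + wi * 64 + b);
        }
      }
    }
    return visited;
  }

 private:
  struct Block {
    uint64_t words[kWordsPerBlock] = {};
  };
  std::vector<Block> blocks_;
  unsigned count_;
  unsigned watermark_;
};
```

Take a set sized for 4096 elements that holds only 3 and 300. Iteration visits the first two of its sixteen blocks; without the watermark it would scan all sixteen.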

From martin.doerr at sap.com  Thu Nov 14 09:29:08 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 14 Nov 2019 09:29:08 +0000
Subject: RFR(S) : 8225756 : [testbug]
 compiler/loopstripmining/CheckLoopStripMining.java sets too short a
 SafepointTimeoutDelay
In-Reply-To: <C673A8AF-41BE-4465-B03E-B54A42A3589B@oracle.com>
References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com>
 <87a790dtfz.fsf@redhat.com>
 <VI1PR0201MB247990D2851CF64E7F0BF5FA9A760@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <C673A8AF-41BE-4465-B03E-B54A42A3589B@oracle.com>
Message-ID: <VI1PR0201MB2479A9378878BD689C3A04D49A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Excellent. Thank you.
Martin


> -----Original Message-----
> From: Igor Ignatyev <igor.ignatyev at oracle.com>
> Sent: Wednesday, 13 November 2019 20:31
> To: Doerr, Martin <martin.doerr at sap.com>; Ekaterina Pavlova
> <ekaterina.pavlova at oracle.com>; Vladimir Kozlov
> <vladimir.kozlov at oracle.com>; Roland Westrelin <rwestrel at redhat.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(S) : 8225756 : [testbug]
> compiler/loopstripmining/CheckLoopStripMining.java sets too short a
> SafepointTimeoutDelay
> 
> @Martin,
> I've updated the test according to your comments and also added a 'to prevent
> biased locking handshakes from changing the timing' comment preceding
> '-XX:-UseBiasedLocking'.
> 
> @all, thanks for your review, pushed.
> 
> -- Igor
> 
> 
> > On Nov 13, 2019, at 3:37 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> >
> > Hi Igor,
> >
> > thanks for improving it.
> >
> > Please note that this test was derived from
> >
> > test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java
> > JDK-8227528 has added -XX:-UseBiasedLocking.
> >
> > Would you mind adding that to your new version, too?
> >
> > Please also remove double-whitespace before Utils.adjustTimeout(500).
> >
> > Thanks and best regards,
> > Martin
> >
> >
> >> -----Original Message-----
> >> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Roland Westrelin
> >> Sent: Wednesday, 13 November 2019 10:20
> >> To: Igor Ignatyev <igor.ignatyev at oracle.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> >> Subject: Re: RFR(S) : 8225756 : [testbug]
> >> compiler/loopstripmining/CheckLoopStripMining.java sets too short a
> >> SafepointTimeoutDelay
> >>
> >>
> >> Looks reasonable to me.
> >>
> >> Thanks for fixing this, Igor.
> >>
> >> Roland.
> >


From aph at redhat.com  Thu Nov 14 10:40:53 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 14 Nov 2019 10:40:53 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>

On 11/13/19 9:55 AM, Pengfei Li (Arm Technology China) wrote:
> This patch aligns with the implementation in [1] which makes the x86_64
> r12 register allocatable. Please let me know if I have missed anything
> for AArch64.

We don't generally use r27 for compressed class pointers.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From christoph.goettschkes at microdoc.com  Thu Nov 14 11:20:33 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Thu, 14 Nov 2019 12:20:33 +0100
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
 <2w8hcs8xd1-1@aserp2030.oracle.com>
 <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>
 <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com>
Message-ID: <mailman.8.1573730534.19479.hotspot-compiler-dev@openjdk.java.net>

Thanks for your feedback, this resolves my concerns and I am happy with 
the solution. I integrated the suggestions from Vladimir, here is the 
latest webrev:

https://cr.openjdk.java.net/~cgo/8231954/webrev.02/

I re-tested and it works as expected.
Please give your consent if this is fine for you as well.

-- Christoph

Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-13 20:32:18:

> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> To: Igor Ignatyev <igor.ignatyev at oracle.com>, christoph.goettschkes at microdoc.com
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Date: 2019-11-13 20:32
> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/
> TestCharVect2.java only works with server VMs.
> 
> On 11/13/19 11:11 AM, Igor Ignatyev wrote:
> > @Christoph,
> > 
> > webrev.01 looks good to me.
> > I always thought that the jvmci feature can be built only when the compiler2
> > feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp
> > suggests that jvmci can be used w/o compiler2; I don't think we have
> > ever built/tested, let alone supported, this configuration.
> > 
> > @Vladimir,
> > did/do we plan to support compiler1 + jvmci w/o compiler2 configuration?
> 
> Yes. It could be a configuration when we start looking at replacing C1
> with Graal. I think several people were interested
> in a "Client VM"-like configuration.
> Also a Server configuration without C2 (with Graal or another JVMCI
> compiler), which would be our configuration in the future.
> 
> But I would prefer to be more explicit in these changes:
> 
> @requires vm.compiler2.enabled | vm.graal.enabled
> 
> Thanks,
> Vladimir
> 
> > 
> > Thanks,
> > -- Igor
> > 
> >> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
> >>
> >> Hi Igor,
> >>
> >> thanks for your explanation.
> >>
> >> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
> >>
> >>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
> >>> in most cases, it causes wasted compute time (as in this test) and can
> >>> also lead to wrong/deprecated/deleted flags sneaking into the testbase
> >>
> >> Agreed. I also wanted to discuss this, since I think that your solution
> >> is better than mine, but at the same time, I saw possible problems with
> >> it, see below.
> >>
> >>> as '@requires vm.flavor == "server"' filters configurations based on vm
> >>> build type, it will still allow execution on JVM w/ JVMCI and when JVMCI
> >>> compiler is selected, as it will still be Server VM build. so, in a
> >>> sense, the test will run w/ JVMCI in the same way as w/ your approach.
> >>
> >> My concern is not about server VMs with JVMCI, but client VMs with JVMCI
> >> enabled. Is this a valid configuration? The MaxVectorSize option is
> >> defined in [1] as well as in [2], so for me it looks like MaxVectorSize
> >> can be used for any VM variant as long as JVMCI is enabled. The
> >> configure script also states that both compilers are possible (if you
> >> configure with --with-jvm-features='jvmci'):
> >>
> >> configure: error: Specified JVM feature 'jvmci' requires feature
> >> 'compiler2' or 'compiler1'
> >>
> >> Should the requires tag "vm.jvmci" perhaps be used as well, like:
> >>
> >>     @requires vm.flavor == "server" | vm.jvmci
> >>
> >>> this is the known limitation of jtreg/@requires, and our current way to
> >>> work around it is to split a test description based on @requires values
> >>
> >> Yes, if the @requires tag is used, splitting up the test looks like a good
> >> idea. I didn't know that it is possible to have multiple test descriptions
> >> in one test file.
> >>
> >> I created a new webrev with the new ideas:
> >>
> >> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
> >>
> >> I tested with an amd64 client and server VM and it looks good. I am
> >> currently unable to build a client VM with JVMCI enabled, hence no test
> >> for that yet. I get compile errors and as soon as I resolve those,
> >> runtime errors occur. Before I look into that, I would like to know if
> >> client VMs with JVMCI enabled are supported or not.
> >>
> >> Thanks,
> >> Christoph
> >>
> >> [1]
> >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
> >>
> >> [2]
> >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
> >>
> > 
> 
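The difference between the two @requires expressions discussed in this thread can be made concrete by treating them as predicates over a VM configuration. This is a hypothetical C++ model, not jtreg code. `VmConfig` and the two functions are invented, and the real properties are evaluated by jtreg itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical record of the VM properties the thread refers to; jtreg
// evaluates the real ones (vm.flavor, vm.compiler2.enabled, vm.graal.enabled).
struct VmConfig {
  std::string flavor;      // e.g. "server" or "client"
  bool compiler2_enabled;  // C2 is the active optimizing compiler
  bool graal_enabled;      // a Graal (JVMCI) compiler is active
};

// @requires vm.flavor == "server"  (keys on the build flavor)
bool requires_server_flavor(const VmConfig& c) { return c.flavor == "server"; }

// @requires vm.compiler2.enabled | vm.graal.enabled  (keys on the compiler)
bool requires_c2_or_graal(const VmConfig& c) {
  return c.compiler2_enabled || c.graal_enabled;
}
```

Consider a client build running a Graal compiler, which is the C1 + JVMCI configuration Christoph could not yet build. In this model the flavor check skips the test, while Vladimir's expression selects it.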


From claes.redestad at oracle.com  Thu Nov 14 13:36:37 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 14 Nov 2019 14:36:37 +0100
Subject: RFR: 8234003: Improve IndexSet iteration
In-Reply-To: <48900cd6-3616-7e5c-dd26-f8475f4f0eb4@oracle.com>
References: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com>
 <48900cd6-3616-7e5c-dd26-f8475f4f0eb4@oracle.com>
Message-ID: <1db8cd03-7553-416e-abfe-1e50ed9b82e3@oracle.com>

Nils, Tobias,


On 2019-11-14 10:08, Tobias Hartmann wrote:
> Hi Claes,
> 
> nice cleanup, looks good to me!

thank you for reviewing!

> 
> Just noticed that you've (intentionally?) changed the indentation of the comment in live.cpp:259.

Yes, it seems my IDE stumbled a bit there; will fix before push.

/Claes

From martin.doerr at sap.com  Thu Nov 14 16:18:23 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 14 Nov 2019 16:18:23 +0000
Subject: RFR(S): 8233193: Incorrect bailout from possibly_add_compiler_threads
Message-ID: <VI1PR0201MB2479DBD887D69E520208951E9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi,

I'd like to cleanup exception handling in CompileBroker a little bit.

Here's my proposal:
- Use THREAD instead of CHECK where no exceptions get thrown.
- Remove unused preload_classes.
- make_thread: Rename thread to new_thread to avoid confusion with THREAD (the current compiler thread).
- possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + CHECK because this function is not supposed to kill the VM on exceptions. Add an assertion to the caller.

Webrev:
http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/

@David:
You didn't like usage of the CHECK macro in the initialization functions, but I think they are ok.
Not very nice to read, but the behavior looks ok to me.
At least, I didn't find a better replacement for them. Maybe you have a proposal?

Best regards,
Martin
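For readers less familiar with HotSpot's exception conventions, the sketch below contrasts passing CHECK with passing THREAD. It is a heavily simplified model. The macro definitions mimic the pattern used in HotSpot's utilities/exceptions.hpp but are not copied from it. `may_throw`, `caller_with_check` and `caller_with_thread` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Heavily simplified stand-ins for HotSpot's TRAPS / THREAD / CHECK
// conventions; here the pending exception is just a string.
struct Thread {
  std::string pending_exception;
  bool has_pending_exception() const { return !pending_exception.empty(); }
};

#define TRAPS Thread* THREAD
#define HAS_PENDING_EXCEPTION (THREAD->has_pending_exception())
#define CLEAR_PENDING_EXCEPTION (THREAD->pending_exception.clear())
// Like HotSpot's CHECK: pass the thread, then return from the *caller*
// if the callee left an exception pending.
#define CHECK THREAD); if (HAS_PENDING_EXCEPTION) return; (void)(0

// Hypothetical callee that "throws" by recording a pending exception.
void may_throw(bool fail, TRAPS) {
  if (fail) THREAD->pending_exception = "java.lang.OutOfMemoryError";
}

int steps_after_call = 0;

// CHECK: an exception propagates and the rest of the caller is skipped.
void caller_with_check(bool fail, TRAPS) {
  may_throw(fail, CHECK);
  ++steps_after_call;
}

// THREAD: the caller continues and deals with the exception itself, which
// suits a function that must neither propagate nor abort on failure.
void caller_with_thread(bool fail, TRAPS) {
  may_throw(fail, THREAD);
  if (HAS_PENDING_EXCEPTION) {
    CLEAR_PENDING_EXCEPTION;  // e.g. log and carry on
  }
  ++steps_after_call;
}
```

Where the callee can never throw, both forms behave the same. In that case passing THREAD just avoids a check that can never fire.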


From igor.ignatyev at oracle.com  Thu Nov 14 16:44:38 2019
From: igor.ignatyev at oracle.com (Igor Ignatev)
Date: Thu, 14 Nov 2019 08:44:38 -0800
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <2w947d9dgm-1@aserp2050.oracle.com>
References: <2w947d9dgm-1@aserp2050.oracle.com>
Message-ID: <9725CF6F-64B1-4101-BB3A-801C841A8441@oracle.com>

LGTM

-- Igor

> On Nov 14, 2019, at 3:22 AM, christoph.goettschkes at microdoc.com wrote:
> 
> Thanks for your feedback, this resolves my concerns and I am happy with
> the solution. I integrated the suggestions from Vladimir, here is the 
> latest webrev:
> 
> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/
> 
> I re-tested and it works as expected.
> Please give your consent if this is fine for you as well.
> 
> -- Christoph
> 
> Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-13 20:32:18:
> 
>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> To: Igor Ignatyev <igor.ignatyev at oracle.com>, christoph.goettschkes at microdoc.com
>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Date: 2019-11-13 20:32
>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/
>> TestCharVect2.java only works with server VMs.
>> 
>>> On 11/13/19 11:11 AM, Igor Ignatyev wrote:
>>> @Christoph,
>>> 
>>> webrev.01 looks good to me.
>>> I always thought that the jvmci feature can be built only when the compiler2
>>> feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp
>>> suggests that jvmci can be used w/o compiler2; I don't think we have
>>> ever built/tested, let alone supported, this configuration.
>>> 
>>> @Vladimir,
>>> did/do we plan to support compiler1 + jvmci w/o compiler2 configuration?
>> 
>> Yes. It could be a configuration when we start looking at replacing C1
>> with Graal. I think several people were interested
>> in a "Client VM"-like configuration.
>> Also a Server configuration without C2 (with Graal or another JVMCI
>> compiler), which would be our configuration in the future.
>> 
>> But I would prefer to be more explicit in these changes:
>> 
>> @requires vm.compiler2.enabled | vm.graal.enabled
>> 
>> Thanks,
>> Vladimir
>> 
>>> 
>>> Thanks,
>>> -- Igor
>>> 
>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
>>>> 
>>>> Hi Igor,
>>>> 
>>>> thanks for your explanation.
>>>> 
>>>> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
>>>> 
>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
>>>>> in most cases, it causes wasted compute time (as in this test) and can
>>>>> also lead to wrong/deprecated/deleted flags sneaking into the testbase
>>>> 
>>>> Agreed. I also wanted to discuss this, since I think that your solution
>>>> is better than mine, but at the same time, I saw possible problems with
>>>> it, see below.
>>>> 
>>>>> as '@requires vm.flavor == "server"' filters configurations based on vm
>>>>> build type, it will still allow execution on JVM w/ JVMCI and when JVMCI
>>>>> compiler is selected, as it will still be Server VM build. so, in a
>>>>> sense, the test will run w/ JVMCI in the same way as w/ your approach.
>>>> 
>>>> My concern is not about server VMs with JVMCI, but client VMs with JVMCI
>>>> enabled. Is this a valid configuration? The MaxVectorSize option is
>>>> defined in [1] as well as in [2], so for me it looks like MaxVectorSize
>>>> can be used for any VM variant as long as JVMCI is enabled. The
>>>> configure script also states that both compilers are possible (if you
>>>> configure with --with-jvm-features='jvmci'):
>>>> 
>>>> configure: error: Specified JVM feature 'jvmci' requires feature
>>>> 'compiler2' or 'compiler1'
>>>> 
>>>> Should the requires tag "vm.jvmci" perhaps be used as well, like:
>>>> 
>>>>    @requires vm.flavor == "server" | vm.jvmci
>>>> 
>>>>> this is the known limitation of jtreg/@requires, and our current way to
>>>>> work around it is to split a test description based on @requires values
>>>> 
>>>> Yes, if the @requires tag is used, splitting up the test looks like a good
>>>> idea. I didn't know that it is possible to have multiple test descriptions
>>>> in one test file.
>>>> 
>>>> I created a new webrev with the new ideas:
>>>> 
>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
>>>> 
>>>> I tested with an amd64 client and server VM and it looks good. I am
>>>> currently unable to build a client VM with JVMCI enabled, hence no test
>>>> for that yet. I get compile errors and as soon as I resolve those,
>>>> runtime errors occur. Before I look into that, I would like to know if
>>>> client VMs with JVMCI enabled are supported or not.
>>>> 
>>>> Thanks,
>>>> Christoph
>>>> 
>>>> [1]
>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
>>>>
>>>> [2]
>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
>>>> 
>>> 
>> 
> 


From dean.long at oracle.com  Thu Nov 14 17:23:25 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 14 Nov 2019 09:23:25 -0800
Subject: RFR(M) 8233841: Update Graal
Message-ID: <e618ff51-8315-7db6-39f6-e2457ec2d98b@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8233841
http://cr.openjdk.java.net/~dlong/8233841/webrev/

This is a Graal update. Changes since the last update (JDK-8231973) are
listed in the bug description.

dl

From vladimir.kozlov at oracle.com  Thu Nov 14 17:53:20 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Nov 2019 09:53:20 -0800
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <9725CF6F-64B1-4101-BB3A-801C841A8441@oracle.com>
References: <2w947d9dgm-1@aserp2050.oracle.com>
 <9725CF6F-64B1-4101-BB3A-801C841A8441@oracle.com>
Message-ID: <79dba9cf-46c4-1a3d-aba5-28622f2bd505@oracle.com>

+1

Vladimir

On 11/14/19 8:44 AM, Igor Ignatev wrote:
> LGTM
> 
> -- Igor
> 
>> On Nov 14, 2019, at 3:22 AM, christoph.goettschkes at microdoc.com wrote:
>>
>> Thanks for your feedback, this resolves my concerns and I am happy with
>> the solution. I integrated the suggestions from Vladimir, here is the
>> latest webrev:
>>
>> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/
>>
>> I re-tested and it works as expected.
>> Please give your consent if this is fine for you as well.
>>
>> -- Christoph
>>
>> Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-13 20:32:18:
>>
>>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>> To: Igor Ignatyev <igor.ignatyev at oracle.com>, christoph.goettschkes at microdoc.com
>>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>>> Date: 2019-11-13 20:32
>>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/
>>> TestCharVect2.java only works with server VMs.
>>>
>>>> On 11/13/19 11:11 AM, Igor Ignatyev wrote:
>>>> @Christoph,
>>>>
>>>> webrev.01 looks good to me.
>>>> I always thought that the jvmci feature could be built only when the compiler2
>>>> feature is enabled; however, src/hotspot/share/jvmci/jvmci_globals.hpp
>>>> suggests that jvmci can be used w/o compiler2. I don't think we have
>>>> ever built/tested, let alone supported, this configuration.
>>>>
>>>> @Vladimir,
>>>> did/do we plan to support a compiler1 + jvmci w/o compiler2 configuration?
>>>
>>> Yes. It could be a configuration when we start looking at replacing C1
>>> with Graal. I think several people were interested
>>> in a "Client VM"-like configuration.
>>> Also a Server configuration without C2 (with Graal or another jvmci
>>> compiler), which would be our configuration in the future.
>>>
>>> But I would prefer to be more explicit in these changes:
>>>
>>> @requires vm.compiler2.enabled | vm.graal.enabled
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Thanks,
>>>> -- Igor
>>>>
>>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
>>>>>
>>>>> Hi Igor,
>>>>>
>>>>> thanks for your explanation.
>>>>>
>>>>> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
>>>>>
>>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
>>>>>> in most cases, it causes wasted compute time (as in this test) and can
>>>>>> also lead to wrong/deprecated/deleted flags sneaking into the testbase
>>>>>
>>>>> Agreed. I also wanted to discuss this, since I think that your solution
>>>>> is better than mine, but at the same time, I saw possible problems with
>>>>> it, see below.
>>>>>
>>>>>> as '@requires vm.flavor == "server"' filters configurations based on the
>>>>>> VM build type, it will still allow execution on a JVM w/ JVMCI, including
>>>>>> when the JVMCI compiler is selected, as it will still be a Server VM build.
>>>>>> So, in a sense, the test will run w/ JVMCI in the same way as w/ your approach.
>>>>>
>>>>> My concern is not about server VMs with JVMCI, but client VMs with JVMCI
>>>>> enabled. Is this a valid configuration? The MaxVectorSize option is
>>>>> defined in [1] as well as in [2], so to me it looks like MaxVectorSize
>>>>> can be used for any VM variant as long as JVMCI is enabled. The
>>>>> configure script also states that both compilers are possible (if you
>>>>> configure with --with-jvm-features='jvmci'):
>>>>>
>>>>> configure: error: Specified JVM feature 'jvmci' requires feature
>>>>> 'compiler2' or 'compiler1'
>>>>>
>>>>> Should the requires tag "vm.jvmci" perhaps be used as well, like:
>>>>>
>>>>>     @requires vm.flavor == "server" | vm.jvmci
>>>>>
>>>>>> this is the known limitation of jtreg/@requires, and our current way to
>>>>>> work around it is to split a test description based on @requires values
>>>>>
>>>>> Yes, if the @requires tag is used, splitting up the test looks like a good
>>>>> idea. I didn't know that it is possible to have multiple test descriptions
>>>>> in one test file.
>>>>>
>>>>> I created a new webrev with the new ideas:
>>>>>
>>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
>>>>>
>>>>> I tested with an amd64 client and server VM and it looks good. I am
>>>>> currently unable to build a client VM with JVMCI enabled, hence no test
>>>>> for that yet. I get compile errors and as soon as I resolve those,
>>>>> runtime errors occur. Before I look into that, I would like to know if
>>>>> client VMs with JVMCI enabled are supported or not.
>>>>>
>>>>> Thanks,
>>>>> Christoph
>>>>>
>>>>> [1]
>>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
>>>>>
>>>>> [2]
>>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
>>>>>
>>>>
>>>
>>
> 
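For readers unfamiliar with the jtreg idiom discussed in this thread, here is a minimal, hypothetical sketch of one test file carrying two test descriptions, each gated by its own `@requires` expression, so that the C2/Graal-specific `-XX:MaxVectorSize` runs are skipped on other VMs instead of relying on IgnoreUnrecognizedVMOptions. The class name, the loop under test, and the flag values are illustrative assumptions and are not taken from the webrevs; only the `@requires` properties come from the discussion.

```java
/*
 * @test
 * @summary Hypothetical example: a description that runs on every VM, without C2-specific flags.
 * @run main/othervm -Xbatch SplitDescriptionExample
 */

/*
 * @test
 * @summary Hypothetical example: MaxVectorSize is passed only where C2 or Graal is present.
 * @requires vm.compiler2.enabled | vm.graal.enabled
 * @run main/othervm -Xbatch -XX:MaxVectorSize=8 SplitDescriptionExample
 * @run main/othervm -Xbatch -XX:MaxVectorSize=16 SplitDescriptionExample
 */

public class SplitDescriptionExample {
    // A simple char[] loop of the kind a vectorizing compiler may target.
    static void copyPlusOne(char[] src, char[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) (src[i] + 1);
        }
    }

    public static void main(String[] args) {
        char[] src = new char[1024];
        char[] dst = new char[1024];
        for (int i = 0; i < src.length; i++) {
            src[i] = (char) i;
        }
        // Repeat the call so the loop is likely JIT-compiled; results must not change.
        for (int iter = 0; iter < 10_000; iter++) {
            copyPlusOne(src, dst);
        }
        for (int i = 0; i < dst.length; i++) {
            if (dst[i] != (char) (i + 1)) {
                throw new RuntimeException("mismatch at index " + i);
            }
        }
    }
}
```

jtreg evaluates each `@requires` per description, so on a client VM only the first description runs, while a server VM runs both.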

From vladimir.kozlov at oracle.com  Thu Nov 14 17:57:17 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Nov 2019 09:57:17 -0800
Subject: RFR(M) 8233841: Update Graal
In-Reply-To: <e618ff51-8315-7db6-39f6-e2457ec2d98b@oracle.com>
References: <e618ff51-8315-7db6-39f6-e2457ec2d98b@oracle.com>
Message-ID: <5f3d6026-2bd8-0385-06e1-506c2aad4c7e@oracle.com>

Looks good. Tests results good too.

Thanks,
Vladimir

On 11/14/19 9:23 AM, dean.long at oracle.com wrote:
> https://bugs.openjdk.java.net/browse/JDK-8233841
> http://cr.openjdk.java.net/~dlong/8233841/webrev/
> 
> This is a Graal update. Changes since the last update (JDK-8231973) are listed in the bug description.
> 
> dl

From tom.rodriguez at oracle.com  Thu Nov 14 18:17:13 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 14 Nov 2019 10:17:13 -0800
Subject: RFR(XS) 8233888:
 jdk.vm.ci.hotspot.test.VirtualObjectLayoutTest.testFormat(): Unexpected error
 verifying
Message-ID: <02ef4b41-a693-e671-b549-a7e007fb549d@oracle.com>

http://cr.openjdk.java.net/~never/8233888/webrev
https://bugs.openjdk.java.net/browse/JDK-8233888

A recently added unit test was making assumptions about field alignment 
that didn't hold when compressed oops were disabled.  The fix is to 
adjust the starting point for the debug info depending on the alignment 
of the first field.  Tested locally with and without compressed oops.

tom
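The kind of assumption involved can be illustrated with the usual round-up arithmetic for field offsets. This is a minimal sketch, not the code of the fix: it assumes typical 64-bit header sizes of 12 bytes with compressed class pointers and 16 bytes without, and assumes (as in JDK builds of that period) that disabling compressed oops also disables compressed class pointers. Under those assumptions an `int` field can start right after a 12-byte header, but not after a 16-byte one.

```java
public class FieldOffsetAlignment {
    // Round offset up to the next multiple of alignment (which must be a power of two).
    static long alignUp(long offset, long alignment) {
        if (alignment <= 0 || Long.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        return (offset + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        // Assumed 64-bit header sizes: 12 bytes with compressed class pointers, 16 without.
        long[] headerSizes = {12, 16};
        for (long header : headerSizes) {
            System.out.println("header=" + header
                    + " first int at " + alignUp(header, 4)
                    + ", first long at " + alignUp(header, 8));
        }
    }
}
```

A test that hard-codes the offsets for one header size will therefore see different first-field offsets when the header size changes.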

From igor.ignatyev at oracle.com  Thu Nov 14 18:28:32 2019
From: igor.ignatyev at oracle.com (Igor Ignatev)
Date: Thu, 14 Nov 2019 10:28:32 -0800
Subject: RFR(XS) 8233888:
 jdk.vm.ci.hotspot.test.VirtualObjectLayoutTest.testFormat(): Unexpected error
 verifying
In-Reply-To: <02ef4b41-a693-e671-b549-a7e007fb549d@oracle.com>
References: <02ef4b41-a693-e671-b549-a7e007fb549d@oracle.com>
Message-ID: <3FECF92F-62B8-4964-B8CA-664F6C832A0A@oracle.com>

Hi Tom,

LGTM

-- Igor

> On Nov 14, 2019, at 10:17 AM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~never/8233888/webrev
> https://bugs.openjdk.java.net/browse/JDK-8233888
> 
> A recently added unit test was making assumptions about field alignment that didn't hold when compressed oops were disabled.  The fix is to adjust the starting point for the debug info depending on the alignment of the first field.  Tested locally with and without compressed oops.
> 
> tom


From vladimir.kozlov at oracle.com  Thu Nov 14 18:46:43 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Nov 2019 10:46:43 -0800
Subject: RFR(XS) 8233888:
 jdk.vm.ci.hotspot.test.VirtualObjectLayoutTest.testFormat(): Unexpected error
 verifying
In-Reply-To: <3FECF92F-62B8-4964-B8CA-664F6C832A0A@oracle.com>
References: <02ef4b41-a693-e671-b549-a7e007fb549d@oracle.com>
 <3FECF92F-62B8-4964-B8CA-664F6C832A0A@oracle.com>
Message-ID: <0f87dac2-d383-4d0c-bfb0-1d9bcdc48d10@oracle.com>

+1.

Tom, or maybe better Igor, can you run the test in the same configuration as in the bug report to make sure it is fixed?

Thanks,
Vladimir

On 11/14/19 10:28 AM, Igor Ignatev wrote:
> Hi Tom,
> 
> LGTM
> 
> -- Igor
> 
>> On Nov 14, 2019, at 10:17 AM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>>
>> http://cr.openjdk.java.net/~never/8233888/webrev
>> https://bugs.openjdk.java.net/browse/JDK-8233888
>>
>> A recently added unit test was making assumptions about field alignment that didn't hold when compressed oops were disabled.  The fix is to adjust the starting point for the debug info depending on the alignment of the first field.  Tested locally with and without compressed oops.
>>
>> tom
> 

From dean.long at oracle.com  Thu Nov 14 20:06:54 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 14 Nov 2019 12:06:54 -0800
Subject: RFR(M) 8233841: Update Graal
In-Reply-To: <5f3d6026-2bd8-0385-06e1-506c2aad4c7e@oracle.com>
References: <e618ff51-8315-7db6-39f6-e2457ec2d98b@oracle.com>
 <5f3d6026-2bd8-0385-06e1-506c2aad4c7e@oracle.com>
Message-ID: <86c3c5e6-191e-e92e-df28-52baab350937@oracle.com>

Thanks Vladimir.

dl

On 11/14/19 9:57 AM, Vladimir Kozlov wrote:
> Looks good. Tests results good too.
>
> Thanks,
> Vladimir
>
> On 11/14/19 9:23 AM, dean.long at oracle.com wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8233841
>> http://cr.openjdk.java.net/~dlong/8233841/webrev/
>>
>> This is a Graal update. Changes since the last update (JDK-8231973) 
>> are listed in the bug description.
>>
>> dl


From dl at cs.oswego.edu  Fri Nov 15 00:46:34 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 14 Nov 2019 19:46:34 -0500
Subject: final field values should be trusted as constant (filed as
 JDK-8233873)
In-Reply-To: <06DC8BED-60F2-438C-82FD-1578B7215D80@oracle.com>
References: <B3AE0B03-0AD0-4C1B-A5A4-67CF789A5B9D@oracle.com>
 <69de5d88-2487-850b-5388-6363d88d2b5b@cs.oswego.edu>
 <06DC8BED-60F2-438C-82FD-1578B7215D80@oracle.com>
Message-ID: <6b015267-9249-9b49-ced0-ec6b8e34a5fa@cs.oswego.edu>

On 11/13/19 4:23 PM, John Rose wrote:

> Also tricky is smashing of finals of normally-constructed objects, which is
> allowed by the JVM.  This is allowed even if heinous.  We need some
> Big Hammer to turn off optimization when it happens.  If we were really
> slick we might try to amend the JMM to declare that the JIT can retain
> a previously folded value, even after somebody smashed it, on the grounds
> that this is a valid race condition, where the JITted code is racily reading
> back in time.  (Is this already true?

I strongly suspect it already is, or sometimes could be, true, for example
when reads are hoisted out of loops. So there is not much of an argument
against the simplest interpretation of the final field rule: the first
value ever read by any thread is allowed to be the only value ever used by
all threads, although with some practical accommodations:

> 
> C. Demand that frameworks which make off-label objects be upgraded
> to include a memory fence after they are done setting their finals, so
> that they conform to the JMM behaviors required of regular objects.

I think that some (all the reliable ones!) already do so. Dependency
injection frameworks (which can be worst offenders) were the main reason
for adding explicit fences to Unsafe in jdk7, prior to the current full
VarHandle (plus fence) scheme in jdk9. If frameworks don't use them,
programs are still buggy. Where "buggy" means that readers are only
guaranteed to see a typesafe value. Maybe with a context-specific
stronger guarantee; maybe not. My sense is that even the most egregious
cases set fields before publishing the object to other threads.
Otherwise even now, bad things would routinely happen when multiple
threads disagree about values.
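Option C above (frameworks issuing a fence after they finish setting finals) can be sketched in Java roughly as follows. The class, field, and method names are hypothetical; `Field.setAccessible`, `Field.setInt`, and `VarHandle.releaseFence()` (JDK 9+) are standard APIs. Using a release fence as the stand-in for the barrier a constructor that writes finals gets is an assumption of this sketch, not something the thread settles.

```java
import java.lang.invoke.VarHandle;
import java.lang.reflect.Field;

public class FencedFinalInjection {
    static final class Config {
        final int port;              // final, but overwritten reflectively below
        Config() { this.port = 0; }  // blank final assigned in the constructor, so javac does not inline it
    }

    // Shared reference through which the object is published to other threads.
    static Config published;

    // Hypothetical framework step: construct, overwrite a final field reflectively,
    // fence, and only then publish.
    static Config injectAndPublish(int port) {
        try {
            Config c = new Config();
            Field f = Config.class.getDeclaredField("port");
            f.setAccessible(true);     // required to write a non-static final field
            f.setInt(c, port);
            VarHandle.releaseFence();  // prior writes may not be reordered after the publishing store
            published = c;
            return c;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(injectAndPublish(8080).port); // prints 8080
    }
}
```

Without the fence, a racy reader could in principle observe the published reference before the reflective write.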

> E. Some balanced combination of the above. 

The remaining balance may be a matter of leaving a couple of loopholes:
System.setIn and a few others could be treated magically, which they
already are (but perhaps with an apologetic-sounding new annotation
@NotReallyFinal).  And dealing with the dubious legality of setting a
final field more than once, with intervening reads, in a constructor.
Maybe this could be deprecated, since this case doesn't seem to have a
semantics, and would remove the weirdness that "could be final" analysis
does not always apply to finals.

-Doug


From vladimir.kozlov at oracle.com  Fri Nov 15 01:18:28 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 14 Nov 2019 17:18:28 -0800
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com>
Message-ID: <1e6d980f-5676-31f7-510c-7c7dc746540a@oracle.com>

I second this.

Vladimir K.

On 11/14/19 12:31 AM, Vladimir Ivanov wrote:
> 
> 
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
> 
> I just briefly looked through the patch and have a quick high-level question:
> 
> +  // Rewrite:
> +  //              cmp                   cmp
> +  //              / \                    |
> +  //     (r1)  bool  \                  bool (r1)
> +  //            /    bool (r2)            \
> +  //    (dom) if       \        ==>       if
> +  //            \       )                  \
> +  //    (pre)  if[TF]  /                  if[TF]X
> +  //               \  /
> +  //                if (this)
> +  //               /  \
> +  //             ifT  ifF [X]
> 
> Why do you do complex graph surgery instead of simply adjusting condition at redundant If (to 0/1) and let existing 
> logic to eliminate it?
> 
> Best regards,
> Vladimir Ivanov
> 
>>
>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>
>>      Adding a simple subsumption test to IfNode::Ideal to enable a local
>>      short-circuit for (obviously) redundant if-nodes.
>>
>> Testing: hs-tier1-4, hs-precheckin-comp
>>
>>
>> Best regards,
>> Patric
>>

From Yang.Zhang at arm.com  Fri Nov 15 03:35:54 2019
From: Yang.Zhang at arm.com (Yang Zhang (Arm Technology China))
Date: Fri, 15 Nov 2019 03:35:54 +0000
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <20191114112213.D20B8D9B6E@aojmv0009>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
 <2w8hcs8xd1-1@aserp2030.oracle.com>
 <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>
 <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com>
 <20191114112213.D20B8D9B6E@aojmv0009>
Message-ID: <VE1PR08MB4656A89817FF722ECF400A778E700@VE1PR08MB4656.eurprd08.prod.outlook.com>

Hi Christoph, Igor, Vladimir,

Thanks very much for your fix. After the discussion, we now have a better solution for this issue. Do we also need to change the following files, in which the MaxVectorSize option is used?

[1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java
[2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java
[3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/c2/cr6340864

P.S. [3] is located in the c2 directory, so I'm not sure whether those tests are excluded when jtreg runs in client mode.

Regards
Yang

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of christoph.goettschkes at microdoc.com
Sent: Thursday, November 14, 2019 7:21 PM
To: vladimir.kozlov at oracle.com; igor.ignatyev at oracle.com
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs.

Thanks for your feedback, this resolves my concerns and I am happy with the solution. I integrated the suggestions from Vladimir, here is the latest webrev:

https://cr.openjdk.java.net/~cgo/8231954/webrev.02/

I re-tested and it works as expected.
Please give your consent if this is fine for you as well.

-- Christoph

Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-13 20:32:18:

> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> To: Igor Ignatyev <igor.ignatyev at oracle.com>, christoph.goettschkes at microdoc.com
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Date: 2019-11-13 20:32
> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ 
> TestCharVect2.java only works with server VMs.
> 
> On 11/13/19 11:11 AM, Igor Ignatyev wrote:
> > @Christoph,
> > 
> > webrev.01 looks good to me.
> > I always thought that the jvmci feature could be built only when the compiler2
> > feature is enabled; however, src/hotspot/share/jvmci/jvmci_globals.hpp
> > suggests that jvmci can be used w/o compiler2. I don't think we have
> > ever built/tested, let alone supported, this configuration.
> > 
> > @Vladimir,
> > did/do we plan to support a compiler1 + jvmci w/o compiler2 configuration?
> 
> Yes. It could be a configuration when we start looking at replacing C1 
> with Graal. I think several people were interested in a "Client VM"-like 
> configuration.
> Also a Server configuration without C2 (with Graal or another jvmci
> compiler), which would be our configuration in the future.
> 
> But I would prefer to be more explicit in these changes:
> 
> @requires vm.compiler2.enabled | vm.graal.enabled
> 
> Thanks,
> Vladimir
> 
> > 
> > Thanks,
> > -- Igor
> > 
> >> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
> >>
> >> Hi Igor,
> >>
> >> thanks for your explanation.
> >>
> >> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
> >>
> >>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
> >>> in most cases, it causes wasted compute time (as in this test) and can
> >>> also lead to wrong/deprecated/deleted flags sneaking into the testbase
> >>
> >> Agreed. I also wanted to discuss this, since I think that your solution
> >> is better than mine, but at the same time, I saw possible problems with
> >> it, see below.
> >>
> >>> as '@requires vm.flavor == "server"' filters configurations based on the
> >>> VM build type, it will still allow execution on a JVM w/ JVMCI, including
> >>> when the JVMCI compiler is selected, as it will still be a Server VM build.
> >>> So, in a sense, the test will run w/ JVMCI in the same way as w/ your approach.
> >>
> >> My concern is not about server VMs with JVMCI, but client VMs with JVMCI
> >> enabled. Is this a valid configuration? The MaxVectorSize option is 
> >> defined in [1] as well as in [2], so to me it looks like MaxVectorSize
> >> can be used for any VM variant as long as JVMCI is enabled. The 
> >> configure script also states that both compilers are possible (if 
> >> you configure with --with-jvm-features='jvmci'):
> >>
> >> configure: error: Specified JVM feature 'jvmci' requires feature 
> >> 'compiler2' or 'compiler1'
> >>
> >> Should the requires tag "vm.jvmci" perhaps be used as well, like:
> >>
> >>     @requires vm.flavor == "server" | vm.jvmci
> >>
> >>> this is the known limitation of jtreg/@requires, and our current way to
> >>> work around it is to split a test description based on @requires values
> >>
> >> Yes, if the @requires tag is used, splitting up the test looks like a good
> >> idea. I didn't know that it is possible to have multiple test descriptions
> >> in one test file.
> >>
> >> I created a new webrev with the new ideas:
> >>
> >> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
> >>
> >> I tested with an amd64 client and server VM and it looks good. I am 
> >> currently unable to build a client VM with JVMCI enabled, hence no test
> >> for that yet. I get compile errors and as soon as I resolve those, 
> >> runtime errors occur. Before I look into that, I would like to know if
> >> client VMs with JVMCI enabled are supported or not.
> >>
> >> Thanks,
> >> Christoph
> >>
> >> [1]
> >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
> >>
> >> [2]
> >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
> >>
> > 
> 


From patrick at os.amperecomputing.com  Fri Nov 15 07:51:17 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Fri, 15 Nov 2019 07:51:17 +0000
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo
 intrinsic documentation and maintenance improvement
In-Reply-To: <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net>
References: <fc712566-1d25-88d6-b178-652efdc00f36@bell-sw.com>
 <DB7PR08MB31153C8A293F3B4FC2371B34967F0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com>
 <e47a5afa-e9ce-1286-97da-5fb9018846cb@bell-sw.com>
 <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net>
Message-ID: <MN2PR01MB6093B0E0C0BD01587FF96F778F700@MN2PR01MB6093.prod.exchangelabs.com>

Hi Dmitrij,

The inline documentation in your patch is nice and helped me understand the string_compare stub code generation in depth, although I don't know why it was not pushed. 
http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html 

There is one point I don't quite understand; could you please clarify it a little more? Thanks in advance!
In the large loop with prefetching logic, the different_encoding function uses "cnt2 - prefetchLoopExitCondition" to decide whether the 1st iteration should be executed or not, while the same_encoding function does not. Why?

I was thinking that "prefetching out of the string boundary" could be an invalid operation, but that seems to be a misunderstanding, right? The AArch64 ISA document says that the prfm instruction signals the memory system and expects "preloading the cache line containing the specified address into one or more caches" to be done, in order to speed up the memory accesses when they do occur. If this is just a hint for the coming ldrs, and safe enough, could we avoid restricting the remaining iterations (2 ~ n) with largeLoopExitCondition, and instead use 64 for LL/UU and 128 for LU/UL? That way more iterations can stay in the large loop (with SoftwarePrefetchHintDistance=192, for example), for better performance. Any comments?

Thanks 

4327   address generate_compare_long_string_different_encoding(bool isLU) {
4377     if (SoftwarePrefetchHintDistance >= 0) {
4378       __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
4379       __ br(__ LT, NO_PREFETCH);
4380       __ bind(LARGE_LOOP_PREFETCH);                  // 64-characters loop
... ...
4395           __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
4396           __ br(__ GE, LARGE_LOOP_PREFETCH);
4397     } // end of 64-characters loop

4616   address generate_compare_long_string_same_encoding(bool isLL) {
4637     if (SoftwarePrefetchHintDistance >= 0) {
4638       __ bind(LARGE_LOOP_PREFETCH);
4639         __ prfm(Address(str1, SoftwarePrefetchHintDistance));
4640         __ prfm(Address(str2, SoftwarePrefetchHintDistance));
4641         compare_string_16_bytes_same(DIFF, DIFF2);
4642         compare_string_16_bytes_same(DIFF, DIFF2);
4643         __ sub(cnt2, cnt2, 8 * characters_in_word);
4644         compare_string_16_bytes_same(DIFF, DIFF2);
4645         __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
4646         compare_string_16_bytes_same(DIFF, DIFF2);
4647         __ br(__ GT, LARGE_LOOP_PREFETCH);
4648         __ cbz(cnt2, LAST_CHECK);                    // no more loads left
4649     }
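To make the effect of the proposed threshold concrete, here is a simplified Java model of the trip count of the same-encoding prefetch loop quoted above. It assumes a do-while loop that consumes a fixed number of characters per iteration and repeats while `cnt2 - exitCondition > 0` (the `subs` + `br GT` pair); it ignores the loads, comparisons, and prefetches themselves. The figure of 64 characters per iteration assumes `characters_in_word` is 8 for LL, and the other numbers are illustrative rather than HotSpot defaults.

```java
public class PrefetchLoopModel {
    // Models: do { ...; cnt2 -= charsPerIteration; ... } while (cnt2 - exitCondition > 0);
    static int largeLoopIterations(int cnt2, int charsPerIteration, int exitCondition) {
        int iterations = 0;
        do {
            cnt2 -= charsPerIteration;
            iterations++;
        } while (cnt2 - exitCondition > 0);
        return iterations;
    }

    public static void main(String[] args) {
        int cnt2 = 1000;  // characters left to compare (illustrative)
        int perIter = 64; // assumed: 8 words of 8 Latin-1 characters (LL case)
        System.out.println(largeLoopIterations(cnt2, perIter, 192)); // 13
        System.out.println(largeLoopIterations(cnt2, perIter, 64));  // 15
    }
}
```

With a threshold of 64 instead of 192, two more iterations stay in the prefetching loop for this input; whether that is safe and profitable is exactly the open question above.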

Regards
Patrick

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Dmitry Samersoff
Sent: Sunday, May 19, 2019 11:42 PM
To: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>; Andrew Haley <aph at redhat.com>; Pengfei Li (Arm Technology China) <Pengfei.Li at arm.com>
Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement

Dmitrij,

The changes look good to me.

-Dmitry

On 25.02.2019 19:52, Dmitrij Pochepko wrote:
> Hi Andrew, Pengfei,
> 
> I created webrev.02 with all your suggestions implemented:
> 
> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
> 
> - comments are now both in a separate section and inlined into the code.
> - documentation mismatch mentioned by Pengfei is fixed:
> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST
> -- SHORT_LOOP_TAIL block is now merged with the last instruction.
> Documentation is updated accordingly
> - minor other changes to layout and wording
> 
> Newly developed tests were run as a sanity check and they passed.
> 
> Thanks,
> Dmitrij
> 
> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>
>>> So personally, I still prefer to inline the comments with the 
>>> original code block to avoid this kind of inconsistency. It also
>>> makes it easier for us to review or maintain the code together with the 
>>> doc, as we don't need to scroll back and forth. I don't know the 
>>> benefit of making the code documentation a separate part. What's 
>>> your opinion, Andrew Haley?
>> I agree with you. There's no harm having both inline and separate.
>>


From Pengfei.Li at arm.com  Fri Nov 15 09:15:37 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Fri, 15 Nov 2019 09:15:37 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
Message-ID: <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew,

> > This patch aligns with the implementation in [1] which makes the x86_64
> > r12 register allocatable. Please let me know if I have missed anything
> > for AArch64.
> 
> We don't generally use r27 for compressed class pointers.

Do you mean that r27 is only used for encoding/decoding oops but not for
any klass pointers? I looked at the AArch64 code and found that it is also
used in MacroAssembler::encode_klass_not_null() if the compressed mode is
not zero-based.

--
Thanks,
Pengfei
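For context on why a base register matters here, below is the textbook base-plus-shift compressed-pointer scheme in Java. When the base is zero, encoding and decoding reduce to a shift; when it is non-zero, the base must be available (for example in a dedicated register) for every encode/decode, which is the kind of use Pengfei points to in `MacroAssembler::encode_klass_not_null()`. The base and shift values are illustrative, and HotSpot's AArch64 code chooses among several encoding modes that this sketch does not model.

```java
public class NarrowPointerSketch {
    // Encode a 64-bit address as a 32-bit value: (addr - base) >>> shift.
    static int encode(long addr, long base, int shift) {
        long narrow = (addr - base) >>> shift;
        if ((narrow >>> 32) != 0) {
            throw new IllegalArgumentException("address outside the encodable range");
        }
        return (int) narrow;
    }

    // Decode: treat the narrow value as unsigned, shift it back, and add the base.
    static long decode(int narrow, long base, int shift) {
        return base + (Integer.toUnsignedLong(narrow) << shift);
    }

    public static void main(String[] args) {
        long base = 0x8_0000_0000L; // illustrative non-zero base
        int shift = 3;
        long addr = base + 0x1000;
        int n = encode(addr, base, shift);
        // prints "200 -> 800001000"
        System.out.println(Integer.toHexString(n) + " -> " + Long.toHexString(decode(n, base, shift)));
    }
}
```

With `base == 0` the subtraction and addition disappear, so no register needs to hold the base.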


From patric.hedlin at oracle.com  Fri Nov 15 09:28:23 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Fri, 15 Nov 2019 10:28:23 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com>
Message-ID: <82f94320-c260-add2-2627-4116c5667e89@oracle.com>

Thanks Vladimir for pointing this out.

The immediate reason is that this is a reduced patch derived from a more 
general one (addressing a few more cases and requiring some "surgery"). 
I thought I should put this out first (to address the very limited case 
in the trouble report) as I found the more general approach to grow a 
bit more than I was happy with (the intent being to follow-up with the 
more general version later). But you are absolutely right that this 
single case only requires a constant condition. I'll re-work the patch 
and save the "surgery" for later.

Best regards,
Patric Hedlin

On 14/11/2019 09:31, Vladimir Ivanov wrote:
>
>
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> I just briefly looked through the patch and have a quick high-level 
> question:
>
> +  // Rewrite:
> +  //              cmp                   cmp
> +  //              / \                    |
> +  //     (r1)  bool  \                  bool (r1)
> +  //            /    bool (r2)            \
> +  //    (dom) if       \        ==>       if
> +  //            \       )                  \
> +  //    (pre)  if[TF]  /                  if[TF]X
> +  //               \  /
> +  //                if (this)
> +  //               /  \
> +  //             ifT  ifF [X]
>
> Why do you do complex graph surgery instead of simply adjusting 
> condition at redundant If (to 0/1) and let existing logic to eliminate 
> it?
>
> Best regards,
> Vladimir Ivanov
>
>>
>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>
>>      Adding a simple subsumption test to IfNode::Ideal to enable a local
>>      short-circuit for (obviously) redundant if-nodes.
>>
>> Testing: hs-tier1-4, hs-precheckin-comp
>>
>>
>> Best regards,
>> Patric
>>
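The subsumption idea can be illustrated with a small Java model that is independent of C2's actual `IfNode::Ideal` code. It assumes both tests compare the same `int` against zero and asks whether the outcome of the dominating test fixes the outcome of the dominated one. In the issue's example, the taken branch of `x > 0` decides the implicit `x != 0` division check, which could then be replaced by a constant condition, as Vladimir Ivanov suggests. The enum names are hypothetical and only loosely mirror C2's `BoolTest`.

```java
import java.util.EnumSet;
import java.util.Optional;

public class IfSubsumptionModel {
    // Sign classes an int can fall into when compared against zero.
    enum Sign { NEG, ZERO, POS }

    // Hypothetical comparisons "x <op> 0".
    enum Test {
        EQ, NE, LT, LE, GT, GE;

        boolean holds(Sign s) {
            switch (this) {
                case EQ: return s == Sign.ZERO;
                case NE: return s != Sign.ZERO;
                case LT: return s == Sign.NEG;
                case LE: return s != Sign.POS;
                case GT: return s == Sign.POS;
                default: return s != Sign.NEG; // GE
            }
        }
    }

    // Given that a dominating "x dom 0" evaluated to domOutcome, decide the
    // dominated "x test 0" if possible; Optional.empty() means "unknown".
    static Optional<Boolean> decide(Test dom, boolean domOutcome, Test test) {
        EnumSet<Sign> possible = EnumSet.noneOf(Sign.class);
        for (Sign s : Sign.values()) {
            if (dom.holds(s) == domOutcome) {
                possible.add(s);
            }
        }
        boolean allTrue = true;
        boolean allFalse = true;
        for (Sign s : possible) {
            if (test.holds(s)) {
                allFalse = false;
            } else {
                allTrue = false;
            }
        }
        if (allTrue) return Optional.of(true);
        if (allFalse) return Optional.of(false);
        return Optional.empty();
    }

    public static void main(String[] args) {
        // On the taken branch of "x > 0", the zero check "x != 0" is decided.
        System.out.println(decide(Test.GT, true, Test.NE));  // Optional[true]
        // On the not-taken branch, x may be zero or negative, so nothing is known.
        System.out.println(decide(Test.GT, false, Test.NE)); // Optional.empty
    }
}
```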


From patric.hedlin at oracle.com  Fri Nov 15 09:46:57 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Fri, 15 Nov 2019 10:46:57 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <VI1PR0201MB24791AD6578AAC150EA1B6B69A760@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com>
 <VI1PR0201MB24791AD6578AAC150EA1B6B69A760@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <d4114e64-23ee-3f3e-265f-f63aa0227e53@oracle.com>

Hi Martin,

On 13/11/2019 20:26, Doerr, Martin wrote:
> Hi Patric,
>
> thanks for addressing this issue.
>
> There seems to be a small issue with the webrev:
> -  if (bol->is_Bool()) {
> +  if (!bol->is_Bool()) {
Good catch. Looks like I spoiled the patch when adding the line above 
(keeping 'bol' around).

Best regards,
Patric Hedlin
> I guess you have tested it already this way and just something with the webrev went wrong.
>
> You recognize and transform a specific pattern you have described in the comment. It is appropriate for fixing this example and it looks good to me. But I'm curious how often this matches. Maybe we can see a performance improvement.
>
> Thanks and best regards,
> Martin
>
>
>> -----Original Message-----
>> From: hotspot-compiler-dev <hotspot-compiler-dev-
>> bounces at openjdk.java.net> On Behalf Of Nils Eliasson
>> Sent: Mittwoch, 13. November 2019 17:13
>> To: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
>> check
>>
>> Hi Patric,
>>
>> Looks good!
>>
>> (I have pre-reviewed this patch offline)
>>
>> Regards,
>>
>> Nils
>>
>> On 2019-11-12 15:16, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>>
>>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>>
>>>      Adding a simple subsumption test to IfNode::Ideal to enable a local
>>>      short-circuit for (obviously) redundant if-nodes.
>>>
>>> Testing: hs-tier1-4, hs-precheckin-comp
>>>
>>>
>>> Best regards,
>>> Patric


From christoph.goettschkes at microdoc.com  Fri Nov 15 10:16:50 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Fri, 15 Nov 2019 11:16:50 +0100
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <79dba9cf-46c4-1a3d-aba5-28622f2bd505@oracle.com>
References: <2w947d9dgm-1@aserp2050.oracle.com>
 <9725CF6F-64B1-4101-BB3A-801C841A8441@oracle.com>
 <79dba9cf-46c4-1a3d-aba5-28622f2bd505@oracle.com>
Message-ID: <mailman.9.1573813111.19479.hotspot-compiler-dev@openjdk.java.net>

Thanks for the reviews. I created a new webrev with the Reviewed-by line 
added to the changeset:

https://cr.openjdk.java.net/~cgo/8231954/webrev.03/

Could you sponsor the change for me and commit it into the repository?

Thanks,
Christoph

Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-14 18:53:20:

> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> To: christoph.goettschkes at microdoc.com
> Cc: hotspot-compiler-dev at openjdk.java.net
> Date: 2019-11-14 18:53
> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/
> TestCharVect2.java only works with server VMs.
> 
> +1
> 
> Vladimir
> 
> On 11/14/19 8:44 AM, Igor Ignatev wrote:
> > LGTM
> > 
> > -- Igor
> > 
> >> On Nov 14, 2019, at 3:22 AM, christoph.goettschkes at microdoc.com wrote:
> >>
> >> Thanks for your feedback, this resolves my concerns and I am happy with
> >> the solution. I integrated the suggestions from Vladimir, here is the
> >> latest webrev:
> >>
> >> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/
> >>
> >> I re-tested and it works as expected.
> >> Please give your consent if this is fine for you as well.
> >>
> >> -- Christoph
> >>
> >> Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-13 20:32:18:
> >>
> >>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> >>> To: Igor Ignatyev <igor.ignatyev at oracle.com>, christoph.goettschkes at microdoc.com
> >>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> >>> Date: 2019-11-13 20:32
> >>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/
> >>> TestCharVect2.java only works with server VMs.
> >>>
> >>> On 11/13/19 11:11 AM, Igor Ignatyev wrote:
> >>>> @Christoph,
> >>>>
> >>>> webrev.01 looks good to me.
> >>>> I always thought that the jvmci feature can be built only when the compiler2
> >>>> feature is enabled; however, src/hotspot/share/jvmci/jvmci_globals.hpp
> >>>> suggests that jvmci can be used w/o compiler2. I don't think we have
> >>>> ever built/tested, let alone supported, this configuration.
> >>>>
> >>>> @Vladimir,
> >>>> did/do we plan to support compiler1 + jvmci w/o compiler2 configuration?
> >>>
> >>> Yes. It could be a configuration when we start looking into replacing C1
> >>> with Graal. I think several people were interested
> >>> in a "Client VM"-like configuration.
> >>> Also a Server configuration without C2 (with Graal or another jvmci
> >>> compiler), which would be our configuration in the future.
> >>>
> >>> But I would prefer to be more explicit in these changes:
> >>>
> >>> @requires vm.compiler2.enabled | vm.graal.enabled
> >>>
> >>> Thanks,
> >>> Vladimir
> >>>
> >>>>
> >>>> Thanks,
> >>>> -- Igor
> >>>>
> >>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
> >>>>>
> >>>>> Hi Igor,
> >>>>>
> >>>>> thanks for your explanation.
> >>>>>
> >>>>> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
> >>>>>
> >>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
> >>>>>> in most cases, it causes wasted compute time (as in this test) and can
> >>>>>> also lead to wrong/deprecated/deleted flags sneaking into the testbase
> >>>>>
> >>>>> Agreed. I also wanted to discuss this, since I think that your solution
> >>>>> is better than mine, but at the same time, I saw possible problems with
> >>>>> it, see below.
> >>>>>
> >>>>>> as '@requires vm.flavor == "server"' filters configurations based on the VM
> >>>>>> build type, it will still allow execution on a JVM w/ JVMCI, and when a JVMCI
> >>>>>> compiler is selected, as it will still be a Server VM build. So, in a
> >>>>>> sense, the test will run w/ JVMCI in the same way as w/ your approach.
> >>>>>
> >>>>> My concern is not about server VMs with JVMCI, but client VMs with JVMCI
> >>>>> enabled. Is this a valid configuration? The MaxVectorSize option is
> >>>>> defined in [1] as well as in [2], so for me it looks like MaxVectorSize
> >>>>> can be used for any VM variant as long as JVMCI is enabled. The
> >>>>> configure script also states that both compilers are possible (if you
> >>>>> configure with --with-jvm-features='jvmci'):
> >>>>>
> >>>>> configure: error: Specified JVM feature 'jvmci' requires feature
> >>>>> 'compiler2' or 'compiler1'
> >>>>>
> >>>>> Should maybe the requires tag "vm.jvmci" be used as well, like:
> >>>>>
> >>>>>     @requires vm.flavor == "server" | vm.jvmci
> >>>>>
> >>>>>> this is a known limitation of jtreg/@requires, and our current way to
> >>>>>> work around it is to split a test description based on @requires values
> >>>>>
> >>>>> Yes, if the @requires tag is used, splitting up the test looks like a good
> >>>>> idea. I didn't know that it is possible to have multiple test descriptions
> >>>>> in one test file.
> >>>>>
> >>>>> I created a new webrev with the new ideas:
> >>>>>
> >>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
> >>>>>
> >>>>> I tested with an amd64 client and server VM and it looks good. I am
> >>>>> currently unable to build a client VM with JVMCI enabled, hence no test
> >>>>> for that yet. I get compile errors and as soon as I resolve those,
> >>>>> runtime errors occur. Before I look into that, I would like to know if
> >>>>> client VMs with JVMCI enabled are supported or not.
> >>>>>
> >>>>> Thanks,
> >>>>> Christoph
> >>>>>
> >>>>> [1] https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
> >>>>>
> >>>>> [2] https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
> >>>>>
> >>>>
> >>>
> >>
> > 
> 
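The split-description workaround Igor describes, combined with the explicit `@requires vm.compiler2.enabled | vm.graal.enabled` that Vladimir asks for, can be sketched as one jtreg file holding two test descriptions. This is a hypothetical illustration, not the contents of webrev.02: the class name `TestCharVectSketch`, the loop, and the chosen `-XX:MaxVectorSize` values are assumptions. The first description passes no compiler-specific flags and so can run on any VM flavor; the second only runs where a compiler that defines `MaxVectorSize` is built in, so `-XX:+IgnoreUnrecognizedVMOptions` is not needed. jtreg test classes are normally public and live in a file of the same name; the modifier is dropped here only so the sketch compiles standalone.

```java
/*
 * @test
 * @summary Hypothetical sketch: baseline run, valid on client and server VMs
 *          because it passes no compiler-specific -XX flags.
 * @run main/othervm -Xbatch TestCharVectSketch
 */

/*
 * @test
 * @summary Hypothetical sketch: runs that pass MaxVectorSize, which is only
 *          defined when C2 or a JVMCI compiler such as Graal is built in.
 * @requires vm.compiler2.enabled | vm.graal.enabled
 * @run main/othervm -Xbatch -XX:MaxVectorSize=8 TestCharVectSketch
 * @run main/othervm -Xbatch -XX:MaxVectorSize=16 TestCharVectSketch
 */
final class TestCharVectSketch {
    static final int SIZE = 997;

    // A simple element-wise loop that an optimizing JIT may auto-vectorize.
    static void addChars(char[] a, char[] b, char[] r) {
        for (int i = 0; i < r.length; i++) {
            r[i] = (char) (a[i] + b[i]);
        }
    }

    public static void main(String[] args) {
        char[] a = new char[SIZE];
        char[] b = new char[SIZE];
        char[] r = new char[SIZE];
        for (int i = 0; i < SIZE; i++) {
            a[i] = (char) i;
            b[i] = (char) (2 * i);
        }
        // Call the method repeatedly so that it gets JIT-compiled.
        for (int iter = 0; iter < 20_000; iter++) {
            addChars(a, b, r);
        }
        // Verify the (possibly vectorized) result against the scalar expectation.
        for (int i = 0; i < SIZE; i++) {
            if (r[i] != (char) (3 * i)) {
                throw new RuntimeException("Mismatch at " + i + ": " + (int) r[i]);
            }
        }
        System.out.println("Passed");
    }
}
```

Because each description carries its own `@requires`, a client VM without JVMCI still executes the baseline run instead of skipping the whole file.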


From christoph.goettschkes at microdoc.com  Fri Nov 15 10:40:21 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Fri, 15 Nov 2019 11:40:21 +0100
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <VE1PR08MB4656A89817FF722ECF400A778E700@VE1PR08MB4656.eurprd08.prod.outlook.com>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
 <2w8hcs8xd1-1@aserp2030.oracle.com>
 <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>
 <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com>
 <20191114112213.D20B8D9B6E@aojmv0009>
 <VE1PR08MB4656A89817FF722ECF400A778E700@VE1PR08MB4656.eurprd08.prod.outlook.com>
Message-ID: <mailman.10.1573814463.19479.hotspot-compiler-dev@openjdk.java.net>

Hi Yang,

I don't see a problem with applying the same idea to other tests to get 
rid of the IgnoreUnrecognizedVMOptions option.

> PS: [3] is located in the c2 directory, so I'm not sure whether those
> tests will be excluded from jtreg runs in client mode.

No, I don't think so. Those tests are not excluded automatically by jtreg. 
Running

    $ make test TEST=hotspot_all JTREG='OPTIONS=-l'

lists all tests in the c2 directory:

...
compiler/c2/cr6340864/TestByteVect.java
compiler/c2/cr6340864/TestDoubleVect.java
...

The directory "cr6340864" is however excluded from the hotspot tier1 test 
group.

One thing I noticed: some of the tests use other flags that are only
supported by the C2 compiler, like TestNaNVector.java [1]. The flag
"OptimizeFill" is only present in c2_globals.hpp [4] and not in
jvmci_globals.hpp [5], so the @requires tag should look different. Maybe
for [1], we should add another @run tag with no -XX arguments whatsoever?

-- Christoph

[1]
https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java

[4]
https://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/opto/c2_globals.hpp

[5]
https://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/jvmci/jvmci_globals.hpp

"Yang Zhang (Arm Technology China)" <Yang.Zhang at arm.com> wrote on 2019-11-15 04:35:54:

> From: "Yang Zhang (Arm Technology China)" <Yang.Zhang at arm.com>
> To: "christoph.goettschkes at microdoc.com" 
> <christoph.goettschkes at microdoc.com>, "vladimir.kozlov at oracle.com" 
> <vladimir.kozlov at oracle.com>, "igor.ignatyev at oracle.com" 
> <igor.ignatyev at oracle.com>
> Cc: "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
> Date: 2019-11-15 04:36
> Subject: RE: RFR: 8231954: [TESTBUG] Test compiler/codegen/
> TestCharVect2.java only works with server VMs.
> 
> Hi Christoph, Igor, Vladimir,
> 
> Thanks very much for your fix. After discussion, we have got a better 
> solution for this issue. Do we need to change the following files in 
> which the MaxVectorSize option is used?
> 
> [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java
> [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java
> [3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/c2/cr6340864
> 
> PS: [3] is located in the c2 directory, so I'm not sure whether those
> tests will be excluded from jtreg runs in client mode.
> 
> Regards
> Yang
> 


From patrick at os.amperecomputing.com  Fri Nov 15 10:54:16 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Fri, 15 Nov 2019 10:54:16 +0000
Subject: RFR (trivial): 8234228: AArch64: Clean up redundant temp vars in
 generate_compare_long_string_different_encoding
Message-ID: <MN2PR01MB60936ED9EF6FAE8A493694508F700@MN2PR01MB6093.prod.exchangelabs.com>

Hi Reviewers,

This is a simple patch which cleans up some redundant temp vars and related instructions in generate_compare_long_string_different_encoding.

JBS: https://bugs.openjdk.java.net/browse/JDK-8234228 
Webrev: http://cr.openjdk.java.net/~qpzhang/8234228/webrev.01 

In generate_compare_long_string_different_encoding, the two Register vars strU and strL were used to record the pointers to the last 4 characters for the final comparisons. strU has been unused since the latest code updates, because the chars are already pre-loaded (into r12) by compare_string_16_x_LU, and strL is redundant too, since the pointer is available in r11. Cleaning these up saves two add instructions and two temp vars, and replaces two sub instructions with mov.
In addition, r10 in compare_string_16_x_LU is not used, so I cleaned up that temp var too.
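For readers new to this stub: conceptually, the LU case compares a Latin1 string (one byte per character) with a UTF-16 string, and the result has to match `String.compareTo`. The scalar reference model below illustrates those semantics under that assumption; it is not the stub's algorithm, which processes many characters per loop iteration and handles only the final few (the "last 4 characters" above) in its tail. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class CompareLUSketch {
    /**
     * Hypothetical scalar model of a Latin1-vs-UTF16 compareTo: inflate each
     * Latin1 byte to a char (zero-extension), return the difference of the
     * first mismatching pair, otherwise the length difference.
     */
    static int compareLatin1ToUtf16(byte[] latin1, char[] utf16) {
        int lim = Math.min(latin1.length, utf16.length);
        for (int k = 0; k < lim; k++) {
            char c1 = (char) (latin1[k] & 0xff); // zero-extend the byte to a UTF-16 unit
            char c2 = utf16[k];
            if (c1 != c2) {
                return c1 - c2;
            }
        }
        return latin1.length - utf16.length;
    }

    public static void main(String[] args) {
        byte[] l = "hello".getBytes(StandardCharsets.ISO_8859_1);
        char[] u = "help\u20ac".toCharArray(); // last char is outside Latin1
        System.out.println(compareLatin1ToUtf16(l, u));     // -4 ('l' - 'p')
        System.out.println("hello".compareTo("help\u20ac")); // -4 as well
    }
}
```

The UL direction is the same computation with the operands swapped and the sign negated.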

Tested jtreg tier1, and hotspot runtime/compiler, no new failures found.
Double checked with string intrinsics cases under [1], no regression found.
Ran [2] CompareToBench LU/UL as a performance check: no regression found, and slight gains with some input sizes.

[1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string 
[2] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar 

Regards
Patrick


From martin.doerr at sap.com  Fri Nov 15 11:13:05 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 15 Nov 2019 11:13:05 +0000
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <82f94320-c260-add2-2627-4116c5667e89@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com>
 <82f94320-c260-add2-2627-4116c5667e89@oracle.com>
Message-ID: <VI1PR0201MB24797E46764490CF7A6633BA9A700@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Patric,

excellent. I have found issues with the surgery while running some tests.
The VM was running into assert(cnt == _outcnt) failed: no insertions allowed
due to usage of DUIterator_Last.
So a version without the surgery sounds promising.
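The failure mode above can be illustrated with a loose Java analogue. This is an assumption made for illustration, not C2's `DUIterator_Last` code: a backward iterator over a node's out-edges that tolerates deletions made through the iterator itself, but fails when an edge is inserted behind its back during the walk, which is the kind of change the graph surgery made. `BackwardOutIterator` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LastIteratorSketch {
    /** Hypothetical backward iterator loosely modelled on a last-to-first out-edge walk. */
    static final class BackwardOutIterator<T> {
        private final List<T> outs;
        private int index;       // the next element returned is outs.get(index - 1)
        private int expectedCnt; // out-count the iterator believes is current

        BackwardOutIterator(List<T> outs) {
            this.outs = outs;
            this.index = outs.size();
            this.expectedCnt = outs.size();
        }

        boolean hasNext() {
            checkNoInsertions();
            return index > 0;
        }

        T next() {
            checkNoInsertions();
            return outs.get(--index);
        }

        /** Deletes the element last returned by next(); deletions are allowed. */
        void removeCurrent() {
            outs.remove(index);
            expectedCnt--;
        }

        private void checkNoInsertions() {
            if (outs.size() > expectedCnt) {
                throw new IllegalStateException("cnt == _outcnt failed: no insertions allowed");
            }
        }
    }

    /** Removes every occurrence of victim while walking backwards. */
    static List<String> removeWhileIterating(List<String> in, String victim) {
        List<String> outs = new ArrayList<>(in);
        BackwardOutIterator<String> it = new BackwardOutIterator<>(outs);
        while (it.hasNext()) {
            if (it.next().equals(victim)) {
                it.removeCurrent();
            }
        }
        return outs;
    }

    /** Returns true if an insertion during the walk is detected. */
    static boolean insertionDetected() {
        List<String> outs = new ArrayList<>(List.of("a", "b"));
        BackwardOutIterator<String> it = new BackwardOutIterator<>(outs);
        it.next();
        outs.add("c"); // "surgery" that adds an edge mid-walk
        try {
            it.hasNext();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeWhileIterating(List.of("a", "b", "c", "b"), "b")); // [a, c]
        System.out.println(insertionDetected()); // true
    }
}
```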

Best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Patric Hedlin
> Sent: Freitag, 15. November 2019 10:28
> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; hotspot compiler
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
> check
> 
> Thanks Vladimir for pointing this out.
> 
> The immediate reason is that this is a reduced patch derived from a more
> general one (addressing a few more cases and requiring some "surgery").
> I thought I should put this out first (to address the very limited case
> in the trouble report) as I found the more general approach to grow a
> bit more than I was happy with (the intent being to follow-up with the
> more general version later). But you are absolutely right that this
> single case only requires a constant condition. I'll re-work the patch
> and save the "surgery" for later.
> 
> Best regards,
> Patric Hedlin
> 
> On 14/11/2019 09:31, Vladimir Ivanov wrote:
> >
> >
> >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
> >
> > I just briefly looked through the patch and have a quick high-level
> > question:
> >
> > +  // Rewrite:
> > +  //              cmp                   cmp
> > +  //              / \                    |
> > +  //     (r1)  bool  \                  bool (r1)
> > +  //            /    bool (r2)            \
> > +  //    (dom) if       \        ==>       if
> > +  //            \       )                  \
> > +  //    (pre)  if[TF]  /                  if[TF]X
> > +  //               \  /
> > +  //                if (this)
> > +  //               /  \
> > +  //             ifT  ifF [X]
> >
> > Why do you do complex graph surgery instead of simply setting the
> > condition of the redundant If to a constant (0/1) and letting the existing
> > logic eliminate it?
> >
> > Best regards,
> > Vladimir Ivanov
> >
> >>
> >> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
> >>
> >>     Adding a simple subsumption test to IfNode::Ideal to enable a local
> >>     short-circuit for (obviously) redundant if-nodes.
> >>
> >> Testing: hs-tier1-4, hs-precheckin-comp
> >>
> >>
> >> Best regards,
> >> Patric
> >>
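Vladimir's suggestion separates two steps: deciding that the dominated If is redundant, and then replacing its condition with a constant so that the existing dead-branch elimination removes it. The decision step can be sketched as an outcome-set check for two signed integer comparisons of the same operands. This is a hypothetical model (the `Rel`, `Outcome`, and `Verdict` names are assumptions, not C2's `BoolTest` API), and it ignores unsigned and floating-point compares, where NaN adds an unordered outcome.

```java
import java.util.EnumSet;

final class IfSubsumptionSketch {
    /** Possible results of a signed comparison of the same two operands. */
    enum Outcome { LT, EQ, GT }

    /** Hypothetical relation type: the set of outcomes for which it is true. */
    enum Rel {
        EQ(EnumSet.of(Outcome.EQ)),
        NE(EnumSet.of(Outcome.LT, Outcome.GT)),
        LT(EnumSet.of(Outcome.LT)),
        LE(EnumSet.of(Outcome.LT, Outcome.EQ)),
        GT(EnumSet.of(Outcome.GT)),
        GE(EnumSet.of(Outcome.EQ, Outcome.GT));

        final EnumSet<Outcome> holdsFor;

        Rel(EnumSet<Outcome> holdsFor) {
            this.holdsFor = holdsFor;
        }
    }

    enum Verdict { ALWAYS_TRUE, ALWAYS_FALSE, UNKNOWN }

    /**
     * dom is the dominating test; domTaken says whether control arrives via its
     * true projection. test is the dominated test on the same operands.
     */
    static Verdict decide(Rel dom, boolean domTaken, Rel test) {
        EnumSet<Outcome> possible = domTaken
                ? EnumSet.copyOf(dom.holdsFor)
                : EnumSet.complementOf(dom.holdsFor);
        if (test.holdsFor.containsAll(possible)) {
            return Verdict.ALWAYS_TRUE;  // condition could be replaced by constant 1
        }
        EnumSet<Outcome> overlap = EnumSet.copyOf(possible);
        overlap.retainAll(test.holdsFor);
        return overlap.isEmpty()
                ? Verdict.ALWAYS_FALSE   // condition could be replaced by constant 0
                : Verdict.UNKNOWN;       // keep the If
    }

    public static void main(String[] args) {
        // if (x > 0) { ... y / x ... }: the implicit x != 0 check is redundant.
        System.out.println(decide(Rel.GT, true, Rel.NE));  // ALWAYS_TRUE
        System.out.println(decide(Rel.GT, true, Rel.EQ));  // ALWAYS_FALSE
        System.out.println(decide(Rel.GE, true, Rel.NE));  // UNKNOWN
        System.out.println(decide(Rel.LE, false, Rel.NE)); // ALWAYS_TRUE
    }
}
```

Keeping the decision separate from the graph edit is what lets the rewrite shrink to "set the condition to 0/1", with no insertions into the out-edge list.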


From aph at redhat.com  Fri Nov 15 14:49:14 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 15 Nov 2019 14:49:14 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>

On 11/15/19 9:15 AM, Pengfei Li (Arm Technology China) wrote:

>>> This patch aligns with the implementation in [1] which makes the
>>> x86_64
>>> r12 register allocatable. Please let me know if I have missed anything
>>> for AArch64.
>>
>> We don't generally use r27 for compressed class pointers.
> 
> Do you mean that r27 is only used for encoding/decoding oops but not for
> any klass pointers?

Almost always, yes.

> I looked at the AArch64 code and find it also used in
> MacroAssembler::encode_klass_not_null() if the compressed mode is
> not zero-based.

I see

  if (use_XOR_for_compressed_class_base) {
    if (CompressedKlassPointers::shift() != 0) {
      eor(dst, src, (uint64_t)CompressedKlassPointers::base());
      lsr(dst, dst, LogKlassAlignmentInBytes);
    } else {
      eor(dst, src, (uint64_t)CompressedKlassPointers::base());
    }
    return;
  }

  if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
      && CompressedKlassPointers::shift() == 0) {
    movw(dst, src);
    return;
  }

  ... followed by code which does use r27.

Do you ever see r27 being used? If so, I'd be interested to know how
this gets triggered and what command-line arguments you use. It's
rather inefficient.
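The branches quoted above are different ways of computing the same narrow value, (addr - base) >> shift, under different preconditions, and the arithmetic can be checked with a small model. The preconditions in the comments are assumptions about why each shortcut is valid, not HotSpot's exact `use_XOR_for_compressed_class_base` logic, and the base, range, and address values are hypothetical.

```java
final class NarrowKlassSketch {
    /** General form: subtract the base, then shift (the case that needs a scratch register). */
    static long encodeSub(long addr, long base, int shift) {
        return (addr - base) >>> shift;
    }

    /** XOR form: equals subtraction when base and every offset share no set bits. */
    static long encodeXor(long addr, long base, int shift) {
        return (addr ^ base) >>> shift;
    }

    /** movw form: keep the low 32 bits (zero-extended); needs shift == 0,
     *  base low 32 bits == 0, and offsets below 4 GiB. */
    static long encodeMovw(long addr) {
        return addr & 0xffff_ffffL;
    }

    /** Offsets lie in [0, rangeSize); rangeSize is a power of two and base is a multiple of it. */
    static boolean xorApplicable(long base, long rangeSize) {
        return Long.bitCount(rangeSize) == 1 && (base & (rangeSize - 1)) == 0;
    }

    public static void main(String[] args) {
        long base = 0x8_0000_0000L;  // hypothetical base: 32 GiB
        long range = 0x1_0000_0000L; // hypothetical class-space range: 4 GiB
        long addr = base + 0x1234_5678L;
        System.out.println(xorApplicable(base, range));                            // true
        System.out.println(encodeSub(addr, base, 3) == encodeXor(addr, base, 3)); // true
        System.out.println(encodeSub(addr, base, 0) == encodeMovw(addr));         // true
    }
}
```

When neither precondition holds, only the subtract form is valid, which is why the remaining path needs the base in a register.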

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From dmitrij.pochepko at bell-sw.com  Fri Nov 15 15:51:42 2019
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Fri, 15 Nov 2019 18:51:42 +0300
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo
 intrinsic documentation and maintenance improvement
In-Reply-To: <MN2PR01MB6093B0E0C0BD01587FF96F778F700@MN2PR01MB6093.prod.exchangelabs.com>
References: <fc712566-1d25-88d6-b178-652efdc00f36@bell-sw.com>
 <DB7PR08MB31153C8A293F3B4FC2371B34967F0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com>
 <e47a5afa-e9ce-1286-97da-5fb9018846cb@bell-sw.com>
 <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net>
 <MN2PR01MB6093B0E0C0BD01587FF96F778F700@MN2PR01MB6093.prod.exchangelabs.com>
Message-ID: <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>

Hi Patrick,

My experiments back then showed that a few platforms (some of the Cortex-A*
series) behave unexpectedly slowly when dealing with overprefetch
(probably due to CPU implementation specifics). So this code is a kind of
compromise that runs relatively well on all the platforms I was able to test on
(ThunderX, ThunderX2, Cortex-A53, Cortex-A73). That is the main reason
for this code structure.
It's good that you're willing to experiment and improve it, but I'm
afraid changing largeLoopExitCondition to 64 for LL/UU and 128 for LU/UL
will likely make some systems slower, as I experimented a lot with this.
Let's see the performance results for the several systems you have, to
avoid a situation where one platform benefits by slowing down others. We
could offer some help if you don't have some hardware available.

Thanks,
Dmitrij

On 15/11/2019 10:51 AM, Patrick Zhang OS wrote:
> Hi Dmitrij,
>
> The inline documentation in this patch of yours is nice; it helped me understand the string_compare stub code generation in depth, although I don't know why it was not pushed.
> http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html
>
> There is one point I don't quite understand, could you please clarify it a little more? Thanks in advance!
> In the large loop with prefetching logic, the different_encoding function uses "cnt2 - prefetchLoopExitCondition" to decide whether the first iteration should be executed at all, while same_encoding does not do this. Why?
>
> I was thinking that "prefetching beyond the string boundary" could be an invalid operation, but that seems to be a misunderstanding, doesn't it? The AArch64 ISA document says that the prfm instruction signals the memory system and expects that "preloading the cache line containing the specified address into one or more caches" would be done, in order to speed up the memory accesses when they do occur. If this is just a hint for upcoming ldrs, and safe enough, could we avoid restricting the remaining iterations (2 ~ n) with largeLoopExitCondition and instead use 64 for LL/UU and 128 for LU/UL? That way more iterations can stay in the large loop (with SoftwarePrefetchHintDistance=192, for example), for better performance. Any comments?
>
> Thanks
>
> 4327   address generate_compare_long_string_different_encoding(bool isLU) {
> 4377     if (SoftwarePrefetchHintDistance >= 0) {
> 4378       __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
> 4379       __ br(__ LT, NO_PREFETCH);
> 4380       __ bind(LARGE_LOOP_PREFETCH);                  // 64-characters loop
> ... ...
> 4395           __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
> 4396           __ br(__ GE, LARGE_LOOP_PREFETCH);
> 4397     } // end of 64-characters loop
>
> 4616   address generate_compare_long_string_same_encoding(bool isLL) {
> 4637     if (SoftwarePrefetchHintDistance >= 0) {
> 4638       __ bind(LARGE_LOOP_PREFETCH);
> 4639         __ prfm(Address(str1, SoftwarePrefetchHintDistance));
> 4640         __ prfm(Address(str2, SoftwarePrefetchHintDistance));
> 4641         compare_string_16_bytes_same(DIFF, DIFF2);
> 4642         compare_string_16_bytes_same(DIFF, DIFF2);
> 4643         __ sub(cnt2, cnt2, 8 * characters_in_word);
> 4644         compare_string_16_bytes_same(DIFF, DIFF2);
> 4645         __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
> 4646         compare_string_16_bytes_same(DIFF, DIFF2);
> 4647         __ br(__ GT, LARGE_LOOP_PREFETCH);
> 4648         __ cbz(cnt2, LAST_CHECK);                    // no more loads left
> 4649     }
>
> Regards
> Patrick
>
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Dmitry Samersoff
> Sent: Sunday, May 19, 2019 11:42 PM
> To: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>; Andrew Haley <aph at redhat.com>; Pengfei Li (Arm Technology China) <Pengfei.Li at arm.com>
> Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
>
> Dmitrij,
>
> The changes look good to me.
>
> -Dmitry
>
> On 25.02.2019 19:52, Dmitrij Pochepko wrote:
>> Hi Andrew, Pengfei,
>>
>> I created webrev.02 with all your suggestions implemented:
>>
>> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
>>
>> - comments are now both in separate section and inlined into code.
>> - documentation mismatch mentioned by Pengfei is fixed:
>> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST
>> -- SHORT_LOOP_TAIL block now merged with last instruction.
>> Documentation is updated respectively
>> - minor other changes to layout and wording
>>
>> Newly developed tests were run as sanity and they passed.
>>
>> Thanks,
>> Dmitrij
>>
>> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>>
>>>> So personally, I still prefer to inline the comments with the
>>>> original code block to avoid this kind of inconsistency. It also
>>>> makes it easier for us to review or maintain the code together with the
>>>> doc, as we don't need to scroll back and forth. I don't see the
>>>> benefit of making the code documentation a separate part. What's
>>>> your opinion, Andrew Haley?
>>> I agree with you. There's no harm having both inline and separate.
>>>
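The trade-off in Patrick's question can be made concrete with a small model of the loop-exit check: `subs rscratch2, cnt2, exitCondition` followed by `br GE` keeps looping while at least `exitCondition` characters remain. The model counts how many 64-character iterations run with prefetching and how far the furthest prefetch target lands past the end of the string. It is a hypothetical simplification: the hint distance is expressed in the same units as the character counter, and the stub's actual `prefetchLoopExitCondition`/`largeLoopExitCondition` values are not reproduced; they are passed in as parameters.

```java
import java.util.Arrays;

final class PrefetchLoopSketch {
    /**
     * Models "subs tmp, cnt, exitCondition; br GE, LOOP" with the check made
     * before the first iteration and after each one (exitCondition is assumed
     * to be at least charsPerIter). Returns
     * {iterations, charactersLeftForTail, maxPrefetchOvershoot}.
     */
    static int[] simulate(int cnt, int charsPerIter, int exitCondition, int hintDistance) {
        int pos = 0;
        int remaining = cnt;
        int iterations = 0;
        int maxOvershoot = 0;
        while (remaining >= exitCondition) {
            // Prefetch target relative to the string end; positive means past the end.
            maxOvershoot = Math.max(maxOvershoot, pos + hintDistance - cnt);
            pos += charsPerIter;
            remaining -= charsPerIter;
            iterations++;
        }
        return new int[] {iterations, remaining, maxOvershoot};
    }

    public static void main(String[] args) {
        // 256 characters, 64 per iteration, hint distance 192 (hypothetical units).
        System.out.println(Arrays.toString(simulate(256, 64, 128, 192))); // [3, 64, 64]
        System.out.println(Arrays.toString(simulate(256, 64, 64, 192)));  // [4, 0, 128]
    }
}
```

Lowering the exit condition keeps more work in the prefetching loop but lets prefetch targets reach further past the end of the data. Whether that overshoot costs anything is the hardware-dependent question Dmitrij raises, so the model says nothing about performance.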

From vladimir.kozlov at oracle.com  Fri Nov 15 19:11:58 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Nov 2019 11:11:58 -0800
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <VE1PR08MB4656A89817FF722ECF400A778E700@VE1PR08MB4656.eurprd08.prod.outlook.com>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
 <2w8hcs8xd1-1@aserp2030.oracle.com>
 <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>
 <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com>
 <20191114112213.D20B8D9B6E@aojmv0009>
 <VE1PR08MB4656A89817FF722ECF400A778E700@VE1PR08MB4656.eurprd08.prod.outlook.com>
Message-ID: <8f0630cf-7fe6-8021-f195-9595d6f460c9@oracle.com>

Note: compiler/c2 and compiler/c1 are misleading names for test directories; they have nothing to do with the C1 and C2 JIT
compilers. They are simply two groups of tests that we split so they can be executed in parallel in a reasonable time.

Vladimir

On 11/14/19 7:35 PM, Yang Zhang (Arm Technology China) wrote:
> Hi Christoph, Igor, Vladimir,
> 
> Thanks very much for your fix. After discussion, we have got a better solution for this issue. Do we need to change the following files in which the MaxVectorSize option is used?
> 
> [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java
> [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java
> [3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/c2/cr6340864
> 
> PS: [3] is located in the c2 directory, so I'm not sure whether those tests will be excluded from jtreg runs in client mode.
> 
> Regards
> Yang
> 
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of christoph.goettschkes at microdoc.com
> Sent: Thursday, November 14, 2019 7:21 PM
> To: vladimir.kozlov at oracle.com; igor.ignatyev at oracle.com
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs.
> 
> Thanks for your feedback, this resolves my concerns and I am happy with the solution. I integrated the suggestions from Vladimir, here is the latest webrev:
> 
> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/
> 
> I re-tested and it works as expected.
> Please give your consent if this is fine for you as well.
> 
> -- Christoph
> 
> Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-13 20:32:18:
> 
>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> To: Igor Ignatyev <igor.ignatyev at oracle.com>,
> christoph.goettschkes at microdoc.com
>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Date: 2019-11-13 20:32
>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/
>> TestCharVect2.java only works with server VMs.
>>
>> On 11/13/19 11:11 AM, Igor Ignatyev wrote:
>>> @Christoph,
>>>
>>> webrev.01 looks good to me.
>>> I always thought that jvmci feature can be built only when compiler2
>> feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp
>> suggests that jvmci can be used w/o compiler2; I don't think we have
>> ever build/test, let alone support, this configuration.
>>>
>>> @Vladimir,
>>> did/do we plan to support compiler1 + jvmci w/o compiler2
> configuration?
>>
>> Yes. It could be configuration when we start looking on replacing C1
>> with Graal. I think several people were interested in "Client VM" like
>> configuration.
>> Also Server configuration without C2 (with Graal or other jvmci
>> compiler) which would be out configuration in a future.
>>
>> But I would prefer to be more explicit in these changes:
>>
>> @requires vm.compiler2.enabled | vm.graal.enabled
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Thanks,
>>> -- Igor
>>>
>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com
> wrote:
>>>>
>>>> Hi Igor,
>>>>
>>>> thanks for your explanation.
>>>>
>>>> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12
> 20:40:46:
>>>>
>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our
> tests, as
>>>>> in most cases, it causes wasted compute time (as in this test) and
> can
>>>>> also lead to wrong/deprecated/deleted flags sneaking into the
> testbase
>>>>
>>>> Agreed. I also wanted to discuss this, since I think that your
> solution
>>>> is better than mine, but at the same time, I saw possible problems
> with
>>>> it, see below.
>>>>
>>>>> as '@requires vm.flavor == "server"' filters configurations based
>>>>> vm build type, it will still allow execution on JVM w/ JVMCI and
>>>>> when
> JVMCI
>>>>> compiler is selected, as it will still be Server VM build. so, in
>>>>> a sense, the test will be w/ JVMCI in the same way as w/ your
> approach.
>>>>
>>>> My concern is not about server VMs with JVMCI, but client VMs with
> JVMCI
>>>> enabled. Is this a valid configuration? The MaxVectorSize option is
>>>> defined in [1] as well as in [2], so for me it looks like
> MaxVectorSize
>>>> can be used for any VM variant as long as JVMCI is enabled. The
>>>> configure script also states that both compilers are possible (if
>>>> you configure with --with-jvm-features='jvmci'):
>>>>
>>>> configure: error: Specified JVM feature 'jvmci' requires feature
>>>> 'compiler2' or 'compiler1'
>>>>
>>>> Should maybe the requires tag "vm.jvmci" be used as well, like:
>>>>
>>>>      @requires vm.flavor == "server" | vm.jvmci
>>>>
>>>>> this is the known limitation of jtreg/@requires, and our current way to
>>>>> work around it is to split a test description based on @requires values
>>>>
>>>> Yes, if the @requires tag is used, splitting up the test looks like a good
>>>> idea. I didn't know that it is possible to have multiple test descriptions
>>>> in one test file.
>>>>
>>>> I created a new webrev with the new ideas:
>>>>
>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
>>>>
>>>> I tested with an amd64 client and server VM and it looks good. I am
>>>> currently unable to build a client VM with JVMCI enabled, hence no test
>>>> for that yet. I get compile errors and as soon as I resolve those,
>>>> runtime errors occur. Before I look into that, I would like to know if
>>>> client VMs with JVMCI enabled are supported or not.
>>>>
>>>> Thanks,
>>>> Christoph
>>>>
>>>> [1]
>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
>>>>
>>>> [2]
>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
>>>>
>>>
>>
> 

From igor.ignatyev at oracle.com  Fri Nov 15 19:21:51 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 15 Nov 2019 11:21:51 -0800
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <8f0630cf-7fe6-8021-f195-9595d6f460c9@oracle.com>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
 <2w8hcs8xd1-1@aserp2030.oracle.com>
 <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>
 <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com>
 <20191114112213.D20B8D9B6E@aojmv0009>
 <VE1PR08MB4656A89817FF722ECF400A778E700@VE1PR08MB4656.eurprd08.prod.outlook.com>
 <8f0630cf-7fe6-8021-f195-9595d6f460c9@oracle.com>
Message-ID: <27E94E0A-7A99-4B0D-B96D-7DADF6201542@oracle.com>

I'd also add that, unfortunately, it's not always right to add '@requires vm.compiler2.enabled' to a test just b/c it uses some c2-only flags. there are cases when such tests are not only still valid w/o these flags, but are also capable of spotting bugs; on the other hand, tests which have multiple runs w/ different values of c2-only (or c1-only) flags can be split to reduce wasted time. that's to say, the decision on whether @requires is the right thing should be made on a test-by-test basis.

as 8231954 was only about TestCharVect2 test, I suggest we push Christoph's webrev.03 and file an RFE to deal w/ other tests, or retrofit 8228493 to talk not only about non-product flags but also about c2/c1-only flags and use it as an umbrella for discussion/work-tracking.

-- Igor

> On Nov 15, 2019, at 11:11 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Note: compiler/c2 and compiler/c1 are misleading names for test directories; they have nothing to do with the C1 and C2 JIT compilers. They are simply 2 groups of tests we split so they can be executed in parallel in reasonable time.
> 
> Vladimir
> 
> On 11/14/19 7:35 PM, Yang Zhang (Arm Technology China) wrote:
>> Hi Christoph, Igor, Vladimir,
>> Thanks very much for your fix. After discussion, we have got a better solution for this issue. Do we need to change the following files in which MaxVectorSize option is used?
>> [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java
>> [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java
>> [3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/c2/cr6340864
>> PS: [3] is located in the c2 directory, so I'm not sure whether those tests will be excluded when jtreg runs in client mode.
>> Regards
>> Yang
>> -----Original Message-----
>> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of christoph.goettschkes at microdoc.com
>> Sent: Thursday, November 14, 2019 7:21 PM
>> To: vladimir.kozlov at oracle.com; igor.ignatyev at oracle.com
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs.
>> Thanks for your feedback, this resolves my concerns and I am happy with the solution. I integrated the suggestions from Vladimir, here is the latest webrev:
>> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/
>> I re-tested and it works as expected.
>> Please give your consent if this is fine for you as well.
>> -- Christoph
>> Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-13 20:32:18:
>>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>> To: Igor Ignatyev <igor.ignatyev at oracle.com>, christoph.goettschkes at microdoc.com
>>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>>> Date: 2019-11-13 20:32
>>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/
>>> TestCharVect2.java only works with server VMs.
>>> 
>>> On 11/13/19 11:11 AM, Igor Ignatyev wrote:
>>>> @Christoph,
>>>> 
>>>> webrev.01 looks good to me.
>>>> I always thought that jvmci feature can be built only when compiler2
>>> feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp
>>> suggests that jvmci can be used w/o compiler2; I don't think we have
>>> ever built/tested, let alone supported, this configuration.
>>>> 
>>>> @Vladimir,
>>>> did/do we plan to support compiler1 + jvmci w/o compiler2 configuration?
>>> 
>>> Yes. It could be a configuration when we start looking at replacing C1
>>> with Graal. I think several people were interested in a "Client VM"-like
>>> configuration.
>>> Also a Server configuration without C2 (with Graal or another jvmci
>>> compiler), which would be our configuration in the future.
>>> 
>>> But I would prefer to be more explicit in these changes:
>>> 
>>> @requires vm.compiler2.enabled | vm.graal.enabled
>>> 
>>> Thanks,
>>> Vladimir
>>> 
>>>> 
>>>> Thanks,
>>>> -- Igor
>>>> 
>>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
>>>>> 
>>>>> Hi Igor,
>>>>> 
>>>>> thanks for your explanation.
>>>>> 
>>>>> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
>>>>> 
>>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as
>>>>>> in most cases, it causes wasted compute time (as in this test) and can
>>>>>> also lead to wrong/deprecated/deleted flags sneaking into the testbase
>>>>> 
>>>>> Agreed. I also wanted to discuss this, since I think that your solution
>>>>> is better than mine, but at the same time, I saw possible problems with
>>>>> it, see below.
>>>>> 
>>>>>> as '@requires vm.flavor == "server"' filters configurations based on the
>>>>>> VM build type, it will still allow execution on a JVM w/ JVMCI, including
>>>>>> when the JVMCI compiler is selected, as it will still be a Server VM
>>>>>> build. so, in a sense, the test will run w/ JVMCI in the same way as w/
>>>>>> your approach.
>>>>> 
>>>>> My concern is not about server VMs with JVMCI, but client VMs with JVMCI
>>>>> enabled. Is this a valid configuration? The MaxVectorSize option is
>>>>> defined in [1] as well as in [2], so for me it looks like MaxVectorSize
>>>>> can be used for any VM variant as long as JVMCI is enabled. The
>>>>> configure script also states that both compilers are possible (if
>>>>> you configure with --with-jvm-features='jvmci'):
>>>>> 
>>>>> configure: error: Specified JVM feature 'jvmci' requires feature
>>>>> 'compiler2' or 'compiler1'
>>>>> 
>>>>> Should maybe the requires tag "vm.jvmci" be used as well, like:
>>>>> 
>>>>>     @requires vm.flavor == "server" | vm.jvmci
>>>>> 
>>>>>> this is the known limitation of jtreg/@requires, and our current way to
>>>>>> work around it is to split a test description based on @requires values
>>>>> 
>>>>> Yes, if the @requires tag is used, splitting up the test looks like a good
>>>>> idea. I didn't know that it is possible to have multiple test descriptions
>>>>> in one test file.
>>>>> 
>>>>> I created a new webrev with the new ideas:
>>>>> 
>>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
>>>>> 
>>>>> I tested with an amd64 client and server VM and it looks good. I am
>>>>> currently unable to build a client VM with JVMCI enabled, hence no test
>>>>> for that yet. I get compile errors and as soon as I resolve those,
>>>>> runtime errors occur. Before I look into that, I would like to know if
>>>>> client VMs with JVMCI enabled are supported or not.
>>>>> 
>>>>> Thanks,
>>>>> Christoph
>>>>> 
>>>>> [1]
>>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
>>>>>
>>>>> [2]
>>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
>>>>> 
>>>> 
>>> 


From igor.ignatyev at oracle.com  Fri Nov 15 19:33:07 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 15 Nov 2019 11:33:07 -0800
Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few
 ThreadDeath hits; expected at least 6 but saw only 5"
Message-ID: <A4F32C46-6640-4632-9776-6B1F7C66E06F@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
> 52 lines changed: 13 ins; 24 del; 15 mod;

Hi all,

could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test?
the test used to run 12 times, expecting ThreadDeath to be thrown during array allocation no fewer than (actually, more than) 6 times; the patch changes the test to run until ThreadDeath has been caught 6 times.

the test has also been updated to use exceptions instead of System.exit to signal test failure, and to use WhiteBox to check that the 'test' method got compiled.
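A minimal Java sketch of the two ideas above (loop until the expected number of hits has been seen instead of running a fixed number of times, and report failure by throwing instead of calling System.exit). This is not the actual Test8004741 code: RunUntilHits, runUntilHits and the maxAttempts safety cap are hypothetical, and a BooleanSupplier stands in for "one run that may catch ThreadDeath".

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the actual Test8004741 code.
public class RunUntilHits {

    // Repeats 'attempt' until it has reported a hit 'requiredHits' times and
    // returns the number of attempts needed. 'maxAttempts' is an assumed
    // safety cap; exceeding it throws instead of calling System.exit.
    public static int runUntilHits(BooleanSupplier attempt, int requiredHits, int maxAttempts) {
        if (requiredHits <= 0 || maxAttempts <= 0) {
            throw new IllegalArgumentException("requiredHits and maxAttempts must be positive");
        }
        int hits = 0;
        int attempts = 0;
        while (hits < requiredHits) {
            if (attempts == maxAttempts) {
                throw new AssertionError("Too few hits; expected at least " + requiredHits
                        + " but saw only " + hits + " after " + attempts + " attempts");
            }
            attempts++;
            if (attempt.getAsBoolean()) {
                hits++;
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Deterministic stand-in for "this run caught ThreadDeath": every second attempt hits.
        int[] calls = {0};
        int attempts = runUntilHits(() -> calls[0]++ % 2 == 1, 6, 100);
        System.out.println("needed " + attempts + " attempts"); // needed 12 attempts
    }
}
```

Counting hits rather than runs makes the pass condition independent of how often the event happens to fire per run; in a real jtreg test, the harness timeout could play the role of the cap.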

testing: 100 times on windows-x64-debug (where the test failed) + once on all platforms
webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
JBS: https://bugs.openjdk.java.net/browse/JDK-8214904

Thanks,
-- Igor

From vladimir.kozlov at oracle.com  Fri Nov 15 19:35:42 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 15 Nov 2019 11:35:42 -0800
Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few
 ThreadDeath hits; expected at least 6 but saw only 5"
In-Reply-To: <A4F32C46-6640-4632-9776-6B1F7C66E06F@oracle.com>
References: <A4F32C46-6640-4632-9776-6B1F7C66E06F@oracle.com>
Message-ID: <88850dc4-30b3-6157-5f99-d62094b9c9b9@oracle.com>

Good.

Thanks,
Vladimir

On 11/15/19 11:33 AM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
>> 52 lines changed: 13 ins; 24 del; 15 mod;
> 
> Hi all,
> 
> could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test?
> the test used to run 12 times, expecting ThreadDeath to be thrown during array allocation no fewer than (actually, more than) 6 times; the patch changes the test to run until ThreadDeath has been caught 6 times.
> 
> the test has also been updated to use exceptions instead of System.exit to signal test failure, and to use WhiteBox to check that the 'test' method got compiled.
> 
> testing: 100 times on windows-x64-debug (where the test failed) + once on all platforms
> webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
> JBS: https://bugs.openjdk.java.net/browse/JDK-8214904
> 
> Thanks,
> -- Igor
> 

From ekaterina.pavlova at oracle.com  Fri Nov 15 20:25:20 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Fri, 15 Nov 2019 12:25:20 -0800
Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few
 ThreadDeath hits; expected at least 6 but saw only 5"
In-Reply-To: <A4F32C46-6640-4632-9776-6B1F7C66E06F@oracle.com>
References: <A4F32C46-6640-4632-9776-6B1F7C66E06F@oracle.com>
Message-ID: <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com>

Hi Igor,

you also mentioned in the bug report that "there is also a race on 'passed' field, j.u.c.AtomicInteger should be used here".
Is it still a good thing to do?

thanks,
-katya

On 11/15/19 11:33 AM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
>> 52 lines changed: 13 ins; 24 del; 15 mod;
> 
> Hi all,
> 
> could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test?
> the test used to run 12 times, expecting ThreadDeath to be thrown during array allocation no fewer than (actually, more than) 6 times; the patch changes the test to run until ThreadDeath has been caught 6 times.
> 
> the test has also been updated to use exceptions instead of System.exit to signal test failure, and to use WhiteBox to check that the 'test' method got compiled.
> 
> testing: 100 times on windows-x64-debug (where the test failed) + once on all platforms
> webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
> JBS: https://bugs.openjdk.java.net/browse/JDK-8214904
> 
> Thanks,
> -- Igor
> 


From igor.ignatyev at oracle.com  Fri Nov 15 21:28:25 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 15 Nov 2019 13:28:25 -0800
Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few
 ThreadDeath hits; expected at least 6 but saw only 5"
In-Reply-To: <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com>
References: <A4F32C46-6640-4632-9776-6B1F7C66E06F@oracle.com>
 <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com>
Message-ID: <8B066C86-0393-4FB5-8AEE-229EA9EED159@oracle.com>

Hi Katya,

actually, there is no race b/c there is a happens-before edge b/w all actions in Test8004741::run (including updates of 'passed') and the return from Thread.join (L142) in threadTest, so there is no need to use AtomicInteger. I'll add a comment to the bug report.
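The visibility argument above can be illustrated with a small, self-contained Java sketch (the class and field names are hypothetical, not taken from Test8004741). A plain int field written only by a worker thread is safely read by the thread that joined it, because the Java Language Specification (section 17.4.5) states that all actions in a thread happen-before any other thread successfully returns from join() on that thread. The sketch assumes, as the argument does, that no other thread writes the field while the worker runs.

```java
// Hypothetical illustration, not Test8004741 itself.
public class JoinVisibility {
    // Deliberately a plain int: neither volatile nor an AtomicInteger.
    private static int passed;

    // Starts a worker that increments 'passed' 'iterations' times, joins it,
    // and returns the value the calling thread observes afterwards.
    public static int runWorker(int iterations) throws InterruptedException {
        passed = 0; // Thread.start() below happens-before every action of the worker
        Thread worker = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                passed++; // only the worker writes 'passed' while it is running
            }
        });
        worker.start();
        worker.join(); // all of the worker's writes happen-before join() returns
        return passed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorker(1_000_000)); // prints 1000000
    }
}
```

If several threads incremented the same field concurrently, join() alone would not help (the increments themselves would race), and an AtomicInteger would then be the appropriate tool.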

-- Igor

> On Nov 15, 2019, at 12:25 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
> 
> Hi Igor,
> 
> you also mentioned in the bug report that "there is also a race on 'passed' field, j.u.c.AtomicInteger should be used here".
> Is it still a good thing to do?
> 
> thanks,
> -katya
> 
> On 11/15/19 11:33 AM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
>>> 52 lines changed: 13 ins; 24 del; 15 mod;
>> Hi all,
>> could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test?
>> the test used to run 12 times, expecting ThreadDeath to be thrown during array allocation no fewer than (actually, more than) 6 times; the patch changes the test to run until ThreadDeath has been caught 6 times.
>> the test has also been updated to use exceptions instead of System.exit to signal test failure, and to use WhiteBox to check that the 'test' method got compiled.
>> testing: 100 times on windows-x64-debug (where the test failed) + once on all platforms
>> webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8214904
>> Thanks,
>> -- Igor
> 


From ekaterina.pavlova at oracle.com  Fri Nov 15 21:54:01 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Fri, 15 Nov 2019 13:54:01 -0800
Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few
 ThreadDeath hits; expected at least 6 but saw only 5"
In-Reply-To: <8B066C86-0393-4FB5-8AEE-229EA9EED159@oracle.com>
References: <A4F32C46-6640-4632-9776-6B1F7C66E06F@oracle.com>
 <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com>
 <8B066C86-0393-4FB5-8AEE-229EA9EED159@oracle.com>
Message-ID: <c22aa381-8dbd-a567-e37b-7748fcfe6a2e@oracle.com>

Ok, thanks Igor, looks good then.

-katya

On 11/15/19 1:28 PM, Igor Ignatyev wrote:
> Hi Katya,
> 
> actually, there is no race b/c there is a happens-before edge b/w all actions in Test8004741::run (including updates of 'passed') and the return from Thread.join (L142) in threadTest, so there is no need to use AtomicInteger. I'll add a comment to the bug report.
> 
> -- Igor
> 
>> On Nov 15, 2019, at 12:25 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>
>> Hi Igor,
>>
>> you also mentioned in the bug report that "there is also a race on 'passed' field, j.u.c.AtomicInteger should be used here".
>> Is it still a good thing to do?
>>
>> thanks,
>> -katya
>>
>> On 11/15/19 11:33 AM, Igor Ignatyev wrote:
>>> http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
>>>> 52 lines changed: 13 ins; 24 del; 15 mod;
>>> Hi all,
>>> could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test?
>>> the test used to run 12 times, expecting ThreadDeath to be thrown during array allocation no fewer than (actually, more than) 6 times; the patch changes the test to run until ThreadDeath has been caught 6 times.
>>> the test has also been updated to use exceptions instead of System.exit to signal test failure, and to use WhiteBox to check that the 'test' method got compiled.
>>> testing: 100 times on windows-x64-debug (where the test failed) + once on all platforms
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8214904
>>> Thanks,
>>> -- Igor
>>
> 


From igor.ignatyev at oracle.com  Fri Nov 15 22:23:38 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 15 Nov 2019 14:23:38 -0800
Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few
 ThreadDeath hits; expected at least 6 but saw only 5"
In-Reply-To: <c22aa381-8dbd-a567-e37b-7748fcfe6a2e@oracle.com>
References: <A4F32C46-6640-4632-9776-6B1F7C66E06F@oracle.com>
 <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com>
 <8B066C86-0393-4FB5-8AEE-229EA9EED159@oracle.com>
 <c22aa381-8dbd-a567-e37b-7748fcfe6a2e@oracle.com>
Message-ID: <E3569F21-FF24-49E3-B75D-9FD20D2BC8E1@oracle.com>

Katya, Vladimir,

thanks for your review, pushed.

-- Igor

> On Nov 15, 2019, at 11:35 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Good.
> 
> Thanks,
> Vladimir

> On Nov 15, 2019, at 1:54 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
> 
> Ok, thanks Igor, looks good then.
> 
> -katya
> 
> On 11/15/19 1:28 PM, Igor Ignatyev wrote:
>> Hi Katya,
>> actually, there is no race b/c there is a happens-before edge b/w all actions in Test8004741::run (including updates of 'passed') and the return from Thread.join (L142) in threadTest, so there is no need to use AtomicInteger. I'll add a comment to the bug report.
>> -- Igor
>>> On Nov 15, 2019, at 12:25 PM, Ekaterina Pavlova <ekaterina.pavlova at oracle.com> wrote:
>>> 
>>> Hi Igor,
>>> 
>>> you also mentioned in the bug report that "there is also a race on 'passed' field, j.u.c.AtomicInteger should be used here".
>>> Is it still a good thing to do?
>>> 
>>> thanks,
>>> -katya
>>> 
>>> On 11/15/19 11:33 AM, Igor Ignatyev wrote:
>>>> http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
>>>>> 52 lines changed: 13 ins; 24 del; 15 mod;
>>>> Hi all,
>>>> could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test?
>>>> the test used to run 12 times, expecting ThreadDeath to be thrown during array allocation no fewer than (actually, more than) 6 times; the patch changes the test to run until ThreadDeath has been caught 6 times.
>>>> the test has also been updated to use exceptions instead of System.exit to signal test failure, and to use WhiteBox to check that the 'test' method got compiled.
>>>> testing: 100 times on windows-x64-debug (where the test failed) + once on all platforms
>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8214904
>>>> Thanks,
>>>> -- Igor
>>> 
> 


From igor.ignatyev at oracle.com  Sat Nov 16 07:47:20 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 15 Nov 2019 23:47:20 -0800
Subject: RFR(S) : 8233462 : serviceability/tmtools/jstat tests times out with
 -Xcomp
Message-ID: <2B7FD259-FBC2-4850-B1B3-81D1669F150B@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00
> 33 lines changed: 1 ins; 14 del; 18 mod;

Hi all,

could you please review this small fix for tmtools testlibrary?
tmtools tests are believed to fail due to a deadlock-like situation b/w main test process and tmtools process:
(from JBS)
> it seems these tests attach jstat to the main test process, the same process which reads the tool's stdout/stderr, so there is a possibility that this will deadlock: the jstat process produces more output than the buffer can hold, so it blocks until someone (the main process) reads it, while the main process waits until jstat completes.

the patch changes the serviceability/tmtools/share/common library (used by all serviceability/tmtools tests) to redirect the tmtool's stdout and stderr into files instead of using jdk.test.lib.process.OutputAnalyzer; I've also added a bit of diagnostic output, so it will be easier to analyze future failures.
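A minimal Java sketch of the redirect-to-files approach described above. It is not the actual serviceability/tmtools/share/common code; the class and method names are hypothetical. Because the child process writes straight into files via ProcessBuilder.redirectOutput/redirectError, it can never block on a full pipe while the parent sits in waitFor(). The example launches 'java -version' from the running JDK purely as a stand-in for jstat.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

// Hypothetical sketch of the approach, not the actual tmtools library code.
public class RedirectToFiles {

    // Exit code of the child plus the files holding its stdout and stderr.
    public static final class Result {
        public final int exitCode;
        public final File stdout;
        public final File stderr;

        Result(int exitCode, File stdout, File stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    public static Result run(List<String> command) throws IOException, InterruptedException {
        File out = File.createTempFile("tool", ".stdout");
        File err = File.createTempFile("tool", ".stderr");
        Process p = new ProcessBuilder(command)
                .redirectOutput(out) // the child writes directly to the file: no pipe to fill up
                .redirectError(err)
                .start();
        // Safe to block here: the parent no longer has to drain the child's output.
        int exitCode = p.waitFor();
        // A little diagnostic output to make future failures easier to analyze.
        System.out.println(command + " exited with " + exitCode
                + "; stdout in " + out + ", stderr in " + err);
        return new Result(exitCode, out, err);
    }

    public static void main(String[] args) throws Exception {
        // Launch the JVM running this sketch; "java -version" prints to stderr.
        String javaLauncher = System.getProperty("java.home") + File.separator + "bin"
                + File.separator + "java";
        Result r = run(List.of(javaLauncher, "-version"));
        System.out.print(new String(Files.readAllBytes(r.stderr.toPath())));
    }
}
```

The output can still be analyzed afterwards by reading the files; the only change is that nothing has to consume it while the child is running.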

webrev: http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00
JBS: https://bugs.openjdk.java.net/browse/JDK-8233462
testing:
 - serviceability/tmtools on windows-x64,linux-x64,macosx-x64,solaris-sparcv9
 - serviceability/tmtools 100 times on linux-x64-debug w/ '-Xcomp -ea -esa -XX:+TieredCompilation -XX:+DeoptimizeALot' (most of the failures have been seen in this configuration)

Thanks,
-- Igor

From igor.ignatyev at oracle.com  Sun Nov 17 06:07:19 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Sat, 16 Nov 2019 22:07:19 -0800
Subject: RFR(S) : 8234290 : compiler/c2/Test6857159.java times out and fail to
 clean up files
Message-ID: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html
> 67 lines changed: 16 ins; 24 del; 27 mod;

Hi all,

could you please review this small fix for Test6857159 test?
from JBS:
> the test has -XX:CompileOnly=compiler.c2.Test6857159$Test$ct::run, but there is no 'ct' class; there are ct[0-2], and ct0 is the only one which has a 'run' method. shouldNotContain("COMPILE SKIPPED") and shouldContain("$ct0::run (16 bytes)"), which, I guess, were meant as a defense against such a situation, didn't help b/c PrintCompilation output doesn't have 'COMPILE SKIPPED' lines, but does have a 'made not compilable on levels 0 1 2 3 ... $ct0::run (16 bytes) excluded by CompileCommand' line.
the patch fixes the CompileOnly value (actually, replaces it w/ the correct CompileCommand), removes an extra layer, and makes the test use WhiteBox to check whether ct0::run got compiled.

webrev: http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html
JBS: https://bugs.openjdk.java.net/browse/JDK-8234290
testing: 
 - compiler/c2/Test6857159.java once on linux-x64,windows-x64,macosx-x64
 - compiler/c2/Test6857159.java 100 times on windows-x64-debug (where all failures have been seen so far)

Thanks,
-- Igor


From igor.ignatyev at oracle.com  Sun Nov 17 19:00:33 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Sun, 17 Nov 2019 11:00:33 -0800
Subject: RFR(S) : 8147017 : Platform.isGraal should be removed
Message-ID: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
> 16 lines changed: 2 ins; 8 del; 6 mod; 

Hi all,

jdk.test.lib.Platform.isGraal method assumes that a JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and caused others to incorrectly assume that a '-graal' flag exists and must be used to select the Graal compiler. the patch removes this method and updates its only meaningful usage in TestGCLogMessages test. TestGCLogMessages test should use LogMessageWithLevelC2OrJVMCIOnly only when c2 or graal is available, so it's been updated to use the corresponding methods of sun.hotspot.code.Compiler class, which requires the WhiteBox API to be enabled.


JBS: https://bugs.openjdk.java.net/browse/JDK-8147017
webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
testing: tier1 + TestGCLogMessages w/ different JIT configurations

Thanks,
-- Igor

From david.holmes at oracle.com  Mon Nov 18 03:45:35 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 13:45:35 +1000
Subject: RFR(S): 8233193: Incorrect bailout from
 possibly_add_compiler_threads
In-Reply-To: <VI1PR0201MB2479DBD887D69E520208951E9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB2479DBD887D69E520208951E9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <12461130-cafe-d239-46df-08527f67e779@oracle.com>

Hi Martin,

On 15/11/2019 2:18 am, Doerr, Martin wrote:
> Hi,
> 
> I'd like to cleanup exception handling in CompileBroker a little bit.
> 
> Here's my proposal:
> 
> - Use THREAD instead of CHECK where no exceptions get thrown.

That's fine but you also replaced TRAPS with "Thread* THREAD" which is 
pointless given:

#define TRAPS  Thread* THREAD

> - Remove unused preload_classes.

Ok

> - make_thread: Rename thread to new_thread to avoid confusion with 
> THREAD (the current compiler thread).

Ok

> - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + CHECK 
> because this function is not supposed to kill the VM on exceptions. Add
> assertion to caller.

Ok

> Webrev:
> 
> http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
> 
> @David:
> 
> You didn't like usage of the CHECK macro in the initialization 
> functions, but I think they are ok.
> 
> Not very nice to read, but the behavior looks ok to me.
> 
> At least, I didn't find a better replacement for them. Maybe you have a 
> proposal?

May I suggest a comment then:

  844 void CompileBroker::init_compiler_sweeper_threads() {
        // Ensure any exceptions lead to vm_exit_during_initialization
  845   EXCEPTION_MARK;

Thanks,
David
-----

> Best regards,
> 
> Martin
> 

From patrick at os.amperecomputing.com  Mon Nov 18 03:52:26 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Mon, 18 Nov 2019 03:52:26 +0000
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo
 intrinsic documentation and maintenance improvement
In-Reply-To: <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
References: <fc712566-1d25-88d6-b178-652efdc00f36@bell-sw.com>
 <DB7PR08MB31153C8A293F3B4FC2371B34967F0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com>
 <e47a5afa-e9ce-1286-97da-5fb9018846cb@bell-sw.com>
 <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net>
 <MN2PR01MB6093B0E0C0BD01587FF96F778F700@MN2PR01MB6093.prod.exchangelabs.com>
 <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
Message-ID: <MN2PR01MB6093853821555FE4DE0BEB628F4D0@MN2PR01MB6093.prod.exchangelabs.com>

Thanks for the information.
I am interested in the inconsistency between the same_encoding and different_encoding functions: if "overprefetch" is safe enough, why do we prevent it at the end of the large loop inside same_encoding, and why do we protect against it more strictly in different_encoding, at both the beginning and the end of the large loop?
I did not mean globally updating largeLoopExitCondition to 64/128, merely the condition at the end of the large loop inside same_encoding. Assuming the large loop is faster than the small loop (in theory), removing all "overprefetch" conditions would let more strings go through the large loop, for better performance. Any other potential side effects?

Regards
Patrick

-----Original Message-----
From: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com> 
Sent: Friday, November 15, 2019 11:52 PM
To: Patrick Zhang OS <patrick at os.amperecomputing.com>
Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; Dmitry Samersoff <dms at samersoff.net>; Andrew Haley <aph at redhat.com>; Pengfei Li (Arm Technology China) <Pengfei.Li at arm.com>
Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement

Hi Patrick,

My experiments back then showed that a few platforms (some of the Cortex A* series) behave unexpectedly slowly when dealing with overprefetch (probably CPU implementation specifics). So this code is a kind of compromise that runs relatively well on all platforms I was able to test on (ThunderX, ThunderX2, Cortex A53, Cortex A73). That is the main reason for this code structure.
It's good that you're willing to experiment and improve it, but I'm afraid changing largeLoopExitCondition to 64 for LL/UU and 128 for LU/UL will likely make some systems slower, as I experimented a lot with this.
Let's see the performance results for several of the systems you have, to avoid a situation where one platform benefits by slowing down others. We could offer some help if you don't have some hardware available.

Thanks,
Dmitrij

On 15/11/2019 10:51 AM, Patrick Zhang OS wrote:
> Hi Dmitrij,
>
> The inline documentation in this patch of yours was very helpful for understanding the string_compare stub code generation in depth, although I don't know why it was not pushed.
> http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html
>
> There is one point I don't quite understand, could you please clarify it a little more? Thanks in advance!
> In the large loop with prefetching logic, the different_encoding function uses "cnt2 - prefetchLoopExitCondition" to decide whether the first iteration should be executed, while same_encoding does not do this. Why?
>
> I was thinking that "prefetching beyond the string boundary" could be an invalid operation, but that seems to be a misunderstanding, doesn't it? The AArch64 ISA documentation says that the prfm instruction signals the memory system, which is expected to perform "preloading the cache line containing the specified address into one or more caches", in order to speed up the memory accesses when they do occur. If this is just a hint for upcoming ldrs, and safe enough, could we avoid restricting the remaining iterations (2 ~ n) with largeLoopExitCondition, and instead use 64 for LL/UU and 128 for LU/UL? That way more iterations can stay in the large loop (with SoftwarePrefetchHintDistance=192, for example), for better performance. Any comments?
>
> Thanks
>
> 4327   address generate_compare_long_string_different_encoding(bool isLU) {
> 4377     if (SoftwarePrefetchHintDistance >= 0) {
> 4378       __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
> 4379       __ br(__ LT, NO_PREFETCH);
> 4380       __ bind(LARGE_LOOP_PREFETCH);                  // 64-characters loop
> ... ...
> 4395           __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
> 4396           __ br(__ GE, LARGE_LOOP_PREFETCH);
> 4397     } // end of 64-characters loop
>
> 4616   address generate_compare_long_string_same_encoding(bool isLL) {
> 4637     if (SoftwarePrefetchHintDistance >= 0) {
> 4638       __ bind(LARGE_LOOP_PREFETCH);
> 4639         __ prfm(Address(str1, SoftwarePrefetchHintDistance));
> 4640         __ prfm(Address(str2, SoftwarePrefetchHintDistance));
> 4641         compare_string_16_bytes_same(DIFF, DIFF2);
> 4642         compare_string_16_bytes_same(DIFF, DIFF2);
> 4643         __ sub(cnt2, cnt2, 8 * characters_in_word);
> 4644         compare_string_16_bytes_same(DIFF, DIFF2);
> 4645         __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
> 4646         compare_string_16_bytes_same(DIFF, DIFF2);
> 4647         __ br(__ GT, LARGE_LOOP_PREFETCH);
> 4648         __ cbz(cnt2, LAST_CHECK);                    // no more loads left
> 4649     }
>
> Regards
> Patrick
>
> -----Original Message-----
> From: hotspot-compiler-dev 
> <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Dmitry 
> Samersoff
> Sent: Sunday, May 19, 2019 11:42 PM
> To: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>; Andrew Haley 
> <aph at redhat.com>; Pengfei Li (Arm Technology China) 
> <Pengfei.Li at arm.com>
> Cc: hotspot-compiler-dev at openjdk.java.net; 
> aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: 
> String::compareTo intrinsic documentation and maintenance improvement
>
> Dmitrij,
>
> The changes look good to me.
>
> -Dmitry
>
> On 25.02.2019 19:52, Dmitrij Pochepko wrote:
>> Hi Andrew, Pengfei,
>>
>> I created webrev.02 with all your suggestions implemented:
>>
>> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
>>
>> - comments are now both in separate section and inlined into code.
>> - documentation mismatch mentioned by Pengfei is fixed:
>> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST
>> -- SHORT_LOOP_TAIL block now merged with last instruction.
>> Documentation is updated accordingly
>> - minor other changes to layout and wording
>>
>> Newly developed tests were run as sanity and they passed.
>>
>> Thanks,
>> Dmitrij
>>
>> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>>
>>>> So personally, I still prefer to inline the comments with the
>>>> original code block to avoid this kind of inconsistency. It also
>>>> makes it easier for us to review or maintain the code together with the
>>>> doc, as we don't need to scroll back and forth. I don't know the
>>>> benefit of making the code documentation a separate part. What's
>>>> your opinion, Andrew Haley?
>>> I agree with you. There's no harm having both inline and separate.
>>>
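To make the overprefetch trade-off in this question concrete, here is a small C++ model of the quoted LARGE_LOOP_PREFETCH loop from generate_compare_long_string_same_encoding. It is a sketch under stated assumptions, not HotSpot code: it assumes both strings are equal (so the DIFF exits never fire), that each of the four compare_string_16_bytes_same calls consumes 16 bytes, that the loop starts at offset 0 of the remaining data, and that a prefetch whose target lies at or past the end of that data counts as an "overprefetch". The names model_same_encoding and LargeLoopStats are hypothetical.

```cpp
#include <cassert>

// Hypothetical model (not HotSpot code) of the quoted same_encoding large loop,
// replayed for two equal strings so the DIFF exits are never taken.
struct LargeLoopStats {
  int iterations;      // trips through LARGE_LOOP_PREFETCH
  int overprefetches;  // prfm targets at or past the end of the remaining data
  int remaining;       // characters left in cnt2 for the code after the loop
};

// cnt2:       characters remaining when the loop is entered
// char_bytes: 1 for LL, 2 for UU
// exit_cond:  the constant compared against cnt2 before br(GT, ...)
// hint:       SoftwarePrefetchHintDistance, in bytes
LargeLoopStats model_same_encoding(int cnt2, int char_bytes, int exit_cond, int hint) {
  const int bytes_per_iter = 4 * 16;                      // four 16-byte compares
  const int chars_per_iter = bytes_per_iter / char_bytes; // 8 * characters_in_word
  const int end = cnt2 * char_bytes;                      // bytes left at loop entry
  LargeLoopStats s{0, 0, cnt2};
  int offset = 0;                                         // bytes consumed so far
  do {
    if (offset + hint >= end) ++s.overprefetches;  // prfm lands beyond the data
    offset += bytes_per_iter;
    s.remaining -= chars_per_iter;                 // sub(cnt2, cnt2, 8 * characters_in_word)
    ++s.iterations;
  } while (s.remaining > exit_cond);               // subs(...) followed by br(GT, ...)
  return s;
}
```

With 256 Latin1 characters remaining and a 192-byte hint distance, an exit threshold of 192 leaves the loop after one trip with no overprefetch, while the proposed threshold of 64 runs three trips, two of which prefetch past the data. Treating 192 as the value of largeLoopExitCondition here is an assumption chosen to match the SoftwarePrefetchHintDistance=192 example; the real value is computed in stubGenerator_aarch64.cpp.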

From patrick at os.amperecomputing.com  Mon Nov 18 04:03:55 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Mon, 18 Nov 2019 04:03:55 +0000
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo
 intrinsic documentation and maintenance improvement
In-Reply-To: <MN2PR01MB6093853821555FE4DE0BEB628F4D0@MN2PR01MB6093.prod.exchangelabs.com>
References: <fc712566-1d25-88d6-b178-652efdc00f36@bell-sw.com>
 <DB7PR08MB31153C8A293F3B4FC2371B34967F0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com>
 <e47a5afa-e9ce-1286-97da-5fb9018846cb@bell-sw.com>
 <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net>
 <MN2PR01MB6093B0E0C0BD01587FF96F778F700@MN2PR01MB6093.prod.exchangelabs.com>
 <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
 <MN2PR01MB6093853821555FE4DE0BEB628F4D0@MN2PR01MB6093.prod.exchangelabs.com>
Message-ID: <MN2PR01MB609392A8FA8C9EE6C27C17BE8F4D0@MN2PR01MB6093.prod.exchangelabs.com>

>> changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this.
Sorry, my second paragraph was inaccurate. It seems your experiments showed that some cases ran well with the first iteration of the large loop, but performed better if they then quit the loop and went to the small loop immediately (?). Please correct me if I misunderstood this. Thanks.

Regards
Patrick

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Patrick Zhang OS
Sent: Monday, November 18, 2019 11:52 AM
To: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>
Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Subject: RE: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement

Thanks for the information.
I am interested in the inconsistency between the same_encoding and different_encoding functions. If "overprefetch" is safe enough, why do we prevent it at the end of the large loop inside same_encoding, and why do we guard against it even more strictly in different_encoding, at both the beginning and the end of the large loop?
I did not mean globally updating largeLoopExitCondition to 64/128, merely the condition at the end of the large loop inside same_encoding. Assuming the large loop is faster than the small loop (in theory), removing all "overprefetch" conditions would let more strings go through the large loop for better performance. Are there any other potential side effects?

Regards
Patrick
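The structural difference asked about here can be reduced to two loop shapes. The C++ sketch below is hypothetical (the function names are illustrative, not HotSpot's) and assumes that, in the lines elided from the different_encoding excerpt, cnt2 is decremented by a fixed step before the closing subs/br(GE, ...) check.

```cpp
#include <cassert>

// Hypothetical sketch: the two quoted loop shapes, reduced to trip counts.

// same_encoding shape: no entry check, so the body runs at least once;
// it repeats while the remaining count is strictly greater than exit_cond (br GT).
int trips_same_encoding_shape(int cnt2, int step, int exit_cond) {
  int trips = 0;
  do {
    cnt2 -= step;
    ++trips;
  } while (cnt2 > exit_cond);
  return trips;
}

// different_encoding shape: subs + br(LT, NO_PREFETCH) guards entry, and the
// loop repeats while the remaining count is >= exit_cond (br GE).
int trips_different_encoding_shape(int cnt2, int step, int exit_cond) {
  int trips = 0;
  while (cnt2 >= exit_cond) {
    cnt2 -= step;
    ++trips;
  }
  return trips;
}
```

For cnt2 = 100 and a threshold of 128, the guarded shape skips the large loop entirely while the unguarded one still runs once. For cnt2 = 256 with a step of 64, the GE test (three trips) also keeps the loop going one trip longer than the GT test (two trips).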

-----Original Message-----
From: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>
Sent: Friday, November 15, 2019 11:52 PM
To: Patrick Zhang OS <patrick at os.amperecomputing.com>
Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; Dmitry Samersoff <dms at samersoff.net>; Andrew Haley <aph at redhat.com>; Pengfei Li (Arm Technology China) <Pengfei.Li at arm.com>
Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement

Hi Patrick,

My experiments back then showed that a few platforms (some Cortex A* series cores) behave unexpectedly slowly when dealing with overprefetch (probably due to CPU implementation specifics). So this code is a compromise that runs relatively well on all the platforms I was able to test on (ThunderX, ThunderX2, Cortex A53, Cortex A73). That is the main reason for this code structure.
It's good that you're willing to experiment and improve it, but I'm afraid changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this.
Let's see the performance results on the several systems you have, to avoid a situation where one platform benefits by slowing down others. We can offer some help if you don't have some of the HW available.

Thanks,
Dmitrij

On 15/11/2019 10:51 AM, Patrick Zhang OS wrote:
> Hi Dmitrij,
>
> The inline documentation in this patch of yours was very helpful in understanding the string_compare stub code generation in depth, although I don't know why it was not pushed.
> http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html
>
> There is one point I don't quite understand, could you please clarify it a little more? Thanks in advance!
> In the large loop with prefetching logic, the different_encoding function uses "cnt2 - prefetchLoopExitCondition" to decide whether the first iteration should be executed at all, while same_encoding does not. Why?
>
> I was thinking that "prefetching out of the string boundary" could be an invalid operation, but that seems to be a misunderstanding, doesn't it? The AArch64 ISA document says that the prfm instruction signals the memory system and expects "preloading the cache line containing the specified address into one or more caches" to be done, in order to speed up the memory accesses when they do occur. If this is just a hint for the coming ldrs, and safe enough, could we avoid restricting the remaining iterations (2 ~ n) with largeLoopExitCondition, and instead use 64 for LL/UU and 128 for LU/UL? That way more iterations can stay in the large loop (with SoftwarePrefetchHintDistance=192, for example), for better performance. Any comments?
>
> Thanks
>
> 4327   address generate_compare_long_string_different_encoding(bool isLU) {
> 4377     if (SoftwarePrefetchHintDistance >= 0) {
> 4378       __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
> 4379       __ br(__ LT, NO_PREFETCH);
> 4380       __ bind(LARGE_LOOP_PREFETCH);                  // 64-characters loop
> ... ...
> 4395           __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
> 4396           __ br(__ GE, LARGE_LOOP_PREFETCH);
> 4397     } // end of 64-characters loop
>
> 4616   address generate_compare_long_string_same_encoding(bool isLL) {
> 4637     if (SoftwarePrefetchHintDistance >= 0) {
> 4638       __ bind(LARGE_LOOP_PREFETCH);
> 4639         __ prfm(Address(str1, SoftwarePrefetchHintDistance));
> 4640         __ prfm(Address(str2, SoftwarePrefetchHintDistance));
> 4641         compare_string_16_bytes_same(DIFF, DIFF2);
> 4642         compare_string_16_bytes_same(DIFF, DIFF2);
> 4643         __ sub(cnt2, cnt2, 8 * characters_in_word);
> 4644         compare_string_16_bytes_same(DIFF, DIFF2);
> 4645         __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
> 4646         compare_string_16_bytes_same(DIFF, DIFF2);
> 4647         __ br(__ GT, LARGE_LOOP_PREFETCH);
> 4648         __ cbz(cnt2, LAST_CHECK);                    // no more loads left
> 4649     }
>
> Regards
> Patrick
>
> -----Original Message-----
> From: hotspot-compiler-dev
> <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Dmitry 
> Samersoff
> Sent: Sunday, May 19, 2019 11:42 PM
> To: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>; Andrew Haley 
> <aph at redhat.com>; Pengfei Li (Arm Technology China) 
> <Pengfei.Li at arm.com>
> Cc: hotspot-compiler-dev at openjdk.java.net;
> aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: 
> String::compareTo intrinsic documentation and maintenance improvement
>
> Dmitrij,
>
> The changes look good to me.
>
> -Dmitry
>
> On 25.02.2019 19:52, Dmitrij Pochepko wrote:
>> Hi Andrew, Pengfei,
>>
>> I created webrev.02 with all your suggestions implemented:
>>
>> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
>>
>> - comments are now both in separate section and inlined into code.
>> - documentation mismatch mentioned by Pengfei is fixed:
>> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST
>> -- SHORT_LOOP_TAIL block now merged with last instruction.
>> Documentation is updated accordingly
>> - minor other changes to layout and wording
>>
>> Newly developed tests were run as sanity and they passed.
>>
>> Thanks,
>> Dmitrij
>>
>> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>>
>>>> So personally, I still prefer to inline the comments with the
>>>> original code block to avoid this kind of inconsistency. It also
>>>> makes it easier for us to review or maintain the code together with the
>>>> doc, as we don't need to scroll back and forth. I don't know the
>>>> benefit of making the code documentation a separate part. What's
>>>> your opinion, Andrew Haley?
>>> I agree with you. There's no harm having both inline and separate.
>>>

From Pengfei.Li at arm.com  Mon Nov 18 09:58:11 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 18 Nov 2019 09:58:11 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
Message-ID: <DB7PR08MB3115D5D4659DC9A7886F2A44964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew,

> I see
> 
>   if (use_XOR_for_compressed_class_base) {
>     if (CompressedKlassPointers::shift() != 0) {
>       eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>       lsr(dst, dst, LogKlassAlignmentInBytes);
>     } else {
>       eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>     }
>     return;
>   }
> 
>   if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
>       && CompressedKlassPointers::shift() == 0) {
>     movw(dst, src);
>     return;
>   }
> 
>   ... followed by code which does use r27.
> 
> Do you ever see r27 being used? If so, I'd be interested to know how this gets
> triggered and what command-line arguments you use. It's rather inefficient.

I think you're right. I tried hard with various VM options but still could not
trigger the code after this part. The worst case I've ever found is that
the encoding/decoding returns at the if block
  if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
      && CompressedKlassPointers::shift() == 0) { ... }

By browsing the code, I found this is caused by a metaspace reservation trick
that always tries to make AArch64 metaspace 4G-aligned. [1]

If we are confident that r27 won't be used for class pointers, I will
remove UseCompressedClassPointers from my if condition. Another question: shall
we clean up the (almost) dead code which uses r27 for encoding/decoding class
pointers?

[1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/memory/metaspace.cpp#l1048

--
Thanks,
Pengfei
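The excerpt Andrew quotes chooses among three encodings. The C++ sketch below is a hypothetical model of that branch structure, not HotSpot code: use_xor_for_base is taken as an input rather than computed, any fast paths that precede the quoted excerpt are not modeled, and the helper names are illustrative. It also checks why the XOR and movw shortcuts agree with the plain (ptr - base) >> shift definition of a narrow klass value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the quoted encode_klass_not_null excerpt; not HotSpot code.
enum class KlassEncodePath { Xor, Movw, UsesR27 };

KlassEncodePath choose_path(bool use_xor_for_base, uint64_t base, int shift) {
  if (use_xor_for_base) return KlassEncodePath::Xor;
  if ((base & 0xffffffffULL) == 0 && shift == 0) return KlassEncodePath::Movw;
  return KlassEncodePath::UsesR27;  // "... followed by code which does use r27"
}

// Reference definition: offset from the base, scaled down by the shift.
uint32_t encode_reference(uint64_t ptr, uint64_t base, int shift) {
  return static_cast<uint32_t>((ptr - base) >> shift);
}

// XOR path: equals the subtraction when every set bit of base lies above
// every bit the offset can use (no borrows occur).
uint32_t encode_xor(uint64_t ptr, uint64_t base, int shift) {
  return static_cast<uint32_t>((ptr ^ base) >> shift);
}

// movw path: a 32-bit move keeps the low 32 bits, which equal ptr - base
// when base is 4G-aligned and ptr lies in [base, base + 4G).
uint32_t encode_movw(uint64_t ptr) {
  return static_cast<uint32_t>(ptr);
}
```

With the 4G-aligned example base 0x800000000 and shift 0 the movw path applies; moving the base by 0x1000, or using a non-zero shift, sends encoding down the r27 path. That is consistent with the observation that a 4G-aligned metaspace keeps r27 out of the picture.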


From aph at redhat.com  Mon Nov 18 10:06:46 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Nov 2019 10:06:46 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <DB7PR08MB3115D5D4659DC9A7886F2A44964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <DB7PR08MB3115D5D4659DC9A7886F2A44964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>

On 11/18/19 9:58 AM, Pengfei Li (Arm Technology China) wrote:
> I think you're right. I tried hard with various VM options but still could not
> trigger the code after this part. The worst case I've ever found is that
> the encoding/decoding returns at the if block
>   if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
>       && CompressedKlassPointers::shift() == 0) { ... }
> 
> By browsing the code, I found this is caused by a metaspace reservation trick
> that always tries to make AArch64 metaspace 4G-aligned. [1]
> 
> If we are confident that r27 won't be used for class pointers, I will
> remove UseCompressedClassPointers from my if condition. Another question: shall
> we clean up the (almost) dead code which uses r27 for encoding/decoding class
> pointers?
> 
> [1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/memory/metaspace.cpp#l1048

We should have a flag which is set if the search for nicely-aligned memory
is successful, and then you can use that flag to determine if r27 is
needed.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
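The suggested flag can be sketched as a single predicate. Everything here is hypothetical: the parameters stand in for VM state and are not HotSpot flag names, klass_base_nicely_aligned models the proposed "aligned search succeeded" flag, and the rule that compressed oops need r27 only with a non-zero narrow-oop base is an assumption, consistent with the later remark in this thread that UseCompressedOops doesn't usually need r27.

```cpp
#include <cassert>

// Hypothetical sketch; parameter names are illustrative, not HotSpot flags.
bool r27_must_be_reserved(bool use_compressed_oops, bool narrow_oop_base_is_zero,
                          bool use_compressed_class_pointers,
                          bool klass_base_nicely_aligned) {
  // Assumption: compressed oops only need r27 when the narrow-oop base is non-zero.
  const bool oops_need_r27 = use_compressed_oops && !narrow_oop_base_is_zero;
  // Class pointers need r27 only if the search for nicely-aligned memory failed.
  const bool klass_needs_r27 =
      use_compressed_class_pointers && !klass_base_nicely_aligned;
  return oops_need_r27 || klass_needs_r27;
}
```

Keeping the oop and class-pointer conditions separate means the answer does not depend on how the two command-line flags happen to be tied together in arguments.cpp.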


From patric.hedlin at oracle.com  Mon Nov 18 10:06:14 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Mon, 18 Nov 2019 11:06:14 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
Message-ID: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>

Dear all,

Please review the new patch, now reduced to a "minimum" (besides the 
table encoding).

Updated in-place.
Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/

Testing: hs-tier1-3


Best regards,
Patric

On 12/11/2019 15:16, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>
>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>     short-circuit for (obviously) redundant if-nodes.
>
> Testing: hs-tier1-4, hs-precheckin-comp
>
>
> Best regards,
> Patric
>
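The idea behind a subsumption test can be illustrated with a small set-based encoding: record, for each comparison of x against 0, which sign outcomes make it true; a dominating test subsumes a later one when its outcome set is contained in the later one's. This is a hypothetical C++ sketch, not the patch's encoding (the patch's own table, subsuming_bool_test_encode, is not shown here), and the Cmp enum is illustrative rather than C2's BoolTest.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: each bit is one possible sign of x.
const uint8_t NEG = 1, ZERO = 2, POS = 4;

enum class Cmp { eq, ne, lt, le, gt, ge };  // comparisons of x against 0

// Set of sign outcomes for which "x <cmp> 0" is true.
uint8_t outcomes(Cmp c) {
  switch (c) {
    case Cmp::eq: return ZERO;
    case Cmp::ne: return NEG | POS;
    case Cmp::lt: return NEG;
    case Cmp::le: return NEG | ZERO;
    case Cmp::gt: return POS;
    case Cmp::ge: return ZERO | POS;
  }
  return 0;  // unreachable for valid Cmp values
}

// True if, on the path where 'dom' succeeded, 'test' must succeed too,
// so a later if-node on 'test' is redundant there.
bool subsumes(Cmp dom, Cmp test) {
  return (outcomes(dom) & ~outcomes(test)) == 0;
}
```

In particular, subsumes(Cmp::gt, Cmp::ne) holds, which is the "x > 0 implies x != 0" fact in the bug title; the reverse does not hold because x could be negative.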


From Pengfei.Li at arm.com  Mon Nov 18 10:35:18 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 18 Nov 2019 10:35:18 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <DB7PR08MB3115D5D4659DC9A7886F2A44964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
Message-ID: <DB7PR08MB311548BD55E8D8F07D3E97DF964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew,

> We should have a flag which is set if the search for nicely-aligned memory is
> successful, and then you can use that flag to determine if r27 is needed.

I just found that in the current HotSpot code, UseCompressedOops must be on for
UseCompressedClassPointers to be on. See arguments.cpp [1].

If this is true, UseCompressedClassPointers cannot be used without
UseCompressedOops. So wouldn't a single condition of UseCompressedOops be
enough? But the x86_64 code which I referenced checks both conditions.
Is that because the relationship between the arguments is subject to change in the
future?

[1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/runtime/arguments.cpp#l1715

--
Thanks,
Pengfei


From aph at redhat.com  Mon Nov 18 10:39:03 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Nov 2019 10:39:03 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <DB7PR08MB311548BD55E8D8F07D3E97DF964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <DB7PR08MB3115D5D4659DC9A7886F2A44964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
 <DB7PR08MB311548BD55E8D8F07D3E97DF964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>

On 11/18/19 10:35 AM, Pengfei Li (Arm Technology China) wrote:
> If this is true, UseCompressedClassPointers cannot be used without
> UseCompressedOops. So wouldn't a single condition of UseCompressedOops be
> enough?

Why do you think so? UseCompressedOops doesn't usually need r27.

> But the x86_64 code which I referenced checks both conditions.
> Is that because the relationship between the arguments is subject to change in the
> future?

I have no idea why these flags depend on each other. I'd use compressed
class pointers all the time, regardless of compressed oops.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From martin.doerr at sap.com  Mon Nov 18 10:42:50 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 18 Nov 2019 10:42:50 +0000
Subject: RFR(S): 8233193: Incorrect bailout from
 possibly_add_compiler_threads
In-Reply-To: <12461130-cafe-d239-46df-08527f67e779@oracle.com>
References: <VI1PR0201MB2479DBD887D69E520208951E9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <12461130-cafe-d239-46df-08527f67e779@oracle.com>
Message-ID: <VI1PR0201MB247906E01DF6DD0D7025C9089A4D0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi David,

thanks for the review.


> That's fine but you also replaced TRAPS with "Thread* THREAD" which is
> pointless given:
> 
> #define TRAPS  Thread* THREAD

I know that it's technically pointless. But I think it's confusing to use TRAPS if we don't expect any exceptions.
We just want to pass a Thread pointer.


> May I suggest a comment then:
> 
>   844 void CompileBroker::init_compiler_sweeper_threads() {
>         // Ensure any exceptions lead to vm_exit_during_initialization
>   845   EXCEPTION_MARK;

Added. I also added:

@@ -647,6 +647,7 @@
   // totalTime performance counter is always created as it is required
   // by the implementation of java.lang.management.CompilationMBean.
   {
+    // Ensure OOM leads to vm_exit_during_initialization.
     EXCEPTION_MARK;
     _perf_total_compilation =
                  PerfDataManager::create_counter(JAVA_CI, "totalTime",


Best regards,
Martin


> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Montag, 18. November 2019 04:46
> To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-
> dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Cc: David Holmes <David.Holmes at oracle.com>
> Subject: Re: RFR(S): 8233193: Incorrect bailout from
> possibly_add_compiler_threads
> 
> Hi Martin,
> 
> On 15/11/2019 2:18 am, Doerr, Martin wrote:
> > Hi,
> >
> > I'd like to clean up exception handling in CompileBroker a little bit.
> >
> > Here's my proposal:
> >
> > - Use THREAD instead of CHECK where no exceptions get thrown.
> 
> That's fine but you also replaced TRAPS with "Thread* THREAD" which is
> pointless given:
> 
> #define TRAPS  Thread* THREAD
> 
> > - Remove unused preload_classes.
> 
> Ok
> 
> > - make_thread: Rename thread to new_thread to avoid confusion with
> > THREAD (the current compiler thread).
> 
> Ok
> 
> > - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK +
> CHECK
> > because this functions is not supposed to kill the VM on exceptions. Add
> > assertion to caller.
> 
> Ok
> 
> > Webrev:
> >
> > http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
> >
> > @David:
> >
> > You didn't like usage of the CHECK macro in the initialization
> > functions, but I think they are ok.
> >
> > Not very nice to read, but the behavior looks ok to me.
> >
> > At least, I didn't find a better replacement for them. Maybe you have a
> > proposal?
> 
> May I suggest a comment then:
> 
>   844 void CompileBroker::init_compiler_sweeper_threads() {
>         // Ensure any exceptions lead to vm_exit_during_initialization
>   845   EXCEPTION_MARK;
> 
> Thanks,
> David
> -----
> 
> > Best regards,
> >
> > Martin
> >
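The point that replacing TRAPS with "Thread* THREAD" changes nothing for the compiler can be checked directly. In this C++ sketch Thread is a stand-in struct and the two function names are hypothetical; only the TRAPS definition is taken from the thread. Both declarations have the same type, so the choice only documents intent (whether the callee may return with a pending exception).

```cpp
#include <cassert>
#include <type_traits>

struct Thread {};              // hypothetical stand-in for HotSpot's Thread
#define TRAPS Thread* THREAD   // the definition quoted in this thread

// Declared with TRAPS: by convention, may return with a pending exception.
void may_throw(int value, TRAPS);
// Declared with an explicit Thread*: by convention, only needs the current thread.
void never_throws(int value, Thread* THREAD);

// Both expand to void(int, Thread*), so the compiler treats them identically.
static_assert(std::is_same<decltype(may_throw), decltype(never_throws)>::value,
              "TRAPS is just Thread* THREAD");
```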

From martin.doerr at sap.com  Mon Nov 18 11:23:12 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 18 Nov 2019 11:23:12 +0000
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
Message-ID: <VI1PR0201MB2479D5A620B74C860E7595709A4D0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Patric,

I'd consider moving subsuming_bool_test_encode up to avoid the prototype.
But I can also live with it.

Looks good to me.

Best regards,
Martin


> -----Original Message-----
> From: Patric Hedlin <patric.hedlin at oracle.com>
> Sent: Montag, 18. November 2019 11:06
> To: hotspot-compiler-dev at openjdk.java.net; Nils Eliasson
> <nils.eliasson at oracle.com>; Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com>; Doerr, Martin <martin.doerr at sap.com>
> Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
> check
> 
> Dear all,
> 
> Please review the new patch, now reduced to a "minimum" (besides the
> table encoding).
> 
> Updated in-place.
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
> 
> Testing: hs-tier1-3
> 
> 
> Best regards,
> Patric
> 
> On 12/11/2019 15:16, Patric Hedlin wrote:
> > Dear all,
> >
> > I would like to ask for help to review the following change/update:
> >
> > Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
> > Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
> >
> > 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
> >
> >     Adding a simple subsumption test to IfNode::Ideal to enable a local
> >     short-circuit for (obviously) redundant if-nodes.
> >
> > Testing: hs-tier1-4, hs-precheckin-comp
> >
> >
> > Best regards,
> > Patric
> >


From david.holmes at oracle.com  Mon Nov 18 11:31:53 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 21:31:53 +1000
Subject: RFR(S): 8233193: Incorrect bailout from
 possibly_add_compiler_threads
In-Reply-To: <VI1PR0201MB247906E01DF6DD0D7025C9089A4D0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB2479DBD887D69E520208951E9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <12461130-cafe-d239-46df-08527f67e779@oracle.com>
 <VI1PR0201MB247906E01DF6DD0D7025C9089A4D0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <1d4cfee4-6063-f2c4-e7d0-060ff93c6b02@oracle.com>

On 18/11/2019 8:42 pm, Doerr, Martin wrote:
> Hi David,
> 
> thanks for the review.
> 
> 
>> That's fine but you also replaced TRAPS with "Thread* THREAD" which is
>> pointless given:
>>
>> #define TRAPS  Thread* THREAD
> 
> I know that it's technically pointless. But I think it's confusing to use TRAPS if we don't expect any exceptions.
> We just want to pass a Thread pointer.

Yes you are right. I just re-read the comments in exceptions.hpp :)

Thanks,
David

> 
>> May I suggest a comment then:
>>
>>    844 void CompileBroker::init_compiler_sweeper_threads() {
>>          // Ensure any exceptions lead to vm_exit_during_initialization
>>    845   EXCEPTION_MARK;
> 
> Added. I also added:
> 
> @@ -647,6 +647,7 @@
>     // totalTime performance counter is always created as it is required
>     // by the implementation of java.lang.management.CompilationMBean.
>     {
> +    // Ensure OOM leads to vm_exit_during_initialization.
>       EXCEPTION_MARK;
>       _perf_total_compilation =
>                    PerfDataManager::create_counter(JAVA_CI, "totalTime",
> 
> 
> 
> Best regards,
> Martin
> 
> 
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Montag, 18. November 2019 04:46
>> To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-
>> dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
>> Cc: David Holmes <David.Holmes at oracle.com>
>> Subject: Re: RFR(S): 8233193: Incorrect bailout from
>> possibly_add_compiler_threads
>>
>> Hi Martin,
>>
>> On 15/11/2019 2:18 am, Doerr, Martin wrote:
>>> Hi,
>>>
>>> I'd like to clean up exception handling in CompileBroker a little bit.
>>>
>>> Here's my proposal:
>>>
>>> - Use THREAD instead of CHECK where no exceptions get thrown.
>>
>> That's fine but you also replaced TRAPS with "Thread* THREAD" which is
>> pointless given:
>>
>> #define TRAPS  Thread* THREAD
>>
>>> - Remove unused preload_classes.
>>
>> Ok
>>
>>> - make_thread: Rename thread to new_thread to avoid confusion with
>>> THREAD (the current compiler thread).
>>
>> Ok
>>
>>> - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + CHECK
>>> because this function is not supposed to kill the VM on exceptions. Add
>>> assertion to caller.
>>
>> Ok
>>
>>> Webrev:
>>>
>>> http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
>>>
>>> @David:
>>>
>>> You didn't like usage of the CHECK macro in the initialization
>>> functions, but I think they are ok.
>>>
>>> Not very nice to read, but the behavior looks ok to me.
>>>
>>> At least, I didn't find a better replacement for them. Maybe you have a
>>> proposal?
>>
>> May I suggest a comment then:
>>
>>    844 void CompileBroker::init_compiler_sweeper_threads() {
>>          // Ensure any exceptions lead to vm_exit_during_initialization
>>    845   EXCEPTION_MARK;
>>
>> Thanks,
>> David
>> -----
>>
>>> Best regards,
>>>
>>> Martin
>>>

From claes.redestad at oracle.com  Mon Nov 18 11:37:42 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 18 Nov 2019 12:37:42 +0100
Subject: RFR: 8234248: More VectorSet cleanups
Message-ID: <c900e895-598f-08b3-43e2-c4c5b1f44c43@oracle.com>

Hi,

JDK-8233708 removed unused code from VectorSet, but missed a few spots.

Additionally, the current code does a few other things like shifting up
signed longs and casting back to uint, which could be improved.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8234248
Webrev: http://cr.openjdk.java.net/~redestad/8234248/open.00/

Testing: tier1-3

Thanks!

/Claes
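As a generic illustration of the kind of cleanup described (this is not VectorSet's code; the class and method names are hypothetical), a bit set can compute word indices and masks entirely in unsigned 32-bit arithmetic, so there is no signed shift to reason about and no cast back to uint.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal bit set (not HotSpot's VectorSet) using unsigned arithmetic.
class BitSetSketch {
 public:
  void insert(uint32_t elem) {
    const uint32_t word = elem >> word_shift;  // which 32-bit word holds elem
    if (word >= words_.size()) {
      words_.resize(word + 1, 0u);             // grow on demand, zero-filled
    }
    words_[word] |= bit_mask(elem);
  }

  bool contains(uint32_t elem) const {
    const uint32_t word = elem >> word_shift;
    return word < words_.size() && (words_[word] & bit_mask(elem)) != 0;
  }

 private:
  static const uint32_t word_shift = 5;        // 32 bits per word
  static uint32_t bit_mask(uint32_t elem) {
    return uint32_t(1) << (elem & 31u);        // unsigned shift throughout
  }
  std::vector<uint32_t> words_;
};
```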

From dmitrij.pochepko at bell-sw.com  Mon Nov 18 12:02:13 2019
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Mon, 18 Nov 2019 15:02:13 +0300
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo
 intrinsic documentation and maintenance improvement
In-Reply-To: <MN2PR01MB609392A8FA8C9EE6C27C17BE8F4D0@MN2PR01MB6093.prod.exchangelabs.com>
References: <fc712566-1d25-88d6-b178-652efdc00f36@bell-sw.com>
 <DB7PR08MB31153C8A293F3B4FC2371B34967F0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com>
 <e47a5afa-e9ce-1286-97da-5fb9018846cb@bell-sw.com>
 <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net>
 <MN2PR01MB6093B0E0C0BD01587FF96F778F700@MN2PR01MB6093.prod.exchangelabs.com>
 <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
 <MN2PR01MB6093853821555FE4DE0BEB628F4D0@MN2PR01MB6093.prod.exchangelabs.com>
 <MN2PR01MB609392A8FA8C9EE6C27C17BE8F4D0@MN2PR01MB6093.prod.exchangelabs.com>
Message-ID: <3427facc-eb05-d690-eebe-acca39b87d4a@bell-sw.com>


On 18/11/2019 7:03 AM, Patrick Zhang OS wrote:
>>> changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this.
> Sorry, my second paragraph was inaccurate. It seems your experiments showed that some cases ran well with the first iteration of the large loop, but performed better if they then quit the loop and went to the small loop immediately (?). Please correct me if I misunderstood this. Thanks.
>
> Regards
> Patrick

Yes. That's correct.

Thanks,
Dmitrij

>
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Patrick Zhang OS
> Sent: Monday, November 18, 2019 11:52 AM
> To: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>
> Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
> Subject: RE: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
>
> Thanks for the information.
> I am interested in the inconsistency between the same_encoding and different_encoding functions. If "overprefetch" is safe enough, why do we prevent it at the end of the large loop inside same_encoding, and why do we guard against it even more strictly in different_encoding, at both the beginning and the end of the large loop?
> I did not mean globally updating largeLoopExitCondition to 64/128, merely the condition at the end of the large loop inside same_encoding. Assuming the large loop is faster than the small loop (in theory), removing all "overprefetch" conditions would let more strings go through the large loop for better performance. Are there any other potential side effects?
>
> Regards
> Patrick
>
> -----Original Message-----
> From: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>
> Sent: Friday, November 15, 2019 11:52 PM
> To: Patrick Zhang OS <patrick at os.amperecomputing.com>
> Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; Dmitry Samersoff <dms at samersoff.net>; Andrew Haley <aph at redhat.com>; Pengfei Li (Arm Technology China) <Pengfei.Li at arm.com>
> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
>
> Hi Patrick,
>
> My experiments back then showed that a few platforms (some Cortex A* series cores) behave unexpectedly slowly when dealing with overprefetch (probably due to CPU implementation specifics). So this code is a compromise that runs relatively well on all the platforms I was able to test on (ThunderX, ThunderX2, Cortex A53, Cortex A73). That is the main reason for this code structure.
> It's good that you're willing to experiment and improve it, but I'm afraid changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this.
> Let's see the performance results on the several systems you have, to avoid a situation where one platform benefits by slowing down others. We can offer some help if you don't have some of the HW available.
>
> Thanks,
> Dmitrij
>
> On 15/11/2019 10:51 AM, Patrick Zhang OS wrote:
>> Hi Dmitrij,
>>
>> The inline documentation in this patch of yours helped me understand the string_compare stub code generation in depth, although I don't know why it was not pushed.
>> http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html
>>
>> There is one point I don't quite understand, could you please clarify it a little more? Thanks in advance!
>> In the large loop with prefetching logic, the different_encoding function uses "cnt2 - prefetchLoopExitCondition" to decide whether the 1st iteration should be executed, while same_encoding does not. Why?
>>
>> I was thinking that "prefetching out of the string boundary" could be an invalid operation, but that seems to be a misunderstanding, doesn't it? The AArch64 ISA document says that the prfm instruction signals the memory system that "preloading the cache line containing the specified address into one or more caches" is expected, in order to speed up the memory accesses when they do occur. If this is just a hint for upcoming ldrs, and safe enough, could we stop restricting the remaining iterations (2 ~ n) with largeLoopExitCondition and instead use 64 for LL/UU and 128 for LU/UL? That way more iterations can stay in the large loop (with SoftwarePrefetchHintDistance=192, for example), for better performance. Any comments?
>>
>> Thanks
>>
>> 4327   address generate_compare_long_string_different_encoding(bool isLU) {
>> 4377     if (SoftwarePrefetchHintDistance >= 0) {
>> 4378       __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
>> 4379       __ br(__ LT, NO_PREFETCH);
>> 4380       __ bind(LARGE_LOOP_PREFETCH);                  // 64-characters loop
>> ... ...
>> 4395           __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
>> 4396           __ br(__ GE, LARGE_LOOP_PREFETCH);
>> 4397     } // end of 64-characters loop
>>
>> 4616   address generate_compare_long_string_same_encoding(bool isLL) {
>> 4637     if (SoftwarePrefetchHintDistance >= 0) {
>> 4638       __ bind(LARGE_LOOP_PREFETCH);
>> 4639         __ prfm(Address(str1, SoftwarePrefetchHintDistance));
>> 4640         __ prfm(Address(str2, SoftwarePrefetchHintDistance));
>> 4641         compare_string_16_bytes_same(DIFF, DIFF2);
>> 4642         compare_string_16_bytes_same(DIFF, DIFF2);
>> 4643         __ sub(cnt2, cnt2, 8 * characters_in_word);
>> 4644         compare_string_16_bytes_same(DIFF, DIFF2);
>> 4645         __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
>> 4646         compare_string_16_bytes_same(DIFF, DIFF2);
>> 4647         __ br(__ GT, LARGE_LOOP_PREFETCH);
>> 4648         __ cbz(cnt2, LAST_CHECK);                    // no more loads left
>> 4649     }
>>
>> Regards
>> Patrick
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev
>> <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Dmitry
>> Samersoff
>> Sent: Sunday, May 19, 2019 11:42 PM
>> To: Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>; Andrew Haley
>> <aph at redhat.com>; Pengfei Li (Arm Technology China)
>> <Pengfei.Li at arm.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net;
>> aarch64-port-dev at openjdk.java.net
>> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64:
>> String::compareTo intrinsic documentation and maintenance improvement
>>
>> Dmitrij,
>>
>> The changes look good to me.
>>
>> -Dmitry
>>
>> On 25.02.2019 19:52, Dmitrij Pochepko wrote:
>>> Hi Andrew, Pengfei,
>>>
>>> I created webrev.02 with all your suggestions implemented:
>>>
>>> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
>>>
>>> - comments are now both in separate section and inlined into code.
>>> - documentation mismatch mentioned by Pengfei is fixed:
>>> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST
>>> -- SHORT_LOOP_TAIL block now merged with last instruction.
>>> Documentation is updated respectively
>>> - minor other changes to layout and wording
>>>
>>> Newly developed tests were run as a sanity check and they passed.
>>>
>>> Thanks,
>>> Dmitrij
>>>
>>> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>>>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>>>
>>>>> So personally, I still prefer to inline the comments with the
>>>>> original code block to avoid this kind of inconsistencies. And it
>>>>> makes us easier to review or maintain the code together with the
>>>>> doc, as we don't need to scroll back and forth. I don't know the
>>>>> benefit of making the code documentation as a separate part. What's
>>>>> your opinion, Andrew Haley?
>>>> I agree with you. There's no harm having both inline and separate.
>>>>

From vladimir.x.ivanov at oracle.com  Mon Nov 18 12:34:55 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 18 Nov 2019 15:34:55 +0300
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com>
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com>
Message-ID: <e67edfc9-121b-9a40-52b8-a1efff498a2d@oracle.com>

(CCing hotspot-compiler-dev at ...)

Thanks for the reference, August.

Indeed the proposed approach looks very promising.

I don't think range-check elimination in C2 can optimize such code shape 
(haven't verified it with test cases yet though). Filed an RFE for it:
   https://bugs.openjdk.java.net/browse/JDK-8234333

It looks like it should be pretty straightforward to cover this 
particular case.

Also, it's worth looking at how Graal handles it and file a separate RFE 
if it doesn't optimize it well.

Best regards,
Vladimir Ivanov

On 18.11.2019 07:02, August Nagro wrote:
> Hi!
> 
> The fast-range[1] algorithm is used to map well-distributed hash functions to a range of size N. It is ~4x faster than using integer modulo, and does not require the table to be a power of two. It is used by libraries like Tensorflow and the StockFish chess engine.
> 
> The idea is that, given (int) hash h and (int) size N, then (((long) h) * N) >>> 32 is a good mapping.
> 
> However, will the compiler be able to eliminate array range-checking? HashMap's approach using power-of-two xor/mask was patched here: https://bugs.openjdk.java.net/browse/JDK-8003585.
> 
> Sincerely,
> 
> - August
> 
> [1]: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
> 

From fweimer at redhat.com  Mon Nov 18 13:10:27 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 18 Nov 2019 14:10:27 +0100
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> (August Nagro's
 message of "Sun, 17 Nov 2019 22:02:22 -0600")
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com>
Message-ID: <87r22549fg.fsf@oldenburg2.str.redhat.com>

* August Nagro:

> The fast-range[1] algorithm is used to map well-distributed hash
> functions to a range of size N. It is ~4x faster than using integer
> modulo, and does not require the table to be a power of two. It is
> used by libraries like Tensorflow and the StockFish chess engine.
>
> The idea is that, given (int) hash h and (int) size N, then
> (((long) h) * N) >>> 32 is a good mapping.

I looked at this in the past weeks in a different context, and I don't
think this would work because we have:

jshell> Integer.hashCode(0)
$1 ==> 0

jshell> Integer.hashCode(1)
$2 ==> 1

jshell> Integer.hashCode(2)
$3 ==> 2

jshell> "a".hashCode()
$4 ==> 97

jshell> "b".hashCode()
$5 ==> 98

Under the allegedly good mapping, those all map to bucket zero even for
really large arrays, which is not acceptable.  The multiplication
shortcut only works for hash functions which behave in certain ways.
Something FNV-style for strings is probably okay, but most Java
hashCode() implementations likely are not.
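A minimal Java sketch of the fast-range mapping that reproduces the collapse described above. It widens h with Integer.toUnsignedLong, which is our assumption rather than part of the quoted formula: with a sign-extending (long) cast, a negative h would produce a negative index. The class name is hypothetical.

```java
import java.util.stream.IntStream;

final class FastRange {
    /** Maps a 32-bit hash to [0, n) as floor(unsigned(h) * n / 2^32). */
    static int fr(int h, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // Widen h as unsigned: the product then lies in [0, 2^32 * n) and fits in a long.
        return (int) ((Integer.toUnsignedLong(h) * n) >>> 32);
    }

    public static void main(String[] args) {
        int n = 1 << 20; // a "really large" table
        System.out.println(fr(Integer.hashCode(2), n)); // 0
        System.out.println(fr("a".hashCode(), n));      // 0 ("a".hashCode() == 97)
        // All hash codes 0..999 share a single bucket:
        System.out.println(IntStream.range(0, 1000).map(h -> fr(h, n)).distinct().count()); // 1
    }
}
```

Only the high bits of h influence the result, so any hash function whose entropy sits in the low bits degrades badly under this mapping.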

For non-power-of-two bucket counts, one could try to pre-compute the
reciprocal as explained in Hacker's Delight and in these posts:

<http://ridiculousfish.com/blog/posts/labor-of-division-episode-i.html>
<http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html>

(I need to write to the author and have some of the math fixed, but I
think the general direction is solid.)

For an internal hash table, it is possible to use primes which are
convenient for the saturating increment algorithm because the choice of
bucket count is an implementation detail to some extent.  (It is not in
my case, so it would need data-dependent branches, which is kind of
counter-productive.)

Not discussed on the quoted pages is a generalization which uses

  hashCode - bucketCount * (int) Math.multiplyHigh(hashCode + 1L, magic)

as the bucket number.  That works for any table size that is not a power
of two, but requires a fast multiplier to get the upper half of a 64x64
product.
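For comparison, here is one concrete reciprocal-based remainder: the "direct remainder" computation published by Lemire, Kaser and Kurz. It is swapped in for illustration and is not necessarily the saturating-increment variant described above. This is a Java sketch under stated assumptions: the class name is hypothetical, the hash is treated as unsigned, and Java 9+ is required for Math.multiplyHigh, which returns the signed high half and is corrected to the unsigned one here.

```java
final class FastMod {
    private final int d;  // divisor (bucket count), 1 <= d <= Integer.MAX_VALUE
    private final long m; // ceil(2^64 / d), stored modulo 2^64

    FastMod(int divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        d = divisor;
        // floor((2^64 - 1) / d) + 1; wraps to 0 when d == 1, which still yields remainder 0
        m = Long.divideUnsigned(-1L, divisor) + 1;
    }

    /** Returns Integer.remainderUnsigned(a, d) using two multiplications and no division. */
    int mod(int a) {
        long lowbits = m * Integer.toUnsignedLong(a); // fractional part of a / d, scaled by 2^64
        // Unsigned high 64 bits of lowbits * d: the signed high half, plus d when lowbits
        // has its top bit set (d is non-negative, so it needs no correction of its own).
        long hi = Math.multiplyHigh(lowbits, d) + ((lowbits >> 63) & d);
        return (int) hi;
    }

    public static void main(String[] args) {
        FastMod f = new FastMod(1000);
        System.out.println(f.mod(123456)); // 456
        System.out.println(f.mod(-1));     // 4294967295 mod 1000 = 295
    }
}
```

The reciprocal m is computed once per table size. Each lookup is then two 64-bit multiplications and no division, which matches the cost concern raised about the "+1" formula above.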

Thanks,
Florian


From nils.eliasson at oracle.com  Mon Nov 18 14:25:58 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 18 Nov 2019 15:25:58 +0100
Subject: RFR: 8234248: More VectorSet cleanups
In-Reply-To: <c900e895-598f-08b3-43e2-c4c5b1f44c43@oracle.com>
References: <c900e895-598f-08b3-43e2-c4c5b1f44c43@oracle.com>
Message-ID: <a9c5c1ca-360b-448f-ec63-b31f04146e58@oracle.com>

Hi Claes,

Looks good,

Regards,

Nils

On 2019-11-18 12:37, Claes Redestad wrote:
> Hi,
>
> JDK-8233708 removed unused code from VectorSet, but missed a few spots.
>
> Additionally, the current code does a few other things like shifting up
> signed longs and casting back to uint, which could be improved.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234248
> Webrev: http://cr.openjdk.java.net/~redestad/8234248/open.00/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes

From claes.redestad at oracle.com  Mon Nov 18 14:33:18 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 18 Nov 2019 15:33:18 +0100
Subject: RFR: 8234248: More VectorSet cleanups
In-Reply-To: <a9c5c1ca-360b-448f-ec63-b31f04146e58@oracle.com>
References: <c900e895-598f-08b3-43e2-c4c5b1f44c43@oracle.com>
 <a9c5c1ca-360b-448f-ec63-b31f04146e58@oracle.com>
Message-ID: <7fba2025-5386-a83b-6ef9-6172d8327ff9@oracle.com>


On 2019-11-18 15:25, Nils Eliasson wrote:
> Hi Claes,
> 
> Looks good,

Thank you, Nils!

/Claes

From tobias.hartmann at oracle.com  Mon Nov 18 14:53:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 18 Nov 2019 15:53:41 +0100
Subject: RFR: 8234248: More VectorSet cleanups
In-Reply-To: <c900e895-598f-08b3-43e2-c4c5b1f44c43@oracle.com>
References: <c900e895-598f-08b3-43e2-c4c5b1f44c43@oracle.com>
Message-ID: <0b353d7f-59d6-c796-5286-5d5d4ecf4726@oracle.com>

Hi Claes,

looks good to me.

vectset.cpp:41 "assert ("  -> "assert("

Best regards,
Tobias

On 18.11.19 12:37, Claes Redestad wrote:
> Hi,
> 
> JDK-8233708 removed unused code from VectorSet, but missed a few spots.
> 
> Additionally, the current code does a few other things like shifting up
> signed longs and casting back to uint, which could be improved.
> 
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234248
> Webrev: http://cr.openjdk.java.net/~redestad/8234248/open.00/
> 
> Testing: tier1-3
> 
> Thanks!
> 
> /Claes

From tobias.hartmann at oracle.com  Mon Nov 18 14:58:13 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 18 Nov 2019 15:58:13 +0100
Subject: RFR(S): 8233193: Incorrect bailout from
 possibly_add_compiler_threads
In-Reply-To: <VI1PR0201MB2479DBD887D69E520208951E9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB2479DBD887D69E520208951E9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <41ee3d36-66b9-438a-e791-2a8dcb63d061@oracle.com>

Hi Martin,

this looks good to me.

Best regards,
Tobias

On 14.11.19 17:18, Doerr, Martin wrote:
> Hi,
> 
> I'd like to clean up exception handling in CompileBroker a little bit.
> 
> Here's my proposal:
> - Use THREAD instead of CHECK where no exceptions get thrown.
> - Remove unused preload_classes.
> - make_thread: Rename thread to new_thread to avoid confusion with THREAD (the current compiler thread).
> - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + CHECK because this function is not supposed to kill the VM on exceptions. Add assertion to caller.
> 
> Webrev:
> http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
> 
> @David:
> You didn't like usage of the CHECK macro in the initialization functions, but I think they are ok.
> Not very nice to read, but the behavior looks ok to me.
> At least, I didn't find a better replacement for them. Maybe you have a proposal?
> 
> Best regards,
> Martin
> 

From claes.redestad at oracle.com  Mon Nov 18 15:04:26 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 18 Nov 2019 16:04:26 +0100
Subject: RFR: 8234248: More VectorSet cleanups
In-Reply-To: <0b353d7f-59d6-c796-5286-5d5d4ecf4726@oracle.com>
References: <c900e895-598f-08b3-43e2-c4c5b1f44c43@oracle.com>
 <0b353d7f-59d6-c796-5286-5d5d4ecf4726@oracle.com>
Message-ID: <e9a59ecc-093e-1256-4688-fe33e35ac2bc@oracle.com>

On 2019-11-18 15:53, Tobias Hartmann wrote:
> Hi Claes,
> 
> looks good to me.

Thanks!

> 
> vectset.cpp:41 "assert ("  -> "assert("

Fixed!

/Claes

From martin.doerr at sap.com  Mon Nov 18 17:23:42 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 18 Nov 2019 17:23:42 +0000
Subject: RFR(S): 8233193: Incorrect bailout from
 possibly_add_compiler_threads
In-Reply-To: <41ee3d36-66b9-438a-e791-2a8dcb63d061@oracle.com>
References: <VI1PR0201MB2479DBD887D69E520208951E9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <41ee3d36-66b9-438a-e791-2a8dcb63d061@oracle.com>
Message-ID: <VI1PR0201MB24792A791E7AFA50E857BA609A4D0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Tobias,

thanks for the review. Pushed.

Best regards,
Martin


> -----Original Message-----
> From: Tobias Hartmann <tobias.hartmann at oracle.com>
> Sent: Montag, 18. November 2019 15:58
> To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-
> dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>; David
> Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>
> Subject: Re: RFR(S): 8233193: Incorrect bailout from
> possibly_add_compiler_threads
> 
> Hi Martin,
> 
> this looks good to me.
> 
> Best regards,
> Tobias
> 
> On 14.11.19 17:18, Doerr, Martin wrote:
> > Hi,
> >
> > I'd like to clean up exception handling in CompileBroker a little bit.
> >
> > Here's my proposal:
> > - Use THREAD instead of CHECK where no exceptions get thrown.
> > - Remove unused preload_classes.
> > - make_thread: Rename thread to new_thread to avoid confusion with
> THREAD (the current compiler thread).
> > - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK +
> CHECK because this function is not supposed to kill the VM on exceptions.
> Add assertion to caller.
> >
> > Webrev:
> > http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
> >
> > @David:
> > You didn't like usage of the CHECK macro in the initialization functions, but I
> think they are ok.
> > Not very nice to read, but the behavior looks ok to me.
> > At least, I didn't find a better replacement for them. Maybe you have a
> proposal?
> >
> > Best regards,
> > Martin
> >

From john.r.rose at oracle.com  Mon Nov 18 17:45:38 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 18 Nov 2019 09:45:38 -0800
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <87r22549fg.fsf@oldenburg2.str.redhat.com>
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com>
 <87r22549fg.fsf@oldenburg2.str.redhat.com>
Message-ID: <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com>

On Nov 18, 2019, at 5:10 AM, Florian Weimer <fweimer at redhat.com> wrote:
> 
>> 
>> The idea is that, given (int) hash h and (int) size N, then
>> (((long) h) * N) >>> 32 is a good mapping.
> 
> I looked at this in the past weeks in a different context, and I don't
> think this would work because we have:

That technique appears to require either a well-conditioned hash code
(which is not the case with Integer.hashCode) or else a value of N that
performs extra mixing on h.  (So a very *non-*power-of-two value of N
would be better here, i.e., N with larger popcount.)

A little more mixing should help the problem Florian reports with a
badly conditioned h.  Given this:

int fr(int h) { return (int)(((long)h * N) >>> 32); }
int h = x.hashCode();
//int bucket = fr(h);  // weak if h is badly conditioned

then, assuming multiplication is cheap:

int bucket = fr(h * M); // M = 0x2357BD or something

or maybe something fast and sloppy like:

int bucket = fr(h + (h << 8));

or even:

int bucket = fr(h) ^ (h & (N-1));
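The sketch below compares plain fr(h) with the multiply-mixed fr(h * M) on sequential hash codes 0..999, the shape Integer.hashCode produces, using N = 1000. Unlike the fr quoted above, it widens h with Integer.toUnsignedLong. That is our assumption: h * M overflows into negative ints, and a sign-extending (long) cast would then give negative indices. Class and helper names are hypothetical.

```java
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

final class MixedFastRange {
    static final int M = 0x2357BD; // the example multiplier from the message ("or something")

    /** Fast-range with h interpreted as unsigned, so the result is always in [0, n). */
    static int fr(int h, int n) {
        return (int) ((Integer.toUnsignedLong(h) * n) >>> 32);
    }

    /** Number of distinct buckets used when hashes 0..count-1 go through mix, then fr. */
    static long bucketsUsed(int count, int n, IntUnaryOperator mix) {
        return IntStream.range(0, count).map(h -> fr(mix.applyAsInt(h), n)).distinct().count();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(bucketsUsed(1000, n, h -> h));     // 1: everything lands in bucket 0
        System.out.println(bucketsUsed(1000, n, h -> h * M)); // 539
    }
}
```

In this range the multiplier mostly stretches small inputs into the high bits (bucket = floor(0.539 * h) here), so the result shows the collapse is avoided but says little about how well M mixes arbitrary hash codes.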


From augustnagro at gmail.com  Mon Nov 18 19:26:30 2019
From: augustnagro at gmail.com (August Nagro)
Date: Mon, 18 Nov 2019 13:26:30 -0600
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com>
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com>
 <87r22549fg.fsf@oldenburg2.str.redhat.com>
 <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com>
Message-ID: <CAHNb0ZPRd1uyb7jHhpbH19KLLZB5YWo36oMdUFSOskKAq+WKyw@mail.gmail.com>

Yes, exactly. One can also use Fibonacci hashing to ensure that an
arbitrary Key.hashCode() is well distributed.

See for instance my implementation of Universal Hashing:
https://gist.github.com/AugustNagro/4f2d70d261347e515efe0f87de9e8dc2
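Fibonacci hashing is Knuth's multiplicative hashing with a multiplier of about 2^32 divided by the golden ratio. The following is a sketch of the classic power-of-two form, with a hypothetical class name; it is not taken from the linked gist. The index is the top bits of the wrapped 32-bit product, so differences in the low-order bits of the key reach the bits that select the slot. The same premultiplied value could also be fed to fast-range for non-power-of-two sizes.

```java
import java.util.stream.IntStream;

final class FibonacciHash {
    // floor(2^32 / phi) = 2654435769, the usual 32-bit golden-ratio multiplier
    static final int GOLDEN = 0x9E3779B9;

    /** Index into a table of 2^bits slots (1 <= bits <= 31): the top bits of h * GOLDEN. */
    static int index(int h, int bits) {
        return (h * GOLDEN) >>> (32 - bits);
    }

    public static void main(String[] args) {
        int bits = 10; // 1024 slots
        long used = IntStream.range(0, 1024).map(h -> index(h, bits)).distinct().count();
        System.out.println(used + " of 1024 slots used by hashes 0..1023");
    }
}
```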

On Mon, Nov 18, 2019 at 11:45 AM John Rose <john.r.rose at oracle.com> wrote:

> On Nov 18, 2019, at 5:10 AM, Florian Weimer <fweimer at redhat.com> wrote:
>
>
>
> The idea is that, given (int) hash h and (int) size N, then
> (((long) h) * N) >>> 32 is a good mapping.
>
>
> I looked at this in the past weeks in a different context, and I don't
> think this would work because we have:
>
>
> That technique appears to require either a well-conditioned hash code
> (which is not the case with Integer.hashCode) or else a value of N that
> performs extra mixing on h.  (So a very *non-*power-of-two value of N
> would be better here, i.e., N with larger popcount.)
>
> A little more mixing should help the problem Florian reports with a
> badly conditioned h.  Given this:
>
> int fr(int h) { return (int)(((long)h * N) >>> 32); }
> int h = x.hashCode();
> //int bucket = fr(h);  // weak if h is badly conditioned
>
> then, assuming multiplication is cheap:
>
> int bucket = fr(h * M); // M = 0x2357BD or something
>
> or maybe something fast and sloppy like:
>
> int bucket = fr(h + (h << 8));
>
> or even:
>
> int bucket = fr(h) ^ (h & (N-1));
>
>

From fweimer at redhat.com  Mon Nov 18 20:17:05 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 18 Nov 2019 21:17:05 +0100
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> (John Rose's
 message of "Mon, 18 Nov 2019 09:45:38 -0800")
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com>
 <87r22549fg.fsf@oldenburg2.str.redhat.com>
 <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com>
Message-ID: <877e3x0wji.fsf@oldenburg2.str.redhat.com>

* John Rose:

> On Nov 18, 2019, at 5:10 AM, Florian Weimer <fweimer at redhat.com> wrote:
>> 
>>> 
>>> The idea is that, given (int) hash h and (int) size N, then
>>> (((long) h) * N) >>> 32 is a good mapping.
>> 
>> I looked at this in the past weeks in a different context, and I don't
>> think this would work because we have:
>
> That technique appears to require either a well-conditioned hash code
> (which is not the case with Integer.hashCode) or else a value of N that
> performs extra mixing on h.  (So a very *non-*power-of-two value of N
> would be better here, i.e., N with larger popcount.)
>
> A little more mixing should help the problem Florian reports with a
> badly conditioned h.  Given this:
>
> int fr(int h) { return (int)(((long)h * N) >>> 32); }
> int h = x.hashCode();
> //int bucket = fr(h);  // weak if h is badly conditioned
>
> then, assuming multiplication is cheap:

(Back-to-back multiplications probably are not.)

> int bucket = fr(h * M); // M = 0x2357BD or something
>
> or maybe something fast and sloppy like:
>
> int bucket = fr(h + (h << 8));
>
> or even:
>
> int bucket = fr(h) ^ (h & (N-1));

Does this really work?  I don't think so.

I think this kind of perturbation is quite expensive.  Arm's RBIT should
be helpful here.  But even though this operation is commonly needed and
easily implemented in hardware, it's rarely found in CPUs.
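Java exposes bit reversal as Integer.reverse. The sketch below applies it as a perturbation: reversing h moves the low-order bits, where small hash codes such as Integer.hashCode(i) differ, into the high-order bits that fast-range consumes. Whether the JIT compiles Integer.reverse to a single RBIT on AArch64 is not something this sketch establishes. Names are hypothetical, and h is treated as unsigned.

```java
import java.util.stream.IntStream;

final class ReversedFastRange {
    static int fr(int h, int n) {
        return (int) ((Integer.toUnsignedLong(h) * n) >>> 32);
    }

    /** Reverses the bits of h so that its low-order bits select the bucket. */
    static int bucket(int h, int n) {
        return fr(Integer.reverse(h), n);
    }

    public static void main(String[] args) {
        int n = 1000;
        long used = IntStream.range(0, 1000).map(h -> bucket(h, n)).distinct().count();
        System.out.println(used + " of " + n + " buckets used by hashes 0..999");
    }
}
```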

Any scheme with another multiplication is probably not an improvement
over the multiply-shift-multiply-subtract sequence to implement modulo
for certain convenient bucket counts, and for that, we can look up
extensive analysis. 8-)

Thanks,
Florian


From sergei.tsypanov at yandex.ru  Mon Nov 18 22:44:00 2019
From: sergei.tsypanov at yandex.ru (Sergey Tsypanov)
Date: Tue, 19 Nov 2019 00:44:00 +0200
Subject: Allocation of array copy can be eliminated in particular cases
Message-ID: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>

Hello,

this proposal was born as a result of discussion in core-libs-dev [1] and IDEA-226474 [2].

Originally I suggested to replace

int count = method.getParameterTypes().length;

with 

int count = method.getParameterCount();

Then it turned out that the cloned array could be subject to allocation elimination, since any property dereferenced from the copy can also be dereferenced from the original array.

Consider this test:

@Test
void arrayClone() {
  final Object[] objects = new Object[3];
  objects[0] = "azaza";
  objects[1] = 365;
  objects[2] = 9876L;

  final Object[] clone = objects.clone();

  assertEquals(objects.length, clone.length);
  assertSame(objects[0], clone[0]);
  assertSame(objects[1], clone[1]);
  assertSame(objects[2], clone[2]);
}

An optimizing compiler could drop the allocation of the 'clone' variable and substitute the original array for its uses, but this is not done currently:

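import java.lang.reflect.Method;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)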
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MethodParamCountBenchmark {
  private Method method;

  @Setup
  public void setup() throws Exception { method = getClass().getMethod("toString"); }

  @Benchmark
  public int getParameterCount() { return method.getParameterCount(); }

  @Benchmark
  public int getParameterTypes() { return method.getParameterTypes().length; }
}

On my i7-7700 with JDK 11, this benchmark yields these results:


Benchmark                                                               Mode  Cnt     Score    Error   Units

MethodToStringBenchmark.getParameterCount                               avgt   25     2,528 ±  0,085   ns/op
MethodToStringBenchmark.getParameterCount:·gc.alloc.rate                avgt   25    ≈ 10⁻?           MB/sec
MethodToStringBenchmark.getParameterCount:·gc.alloc.rate.norm           avgt   25    ≈ 10⁻?             B/op
MethodToStringBenchmark.getParameterCount:·gc.count                     avgt   25       ≈ 0           counts

MethodToStringBenchmark.getParameterTypes                               avgt   25     7,299 ±  0,410   ns/op
MethodToStringBenchmark.getParameterTypes:·gc.alloc.rate                avgt   25  1999,454 ± 89,929  MB/sec
MethodToStringBenchmark.getParameterTypes:·gc.alloc.rate.norm           avgt   25    16,000 ±  0,001    B/op
MethodToStringBenchmark.getParameterTypes:·gc.churn.G1_Eden_Space       avgt   25  2003,360 ± 91,537  MB/sec
MethodToStringBenchmark.getParameterTypes:·gc.churn.G1_Eden_Space.norm  avgt   25    16,030 ±  0,045    B/op
MethodToStringBenchmark.getParameterTypes:·gc.churn.G1_Old_Gen          avgt   25     0,004 ±  0,001  MB/sec
MethodToStringBenchmark.getParameterTypes:·gc.churn.G1_Old_Gen.norm     avgt   25    ≈ 10⁻?             B/op
MethodToStringBenchmark.getParameterTypes:·gc.count                     avgt   25  2380,000           counts
MethodToStringBenchmark.getParameterTypes:·gc.time                      avgt   25  1325,000               ms

I.e. intermediate array is allocated even if it doesn't escape the method it is created in.

Is my speculation correct, and does it make sense to implement an optimization that turns the sequence

array -> array.clone() -> clone.length

into

array -> array.length

for cases where the clone's visibility scope is predictable?
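The proposed rewrite can be sketched at the source level as follows. Array clone() returns a new array object that is a shallow copy with the same length. A clone that never escapes and is never written to can therefore only be observed through its length and elements, and those are the original's. Class and method names are hypothetical. Whether C2's escape analysis can perform this reduction is exactly the open question.

```java
import java.lang.reflect.Method;

final class CloneLength {
    /** The pattern from the message: the length of a defensive copy. */
    static int viaClone(Object[] array) {
        Object[] copy = array.clone(); // allocates a new array that never escapes this method
        return copy.length;
    }

    /** What the proposed optimization would reduce viaClone to. */
    static int direct(Object[] array) {
        return array.length;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Object[] objects = {"azaza", 365, 9876L};
        Object[] clone = objects.clone();
        System.out.println(clone != objects);                     // true: a distinct array object
        System.out.println(clone[0] == objects[0]);               // true: elements are shared
        System.out.println(viaClone(objects) == direct(objects)); // true

        Method m = Object.class.getMethod("equals", Object.class);
        System.out.println(m.getParameterCount() == m.getParameterTypes().length); // true
    }
}
```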


1) http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063344.html
2) https://youtrack.jetbrains.com/issue/IDEA-226474

Regards,
Sergey Tsypanov

From vladimir.x.ivanov at oracle.com  Mon Nov 18 22:56:47 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 01:56:47 +0300
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
Message-ID: <4314ca44-9094-a4d5-407e-9d9eaf5d4b37@oracle.com>


> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/

Looks good, Patric.

Best regards,
Vladimir Ivanov

PS: some cleanup suggestions (feel free to ignore them if you don't agree):

src/hotspot/share/opto/ifnode.cpp:

+//   \r1
+//  r2\  eqT  eqF  neT  neF  ltT  ltF  leT  leF  gtT  gtF  geT  geF
+//  eq    t    f    f    t    f    -    -    f    f    -    -    f
+//  ne    f    t    t    f    t    -    -    t    t    -    -    t
+//  lt    f    -    -    f    t    f    -    f    f    -    f    t
+//  le    t    -    -    t    t    -    t    f    f    t    -    t
+//  gt    f    -    -    f    f    -    f    t    t    f    -    f
+//  ge    t    -    -    t    f    t    -    t    t    -    t    f
+//
+Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
+  // Table encoding: N/A (na), True-branch (tb), False-branch (fb).
+  static enum { na, tb, fb } s_subsume_map[6][12] = {
+  /*rel: eq+T eq+F ne+T ne+F lt+T lt+F le+T le+F gt+T gt+F ge+T ge+F*/
+  /*eq*/{ tb,  fb,  fb,  tb,  fb,  na,  na,  fb,  fb,  na,  na,  fb },
+  /*ne*/{ fb,  tb,  tb,  fb,  tb,  na,  na,  tb,  tb,  na,  na,  tb },
+  /*lt*/{ fb,  na,  na,  fb,  tb,  fb,  na,  fb,  fb,  na,  fb,  tb },
+  /*le*/{ tb,  na,  na,  tb,  tb,  na,  tb,  fb,  fb,  tb,  na,  tb },
+  /*gt*/{ fb,  na,  na,  fb,  fb,  na,  fb,  tb,  tb,  fb,  na,  fb },
+  /*ge*/{ tb,  na,  na,  tb,  fb,  tb,  na,  tb,  tb,  na,  tb,  fb }};

IMO you can dump the table from the comment: it mostly duplicates the 
code. (Probably, you can use a different name for "N/A" or just refer to 
it in numeric form (0?) to preserve clean structure of the table from 
the comment.)
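The table does not have to be maintained by hand. It follows from treating each relation as the set of operand orderings (less, equal, greater) for which it holds. After the dominating branch, the dominated test is known true if every ordering that is still possible satisfies it, known false if none does, and unknown otherwise. Below is a Java sketch that regenerates the table; the names are ours, not HotSpot's, and it assumes both tests compare the same two operands in the same order and with the same signedness.

```java
import java.util.EnumSet;
import java.util.StringJoiner;

final class SubsumeTable {
    /** The three possible orderings of the two operands being compared. */
    enum Order { LT, EQ, GT }

    /** Relations in the table's order: eq, ne, lt, le, gt, ge. */
    enum Rel {
        EQ(EnumSet.of(Order.EQ)),
        NE(EnumSet.of(Order.LT, Order.GT)),
        LT(EnumSet.of(Order.LT)),
        LE(EnumSet.of(Order.LT, Order.EQ)),
        GT(EnumSet.of(Order.GT)),
        GE(EnumSet.of(Order.EQ, Order.GT));

        final EnumSet<Order> holds; // orderings for which the relation is true

        Rel(EnumSet<Order> holds) { this.holds = holds; }
    }

    /** Outcome of dominated test r2 after dominating test r1 took its true/false branch. */
    static char subsume(Rel r2, Rel r1, boolean taken) {
        EnumSet<Order> possible = taken ? r1.holds : EnumSet.complementOf(r1.holds);
        if (r2.holds.containsAll(possible)) {
            return 't'; // r2 holds for every ordering still possible
        }
        for (Order o : possible) {
            if (r2.holds.contains(o)) {
                return '-'; // r2 holds for some possible orderings but not all
            }
        }
        return 'f'; // r2 holds for none of them
    }

    /** One table row, columns eqT eqF neT neF ... geT geF separated by spaces. */
    static String row(Rel r2) {
        StringJoiner cols = new StringJoiner(" ");
        for (Rel r1 : Rel.values()) {
            cols.add(String.valueOf(subsume(r2, r1, true)));
            cols.add(String.valueOf(subsume(r2, r1, false)));
        }
        return cols.toString();
    }

    public static void main(String[] args) {
        for (Rel r2 : Rel.values()) {
            System.out.println(r2.name().toLowerCase() + "  " + row(r2));
        }
    }
}
```

Each printed row should agree with the corresponding row of the comment table in the webrev.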

======================================

+  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
+    if (cmp->in(2) != NULL && // make sure cmp is not already dead
+        cmp->in(2)->bottom_type() == TypePtr::NULL_PTR) {

Merge nested ifs?

======================================

Looks like extracting the following code into a helper function (along 
with the enum and the table) can improve readability.

+  int drel = subsuming_bool_test_encode(dom->in(1));
+  int trel = subsuming_bool_test_encode(bol);
+  int bout = pre->is_IfFalse() ? 1 : 0;
+
+  if (drel < 0 || trel < 0) {
+    return NULL;
+  }
+  int br = s_subsume_map[trel][2*drel+bout];
+  if (br == na) {
+    return NULL;
+  }

New function can return intcon(0/1) or bol(or NULL?) and the caller 
decides whether the update is needed.

> On 12/11/2019 15:16, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>
>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>
>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>     short-circuit for (obviously) redundant if-nodes.
>>
>> Testing: hs-tier1-4, hs-precheckin-comp
>>
>>
>> Best regards,
>> Patric
>>
> 

From serguei.spitsyn at oracle.com  Mon Nov 18 23:56:53 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 18 Nov 2019 15:56:53 -0800
Subject: RFR(S) : 8233462 : serviceability/tmtools/jstat tests times out
 with -Xcomp
In-Reply-To: <2B7FD259-FBC2-4850-B1B3-81D1669F150B@oracle.com>
References: <2B7FD259-FBC2-4850-B1B3-81D1669F150B@oracle.com>
Message-ID: <728f5be0-2988-02e4-43b1-64a65c7b322e@oracle.com>

Hi Igor,

Looks good.
Thank you for taking care of this!

Thanks,
Serguei


On 11/15/19 23:47, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00
>> 33 lines changed: 1 ins; 14 del; 18 mod;
> Hi all,
>
> could you please review this small fix for tmtools testlibrary?
> tmtools tests are believed to fail due to a deadlock-like situation b/w main test process and tmtools process:
> (from JBS)
>> it seems these tests attach jstat to the main test process, the same process which reads the tool's stdout/stderr, so there is a possibility that this will deadlock: the jstat process produces more output than the buffer can hold, so it blocks until someone (the main process) reads it, while the main process waits until jstat completes.
> the patch changes serviceability/tmtools/share/common library (used by all serviceability/tmtools) to redirect tmtool's stdout and stderr into files instead of using jdk.test.lib.process.OutputAnalyzer; I've also added a bit of diagnostic output, so it will be easier to analyze future failures.
>
> webrev: http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00
> JBS: https://bugs.openjdk.java.net/browse/JDK-8233462
> testing:
>   - serviceability/tmtools on windows-x64,linux-x64,macosx-x64,solaris-sparcv9
>   - serviceability/tmtools 100 times on linux-x64-debug w/ '-Xcomp -ea -esa -XX:+TieredCompilation -XX:+DeoptimizeALot' (most of failures have been seen on this configuration)
>
> Thanks,
> -- Igor
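The fix described above avoids the pipe-buffer deadlock by having the child process write directly to files, so it can never block on a full pipe that the waiting parent is not draining. Below is a minimal sketch of that general technique with java.lang.ProcessBuilder. It is not the tmtools library code from the webrev, and the class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

final class RunToFiles {
    /** Runs cmd with stdout and stderr redirected to files, then waits for it to exit. */
    static int run(List<String> cmd, File out, File err) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd)
                .redirectOutput(out)  // the child writes stdout straight to this file
                .redirectError(err);  // and stderr to this one, so no pipe can fill up
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        File out = File.createTempFile("tool", ".out");
        File err = File.createTempFile("tool", ".err");
        String java = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        int exit = run(List.of(java, "-version"), out, err); // "java -version" writes to stderr
        String stderr = new String(Files.readAllBytes(err.toPath()), StandardCharsets.UTF_8);
        System.out.println("exit=" + exit + ", stderr: " + stderr.trim());
    }
}
```

Keeping the files also leaves the full tool output available for diagnosing later failures.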


From igor.ignatyev at oracle.com  Tue Nov 19 00:01:41 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 18 Nov 2019 16:01:41 -0800
Subject: RFR(S) : 8233462 : serviceability/tmtools/jstat tests times out
 with -Xcomp
In-Reply-To: <728f5be0-2988-02e4-43b1-64a65c7b322e@oracle.com>
References: <2B7FD259-FBC2-4850-B1B3-81D1669F150B@oracle.com>
 <728f5be0-2988-02e4-43b1-64a65c7b322e@oracle.com>
Message-ID: <D61AE92E-A742-4DF4-8ADD-7C2831F70DDE@oracle.com>

Hi Serguei,

Thank you for your review and discussion around this issue.

-- Igor

> On Nov 18, 2019, at 3:56 PM, serguei.spitsyn at oracle.com wrote:
> 
> Hi Igor,
> 
> Looks good.
> Thank you for taking care of this!
> 
> Thanks,
> Serguei
> 
> 
> On 11/15/19 23:47, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00
>>> 33 lines changed: 1 ins; 14 del; 18 mod;
>> Hi all,
>> 
>> could you please review this small fix for tmtools testlibrary?
>> tmtools tests are believed to fail due to a deadlock-like situation b/w main test process and tmtools process:
>> (from JBS)
>>> it seems these tests attach jstat to the main test process, the same process which reads the tool's stdout/stderr, so there is a possibility that this will deadlock: the jstat process produces more output than the buffer can hold, so it blocks until someone (the main process) reads it, while the main process waits until jstat completes.
>> the patch changes serviceability/tmtools/share/common library (used by all serviceability/tmtools) to redirect tmtool's stdout and stderr into files instead of using jdk.test.lib.process.OutputAnalyzer; I've also added a bit of diagnostic output, so it will be easier to analyze future failures.
>> 
>> webrev: http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233462
>> testing:
>>  - serviceability/tmtools on windows-x64,linux-x64,macosx-x64,solaris-sparcv9
>>  - serviceability/tmtools 100 times on linux-x64-debug w/ '-Xcomp -ea -esa -XX:+TieredCompilation -XX:+DeoptimizeALot' (most of failures have been seen on this configuration)
>> 
>> Thanks,
>> -- Igor
> 
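The redirect-to-files approach described in the review can be sketched with plain java.lang.ProcessBuilder. This is a minimal sketch, not the code in the webrev: the class name ToolRunner and the output file names are hypothetical. Because the OS writes the child's output directly into files, the child never blocks on a full pipe, so the parent can simply wait for it to exit.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

/** Hypothetical helper: runs a tool with its output sent to files, not pipes. */
final class ToolRunner {
    /** Exit code plus the captured output of one finished tool run. */
    static final class Result {
        final int exitCode;
        final String stdout;
        final String stderr;

        Result(int exitCode, String stdout, String stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    static Result run(List<String> command, File workDir)
            throws IOException, InterruptedException {
        File out = new File(workDir, "tool.stdout.txt");
        File err = new File(workDir, "tool.stderr.txt");
        Process p = new ProcessBuilder(command)
                .redirectOutput(out) // the child writes straight to a file,
                .redirectError(err)  // so it can never block on a full pipe
                .start();
        // Waiting first is safe here: no pipe buffer needs to be drained.
        int exitCode = p.waitFor();
        String stdout = new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8);
        String stderr = new String(Files.readAllBytes(err.toPath()), StandardCharsets.UTF_8);
        // Diagnostic output, so that a future failure is easier to analyze.
        System.out.println("[" + String.join(" ", command) + "] exit code: " + exitCode
                + ", stdout: " + out + ", stderr: " + err);
        return new Result(exitCode, stdout, stderr);
    }
}
```

The design choice is that the parent no longer has to read while it waits; the captured text is only read back after the tool has finished.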


From mikhailo.seledtsov at oracle.com  Mon Nov 18 22:06:39 2019
From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com)
Date: Mon, 18 Nov 2019 14:06:39 -0800
Subject: RFR(S) : 8147017 : Platform.isGraal should be removed
In-Reply-To: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com>
References: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com>
Message-ID: <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com>

Looks good to me,

Misha

On 11/17/19 11:00 AM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
>> 16 lines changed: 2 ins; 8 del; 6 mod;
> Hi all,
>
> The jdk.test.lib.Platform.isGraal method assumes that a JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and it caused others to incorrectly assume that a '-graal' flag exists and must be used to select the Graal compiler. The patch removes this method and updates its only meaningful usage in the TestGCLogMessages test. TestGCLogMessages should use LogMessageWithLevelC2OrJVMCIOnly only when c2 or graal is available, so it's been updated to use the corresponding methods of the sun.hotspot.code.Compiler class, which requires the WhiteBox API to be enabled.
>
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8147017
> webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
> testing: tier1 + TestGCLogMessages w/ different JIT configurations
>
> Thanks,
> -- Igor

From Pengfei.Li at arm.com  Tue Nov 19 10:03:50 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Tue, 19 Nov 2019 10:03:50 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <DB7PR08MB3115D5D4659DC9A7886F2A44964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
 <DB7PR08MB311548BD55E8D8F07D3E97DF964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>
Message-ID: <DB7PR08MB3115C5D0762DA2EEB3B7DF58964C0@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew,

> Why do you think so? UseCompressedOops doesn't usually need r27.

If I understand correctly, your point is to allocate r27 as well for some scenarios when UseCompressedOops or UseCompressedClassPointers is on. This optimization is much more aggressive and I will try to do it carefully.

> We should have a flag which is set if the search for nicely-aligned 
> memory is successful, and then you can use that flag to determine if r27 is needed.

In which file do you think we should add the flag? Can we just check the value of CompressedKlassPointers::base() in reg_mask_init() ?

--
Thanks,
Pengfei

From claes.redestad at oracle.com  Tue Nov 19 10:18:36 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 19 Nov 2019 11:18:36 +0100
Subject: RFR: 8234328: VectorSet::clear can cause fragmentation
Message-ID: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com>

Hi,

today, VectorSet::clear "reclaims" storage when the size is large.

However, since the backing array is allocated in a resource arena, this
is dubious: the retained memory is only actually freed and made reusable
if it's currently the last chunk of memory allocated in the arena. This
means a clear() is likely to just waste the allocated memory until we
exit the current resource scope.

Instead, I propose a strategy where, rather than "freeing" memory, we keep
track of the currently allocated size of the VectorSet separately from the
in-use size. We can then defer the memset to reset/clear the memory to the
next time we need to grow, thus avoiding unnecessary reallocations and
memsets. This limits the memory waste.
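A minimal Java sketch of the proposed bookkeeping, assuming a simple word-array bit set (HotSpot's VectorSet is C++ and arena-allocated; DeferredClearBitSet and its methods are hypothetical names). clear() only resets the in-use size, and words are zeroed lazily when grow() brings them back into use.

```java
import java.util.Arrays;

/** Hypothetical sketch: a bit set whose clear() keeps storage and defers zeroing. */
final class DeferredClearBitSet {
    private int[] words = new int[2]; // allocated storage (stands in for arena memory)
    private int size;                 // words currently in use; words[size..] may be stale

    void set(int bit) {
        int w = bit >>> 5;            // 32 bits per word
        if (w >= size) {
            grow(w + 1);
        }
        words[w] |= 1 << (bit & 31);
    }

    boolean test(int bit) {
        int w = bit >>> 5;
        return w < size && (words[w] & (1 << (bit & 31))) != 0;
    }

    /** Forgets all bits without releasing or zeroing any storage. */
    void clear() {
        size = 0;
    }

    /** Allocated capacity in words; unchanged by clear(). */
    int capacityWords() {
        return words.length;
    }

    private void grow(int newSize) {
        if (newSize > words.length) {
            // Reallocate only when the allocated capacity is really exceeded.
            words = Arrays.copyOf(words, Math.max(newSize, 2 * words.length));
        }
        // The deferred "memset": zero just the words that come back into use,
        // since they may still hold bits from before the last clear().
        Arrays.fill(words, size, newSize, 0);
        size = newSize;
    }
}
```

With this scheme, clear-then-refill cycles reuse storage in place; the trade-off is that a set which was once large keeps its capacity until the enclosing scope ends.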

Bug:    https://bugs.openjdk.java.net/browse/JDK-8234328
Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/

Testing: tier1-3

Either of reset() or clear() could now be removed, which seems like a
straightforward follow-up RFE. With some convincing I could roll it into
this patch.

Thanks!

/Claes

From christoph.goettschkes at microdoc.com  Tue Nov 19 10:38:42 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Tue, 19 Nov 2019 11:38:42 +0100
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java
 only works with server VMs.
In-Reply-To: <27E94E0A-7A99-4B0D-B96D-7DADF6201542@oracle.com>
References: <20191112120936.1D826D285F@aojmv0009>
 <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com>
 <2w8hcs8xd1-1@aserp2030.oracle.com>
 <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com>
 <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com>
 <20191114112213.D20B8D9B6E@aojmv0009>
 <VE1PR08MB4656A89817FF722ECF400A778E700@VE1PR08MB4656.eurprd08.prod.outlook.com>
 <8f0630cf-7fe6-8021-f195-9595d6f460c9@oracle.com>
 <27E94E0A-7A99-4B0D-B96D-7DADF6201542@oracle.com>
Message-ID: <mailman.15.1574159981.19479.hotspot-compiler-dev@openjdk.java.net>

Hi Igor,

could you please sponsor this changeset and commit it for me into the 
repository?

https://cr.openjdk.java.net/~cgo/8231954/webrev.03/jdk-jdk.changeset

Thanks,
Christoph

Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-15 20:21:51:

> 
> I'd also add that, unfortunately, it's not always right to add
> '@requires vm.compiler2.enabled' to a test just b/c it uses some
> c2-only flags. There are cases when such tests are not only still valid
> w/o these flags, but are also capable of spotting bugs. On the other
> hand, tests which have multiple runs w/ different values of c2-only
> (or c1-only) flags can be split to reduce wasted time. That is to say,
> the decision on whether @requires is the right thing should be made on
> a test-by-test basis.
> 
> As 8231954 was only about the TestCharVect2 test, I suggest we push
> Christoph's webrev.03 and file an RFE to deal w/ other tests, or
> retrofit 8228493 to cover not only non-product flags but also
> c2/c1-only flags and use it as an umbrella for discussion/work-tracking.
> 
> -- Igor
> 
> > On Nov 15, 2019, at 11:11 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> > 
> > Note, compiler/c2 and compiler/c1 are misleading names for test
> > directories: they have nothing to do with the C1 and C2 JIT compilers.
> > They are simply 2 groups of tests we split so they can be executed in
> > parallel in reasonable time.
> > 
> > Vladimir
> > 
> > On 11/14/19 7:35 PM, Yang Zhang (Arm Technology China) wrote:
> >> Hi Christoph, Igor, Vladimir,
> >> Thanks very much for your fix. After discussion, we have got a
> >> better solution for this issue. Do we need to change the following
> >> files in which the MaxVectorSize option is used?
> >> [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java
> >> [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java
> >> [3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/c2/cr6340864
> >> PS: [3] is located in the c2 directory, so I'm not sure whether
> >> those tests will be excluded in jtreg runs in client mode.
> >> Regards
> >> Yang
> >> -----Original Message-----
> >> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of christoph.goettschkes at microdoc.com
> >> Sent: Thursday, November 14, 2019 7:21 PM
> >> To: vladimir.kozlov at oracle.com; igor.ignatyev at oracle.com
> >> Cc: hotspot-compiler-dev at openjdk.java.net
> >> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs.
> >> Thanks for your feedback, this resolves my concerns and I am happy
> >> with the solution. I integrated the suggestions from Vladimir, here is
> >> the latest webrev:
> >> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/
> >> I re-tested and it works as expected.
> >> Please give your consent if this is fine for you as well.
> >> -- Christoph
> >> Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-13 20:32:18:
> >>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> >>> To: Igor Ignatyev <igor.ignatyev at oracle.com>, christoph.goettschkes at microdoc.com
> >>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> >>> Date: 2019-11-13 20:32
> >>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs.
> >>> 
> >>> On 11/13/19 11:11 AM, Igor Ignatyev wrote:
> >>>> @Christoph,
> >>>> 
> >>>> webrev.01 looks good to me.
> >>>> I always thought that the jvmci feature can be built only when the
> >>>> compiler2 feature is enabled; however,
> >>>> src/hotspot/share/jvmci/jvmci_globals.hpp suggests that jvmci can be
> >>>> used w/o compiler2. I don't think we have ever built/tested, let
> >>>> alone supported, this configuration.
> >>>> 
> >>>> @Vladimir,
> >>>> did/do we plan to support the compiler1 + jvmci w/o compiler2
> >>>> configuration?
> >>> 
> >>> Yes. It could be a configuration when we start looking at replacing
> >>> C1 with Graal. I think several people were interested in a
> >>> "Client VM"-like configuration.
> >>> Also a Server configuration without C2 (with Graal or another jvmci
> >>> compiler), which would be our configuration in the future.
> >>> 
> >>> But I would prefer to be more explicit in these changes:
> >>> 
> >>> @requires vm.compiler2.enabled | vm.graal.enabled
> >>> 
> >>> Thanks,
> >>> Vladimir
> >>> 
> >>>> 
> >>>> Thanks,
> >>>> -- Igor
> >>>> 
> >>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote:
> >>>>> 
> >>>>> Hi Igor,
> >>>>> 
> >>>>> thanks for your explanation.
> >>>>> 
> >>>>> Igor Ignatyev <igor.ignatyev at oracle.com> wrote on 2019-11-12 20:40:46:
> >>>>> 
> >>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our
> >>>>>> tests, as in most cases, it causes wasted compute time (as in this
> >>>>>> test) and can also lead to wrong/deprecated/deleted flags sneaking
> >>>>>> into the testbase
> >>>>> 
> >>>>> Agreed. I also wanted to discuss this, since I think that your
> >>>>> solution is better than mine, but at the same time, I saw possible
> >>>>> problems with it, see below.
> >>>>> 
> >>>>>> as '@requires vm.flavor == "server"' filters configurations based
> >>>>>> on vm build type, it will still allow execution on JVM w/ JVMCI and
> >>>>>> when a JVMCI compiler is selected, as it will still be a Server VM
> >>>>>> build. so, in a sense, the test will run w/ JVMCI in the same way as
> >>>>>> w/ your approach.
> >>>>> 
> >>>>> My concern is not about server VMs with JVMCI, but client VMs with
> >>>>> JVMCI enabled. Is this a valid configuration? The MaxVectorSize
> >>>>> option is defined in [1] as well as in [2], so to me it looks like
> >>>>> MaxVectorSize can be used for any VM variant as long as JVMCI is
> >>>>> enabled. The configure script also states that both compilers are
> >>>>> possible (if you configure with --with-jvm-features='jvmci'):
> >>>>> 
> >>>>> configure: error: Specified JVM feature 'jvmci' requires feature
> >>>>> 'compiler2' or 'compiler1'
> >>>>> 
> >>>>> Should maybe the requires tag "vm.jvmci" be used as well, like:
> >>>>> 
> >>>>>     @requires vm.flavor == "server" | vm.jvmci
> >>>>> 
> >>>>>> this is the known limitation of jtreg/@requires, and our current
> >>>>>> way to work around it is to split a test description based on
> >>>>>> @requires values
> >>>>> 
> >>>>> Yes, if the @requires tag is used, splitting up the test looks like
> >>>>> a good idea. I didn't know that it is possible to have multiple test
> >>>>> descriptions in one test file.
> >>>>> 
> >>>>> I created a new webrev with the new ideas:
> >>>>> 
> >>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/
> >>>>> 
> >>>>> I tested with an amd64 client and server VM and it looks good. I am
> >>>>> currently unable to build a client VM with JVMCI enabled, hence no
> >>>>> test for that yet. I get compile errors and as soon as I resolve
> >>>>> those, runtime errors occur. Before I look into that, I would like
> >>>>> to know if client VMs with JVMCI enabled are supported or not.
> >>>>> 
> >>>>> Thanks,
> >>>>> Christoph
> >>>>> 
> >>>>> [1]
> >>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
> >>>>> 
> >>>>> [2]
> >>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
> >>>>> 
> >>>> 
> >>> 
> 
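The split discussed in this thread (one test file, several test descriptions selected by @requires) might look roughly like the header below. It is an illustrative sketch, not the contents of webrev.03: the @run lines and flag values are assumptions. Only the description that matches the VM configuration runs, so -XX:+IgnoreUnrecognizedVMOptions is no longer needed.

```java
// Description 1: MaxVectorSize is defined in c2_globals.hpp and
// jvmci_globals.hpp, so only pass it where C2 or Graal is the JIT.
/*
 * @test
 * @requires vm.compiler2.enabled | vm.graal.enabled
 * @run main/othervm compiler.codegen.TestCharVect2
 * @run main/othervm -XX:MaxVectorSize=8 compiler.codegen.TestCharVect2
 * @run main/othervm -XX:MaxVectorSize=16 compiler.codegen.TestCharVect2
 */

// Description 2: the complementary configuration (e.g. a C1-only client VM)
// still runs the test once, without the C2-only flag.
/*
 * @test
 * @requires !vm.compiler2.enabled & !vm.graal.enabled
 * @run main/othervm compiler.codegen.TestCharVect2
 */
```

Because the two @requires expressions are complements of each other, every configuration runs exactly one description.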


From augustnagro at gmail.com  Tue Nov 19 04:43:23 2019
From: augustnagro at gmail.com (August Nagro)
Date: Mon, 18 Nov 2019 22:43:23 -0600
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <877e3x0wji.fsf@oldenburg2.str.redhat.com>
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com>
 <87r22549fg.fsf@oldenburg2.str.redhat.com>
 <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com>
 <877e3x0wji.fsf@oldenburg2.str.redhat.com>
Message-ID: <2BAEB1D3-46B7-4BA9-81A3-4F5E7B47B82A@gmail.com>

Apologies; there were actually a few errors in my Universal Hashing javadoc (not the code); they've been corrected: https://gist.github.com/AugustNagro/4f2d70d261347e515efe0f87de9e8dc2

One thing that might be relevant to bounds check elimination is that, in Java, fast-range will produce values in the range [-tableSize / 2, tableSize / 2 - 1]. So we then need table[fr(hash) + tableSize/2]. However, tableSize / 2 will be a constant, so that division need only be done once.
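Both variants can be sketched side by side in Java. FastRange, map and mapSigned are hypothetical names. mapSigned is the signed formula from the thread: the cast back to int reinterprets the high 32 bits of the product as a signed value. map is an alternative (not proposed in the thread) that widens the hash as unsigned with Integer.toUnsignedLong, so the result is already in [0, n) and no offset is needed.

```java
/** Hypothetical helpers illustrating multiply-shift ("fast-range") reduction. */
final class FastRange {
    private FastRange() {}

    /**
     * Signed variant from the thread: for even n the result lies in
     * [-n/2, n/2 - 1], so callers add n/2 to get a table index.
     */
    static int mapSigned(int hash, int n) {
        return (int) (((long) hash * n) >>> 32);
    }

    /**
     * Unsigned variant: widening the hash with Integer.toUnsignedLong keeps
     * the product non-negative (below 2^32 * 2^31 = 2^63), so the result is
     * already in [0, n) and no offset is needed.
     */
    static int map(int hash, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        return (int) ((Integer.toUnsignedLong(hash) * n) >>> 32);
    }
}
```

For example, with a table of size 10, mapSigned(Integer.MIN_VALUE, 10) is -5, while map(Integer.MIN_VALUE, 10) is 5. Whether C2 can then drop the bounds check on table[map(h, table.length)] is the open question of this thread; the sketch does not settle it.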

Regarding Florian's concerns: yes, it is right that fast-range isn't optimal in every case (and I never tried to claim that). If your tableSize is a power of 2, then just use xor/mask à la HashMap. But the benefit is when mapping to tables of arbitrary size, where those modulo intrinsics may not apply.

And here's a tangent to think about: is growing HashMap's backing array by powers of 2 actually a good thing when the HashMap gets large? What if you instead wanted to grow by a factor of 1.5, or even grow probabilistically, based on the collision rate, allocation pressure, or other data? With fast-range you can do this if you want, and without the performance hit of %!

> On Nov 18, 2019, at 2:17 PM, Florian Weimer <fweimer at redhat.com> wrote:
> 
> * John Rose:
> 
>> On Nov 18, 2019, at 5:10 AM, Florian Weimer <fweimer at redhat.com> wrote:
>>> 
>>>> 
>>>> The idea is that, given (int) hash h and (int) size N, then
>>>> (((long) h) * N) >>> 32 is a good mapping.
>>> 
>>> I looked at this in the past weeks in a different context, and I don't
>>> think this would work because we have:
>> 
>> That technique appears to require either a well-conditioned hash code
>> (which is not the case with Integer.hashCode) or else a value of N that
>> performs extra mixing on h.  (So a very *non-*power-of-two value of N
>> would be better here, i.e., N with larger popcount.)
>> 
>> A little more mixing should help the problem Florian reports with a
>> badly conditioned h.  Given this:
>> 
>> int fr(int h) { return (int)(((long)h * N) >>> 32); }
>> int h = x.hashCode();
>> //int bucket = fr(h);  // weak if h is badly conditioned
>> 
>> then, assuming multiplication is cheap:
> 
> (Back-to-back multiplications probably are not.)
> 
>> int bucket = fr(h * M); // M = 0x2357BD or something
>> 
>> or maybe something fast and sloppy like:
>> 
>> int bucket = fr(h + (h << 8));
>> 
>> or even:
>> 
>> int bucket = fr(h) ^ (h & (N-1));
> 
> Does this really work?  I don't think so.
> 
> I think this kind of perturbation is quite expensive.  Arm's RBIT should
> be helpful here.  But even though this operation is commonly needed and
> easily implemented in hardware, it's rarely found in CPUs.
> 
> Any scheme with another multiplication is probably not an improvement
> over the multiply-shift-multiply-subtract sequence to implement modulo
> for certain convenient bucket counts, and for that, we can look up
> extensive analysis. 8-)
> 
> Thanks,
> Florian
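John's point about badly conditioned hashes can be checked with a small experiment. This is a sketch under stated assumptions: keys 0..999 stand in for Integer.hashCode values, the hash is treated as unsigned, and M is the constant suggested in the thread. FastRangeMixing and its methods are hypothetical names.

```java
/** Hypothetical demo: fast-range alone vs. fast-range after one extra multiply. */
final class FastRangeMixing {
    // Constant suggested in the thread ("M = 0x2357BD or something").
    static final int M = 0x2357BD;

    /** Fast-range with the hash treated as unsigned; result is in [0, n). */
    static int fr(int h, int n) {
        return (int) ((Integer.toUnsignedLong(h) * n) >>> 32);
    }

    /** Pre-mixes the hash with a cheap multiply before reducing it. */
    static int frMixed(int h, int n) {
        return fr(h * M, n);
    }

    /** Counts how many distinct buckets the keys 0..keys-1 land in. */
    static int distinctBuckets(boolean mixed, int keys, int n) {
        boolean[] seen = new boolean[n];
        int count = 0;
        for (int k = 0; k < keys; k++) {
            // Integer.hashCode(k) == k, i.e. a badly conditioned hash
            int b = mixed ? frMixed(k, n) : fr(k, n);
            if (!seen[b]) {
                seen[b] = true;
                count++;
            }
        }
        return count;
    }
}
```

Without mixing, fr(k, 1000) is 0 for every k below roughly 4.29 million (2^32 / 1000), because only the high bits of the product survive the shift; the multiply by M moves information into those high bits. Whether the extra multiply pays for itself is exactly the cost question Florian raises.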


From vladimir.x.ivanov at oracle.com  Tue Nov 19 12:40:14 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 15:40:14 +0300
Subject: 8230015: [instruction selector] generic vector operands support.
In-Reply-To: <0868bad5-c573-362f-0ff6-4a4332d95ecb@oracle.com>
References: <A66BBE673E08E1428E3A918AE4D5B32C0A042065@BGSMSX106.gar.corp.intel.com>
 <0868bad5-c573-362f-0ff6-4a4332d95ecb@oracle.com>
Message-ID: <6231a756-bb90-3536-9bfd-a6fdc0702f5a@oracle.com>

Short update on the progress: after extensive offline discussions 
between Jatin, Sandhya, and me, we decided to split the original patch 
into multiple independent pieces and post them for review separately.

On behalf of Jatin, I'll initiate the next round of reviews shortly.

Generic vector support will be posted first (along with a couple of 
enhancements). AD instruction merges are in good shape, but still need 
more work. They'll be posted for review later.

Best regards,
Vladimir Ivanov

On 12.10.2019 03:41, Vladimir Ivanov wrote:
> Hi Jatin,
> 
> FTR I'm looking at:
>    http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/webrev.03/
> 
> Good work, Jatin. The reduction in number of instructions looks 
> impressive. Also, it's nice to see excessive vec <-> legVec moves 
> being eliminated.
> 
> High-level comments on the patch:
> 
>    - Please split the patch into 2 changes: generic vector type 
> support and AD instruction unification. It would significantly simplify 
> review.
> 
>    - Try to avoid x86-specific changes (#ifdefs) in shared files.
> 
> 
> I'm still looking through the implementation, but here are the bugs in 
> AD unification spotted so far:
> 
> ====================================================================
> src/hotspot/cpu/x86/x86_64.ad:
> 
> +operand vecG() %{
> +  constraint(ALLOC_IN_RC(vectorg_reg));
> 
> +operand legVecG() %{
> +  constraint(ALLOC_IN_RC(vectorg_reg));
> 
> legVecG definition shouldn't be equivalent to vecG: on an AVX512-capable
> host, legVecG should never include zmm16-zmm31 while vecG can.
> 
> 
> ====================================================================
> src/hotspot/cpu/x86/x86.ad:
> 
> +instruct ReplF_zero_avx(vecG dst, immF0 zero) %{
> +  predicate(UseAVX < 3 &&
> 
> UseAVX > 0 check is missing (same with ReplD_zero_avx).
> 
> 
> ====================================================================
> +instruct vaddB_mem(vecG dst, vecG src, memory mem) %{
> +  predicate(UseAVX && (UseAVX <= 2 || VM_Version::supports_avx512bw()) &&
> +            n->as_Vector()->length() >= 4 && n->as_Vector()->length() <= 64);
> 
> It doesn't match the following case anymore:
>    UseAVX == 3 &&
>    VM_Version::supports_avx512bw() == false &&
>    n->as_Vector()->length() < 64 // smaller than 512bit
> 
> There are other AD instructions (with a VM_Version::supports_avx512bw()
> check) which are affected the same way.
> 
> Best regards,
> Vladimir Ivanov
> 
> On 22/08/2019 09:49, Bhateja, Jatin wrote:
>> Hi All,
>>
>> Please find below a patch for generic vector operands[1] support 
>> during instruction selection.
>>
>> Motivation behind the patch is to reduce the number of vector 
>> selection patterns whose operands meagerly differ in vector lengths.
>>
>> This will not only result in lesser code being generated by ADLC which 
>> effectively translates to size reduction in libjvm.so but also
>>
>> help in better maintenance of AD files.
>>
>> Using generic operands we were able to collapse multiple vector 
>> patterns over mainline:
>>
>>     Initial number of vector instruction patterns (vec[XYZSD] + legVec[ZXYSD]) : *510*
>>
>>     Reduced vector instruction patterns (vecG + legVecG)                       : *222*
>>
>> With this we could see around 1MB size reduction in libjvm.so.
>>
>> In order to have minimal impact on downstream compiler passes, a 
>> post-selection pass has been introduced (currently enabled only for the 
>> x86 target) which replaces these generic operands with their 
>> corresponding concrete vector-length variants.
>>
>> JBS   : https://bugs.openjdk.java.net/browse/JDK-8230015
>>
>> Patch : http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/webrev.00/
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>>
>> Jatin Bhateja
>>
>> [1] 
>> http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf 
>>
>>

From vladimir.x.ivanov at oracle.com  Tue Nov 19 13:00:36 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 16:00:36 +0300
Subject: [14] RFR (S): 8234387: C2: Better support of operands with multiple
 match rules in AD files
Message-ID: <30655159-7431-1a33-cb10-373e32c68002@oracle.com>

http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00
https://bugs.openjdk.java.net/browse/JDK-8234387

Though ADLC accepts operands with multiple match rules, it only generates 
correct code for the first one.

It doesn't cause any noticeable problems for existing code, but it is a 
major limitation for generic vector operands (JDK-8234391 [1]).

The proposed fix enumerates all match rules.

Fixed some missing declarations along the way.

Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
Reviewed-by: vlivanov, sviswanathan, …

Testing: tier1-4 (both with and without generic vectors)

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8234391

From vladimir.x.ivanov at oracle.com  Tue Nov 19 13:40:47 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 16:40:47 +0300
Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC
Message-ID: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com>

http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8234394

Introduce a new "placeholder" register class which denotes that 
instructions which use operands of such a class should dynamically query 
register masks from the operand instance rather than hard-code them.

It is required for generic vectors in order to support generic vector 
operand (vec/legVec) replacement with fixed-sized vector operands 
(vec[SDXYZ]/legVec[SDXYZ]) after matching is over.

As an example of usage, a generic vector operand is declared as:

operand vec() %{
   constraint(ALLOC_IN_RC(dynamic));
   match(VecX);
   match(VecY);
   match(VecZ);
   match(VecS);
   match(VecD);
...

Then, for an instruction which uses vec as a DEF:

x86.ad:
instruct loadV4(vec dst, memory mem) %{

=ADLC=>

ad_x86_misc.cpp:
const RegMask &loadV4Node::out_RegMask() const {
   return (*_opnds[0]->in_RegMask(0));
}

vs

x86.ad:
instruct loadV4(vecS dst, memory mem) %{

=ADLC=>

ad_x86_misc.cpp:
const RegMask &loadV4Node::out_RegMask() const {
   return (VECTORS_REG_VLBWDQ_mask());
}


An operand with a dynamic register class can't be used during code 
emission and should be replaced with something different before register 
allocation:

const RegMask *vecOper::in_RegMask(int index) const {
   return &RegMask::Empty;
}

Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
Reviewed-by: vlivanov, sviswanathan, …

Testing: tier1-4 (both with and without generic vector operands)

Best regards,
Vladimir Ivanov

From erik.osterlund at oracle.com  Tue Nov 19 14:20:13 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Tue, 19 Nov 2019 15:20:13 +0100
Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
Message-ID: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>

Hi,

Intel has published an erratum (SKX102) describing "unexpected system 
behaviour" that can occur when branches (including fused conditional 
branches) cross or end at 64 byte boundaries. They are mitigating this by 
rolling out microcode updates that disable micro-op caching for 
conditional branches that cross or end at 32 byte boundaries. The 
mitigation can cause performance regressions unless affected branches 
are aligned properly.
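The "cross or end at a 32 byte boundary" condition is simple offset arithmetic. The sketch below assumes byte offsets into a code buffer and pads by moving the branch to the next boundary, which is just one simple strategy. JccBoundary and its methods are hypothetical and are not HotSpot's assembler API or the code in the webrev.

```java
/** Hypothetical helper for the JCC erratum: boundary checks on code offsets. */
final class JccBoundary {
    /**
     * True if an instruction occupying bytes [start, start + length)
     * crosses an aligned 'boundary' or ends exactly at one.
     */
    static boolean crossesOrEndsAt(long start, int length, int boundary) {
        long end = start + length; // exclusive end offset
        boolean crosses = (start / boundary) != ((end - 1) / boundary);
        boolean endsAt = end % boundary == 0; // last byte is the last byte of a block
        return crosses || endsAt;
    }

    /**
     * Nop bytes to emit before the instruction so that it starts at the next
     * boundary. Assumes 0 < length < boundary, which holds for a jcc or a
     * macro-fused cmp+jcc pair (at most 2 * 15 bytes) and a 32 byte boundary.
     */
    static int paddingNeeded(long start, int length, int boundary) {
        if (!crossesOrEndsAt(start, length, boundary)) {
            return 0;
        }
        return (int) (boundary - start % boundary);
    }
}
```

For example, a 4-byte branch at offset 30 crosses the boundary at 32 and needs 2 bytes of padding; the same branch at offset 28 ends exactly at 32 and needs 4.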

The erratum and its mitigation are described in more detail in this 
document published by Intel:
https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

My intention for this patch is to introduce the infrastructure to 
determine whether we may have an affected CPU, and to mitigate this by 
aligning the most important branch in the whole JVM: the ZGC load barrier 
fast path check. Perhaps a similar methodology can be reused later to 
solve this for other performance-critical code, but that is outside the 
scope of this CR.

The sprinkling of nops does not seem to cause regressions in the 
workloads I have tried on a machine without the JCC mitigations.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8234160

Webrev:
http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/

Thanks,
/Erik

From vladimir.x.ivanov at oracle.com  Tue Nov 19 14:30:37 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 17:30:37 +0300
Subject: [14] RFR (L): 8234391: C2: Generic vector operands
Message-ID: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>

http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8234391

Introduce generic vector operands and migrate existing usages from fixed 
sized operands (vec[SDXYZ]) to generic ones.

(It's an updated version of generic vector support posted for review in 
August, 2019 [1] [2]. AD instruction merges will be handled separately.)

On a high-level it is organized as follows:

   (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;

   (2) at runtime, right after matching is over, a special pass is 
performed which:

       * replaces vecOper with vec[SDXYZ] depending on mach node type
          - vector mach nodes capture bottom_type() of their ideal prototype;

       * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec)
          - the matcher needs them, but they are useless for the register 
allocator (moreover, they may cause additional spills);


    (3) after post-selection pass is over, all mach nodes should have 
fixed-size vector operands.


Some details:

    (1) vec and legVec are marked as "dynamic" operands, so 
post-selection rewriting works


    (2) new logic is guarded by new matcher flag 
(Matcher::supports_generic_vector_operands) which is enabled only on x86


    (3) post-selection analysis is implemented as a single pass over the 
graph which processes individual nodes using their own bottom_type() (for 
DEF operands) or their inputs' bottom_type() (for USE operands), which is 
an instance of TypeVect


    (4) most of the analysis is cross-platform and interfaces with 
platform-specific code through 3 methods:

      static bool is_generic_reg2reg_move(MachNode* m);
      // distinguishes MoveVec2Leg/MoveLeg2Vec nodes

      static bool is_generic_vector(MachOper* opnd);
      // distinguishes vec/legVec operands

      static MachOper* clone_generic_vector_operand(MachOper* 
generic_opnd, uint ideal_reg);
      // constructs fixed-sized vector operand based on ideal reg
      //   vec    + Op_Vec[SDXYZ] =>    vec[SDXYZ]
      //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]


    (5) TEMP operands are handled specially:
      - TEMP uses max_vector_size() to determine what fixed-sized 
operand to use
          * it is needed to cover reductions which don't produce vectors 
but scalars
      - TEMP_DEF inherits fixed-sized operand type from DEF;


    (6) there is a limited number of special cases for mach nodes in 
Matcher::get_vector_operand_helper:

        - LShiftCntV/RShiftCntV: though they report a wide vector type as 
Node::bottom_type(), their ideal_reg is VecS! But for vector nodes only 
Node::bottom_type() is captured during matching, not ideal_reg().

        - vshiftcntimm: chain instructions which convert a scalar to a 
vector don't have a vector type.


    (7) idealreg2regmask initialization logic is adjusted to handle 
generic vector operands (see Matcher::get_vector_regmask)


    (8) operand renaming in x86_32.ad & x86_64.ad to avoid name 
conflicts with new vec/legVec operands


    (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with 
regD/legRegD
       - it aligns the code between x86_64.ad and x86_32.ad
       - strictly speaking, it's illegal to use vector operands on a 
non-vector node (e.g., string_inflate) unless its usage is guarded by C2 
vector support checks (-XX:MaxVectorSize=0)
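The renaming rule in point (4) (vec + Op_Vec[SDXYZ] => vec[SDXYZ], legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]) can be modelled as a toy in Java rather than HotSpot C++. Strings stand in for MachOper classes, the byte widths are the usual x86 ones (VecS = 32 bits up to VecZ = 512 bits), and a width-to-register lookup is included for cases like the TEMP handling in point (5), assuming a width in bytes is available. None of these names are HotSpot APIs.

```java
/** Hypothetical model of the post-selection rename: generic -> fixed-size operand. */
final class GenericOperandRewrite {
    /** Ideal vector register kinds, named as in the mail (Op_Vec[SDXYZ]). */
    enum IdealReg {
        VecS(4), VecD(8), VecX(16), VecY(32), VecZ(64);

        final int bytes;

        IdealReg(int bytes) {
            this.bytes = bytes;
        }
    }

    /** Picks the ideal register kind for a vector of the given width in bytes. */
    static IdealReg idealRegForBytes(int bytes) {
        for (IdealReg r : IdealReg.values()) {
            if (r.bytes == bytes) {
                return r;
            }
        }
        throw new IllegalArgumentException("unsupported vector size: " + bytes);
    }

    /** vec + VecX => vecX, legVec + VecZ => legVecZ, and so on. */
    static String concreteOperand(String generic, IdealReg reg) {
        if (!generic.equals("vec") && !generic.equals("legVec")) {
            throw new IllegalArgumentException("not a generic vector operand: " + generic);
        }
        // "VecX".substring(3) is "X"
        return generic + reg.name().substring(3);
    }
}
```

Keeping the rule this mechanical is what allows most of the pass to stay cross-platform, with only the three hooks listed in point (4) being x86-specific.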


Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
Reviewed-by: vlivanov, sviswanathan, …

Testing: tier1-tier4, jtreg compiler tests on KNL and SKL,
          performance testing (SPEC* + Octane + micros / G1 + ParGC).

Best regards,
Vladimir Ivanov

[1] 
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html

[2] 
http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf

From lutz.schmidt at sap.com  Tue Nov 19 14:35:45 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 19 Nov 2019 14:35:45 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <D146649B-38B0-448A-9322-69F41BE227A3@sap.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
 <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
 <bfc79f43-c00d-3f0c-8025-dbe19bc582e1@redhat.com>
 <D146649B-38B0-448A-9322-69F41BE227A3@sap.com>
Message-ID: <A5E6BDFC-64BF-44FB-A653-050E9947A2D6@sap.com>

Hi Andrew, 

finally(!) I was able to create some measurements which show kind of an effect on a real-world problem. 

I added my timers when running the renaissance benchmark (https://renaissance.dev). I am well aware of the limitations. One could argue this benchmark does not solve a real-world problem. Furthermore, the optimizations do not have a visible effect on the overall runtime (> 1 hour) of the test. But at least, deep down, the inner mechanics of CodeHeap management show some timing difference. I have attached a file with some measurement data to this mail for convenience. The same file was also uploaded to the bug. The measurements are from runs on linuxppc64. Other platforms show similar results. 

Here is what you can see (and my interpretation of it):

CodeHeap::mark_segmap_as_used()
===============================
The number of segment map entries to be processed per call is reduced by a factor of 2.5 to 5. As a consequence, the time spent in the method decreases as well, but not by the same factor. This is due to the added check for fragmentation and the defragmentation itself, which occurs twice and eliminates roughly 3,500 excessive fragments. 

CodeHeap::add_to_freelist()
===========================
Here, the free list length controls the effort spent. Depending on the platform, the length increases by a factor of 2 (with optimizations turned on) or decreases by the same factor. Even with increased free list length, the total time spent in the method decreases. That's obviously an effect of not having to search the free list from the beginning every time.
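One common way to avoid searching an address-ordered free list from the beginning on every insertion is to remember where the previous insertion happened. The Java sketch below shows that idea under stated assumptions: it is not necessarily what the webrev does, SortedFreeList is a hypothetical name, and blocks are never removed or merged here. A real free list must invalidate or move the hint when the hinted block goes away.

```java
/** Hypothetical address-ordered free list that remembers its last insertion point. */
final class SortedFreeList {
    private static final class Block {
        final long addr;
        Block next;

        Block(long addr) {
            this.addr = addr;
        }
    }

    private Block head;
    private Block lastInsert; // hint: the most recently inserted block
    long nodesVisited;        // list nodes stepped over, to show the effect

    void add(long addr) {
        Block b = new Block(addr);
        Block prev = null;
        Block cur = head;
        // The list is sorted, so if the new block lies above the hint the
        // walk may start there instead of at the head.
        if (lastInsert != null && lastInsert.addr < addr) {
            prev = lastInsert;
            cur = lastInsert.next;
        }
        while (cur != null && cur.addr < addr) {
            prev = cur;
            cur = cur.next;
            nodesVisited++;
        }
        b.next = cur;
        if (prev == null) {
            head = b;
        } else {
            prev.next = b;
        }
        lastInsert = b;
    }

    /** Addresses in list order (ascending by construction). */
    long[] addresses() {
        int n = 0;
        for (Block c = head; c != null; c = c.next) {
            n++;
        }
        long[] out = new long[n];
        int i = 0;
        for (Block c = head; c != null; c = c.next) {
            out[i++] = c.addr;
        }
        return out;
    }
}
```

With ascending insertions (a common pattern when freeing neighbouring blocks), each add visits no nodes at all, whereas a walk from the head would make n insertions cost O(n^2).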


I have created a new webrev, mainly to reflect the changes I applied, based on Thomas' comments:
http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/ 

jdk/submit tests pending...

Please let me know if we have reached a state now where this change can be considered reviewed. 

Thanks a lot,
Lutz


On 07.11.19, 22:33, "Schmidt, Lutz" <lutz.schmidt at sap.com> wrote:

    Hi Andrew,
    
    thanks for giving this matter more thought, and for updating
    your opinion.
    
    Instrumenting and measuring other tests will take longer than expected; the work got delayed by JDK-8233787. The fix for that bug will make my timing code run more smoothly.
    
    Side note: the timing code I have mentioned several times now is nothing secret. It's just not suitable for contribution, among other reasons because it is only available for ppc and s390. I can give you more information if you are interested; no problem if you say "ahhh, never mind...".
    
    Thanks,
    Lutz
    
    On 07.11.19, 17:34, "Andrew Dinn" <adinn at redhat.com> wrote:
    
        On 04/11/2019 15:35, Schmidt, Lutz wrote:
        > thank you for your thoughts. I do not agree with your conclusion,
        > though.
        > 
        > There are two bottlenecks in the CodeHeap management code. One is in
        > CodeHeap::mark_segmap_as_used(), uncovered by
        > OverflowCodeCacheTest.java.  The other is in 
        > CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java.
        > 
        > Both bottlenecks are tackled by the recommended changeset.
          . . .
        > CodeHeap::add_to_freelist() is still O(n*n), with n being the free 
        > list length. But the kick-in point of the non-linearity could be 
        > significantly shifted towards larger n. The time reduction from 
        > approx. 8 seconds to 160 milliseconds supports this statement.
        
        Ah sorry, it was not clear to me from your original post that the
        proposed change had significantly reduced the time spent in free list
        management in the second test by cutting down the free list size. As
        you say, reducing the list size by a factor of K reduces execution
        time by a factor of K*K. Since this test is a lot nearer to reality
        than the overflow test, I think the current result is perhaps enough
        to justify its value.
        
        > I agree it would be helpful to have a "real-world" example showing 
        > some improvement. Providing such evidence is hard, though. I could 
        > instrument the code and print some values from time to time. It's
        > certain this additional output will mess up success/failure decisions
        > in our test environment. Not sure everybody likes that. But I will
        > give it a try and take the hits. This will be a multi-day effort.
        
        Well, that would be nice to have but not if it stops other work. The one
        thing about the Stress test that I fear may be 'unreal' is the
        potentially over-high probability of generating long(ish) runs of
        adjacent free segments. That might be giving an artificial win that we
        will not in fact see. However, given the current numbers I'd be happy to
        risk that and let this patch go in as is.
        
        > On a general note, I am always uncomfortable knowing of an O(n*n)
        > effort, in particular when it could be removed or at least tamed
        > considerably. Experience tells me that, at some point
        > in time, n will be large enough to hurt.
        
        Well, yes, although salesmen do travel /and/ make money ... ;-)
        
        > I'll be back.
        
        Sure, thanks for following up. This is all very interesting.
        
        regards,
        
        
        Andrew Dinn
        -----------
        Senior Principal Software Engineer
        Red Hat UK Ltd
        Registered in England and Wales under Company Registration No. 03798903
        Directors: Michael Cunningham, Michael ("Mike") O'Neill
        
        
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Renaissance_CodeHeap_timing.txt
URL: <https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20191119/1fdd6ed6/Renaissance_CodeHeap_timing.txt>
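Andrew's scaling remark in the quoted exchange above (a 1/K reduction in list size giving a 1/(K*K) reduction in time) can be spelled out as follows. This assumes the add_to_freelist cost is purely quadratic in the free list length n, which is a simplification, since Lutz also describes the change as shifting where the non-linearity kicks in. Under that assumption only, the reported drop from about 8 seconds to 160 milliseconds would correspond to a list roughly 7 times shorter; the actual list lengths are not given here.

```latex
T(n) \approx c\,n^{2}
\quad\Longrightarrow\quad
\frac{T(n)}{T(n/K)} = \frac{c\,n^{2}}{c\,(n/K)^{2}} = K^{2},
\qquad
\frac{8\ \mathrm{s}}{160\ \mathrm{ms}} = 50
\;\Rightarrow\;
K \approx \sqrt{50} \approx 7.1
```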

From adinn at redhat.com  Tue Nov 19 15:33:36 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 19 Nov 2019 15:33:36 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <A5E6BDFC-64BF-44FB-A653-050E9947A2D6@sap.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
 <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
 <bfc79f43-c00d-3f0c-8025-dbe19bc582e1@redhat.com>
 <D146649B-38B0-448A-9322-69F41BE227A3@sap.com>
 <A5E6BDFC-64BF-44FB-A653-050E9947A2D6@sap.com>
Message-ID: <873965f0-61cc-4aac-fac1-a6cba442971f@redhat.com>

Hi Lutz,

Thanks for persevering and obtaining these measurements. The benchmark
may not necessarily be a great indicator of real-world problems but it
is a better indicator of code cache performance than any other tests we
have seen so far. I'm happy to accept this as evidence that the patch
will improve performance. So, yes, reviewed!

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill



From vladimir.x.ivanov at oracle.com  Tue Nov 19 16:53:09 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 19:53:09 +0300
Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen state
Message-ID: <b9131f9f-5c87-76ab-5151-77ae73c3ef6e@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8234401/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8234401

ConstantCallSite has a ctor which deliberately leaks a partially
initialized instance into user code. isFrozen is declared final, and if
user code is obstinate enough, the non-frozen state can end up embedded
into the generated code. This manifests as a ConstantCallSite instance
that is stuck in the non-frozen state.
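The leak can be reproduced with the public API alone: ConstantCallSite has a protected constructor taking a `createTargetHook`, which is invoked with the call site itself before construction has finished. In the hypothetical subclass below (`LeakyCallSite`, `hook`, and `demo` are invented names), the hook calls `getTarget()`, which ConstantCallSite rejects with `IllegalStateException` while the site is not yet frozen. This only shows the observable partial state; it does not reproduce the JIT constant-folding problem, which depends on compiled code.

```java
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class: its createTargetHook receives the call site
// while the ConstantCallSite constructor is still running.
class LeakyCallSite extends ConstantCallSite {
    static boolean hookSawUnfrozen;

    LeakyCallSite(MethodType type, MethodHandle createTargetHook) throws Throwable {
        super(type, createTargetHook); // "this" is handed to the hook in here
    }

    // Invoked by the CallSite constructor with the partially constructed site.
    static MethodHandle hook(ConstantCallSite site) {
        try {
            site.getTarget(); // rejected: the site is not frozen yet
        } catch (IllegalStateException e) {
            hookSawUnfrozen = true;
        }
        return MethodHandles.constant(String.class, "bound"); // matches type ()String
    }

    // Builds a site and invokes it; Throwable is wrapped to keep callers simple.
    static String demo() {
        try {
            MethodHandle hook = MethodHandles.lookup().findStatic(LeakyCallSite.class, "hook",
                    MethodType.methodType(MethodHandle.class, ConstantCallSite.class));
            LeakyCallSite site = new LeakyCallSite(MethodType.methodType(String.class), hook);
            return (String) site.dynamicInvoker().invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```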

I switched isFrozen from final to @Stable, so the non-frozen state is
never constant folded, and put some store-store barriers along the way
to restore final field handling.
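The pattern of replacing a final field with a plain one plus explicit fences can be sketched with public APIs. The change itself uses the JDK-internal @Stable annotation and Unsafe.storeStoreFence(); this sketch substitutes the public `VarHandle.storeStoreFence()` (Java 9+) and drops @Stable, and `HookedBase` and `FlagSite` are hypothetical names. As the RFR itself notes, the fences only restore final-field-like ordering; they do not make a racy publication safe.

```java
import java.lang.invoke.VarHandle;
import java.util.function.Function;

// Hypothetical base class: like CallSite(MethodType, MethodHandle), its
// constructor hands "this" to a hook that computes the target.
class HookedBase {
    Object target;

    HookedBase(Function<HookedBase, Object> hook) {
        target = hook.apply(this);     // "this" escapes here
        VarHandle.storeStoreFence();   // keep the target store ahead of the flag store
    }
}

// Hypothetical subclass: the flag is not final, mirroring the patch
// (which additionally marks it @Stable, a JDK-internal annotation).
class FlagSite extends HookedBase {
    boolean isFrozen;

    FlagSite(Function<HookedBase, Object> hook) {
        super(hook);
        isFrozen = true;
        VarHandle.storeStoreFence();   // emulate the "freeze" a final field gets at ctor end
    }
}
```

The hook necessarily observes `isFrozen == false`; the point of the change is only that this transient value can no longer be folded into compiled code.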

I deliberately stopped there (just restoring the isFrozen final field
behavior). Without proper synchronization, there's still a theoretical
possibility of transiently observing a call site in the non-frozen state
right after the ctor is over. But at least there is no longer any way to
accidentally break an instance so that it becomes permanently
unusable.

PS: converted CallSite.target to final along the way and made some other 
minor refactorings.

Testing: regression test, tier1-2

Best regards,
Vladimir Ivanov

From thomas.stuefe at gmail.com  Tue Nov 19 16:57:52 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 19 Nov 2019 17:57:52 +0100
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <A5E6BDFC-64BF-44FB-A653-050E9947A2D6@sap.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
 <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
 <bfc79f43-c00d-3f0c-8025-dbe19bc582e1@redhat.com>
 <D146649B-38B0-448A-9322-69F41BE227A3@sap.com>
 <A5E6BDFC-64BF-44FB-A653-050E9947A2D6@sap.com>
Message-ID: <CAA-vtUwu2DnBQMUTeqDrk=kZ=azOxxfBUq0sG5jrO7SZfcfmKg@mail.gmail.com>

Looks good, Lutz.

..Thomas


From vladimir.x.ivanov at oracle.com  Tue Nov 19 17:18:32 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 20:18:32 +0300
Subject: [14] RFR (XS): 8234403: C2: Enable CallSite.target updates in
 constructors
Message-ID: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8234403/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8234403

Direct CallSite.target updates are not supported in C2 because, in
general, the JVM has to invalidate all dependent nmethods before
performing the update (handled by
MethodHandleNatives.setCallSiteTargetNormal/Volatile).
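From user code, a post-publication target update goes through `setTarget` on a MutableCallSite (or VolatileCallSite); according to this thread, such updates are routed through MethodHandleNatives so that dependent compiled code can be invalidated. The minimal Java example below only exercises that public path; `TargetSwap` is a hypothetical name.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo: retarget a published call site through the public API.
final class TargetSwap {
    static String[] run() {
        try {
            MutableCallSite site = new MutableCallSite(MethodHandles.constant(String.class, "old"));
            MethodHandle invoker = site.dynamicInvoker();      // always follows the current target
            String before = (String) invoker.invokeExact();
            site.setTarget(MethodHandles.constant(String.class, "new"));
            String after = (String) invoker.invokeExact();
            return new String[] { before, after };
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

An invoker obtained before the update sees the new target afterwards, which is exactly why code compiled against the old target must be invalidated.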

That is not the case, however, for the initializing stores performed
during CallSite instance construction.

The proposed fix assumes all raw updates happen on not-yet-published
instances (so there are no nmethod dependencies yet) and treats
CallSite.target updates inside ctors as ordinary field updates.

Considering the changes proposed for 8234401 [1], all direct updates in
CallSite ctors are safe to be treated as ordinary field updates.

Testing: tier1-4

Best regards,
Vladimir Ivanov

[1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html

From paul.sandoz at oracle.com  Tue Nov 19 17:35:44 2019
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 19 Nov 2019 09:35:44 -0800
Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen
 state
In-Reply-To: <b9131f9f-5c87-76ab-5151-77ae73c3ef6e@oracle.com>
References: <b9131f9f-5c87-76ab-5151-77ae73c3ef6e@oracle.com>
Message-ID: <D19C5DB0-8463-4FB4-84CC-3C30E3276D4E@oracle.com>

Ah the perils of partial construction :-)

This is subtle, so I could be misunderstanding something: did you intend to remove the assignment of isFrozen in the ConstantCallSite constructor?

ConstantCallSite:

     protected ConstantCallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable {

-        super(targetType, createTargetHook);
-        isFrozen = true;
+        super(targetType, createTargetHook); // "this" instance leaks into createTargetHook
+        UNSAFE.storeStoreFence(); // properly publish isFrozen

     }

Paul.



From vladimir.x.ivanov at oracle.com  Tue Nov 19 17:49:35 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 20:49:35 +0300
Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen
 state
In-Reply-To: <D19C5DB0-8463-4FB4-84CC-3C30E3276D4E@oracle.com>
References: <b9131f9f-5c87-76ab-5151-77ae73c3ef6e@oracle.com>
 <D19C5DB0-8463-4FB4-84CC-3C30E3276D4E@oracle.com>
Message-ID: <e524d9ae-3f83-6dae-e047-d92df5fcb701@oracle.com>


> Subtle, so I could be misunderstanding something, did you intend to remove the assignment of isFrozen in the ConstantCallSite constructor?

Oh, good catch. It is my fault: the update should be there. (The 
barriers are added just to preserve final field semantics for isFrozen.)

I published the wrong version (with some leftovers from a failed
last-minute experiment). Updated in place.

Best regards,
Vladimir Ivanov


From paul.sandoz at oracle.com  Tue Nov 19 18:03:09 2019
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 19 Nov 2019 10:03:09 -0800
Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen
 state
In-Reply-To: <e524d9ae-3f83-6dae-e047-d92df5fcb701@oracle.com>
References: <b9131f9f-5c87-76ab-5151-77ae73c3ef6e@oracle.com>
 <D19C5DB0-8463-4FB4-84CC-3C30E3276D4E@oracle.com>
 <e524d9ae-3f83-6dae-e047-d92df5fcb701@oracle.com>
Message-ID: <E3EC1C0B-C8BB-489E-8C51-6F433D9A84AA@oracle.com>

Much better :-)

I accumulated some more questions while I was looking further.

CallSite:
 public class CallSite {
 
-    // The actual payload of this call site:
+    // The actual payload of this call site.
+    // Can be modified using {@link MethodHandleNatives#setCallSiteTargetNormal} or {@link MethodHandleNatives#setCallSiteTargetVolatile}.
     /*package-private*/
-    MethodHandle target;    // Note: This field is known to the JVM.  Do not change.
+    final MethodHandle target;  // Note: This field is known to the JVM.

Is there any benefit to making target final, even though it's not really final for mutable call sites? (With the recent discussion of "final means final", it would be nice not to introduce more special cases of stomping on final fields if we can avoid it.)


     CallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable {
-        this(targetType);
+        this(targetType); // need to initialize target to make CallSite.type() work in createTargetHook
         ConstantCallSite selfCCS = (ConstantCallSite) this;
         MethodHandle boundTarget = (MethodHandle) createTargetHook.invokeWithArguments(selfCCS);
-        checkTargetChange(this.target, boundTarget);
-        this.target = boundTarget;
+        setTargetNormal(boundTarget); // ConstantCallSite doesn't publish CallSite.target
+        UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates
     }

I wonder if instead of introducing the store store fence here we could move it into ConstantCallSite?

     protected ConstantCallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable {
-        super(targetType, createTargetHook);
+        super(targetType, createTargetHook); // "this" instance leaks into createTargetHook
+        UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates <-- Add here
         isFrozen = true;
+        UNSAFE.storeStoreFence(); // properly publish isFrozen
     }

Paul.

> On Nov 19, 2019, at 9:49 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> 
>> Subtle, so I could be misunderstanding something, did you intend to remove the assignment of isFrozen in the ConstantCallSite constructor?
> 
> Oh, good catch. It is my fault: the update should be there. (The barriers are added just to preserve final field semantics for isFrozen.)
> 
> Published the wrong version (with some leftovers from last-minute failed experiment). Updated in place.
> 
> Best regards,
> Vladimir Ivanov


From vladimir.x.ivanov at oracle.com  Tue Nov 19 18:12:37 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 21:12:37 +0300
Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen
 state
In-Reply-To: <E3EC1C0B-C8BB-489E-8C51-6F433D9A84AA@oracle.com>
References: <b9131f9f-5c87-76ab-5151-77ae73c3ef6e@oracle.com>
 <D19C5DB0-8463-4FB4-84CC-3C30E3276D4E@oracle.com>
 <e524d9ae-3f83-6dae-e047-d92df5fcb701@oracle.com>
 <E3EC1C0B-C8BB-489E-8C51-6F433D9A84AA@oracle.com>
Message-ID: <c6fa9f7b-72b5-dc93-55c8-4c21f59c83b3@oracle.com>

Thanks, Paul.

> CallSite:
> 
>  public class CallSite {
>  
> -    // The actual payload of this call site:
> +    // The actual payload of this call site.
> +    // Can be modified using {@link MethodHandleNatives#setCallSiteTargetNormal} or {@link MethodHandleNatives#setCallSiteTargetVolatile}.
>      /*package-private*/
> -    MethodHandle target;    // Note: This field is known to the JVM.  Do not change.
> +    final MethodHandle target;  // Note: This field is known to the JVM.
> 
> 
> Is there any benefit to making target final, even though it's not really 
> for mutable call sites? (With the recent discussion of "final means 
> final" it would be nice to not introduce more special case stomping on 
> final fields if we can avoid it).

CallSite.target is already treated specially: all updates go through
MethodHandleNatives, and the JIT compilers treat it as "final"
irrespective of the flags.

My main interest in marking it final is to enforce proper initialization
on the JDK side, to make it easier to reason about:

https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036021.html


>      CallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable {
> -        this(targetType);
> +        this(targetType); // need to initialize target to make CallSite.type() work in createTargetHook
>          ConstantCallSite selfCCS = (ConstantCallSite) this;
>          MethodHandle boundTarget = (MethodHandle) createTargetHook.invokeWithArguments(selfCCS);
> -        checkTargetChange(this.target, boundTarget);
> -        this.target = boundTarget;
> +        setTargetNormal(boundTarget); // ConstantCallSite doesn't publish CallSite.target
> +        UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates
>      }
> 
> 
> I wonder if instead of introducing the store store fence here we could 
> move it into ConstantCallSite?

Sure, if you prefer to see it on the ConstantCallSite side, we can move
it there.

By putting it in CallSite near the target update, I wanted to stress
that a CallSite.target update is happening on a partially published
instance.

Best regards,
Vladimir Ivanov

> 
>      protected ConstantCallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable {
> -        super(targetType, createTargetHook);
> +        super(targetType, createTargetHook); // "this" instance leaks into createTargetHook
> +        UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates <-- Add here
>          isFrozen = true;
> +        UNSAFE.storeStoreFence(); // properly publish isFrozen
>      }

From lutz.schmidt at sap.com  Tue Nov 19 18:14:17 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 19 Nov 2019 18:14:17 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free
 blocks
In-Reply-To: <873965f0-61cc-4aac-fac1-a6cba442971f@redhat.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com>
 <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com>
 <B89E11F0-98EE-49F1-87EB-FC22CB7D54A2@sap.com>
 <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com>
 <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com>
 <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com>
 <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com>
 <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com>
 <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
 <dd34fa81-b027-4353-df0f-daedf9911b6d@redhat.com>
 <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com>
 <bfc79f43-c00d-3f0c-8025-dbe19bc582e1@redhat.com>
 <D146649B-38B0-448A-9322-69F41BE227A3@sap.com>
 <A5E6BDFC-64BF-44FB-A653-050E9947A2D6@sap.com>
 <873965f0-61cc-4aac-fac1-a6cba442971f@redhat.com>
Message-ID: <6E415446-6DCC-4B98-8341-C335457C4753@sap.com>

Thank you, Andrew and Thomas,
for the reviews. Meanwhile, jdk/submit tests returned OK.
So I will proceed and push the change, probably on Wednesday. 
Best Regards,
Lutz

On 19.11.19, 16:33, "Andrew Dinn" <adinn at redhat.com> wrote:

    Hi Lutz,
    
    Thanks for persevering and obtaining these measurements. The benchmark
    may not necessarily be a great indicator of real-world problems but it
    is a better indicator of code cache performance than any other tests we
    have seen so far. I'm happy to accept this as evidence that the patch
    will improve performance. So, yes, reviewed!
    
    regards,
    
    
    Andrew Dinn
    -----------
    Senior Principal Software Engineer
    Red Hat UK Ltd
    Registered in England and Wales under Company Registration No. 03798903
    Directors: Michael Cunningham, Michael ("Mike") O'Neill
    
    On 19/11/2019 14:35, Schmidt, Lutz wrote:
    > finally(!) I was able to create some measurements which show at least
    > some effect on a real-world problem.
    > 
    > I added my timers when running the renaissance benchmark
    > (https://renaissance.dev). I am well aware of the limitations. One
    > could argue this benchmark does not solve a real-world problem.
    > Furthermore, the optimizations do not have a visible effect on the
    > overall runtime (> 1 hour) of the test. But at least, deep down, the
    > inner mechanics of CodeHeap management show some timing difference. I
    > have attached a file with some measurement data to this mail for
    > convenience. The same file was also uploaded to the bug. The
    > measurements are from runs on linuxppc64. Other platforms show
    > similar results.
    > 
    > Here is what you can see (and my interpretation of it):
    > 
    > CodeHeap::mark_segmap_as_used()
    > ===============================
    > The number of segment map entries to be processed per call is reduced
    > by a factor of 2.5 to 5. As a consequence, the time spent in the
    > method decreases as well, but not by the same factor. This is due to
    > the added check for fragmentation and the defragmentation itself
    > which occurs twice and eliminates roughly 3,500 excessive fragments.
    > 
    > CodeHeap::add_to_freelist()
    > ===========================
    > Here, the free list length controls the effort spent. Depending on
    > the platform, the length increases by a factor of 2 (with
    > optimizations turned on) or decreases by the same factor. Even with
    > increased free list length, the total time spent in the method
    > decreases. That's obviously an effect of not having to search the
    > free list from the beginning every time.
    > 
    > 
    > I have created a new webrev, mainly to reflect the changes I applied,
    > based on Thomas' comments: 
    > http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/
    > 
    > jdk/submit tests pending...
    > 
    > Please let me know if we have reached a state now where this change
    > can be considered reviewed.
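The add_to_freelist() observation above (total time drops even with a longer free list, because the list no longer has to be searched from its head every time) can be illustrated with a generic address-ordered free list that remembers its last insertion point. This is a minimal Java sketch under that assumption; the class and field names are hypothetical, and it is not the CodeHeap code from the webrev.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HotSpot's CodeHeap: an address-ordered singly
// linked free list that resumes its search from the last inserted block
// when that block lies before the new one.
class HintedFreeList {
    static final class Block {
        final long addr;
        Block next;
        Block(long addr) { this.addr = addr; }
    }

    private Block head;
    private Block hint;   // last inserted block; stays valid because this sketch never removes blocks
    int stepsTaken;       // list nodes traversed, to make the saved work visible

    void add(long addr) {
        Block b = new Block(addr);
        if (head == null || addr < head.addr) {   // new lowest address: becomes the head
            b.next = head;
            head = b;
            hint = b;
            return;
        }
        // Resume from the hint if it precedes addr, otherwise fall back to the head.
        Block cur = (hint.addr < addr) ? hint : head;
        while (cur.next != null && cur.next.addr < addr) {
            cur = cur.next;
            stepsTaken++;
        }
        b.next = cur.next;
        cur.next = b;
        hint = b;
    }

    List<Long> addresses() {
        List<Long> out = new ArrayList<>();
        for (Block b = head; b != null; b = b.next) {
            out.add(b.addr);
        }
        return out;
    }
}
```

Freeing blocks in ascending address order then costs no traversal at all; only out-of-order frees fall back to a scan from the head. A real free list also removes and coalesces blocks, so the hint would have to be reset or adjusted whenever the block it points to goes away.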
    
    
From paul.sandoz at oracle.com  Tue Nov 19 18:25:11 2019
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 19 Nov 2019 10:25:11 -0800
Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen
 state
In-Reply-To: <c6fa9f7b-72b5-dc93-55c8-4c21f59c83b3@oracle.com>
References: <b9131f9f-5c87-76ab-5151-77ae73c3ef6e@oracle.com>
 <D19C5DB0-8463-4FB4-84CC-3C30E3276D4E@oracle.com>
 <e524d9ae-3f83-6dae-e047-d92df5fcb701@oracle.com>
 <E3EC1C0B-C8BB-489E-8C51-6F433D9A84AA@oracle.com>
 <c6fa9f7b-72b5-dc93-55c8-4c21f59c83b3@oracle.com>
Message-ID: <FBF865B4-B903-4A6A-9C8A-72DC05DD0963@oracle.com>

+1

> On Nov 19, 2019, at 10:12 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Thanks, Paul.
> 
>> CallSite:
>>  public class CallSite {
>>  - // The actual payload of this call site:
>> + // The actual payload of this call site.
>> + // Can be modified using {@link MethodHandleNatives#setCallSiteTargetNormal} or {@link MethodHandleNatives#setCallSiteTargetVolatile}.
>>      /*package-private*/
>> - MethodHandle target; // Note: This field is known to the JVM. Do not change.
>> + final MethodHandle target; // Note: This field is known to the JVM.
>> Is there any benefit to making target final, even though it's not really for mutable call sites? (With the recent discussion of "final means final" it would be nice to not introduce more special case stomping on final fields if we can avoid it).
> 
> CallSite.target is already treated specially: all updates go through MethodHandleNatives and JIT compilers treat it as "final" irrespective of the flags.
> 
> My main interest in marking it final is to enforce proper initialization on the JDK side to make it easier to reason about:
> 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036021.html 
> 

Ok, I see now in light of that context.


>>      CallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable {
>> - this(targetType);
>> + this(targetType); // need to initialize target to make CallSite.type() work in createTargetHook
>>          ConstantCallSite selfCCS = (ConstantCallSite) this;
>>          MethodHandle boundTarget = (MethodHandle) createTargetHook.invokeWithArguments(selfCCS);
>> - checkTargetChange(this.target, boundTarget);
>> - this.target = boundTarget;
>> + setTargetNormal(boundTarget); // ConstantCallSite doesn't publish CallSite.target
>> + UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates
>>      }
>> I wonder if instead of introducing the store store fence here we could move it into ConstantCallSite?
> 
> Sure, if you prefer to see it on the ConstantCallSite side, we can move it there.
> 
> By putting it in CallSite near the call site update, I wanted to stress there's a CallSite.target update happening on a partially published instance.
> 

Up to you.

Paul.
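The ordering discussed in this thread (store the target, then a store-store barrier, then mark the site frozen) can be sketched in plain Java with VarHandle.storeStoreFence(), available since Java 9. This is a hypothetical model, not the JDK's CallSite/ConstantCallSite code: the class and field names are illustrative, and since @Stable is JDK-internal the sketch simply uses an ordinary non-final field.

```java
import java.lang.invoke.VarHandle;
import java.util.function.Function;

// Hypothetical model of a call site whose constructor leaks 'this' to user
// code before it is fully initialized; names are illustrative, not the JDK's.
class FrozenSite {
    private Object target;      // plays the role of CallSite.target
    private boolean isFrozen;   // non-final here; the JDK change makes isFrozen @Stable instead of final

    FrozenSite(Function<FrozenSite, Object> createTargetHook) {
        // 'this' escapes here: the hook runs while isFrozen is still false.
        Object bound = createTargetHook.apply(this);
        target = bound;
        VarHandle.storeStoreFence();  // keep the target store ordered before the isFrozen store
        isFrozen = true;
    }

    boolean isFrozen() { return isFrozen; }

    Object getTarget() {
        if (!isFrozen) {
            throw new IllegalStateException("call site not frozen yet");
        }
        return target;
    }
}
```

As the original RFR notes, without further synchronization another thread may still transiently observe the non-frozen state; the fence only orders the two stores made by the constructing thread.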

From paul.sandoz at oracle.com  Tue Nov 19 18:26:36 2019
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 19 Nov 2019 10:26:36 -0800
Subject: [14] RFR (XS): 8234403: C2: Enable CallSite.target updates in
 constructors
In-Reply-To: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com>
References: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com>
Message-ID: <D1106EF1-FED5-4695-806D-A4413F7C78ED@oracle.com>

+1

Paul.

> On Nov 19, 2019, at 9:18 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~vlivanov/8234403/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8234403
> 
> Direct CallSite.target updates are not supported in C2, because in general the JVM has to invalidate all dependent nmethods before performing the update (handled by MethodHandleNatives.setCallSiteTargetNormal/Volatile).
> 
> But that is not necessary for initializing stores during CallSite instance construction.
> 
> The proposed fix assumes all raw updates happen on not-yet-published instances (so there are no nmethod dependencies) and treats CallSite.target updates inside ctors as ordinary field updates.
> 
> Considering the changes proposed for 8234401 [1], all direct updates in CallSite ctors are safe to be treated as ordinary field updates.
> 
> Testing: tier1-4
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html


From vladimir.kozlov at oracle.com  Tue Nov 19 19:03:39 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 19 Nov 2019 11:03:39 -0800
Subject: [14] RFR (XS): 8234403: C2: Enable CallSite.target updates in
 constructors
In-Reply-To: <D1106EF1-FED5-4695-806D-A4413F7C78ED@oracle.com>
References: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com>
 <D1106EF1-FED5-4695-806D-A4413F7C78ED@oracle.com>
Message-ID: <91fd323a-1b51-bca5-fbdd-d9da321b85c4@oracle.com>

+1

Vladimir K

On 11/19/19 10:26 AM, Paul Sandoz wrote:
> +1
> 
> Paul.
> 
>> On Nov 19, 2019, at 9:18 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>
>> http://cr.openjdk.java.net/~vlivanov/8234403/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8234403
>>
>> Direct CallSite.target updates are not supported in C2, because in general the JVM has to invalidate all dependent nmethods before performing the update (handled by MethodHandleNatives.setCallSiteTargetNormal/Volatile).
>>
>> But that is not necessary for initializing stores during CallSite instance construction.
>>
>> The proposed fix assumes all raw updates happen on not-yet-published instances (so there are no nmethod dependencies) and treats CallSite.target updates inside ctors as ordinary field updates.
>>
>> Considering the changes proposed for 8234401 [1], all direct updates in CallSite ctors are safe to be treated as ordinary field updates.
>>
>> Testing: tier1-4
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html
> 

From thomas.stuefe at gmail.com  Tue Nov 19 19:31:00 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 19 Nov 2019 20:31:00 +0100
Subject: RFR: 8234328: VectorSet::clear can cause fragmentation
In-Reply-To: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com>
References: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com>
Message-ID: <CAA-vtUzJ_QQU8qewhzVt_tyv-Hot434gLT3+aZb4uwzM+mevPQ@mail.gmail.com>

Hi Claes,

Not that this is wrong, but do we have to live in the resource area? I have
run into such problems several times already, e.g. with resource-area-backed
StringStreams. Maybe it would be better to just forbid resizing of
RA-allocated arrays altogether.

Then there is also the problem with passing RA-allocated arrays down the
stack and accidentally resizing them under a different ResourceMark. I am
not sure if this could happen with VectorSet though.

Thanks, Thomas


On Tue, Nov 19, 2019 at 11:16 AM Claes Redestad <claes.redestad at oracle.com>
wrote:

> Hi,
>
> today, VectorSet::clear "reclaims" storage when the size is large.
>
> However, since the backing array is allocated in a resource arena, this
> is dubious since the currently retained memory is only actually freed
> and made reusable if it's currently the last chunk of memory allocated
> in the arena. This means a clear() is likely to just waste the allocated
> memory until we exit the current resource scope.
>
> Instead, I propose a strategy where instead of "freeing" we keep track
> of the currently allocated size of the VectorSet separately from the in-
> use size. We can then defer the memset to reset/clear the memory to the
> next time we need to grow, thus avoiding unnecessary reallocations and
> memsets. This limits the memory waste.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234328
> Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/
>
> Testing: tier1-3
>
> Either of reset() or clear() could now be removed, which seems like a
> straightforward follow-up RFE. With some convincing I could roll it into
> this patch.
>
> Thanks!
>
> /Claes
>
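The strategy proposed above (track the allocated capacity separately from the in-use size, and defer zeroing until the set grows again) can be sketched as a small Java bit set. The names are hypothetical; this is not HotSpot's VectorSet, and a garbage-collected Java array stands in for the resource-arena allocation.

```java
import java.util.Arrays;

// Hypothetical sketch of "keep capacity, defer clearing"; not HotSpot's VectorSet.
// Elements are assumed to be non-negative.
class LazyClearBitSet {
    private int[] words = new int[2]; // allocated storage (capacity)
    private int usedWords = 0;        // words currently in use; bits beyond this are stale

    void insert(int elem) {
        int word = elem >>> 5;
        if (word >= usedWords) {
            grow(word + 1);
        }
        words[word] |= 1 << (elem & 31);
    }

    boolean contains(int elem) {
        int word = elem >>> 5;
        return word < usedWords && (words[word] & (1 << (elem & 31))) != 0;
    }

    // O(1): keeps the storage and only forgets how much of it is in use.
    void clear() { usedWords = 0; }

    private void grow(int neededWords) {
        if (neededWords > words.length) {
            // Reallocate only when the existing capacity is really too small.
            words = Arrays.copyOf(words, Math.max(words.length * 2, neededWords));
        }
        // Deferred reset: zero only the words that come back into use.
        Arrays.fill(words, usedWords, neededWords, 0);
        usedWords = neededWords;
    }

    int capacityWords() { return words.length; }
}
```

Because clear() never shrinks the storage, memory obtained once is reused on the next growth instead of being abandoned in the arena, and the memset cost is paid only for the words that are actually reused.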

From dean.long at oracle.com  Tue Nov 19 22:16:39 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Tue, 19 Nov 2019 14:16:39 -0800
Subject: [14] RFR (S): 8234387: C2: Better support of operands with
 multiple match rules in AD files
In-Reply-To: <30655159-7431-1a33-cb10-373e32c68002@oracle.com>
References: <30655159-7431-1a33-cb10-373e32c68002@oracle.com>
Message-ID: <0f57998a-7e25-807d-a562-1f74cffbd2e1@oracle.com>

Hi Vladimir. The change seems to be doing what you describe, however 
for my curiosity, could you give an example of an operand with multiple 
match rules?

dl

On 11/19/19 5:00 AM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00
> https://bugs.openjdk.java.net/browse/JDK-8234387
>
> Though ADLC accepts operands with multiple match rules, it only
> generates correct code for the first one.
>
> It doesn't cause any noticeable problems for existing code, but is a 
> major limitation for generic vector operands (JDK-8234391 [1]).
>
> Proposed fix enumerates all match rules.
>
> Fixed some missing declarations along the way.
>
> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
> Reviewed-by: vlivanov, sviswanathan, ?
>
> Testing: tier1-4 (both with and without generic vectors)
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8234391


From Xiaohong.Gong at arm.com  Wed Nov 20 06:36:09 2019
From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China))
Date: Wed, 20 Nov 2019 06:36:09 +0000
Subject: RFR: 8234321: Call cache flush after generating trampoline.
Message-ID: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>

Hi,

Please help to review this small patch:
Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8234321

This patch fixes issue: https://github.com/oracle/graal/issues/1801.

It is caused by a "SIGILL (0x4)" when the compiler calls a shared method in the
CDS archive. The crash reports the trampoline instruction as invalid, although
it is actually valid. The trampoline is just an unconditional branch to the real
entry of the method, and it is generated at runtime when the method is linked.
So one possible reason is that the instruction cache and the data cache are not
in sync.

This patch fixes it by flushing the cache after generating the trampoline for
shared methods. The flush invalidates the icache so that the fetched
instructions are up to date. This matters on platforms such as AArch64, whose
CPUs do not have a coherent icache.

Thanks,
Xiaohong Gong

From rwestrel at redhat.com  Wed Nov 20 08:59:10 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 20 Nov 2019 09:59:10 +0100
Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use ==
 sfpt || !use->is_reachable_from_root())) failed: missed a node
Message-ID: <871ru2diu9.fsf@redhat.com>


http://cr.openjdk.java.net/~roland/8234350/webrev.00/

This is the same issue and fix as 8230061 (dead nodes in the outer strip
mined loop should be ignored in verification code when cloning the loop
body). The only difference is that the assert is relaxed so it applies
to all forms of cloning where the outer strip mined loop is involved.

Roland.


From adinn at redhat.com  Wed Nov 20 09:11:43 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 20 Nov 2019 09:11:43 +0000
Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>
References: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>
Message-ID: <b88446e2-986d-a0c8-a62c-0884f129e6d7@redhat.com>

On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote:
> Please help to review this small patch:
> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321
Yes, the patch looks good.

regards,


Andrew Dinn


From tobias.hartmann at oracle.com  Wed Nov 20 10:13:47 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 20 Nov 2019 11:13:47 +0100
Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use
 == sfpt || !use->is_reachable_from_root())) failed: missed a node
In-Reply-To: <871ru2diu9.fsf@redhat.com>
References: <871ru2diu9.fsf@redhat.com>
Message-ID: <02b9176e-8cae-fbd1-3d7a-0dfcdfa4b0a7@oracle.com>

Hi Roland,

this looks good to me but due to "-Xcomp -XX:-TieredCompilation", the test should either not be
executed with Graal as JIT or compilation should be restricted to test methods.

Best regards,
Tobias

On 20.11.19 09:59, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8234350/webrev.00/
> 
> This is the same issue and fix as 8230061 (dead nodes in the outer strip
> mined loop should be ignored in verification code when cloning the loop
> body). The only difference is that the assert is relaxed so it applies
> to all forms of cloning where the outer strip mined loop is involved.
> 
> Roland.
> 

From rwestrel at redhat.com  Wed Nov 20 10:17:44 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 20 Nov 2019 11:17:44 +0100
Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use
 == sfpt || !use->is_reachable_from_root())) failed: missed a node
In-Reply-To: <02b9176e-8cae-fbd1-3d7a-0dfcdfa4b0a7@oracle.com>
References: <871ru2diu9.fsf@redhat.com>
 <02b9176e-8cae-fbd1-3d7a-0dfcdfa4b0a7@oracle.com>
Message-ID: <87y2wac0mv.fsf@redhat.com>


Hi Tobias,

Thanks for reviewing this.

> this looks good to me but due to "-Xcomp -XX:-TieredCompilation", the test should either not be
> executed with Graal as JIT or compilation should be restricted to test methods.

Wouldn't -XX:CompileOnly=DeadNodesInOuterLoopAtLoopCloning2 (included in
the test case) do that?

Roland.


From tobias.hartmann at oracle.com  Wed Nov 20 10:26:22 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 20 Nov 2019 11:26:22 +0100
Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use
 == sfpt || !use->is_reachable_from_root())) failed: missed a node
In-Reply-To: <87y2wac0mv.fsf@redhat.com>
References: <871ru2diu9.fsf@redhat.com>
 <02b9176e-8cae-fbd1-3d7a-0dfcdfa4b0a7@oracle.com> <87y2wac0mv.fsf@redhat.com>
Message-ID: <d79cf491-fba8-1eaa-ec36-33b7f096d234@oracle.com>


On 20.11.19 11:17, Roland Westrelin wrote:
> Wouldn't -XX:CompileOnly=DeadNodesInOuterLoopAtLoopCloning2 (included in
> the test case) do that?

Oops, somehow missed that.. Of course it does :)

Thanks,
Tobias

From vladimir.x.ivanov at oracle.com  Wed Nov 20 11:09:30 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 20 Nov 2019 14:09:30 +0300
Subject: [14] RFR (S): 8234387: C2: Better support of operands with
 multiple match rules in AD files
In-Reply-To: <0f57998a-7e25-807d-a562-1f74cffbd2e1@oracle.com>
References: <30655159-7431-1a33-cb10-373e32c68002@oracle.com>
 <0f57998a-7e25-807d-a562-1f74cffbd2e1@oracle.com>
Message-ID: <b6d627dc-a4e1-3045-7343-45d401427bef@oracle.com>

Hi Dean,

Actually there are many operands declared with multiple match rules.

The one which is hit the hardest (matching failures) is being added by 
8234391 [1]:

+operand vec() %{
+  constraint(ALLOC_IN_RC(dynamic));
+  match(VecX);
+  match(VecY);
+  match(VecZ);
+  match(VecS);
+  match(VecD);
+
+  format %{ %}
+  interface(REG_INTER);
+%}

But there are existing operand declarations which are affected:

operand rRegP()
%{
   constraint(ALLOC_IN_RC(ptr_reg));
   match(RegP);
   match(rax_RegP);
   match(rbx_RegP);
   match(rdi_RegP);
   match(rsi_RegP);
   match(rbp_RegP);  // See Q&A below about
   match(r15_RegP);  // r15_RegP and rbp_RegP.

   format %{ %}
   interface(REG_INTER);
%}

There was no rbp_RegP operand declared and ADLC didn't notice it since 
it didn't enumerate all the match rules.

Best regards,
Vladimir Ivanov

[1] 
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html

On 20.11.2019 01:16, dean.long at oracle.com wrote:
> Hi Vladimir. The change seems to be doing what you describe, however 
> for my curiosity, could you give an example of an operand with multiple 
> match rules?
> 
> dl
> 
> On 11/19/19 5:00 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00
>> https://bugs.openjdk.java.net/browse/JDK-8234387
>>
>> Though ADLC accepts operands with multiple match rules, it only
>> generates correct code for the first one.
>>
>> It doesn't cause any noticeable problems for existing code, but is a 
>> major limitation for generic vector operands (JDK-8234391 [1]).
>>
>> Proposed fix enumerates all match rules.
>>
>> Fixed some missing declarations along the way.
>>
>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>> Reviewed-by: vlivanov, sviswanathan, ?
>>
>> Testing: tier1-4 (both with and without generic vectors)
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8234391
> 

From vladimir.x.ivanov at oracle.com  Wed Nov 20 11:10:25 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 20 Nov 2019 14:10:25 +0300
Subject: [14] RFR (XS): 8234403: C2: Enable CallSite.target updates in
 constructors
In-Reply-To: <91fd323a-1b51-bca5-fbdd-d9da321b85c4@oracle.com>
References: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com>
 <D1106EF1-FED5-4695-806D-A4413F7C78ED@oracle.com>
 <91fd323a-1b51-bca5-fbdd-d9da321b85c4@oracle.com>
Message-ID: <c4a578ff-b279-dcc4-29cc-d101abfe0dab@oracle.com>

Thanks for reviews, Vladimir K. and Paul.

Best regards,
Vladimir Ivanov

On 19.11.2019 22:03, Vladimir Kozlov wrote:
> +1
> 
> Vladimir K
> 
> On 11/19/19 10:26 AM, Paul Sandoz wrote:
>> +1
>>
>> Paul.
>>
>>> On Nov 19, 2019, at 9:18 AM, Vladimir Ivanov 
>>> <vladimir.x.ivanov at oracle.com> wrote:
>>>
>>> http://cr.openjdk.java.net/~vlivanov/8234403/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8234403
>>>
>>> Direct CallSite.target updates are not supported in C2, because in
>>> general the JVM has to invalidate all dependent nmethods before
>>> performing the update (handled by
>>> MethodHandleNatives.setCallSiteTargetNormal/Volatile).
>>>
>>> But that is not necessary for initializing stores during CallSite
>>> instance construction.
>>>
>>> The proposed fix assumes all raw updates happen on not-yet-published
>>> instances (so there are no nmethod dependencies) and treats
>>> CallSite.target updates inside ctors as ordinary field updates.
>>>
>>> Considering the changes proposed for 8234401 [1], all direct updates 
>>> in CallSite ctors are safe to be treated as ordinary field updates.
>>>
>>> Testing: tier1-4
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] 
>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html 
>>>
>>

From vladimir.x.ivanov at oracle.com  Wed Nov 20 11:11:53 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 20 Nov 2019 14:11:53 +0300
Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen
 state
In-Reply-To: <FBF865B4-B903-4A6A-9C8A-72DC05DD0963@oracle.com>
References: <b9131f9f-5c87-76ab-5151-77ae73c3ef6e@oracle.com>
 <D19C5DB0-8463-4FB4-84CC-3C30E3276D4E@oracle.com>
 <e524d9ae-3f83-6dae-e047-d92df5fcb701@oracle.com>
 <E3EC1C0B-C8BB-489E-8C51-6F433D9A84AA@oracle.com>
 <c6fa9f7b-72b5-dc93-55c8-4c21f59c83b3@oracle.com>
 <FBF865B4-B903-4A6A-9C8A-72DC05DD0963@oracle.com>
Message-ID: <3949ef3f-ba3c-2ff5-282c-86b6662e502f@oracle.com>

Thanks for review, Paul.

Best regards,
Vladimir Ivanov

On 19.11.2019 21:25, Paul Sandoz wrote:
> +1
> 
>> On Nov 19, 2019, at 10:12 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>
>> Thanks, Paul.
>>
>>> CallSite:
>>>   public class CallSite {
>>>   - // The actual payload of this call site:
>>> + // The actual payload of this call site.
>>> + // Can be modified using {@link MethodHandleNatives#setCallSiteTargetNormal} or {@link MethodHandleNatives#setCallSiteTargetVolatile}.
>>>       /*package-private*/
>>> - MethodHandle target; // Note: This field is known to the JVM. Do not change.
>>> + final MethodHandle target; // Note: This field is known to the JVM.
>>> Is there any benefit to making target final, even though it's not really for mutable call sites? (With the recent discussion of "final means final" it would be nice to not introduce more special case stomping on final fields if we can avoid it).
>>
>> CallSite.target is already treated specially: all updates go through MethodHandleNatives and JIT compilers treat it as "final" irrespective of the flags.
>>
>> My main interest in marking it final is to enforce proper initialization on the JDK side to make it easier to reason about:
>>
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036021.html
>>
> 
> Ok, I see now in light of that context.
> 
> 
>>>       CallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable {
>>> - this(targetType);
>>> + this(targetType); // need to initialize target to make CallSite.type() work in createTargetHook
>>>           ConstantCallSite selfCCS = (ConstantCallSite) this;
>>>           MethodHandle boundTarget = (MethodHandle) createTargetHook.invokeWithArguments(selfCCS);
>>> - checkTargetChange(this.target, boundTarget);
>>> - this.target = boundTarget;
>>> + setTargetNormal(boundTarget); // ConstantCallSite doesn't publish CallSite.target
>>> + UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates
>>>       }
>>> I wonder if instead of introducing the store store fence here we could move it into ConstantCallSite?
>>
>> Sure, if you prefer to see it on the ConstantCallSite side, we can move it there.
>>
>> By putting it in CallSite near the call site update, I wanted to stress there's a CallSite.target update happening on a partially published instance.
>>
> 
> Up to you.
> 
> Paul.
> 

From per.liden at oracle.com  Wed Nov 20 12:08:12 2019
From: per.liden at oracle.com (Per Liden)
Date: Wed, 20 Nov 2019 13:08:12 +0100
Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found
 because of C2 Scalar Replacement
In-Reply-To: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com>
References: <VI1PR02MB482960E862A276EF918D88B59B740@VI1PR02MB4829.eurprd02.prod.outlook.com>
 <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com>
Message-ID: <2087d0a1-8561-7972-a96b-8caf5ed8c2e6@oracle.com>

FYI, I've just pushed a patch[1] that makes ZGC work better with smaller 
heaps (down to 8M).

[1] 
https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2019-November/027844.html

/Per

On 11/12/19 8:33 PM, Leonid Mesnik wrote:
> Hi
> 
> I didn't do a complete review, I just sanity-checked your test headers. I
> see a couple of potential issues with them.
> 
> 1) Using -Xmx32M could cause OOME failures if the test is executed with
> ZGC. I think at least 256M should be set. Could you please verify
> that your tests pass with ZGC enabled?
> 
> 
> 2) I think it makes sense to add @requires
> 
> vm.opt.TieredCompilation != true
> 
> to just skip tests if anyone runs them with tiered compilation disabled 
> explicitly.
> 
> Leonid
> 
> On 11/11/19 7:29 AM, Reingruber, Richard wrote:
>> Hi,
>>
>> I have created https://bugs.openjdk.java.net/browse/JDK-8233915
>>
>> In short, a set of live objects L is not found using JVMTI 
>> FollowReferences() if L is only reachable
>> from a scalar replaced object in a frame of a C2 compiled method. If L 
>> happens to be a growing leak,
>> then a dynamically loaded JVMTI agent (note: can_tag_objects is an 
>> always capability) for heap
>> diagnostics won't discover L as live and it won't be able to find root 
>> references that lead to L.
>>
>> I'd like to suggest the implementation for the proposed enhancement
>> JDK-8227745 as a bug fix.
>>
>> RFE:       https://bugs.openjdk.java.net/browse/JDK-8227745
>> Webrev(*): 
>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/
>>
>> Please comment on the suggestion. Do you see other solutions that 
>> allow an agent to discover the
>> chain of references to L?
>>
>> I'd like to work on the complexity as well. One significant
>> simplification would be possible if scalar replaced objects could be
>> reallocated at safepoints (i.e. if the VM thread were allowed to call
>> Deoptimization::realloc_objects()). The GC interface does not seem to
>> allow this.
>>
>> Thanks, Richard.
>>
>> (*) Not yet accepted, because it was deemed too complex for the
>>     performance gain. Note that I was able to reduce webrev.1 in size
>>     compared to webrev.0.

From fujie at loongson.cn  Wed Nov 20 12:39:09 2019
From: fujie at loongson.cn (Jie Fu)
Date: Wed, 20 Nov 2019 20:39:09 +0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
Message-ID: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>

Hi all,

May I get reviews for this small fix?

JBS:    https://bugs.openjdk.java.net/browse/JDK-8234499
Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/

And I need a sponsor.

Thanks a lot.
Best regards,
Jie


From vladimir.x.ivanov at oracle.com  Wed Nov 20 13:49:01 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 20 Nov 2019 16:49:01 +0300
Subject: Allocation of array copy can be eliminated in particular cases
In-Reply-To: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>
References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>
Message-ID: <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>

Hi Sergey,

> Is my speculation correct and does it make sense to implement an optimization that turns the sequence
> 
> array -> array.clone() - > clone.length
> 
> into
> 
> array -> array.length
> 
> for the cases clone's visibility scope is predictable?

Considering there's no way to grow or shrink Java arrays, the
"cloned_array.length => original_array.length" transformation is correct
irrespective of whether the cloned array escapes or not.
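For concreteness, the shape being discussed is a defensive copy whose length is read. A minimal Java illustration (the class and method names are hypothetical):

```java
// Illustration only: a defensive copy whose length is read.
// Java arrays cannot be resized, so copy.length always equals src.length,
// which is what makes folding the load to src.length legal whether or not
// the copy escapes.
class CloneLength {
    static int defensiveLength(int[] src) {
        int[] copy = src.clone();   // defensive copy
        return copy.length;         // always equal to src.length
    }
}
```

Folding the length load only removes one use of the copy; whether the clone allocation itself can go away still depends on its remaining uses.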

Moreover, the transformation is already there:
 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388

I haven't looked into the benchmarks you mentioned, but it looks like 
cloned_array.length access is not the reason why cloned array is still 
there.

Regarding your other ideas, redirecting accesses from the cloned instance
to the original is problematic (in the general case) since the compiler
has to prove that neither version was modified, and indexed accesses make
it even harder. Safepoints cause problems as well (for rematerialization).

But I agree that it would be nice to cover (at least) simple cases of 
defensive copying.

Best regards,
Vladimir Ivanov

From christian.hagedorn at oracle.com  Wed Nov 20 14:14:31 2019
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 20 Nov 2019 15:14:31 +0100
Subject: [14] RFR(S): 8231501: VM crash in
 MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected
 tag 99
Message-ID: <c7d6ea8a-fc38-400f-1071-47b2a10bd4cb@oracle.com>

Hi

Please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8231501
http://cr.openjdk.java.net/~chagedorn/8231501/webrev.00/

The bug could be traced back to the concurrent cleaning of method data 
with its extra data in MethodData::clean_method_data() and the 
loading/copying of extra data for the ci method data in 
ciMethodData::load_extra_data(). I reproduced the bug by using the test 
[1] which extensively cleans method data by using the whitebox API [2].

Before loading and copying the extra data from the MDO to the ciMDO in 
ciMethodData::load_extra_data(), the metadata is prepared in a 
fixed-point iteration by cleaning all SpeculativeTrapData entries of 
methods whose klasses are unloaded [3]. If it encounters such a dead 
entry, it releases the extra data lock (due to ranking issues) and tries
again later [4]. This release of the lock triggers the bug: there can be
cases where a thread A is waiting in the whitebox API method to get the
extra data lock [2] to clean the extra data of the very same MDO for
which another thread B just released the lock at [4]. If that MDO
actually contained SpeculativeTrapData entries, then thread A cleaned
them, but the ciMDO that thread B is preparing still contains the
uncleaned old MDO extra data (because thread B only took a snapshot of
the MDO earlier at [5]). Things then go wrong when thread B reacquires
the lock after thread A. It tries to load the now cleaned extra data and
immediately finishes at [6] since there are no SpeculativeTrapData
entries anymore. At that point it has copied only a single entry with
tag DataLayout::no_tag [7] into a ciMDO slot which actually held a
SpeculativeTrapData entry. This results in a half-cleared entry (since a
SpeculativeTrapData entry has an additional cell for the method) and
possibly other remaining SpeculativeTrapData entries:


Let's assume a little-endian ordering and that both 0x00007fff... 
addresses are real pointers to methods. Tag 13 (0x0d) is used for 
SpeculativeTrapData and dp points to the first extra data entry:

ciMDO extra data before thread B releases the lock at [4] (same extra 
data for MDO and ciMDO):
0x800000040011000d 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000
dp: tag = 13 -> next entry = dp+16; dp+8: method 0x00007fffd4993c63
dp+16: tag = 13 -> next entry = dp+32; dp+24: method 0x00007fffd49b1a68
dp+32: tag = 0 -> end of extra data

MDO extra data after thread B reacquires the lock and thread A cleaned 
the MDO (ciMDO extra data is unchanged):
0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
dp: tag = 0 -> end of extra data


Returning at [6] when the extra data loading from MDO to ciMDO is finished:
MDO extra data:
0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
dp: tag = 0 -> end of extra data

ciMDO extra data (only the first no_tag entry, 8 bytes, was copied from the MDO at [7]):
0x0000000000000000 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000
dp: tag = 0 -> next entry = dp+8
dp+8: tag = 0x63 = 99 -> there is no tag 99 -> fatal...


The next time the ciMDO extra data is iterated, for example with 
MethodData::next_extra(), the first no_tag entry is processed, the 
iteration jumps to the value at offset 8 and reads tag 99 there. Since 
no tag 99 exists, this causes a crash.
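To make the tag-99 failure concrete, here is a minimal toy model of such a walk. It is a sketch under stated assumptions, not HotSpot code: the extra data is a `long[]` of 8-byte cells, the tag is the lowest byte of a cell (little-endian), a no_tag entry occupies one cell and a SpeculativeTrapData entry (tag 13) occupies two. The class and method names are hypothetical, and unlike the real iteration this walk simply continues to the end of the array.

```java
// Hypothetical toy model of walking extra data entries (not HotSpot code).
// Assumptions: 8-byte cells, tag in the lowest byte (little-endian),
// no_tag = 0 takes one cell, speculative_trap_data_tag = 13 takes two cells.
class ExtraDataWalk {
    static final int NO_TAG = 0;
    static final int SPECULATIVE_TRAP_DATA_TAG = 13;

    // On a little-endian machine the tag is the lowest byte of the cell.
    static int tag(long cell) {
        return (int) (cell & 0xFF);
    }

    // Visits every entry up to the end of the array and returns how many
    // entries were seen; an unknown tag is fatal, as in the reported crash.
    static int walk(long[] cells) {
        int dp = 0;
        int entries = 0;
        while (dp < cells.length) {
            int t = tag(cells[dp]);
            if (t == NO_TAG) {
                dp += 1; // header cell only
            } else if (t == SPECULATIVE_TRAP_DATA_TAG) {
                dp += 2; // header cell + method cell
            } else {
                throw new IllegalStateException("unexpected tag " + t);
            }
            entries++;
        }
        return entries;
    }

    public static void main(String[] args) {
        long[] snapshot = {0x800000040011000dL, 0x00007fffd4993c63L,
                           0x800000040011000dL, 0x00007fffd49b1a68L, 0x0L};
        long[] corrupted = {0x0L, 0x00007fffd4993c63L,
                            0x800000040011000dL, 0x00007fffd49b1a68L, 0x0L};
        System.out.println("snapshot: " + walk(snapshot) + " entries");
        try {
            walk(corrupted);
        } catch (IllegalStateException e) {
            System.out.println("corrupted: " + e.getMessage());
        }
    }
}
```

In the corrupted array the first cell now reads as no_tag, so the walk advances by one cell and lands on the low byte of the method pointer, 0x63 = 99.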


The fix is to completely zero out the current and all following 
SpeculativeTrapData entries if we encounter a no_tag in the MDO but a 
speculative_trap_data_tag in the ciMDO. The method data is also cleaned 
in other situations, so the bug is not tied to WhiteBox API usage, but 
it occurs only very rarely.
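The idea of the fix can be sketched with a hypothetical toy model; this is not the code in the webrev. Extra data is a `long[]` of 8-byte cells with the tag in the lowest byte, tag 0 is no_tag (one cell) and tag 13 is a SpeculativeTrapData entry (two cells). `copy()` mimics the MDO-to-ciMDO copy; with `applyFix` set, a no_tag in the MDO that meets a stale trap entry in the ciMDO zeroes that entry and all following trap entries instead of copying a single cell.

```java
import java.util.Arrays;

// Hypothetical toy model of the MDO -> ciMDO extra data copy (not HotSpot code).
// Assumptions: 8-byte cells, tag in the lowest byte (little-endian),
// no_tag = 0 takes one cell, speculative_trap_data_tag = 13 takes two cells.
class ExtraDataCopy {
    static final int NO_TAG = 0;
    static final int SPECULATIVE_TRAP_DATA_TAG = 13;

    static int tag(long cell) {
        return (int) (cell & 0xFF);
    }

    // src is the (already cleaned) MDO, dst the stale ciMDO snapshot.
    static void copy(long[] src, long[] dst, boolean applyFix) {
        int dp = 0;
        while (dp < src.length) {
            if (tag(src[dp]) == SPECULATIVE_TRAP_DATA_TAG) {
                dst[dp] = src[dp];         // header cell
                dst[dp + 1] = src[dp + 1]; // method cell
                dp += 2;
                continue;
            }
            // A no_tag in the MDO marks the end of the live trap entries.
            if (applyFix && tag(dst[dp]) == SPECULATIVE_TRAP_DATA_TAG) {
                // Zero the current and all following stale trap entries.
                while (dp + 1 < dst.length && tag(dst[dp]) == SPECULATIVE_TRAP_DATA_TAG) {
                    dst[dp] = 0L;
                    dst[dp + 1] = 0L;
                    dp += 2;
                }
            } else {
                dst[dp] = src[dp]; // old behaviour: copy a single no_tag cell
            }
            return;
        }
    }

    // Walks all entries; throws on a tag this model does not know.
    static int walk(long[] cells) {
        int dp = 0;
        int entries = 0;
        while (dp < cells.length) {
            int t = tag(cells[dp]);
            if (t == NO_TAG) {
                dp += 1;
            } else if (t == SPECULATIVE_TRAP_DATA_TAG) {
                dp += 2;
            } else {
                throw new IllegalStateException("unexpected tag " + t);
            }
            entries++;
        }
        return entries;
    }

    static long[] staleSnapshot() {
        return new long[] {0x800000040011000dL, 0x00007fffd4993c63L,
                           0x800000040011000dL, 0x00007fffd49b1a68L, 0x0L};
    }

    public static void main(String[] args) {
        long[] cleanedMdo = new long[5]; // thread A zeroed the MDO extra data
        long[] fixed = staleSnapshot();
        copy(cleanedMdo, fixed, true);
        System.out.println("with fix:    " + Arrays.toString(fixed) + ", " + walk(fixed) + " entries");
        long[] buggy = staleSnapshot();
        copy(cleanedMdo, buggy, false);
        System.out.println("without fix: next tag read is " + tag(buggy[1]));
    }
}
```

Zeroing whole entries (header and method cell together) is what keeps every following cell interpretable as a valid no_tag header.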

Thank you!

Best regards,
Christian


[1] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java
[2] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/prims/whitebox.cpp#l1137
[3] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l137
[4] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l115
[5] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l219
[6] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l191
[7] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l176

From nils.eliasson at oracle.com  Wed Nov 20 14:25:54 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 20 Nov 2019 15:25:54 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
Message-ID: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>

Hi,

I found a few bugs after enabling the clone intrinsic in ZGC.

1) The arraycopy clone_basic has its parameters adjusted to work as a 
memcopy: for an oop, src points inside the oop, at the position where we 
want to start copying. But for a runtime call to clone, the parameters 
are supposed to be the actual src oop and dst oop, and the size should 
be the instance size.
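The mismatch in point 1 can be illustrated with a deliberately simplified model of the two argument conventions. All names (`MemcopyArgs`, `CloneCallArgs`, `toCloneCall`) and the 16-byte payload offset are hypothetical, addresses are plain `long`s, and the conversion assumes the copied payload runs to the end of the object; this is not the workaround in the webrev.

```java
// Hypothetical toy model (names are illustrative, not HotSpot's) of the two
// argument conventions. Addresses are plain longs; payloadOffset is the
// assumed offset inside the object where copying starts.
class CloneArgsModel {
    // memcopy-style: pointers into the payload and the payload length.
    static final class MemcopyArgs {
        final long srcPayload, dstPayload, payloadBytes;
        MemcopyArgs(long srcPayload, long dstPayload, long payloadBytes) {
            this.srcPayload = srcPayload;
            this.dstPayload = dstPayload;
            this.payloadBytes = payloadBytes;
        }
    }

    // runtime clone call: the object bases and the whole instance size.
    static final class CloneCallArgs {
        final long srcOop, dstOop, instanceBytes;
        CloneCallArgs(long srcOop, long dstOop, long instanceBytes) {
            this.srcOop = srcOop;
            this.dstOop = dstOop;
            this.instanceBytes = instanceBytes;
        }
    }

    // Undo the memcopy adjustment, assuming the payload runs to the object's end.
    static CloneCallArgs toCloneCall(MemcopyArgs m, long payloadOffset) {
        return new CloneCallArgs(m.srcPayload - payloadOffset,
                                 m.dstPayload - payloadOffset,
                                 m.payloadBytes + payloadOffset);
    }

    public static void main(String[] args) {
        long payloadOffset = 16; // assumed header size, for illustration only
        long src = 0x1000, dst = 0x2000, size = 48;
        MemcopyArgs m = new MemcopyArgs(src + payloadOffset, dst + payloadOffset, size - payloadOffset);
        CloneCallArgs c = toCloneCall(m, payloadOffset);
        System.out.println(c.srcOop == src && c.dstOop == dst && c.instanceBytes == size);
    }
}
```

Passing the memcopy-style values straight to a clone runtime call would hand it interior pointers and a size that is too small by the payload offset.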

For now I have made a workaround. What should be done later is to use 
the offset in the arraycopy node to encode where the payload is, so that 
the base pointers are always correct. But that would require changes to 
the BarrierSet classes of all GCs, so I'll leave that for the next release.

2) The size parameter of the TypeFunc for the runtime call has the wrong 
type. It was originally a long but was missing the upper half; it was 
then changed to int (JDK-8233834), but that is wrong and causes the 
compiles to be skipped. We didn't notice since they failed silently, 
which is also why we didn't notice problem #1.

https://bugs.openjdk.java.net/browse/JDK-8234520

http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/

Please review!

Nils


From nils.eliasson at oracle.com  Wed Nov 20 14:36:16 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 20 Nov 2019 15:36:16 +0100
Subject: Allocation of array copy can be eliminated in particular cases
In-Reply-To: <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>
 <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
Message-ID: <fd709a7e-6cfa-f75c-4532-2e1f1ce47276@oracle.com>


On 2019-11-20 14:49, Vladimir Ivanov wrote:
> Hi Sergey,
>
>> Is my speculation correct and does it make sense to implement 
>> optimization that turns sequence
>>
>> array -> array.clone() - > clone.length
>>
>> into
>>
>> array -> array.length
>>
>> for the cases clone's visibility scope is predictable?
>
> Considering there's no way to grow/shrink Java arrays, 
> "cloned_array.length => original_array.length" transformation is 
> correct irrespective of whether cloned variant escapes or not.
>
> Moreover, the transformation is already there:
>
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388 
>
>
> I haven't looked into the benchmarks you mentioned, but it looks like 
> cloned_array.length access is not the reason why cloned array is still 
> there.

We don't eliminate array allocations that don't have a known length, 
because they might throw a NegativeArraySizeException. In this case we 
should be able to prove that the length is non-negative, though.

Anyway - I have an almost finished patch that replaces unused array 
allocations with a proper guard.
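At the Java source level, the constraint Nils describes can be pictured as follows. This is only an illustration with hypothetical names, not C2's transformation: an unused `new int[n]` may only disappear if its single observable effect, a NegativeArraySizeException for n < 0, is kept by a guard; when the length is provably non-negative, the guard itself can be folded away.

```java
// Source-level illustration only (not C2's implementation): what eliminating
// an unused `new int[n]` has to preserve.
class UnusedArrayGuard {
    // Original code: the array is allocated but never used.
    static void original(int n) {
        int[] unused = new int[n];
    }

    // After elimination: no allocation, but the only observable effect of
    // the allocation, the NegativeArraySizeException for n < 0, is kept.
    static void eliminated(int n) {
        if (n < 0) {
            throw new NegativeArraySizeException(String.valueOf(n));
        }
    }

    static boolean throwsNase(Runnable r) {
        try {
            r.run();
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 0, -1}) {
            System.out.println(n + ": " + throwsNase(() -> original(n))
                    + " / " + throwsNase(() -> eliminated(n)));
        }
    }
}
```

Both versions behave identically for every `n`, which is the property an allocation-eliminating optimization has to maintain.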

// Nils


>
> Regarding your other ideas, redirecting accesses from cloned instance 
> to original is problematic (in general case) since compiler has to 
> prove there were no changes in both versions and indexed accesses make 
> it even harder. And safepoints cause problems as well (for 
> rematerialization).
>
> But I agree that it would be nice to cover (at least) simple cases of 
> defensive copying.
>
> Best regards,
> Vladimir Ivanov

From vladimir.x.ivanov at oracle.com  Wed Nov 20 15:34:05 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 20 Nov 2019 18:34:05 +0300
Subject: Allocation of array copy can be eliminated in particular cases
In-Reply-To: <fd709a7e-6cfa-f75c-4532-2e1f1ce47276@oracle.com>
References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>
 <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
 <fd709a7e-6cfa-f75c-4532-2e1f1ce47276@oracle.com>
Message-ID: <c3008371-343e-b66c-2955-182c2958f05e@oracle.com>


>> I haven't looked into the benchmarks you mentioned, but it looks like 
>> cloned_array.length access is not the reason why cloned array is still 
>> there.
> 
> We don't eliminate array allocations that don't have a known length, 
> because they might throw a NegativeArraySizeException. In this case we 
> should be able to prove that the length is non-negative, though.

Good point, Nils!

Yes, in this particular case the length is always non-negative.
So, the guard you are working on will go away right away.

Best regards,
Vladimir Ivanov

From sergei.tsypanov at yandex.ru  Wed Nov 20 21:30:48 2019
From: sergei.tsypanov at yandex.ru (Sergey Tsypanov)
Date: Wed, 20 Nov 2019 23:30:48 +0200
Subject: Allocation of array copy can be eliminated in particular cases
In-Reply-To: <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>
 <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
Message-ID: <30768651574285448@vla1-a6eaa355d163.qloud-c.yandex.net>

Hello Vladimir,

thank you for your response!

> Moreover, the transformation is already there:
>
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388

The comment on line 2389 seems confusing to me:

// This works even if the length is not constant (clone or newArray).

When we clone an array, isn't the length constant and equal to the length of the original array? I guess it cannot be different.

> I haven't looked into the benchmarks you mentioned, but it looks like
> cloned_array.length access is not the reason why cloned array is still
> there.

At first I thought that the cloned array was retained at run time because it is returned from the method in the original benchmark:

@Benchmark
public int getParameterTypes() { return method.getParameterTypes().length; }

To check whether this speculation is correct, I changed my benchmark to strip any additional logic from it [1]:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ArrayAllocationEliminationBenchmark {

  private int length = 10;

  //...

  @Benchmark
  public int baseline() {
    return new int[length].length;
  }

  @Benchmark
  public int baselineClone() {
    return new int[length].clone().length;
  }
  //...
}

Here I don't see any reason for the runtime to keep the cloned array:

1) an int is returned from the method
2) the cloned array doesn't escape the method where it's created

So the cloned array should be eliminated, but according to the benchmark results it's not:

JDK 11

                                                               Mode  Cnt     Score     Error   Units
baseline                                                       avgt   25    10,860 ±   0,604   ns/op
baseline:·gc.alloc.rate                                        avgt   25  4703,477 ± 215,986  MB/sec
baseline:·gc.alloc.rate.norm                                   avgt   25    56,000 ±   0,001    B/op
baseline:·gc.churn.CodeHeap_'non-profiled_nmethods'            avgt   25     0,002 ±   0,001  MB/sec
baseline:·gc.churn.CodeHeap_'non-profiled_nmethods'.norm       avgt   25    ≈ 10⁻?              B/op
baseline:·gc.churn.G1_Old_Gen                                  avgt   25  4711,586 ± 218,439  MB/sec
baseline:·gc.churn.G1_Old_Gen.norm                             avgt   25    56,094 ±   0,084    B/op
baseline:·gc.count                                             avgt   25  5400,000            counts
baseline:·gc.time                                              avgt   25  3926,000                ms

baselineClone                                                  avgt   25    21,906 ±   1,234   ns/op
baselineClone:·gc.alloc.rate                                   avgt   25  4667,440 ± 248,731  MB/sec
baselineClone:·gc.alloc.rate.norm                              avgt   25   112,000 ±   0,001    B/op
baselineClone:·gc.churn.CodeHeap_'non-profiled_nmethods'       avgt   25     0,008 ±   0,002  MB/sec
baselineClone:·gc.churn.CodeHeap_'non-profiled_nmethods'.norm  avgt   25    ≈ 10⁻?              B/op
baselineClone:·gc.churn.G1_Old_Gen                             avgt   25  4675,250 ± 247,341  MB/sec
baselineClone:·gc.churn.G1_Old_Gen.norm                        avgt   25   112,192 ±   0,162    B/op
baselineClone:·gc.count                                        avgt   25  5489,000            counts
baselineClone:·gc.time                                         avgt   25  4042,000                ms

JDK 13

                                                Mode  Cnt     Score     Error   Units
baseline                                        avgt   25    10,014 ±   0,227   ns/op
baseline:·gc.alloc.rate                         avgt   25  5082,913 ± 110,593  MB/sec
baseline:·gc.alloc.rate.norm                    avgt   25    56,000 ±   0,001    B/op
baseline:·gc.churn.G1_Eden_Space                avgt   25  5092,013 ± 110,500  MB/sec
baseline:·gc.churn.G1_Eden_Space.norm           avgt   25    56,100 ±   0,076    B/op
baseline:·gc.churn.G1_Survivor_Space            avgt   25     0,005 ±   0,001  MB/sec
baseline:·gc.churn.G1_Survivor_Space.norm       avgt   25    ≈ 10⁻?              B/op
baseline:·gc.count                              avgt   25  5753,000            counts
baseline:·gc.time                               avgt   25  3733,000                ms

baselineClone                                   avgt   25    26,619 ±   1,405   ns/op
baselineClone:·gc.alloc.rate                    avgt   25  3837,924 ± 185,292  MB/sec
baselineClone:·gc.alloc.rate.norm               avgt   25   112,000 ±   0,001    B/op
baselineClone:·gc.churn.G1_Eden_Space           avgt   25  3844,010 ± 185,460  MB/sec
baselineClone:·gc.churn.G1_Eden_Space.norm      avgt   25   112,178 ±   0,168    B/op
baselineClone:·gc.churn.G1_Survivor_Space       avgt   25     0,008 ±   0,001  MB/sec
baselineClone:·gc.churn.G1_Survivor_Space.norm  avgt   25    ≈ 10⁻?              B/op
baselineClone:·gc.count                         avgt   25  4668,000            counts
baselineClone:·gc.time                          avgt   25  2923,000                ms
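The gc.alloc.rate.norm values can be cross-checked with simple arithmetic. Assuming a 64-bit HotSpot with compressed class pointers (16-byte array header, 4-byte int elements, 8-byte object alignment; these are assumptions about the benchmark machine, not stated in the thread), one int[10] takes 16 + 10 × 4 = 56 bytes. That matches the 56 B/op of baseline, and the 112 B/op of baselineClone is exactly two such arrays, which is consistent with neither the original array nor its clone being eliminated.

```java
// Back-of-the-envelope allocation size, assuming a 64-bit HotSpot with
// compressed class pointers: 16-byte int[] header (mark word, narrow klass,
// length), 4-byte elements, 8-byte object alignment.
class ArrayAllocSize {
    static long arrayBytes(long headerBytes, long elemBytes, long length, long alignment) {
        long raw = headerBytes + elemBytes * length;
        return (raw + alignment - 1) / alignment * alignment; // round up to alignment
    }

    public static void main(String[] args) {
        long original = arrayBytes(16, 4, 10, 8);
        System.out.println("new int[10]:         " + original + " B");
        System.out.println("new int[10].clone(): " + 2 * original + " B");
    }
}
```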


From this output I conclude that either I am missing something in my understanding of how the compiler and runtime work, or this is a bug.

I will be happy to understand which of the two is correct :)

There is also good news, though: the latest Graal can eliminate the allocation for the baseline method [2]

With best regards,

Sergey Tsypanov

1) https://github.com/stsypanov/logeek-night-benchmark/blob/master/benchmark-runners/src/main/java/com/luxoft/logeek/benchmark/array/ArrayAllocationEliminationBenchmark.java

2) https://github.com/oracle/graal/issues/1847

From sergei.tsypanov at yandex.ru  Wed Nov 20 21:32:52 2019
From: sergei.tsypanov at yandex.ru (Sergey Tsypanov)
Date: Wed, 20 Nov 2019 23:32:52 +0200
Subject: Allocation of array copy can be eliminated in particular cases
In-Reply-To: <fd709a7e-6cfa-f75c-4532-2e1f1ce47276@oracle.com>
References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>
 <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
 <fd709a7e-6cfa-f75c-4532-2e1f1ce47276@oracle.com>
Message-ID: <67884931574285572@sas2-c8fd3ed78d67.qloud-c.yandex.net>

Hello Nils,

> Anyway - I have an almost finished patch that replaces unused array
> allocations with a proper guard.

is there any particular JDK-xxxxxxx issue where this is tracked?

Regards,
Sergey Tsypanov

From hohensee at amazon.com  Wed Nov 20 22:22:30 2019
From: hohensee at amazon.com (Hohensee, Paul)
Date: Wed, 20 Nov 2019 22:22:30 +0000
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
In-Reply-To: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
Message-ID: <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>

It's not just the ZGC load barrier that's affected, it's every jcc and fused jcc (e.g., cmp/jcc and sub/jcc, because the pairs are issued on the same clock). There's a code pattern attribute called ins_alignment(<n>) in the ad file, viz.

ins_attrib ins_alignment(1);    // Required alignment attribute (must
                                // be a power of 2) specifies the
                                // alignment that some part of the
                                // instruction (not necessarily the
                                // start) requires.  If > 1, a
                                // compute_padding() function must be
                                // provided for the instruction

Would it be possible to use/enhance ins_alignment() rather than do something zgc-specific?
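Per Erik's description quoted below, the microcode mitigation penalizes jumps (including macro-fused pairs such as cmp/jcc) that cross or end at a 32-byte boundary. A compute_padding()-style rule for that could look like the following minimal sketch. The class and method names are hypothetical, it treats a fused pair as one unit of `length` bytes, it assumes the unit is shorter than 32 bytes, and it is not HotSpot's compute_padding() implementation.

```java
// Hypothetical compute_padding-style helper (not HotSpot's): how many nop
// bytes to emit before a jcc (or fused cmp/jcc pair) of `length` bytes that
// would start at `offset`, so it neither crosses nor ends at a 32-byte
// boundary. Assumes offset >= 0 and 0 < length < 32.
class JccPadding {
    static final int BOUNDARY = 32;

    // Crossing a boundary and ending exactly on one both mean that the
    // exclusive end offset falls into a later 32-byte block than the start.
    static boolean affected(long offset, int length) {
        return offset / BOUNDARY != (offset + length) / BOUNDARY;
    }

    static int padding(long offset, int length) {
        if (!affected(offset, length)) {
            return 0;
        }
        return (int) (BOUNDARY - offset % BOUNDARY); // move to the next boundary
    }

    public static void main(String[] args) {
        System.out.println(padding(26, 5)); // fits in [26, 31): no padding
        System.out.println(padding(27, 5)); // would end at 32: pad to 32
        System.out.println(padding(28, 6)); // would cross 32: pad to 32
    }
}
```

Padding to the next boundary is the simplest choice; since the unit is shorter than 32 bytes, it then fits entirely inside the next block.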

Thanks,

Paul

On 11/19/19, 6:23 AM, "hotspot-compiler-dev on behalf of erik.osterlund at oracle.com" <hotspot-compiler-dev-bounces at openjdk.java.net on behalf of erik.osterlund at oracle.com> wrote:

    Hi,
    
    Intel released an erratum (SKX102) which causes "unexpected system 
    behaviour" when branches
    (including fused conditional branches) cross or end at 64 byte boundaries.
    They are mitigating this by rolling out microcode updates that disable 
    micro op caching for
    conditional branches that cross or end at 32 byte boundaries. The 
    mitigation can cause
    performance regressions, unless affected branches are aligned properly.
    
    The erratum and its mitigation are described in more detail in this 
    document published by Intel:
    https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
    
    My intention for this patch is to introduce the infrastructure to 
    determine that we may
    have an affected CPU, and mitigate this by aligning the most important 
    branch in the whole
    JVM: the ZGC load barrier fast path check. Perhaps similar methodology 
    can be reused later
    to solve this for other performance critical code, but that is outside 
    the scope of this CR.
    
    The sprinkling of nops does not seem to cause regressions in workloads I 
    have tried, given a
    machine without the JCC mitigations.
    
    Bug:
    https://bugs.openjdk.java.net/browse/JDK-8234160
    
    Webrev:
    http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/
    
    Thanks,
    /Erik
    

From igor.ignatyev at oracle.com  Thu Nov 21 00:27:42 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 16:27:42 -0800
Subject: RFR(S) : 8147017 : Platform.isGraal should be removed
In-Reply-To: <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com>
References: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com>
 <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com>
Message-ID: <C0966C97-0281-4421-B7A1-A7B11C2738E9@oracle.com>

@Misha,

thanks for your review.

@list,
can I get a 2nd review from a Reviewer?

-- Igor

> On Nov 18, 2019, at 2:06 PM, mikhailo.seledtsov at oracle.com wrote:
> 
> Looks good to me,
> 
> Misha
> 
> On 11/17/19 11:00 AM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
>>> 16 lines changed: 2 ins; 8 del; 6 mod;
>> Hi all,
>> 
>> jdk.test.lib.Platform.isGraal method assumes that a JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and it caused others to incorrectly assume that a '-graal' flag exists and must be used to select the Graal compiler. The patch removes this method and updates its only meaningful usage in the TestGCLogMessages test. TestGCLogMessages should use LogMessageWithLevelC2OrJVMCIOnly only when C2 or Graal is available, so it's been updated to use the corresponding methods of the sun.hotspot.code.Compiler class, which requires the WhiteBox API to be enabled.
>> 
>> 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8147017
>> webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
>> testing: tier1 + TestGCLogMessages w/ different JIT configurations
>> 
>> Thanks,
>> -- Igor


From igor.ignatyev at oracle.com  Thu Nov 21 01:05:41 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 17:05:41 -0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
In-Reply-To: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
Message-ID: <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>

Hi Jie,

wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileCommand=compileonly? This should solve the timeout w/ Graal w/o removing the only test for CompilationMode=high-only.
 
Thanks,
-- Igor

> On Nov 20, 2019, at 4:39 AM, Jie Fu <fujie at loongson.cn> wrote:
> 
> Hi all,
> 
> May I get reviews for this small fix?
> 
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8234499
> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/
> 
> And I need a sponsor.
> 
> Thanks a lot.
> Best regards,
> Jie
> 


From vladimir.kozlov at oracle.com  Thu Nov 21 01:54:34 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Nov 2019 17:54:34 -0800
Subject: RFR(S) : 8147017 : Platform.isGraal should be removed
In-Reply-To: <C0966C97-0281-4421-B7A1-A7B11C2738E9@oracle.com>
References: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com>
 <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com>
 <C0966C97-0281-4421-B7A1-A7B11C2738E9@oracle.com>
Message-ID: <3d0ddee5-ab54-3742-c053-d9cd74a93cb8@oracle.com>

Reviewed. Good.

Thanks,
Vladimir K

On 11/20/19 4:27 PM, Igor Ignatyev wrote:
> @Misha,
> 
> thanks for your review.
> 
> @list,
> can I get a 2nd review from a Reviewer?
> 
> -- Igor
> 
>> On Nov 18, 2019, at 2:06 PM, mikhailo.seledtsov at oracle.com wrote:
>>
>> Looks good to me,
>>
>> Misha
>>
>> On 11/17/19 11:00 AM, Igor Ignatyev wrote:
>>> http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
>>>> 16 lines changed: 2 ins; 8 del; 6 mod;
>>> Hi all,
>>>
>>> jdk.test.lib.Platform.isGraal method assumes that a JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and it caused others to incorrectly assume that a '-graal' flag exists and must be used to select the Graal compiler. The patch removes this method and updates its only meaningful usage in the TestGCLogMessages test. TestGCLogMessages should use LogMessageWithLevelC2OrJVMCIOnly only when C2 or Graal is available, so it's been updated to use the corresponding methods of the sun.hotspot.code.Compiler class, which requires the WhiteBox API to be enabled.
>>>
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8147017
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
>>> testing: tier1 + TestGCLogMessages w/ different JIT configurations
>>>
>>> Thanks,
>>> -- Igor
> 

From fujie at loongson.cn  Thu Nov 21 01:58:44 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 21 Nov 2019 09:58:44 +0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
In-Reply-To: <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
 <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
Message-ID: <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>

Hi Igor,

Good idea! Thank you very much.

Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.01/

Hope you can sponsor it if you're OK with it.

Thanks a lot.
Best regards,
Jie

On 2019/11/21 9:05 AM, Igor Ignatyev wrote:
> Hi Jie,
>
> wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileCommand=compileonly? This should solve the timeout w/ Graal w/o removing the only test for CompilationMode=high-only.
>   
> Thanks,
> -- Igor
>
>> On Nov 20, 2019, at 4:39 AM, Jie Fu <fujie at loongson.cn> wrote:
>>
>> Hi all,
>>
>> May I get reviews for this small fix?
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8234499
>> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/
>>
>> And I need a sponsor.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>


From Xiaohong.Gong at arm.com  Thu Nov 21 02:16:49 2019
From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China))
Date: Thu, 21 Nov 2019 02:16:49 +0000
Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To: <b88446e2-986d-a0c8-a62c-0884f129e6d7@redhat.com>
References: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <b88446e2-986d-a0c8-a62c-0884f129e6d7@redhat.com>
Message-ID: <VE1PR08MB500804C382D2073F463514DAF54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>

Hi,

Thanks for your review, @Andrew Dinn!
Could someone else also review this patch? I would be grateful if someone familiar with other platforms (ppc, etc.) could take a look at it.

Thanks,
Xiaohong Gong

-----Original Message-----
From: Andrew Dinn <adinn at redhat.com> 
Sent: Wednesday, November 20, 2019 5:12 PM
To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; calvin.cheung at oracle.com
Cc: nd <nd at arm.com>
Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.

On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote:
> Please help to review this small patch:
> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321
Yes, the patch looks good.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From igor.ignatyev at oracle.com  Thu Nov 21 02:17:51 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 18:17:51 -0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
In-Reply-To: <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
 <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
 <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>
Message-ID: <EE946B42-0CA9-406D-A6AF-A5D8826D8404@oracle.com>

Hi Jie,

we are trying to replace all usages of -XX:CompileOnly w/ -XX:CompileCommand=compileonly, could you please update your patch accordingly?

-- Igor

> On Nov 20, 2019, at 5:58 PM, Jie Fu <fujie at loongson.cn> wrote:
> 
> Hi Igor,
> 
> Good idea! Thank you very much.
> 
> Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.01/
> 
> Hope you can sponsor it if you're OK with it.
> 
> Thanks a lot.
> Best regards,
> Jie
> 
> On 2019/11/21 9:05 AM, Igor Ignatyev wrote:
>> Hi Jie,
>> 
>> wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileCommand=compileonly? This should solve the timeout w/ Graal w/o removing the only test for CompilationMode=high-only.
>>  Thanks,
>> -- Igor
>> 
>>> On Nov 20, 2019, at 4:39 AM, Jie Fu <fujie at loongson.cn> wrote:
>>> 
>>> Hi all,
>>> 
>>> May I get reviews for this small fix?
>>> 
>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8234499
>>> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/
>>> 
>>> And I need a sponsor.
>>> 
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>> 
> 


From igor.ignatyev at oracle.com  Thu Nov 21 02:27:30 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 18:27:30 -0800
Subject: RFR(S) : 8147017 : Platform.isGraal should be removed
In-Reply-To: <3d0ddee5-ab54-3742-c053-d9cd74a93cb8@oracle.com>
References: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com>
 <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com>
 <C0966C97-0281-4421-B7A1-A7B11C2738E9@oracle.com>
 <3d0ddee5-ab54-3742-c053-d9cd74a93cb8@oracle.com>
Message-ID: <E5018B64-B35C-41FD-8669-58E477BFB505@oracle.com>

Hi Vladimir,

thanks for your review, pushed.
-- Igor

> On Nov 20, 2019, at 5:54 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Reviewed. Good.
> 
> Thanks,
> Vladimir K
> 
> On 11/20/19 4:27 PM, Igor Ignatyev wrote:
>> @Misha,
>> thanks for your review.
>> @list,
>> can I get a 2nd review from a Reviewer?
>> -- Igor
>>> On Nov 18, 2019, at 2:06 PM, mikhailo.seledtsov at oracle.com wrote:
>>> 
>>> Looks good to me,
>>> 
>>> Misha
>>> 
>>> On 11/17/19 11:00 AM, Igor Ignatyev wrote:
>>>> http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
>>>>> 16 lines changed: 2 ins; 8 del; 6 mod;
>>>> Hi all,
>>>> 
>>>> jdk.test.lib.Platform.isGraal method assumes that a JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and it caused others to incorrectly assume that a '-graal' flag exists and must be used to select the Graal compiler. The patch removes this method and updates its only meaningful usage in the TestGCLogMessages test. TestGCLogMessages should use LogMessageWithLevelC2OrJVMCIOnly only when C2 or Graal is available, so it's been updated to use the corresponding methods of the sun.hotspot.code.Compiler class, which requires the WhiteBox API to be enabled.
>>>> 
>>>> 
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8147017
>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
>>>> testing: tier1 + TestGCLogMessages w/ different JIT configurations
>>>> 
>>>> Thanks,
>>>> -- Igor


From fujie at loongson.cn  Thu Nov 21 02:48:53 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 21 Nov 2019 10:48:53 +0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
In-Reply-To: <EE946B42-0CA9-406D-A6AF-A5D8826D8404@oracle.com>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
 <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
 <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>
 <EE946B42-0CA9-406D-A6AF-A5D8826D8404@oracle.com>
Message-ID: <c6cd31ab-6565-b6cd-0b57-f79537e95bfa@loongson.cn>

Hi Igor,

OK.

Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.02/

Thanks a lot.
Best regards,
Jie

On 2019/11/21 10:17 AM, Igor Ignatyev wrote:
> Hi Jie,
>
> we are trying to replace all usages of -XX:CompileOnly w/ -XX:CompileCommand=compileonly, could you please update your patch accordingly?
>
> -- Igor
>
>> On Nov 20, 2019, at 5:58 PM, Jie Fu <fujie at loongson.cn> wrote:
>>
>> Hi Igor,
>>
>> Good idea! Thank you very much.
>>
>> Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.01/
>>
>> Hope you can sponsor it if you're OK with it.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> On 2019/11/21 9:05 AM, Igor Ignatyev wrote:
>>> Hi Jie,
>>>
>>> wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileCommand=compileonly? This should solve the timeout w/ Graal w/o removing the only test for CompilationMode=high-only.
>>>   Thanks,
>>> -- Igor
>>>
>>>> On Nov 20, 2019, at 4:39 AM, Jie Fu <fujie at loongson.cn> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> May I get reviews for this small fix?
>>>>
>>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8234499
>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/
>>>>
>>>> And I need a sponsor.
>>>>
>>>> Thanks a lot.
>>>> Best regards,
>>>> Jie
>>>>


From igor.ignatyev at oracle.com  Thu Nov 21 03:25:01 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 19:25:01 -0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
In-Reply-To: <c6cd31ab-6565-b6cd-0b57-f79537e95bfa@loongson.cn>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
 <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
 <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>
 <EE946B42-0CA9-406D-A6AF-A5D8826D8404@oracle.com>
 <c6cd31ab-6565-b6cd-0b57-f79537e95bfa@loongson.cn>
Message-ID: <C6A947B7-2CE8-4526-BF6A-F94CFBBD9EE2@oracle.com>

Hi Jie,

just to double check, have you verified that the updated test can still provoke 8233885?

-- Igor 

> On Nov 20, 2019, at 6:48 PM, Jie Fu <fujie at loongson.cn> wrote:
> 
> Hi Igor,
> 
> OK.
> 
> Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.02/
> 
> Thanks a lot.
> Best regards,
> Jie
> 
> On 2019/11/21 10:17 AM, Igor Ignatyev wrote:
>> Hi Jie,
>> 
>> we are trying to replace all usages of -XX:CompileOnly w/ -XX:CompileCommand=compileonly, could you please update your patch accordingly?
>> 
>> -- Igor
>> 
>>> On Nov 20, 2019, at 5:58 PM, Jie Fu <fujie at loongson.cn> wrote:
>>> 
>>> Hi Igor,
>>> 
>>> Good idea! Thank you very much.
>>> 
>>> Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.01/
>>> 
>>> Hope you can sponsor it if you're OK with it.
>>> 
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>> 
>>> On 2019/11/21 9:05 AM, Igor Ignatyev wrote:
>>>> Hi Jie,
>>>> 
>>>> wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileCommand=compileonly? This should solve the timeout w/ Graal w/o removing the only test for CompilationMode=high-only.
>>>>  Thanks,
>>>> -- Igor
>>>> 
>>>>> On Nov 20, 2019, at 4:39 AM, Jie Fu <fujie at loongson.cn> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> May I get reviews for this small fix?
>>>>> 
>>>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8234499
>>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/
>>>>> 
>>>>> And I need a sponsor.
>>>>> 
>>>>> Thanks a lot.
>>>>> Best regards,
>>>>> Jie
>>>>> 
> 


From fujie at loongson.cn  Thu Nov 21 03:34:27 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 21 Nov 2019 11:34:27 +0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
In-Reply-To: <C6A947B7-2CE8-4526-BF6A-F94CFBBD9EE2@oracle.com>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
 <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
 <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>
 <EE946B42-0CA9-406D-A6AF-A5D8826D8404@oracle.com>
 <c6cd31ab-6565-b6cd-0b57-f79537e95bfa@loongson.cn>
 <C6A947B7-2CE8-4526-BF6A-F94CFBBD9EE2@oracle.com>
Message-ID: <d8580ded-8bb5-0c20-6dd3-dbe6b5a17410@loongson.cn>

Yes, it can still provoke 8233885.

Thanks.

On 2019/11/21 11:25 AM, Igor Ignatyev wrote:
> have you verified that the updated test can still provoke 8233885 ?


From igor.ignatyev at oracle.com  Thu Nov 21 03:59:02 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 19:59:02 -0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
In-Reply-To: <d8580ded-8bb5-0c20-6dd3-dbe6b5a17410@loongson.cn>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
 <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
 <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>
 <EE946B42-0CA9-406D-A6AF-A5D8826D8404@oracle.com>
 <c6cd31ab-6565-b6cd-0b57-f79537e95bfa@loongson.cn>
 <C6A947B7-2CE8-4526-BF6A-F94CFBBD9EE2@oracle.com>
 <d8580ded-8bb5-0c20-6dd3-dbe6b5a17410@loongson.cn>
Message-ID: <928BDF2E-9B89-49F8-BCA6-CFDB14A03298@oracle.com>

great, reviewed and integrated.

-- Igor

> On Nov 20, 2019, at 7:34 PM, Jie Fu <fujie at loongson.cn> wrote:
> 
> Yes, it can still provoke 8233885.
> 
> Thanks.
> 
> On 2019/11/21 11:25 AM, Igor Ignatyev wrote:
>> have you verified that the updated test can still provoke 8233885 ?
> 


From fujie at loongson.cn  Thu Nov 21 04:01:03 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 21 Nov 2019 12:01:03 +0800
Subject: RFR: 8234499: [Graal]
 compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with
 timeout
In-Reply-To: <928BDF2E-9B89-49F8-BCA6-CFDB14A03298@oracle.com>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn>
 <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
 <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>
 <EE946B42-0CA9-406D-A6AF-A5D8826D8404@oracle.com>
 <c6cd31ab-6565-b6cd-0b57-f79537e95bfa@loongson.cn>
 <C6A947B7-2CE8-4526-BF6A-F94CFBBD9EE2@oracle.com>
 <d8580ded-8bb5-0c20-6dd3-dbe6b5a17410@loongson.cn>
 <928BDF2E-9B89-49F8-BCA6-CFDB14A03298@oracle.com>
Message-ID: <3a35f216-dbb0-7b27-6399-7b804b5788f9@loongson.cn>

Thank you so much, Igor.

On 2019/11/21 11:59 AM, Igor Ignatyev wrote:
> great, reviewed and integrated.
>
> -- Igor
>
>> On Nov 20, 2019, at 7:34 PM, Jie Fu <fujie at loongson.cn> wrote:
>>
>> Yes, it can still provoke 8233885.
>>
>> Thanks.
>>
>> On 2019/11/21 11:25 AM, Igor Ignatyev wrote:
>>> have you verified that the updated test can still provoke 8233885 ?


From igor.ignatyev at oracle.com  Thu Nov 21 04:25:44 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 20:25:44 -0800
Subject: RFR(S) : 8225554 : add JFR event for uncommon trap
In-Reply-To: <8B759742-CAD0-4811-93C6-18466203A070@oracle.com>
References: <CD6846A8-35E7-4CD2-8632-C749E1CE9320@oracle.com>
 <688e6abd-fe01-43fd-99c0-a4b8066ddbb2@default>
 <8B759742-CAD0-4811-93C6-18466203A070@oracle.com>
Message-ID: <32AFB259-4D1D-4C0E-B417-3EED0DD07302@oracle.com>

ping?

> On Jun 19, 2019, at 5:20 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
> 
> Hi Markus,
> 
> I definitely support the idea of making the event as helpful for end-users as possible; and having information about the uncommon trap "location" (method, line-number, bci) seems to be very useful. I don't think that the 'instruction' field is helpful, though, b/c w/o seeing the rest of the method code it can't really be used to understand what happened or why. what do you think?
> 
> regarding the name, although I don't think this event can be used by people w/o some understanding of hotspot internals (they at least need to understand what the somewhat cryptic reasons and actions mean), "Deoptimization" sounds good to me.
> 
> please let me know how you would like to proceed here: I can update my patch to rename the event and include location info, or I can just withdraw my patch in favor of yours (that's if you plan to finish work on it in the near future and it won't be left for another few years :) ) 
> 
> Thanks,
> -- Igor
> 
>> On Jun 18, 2019, at 2:44 AM, Markus Gronlund <markus.gronlund at oracle.com> wrote:
>> 
>> Hi Igor,
>> 
>> Thank you for looking into providing this support.
>> 
>> This work partly overlaps with something I have been working on under the following enhancement:
>> 
>> Enh: https://bugs.openjdk.java.net/browse/JDK-8216041
>> 
>> I have had a patch somewhat semi-ready for some years now, please see:
>> http://cr.openjdk.java.net/~mgronlun/8216041/
>> 
>> Here is how the information set could look visually by default (no structured rendering) in JDK Mission Control:
>> http://cr.openjdk.java.net/~mgronlun/8216041/DeoptimizationEvent.jpg
>> 
>> Maybe we should merge our work for this effort (I am interested in your test case)?
>> 
>> I think we need to take a larger view on this, especially to see if this information could also be made understandable and maybe even useful to the end-user / developer.
>> 
>> This is the reason I chose to use the "deoptimization" concept instead of the more internal UncommonTrap.
>> 
>> Let's see if we together can craft a useful event here.
>> 
>> Thanks
>> Markus
>> 
>> -----Original Message-----
>> From: Igor Ignatyev 
>> Sent: den 11 juni 2019 20:49
>> To: hotspot-jfr-dev at openjdk.java.net; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: RFR(S) : 8225554 : add JFR event for uncommon trap
>> 
>> http://cr.openjdk.java.net/~iignatyev//8225554/webrev.00/index.html
>>> 187 lines changed: 184 ins; 0 del; 3 mod;
>> 
>> Hi all,
>> 
>> could you please review this small patch which adds jfr event for uncommon trap?
>> 
>> webrev: http://cr.openjdk.java.net/~iignatyev//8225554/webrev.00/index.html
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8225554
>> testing:
>> - tier1 (which includes a newly added test)
>> - modified version of compiler/intrinsics/klass/CastNullCheckDroppingsTest.java (see JDK-8129092[1])
>> 
>> [1] https://bugs.openjdk.java.net/browse/JDK-8129092
>> 
>> Thanks,
>> -- Igor
> 


From ioi.lam at oracle.com  Thu Nov 21 04:50:10 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 20 Nov 2019 20:50:10 -0800
Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To: <VE1PR08MB500804C382D2073F463514DAF54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
References: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <b88446e2-986d-a0c8-a62c-0884f129e6d7@redhat.com>
 <VE1PR08MB500804C382D2073F463514DAF54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
Message-ID: <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com>

Hi Xiaohong,

The changes look good to me. I am running tests on our test 
infrastructure now.

Do you need a sponsor for pushing the changeset?

Thanks
- Ioi

On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote:
> Hi,
>
> Thanks for your review, @Andrew Dinn!
> Could someone else also help review this patch? I would appreciate it if someone familiar with other platforms (ppc, etc.) could take a look at it.
>
> Thanks,
> Xiaohong Gong
>
> -----Original Message-----
> From: Andrew Dinn <adinn at redhat.com>
> Sent: Wednesday, November 20, 2019 5:12 PM
> To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; calvin.cheung at oracle.com
> Cc: nd <nd at arm.com>
> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.
>
> On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote:
>> Please help to review this small patch:
>> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321
> Yes, the patch looks good.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>


From ioi.lam at oracle.com  Thu Nov 21 05:03:14 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 20 Nov 2019 21:03:14 -0800
Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To: <VE1PR08MB500804C382D2073F463514DAF54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
References: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <b88446e2-986d-a0c8-a62c-0884f129e6d7@redhat.com>
 <VE1PR08MB500804C382D2073F463514DAF54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
Message-ID: <cab48f0e-3bf8-da0d-9bbb-58a87d62144c@oracle.com>

Hi Xiaohong,

The changes look good to me. I am running tests on our test 
infrastructure now.

Do you need a sponsor for pushing the changeset?

Thanks
- Ioi

On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote:
> Hi,
>
> Thanks for your review, @Andrew Dinn!
> Could someone else also help review this patch? I would appreciate it if 
> someone familiar with other platforms (ppc, etc.) could take a 
> look at it.
>
> Thanks,
> Xiaohong Gong
>
> -----Original Message-----
> From: Andrew Dinn <adinn at redhat.com>
> Sent: Wednesday, November 20, 2019 5:12 PM
> To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; 
> hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; 
> calvin.cheung at oracle.com
> Cc: nd <nd at arm.com>
> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.
>
> On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote:
>> Please help to review this small patch:
>> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321
> Yes, the patch looks good.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>


From tom.rodriguez at oracle.com  Thu Nov 21 05:21:59 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 20 Nov 2019 21:21:59 -0800
Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a
 phantom reference
Message-ID: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>

http://cr.openjdk.java.net/~never/8234359/webrev
https://bugs.openjdk.java.net/browse/JDK-8234359

While testing the latest JVMCI in JDK 11, crashes were occurring during 
draining of the SATB buffers.  The problem was tracked down to 
invalidate_nmethod_mirror being called on an nmethod whose InstalledCode 
instance was also dead in the current GC. Reading this oop using 
NativeAccess<ON_PHANTOM_OOP_REF> led to that oop being enqueued in the 
SATB buffer.  In JDK 14 it appears some other change in G1 disables 
those barriers at the point this code is executed, but in JDK 11 no such 
logic exists.  This code never resurrects that oop, so using the normal 
AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue 
the potentially dead object.
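A toy model of the failure mode may help. It assumes the usual SATB (snapshot-at-the-beginning) behaviour: while concurrent marking is active, a keep-alive load records the referent in the SATB queue so the marker treats it as live, which is exactly what must not happen for an object that is already dead; a no-keepalive load skips that step. `ToySatbHeap`, `LoadKind` and `Object` are hypothetical names for illustration only, not HotSpot's `NativeAccess` decorators or G1's actual barrier code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a heap object; contents are irrelevant here.
struct Object {};

// Illustrative load flavours: keep-alive (phantom strength) vs no-keepalive.
enum class LoadKind { PhantomKeepAlive, NoKeepAlive };

struct ToySatbHeap {
  bool marking_active = false;
  std::vector<Object*> satb_queue;  // referents the marker must treat as live

  Object* load(Object* const* slot, LoadKind kind) {
    assert(slot != nullptr);
    Object* obj = *slot;
    // Only a keep-alive load during marking enqueues the referent. Doing this
    // for an object that is already dead is the kind of enqueue that caused
    // the crash while draining the SATB buffers.
    if (kind == LoadKind::PhantomKeepAlive && marking_active && obj != nullptr) {
      satb_queue.push_back(obj);
    }
    return obj;
  }
};
```

Since `invalidate_nmethod_mirror` never hands the oop out to anyone, the no-keepalive flavour in this model returns the same pointer without touching the queue.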

tom

From tobias.hartmann at oracle.com  Thu Nov 21 06:13:19 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Nov 2019 07:13:19 +0100
Subject: [14] RFR (S): 8234387: C2: Better support of operands with
 multiple match rules in AD files
In-Reply-To: <30655159-7431-1a33-cb10-373e32c68002@oracle.com>
References: <30655159-7431-1a33-cb10-373e32c68002@oracle.com>
Message-ID: <cd4d7bc7-23f0-bd30-1846-f7aa72fad4d8@oracle.com>

Hi Vladimir,

looks good to me.

Best regards,
Tobias

On 19.11.19 14:00, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00
> https://bugs.openjdk.java.net/browse/JDK-8234387
> 
> Though ADLC accepts operands with multiple match rules, it only generates correct code to handle
> the first one.
> 
> It doesn't cause any noticeable problems for existing code, but is a major limitation for generic
> vector operands (JDK-8234391 [1]).
> 
> Proposed fix enumerates all match rules.
> 
> Fixed some missing declarations along the way.
> 
> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
> Reviewed-by: vlivanov, sviswanathan, ?
> 
> Testing: tier1-4 (both with and without generic vectors)
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8234391

From Xiaohong.Gong at arm.com  Thu Nov 21 06:24:02 2019
From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China))
Date: Thu, 21 Nov 2019 06:24:02 +0000
Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To: <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com>
References: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <b88446e2-986d-a0c8-a62c-0884f129e6d7@redhat.com>
 <VE1PR08MB500804C382D2073F463514DAF54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com>
Message-ID: <VE1PR08MB5008B4E0DD6E9C0FC1C14504F54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>

Hi Ioi,

Thanks for your review!
I'd be glad if you could help push it once all the tests pass.

Thanks,
Xiaohong

-----Original Message-----
From: Ioi Lam <ioi.lam at oracle.com> 
Sent: Thursday, November 21, 2019 12:50 PM
To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; Andrew Dinn <adinn at redhat.com>; hotspot-compiler-dev at openjdk.java.net; calvin.cheung at oracle.com
Cc: nd <nd at arm.com>
Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.

Hi Xiaohong,

The changes look good to me. I am running tests on our test infrastructure now.

Do you need a sponsor for pushing the changeset?

Thanks
- Ioi

On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote:
> Hi,
>
> Thanks for your review, @Andrew Dinn! Could someone else also help 
> review this patch? I would appreciate it if someone familiar with other platforms (ppc, etc.) could take a look at it.
>
> Thanks,
> Xiaohong Gong
>
> -----Original Message-----
> From: Andrew Dinn <adinn at redhat.com>
> Sent: Wednesday, November 20, 2019 5:12 PM
> To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; 
> hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; 
> calvin.cheung at oracle.com
> Cc: nd <nd at arm.com>
> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.
>
> On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote:
>> Please help to review this small patch:
>> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321
> Yes, the patch looks good.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>


From ioi.lam at oracle.com  Thu Nov 21 06:35:32 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 20 Nov 2019 22:35:32 -0800
Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To: <VE1PR08MB5008B4E0DD6E9C0FC1C14504F54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
References: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <b88446e2-986d-a0c8-a62c-0884f129e6d7@redhat.com>
 <VE1PR08MB500804C382D2073F463514DAF54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com>
 <VE1PR08MB5008B4E0DD6E9C0FC1C14504F54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
Message-ID: <4aedfe03-cdee-0b9b-c80d-26dff55829d6@oracle.com>

Hi Xiaohong,

The patch passed our hs-tier1 and hs-tier2 tests, so I pushed it.

http://hg.openjdk.java.net/jdk/jdk/rev/2c55c2fc08f5

Thanks
- Ioi

On 11/20/19 10:24 PM, Xiaohong Gong (Arm Technology China) wrote:
> Hi Ioi,
>
> Thanks for your review!
> I'd be glad if you could help push it once all the tests pass.
>
> Thanks,
> Xiaohong
>
> -----Original Message-----
> From: Ioi Lam <ioi.lam at oracle.com>
> Sent: Thursday, November 21, 2019 12:50 PM
> To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; Andrew Dinn <adinn at redhat.com>; hotspot-compiler-dev at openjdk.java.net; calvin.cheung at oracle.com
> Cc: nd <nd at arm.com>
> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.
>
> Hi Xiaohong,
>
> The changes look good to me. I am running tests on our test infrastructure now.
>
> Do you need a sponsor for pushing the changeset?
>
> Thanks
> - Ioi
>
> On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote:
>> Hi,
>>
>> Thanks for your review, @Andrew Dinn! Could someone else also help
>> review this patch? I would appreciate it if someone familiar with other platforms (ppc, etc.) could take a look at it.
>>
>> Thanks,
>> Xiaohong Gong
>>
>> -----Original Message-----
>> From: Andrew Dinn <adinn at redhat.com>
>> Sent: Wednesday, November 20, 2019 5:12 PM
>> To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>;
>> hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com;
>> calvin.cheung at oracle.com
>> Cc: nd <nd at arm.com>
>> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.
>>
>> On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote:
>>> Please help to review this small patch:
>>> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321
>> Yes, the patch looks good.
>>
>> regards,
>>
>>
>> Andrew Dinn
>> -----------
>> Senior Principal Software Engineer
>> Red Hat UK Ltd
>> Registered in England and Wales under Company Registration No. 03798903
>> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>>


From Xiaohong.Gong at arm.com  Thu Nov 21 06:37:25 2019
From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China))
Date: Thu, 21 Nov 2019 06:37:25 +0000
Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To: <4aedfe03-cdee-0b9b-c80d-26dff55829d6@oracle.com>
References: <VE1PR08MB50080DA2524E36FBC48BFBFEF54F0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <b88446e2-986d-a0c8-a62c-0884f129e6d7@redhat.com>
 <VE1PR08MB500804C382D2073F463514DAF54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com>
 <VE1PR08MB5008B4E0DD6E9C0FC1C14504F54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>
 <4aedfe03-cdee-0b9b-c80d-26dff55829d6@oracle.com>
Message-ID: <VE1PR08MB50087253EE4353E3DB311152F54E0@VE1PR08MB5008.eurprd08.prod.outlook.com>

Hi Ioi,

Thanks so much for it!

Thanks,
Xiaohong

-----Original Message-----
From: Ioi Lam <ioi.lam at oracle.com> 
Sent: Thursday, November 21, 2019 2:36 PM
To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; Andrew Dinn <adinn at redhat.com>; hotspot-compiler-dev at openjdk.java.net; calvin.cheung at oracle.com
Cc: nd <nd at arm.com>
Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.

Hi Xiaohong,

The patch passed our hs-tier1 and hs-tier2 tests, so I pushed it.

http://hg.openjdk.java.net/jdk/jdk/rev/2c55c2fc08f5

Thanks
- Ioi

On 11/20/19 10:24 PM, Xiaohong Gong (Arm Technology China) wrote:
> Hi Ioi,
>
> Thanks for your review!
> I'd be glad if you could help push it once all the tests pass.
>
> Thanks,
> Xiaohong
>
> -----Original Message-----
> From: Ioi Lam <ioi.lam at oracle.com>
> Sent: Thursday, November 21, 2019 12:50 PM
> To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; 
> Andrew Dinn <adinn at redhat.com>; hotspot-compiler-dev at openjdk.java.net; 
> calvin.cheung at oracle.com
> Cc: nd <nd at arm.com>
> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.
>
> Hi Xiaohong,
>
> The changes look good to me. I am running tests on our test infrastructure now.
>
> Do you need a sponsor for pushing the changeset?
>
> Thanks
> - Ioi
>
> On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote:
>> Hi,
>>
>> Thanks for your review, @Andrew Dinn! Could someone else also help 
>> review this patch? I would appreciate it if someone familiar with other platforms (ppc, etc.) could take a look at it.
>>
>> Thanks,
>> Xiaohong Gong
>>
>> -----Original Message-----
>> From: Andrew Dinn <adinn at redhat.com>
>> Sent: Wednesday, November 20, 2019 5:12 PM
>> To: Xiaohong Gong (Arm Technology China) <Xiaohong.Gong at arm.com>; 
>> hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; 
>> calvin.cheung at oracle.com
>> Cc: nd <nd at arm.com>
>> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline.
>>
>> On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote:
>>> Please help to review this small patch:
>>> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321
>> Yes, the patch looks good.
>>
>> regards,
>>
>>
>> Andrew Dinn
>> -----------
>> Senior Principal Software Engineer
>> Red Hat UK Ltd
>> Registered in England and Wales under Company Registration No. 03798903
>> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>>


From erik.osterlund at oracle.com  Thu Nov 21 06:50:55 2019
From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=)
Date: Thu, 21 Nov 2019 07:50:55 +0100
Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a
 phantom reference
In-Reply-To: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>
References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>
Message-ID: <EB276519-BABD-4F83-A762-FEB63BC5EF7F@oracle.com>

Hi Tom,

Looks good.

Thanks,
/Erik

> On 21 Nov 2019, at 06:22, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~never/8234359/webrev
> https://bugs.openjdk.java.net/browse/JDK-8234359
> 
> While testing the latest JVMCI in JDK 11, crashes were occurring during draining of the SATB buffers.  The problem was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead in the current GC. Reading this oop using NativeAccess<ON_PHANTOM_OOP_REF> led to that oop being enqueued in the SATB buffer.  In JDK 14 it appears some other change in G1 disables those barriers at the point this code is executed, but in JDK 11 no such logic exists.  This code never resurrects that oop, so using the normal AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object.
> 
> tom


From igor.ignatyev at oracle.com  Thu Nov 21 07:33:19 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 23:33:19 -0800
Subject: RFR(S) : 8234290 : compiler/c2/Test6857159.java times out and
 fail to clean up files
In-Reply-To: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com>
References: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com>
Message-ID: <B5231169-CD06-44CE-ADC9-03757FBE2A5A@oracle.com>

ping?

-- Igor

> On Nov 16, 2019, at 10:07 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html
>> 67 lines changed: 16 ins; 24 del; 27 mod;
> 
> Hi all,
> 
> could you please review this small fix for Test6857159 test?
> from JBS:
>> the test has -XX:CompileOnly=compiler.c2.Test6857159$Test$ct::run, but there is no 'ct' class; there are ct[0-2], and ct0 is the only one which has a 'run' method. shouldNotContain("COMPILE SKIPPED") and shouldContain("$ct0::run (16 bytes)"), which, I guess, were a defense against such a situation, didn't help b/c PrintCompilation output doesn't have 'COMPILE SKIPPED' lines but does have a 'made not compilable on levels 0 1 2 3 ... $ct0::run (16 bytes) excluded by CompileCommand' line.
> the patch fixes the CompileOnly value (actually replaces it w/ the correct CompileCommand), removes an extra layer, and makes the test use WhiteBox to check if ct0::run got compiled.
> 
> webrev: http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234290
> testing: 
> - compiler/c2/Test6857159.java once on linux-x64,windows-x64,macosx-x64
> - compiler/c2/Test6857159.java 100 times on windows-x64-debug (where all failures were seen so far)
> 
> Thanks,
> -- Igor
> 


From erik.osterlund at oracle.com  Thu Nov 21 08:40:12 2019
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Thu, 21 Nov 2019 09:40:12 +0100
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
In-Reply-To: <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>
Message-ID: <e8482585-3a67-5e07-dfc8-51550b97b650@oracle.com>

Hi Paul,

On 2019-11-20 23:22, Hohensee, Paul wrote:
> It's not just the zgc load barrier that's affected, it's every jcc and fused jcc (e.g., cmp/jcc and sub/jcc, because the pairs are issued on the same clock).

Yeah. We also have to mitigate against unconditional branches, like ret 
and jmp. So I suppose the name "jcc erratum"
is slightly misleading in this context.

> There's a code pattern attribute called ins_alignment(<n>) in the ad file, viz.
>
> ins_attrib ins_alignment(1);    // Required alignment attribute (must
>                                  // be a power of 2) specifies the
>                                  // alignment that some part of the
>                                  // instruction (not necessarily the
>                                  // start) requires.  If > 1, a
>                                  // compute_padding() function must be
>                                  // provided for the instruction
>
> Would it be possible to use/enhance ins_alignment() rather than do something zgc-specific?
> Thanks,

That is a good question. Unfortunately, there are a few problems 
applying such a strategy:

1) We do not want to constrain the alignment such that the instruction 
(+ specific offset) starts at, e.g., a 32-byte boundary. We 
want to be more relaxed and say that any alignment is fine... except the 
bad ones (crossing or ending at a 32-byte boundary). Otherwise I fear 
we will find ourselves bloating the code cache with unnecessary nops to 
align instructions that would never have been a problem. So in terms of 
alignment constraints, I think such a hammer is too big.
2) Another issue is that the alignment constraints do not apply to just 
one Mach node. Sometimes they apply to a fused op + jcc pair. We currently 
match the conditions and their branches separately, and a condition does 
not necessarily know that it is the condition of a branch (an and 
instruction, for example). So aligning the jcc, for example, is not 
necessarily going to help unless its alignment knows what its preceding 
instruction is and whether the two will be fused. Depending on 
that, we want different alignment properties. So here the hammer is 
seemingly too loose.

I'm not 100% sure what to suggest for the generic case, but perhaps:

After things stop moving around, add a pass over the Mach nodes, 
similar to branch shortening, that:

1) Sets up a new flag (Flags_intel_jcc_mitigation or something) to be 
used on Mach nodes to mark affected nodes.
2) Walks the Mach nodes and tags branches and the conditions used by fused 
branches (by walking edges), checking that the two are adjacent (by 
looking at the node index in the block), and possibly also checking that 
the condition is one of the affected instructions that will get fused.
3) Now that we know which Mach nodes (and sequences of macro-fused nodes) 
are problematic, we can put some code where the Mach nodes are emitted 
that checks for consecutively tagged nodes and injects nops into the code 
buffer if they would cross or end at 32-byte boundaries.
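The check in step 3 can be sketched as plain offset arithmetic. This assumes, as described above, that a branch (or a macro-fused condition + branch pair, treated as one unit of combined size) occupying bytes [start, start + size) is affected when it crosses or ends at a 32-byte boundary. The function names are hypothetical; this is not the code in the webrev.

```cpp
#include <cassert>
#include <cstdint>

// Size of the chunks that affected branches must not cross or end at.
constexpr std::uintptr_t kBoundary = 32;

// True if the byte range [start, start + size) crosses a 32-byte boundary or
// ends exactly at one. In both cases the first byte and the byte just past
// the end lie in different 32-byte chunks.
bool crosses_or_ends_at_boundary(std::uintptr_t start, std::uintptr_t size) {
  assert(size > 0 && size < kBoundary);
  return start / kBoundary != (start + size) / kBoundary;
}

// Number of nop bytes to emit before the sequence: zero if it is unaffected,
// otherwise just enough to move its start to the next 32-byte boundary.
std::uintptr_t padding_for(std::uintptr_t start, std::uintptr_t size) {
  if (!crosses_or_ends_at_boundary(start, size)) {
    return 0;
  }
  return kBoundary - start % kBoundary;
}
```

For a fused pair one would pass the combined size, for instance 5 for a 3-byte cmp followed by a 2-byte short jcc. Padding to the next boundary is also the smallest fix that works: any smaller padding leaves the start in the same 32-byte chunk while the end still reaches the next one.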

I suppose an alternative strategy is making sure that any problematic 
instruction sequence that would be fused is also fused into one Mach 
node, by sprinkling more rules in the AD file for the various forms of 
conditional branches that we think cover all the cases, and then 
applying the alignment constraint to individual nodes only. But it feels 
like that could be more intrusive and less efficient.

Since the generic problem is more involved compared to the simpler ZGC 
load barrier fix (which will need special treatment anyway), I would 
like to focus this RFE only on the ZGC load barrier branch, because it 
makes me sad when it has to suffer. Having said that, we will certainly 
look into fixing the generic problem too after this.

Thanks,
/Erik

From per.liden at oracle.com  Thu Nov 21 10:35:44 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 21 Nov 2019 11:35:44 +0100
Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc
 erratum in C2 load barrier
In-Reply-To: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
Message-ID: <1dd569ef-6651-f728-bcdb-684dcb7c61f2@oracle.com>

On 11/19/19 3:20 PM, erik.osterlund at oracle.com wrote:
> Hi,
> 
> Intel released an erratum (SKX102) which causes "unexpected system 
> behaviour" when branches
> (including fused conditional branches) cross or end at 64 byte boundaries.
> They are mitigating this by rolling out microcode updates that disable 
> micro op caching for
> conditional branches that cross or end at 32 byte boundaries. The 
> mitigation can cause
> performance regressions, unless affected branches are aligned properly.
> 
> The erratum and its mitigation are described in more detail in this 
> document published by Intel:
> https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf 
> 
> 
> My intention for this patch is to introduce the infrastructure to 
> determine that we may
> have an affected CPU, and mitigate this by aligning the most important 
> branch in the whole
> JVM: the ZGC load barrier fast path check. Perhaps similar methodology 
> can be reused later
> to solve this for other performance critical code, but that is outside 
> the scope of this CR.
> 
> The sprinkling of nops does not seem to cause regressions in workloads I 
> have tried, given a
> machine without the JCC mitigations.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8234160
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/

Looks good!

I think this is a good solution for ZGC for now. Solving this for all 
non-ZGC branches is a lot more involved. But if/when a more generic 
solution arrives, we can easily just remove this again.

/Per

> 
> Thanks,
> /Erik

From vladimir.x.ivanov at oracle.com  Thu Nov 21 11:02:08 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 21 Nov 2019 14:02:08 +0300
Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc
 erratum in C2 load barrier
In-Reply-To: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
Message-ID: <1e9031de-a5a9-fdf6-8284-bf3cc904320b@oracle.com>

Thanks for taking care of it, Erik.

Overall, the approach you chose looks promising.

I'll let Intel folks comment on the details of CPU model dispatching 
in VM_Version::compute_has_intel_jcc_erratum().

As an alternative solution, you could just align instructions on an 
8/16-byte boundary (for 5- and 10-byte instruction sequences, 
respectively). It'll definitely need more padding, but it looks easier 
to implement as well. Do you consider the additional padding risky from a 
performance perspective?
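To make the trade-off concrete, here is a minimal C++ sketch contrasting the two padding policies under discussion: always aligning a branch sequence to a fixed 8- or 16-byte boundary, versus padding only when a sequence would cross or end at a 32-byte boundary. All function names are hypothetical helpers, not HotSpot code, and offsets are assumed to be byte positions within a code buffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not HotSpot code.
constexpr uint32_t kJccBoundary = 32;

// Nop bytes needed to move `offset` up to the next multiple of `alignment`
// (a power of two). Returns 0 if `offset` is already aligned.
constexpr uint32_t padding_to_align(uint32_t offset, uint32_t alignment) {
  return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// True if `size` bytes starting at `offset` cross a 32-byte boundary or end
// exactly at one. `offset + size` is one past the last byte, so the two
// 32-byte chunk indices differ in both cases.
constexpr bool crosses_or_ends_at_boundary(uint32_t offset, uint32_t size) {
  return offset / kJccBoundary != (offset + size) / kJccBoundary;
}

// Policy 1: always align the sequence to a fixed boundary (8 or 16 bytes).
constexpr uint32_t pad_fixed_alignment(uint32_t offset, uint32_t alignment) {
  return padding_to_align(offset, alignment);
}

// Policy 2: pad only when the placement is actually bad, and then just enough
// to start the sequence on the next 32-byte boundary (assumes size < 32).
constexpr uint32_t pad_only_if_bad(uint32_t offset, uint32_t size) {
  return crosses_or_ends_at_boundary(offset, size)
             ? padding_to_align(offset, kJccBoundary)
             : 0;
}
```

For a 5-byte sequence at offset 26, the fixed 8-byte policy inserts 6 nop bytes even though the original placement was harmless, while the second policy inserts none; at offset 30 the second policy inserts 2 bytes so the sequence starts at 32.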

Regarding the implementation itself, it looks like MacroAssembler is the 
best place for it.

There are 3 parts (mostly independent) of the fix you put in the single 
place: what to do (how much padding needed), where to do (what code is 
affected), and when to apply it (based on whether hardware is affected 
or not).

Even if you want to start with ZGC load barrier, it would be nice to 
factor the machinery in such a way that it's easy to apply it in the new 
code.

For example, AbstractAssembler already holds some state which is managed 
in RAII-style (e.g., InstructionMark and ShortBranchVerifier).

You could introduce a new capability in MacroAssembler which 
conditionally pads jumps and conditional jumps.

I'm fine with doing the full refactoring later, but it would be nice to 
do first steps in that direction right away.
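A hypothetical sketch of such an RAII-style capability, modeled loosely on the InstructionMark idea: the "what" is the nop count computed in the constructor, the "where" is the scope wrapped around the branch sequence, and the "when" is a flag set from CPU detection. ToyAssembler, BranchPaddingScope and pad_branches are invented names for illustration, and the sketch assumes the caller knows the sequence size up front and that it is shorter than 32 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for an assembler code buffer; all names are hypothetical.
class ToyAssembler {
 public:
  size_t offset() const { return code_.size(); }
  void emit_byte(uint8_t b) { code_.push_back(b); }
  void nop(size_t n) { for (size_t i = 0; i < n; ++i) emit_byte(0x90); }
  const std::vector<uint8_t>& code() const { return code_; }

  // "When": set once, e.g. from CPU model detection.
  bool pad_branches = false;

 private:
  std::vector<uint8_t> code_;
};

// RAII scope: on entry, pad so the next `expected_size` bytes neither cross
// nor end at a 32-byte boundary; on exit, check the promised size was emitted.
class BranchPaddingScope {
 public:
  BranchPaddingScope(ToyAssembler& masm, size_t expected_size)
      : masm_(masm), expected_size_(expected_size) {
    assert(expected_size_ < 32 && "sketch assumes sequences shorter than 32 bytes");
    if (masm_.pad_branches) {
      size_t off = masm_.offset();
      if (off / 32 != (off + expected_size_) / 32) {
        masm_.nop(32 - off % 32);  // start the sequence on the next boundary
      }
    }
    start_ = masm_.offset();
  }
  ~BranchPaddingScope() {
    assert(masm_.offset() - start_ == expected_size_ && "size prediction was wrong");
  }

 private:
  ToyAssembler& masm_;
  size_t expected_size_;
  size_t start_ = 0;
};
```

Usage: wrap the emission of a jcc (or a fusable cmp + jcc pair) in a BranchPaddingScope; the destructor's assertion catches a size prediction that no longer matches what was actually emitted.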

Best regards,
Vladimir Ivanov

On 19.11.2019 17:20, erik.osterlund at oracle.com wrote:
> Hi,
> 
> Intel released an erratum (SKX102) which causes "unexpected system 
> behaviour" when branches
> (including fused conditional branches) cross or end at 64 byte boundaries.
> They are mitigating this by rolling out microcode updates that disable 
> micro op caching for
> conditional branches that cross or end at 32 byte boundaries. The 
> mitigation can cause
> performance regressions, unless affected branches are aligned properly.
> 
> The erratum and its mitigation are described in more detail in this 
> document published by Intel:
> https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf 
> 
> 
> My intention for this patch is to introduce the infrastructure to 
> determine that we may
> have an affected CPU, and mitigate this by aligning the most important 
> branch in the whole
> JVM: the ZGC load barrier fast path check. Perhaps similar methodology 
> can be reused later
> to solve this for other performance critical code, but that is outside 
> the scope of this CR.
> 
> The sprinkling of nops does not seem to cause regressions in workloads I 
> have tried, given a
> machine without the JCC mitigations.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8234160
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/
> 
> Thanks,
> /Erik

From vladimir.x.ivanov at oracle.com  Thu Nov 21 11:12:53 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 21 Nov 2019 14:12:53 +0300
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
In-Reply-To: <e8482585-3a67-5e07-dfc8-51550b97b650@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>
 <e8482585-3a67-5e07-dfc8-51550b97b650@oracle.com>
Message-ID: <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com>

(Missed Paul's and your response when sending previous email.)

> That is a good question. Unfortunately, there are a few problems 
> applying such a strategy:
> 
> 1) We do not want to constrain the alignment such that the instruction 
> (+ specific offset) sits at e.g. the beginning of a 32 byte boundary. We 
> want to be more loose and say that any alignment is fine... except the 
> bad ones (crossing and ending at a 32 byte boundary). Otherwise I fear 
> we will find ourselves bloating the code cache with unnecessary nops to 
> align instructions that would never have been a problem. So in terms of 
> alignment constraints, I think such a hammer is too big.

It would be interesting to have some data on that one. Aligning 5-byte 
instruction on 8-byte boundary wastes 3 bytes at most. For 10-byte 
sequence it wastes 6 bytes at most which doesn't sound good.

> 2) Another issue is that the alignment constraints apply not just to the 
> one Mach node. It's sometimes for a fused op + jcc. Since we currently 
> match the conditions and their branches separately (and the conditions 
> not necessarily knowing they are indeed conditions to a branch, like for 
> example an and instruction). So aligning the jcc for example is not 
> necessarily going to help, unless its alignment knows what its preceding 
> instruction is, and whether it will be fused or not. And depending on 
> that, we want different alignment properties. So here the hammer is 
> seemingly too loose.

I mentioned MacroAssembler in my previous email because I don't consider 
it a C2-specific problem. Stubs, the interpreter, and C1 are also affected, 
and we need to fix them too (considering that sitting on the edge of a cache 
line may cause unpredictable behavior).

Detecting instruction sequences is harder than aligning a single one, 
but still possible. And MacroAssembler can introduce a new "macro" 
instruction for conditional jumps which solves the detection problem 
once the code base migrates to it.

Best regards,
Vladimir Ivanov

> I'm not 100% sure what to suggest for the generic case, but perhaps:
> 
> After things stopped moving around, add a pass to the Mach nodes, 
> similar to branch shortening that:
> 
> 1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be 
> used on Mach nodes to mark affected nodes.
> 2) Walk the Mach nodes and tag branches and conditions used by fused 
> branches (by walking edges), checking that the two are adjacent (by 
> looking at the node index in the block), and possibly also checking that 
> it is one of the affected condition instructions that will get fused.
> 3) Now that we know what Mach nodes (and sequences of macro fused nodes) 
> are problematic, we can put some code where the mach nodes are emitted 
> that checks for consecutively tagged nodes and inject nops in the code 
> buffer if they cross or end at 32 byte boundaries.
> 
> I suppose an alternative strategy is making sure that any problematic 
> instruction sequence that would be fused, is also fused into one Mach 
> node by sprinkling more rules in the AD file for the various forms of 
> conditional branches that we think cover all the cases, and then 
> applying the alignment constraint on individual nodes only. But it feels 
> like that could be more intrusive and less efficient.
> 
> Since the generic problem is more involved compared to the simpler ZGC 
> load barrier fix (which will need special treatment anyway), I would 
> like to focus this RFE only on the ZGC load barrier branch, because it 
> makes me sad when it has to suffer. Having said that, we will certainly 
> look into fixing the generic problem too after this.


From vladimir.x.ivanov at oracle.com  Thu Nov 21 11:21:32 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 21 Nov 2019 14:21:32 +0300
Subject: Allocation of array copy can be eliminated in particular cases
In-Reply-To: <30768651574285448@vla1-a6eaa355d163.qloud-c.yandex.net>
References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>
 <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
 <30768651574285448@vla1-a6eaa355d163.qloud-c.yandex.net>
Message-ID: <7e0bbca0-56ea-e64f-1720-1c4fbed8f1ff@oracle.com>


>> Moreover, the transformation is already there:
>>
>> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388
> 
> Comment in line 2389 seems confusing to me:
> 
> // This works even if the length is not constant (clone or newArray).
> 
> When we clone array isn't the length constant and equal to the length of original array? I guess it cannot be different.

In the context of JIT compilers, a "constant" means a "compile-time 
constant": a value known to the JIT compiler. The constant case is when the 
array always has the same length (e.g., "original_array.length == 0").

What you are describing is an invariant on 2 values: the length of the 
original array and the length of the cloned array are equal. But all the 
JIT compiler knows about those values is (1) "original_array.length == 
cloned_array.length"; and (2) "original_array.length >= 0".

>> I haven't looked into the benchmarks you mentioned, but it looks like
>> cloned_array.length access is not the reason why cloned array is still
>> there.
> 
> Once I thought that cloned array is retained at run time because it's returned from method in original benchmark:
...
>  From this output I conclude that either I am missing something in my understanding of how the compiler and runtime work, or this is a bug.
> I will be happy to understand which of the two is correct :)

I assume Nils answered your question about why cloning isn't eliminated right now.

Best regards,
Vladimir Ivanov

From vladimir.x.ivanov at oracle.com  Thu Nov 21 11:22:13 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 21 Nov 2019 14:22:13 +0300
Subject: [14] RFR (S): 8234387: C2: Better support of operands with
 multiple match rules in AD files
In-Reply-To: <cd4d7bc7-23f0-bd30-1846-f7aa72fad4d8@oracle.com>
References: <30655159-7431-1a33-cb10-373e32c68002@oracle.com>
 <cd4d7bc7-23f0-bd30-1846-f7aa72fad4d8@oracle.com>
Message-ID: <12d293f8-3ecd-2027-f731-b0ee1361250c@oracle.com>

Thanks, Tobias.

Best regards,
Vladimir Ivanov

On 21.11.2019 09:13, Tobias Hartmann wrote:
> Hi Vladimir,
> 
> looks good to me.
> 
> Best regards,
> Tobias
> 
> On 19.11.19 14:00, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00
>> https://bugs.openjdk.java.net/browse/JDK-8234387
>>
>> Though ADLC accepts operands with multiple match rules, it doesn't generate correct code to handle
>> them except the first one.
>>
>> It doesn't cause any noticeable problems for existing code, but is a major limitation for generic
>> vector operands (JDK-8234391 [1]).
>>
>> Proposed fix enumerates all match rules.
>>
>> Fixed some missing declarations along the way.
>>
>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>> Reviewed-by: vlivanov, sviswanathan, ?
>>
>> Testing: tier1-4 (both with and without generic vectors)
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8234391

From nils.eliasson at oracle.com  Thu Nov 21 11:53:43 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 21 Nov 2019 12:53:43 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
Message-ID: <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>

I updated this to version 2.

http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/

I found a problem running compiler/arguments/TestStressReflectiveCode.java

Even though the clone was created as an oop clone, the node type 
returns isa_aryptr. This is caused by the src ptr not being the base 
pointer. Until I fix that, I wanted a more robust test.

In this webrev I split up the is_clonebasic into is_clone_oop and 
is_clone_array. (is_clone_oop_array is already there). Having a complete 
set with the three clone types allows for a robust test and easy 
verification. (The three variants end up in different paths with 
different GCs).

Regards,

Nils


On 2019-11-20 15:25, Nils Eliasson wrote:
> Hi,
>
> I found a few bugs after the enabling of the clone intrinsic in ZGC.
>
> 1) The arraycopy clone_basic has the parameters adjusted to work as a 
> memcopy. For an oop the src is pointing inside the oop to where we 
> want to start copying. But when we want to do a runtime call to clone 
> - the parameters are supposed to be the actual src oop and dst oop, 
> and the size should be the instance size.
>
> For now I have made a workaround. What should be done later is using 
> the offset in the arraycopy node to encode where the payload is, so 
> that the base pointers are always correct. But that would require 
> changes to the BarrierSet classes of all GCs. So I leave that for next 
> release.
>
> 2) The size parameter of the TypeFunc for the runtime call has the 
> wrong type. It was originally a long but was missing the upper half; it 
> was changed to int (JDK-8233834), but that is wrong and causes the 
> compiles to be skipped. We didn't notice that since they failed 
> silently. That is also why we didn't notice problem #1.
>
> https://bugs.openjdk.java.net/browse/JDK-8234520
>
> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>
> Please review!
>
> Nils
>
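Nils's first point can be illustrated with a small C++ sketch: the memcpy-style arguments point past the object header, and the runtime clone call needs them converted back to base addresses and a full instance size. The 16-byte header and the struct names are assumptions for illustration only, not HotSpot's layout or ArrayCopyNode fields; as Nils notes, a more robust fix would carry the payload offset in the arraycopy node itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative only: header size and struct names are assumptions.
constexpr size_t kHeaderBytes = 16;

// Arguments as the memcpy-style basic clone sees them: pointers to where
// copying starts (just past the header) and the number of payload bytes.
struct PayloadCopyArgs {
  uintptr_t src_payload;
  uintptr_t dst_payload;
  size_t payload_bytes;
};

// Arguments a runtime clone call expects: object base addresses and the
// full instance size.
struct RuntimeCloneArgs {
  uintptr_t src_oop;
  uintptr_t dst_oop;
  size_t instance_bytes;
};

// Recover the runtime-call arguments by undoing the payload adjustment.
constexpr RuntimeCloneArgs to_runtime_args(PayloadCopyArgs a, size_t payload_offset) {
  return RuntimeCloneArgs{a.src_payload - payload_offset,
                          a.dst_payload - payload_offset,
                          a.payload_bytes + payload_offset};
}
```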

From claes.redestad at oracle.com  Thu Nov 21 13:21:31 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 21 Nov 2019 14:21:31 +0100
Subject: RFR: 8234328: VectorSet::clear can cause fragmentation
In-Reply-To: <CAA-vtUzJ_QQU8qewhzVt_tyv-Hot434gLT3+aZb4uwzM+mevPQ@mail.gmail.com>
References: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com>
 <CAA-vtUzJ_QQU8qewhzVt_tyv-Hot434gLT3+aZb4uwzM+mevPQ@mail.gmail.com>
Message-ID: <aa9bc4db-35cf-d4dd-a1e1-798590690fae@oracle.com>

Hi Thomas,

On 2019-11-19 20:31, Thomas Stüfe wrote:
> Hi Claes,
> 
> Not that this is wrong, but do we have to live in resource area? I fell 
> over such problems several times already, e.g. with resource-area-backed 
> StringStreams. Maybe it would be better to just forbid resizing of 
> RA-allocated arrays altogether.

this all gets a bit out of scope, but what alternatives do you see to
living in the resource area in general for something like this?

> 
> Then there is also the problem with passing RA-allocated arrays down the 
> stack and accidentally resizing them under a different ResourceMark. I 
> am not sure if this could happen with VectorSet though.

AFAICT there's a ResourceMark at the entry point of compilation,
then all others are restricted to a local scope around logging and
similar, so it doesn't _look_ like there are any potential issues here.

I guess it'd be nice in general with some sort of debug-only
ProhibitResourceMark(Arena*) you could wrap around calls into utilities
out of your control which would assert if any code tries to
allocate/reallocate/free in a specific resource arena.
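One possible shape for the debug-only guard suggested above, sketched in C++ against a toy arena rather than HotSpot's Arena class. ProhibitResourceMark is hypothetical (it is only an idea in this thread), and only allocate() is guarded here; reallocate and free would check the same counter. In HotSpot such a guard would presumably be compiled only in debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class ProhibitResourceMark;

// Toy arena; a stand-in for a resource arena, not HotSpot's class.
class ToyArena {
 public:
  void* allocate(size_t bytes) {
    assert(prohibit_depth_ == 0 && "allocation in a prohibited arena");
    chunks_.emplace_back(new char[bytes]);
    return chunks_.back().get();
  }
  bool allocation_prohibited() const { return prohibit_depth_ > 0; }

 private:
  friend class ProhibitResourceMark;
  int prohibit_depth_ = 0;
  std::vector<std::unique_ptr<char[]>> chunks_;
};

// Hypothetical guard: while in scope, allocate() on the guarded arena trips
// the assert above. A depth counter lets scopes nest.
class ProhibitResourceMark {
 public:
  explicit ProhibitResourceMark(ToyArena* arena) : arena_(arena) { ++arena_->prohibit_depth_; }
  ~ProhibitResourceMark() { --arena_->prohibit_depth_; }
  ProhibitResourceMark(const ProhibitResourceMark&) = delete;
  ProhibitResourceMark& operator=(const ProhibitResourceMark&) = delete;

 private:
  ToyArena* arena_;
};
```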

Thanks /Claes

> 
> Thanks, Thomas
> 
> On Tue, Nov 19, 2019 at 11:16 AM Claes Redestad 
> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>     Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/

From aph at redhat.com  Thu Nov 21 14:04:07 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 21 Nov 2019 14:04:07 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <DB7PR08MB3115C5D0762DA2EEB3B7DF58964C0@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <DB7PR08MB3115D5D4659DC9A7886F2A44964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
 <DB7PR08MB311548BD55E8D8F07D3E97DF964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>
 <DB7PR08MB3115C5D0762DA2EEB3B7DF58964C0@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com>

On 11/19/19 10:03 AM, Pengfei Li (Arm Technology China) wrote:
>> We should have a flag which is set if the search for nicely-aligned 
>> memory is successful, and then you can use that flag to determine if r27 is needed.
> In which file do you think we should add the flag? Can we just check the value of CompressedKlassPointers::base() in reg_mask_init() ?

I would have the #ifdef AARCH64 code that allocates the memory call into
a static method Assembler::setCompressedBaseAndScale().

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From tobias.hartmann at oracle.com  Thu Nov 21 14:10:03 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Nov 2019 15:10:03 +0100
Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC
In-Reply-To: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com>
References: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com>
Message-ID: <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com>

Hi Vladimir,

looks good to me.

Best regards,
Tobias

On 19.11.19 14:40, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8234394
> 
> Introduce new "placeholder" register class which denotes that instructions which use operands of
> such class should dynamically query register masks from the operand instance and not hard-code them
> in the code.
> 
> It is required for generic vectors in order to support generic vector operand (vec/legVec)
> replacement with fixed-sized vector operands (vec[SDXYZ]/legVec[SDXYZ]) after matching is over.
> 
> As an example of usage, generic vector operand is declared as:
> 
> operand vec() %{
>   constraint(ALLOC_IN_RC(dynamic));
>   match(VecX);
>   match(VecY);
>   match(VecZ);
>   match(VecS);
>   match(VecD);
> ...
> 
> Then for an instruction which uses vec as DEF
> 
> x86.ad:
> instruct loadV4(vec dst, memory mem) %{
> 
> =ADLC=>
> 
> ad_x86_misc.cpp:
> const RegMask &loadV4Node::out_RegMask() const {
>   return (*_opnds[0]->in_RegMask(0));
> }
> 
> vs
> 
> x86.ad:
> instruct loadV4(vecS dst, memory mem) %{
> 
> =ADLC=>
> 
> ad_x86_misc.cpp:
> const RegMask &loadV4Node::out_RegMask() const {
>   return (VECTORS_REG_VLBWDQ_mask());
> }
> 
> 
> An operand with dynamic register class can't be used during code emission and should be replaced
> with something different before register allocation:
> 
> const RegMask *vecOper::in_RegMask(int index) const {
>   return &RegMask::Empty;
> }
> 
> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
> Reviewed-by: vlivanov, sviswanathan, ?
> 
> Testing: tier1-4 (both with and without generic vector operands)
> 
> Best regards,
> Vladimir Ivanov
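The dynamic register class mechanism in this proposal can be modeled in plain C++: the generic operand reports an empty mask, the node's out_RegMask() asks its operand instead of returning a hard-coded mask, and a post-matching step swaps in a fixed-size operand. The class names, the specialize() helper, and the mask bit patterns are illustrative stand-ins, not the ADLC-generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Toy register mask: one bit per register. Not HotSpot's RegMask.
struct RegMask {
  uint64_t bits;
  bool is_empty() const { return bits == 0; }
};

const RegMask kEmptyMask{0};
const RegMask kVecSMask{0x00FFu};  // illustrative registers for 4-byte vectors
const RegMask kVecXMask{0xFF00u};  // illustrative registers for 16-byte vectors

struct Operand {
  virtual ~Operand() = default;
  virtual const RegMask* in_RegMask(int index) const = 0;
};

// Generic "dynamic" operand: a placeholder with no registers of its own.
struct GenericVecOper : Operand {
  const RegMask* in_RegMask(int) const override { return &kEmptyMask; }
};

// Fixed-size operands that replace the placeholder after matching.
struct VecSOper : Operand {
  const RegMask* in_RegMask(int) const override { return &kVecSMask; }
};
struct VecXOper : Operand {
  const RegMask* in_RegMask(int) const override { return &kVecXMask; }
};

// The node queries its operand at run time instead of a hard-coded mask.
struct LoadVectorNode {
  std::unique_ptr<Operand> opnds[1];
  const RegMask& out_RegMask() const { return *opnds[0]->in_RegMask(0); }
};

// Hypothetical post-matching step: pick a fixed-size operand by vector length.
void specialize(LoadVectorNode& n, int vector_bytes) {
  if (vector_bytes == 4) n.opnds[0].reset(new VecSOper());
  else n.opnds[0].reset(new VecXOper());
}
```

The empty mask on the generic operand is what makes a missed replacement visible: a node still holding it has no allocatable registers.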

From tobias.hartmann at oracle.com  Thu Nov 21 14:14:57 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 21 Nov 2019 15:14:57 +0100
Subject: RFR(S) : 8234290 : compiler/c2/Test6857159.java times out and
 fail to clean up files
In-Reply-To: <B5231169-CD06-44CE-ADC9-03757FBE2A5A@oracle.com>
References: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com>
 <B5231169-CD06-44CE-ADC9-03757FBE2A5A@oracle.com>
Message-ID: <84f88569-8467-7a01-2941-5f69c1ee862b@oracle.com>

Hi Igor,

nice cleanup. Looks good to me.

Best regards,
Tobias

On 21.11.19 08:33, Igor Ignatyev wrote:
> ping?
> 
> -- Igor
> 
>> On Nov 16, 2019, at 10:07 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
>>
>> http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html
>>> 67 lines changed: 16 ins; 24 del; 27 mod;
>>
>> Hi all,
>>
>> could you please review this small fix for Test6857159 test?
>> from JBS:
>>> the test has -XX:CompileOnly=compiler.c2.Test6857159$Test$ct::run, but there is no 'ct' class; there are ct[0-2], and ct0 is the only one that has a 'run' method. shouldNotContain("COMPILE SKIPPED") and shouldContain("$ct0::run (16 bytes)"), which, I guess, were a defense against such a situation, didn't help because the PrintCompilation output doesn't have 'COMPILE SKIPPED' lines and has a 'made not compilable on levels 0 1 2 3 ... $ct0::run (16 bytes) excluded by CompileCommand' line.
>> the patch fixes the CompileOnly value (actually replaces it with the correct CompileCommand), removes an extra layer, and makes the test use WhiteBox to check whether ct0::run got compiled.
>>
>> webrev: http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234290
>> testing: 
>> - compiler/c2/Test6857159.java once on linux-x64,windows-x64,macosx-x64
>> - compiler/c2/Test6857159.java 100 times on windows-x64-debug (where all failures were seen so far)
>>
>> Thanks,
>> -- Igor
>>
> 

From vladimir.x.ivanov at oracle.com  Thu Nov 21 14:30:10 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 21 Nov 2019 17:30:10 +0300
Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC
In-Reply-To: <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com>
References: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com>
 <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com>
Message-ID: <be94f774-130f-8996-d739-21378c62087e@oracle.com>

Thanks, Tobias.

Best regards,
Vladimir Ivanov

On 21.11.2019 17:10, Tobias Hartmann wrote:
> Hi Vladimir,
> 
> looks good to me.
> 
> Best regards,
> Tobias
> 
> On 19.11.19 14:40, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8234394
>>
>> Introduce new "placeholder" register class which denotes that instructions which use operands of
>> such class should dynamically query register masks from the operand instance and not hard-code them
>> in the code.
>>
>> It is required for generic vectors in order to support generic vector operand (vec/legVec)
>> replacement with fixed-sized vector operands (vec[SDXYZ]/legVec[SDXYZ]) after matching is over.
>>
>> As an example of usage, generic vector operand is declared as:
>>
>> operand vec() %{
>>   constraint(ALLOC_IN_RC(dynamic));
>>   match(VecX);
>>   match(VecY);
>>   match(VecZ);
>>   match(VecS);
>>   match(VecD);
>> ...
>>
>> Then for an instruction which uses vec as DEF
>>
>> x86.ad:
>> instruct loadV4(vec dst, memory mem) %{
>>
>> =ADLC=>
>>
>> ad_x86_misc.cpp:
>> const RegMask &loadV4Node::out_RegMask() const {
>>   return (*_opnds[0]->in_RegMask(0));
>> }
>>
>> vs
>>
>> x86.ad:
>> instruct loadV4(vecS dst, memory mem) %{
>>
>> =ADLC=>
>>
>> ad_x86_misc.cpp:
>> const RegMask &loadV4Node::out_RegMask() const {
>>   return (VECTORS_REG_VLBWDQ_mask());
>> }
>>
>>
>> An operand with dynamic register class can't be used during code emission and should be replaced
>> with something different before register allocation:
>>
>> const RegMask *vecOper::in_RegMask(int index) const {
>>   return &RegMask::Empty;
>> }
>>
>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>> Reviewed-by: vlivanov, sviswanathan, ?
>>
>> Testing: tier1-4 (both with and without generic vector operands)
>>
>> Best regards,
>> Vladimir Ivanov

From erik.osterlund at oracle.com  Thu Nov 21 15:51:05 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Thu, 21 Nov 2019 16:51:05 +0100
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
In-Reply-To: <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>
 <e8482585-3a67-5e07-dfc8-51550b97b650@oracle.com>
 <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com>
Message-ID: <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com>

Hi Vladimir,

Thank you for reviewing this patch.

On 11/21/19 12:12 PM, Vladimir Ivanov wrote:
> (Missed Paul's and your response when sending previous email.)
>
>> That is a good question. Unfortunately, there are a few problems 
>> applying such a strategy:
>>
>> 1) We do not want to constrain the alignment such that the 
>> instruction (+ specific offset) sits at e.g. the beginning of a 32 
>> byte boundary. We want to be more loose and say that any alignment is 
>> fine... except the bad ones (crossing and ending at a 32 byte 
>> boundary). Otherwise I fear we will find ourselves bloating the code 
>> cache with unnecessary nops to align instructions that would never 
>> have been a problem. So in terms of alignment constraints, I think 
>> such a hammer is too big.
>
> It would be interesting to have some data on that one. Aligning 5-byte 
> instruction on 8-byte boundary wastes 3 bytes at most. For 10-byte 
> sequence it wastes 6 bytes at most which doesn't sound good.

I think you missed one of my points (#2), which is that aligning single 
instructions is not enough to remedy the problem.
For example, consider you have an add; jne; sequence. Let's say we 
decide on a magical alignment that we apply globally for
jcc instructions.
Assume that the jne was indeed aligned correctly and we insert no nops. 
Then it is not necessarily the case that the fused
add + jne sequence has the desired alignment property as well (i.e. the 
add might cross 32 byte boundaries, tainting the
macro fused micro code). Therefore, this will not work.
I suppose that if you would always put in at least one nop before the 
jcc to break all macro fusions of branches globally,
then you will be able to do that. But that seems like a larger hammer 
than what we need.
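The add; jne example can be expressed as a small C++ model. The offsets and sizes are illustrative assumptions (a 3-byte add followed by a 2-byte short jne), and the helper names are hypothetical; the point is that the jcc alone is well placed while the fused pair still crosses a 32-byte boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: an emitted instruction is an offset plus a length.
struct Insn {
  uint32_t offset;
  uint32_t size;
};

// True if [offset, offset + size) crosses or ends at a 32-byte boundary.
constexpr bool bad_placement(uint32_t offset, uint32_t size) {
  return offset / 32 != (offset + size) / 32;
}

// Checking the jcc alone only looks at the branch's own bytes.
constexpr bool jcc_alone_is_bad(Insn jcc) {
  return bad_placement(jcc.offset, jcc.size);
}

// When the preceding ALU instruction macro-fuses with the jcc, the fused
// micro-op covers both instructions, so the whole span must be checked.
// Assumes `op` immediately precedes `jcc` (op.offset + op.size == jcc.offset).
constexpr bool fused_pair_is_bad(Insn op, Insn jcc) {
  return bad_placement(op.offset, op.size + jcc.size);
}
```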

>
>> 2) Another issue is that the alignment constraints apply not just to 
>> the one Mach node. It's sometimes for a fused op + jcc. Since we 
>> currently match the conditions and their branches separately (and the 
>> conditions not necessarily knowing they are indeed conditions to a 
>> branch, like for example an and instruction). So aligning the jcc for 
>> example is not necessarily going to help, unless its alignment knows 
>> what its preceding instruction is, and whether it will be fused or 
>> not. And depending on that, we want different alignment properties. 
>> So here the hammer is seemingly too loose.
>
> I mentioned MacroAssembler in previous email, because I don't consider 
> it as C2-specific problem. Stubs, interpreter, and C1 are also 
> affected and we need to fix them too (considering being on the edge of 
> cache line may cause unpredictable behavior).

I disagree that this is a correctness fix. The correctness fix is for 
branches on 64 byte boundaries, and is being dealt with
using micro code updates (that disables micro op caching of the 
problematic branch and fused branch micro ops). What
we are dealing with here is mitigating the performance hit of Intel's 
correctness mitigation for the erratum, which involves
branches and fused branches crossing or ending at 32 byte boundaries. In 
other words, the correctness is dealt with elsewhere,
and we are optimizing the code to avoid regressions for performance 
sensitive branches, due to that correctness fix.
Therefore, I think it is wise to focus the optimization efforts where it 
matters the most: C2.

> Detecting instruction sequencies is harder than aligning a single one, 
> but still possible. And MacroAssembler can introduce a new "macro" 
> instruction for conditional jumps which solves the detection problem 
> once the code base migrate to it.

Maybe. As I said, alignment is not enough, because it does not catch 
problematic macro fusing. We would have to do something
more drastic like intentionally breaking all macro fusion by putting 
leading nops regardless of alignment in branches. I'm not
sure what I think about that, given that we do this to optimize the code 
(and not as a correctness fix).

I don't think that doing some analysis on the Mach nodes and injecting 
the padding only where we actually need it is too
complicated in C2 (which I believe, at least for now, is where we should 
focus).

I have made a prototype of what this might look like:
http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/

Incremental:
http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00_01/

The idea is pretty much what I said to Paul. There are 3 hooks needed:
1) Apply pessimistic size measurements during branch shortening on 
affected nodes
2) Analyze which nodes will fuse, and tag all affected mach nodes with a 
flag
3) When emitting the code, add required padding on the flagged nodes 
that end at or cross 32 byte boundaries.
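The three hooks can be modeled on a toy basic block. The node kinds, the jcc_flag field, and the 31-byte worst-case padding bound are assumptions made for this sketch; the actual prototype is the webrev linked above, and real C2 would find fusion candidates from Mach node edges and opcodes rather than a simple kind enum. Consecutively tagged nodes are treated as one run, a conservative simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-ins for C2 Mach nodes.
enum class Kind { Other, FusableAlu, Jcc };

struct Node {
  Kind kind;
  uint32_t size;   // encoded size in bytes
  bool jcc_flag;   // hook 2: "affected by the erratum"
};

// Hook 1: while sizes are still estimates (branch shortening), assume the
// worst-case padding on tagged nodes so later padding cannot break the
// shortening decisions. 31 bytes bounds the padding for runs under 32 bytes.
constexpr uint32_t kMaxPadding = 31;
uint32_t pessimistic_size(const Node& n) {
  return n.size + (n.jcc_flag ? kMaxPadding : 0);
}

// Hook 2: tag every jcc, plus the adjacent ALU node that may fuse with it.
void tag_affected(std::vector<Node>& block) {
  for (size_t i = 0; i < block.size(); ++i) {
    if (block[i].kind != Kind::Jcc) continue;
    block[i].jcc_flag = true;
    if (i > 0 && block[i - 1].kind == Kind::FusableAlu) block[i - 1].jcc_flag = true;
  }
}

// Hook 3: at emission, pad before each run of tagged nodes that would cross
// or end at a 32-byte boundary. Returns the total nop bytes inserted.
uint32_t emit(const std::vector<Node>& block, uint32_t start_offset) {
  uint32_t offset = start_offset;
  uint32_t total_padding = 0;
  for (size_t i = 0; i < block.size();) {
    if (!block[i].jcc_flag) { offset += block[i].size; ++i; continue; }
    uint32_t run = 0;
    size_t j = i;
    while (j < block.size() && block[j].jcc_flag) run += block[j++].size;
    if (offset / 32 != (offset + run) / 32) {
      uint32_t pad = 32 - offset % 32;
      offset += pad;
      total_padding += pad;
    }
    offset += run;
    i = j;
  }
  return total_padding;
}
```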

I haven't run exhaustive tests or measurements yet on this. I thought we 
should sync ideas so we agree about direction before I do too much.

What do you think?

Thanks,
/Erik

> Best regards,
> Vladimir Ivanov
>
>> I'm not 100% sure what to suggest for the generic case, but perhaps:
>>
>> After things stopped moving around, add a pass to the Mach nodes, 
>> similar to branch shortening that:
>>
>> 1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be 
>> used on Mach nodes to mark affected nodes.
>> 2) Walk the Mach nodes and tag branches and conditions used by fused 
>> branches (by walking edges), checking that the two are adjacent (by 
>> looking at the node index in the block), and possibly also checking 
>> that it is one of the affected condition instructions that will get 
>> fused.
>> 3) Now that we know what Mach nodes (and sequences of macro fused 
>> nodes) are problematic, we can put some code where the mach nodes are 
>> emitted that checks for consecutively tagged nodes and inject nops in 
>> the code buffer if they cross or end at 32 byte boundaries.
>>
>> I suppose an alternative strategy is making sure that any problematic 
>> instruction sequence that would be fused, is also fused into one Mach 
>> node by sprinkling more rules in the AD file for the various forms of 
>> conditional branches that we think cover all the cases, and then 
>> applying the alignment constraint on individual nodes only. But it 
>> feels like that could be more intrusive and less efficient.
>>
>> Since the generic problem is more involved compared to the simpler 
>> ZGC load barrier fix (which will need special treatment anyway), I 
>> would like to focus this RFE only on the ZGC load barrier branch, 
>> because it makes me sad when it has to suffer. Having said that, we 
>> will certainly look into fixing the generic problem too after this.
>


From fweimer at redhat.com  Thu Nov 21 16:05:44 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Thu, 21 Nov 2019 17:05:44 +0100
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
In-Reply-To: <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com> (Paul
 Hohensee's message of "Wed, 20 Nov 2019 22:22:30 +0000")
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>
Message-ID: <87d0dlkyef.fsf@oldenburg2.str.redhat.com>

* Paul Hohensee:

> It's not just the zgc load barrier that's affected, it's every jcc and
> fused jcc (e.g., cmp/jcc and sub/jcc, because the pairs are issued on
> the same clock).

The microcode update reportedly affects all of JMP/Jcc/CALL/RET subject
to the alignment constraints, not just Jcc.  Therefore, mitigating the
performance impact and papering over the original issue require
different machine code.

Thanks,
Florian


From vladimir.kozlov at oracle.com  Thu Nov 21 17:44:00 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Nov 2019 09:44:00 -0800
Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC
In-Reply-To: <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com>
References: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com>
 <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com>
Message-ID: <4a0741bf-77b6-eb67-87f6-ae4da7e4c3e0@oracle.com>

+1

Vladimir K

On 11/21/19 6:10 AM, Tobias Hartmann wrote:
> Hi Vladimir,
> 
> looks good to me.
> 
> Best regards,
> Tobias
> 
> On 19.11.19 14:40, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8234394
>>
>> Introduce new "placeholder" register class which denotes that instructions which use operands of
>> such class should dynamically query register masks from the operand instance and not hard-code them
>> in the code.
>>
>> It is required for generic vectors in order to support generic vector operand (vec/legVec)
>> replacement with fixed-sized vector operands (vec[SDXYZ]/legVec[SDXYZ]) after matching is over.
>>
>> As an example of usage, generic vector operand is declared as:
>>
>> operand vec() %{
>>   constraint(ALLOC_IN_RC(dynamic));
>>   match(VecX);
>>   match(VecY);
>>   match(VecZ);
>>   match(VecS);
>>   match(VecD);
>> ...
>>
>> Then for an instruction which uses vec as DEF
>>
>> x86.ad:
>> instruct loadV4(vec dst, memory mem) %{
>>
>> =ADLC=>
>>
>> ad_x86_misc.cpp:
>> const RegMask &loadV4Node::out_RegMask() const {
>>   return (*_opnds[0]->in_RegMask(0));
>> }
>>
>> vs
>>
>> x86.ad:
>> instruct loadV4(vecS dst, memory mem) %{
>>
>> =ADLC=>
>>
>> ad_x86_misc.cpp:
>> const RegMask &loadV4Node::out_RegMask() const {
>>   return (VECTORS_REG_VLBWDQ_mask());
>> }
>>
>>
>> An operand with dynamic register class can't be used during code emission and should be replaced
>> with something different before register allocation:
>>
>> const RegMask *vecOper::in_RegMask(int index) const {
>>   return &RegMask::Empty;
>> }
>>
>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>> Reviewed-by: vlivanov, sviswanathan, ...
>>
>> Testing: tier1-4 (both with and without generic vector operands)
>>
>> Best regards,
>> Vladimir Ivanov

From vladimir.kozlov at oracle.com  Thu Nov 21 18:15:45 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Nov 2019 10:15:45 -0800
Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a
 phantom reference
In-Reply-To: <EB276519-BABD-4F83-A762-FEB63BC5EF7F@oracle.com>
References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>
 <EB276519-BABD-4F83-A762-FEB63BC5EF7F@oracle.com>
Message-ID: <d0b7c4a2-0fe0-19bd-8887-6cfd234d5e7a@oracle.com>

+1

Vladimir K

On 11/20/19 10:50 PM, Erik Österlund wrote:
> Hi Tom,
> 
> Looks good.
> 
> Thanks,
> /Erik
> 
>> On 21 Nov 2019, at 06:22, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>>
>> http://cr.openjdk.java.net/~never/8234359/webrev
>> https://bugs.openjdk.java.net/browse/JDK-8234359
>>
>> While testing the latest JVMCI in JDK11, crashes were occurring during draining of the SATB buffers.  The problem was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead in the current GC. Reading this oop using NativeAccess<ON_PHANTOM_OOP_REF> led to that oop being enqueued in the SATB buffer.  In JDK 14 it appears some other change in G1 disables those barriers at the point this code is executed, but in JDK11 no such logic exists.  This code never resurrects that oop, so using the normal AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object.
>>
>> tom
> 

From vladimir.kozlov at oracle.com  Thu Nov 21 18:21:42 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Nov 2019 10:21:42 -0800
Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a
 phantom reference
In-Reply-To: <d0b7c4a2-0fe0-19bd-8887-6cfd234d5e7a@oracle.com>
References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>
 <EB276519-BABD-4F83-A762-FEB63BC5EF7F@oracle.com>
 <d0b7c4a2-0fe0-19bd-8887-6cfd234d5e7a@oracle.com>
Message-ID: <62e3317b-c1f2-fedb-d2b1-a1d0fdc30bed@oracle.com>

On the other hand, there is a testing failure which seems to be 8234429.

Maybe we should hold this fix until 8234429 is resolved, and retest 
after it is fixed.

Thanks,
Vladimir

On 11/21/19 10:15 AM, Vladimir Kozlov wrote:
> +1
> 
> Vladimir K
> 
> On 11/20/19 10:50 PM, Erik Österlund wrote:
>> Hi Tom,
>>
>> Looks good.
>>
>> Thanks,
>> /Erik
>>
>>> On 21 Nov 2019, at 06:22, Tom Rodriguez <tom.rodriguez at oracle.com> 
>>> wrote:
>>>
>>> http://cr.openjdk.java.net/~never/8234359/webrev
>>> https://bugs.openjdk.java.net/browse/JDK-8234359
>>>
>>> While testing the latest JVMCI in JDK11, crashes were occurring 
>>> during draining of the SATB buffers. The problem was tracked down to 
>>> invalidate_nmethod_mirror being called on an nmethod whose 
>>> InstalledCode instance was also dead in the current GC. Reading this 
>>> oop using NativeAccess<ON_PHANTOM_OOP_REF> led to that oop being 
>>> enqueued in the SATB buffer. In JDK 14 it appears some other change 
>>> in G1 disables those barriers at the point this code is executed, but 
>>> in JDK11 no such logic exists. This code never resurrects that oop, 
>>> so using the normal AS_NO_KEEPALIVE semantics is correct and avoids 
>>> attempting to enqueue the potentially dead object.
>>>
>>> tom
>>

From igor.ignatyev at oracle.com  Thu Nov 21 22:16:03 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 21 Nov 2019 14:16:03 -0800
Subject: RFR(S) : 8234290 : compiler/c2/Test6857159.java times out and
 fail to clean up files
In-Reply-To: <84f88569-8467-7a01-2941-5f69c1ee862b@oracle.com>
References: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com>
 <B5231169-CD06-44CE-ADC9-03757FBE2A5A@oracle.com>
 <84f88569-8467-7a01-2941-5f69c1ee862b@oracle.com>
Message-ID: <6FEC2418-9848-4F25-86A2-FD2C88632184@oracle.com>

Tobias, thanks for your review, pushed.

-- Igor

> On Nov 21, 2019, at 6:14 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Igor,
> 
> nice cleanup. Looks good to me.
> 
> Best regards,
> Tobias
> 
> On 21.11.19 08:33, Igor Ignatyev wrote:
>> ping?
>> 
>> -- Igor
>> 
>>> On Nov 16, 2019, at 10:07 PM, Igor Ignatyev <igor.ignatyev at oracle.com> wrote:
>>> 
>>> http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html
>>>> 67 lines changed: 16 ins; 24 del; 27 mod;
>>> 
>>> Hi all,
>>> 
>>> could you please review this small fix for Test6857159 test?
>>> from JBS:
>>>> the test has -XX:CompileOnly=compiler.c2.Test6857159$Test$ct::run, but there is no 'ct' class; there are ct[0-2], and ct0 is the only one which has a 'run' method. shouldNotContain("COMPILE SKIPPED") and shouldContain("$ct0::run (16 bytes)"), which, I guess, were a defense against such a situation, didn't help because PrintCompilation output doesn't have 'COMPILE SKIPPED' lines and has a 'made not compilable on levels 0 1 2 3 ... $ct0::run (16 bytes) excluded by CompileCommand' line.
>>> the patch fixes the CompileOnly value (actually replaces it w/ the correct CompileCommand), removes an extra layer, and makes the test use WhiteBox to check if ct0::run got compiled.
>>> 
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234290
>>> testing: 
>>> - compiler/c2/Test6857159.java once on linux-x64,windows-x64,macosx-x64
>>> - compiler/c2/Test6857159.java 100 times on windows-x64-debug (where all failures were seen so far)
>>> 
>>> Thanks,
>>> -- Igor
>>> 
>> 


From sandhya.viswanathan at intel.com  Thu Nov 21 23:58:47 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Thu, 21 Nov 2019 23:58:47 +0000
Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc
 erratum in C2 load barrier
In-Reply-To: <1e9031de-a5a9-fdf6-8284-bf3cc904320b@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <1e9031de-a5a9-fdf6-8284-bf3cc904320b@oracle.com>
Message-ID: <CH2PR11MB428025DB139FEDDF8D18637CEF4E0@CH2PR11MB4280.namprd11.prod.outlook.com>

Hi Vladimir/Erik,

The CPU model list in VM_Version::compute_has_intel_jcc_erratum() looks correct and is per section 4.0 of the document:
https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

Best Regards,
Sandhya

 
-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Vladimir Ivanov
Sent: Thursday, November 21, 2019 3:02 AM
To: erik.osterlund at oracle.com; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier

Thanks for taking care of it, Erik.

Overall, the approach you chose looks promising.

I'll let the Intel folks comment on the details of the CPU model dispatching in VM_Version::compute_has_intel_jcc_erratum().

As an alternative solution, you could just align instructions on an 8/16-byte boundary (for 5- and 10-byte instruction sequences, respectively). It'll definitely need more padding, but it looks easier to implement as well. Do you consider the additional padding risky from a performance perspective?

Regarding the implementation itself, it looks like MacroAssembler is the best place for it.

You put 3 (mostly independent) parts of the fix in a single place: what to do (how much padding is needed), where to do it (what code is affected), and when to apply it (based on whether the hardware is affected or not).

Even if you want to start with the ZGC load barrier, it would be nice to factor the machinery in such a way that it's easy to apply to new code.

For example, AbstractAssembler already holds some state which is managed in RAII-style (e.g., InstructionMark and ShortBranchVerifier).

You could introduce a new capability in MacroAssembler which conditionally pads jumps and conditional jumps.

I'm fine with doing the full refactoring later, but it would be nice to do first steps in that direction right away.

Best regards,
Vladimir Ivanov

On 19.11.2019 17:20, erik.osterlund at oracle.com wrote:
> Hi,
> 
> Intel released an erratum (SKX102) which causes "unexpected system 
> behaviour" when branches (including fused conditional branches) cross 
> or end at 64 byte boundaries.
> They are mitigating this by rolling out microcode updates that disable 
> micro op caching for conditional branches that cross or end at 32 byte 
> boundaries. The mitigation can cause performance regressions, unless 
> affected branches are aligned properly.
> 
> The erratum and its mitigation are described in more detail in this 
> document published by Intel:
> https://www.intel.com/content/dam/support/us/en/documents/processors/m
> itigations-jump-conditional-code-erratum.pdf
> 
> 
> My intention for this patch is to introduce the infrastructure to 
> determine that we may have an affected CPU, and mitigate this by 
> aligning the most important branch in the whole
> JVM: the ZGC load barrier fast path check. Perhaps similar methodology 
> can be reused later to solve this for other performance critical code, 
> but that is outside the scope of this CR.
> 
> The sprinkling of nops does not seem to cause regressions in workloads I 
> have tried, given a machine without the JCC mitigations.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8234160
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/
> 
> Thanks,
> /Erik

From sandhya.viswanathan at intel.com  Fri Nov 22 01:27:34 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 22 Nov 2019 01:27:34 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
Message-ID: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>

On the Skylake platform, the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes, as per JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092).

When the user explicitly wants to use AVX3 on the Skylake platform, they need to invoke the JVM with the -XX:UseAVX=3 command-line argument.
This should automatically result in MaxVectorSize being set to 64 bytes.

However, after JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092), when -XX:UseAVX=3 is given on the command line, MaxVectorSize is being wrongly set to 16 bytes.
I have a patch which fixes the issue.

JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/

Please review and approve.

Best Regards,
Sandhya


From vladimir.kozlov at oracle.com  Fri Nov 22 02:02:43 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 21 Nov 2019 18:02:43 -0800
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
Message-ID: <cb77913b-32cd-2da7-d3d2-3f9c7c92063a@oracle.com>

Hi Sandhya,

I think you should put the cpuid code added by 8221092 under the if (use_evex) 
check, because if the user specified UseAVX=2, the code under (use_evex) will 
not be executed anyway. Or am I missing something?

I did not understand why you said MaxVectorSize is being wrongly set to 16 
bytes. It should be 32, because the current code will set UseAVX=1.

Thanks,
Vladimir

On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote:
> On the Skylake platform, the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes, as per JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092).
> 
> When the user explicitly wants to use AVX3 on the Skylake platform, they need to invoke the JVM with the -XX:UseAVX=3 command-line argument.
> This should automatically result in MaxVectorSize being set to 64 bytes.
> 
> However, after JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092), when -XX:UseAVX=3 is given on the command line, MaxVectorSize is being wrongly set to 16 bytes.
> I have a patch which fixes the issue.
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
> Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/
> 
> Please review and approve.
> 
> Best Regards,
> Sandhya
> 
> 

From Pengfei.Li at arm.com  Fri Nov 22 08:45:47 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Fri, 22 Nov 2019 08:45:47 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <DB7PR08MB3115D5D4659DC9A7886F2A44964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
 <DB7PR08MB311548BD55E8D8F07D3E97DF964D0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com>
 <DB7PR08MB3115C5D0762DA2EEB3B7DF58964C0@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com>
Message-ID: <DB7PR08MB3115AC735AAABBEEA51B439496490@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew,

> > In which file do you think we should add the flag? Can we just check the
> value of CompressedKlassPointers::base() in reg_mask_init() ?
> 
> I would call from the #ifdef AARCH64 code that allocates the memory into a
> static method Assembler::setCompressedBaseAndScale().

Thanks for your suggestion. I did try to set a flag from the metaspace reservation code, but now I'm switching back to my other approach. Below is my justification.

The #ifdef code block which allocates metaspace is actually used by both AARCH64 and AIX. Of course, we can add AArch64-specific logic inside with AARCH64_ONLY(), but it doesn't cover all scenarios in which r27 isn't used. In klass pointer encoding and decoding, we have a special path called use_XOR_for_compressed_class_base, where the metaspace may not fit nicely but r27 isn't used. [1]

Regarding your suggestion of passing the compressed base and shift values to the AArch64 assembler, it can solve the problem of covering the use_XOR_for_compressed_class_base path. But we would have to do it in Metaspace::set_narrow_klass_base_and_shift(), where the base and shift are finally determined, and introduce a new code block of "#ifdef AARCH64 #endif" in HotSpot shared code.

In my approach, I added a method in aarch64.ad to check the base and shift in reg_mask_init(), and moved the logic of use_XOR_for_compressed_class_base here from the MacroAssembler constructor. I know my implementation has a drawback: the logic of my new method may become misaligned with the encoding/decoding logic if someone changes the MacroAssembler code without noticing my code. So I also added a few lines of comments to prevent this from happening. See my updated webrev below.

http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.01/

Please let me know if you have any further suggestions or disagreements.

[1] http://hg.openjdk.java.net/jdk/jdk/file/fcd74557a9cc/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#l3918

--
Thanks,
Pengfei


From vladimir.x.ivanov at oracle.com  Fri Nov 22 12:39:07 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 22 Nov 2019 15:39:07 +0300
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
Message-ID: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>

Hi Sandhya,

> On the Skylake platform, the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes, as per JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092).
> 
> When the user explicitly wants to use AVX3 on the Skylake platform, they need to invoke the JVM with the -XX:UseAVX=3 command-line argument.
> This should automatically result in MaxVectorSize being set to 64 bytes.
> 
> However, after JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092), when -XX:UseAVX=3 is given on the command line, MaxVectorSize is being wrongly set to 16 bytes.

Please elaborate on how it happens and how legacy_setup affects it.

Why does 8221092 need the code you conditionally exclude?
Why isn't the following enough?

    if (FLAG_IS_DEFAULT(UseAVX)) {
      FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
+    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
+      FLAG_SET_DEFAULT(UseAVX, 2);  //Set UseAVX=2 for Skylake
+    }
    } else if (UseAVX > use_avx_limit) {

Best regards,
Vladimir Ivanov

From rwestrel at redhat.com  Fri Nov 22 13:19:36 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 22 Nov 2019 14:19:36 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <CAHjP37GtPX0p984_7Q-akN5gSGDJEgWDhJitWn2dHc8DRZFAdQ@mail.gmail.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
 <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com>
 <CAHjP37GtPX0p984_7Q-akN5gSGDJEgWDhJitWn2dHc8DRZFAdQ@mail.gmail.com>
Message-ID: <87mucoaw0n.fsf@redhat.com>


Hi Vitaly,

> Thanks for fixing this! :) Perhaps a bit too premature to ask but: any
> chance this will get backported to 11?

I just backported it to openjdk 11u.

Roland.


From vladimir.x.ivanov at oracle.com  Fri Nov 22 13:22:13 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 22 Nov 2019 16:22:13 +0300
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
In-Reply-To: <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>
 <e8482585-3a67-5e07-dfc8-51550b97b650@oracle.com>
 <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com>
 <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com>
Message-ID: <edb5323e-604c-7d7f-cf9b-8c9901979f9e@oracle.com>

Hi Erik,
>>> That is a good question. Unfortunately, there are a few problems
>>> applying such a strategy:
>>>
>>> 1) We do not want to constrain the alignment such that the 
>>> instruction (+ specific offset) sits at e.g. the beginning of a 32 
>>> byte boundary. We want to be more loose and say that any alignment is 
>>> fine... except the bad ones (crossing and ending at a 32 byte 
>>> boundary). Otherwise I fear we will find ourselves bloating the code 
>>> cache with unnecessary nops to align instructions that would never 
>>> have been a problem. So in terms of alignment constraints, I think 
>>> such a hammer is too big.
>>
>> It would be interesting to have some data on that one. Aligning 5-byte 
>> instruction on 8-byte boundary wastes 3 bytes at most. For 10-byte 
>> sequence it wastes 6 bytes at most which doesn't sound good.
> 
> I think you missed one of my points (#2), which is that aligning single 
> instructions is not enough to remedy the problem.
> For example, consider you have an add; jne; sequence. Let's say we 
> decide on a magical alignment that we apply globally for
> jcc instructions.
> Assume that the jne was indeed aligned correctly and we insert no nops. 
> Then it is not necessarily the case that the fused
> add + jne sequence has the desired alignment property as well (i.e. the 
> add might cross 32 byte boundaries, tainting the
> macro fused micro code). Therefore, this will not work.
> I suppose that if you would always put in at least one nop before the 
> jcc to break all macro fusions of branches globally,
> then you will be able to do that. But that seems like a larger hammer 
> than what we need.

I was thinking about aligning macro-fused instruction sequences and not 
individual jcc instructions. There are both automatic and cooperative 
ways to detect them.

>>> 2) Another issue is that the alignment constraints apply not just to 
>>> the one Mach node. It's sometimes for a fused op + jcc. Since we 
>>> currently match the conditions and their branches separately (and the 
>>> conditions not necessarily knowing they are indeed conditions to a 
>>> branch, like for example an and instruction). So aligning the jcc for 
>>> example is not necessarily going to help, unless its alignment knows 
>>> what its preceding instruction is, and whether it will be fused or 
>>> not. And depending on that, we want different alignment properties. 
>>> So here the hammer is seemingly too loose.
>>
>> I mentioned MacroAssembler in previous email, because I don't consider 
>> it as C2-specific problem. Stubs, interpreter, and C1 are also 
>> affected and we need to fix them too (considering being on the edge of 
>> cache line may cause unpredictable behavior).
> 
> I disagree that this is a correctness fix. The correctness fix is for 
> branches on 64 byte boundaries, and is being dealt with
> using micro code updates (that disables micro op caching of the 
> problematic branch and fused branch micro ops). What
> we are dealing with here is mitigating the performance hit of Intel's 
> correctness mitigation for the erratum, which involves
> branches and fused branches crossing or ending at 32 byte boundaries. In 
> other words, the correctness is dealt with elsewhere,
> and we are optimizing the code to avoid regressions for performance 
> sensitive branches, due to that correctness fix.
> Therefore, I think it is wise to focus the optimization efforts where it 
> matters the most: C2.

Point taken. I agree that the focus should be on performance.

But I'd include stubs as well. Many of them are extensively used from 
C2-generated code.

> I don't think that doing some analysis on the Mach nodes and injecting 
> the padding only where we actually need it is too
> complicated in C2 (which I believe, at least for now, is where we should 
> focus).

I'm curious: is there anything special about Mach IR that 
helps/simplifies the analysis?

One problem I see with it is that some mach nodes issue local branches 
for their own purposes (inside MachNode::emit()) and you can't mitigate 
such cases on the level of Mach IR.

> I have made a prototype, what this might look like and it looks like this:
> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/

Just one more comment: it's weird to see intel_jcc_erratum referenced in 
shared code. You could #ifdef it for x86-only, but it's much better to 
move the code to x86-specific location.

Best regards,
Vladimir Ivanov

> Incremental:
> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00_01/
> 
> The idea is pretty much what I said to Paul. There are 3 hooks needed:
> 1) Apply pessimistic size measurements during branch shortening on 
> affected nodes
> 2) Analyze which nodes will fuse, and tag all affected mach nodes with a 
> flag
> 3) When emitting the code, add required padding on the flagged nodes 
> that end at or cross 32 byte boundaries.
> 
> I haven't run exhaustive tests or measurements yet on this. I thought we 
> should sync ideas so we agree about direction before I do too much.
> 
> What do you think?

> 
> Thanks,
> /Erik
> 
>> Best regards,
>> Vladimir Ivanov
>>
>>> I'm not 100% sure what to suggest for the generic case, but perhaps:
>>>
>>> After things stopped moving around, add a pass to the Mach nodes, 
>>> similar to branch shortening that:
>>>
>>> 1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be 
>>> used on Mach nodes to mark affected nodes.
>>> 2) Walk the Mach nodes and tag branches and conditions used by fused 
>>> branches (by walking edges), checking that the two are adjacent (by 
>>> looking at the node index in the block), and possibly also checking 
>>> that it is one of the affected condition instructions that will get 
>>> fused.
>>> 3) Now that we know what Mach nodes (and sequences of macro fused 
>>> nodes) are problematic, we can put some code where the mach nodes are 
>>> emitted that checks for consecutively tagged nodes and inject nops in 
>>> the code buffer if they cross or end at 32 byte boundaries.
>>>
>>> I suppose an alternative strategy is making sure that any problematic 
>>> instruction sequence that would be fused, is also fused into one Mach 
>>> node by sprinkling more rules in the AD file for the various forms of 
>>> conditional branches that we think cover all the cases, and then 
>>> applying the alignment constraint on individual nodes only. But it 
>>> feels like that could be more intrusive and less efficient.
>>>
>>> Since the generic problem is more involved compared to the simpler 
>>> ZGC load barrier fix (which will need special treatment anyway), I 
>>> would like to focus this RFE only on the ZGC load barrier branch, 
>>> because it makes me sad when it has to suffer. Having said that, we 
>>> will certainly look into fixing the generic problem too after this.
>>
> 

From vladimir.x.ivanov at oracle.com  Fri Nov 22 13:50:09 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 22 Nov 2019 16:50:09 +0300
Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC
In-Reply-To: <4a0741bf-77b6-eb67-87f6-ae4da7e4c3e0@oracle.com>
References: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com>
 <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com>
 <4a0741bf-77b6-eb67-87f6-ae4da7e4c3e0@oracle.com>
Message-ID: <c6b46d1c-3a89-de29-4f66-0e61814b0837@oracle.com>

Thanks, Vladimir.

Best regards,
Vladimir Ivanov

On 21.11.2019 20:44, Vladimir Kozlov wrote:
> +1
> 
> Vladimir K
> 
> On 11/21/19 6:10 AM, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> looks good to me.
>>
>> Best regards,
>> Tobias
>>
>> On 19.11.19 14:40, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8234394
>>>
>>> Introduce new "placeholder" register class which denotes that 
>>> instructions which use operands of
>>> such class should dynamically query register masks from the operand 
>>> instance and not hard-code them
>>> in the code.
>>>
>>> It is required for generic vectors in order to support generic vector 
>>> operand (vec/legVec)
>>> replacement with fixed-sized vector operands 
>>> (vec[SDXYZ]/legVec[SDXYZ]) after matching is over.
>>>
>>> As an example of usage, generic vector operand is declared as:
>>>
>>> operand vec() %{
>>>   constraint(ALLOC_IN_RC(dynamic));
>>>   match(VecX);
>>>   match(VecY);
>>>   match(VecZ);
>>>   match(VecS);
>>>   match(VecD);
>>> ...
>>>
>>> Then for an instruction which uses vec as DEF
>>>
>>> x86.ad:
>>> instruct loadV4(vec dst, memory mem) %{
>>>
>>> =ADLC=>
>>>
>>> ad_x86_misc.cpp:
>>> const RegMask &loadV4Node::out_RegMask() const {
>>>   return (*_opnds[0]->in_RegMask(0));
>>> }
>>>
>>> vs
>>>
>>> x86.ad:
>>> instruct loadV4(vecS dst, memory mem) %{
>>>
>>> =ADLC=>
>>>
>>> ad_x86_misc.cpp:
>>> const RegMask &loadV4Node::out_RegMask() const {
>>>   return (VECTORS_REG_VLBWDQ_mask());
>>> }
>>>
>>>
>>> An operand with dynamic register class can't be used during code 
>>> emission and should be replaced
>>> with something different before register allocation:
>>>
>>> const RegMask *vecOper::in_RegMask(int index) const {
>>>   return &RegMask::Empty;
>>> }
>>>
>>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>>> Reviewed-by: vlivanov, sviswanathan, ...
>>>
>>> Testing: tier1-4 (both with and without generic vector operands)
>>>
>>> Best regards,
>>> Vladimir Ivanov

From vitalyd at gmail.com  Fri Nov 22 13:53:22 2019
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 22 Nov 2019 08:53:22 -0500
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <87mucoaw0n.fsf@redhat.com>
References: <878spbc0c8.fsf@redhat.com>
 <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
 <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com>
 <CAHjP37GtPX0p984_7Q-akN5gSGDJEgWDhJitWn2dHc8DRZFAdQ@mail.gmail.com>
 <87mucoaw0n.fsf@redhat.com>
Message-ID: <CAHjP37E38ceG_YZth4Kip-cDy_2wCZWpFjwQwpp9db1CeUuZBw@mail.gmail.com>

Hey Roland,

On Fri, Nov 22, 2019 at 8:19 AM Roland Westrelin <rwestrel at redhat.com>
wrote:

>
> Hi Vitaly,
>
> > Thanks for fixing this! :) Perhaps a bit too premature to ask but: any
> > chance this will get backported to 11?
>
> I just backported it to openjdk 11u.

Fantastic - thanks very much!

>
>
> Roland.
>
> --
Sent from my phone

From erik.osterlund at oracle.com  Fri Nov 22 13:55:52 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Fri, 22 Nov 2019 14:55:52 +0100
Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc
 erratum in C2 load barrier
In-Reply-To: <CH2PR11MB428025DB139FEDDF8D18637CEF4E0@CH2PR11MB4280.namprd11.prod.outlook.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <1e9031de-a5a9-fdf6-8284-bf3cc904320b@oracle.com>
 <CH2PR11MB428025DB139FEDDF8D18637CEF4E0@CH2PR11MB4280.namprd11.prod.outlook.com>
Message-ID: <e6218978-1075-9657-a515-dfdc2364dd7a@oracle.com>

Hi Sandhya,

Thanks for verifying the magic table.

/Erik

On 11/22/19 12:58 AM, Viswanathan, Sandhya wrote:
> Hi Vladimir/Eric,
>
> The CPU model list in VM_Version::compute_has_intel_jcc_erratum() looks correct and is per section 4.0 of the document:
> https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
>
> Best Regards,
> Sandhya
>
>   
>
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Vladimir Ivanov
> Sent: Thursday, November 21, 2019 3:02 AM
> To: erik.osterlund at oracle.com; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier
>
> Thanks for taking care of it, Erik.
>
> Overall, the approach you chose looks promising.
>
> I'll let the Intel folks comment on the details of the CPU model dispatching in VM_Version::compute_has_intel_jcc_erratum().
>
> As an alternative solution, you could just align instructions on an 8/16-byte boundary (for 5- and 10-byte instruction sequences, respectively). It'll definitely need more padding, but it looks easier to implement as well. Do you consider the additional padding risky from a performance perspective?
>
> Regarding the implementation itself, it looks like MacroAssembler is the best place for it.
>
> You put 3 (mostly independent) parts of the fix in a single place: what to do (how much padding is needed), where to do it (what code is affected), and when to apply it (based on whether the hardware is affected or not).
>
> Even if you want to start with the ZGC load barrier, it would be nice to factor the machinery in such a way that it's easy to apply to new code.
>
> For example, AbstractAssembler already holds some state which is managed in RAII-style (e.g., InstructionMark and ShortBranchVerifier).
>
> You could introduce a new capability in MacroAssembler which conditionally pads jumps and conditional jumps.
>
> I'm fine with doing the full refactoring later, but it would be nice to do first steps in that direction right away.
>
> Best regards,
> Vladimir Ivanov
>
> On 19.11.2019 17:20, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> Intel released an erratum (SKX102) which causes "unexpected system
>> behaviour" when branches (including fused conditional branches) cross
>> or end at 64 byte boundaries.
>> They are mitigating this by rolling out microcode updates that disable
>> micro op caching for conditional branches that cross or end at 32 byte
>> boundaries. The mitigation can cause performance regressions, unless
>> affected branches are aligned properly.
>>
>> The erratum and its mitigation are described in more detail in this
>> document published by Intel:
>> https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
>>
>>
>> My intention for this patch is to introduce the infrastructure to
>> determine that we may have an affected CPU, and mitigate this by
>> aligning the most important branch in the whole
>> JVM: the ZGC load barrier fast path check. Perhaps similar methodology
>> can be reused later to solve this for other performance critical code,
>> but that is outside the scope of this CR.
>>
>> The sprinkling of nops does not seem to cause regressions in workloads I
>> have tried, given a machine without the JCC mitigations.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8234160
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/
>>
>> Thanks,
>> /Erik
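For reference, the placement rule from the erratum can be written as a small predicate. This is a sketch, not HotSpot code: `is_affected` and `min_padding` are hypothetical helpers, and a branch (or macro-fused pair) is assumed to occupy bytes [offset, offset + size) of the code buffer. It counts as affected when it crosses a 32-byte boundary or ends exactly at one, and the padding helper finds the fewest nop bytes that avoid both.

```cpp
#include <cassert>
#include <cstddef>

// Boundary size used by the JCC erratum mitigation described above.
constexpr std::size_t kBoundary = 32;

// True if a sequence occupying [offset, offset + size) crosses a 32-byte
// boundary or ends exactly at one. Assumes size > 0.
bool is_affected(std::size_t offset, std::size_t size) {
  assert(size > 0);
  std::size_t last = offset + size - 1;                 // last byte of the sequence
  bool crosses = (offset / kBoundary) != (last / kBoundary);
  bool ends_at = ((offset + size) % kBoundary) == 0;    // next byte starts a new chunk
  return crosses || ends_at;
}

// Fewest nop bytes to emit before the sequence so it is not affected.
// Returns -1 when padding cannot help (size == 0 or size >= kBoundary).
int min_padding(std::size_t offset, std::size_t size) {
  if (size == 0 || size >= kBoundary) return -1;
  for (std::size_t pad = 0; pad < kBoundary; pad++) {
    if (!is_affected(offset + pad, size)) return static_cast<int>(pad);
  }
  return -1;  // not reached for 0 < size < kBoundary
}
```

For example, a 5-byte sequence at offset 28 needs 4 bytes of padding (it would otherwise cross 32), one at offset 27 needs 5 (it would otherwise end exactly at 32), and one at offset 0 needs none. Only bad placements receive nops, which is the looser constraint argued for later in the thread.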


From erik.osterlund at oracle.com  Fri Nov 22 14:23:29 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Fri, 22 Nov 2019 15:23:29 +0100
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
In-Reply-To: <edb5323e-604c-7d7f-cf9b-8c9901979f9e@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>
 <e8482585-3a67-5e07-dfc8-51550b97b650@oracle.com>
 <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com>
 <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com>
 <edb5323e-604c-7d7f-cf9b-8c9901979f9e@oracle.com>
Message-ID: <5c442ef3-be3b-e540-0097-5a5b00f5b94e@oracle.com>

Hi Vladimir,

On 11/22/19 2:22 PM, Vladimir Ivanov wrote:
> Hi Erik,
>>>> That is a good question. Unfortunately, there are a few problems
>>>> applying such a strategy:
>>>>
>>>> 1) We do not want to constrain the alignment such that the 
>>>> instruction (+ specific offset) sits at e.g. the beginning of a 32 
>>>> byte boundary. We want to be more loose and say that any alignment 
>>>> is fine... except the bad ones (crossing and ending at a 32 byte 
>>>> boundary). Otherwise I fear we will find ourselves bloating the 
>>>> code cache with unnecessary nops to align instructions that would 
>>>> never have been a problem. So in terms of alignment constraints, I 
>>>> think such a hammer is too big.
>>>
>>> It would be interesting to have some data on that one. Aligning 
>>> 5-byte instruction on 8-byte boundary wastes 3 bytes at most. For 
>>> 10-byte sequence it wastes 6 bytes at most which doesn't sound good.
>>
>> I think you missed one of my points (#2), which is that aligning 
>> single instructions is not enough to remedy the problem.
>> For example, consider you have an add; jne; sequence. Let's say we 
>> decide on a magical alignment that we apply globally for
>> jcc instructions.
>> Assume that the jne was indeed aligned correctly and we insert no 
>> nops. Then it is not necessarily the case that the fused
>> add + jne sequence has the desired alignment property as well (i.e. 
>> the add might cross 32 byte boundaries, tainting the
>> macro fused micro code). Therefore, this will not work.
>> I suppose that if you would always put in at least one nop before the 
>> jcc to break all macro fusions of branches globally,
>> then you will be able to do that. But that seems like a larger hammer 
>> than what we need.
>
> I was thinking about aligning macro-fused instruction sequences and 
> not individual jcc instructions. There are both automatic and 
> cooperative ways to detect them.

Okay.

>
>>>> 2) Another issue is that the alignment constraints apply not just 
>>>> to the one Mach node. It's sometimes for a fused op + jcc. Since we 
>>>> currently match the conditions and their branches separately (and 
>>>> the conditions not necessarily knowing they are indeed conditions 
>>>> to a branch, like for example an and instruction). So aligning the 
>>>> jcc for example is not necessarily going to help, unless its 
>>>> alignment knows what its preceding instruction is, and whether it 
>>>> will be fused or not. And depending on that, we want different 
>>>> alignment properties. So here the hammer is seemingly too loose.
>>>
>>> I mentioned MacroAssembler in previous email, because I don't 
>>> consider it as C2-specific problem. Stubs, interpreter, and C1 are 
>>> also affected and we need to fix them too (considering being on the 
>>> edge of cache line may cause unpredictable behavior).
>>
>> I disagree that this is a correctness fix. The correctness fix is for 
>> branches on 64 byte boundaries, and is being dealt with
>> using micro code updates (that disables micro op caching of the 
>> problematic branch and fused branch micro ops). What
>> we are dealing with here is mitigating the performance hit of Intel's 
>> correctness mitigation for the erratum, which involves
>> branches and fused branches crossing or ending at 32 byte boundaries. 
>> In other words, the correctness is dealt with elsewhere,
>> and we are optimizing the code to avoid regressions for performance 
>> sensitive branches, due to that correctness fix.
>> Therefore, I think it is wise to focus the optimization efforts where 
>> it matters the most: C2.
>
> Point taken. I agree that the focus should be on performance.
>
> But I'd include stubs as well. Many of them are extensively used from 
> C2-generated code.

Okay. Any specific stubs you have in mind? If there are some critical 
ones, we can sprinkle some scope objects like I did in the ZGC code.
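The "scope objects" idea and the RAII style mentioned earlier in the thread can be combined in a sketch like the following. All names here are hypothetical: `ToyCodeBuffer` stands in for a real CodeBuffer/MacroAssembler, `JccPaddingScope` is not an existing HotSpot class, and the caller is assumed to know the byte size of the branch sequence before emitting it. The constructor emits single-byte nops until the sequence would neither cross nor end at a 32-byte boundary; the destructor checks that the size estimate was honored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a code buffer: a growable byte vector.
struct ToyCodeBuffer {
  std::vector<std::uint8_t> bytes;
  std::size_t offset() const { return bytes.size(); }
  void emit(std::uint8_t b) { bytes.push_back(b); }
  void nop() { emit(0x90); }  // single-byte x86 NOP
};

// Crossing or ending at a 32-byte boundary, written in compact form.
static bool crosses_or_ends_at_32(std::size_t offset, std::size_t size) {
  return (offset / 32) != ((offset + size) / 32);
}

// Hypothetical RAII helper: pads before a branch sequence of known size and
// verifies on scope exit that exactly that many bytes were emitted.
class JccPaddingScope {
 public:
  JccPaddingScope(ToyCodeBuffer& cb, std::size_t expected_size)
      : _cb(cb), _expected_size(expected_size) {
    assert(expected_size > 0 && expected_size < 32);
    while (crosses_or_ends_at_32(_cb.offset(), _expected_size)) {
      _cb.nop();
      _padding++;
    }
    _start = _cb.offset();
  }
  ~JccPaddingScope() {
    // Fires if the size estimate was wrong, i.e. the alignment is not guaranteed.
    assert(_cb.offset() - _start == _expected_size);
  }
  std::size_t padding() const { return _padding; }

 private:
  ToyCodeBuffer& _cb;
  std::size_t _expected_size;
  std::size_t _start = 0;
  std::size_t _padding = 0;
};
```

Because the nops go in front of the sequence, anything that hard-codes the size of the enclosing emission has to allow for them, which is the same concern Erik raises about hard-coded instruction sizes.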

>
>> I don't think that doing some analysis on the Mach nodes and 
>> injecting the padding only where we actually need it is too
>> complicated in C2 (which I believe, at least for now, is where we 
>> should focus).
>
> I'm curious: is there anything special about Mach IR which 
> helps/simplifies the analysis?

Yeah, the fact that you can walk the graph and tag the problematic 
combinations of mach nodes in one pass, and then fuse the alignment with 
that knowledge when code is emitted.

> One problem I see with it is that some mach nodes issue local branches 
> for their own purposes (inside MachNode::emit()) and you can't 
> mitigate such cases on the level of Mach IR.

Sure. Things like the MacroAssembler::fast_lock which is directly 
injected into the emission of a Mach node will have some internal 
branches that my analysis will not find.
I do have concerns though about injecting magic into the MacroAssembler 
that tries to solve this automagically on the assembly level, by having 
the assembler spit out different
instructions than you requested.
The following comment from assembler.hpp captures my thought exactly:

// The Abstract Assembler: Pure assembler doing NO optimizations on the
// instruction level; i.e., what you write is what you get.
// The Assembler is generating code into a CodeBuffer.

I think it is desirable to keep the property that when we tell the 
*Assembler to generate a __ cmp(); __ jcc(); it will do exactly that.
When such assumptions break, any code that has calculated the size of 
instructions, making assumptions about their size, will fail.
For example, any MachNode with hardcoded size() might underestimate how 
much memory is really needed, and code such as nmethod entry barriers
that have calculated the offset to the cmp immediate might suddenly stop 
working. There is similar code for oop maps where we
calculate offsets into mach nodes with oop maps to describe the PC after 
a call, which will stop working:

// !!!!! Special hack to get all types of calls to specify the byte offset
//       from the start of the call to the point where the return address
//       will point.
int MachCallStaticJavaNode::ret_addr_offset()
{
  int offset = 5; // 5 bytes from start of call to where return address points
  offset += clear_avx_size();
  return offset;
}

Basically, I think you might be able to mitigate more branches on the 
MacroAssembler layer, but I think it would also be more risky, as code 
that was not built for having random sizes will start failing in places
we didn't think of. I can think of a few, and feel like there are probably
other places I have not thought about.

So from that point of view, I think I would rather do this on Mach nodes 
where it is safe, and I think we can catch the most important ones there,
and miss a few branches that the macro assembler would have found with 
magic, than apply it to all branches and hope we find all the bugs due 
to unexpected magic.

Do you agree? Or perhaps I misunderstood what you are suggesting.

>> I have made a prototype, what this might look like and it looks like 
>> this:
>> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/
>
> Just one more comment: it's weird to see intel_jcc_erratum referenced 
> in shared code. You could #ifdef it for x86-only, but it's much better 
> to move the code to x86-specific location.

Sure, I can move that to an x86 file and make it build only on x86_64.

Thanks,
/Erik

From thomas.stuefe at gmail.com  Fri Nov 22 17:06:52 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Fri, 22 Nov 2019 18:06:52 +0100
Subject: RFR: 8234328: VectorSet::clear can cause fragmentation
In-Reply-To: <aa9bc4db-35cf-d4dd-a1e1-798590690fae@oracle.com>
References: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com>
 <CAA-vtUzJ_QQU8qewhzVt_tyv-Hot434gLT3+aZb4uwzM+mevPQ@mail.gmail.com>
 <aa9bc4db-35cf-d4dd-a1e1-798590690fae@oracle.com>
Message-ID: <CAA-vtUyWhX7R_Wu4WqyiC57YXWwj9VKAxsQPz7X02AWU1ydCXQ@mail.gmail.com>

Hi Claes,

I was curious how much space we actually waste this way, so I did a little
test. Very simple, counted waste due to not-in-place reallocation, and on
Arena destruction I printed it out.

For a simple java -version -Xcomp, I had ~6000 arenas reporting waste on
destruction, with a median waste of about 2K, outliers of about 250K. This
is just the final waste, I did not count how many separate reallocation
steps were involved.

So, not sure how indicative this result is. I have seen scenarios in the
past where this kind of "reallocation" gets excessive, albeit not in the
compiler.

On Thu, Nov 21, 2019 at 2:19 PM Claes Redestad <claes.redestad at oracle.com>
wrote:

> Hi Thomas,
>
> On 2019-11-19 20:31, Thomas Stüfe wrote:
> > Hi Claes,
> >
> > Not that this is wrong, but do we have to live in resource area? I fell
> > over such problems several times already, e.g. with resource-area-backed
> > StringStreams. Maybe it would be better to just forbid resizing of
> > RA-allocated arrays altogether.
>
> this all gets a bit out of scope, but what alternatives do you see to
> living in the resource area in general for something like this?
>

In general, I'd say if we rely on fine granular realloc, Arenas are not a
good option.
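To make the measured waste concrete, here is a toy bump-pointer arena. It is not HotSpot's Arena API; it only assumes the general behavior described above: growing the most recent allocation can happen in place, while growing any older block means allocating a fresh one, copying, and leaving the old bytes unusable until the whole arena is released. `ToyArena` and `grow` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy bump-pointer arena over one fixed-size buffer; not HotSpot's Arena.
class ToyArena {
 public:
  explicit ToyArena(std::size_t capacity) : _buf(capacity) {}

  void* alloc(std::size_t size) {
    if (_top + size > _buf.size()) return nullptr;  // toy: no chunk chaining
    void* p = _buf.data() + _top;
    _top += size;
    return p;
  }

  // Grow the block [p, p + old_size) to new_size bytes.
  void* grow(void* p, std::size_t old_size, std::size_t new_size) {
    assert(new_size >= old_size);
    char* c = static_cast<char*>(p);
    if (c + old_size == _buf.data() + _top) {        // p is the most recent allocation:
      std::size_t extra = new_size - old_size;
      if (_top + extra > _buf.size()) return nullptr;
      _top += extra;                                 // extend in place, nothing wasted
      return p;
    }
    void* q = alloc(new_size);                       // otherwise allocate a new block,
    if (q == nullptr) return nullptr;
    std::memcpy(q, p, old_size);                     // copy the contents over,
    _wasted += old_size;                             // and strand the old bytes
    return q;
  }

  std::size_t used() const { return _top; }
  std::size_t wasted() const { return _wasted; }

 private:
  std::vector<char> _buf;
  std::size_t _top = 0;
  std::size_t _wasted = 0;
};
```

Each grow that takes the copy path adds to `wasted()`; that stranded space is the kind of per-arena waste the instrumentation above reported at Arena destruction.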

For VectorSet in particular, how about a two-layered sparse array instead.
This only works if you know the max number of bits you'll ever need, e.g.
the NodeLimit for the compiler.

Let the first layer be a pointer array of size N to uint32 blocks of size
M, which are to be allocated on demand. If M and N are powers of 2 access
can be simple.

Something like this:

class VectorSet {
  struct bitblock_t { uint32_t bits[M]; };
  bitblock_t* _blocks[N];

  void set_bit(int i) {
    int block_idx = i >> log2(M);
    if (_blocks[block_idx] == NULL) { /* allocate block */ }
    _blocks[block_idx]->bits[i % M] = 1;
  }

  bool is_set(int i) {
    int block_idx = i >> log2(M);
    return (_blocks[block_idx] != NULL && _blocks[block_idx]->bits[i % M] == 1);
  }
};

Advantages would be that the blocks would not need reallocating; no copying
data around; you use less memory if the memory is sparse and not all
bitblocks are populated, which also makes VectorSet::size() faster since
you can omit counting NULL blocks; you might have better locality too since
once a block is allocated it never changes position in memory.

Disadvantages would be a bit more memory overhead for the top pointer array
and, on access, one indirection more to resolve.

One could make this more involved for initially-smaller-sets, e.g. first
allocate just a bitblock_t and use this as a small fixed sized VectorSet;
then, should we outgrow it, change to a two-layered VectorSet with the
initial bitblock_t as first block.
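A compilable version of the two-layered idea, under the same assumption that the maximum bit index is known up front. It differs from the sketch above in a few deliberate ways: it stores one bit per index instead of one uint32_t per index, allocates blocks with `new` instead of from an arena, and takes hypothetical template parameters `N` (number of blocks) and `M` (32-bit words per block), both required to be powers of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two-layered sparse bit set: N pointers to blocks of M 32-bit words.
// Capacity is N * M * 32 bits; blocks are allocated on first write.
template <std::size_t N, std::size_t M>
class SparseBitSet {
  static_assert((N & (N - 1)) == 0 && (M & (M - 1)) == 0, "N and M must be powers of 2");
  static constexpr std::size_t kBitsPerBlock = M * 32;

  struct BitBlock { std::uint32_t words[M] = {}; };  // zero-initialized
  BitBlock* _blocks[N] = {};                          // all nullptr initially

 public:
  SparseBitSet() = default;
  SparseBitSet(const SparseBitSet&) = delete;
  SparseBitSet& operator=(const SparseBitSet&) = delete;
  ~SparseBitSet() {
    for (BitBlock* b : _blocks) delete b;
  }

  static constexpr std::size_t capacity() { return N * kBitsPerBlock; }

  void set(std::size_t i) {
    assert(i < capacity());
    std::size_t block = i / kBitsPerBlock;  // a shift, since sizes are powers of 2
    std::size_t bit = i % kBitsPerBlock;
    if (_blocks[block] == nullptr) _blocks[block] = new BitBlock();
    _blocks[block]->words[bit / 32] |= (std::uint32_t{1} << (bit % 32));
  }

  bool test(std::size_t i) const {
    assert(i < capacity());
    const BitBlock* b = _blocks[i / kBitsPerBlock];
    if (b == nullptr) return false;         // untouched block: all bits clear
    std::size_t bit = i % kBitsPerBlock;
    return (b->words[bit / 32] >> (bit % 32)) & 1u;
  }

  // Number of set bits; unallocated blocks are skipped without scanning.
  std::size_t count() const {
    std::size_t n = 0;
    for (const BitBlock* b : _blocks) {
      if (b == nullptr) continue;
      for (std::uint32_t w : b->words) {
        for (; w != 0; w &= w - 1) n++;     // clear lowest set bit each step
      }
    }
    return n;
  }

  std::size_t allocated_blocks() const {
    std::size_t n = 0;
    for (const BitBlock* b : _blocks) n += (b != nullptr);
    return n;
  }
};
```

Allocated blocks never move, so nothing is copied as the set fills up; the costs listed above (the fixed top-level pointer array and one extra indirection per access) show up as `_blocks` and the first load in `test()`.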

I hope this makes any sense :)


>
> >
> > Then there is also the problem with passing RA-allocated arrays down the
> > stack and accidentally resizing them under a different ResourceMark. I
> > am not sure if this could happen with VectorSet though.
>
> AFAICT there's a ResourceMark at the entry point of compilation,
> then all others are restricted to a local scope around logging and
> similar, so it doesn't _look_ like there's any potential issues around.
>
> I guess it'd be nice in general with some sort of debug-only
> ProhibitResourceMark(Arena*) you could wrap around calls into utilities
> out of your control which would assert if any code tries to
> allocate/reallocate/free in a specific resource arena.
>
>
That would be good to have, yes.
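A debug-only guard in the spirit of the ProhibitResourceMark(Arena*) idea could follow the usual RAII pattern. This is an illustrative sketch, not HotSpot code: `GuardedArena` and `ProhibitAllocationScope` are hypothetical names, the arena simply hands out heap chunks, and the check relies on the standard `assert`, so it disappears when `NDEBUG` is defined, much as a debug-only check would. Guards nest by keeping a depth counter.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Minimal arena stand-in that can be told to refuse allocation.
class GuardedArena {
 public:
  void* alloc(std::size_t size) {
    // In a debug build, an allocation inside a prohibited scope is caught here.
    assert(allocation_allowed() && "allocation in a prohibited arena");
    _chunks.emplace_back(new char[size]);
    return _chunks.back().get();
  }
  bool allocation_allowed() const { return _prohibit_depth == 0; }

 private:
  friend class ProhibitAllocationScope;
  int _prohibit_depth = 0;                       // supports nested guards
  std::vector<std::unique_ptr<char[]>> _chunks;  // freed when the arena dies
};

// Hypothetical RAII guard: while alive, any alloc() on the arena asserts.
class ProhibitAllocationScope {
 public:
  explicit ProhibitAllocationScope(GuardedArena& arena) : _arena(arena) {
    _arena._prohibit_depth++;
  }
  ~ProhibitAllocationScope() { _arena._prohibit_depth--; }
  ProhibitAllocationScope(const ProhibitAllocationScope&) = delete;
  ProhibitAllocationScope& operator=(const ProhibitAllocationScope&) = delete;

 private:
  GuardedArena& _arena;
};
```

Wrapping a call into code you do not control in a `ProhibitAllocationScope` makes any hidden allocation in that arena fail loudly in debug builds instead of silently growing it.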


> Thanks /Claes
>

Cheers, Thomas


>
> >
> > Thanks, Thomas
> >
> > On Tue, Nov 19, 2019 at 11:16 AM Claes Redestad
> > <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
> >     Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/
>

From navy.xliu at gmail.com  Fri Nov 22 18:09:15 2019
From: navy.xliu at gmail.com (Liu Xin)
Date: Fri, 22 Nov 2019 10:09:15 -0800
Subject: [XXS] C1 misses to dump a reason when it inlines successfully
Message-ID: <CAHPdc0nHkkqNGK=_HkwCCdHwqva-Nty9K8fMQvB28dP66N1y7Q@mail.gmail.com>

hi, Reviewers,

Could you review this extremely small change?
Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541
Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/

When I analyzed PrintInlining, I was confused by the inline message without
any detail. It's not easy for a developer to tell whether this method is inlined
or not. This patch adds a comment "inline by the rules of C1".

I would like to add an explicit reason, but there's no decisive reason in
GraphBuilder::try_inline_full. It just passes all the restriction rules. Any other
suggestion would be appreciated.

Thanks,
--lx

From tom.rodriguez at oracle.com  Fri Nov 22 18:27:20 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 22 Nov 2019 10:27:20 -0800
Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a
 phantom reference
In-Reply-To: <62e3317b-c1f2-fedb-d2b1-a1d0fdc30bed@oracle.com>
References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>
 <EB276519-BABD-4F83-A762-FEB63BC5EF7F@oracle.com>
 <d0b7c4a2-0fe0-19bd-8887-6cfd234d5e7a@oracle.com>
 <62e3317b-c1f2-fedb-d2b1-a1d0fdc30bed@oracle.com>
Message-ID: <7d7df088-87bc-34ef-0123-a7de32199fbd@oracle.com>


Vladimir Kozlov wrote on 11/21/19 10:21 AM:
> On the other hand, there is a testing failure which seems to be 8234429.
> 
> Maybe we should hold this fix until 8234429 is resolved, and retest
> after it is fixed.

Whatever you like is fine with me.  Just let me know.

tom

> 
> Thanks,
> Vladimir
> 
> On 11/21/19 10:15 AM, Vladimir Kozlov wrote:
>> +1
>>
>> Vladimir K
>>
>> On 11/20/19 10:50 PM, Erik Österlund wrote:
>>> Hi Tom,
>>>
>>> Looks good.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> On 21 Nov 2019, at 06:22, Tom Rodriguez <tom.rodriguez at oracle.com> 
>>>> wrote:
>>>>
>>>> http://cr.openjdk.java.net/~never/8234359/webrev
>>>> https://bugs.openjdk.java.net/browse/JDK-8234359
>>>>
>>>> While testing the latest JVMCI in JDK11, crashes were occurring 
>>>> during draining of the SATB buffers. The problem was tracked down 
>>>> to invalidate_nmethod_mirror being called on an nmethod whose 
>>>> InstalledCode instance was also dead in the current GC. Reading this 
>>>> oop using NativeAccess<ON_PHANTOM_OOP_REF> led to that oop being 
>>>> enqueued in the SATB buffer. In JDK 14 it appears some other change 
>>>> in G1 disables those barriers at the point this code is executed but 
>>>> in JDK11 no such logic exists. This code never resurrects that oop 
>>>> so using the normal AS_NO_KEEPALIVE semantics is correct and avoids 
>>>> attempting to enqueue the potentially dead object.
>>>>
>>>> tom
>>>

From tom.rodriguez at oracle.com  Fri Nov 22 18:27:37 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 22 Nov 2019 10:27:37 -0800
Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a
 phantom reference
In-Reply-To: <EB276519-BABD-4F83-A762-FEB63BC5EF7F@oracle.com>
References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>
 <EB276519-BABD-4F83-A762-FEB63BC5EF7F@oracle.com>
Message-ID: <fbd36079-5596-1a5e-67b2-3c12afbf1ebc@oracle.com>

Thanks!

tom

Erik Österlund wrote on 11/20/19 10:50 PM:
> Hi Tom,
> 
> Looks good.
> 
> Thanks,
> /Erik
> 
>> On 21 Nov 2019, at 06:22, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>>
>> http://cr.openjdk.java.net/~never/8234359/webrev
>> https://bugs.openjdk.java.net/browse/JDK-8234359
>>
>> While testing the latest JVMCI in JDK11, crashes were occurring during draining of the SATB buffers.  The problem was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead in the current GC. Reading this oop using NativeAccess<ON_PHANTOM_OOP_REF> led to that oop being enqueued in the SATB buffer.  In JDK 14 it appears some other change in G1 disables those barriers at the point this code is executed but in JDK11 no such logic exists.  This code never resurrects that oop so using the normal AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object.
>>
>> tom
> 

From vladimir.kozlov at oracle.com  Fri Nov 22 22:44:51 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 14:44:51 -0800
Subject: [14] RFR(S) 8234681: Remove UseJVMCIClassLoader logic from JVMCI code
Message-ID: <adc04e99-803a-6310-26c3-d45718a9cbd6@oracle.com>

https://cr.openjdk.java.net/~kvn/8234681/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8234681

UseJVMCIClassLoader logic is only applicable to graal-jvmci-8. It was ported from graal-jvmci-8 as part of JDK-8220623 
"JVMCI support of libgraal". It is not needed in JDK.

Tested tier1-3

Thanks,
Vladimir

From vladimir.kozlov at oracle.com  Fri Nov 22 22:48:04 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 14:48:04 -0800
Subject: [14] RFR(S) 8234681: Remove UseJVMCIClassLoader logic from JVMCI
 code
In-Reply-To: <adc04e99-803a-6310-26c3-d45718a9cbd6@oracle.com>
References: <adc04e99-803a-6310-26c3-d45718a9cbd6@oracle.com>
Message-ID: <70fac7e7-5f2c-5fc6-2de2-3567a176a03d@oracle.com>

Forgot to say that the author of the patch is Doug Simon.

On 11/22/19 2:44 PM, Vladimir Kozlov wrote:
> https://cr.openjdk.java.net/~kvn/8234681/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8234681
> 
> UseJVMCIClassLoader logic is only applicable to graal-jvmci-8. It was ported from graal-jvmci-8 as part of JDK-8220623 
> "JVMCI support of libgraal". It is not needed in JDK.
> 
> Tested tier1-3
> 
> Thanks,
> Vladimir

From sandhya.viswanathan at intel.com  Fri Nov 22 23:51:38 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 22 Nov 2019 23:51:38 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
Message-ID: <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>

Hi Vladimir,

The bug happens like below:

* User specifies -XX:UseAVX=3 on command line.
* On Skylake platform due to the following lines:
    __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
    __ movl(rax, Address(rsi, 0));
    __ cmpl(rax, 0x50654);              // If it is Skylake
    __ jcc(Assembler::equal, legacy_setup);
   The zmm registers are not saved/restored.
* This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
  int max_vector_size = 0;
  if (UseSSE < 2) {
    // Vectors (in XMM) are only supported with SSE2+
    // SSE is always 2 on x64.
    max_vector_size = 0;
  } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
    // 16 byte vectors (in XMM) are supported with SSE2+
    max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
  } else if (UseAVX == 1 || UseAVX == 2) {
     ...
  }
* And so we get UseAVX=3 and max_vector_size = 16.

So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2.
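The selection logic quoted above can be modeled in a few lines to see why UseAVX=3 without the zmm save/restore ends up at 16 bytes. This is a simplified sketch, not VM_Version code: `model_max_vector_size` is a hypothetical name, `os_supports_avx_vectors` is reduced to a boolean parameter, and the sizes for the elided branches are only those stated in this thread (32 bytes for UseAVX=1 or 2, 64 bytes for UseAVX=3).

```cpp
#include <cassert>

// Simplified model of the max_vector_size selection quoted in the thread.
// os_supports_avx_vectors stands in for the real check, which (per the thread)
// returns false for UseAVX=3 when the zmm registers were not saved/restored.
int model_max_vector_size(int use_sse, int use_avx, bool os_supports_avx_vectors) {
  if (use_sse < 2) {
    return 0;   // vectors in XMM need SSE2+
  } else if (use_avx == 0 || !os_supports_avx_vectors) {
    return 16;  // XMM only: where UseAVX=3 fell through before the fix
  } else if (use_avx == 1 || use_avx == 2) {
    return 32;  // YMM, as stated in this thread
  } else {
    return 64;  // ZMM with UseAVX=3, as stated in this thread
  }
}
```

With the fix, the zmm state is saved/restored whenever the user sets UseAVX above 2, so the OS-support check succeeds and the model reaches its last branch.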

On your question regarding why 8221092 needs the code you conditionally exclude:
This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instructions.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com> 
Sent: Friday, November 22, 2019 4:39 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092

Hi Sandhya,

> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
> 
> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
> This should automatically result in MaxVectorSize being set to 64 bytes.
> 
> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.

Please, elaborate how it happens and how legacy_setup affects it?

Why does 8221092 need the code you conditionally exclude?
Why isn't the following enough?

    if (FLAG_IS_DEFAULT(UseAVX)) {
      FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
+    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
+      FLAG_SET_DEFAULT(UseAVX, 2);  //Set UseAVX=2 for Skylake
+    }
    } else if (UseAVX > use_avx_limit) {

Best regards,
Vladimir Ivanov

From sandhya.viswanathan at intel.com  Fri Nov 22 23:58:42 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 22 Nov 2019 23:58:42 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <cb77913b-32cd-2da7-d3d2-3f9c7c92063a@oracle.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <cb77913b-32cd-2da7-d3d2-3f9c7c92063a@oracle.com>
Message-ID: <CH2PR11MB4280849A06066AA8C17F3561EF490@CH2PR11MB4280.namprd11.prod.outlook.com>

Hi Vladimir,

I agree the following code could be moved under use_evex:
+    if (FLAG_IS_DEFAULT(UseAVX)) {
+      __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
+      __ movl(rax, Address(rsi, 0));
+      __ cmpl(rax, 0x50654);              // If it is Skylake
+      __ jcc(Assembler::equal, legacy_setup);
+    }

I will send the updated patch.

I explained in the other email how we are getting MaxVectorSize as 16 when user specifies UseAVX=3 on Skylake.

Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov
Sent: Thursday, November 21, 2019 6:03 PM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092

Hi Sandhya,

I think you should put cpuid code added by 8221092 under if (use_evex) checks because if user specified UseAVX=2 the code under (use_evex) will not be executed anyway. Or I am missing something.

I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should be 32 because it will set UseAVX=1 in the current code.

Thanks,
Vladimir

On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote:
> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
> 
> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
> This should automatically result in MaxVectorSize being set to 64 bytes.
> 
> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
> I have a patch which fixes the issue.
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
> Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/
> 
> Please review and approve.
> 
> Best Regards,
> Sandhya
> 
> 

From sandhya.viswanathan at intel.com  Sat Nov 23 00:35:23 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Sat, 23 Nov 2019 00:35:23 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <CH2PR11MB4280849A06066AA8C17F3561EF490@CH2PR11MB4280.namprd11.prod.outlook.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <cb77913b-32cd-2da7-d3d2-3f9c7c92063a@oracle.com>
 <CH2PR11MB4280849A06066AA8C17F3561EF490@CH2PR11MB4280.namprd11.prod.outlook.com>
Message-ID: <CH2PR11MB4280CFA8C4F6CE3BE957D6F5EF480@CH2PR11MB4280.namprd11.prod.outlook.com>

Please find the updated webrev at: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.01/

JBS: https://bugs.openjdk.java.net/browse/JDK-8234610

Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Viswanathan, Sandhya
Sent: Friday, November 22, 2019 3:59 PM
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092

Hi Vladimir,

I agree the following code could be moved under use_evex:
+    if (FLAG_IS_DEFAULT(UseAVX)) {
+      __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
+      __ movl(rax, Address(rsi, 0));
+      __ cmpl(rax, 0x50654);              // If it is Skylake
+      __ jcc(Assembler::equal, legacy_setup);
+    }

I will send the updated patch.

I explained in the other email how we are getting MaxVectorSize as 16 when user specifies UseAVX=3 on Skylake.

Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov
Sent: Thursday, November 21, 2019 6:03 PM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092

Hi Sandhya,

I think you should put cpuid code added by 8221092 under if (use_evex) checks because if user specified UseAVX=2 the code under (use_evex) will not be executed anyway. Or I am missing something.

I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should be 32 because it will set UseAVX=1 in the current code.

Thanks,
Vladimir

On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote:
> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
> 
> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
> This should automatically result in MaxVectorSize being set to 64 bytes.
> 
> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
> I have a patch which fixes the issue.
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
> Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/
> 
> Please review and approve.
> 
> Best Regards,
> Sandhya
> 
> 

From vladimir.kozlov at oracle.com  Sat Nov 23 00:45:49 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 16:45:49 -0800
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <CH2PR11MB4280CFA8C4F6CE3BE957D6F5EF480@CH2PR11MB4280.namprd11.prod.outlook.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <cb77913b-32cd-2da7-d3d2-3f9c7c92063a@oracle.com>
 <CH2PR11MB4280849A06066AA8C17F3561EF490@CH2PR11MB4280.namprd11.prod.outlook.com>
 <CH2PR11MB4280CFA8C4F6CE3BE957D6F5EF480@CH2PR11MB4280.namprd11.prod.outlook.com>
Message-ID: <be684820-314c-4d09-0aa7-c76cabedf3ec@oracle.com>

Looks good.

Thanks,
Vladimir

On 11/22/19 4:35 PM, Viswanathan, Sandhya wrote:
> Please find the updated webrev at: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.01/
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Viswanathan, Sandhya
> Sent: Friday, November 22, 2019 3:59 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> Hi Vladimir,
> 
> I agree the following code could be moved under use_evex:
> +    if (FLAG_IS_DEFAULT(UseAVX)) {
> +      __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
> +      __ movl(rax, Address(rsi, 0));
> +      __ cmpl(rax, 0x50654);              // If it is Skylake
> +      __ jcc(Assembler::equal, legacy_setup);
> +    }
> 
> I will send the updated patch.
> 
> I explained in the other email how we are getting MaxVectorSize as 16 when user specifies UseAVX=3 on Skylake.
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov
> Sent: Thursday, November 21, 2019 6:03 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> Hi Sandhya,
> 
> I think you should put cpuid code added by 8221092 under if (use_evex) checks because if user specified UseAVX=2 the code under (use_evex) will not be executed anyway. Or I am missing something.
> 
> I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should be 32 because it will set UseAVX=1 in the current code.
> 
> Thanks,
> Vladimir
> 
> On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote:
>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>
>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>
>> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
>> I have a patch which fixes the issue.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
>> Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/
>>
>> Please review and approve.
>>
>> Best Regards,
>> Sandhya
>>
>>

From igor.ignatyev at oracle.com  Sat Nov 23 01:13:12 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 22 Nov 2019 17:13:12 -0800
Subject: [14] RFR(S) 8234681: Remove UseJVMCIClassLoader logic from JVMCI
 code
In-Reply-To: <70fac7e7-5f2c-5fc6-2de2-3567a176a03d@oracle.com>
References: <adc04e99-803a-6310-26c3-d45718a9cbd6@oracle.com>
 <70fac7e7-5f2c-5fc6-2de2-3567a176a03d@oracle.com>
Message-ID: <67B72D86-54F5-4964-972D-B03B220D11D6@oracle.com>

Looks good to me.

-- Igor

> On Nov 22, 2019, at 2:48 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Forgot to say that the author of the patch is Doug Simon.
> 
> On 11/22/19 2:44 PM, Vladimir Kozlov wrote:
>> https://cr.openjdk.java.net/~kvn/8234681/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8234681
>> UseJVMCIClassLoader logic is only applicable to graal-jvmci-8. It was ported from graal-jvmci-8 as part of JDK-8220623 "JVMCI support of libgraal". It is not needed in JDK.
>> Tested tier1-3
>> Thanks,
>> Vladimir


From vladimir.kozlov at oracle.com  Sat Nov 23 01:14:08 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 17:14:08 -0800
Subject: [14] RFR(S) 8234681: Remove UseJVMCIClassLoader logic from JVMCI
 code
In-Reply-To: <67B72D86-54F5-4964-972D-B03B220D11D6@oracle.com>
References: <adc04e99-803a-6310-26c3-d45718a9cbd6@oracle.com>
 <70fac7e7-5f2c-5fc6-2de2-3567a176a03d@oracle.com>
 <67B72D86-54F5-4964-972D-B03B220D11D6@oracle.com>
Message-ID: <3cfca173-8a0a-ecce-dd0b-83055e53873d@oracle.com>

Thank you, Igor

Vladimir

On 11/22/19 5:13 PM, Igor Ignatyev wrote:
> Looks good to me.
> 
> -- Igor
> 
>> On Nov 22, 2019, at 2:48 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> Forgot to say that the author of the patch is Doug Simon.
>>
>> On 11/22/19 2:44 PM, Vladimir Kozlov wrote:
>>> https://cr.openjdk.java.net/~kvn/8234681/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8234681
>>> UseJVMCIClassLoader logic is only applicable to graal-jvmci-8. It was ported from graal-jvmci-8 as part of JDK-8220623 "JVMCI support of libgraal". It is not needed in JDK.
>>> Tested tier1-3
>>> Thanks,
>>> Vladimir
> 

From sandhya.viswanathan at intel.com  Sat Nov 23 01:18:00 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Sat, 23 Nov 2019 01:18:00 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <be684820-314c-4d09-0aa7-c76cabedf3ec@oracle.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <cb77913b-32cd-2da7-d3d2-3f9c7c92063a@oracle.com>
 <CH2PR11MB4280849A06066AA8C17F3561EF490@CH2PR11MB4280.namprd11.prod.outlook.com>
 <CH2PR11MB4280CFA8C4F6CE3BE957D6F5EF480@CH2PR11MB4280.namprd11.prod.outlook.com>
 <be684820-314c-4d09-0aa7-c76cabedf3ec@oracle.com>
Message-ID: <CH2PR11MB428049BEF52AB0EB73C1C2E0EF480@CH2PR11MB4280.namprd11.prod.outlook.com>

Thanks a lot, Vladimir.

Best Regards,
Sandhya


-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov
Sent: Friday, November 22, 2019 4:46 PM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092

Looks good.

Thanks,
Vladimir

On 11/22/19 4:35 PM, Viswanathan, Sandhya wrote:
> Please find the updated webrev at: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.01/
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Viswanathan, Sandhya
> Sent: Friday, November 22, 2019 3:59 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> Hi Vladimir,
> 
> I agree the following code could be moved under use_evex:
> +    if (FLAG_IS_DEFAULT(UseAVX)) {
> +      __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
> +      __ movl(rax, Address(rsi, 0));
> +      __ cmpl(rax, 0x50654);              // If it is Skylake
> +      __ jcc(Assembler::equal, legacy_setup);
> +    }
> 
> I will send the updated patch.
> 
> I explained in the other email how we are getting MaxVectorSize as 16 when user specifies UseAVX=3 on Skylake.
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov
> Sent: Thursday, November 21, 2019 6:03 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> Hi Sandhya,
> 
> I think you should put the cpuid code added by 8221092 under the if (use_evex) checks, because if the user specified UseAVX=2 the code under (use_evex) will not be executed anyway. Or am I missing something?
> 
> I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should be 32 because it will set UseAVX=1 in the current code.
> 
> Thanks,
> Vladimir
> 
> On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote:
>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>
>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>
>> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
>> I have a patch which fixes the issue.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
>> Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/
>>
>> Please review and approve.
>>
>> Best Regards,
>> Sandhya
>>
>>

From dean.long at oracle.com  Sat Nov 23 01:55:36 2019
From: dean.long at oracle.com (Dean Long)
Date: Fri, 22 Nov 2019 17:55:36 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is
 different from current 'g1 gc'' after CMS removal
Message-ID: <e4183319-4fe7-b392-8be6-38103f5740d5@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8234432
http://cr.openjdk.java.net/~dlong/8234432/webrev/

The change fixes AOT after CMS was removed. Previously we relied on a 
Graal enum matching a JDK enum, but now we map from one to the other.

dl

From vladimir.kozlov at oracle.com  Sat Nov 23 02:03:54 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 18:03:54 -0800
Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a
 phantom reference
In-Reply-To: <7d7df088-87bc-34ef-0123-a7de32199fbd@oracle.com>
References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>
 <EB276519-BABD-4F83-A762-FEB63BC5EF7F@oracle.com>
 <d0b7c4a2-0fe0-19bd-8887-6cfd234d5e7a@oracle.com>
 <62e3317b-c1f2-fedb-d2b1-a1d0fdc30bed@oracle.com>
 <7d7df088-87bc-34ef-0123-a7de32199fbd@oracle.com>
Message-ID: <ea654d4b-1924-0c51-4f88-658cfaa81af2@oracle.com>

Hi Tom

8234429 was just fixed. Please, rebase your changes and test it again.

Thanks,
Vladimir

On 11/22/19 10:27 AM, Tom Rodriguez wrote:
> 
> 
> Vladimir Kozlov wrote on 11/21/19 10:21 AM:
>> On other hand there is testing failure which seems 8234429.
>>
>> Maybe we should hold this fix until 8234429 is resolved, and retest after it is fixed.
> 
> Whatever you like is fine with me. Just let me know.
> 
> tom
> 
>>
>> Thanks,
>> Vladimir
>>
>> On 11/21/19 10:15 AM, Vladimir Kozlov wrote:
>>> +1
>>>
>>> Vladimir K
>>>
>>> On 11/20/19 10:50 PM, Erik Österlund wrote:
>>>> Hi Tom,
>>>>
>>>> Looks good.
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>>> On 21 Nov 2019, at 06:22, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>>>>>
>>>>> http://cr.openjdk.java.net/~never/8234359/webrev
>>>>> https://bugs.openjdk.java.net/browse/JDK-8234359
>>>>>
>>>>> While testing the latest JVMCI in JDK11, crashes were occurring during draining of the SATB buffers. The problem 
>>>>> was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead 
>>>>> in the current GC. Reading this oop using NativeAccess<ON_PHANTOM_OOP_REF> led to that oop being enqueued in the 
>>>>> SATB buffer. In JDK 14 it appears some other change in G1 disables those barriers at the point this code is 
>>>>> executed, but in JDK11 no such logic exists. This code never resurrects that oop, so using the normal 
>>>>> AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object.
>>>>>
>>>>> tom
>>>>
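A toy model may help illustrate the difference Tom describes. It is a sketch under stated assumptions, not HotSpot's Access API: Obj, ToyG1 and load_referent are hypothetical, and the keep_alive flag only loosely stands in for a plain ON_PHANTOM_OOP_REF load (true) versus one that adds AS_NO_KEEPALIVE (false). While marking is active, a keep-alive load of an unmarked referent pushes it into the SATB buffer, which is how an already-dead InstalledCode object could end up there.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy object: only tracks whether marking has reached it.
struct Obj {
  bool marked = false;
};

// Hypothetical toy collector state; not G1's real data structures.
struct ToyG1 {
  bool marking_active = false;
  std::vector<Obj*> satb_buffer;

  // keep_alive == true loosely models a plain phantom-ref load,
  // keep_alive == false models adding AS_NO_KEEPALIVE semantics.
  Obj* load_referent(Obj* const* slot, bool keep_alive) {
    Obj* o = *slot;
    if (keep_alive && marking_active && o != nullptr && !o->marked) {
      // Enqueueing keeps 'o' alive for the current cycle, even if it
      // was otherwise unreachable.
      satb_buffer.push_back(o);
    }
    return o;
  }
};
```

The invalidation path only needs to read the referent, never to resurrect it, so the no-keep-alive variant is the one that matches its intent.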

From vladimir.kozlov at oracle.com  Sat Nov 23 02:37:21 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 18:37:21 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is
 different from current 'g1 gc'' after CMS removal
In-Reply-To: <e4183319-4fe7-b392-8be6-38103f5740d5@oracle.com>
References: <e4183319-4fe7-b392-8be6-38103f5740d5@oracle.com>
Message-ID: <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com>

Hmm. I assumed that Graal should have a GC list which is a subset of the GCs in HotSpot. But that may not be true since GraalVM 
has to run with JDK 8.

Maybe we should bail out of AOT compilation if the GC is unknown to HotSpot, instead of recording in the library an enum 'def' from 
Graal which does not match the enum in HotSpot. And check for the GC early, before we start collecting classes to compile.

Thanks,
Vladimir

On 11/22/19 5:55 PM, Dean Long wrote:
> https://bugs.openjdk.java.net/browse/JDK-8234432
> http://cr.openjdk.java.net/~dlong/8234432/webrev/
> 
> The change fixes AOT after CMS was removed. Previously we relied on a Graal enum matching a JDK enum, but now we map 
> from one to the other.
> 
> dl

From dean.long at oracle.com  Sat Nov 23 02:47:58 2019
From: dean.long at oracle.com (Dean Long)
Date: Fri, 22 Nov 2019 18:47:58 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is
 different from current 'g1 gc'' after CMS removal
In-Reply-To: <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com>
References: <e4183319-4fe7-b392-8be6-38103f5740d5@oracle.com>
 <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com>
Message-ID: <6ec6734c-261b-dc65-095e-ade7dffe4e71@oracle.com>

On 11/22/19 6:37 PM, Vladimir Kozlov wrote:
> Hmm. I assumed that Graal should have a GC list which is a subset of the GCs 
> in HotSpot. But that may not be true since GraalVM has to run with 
> JDK 8.
>
> Maybe we should bail out of AOT compilation if the GC is unknown to HotSpot 
> instead of recording in the library an enum 'def' from Graal which does not 
> match the enum in HotSpot. And check for the GC early, before we start 
> collecting classes to compile.
>

Graal uses the HotSpot flags to determine which GC is being used, so 
there is no way for AOT to store a GC that the underlying HotSpot 
doesn't know about. The default fall-back of ordinal() + 1 is only for 
pre-JDK14, which doesn't have the CollectedHeap GC constants exported to 
JVMCI. We could get rid of that if we backport the vmStructs_jvmci.cpp 
change to all the JDK versions that Graal supports.
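A minimal sketch of the mapping described above, assuming a compiler-side list of GC names and an optional table of GC ids exported by the VM. vm_gc_id and every name and number below are hypothetical illustrations, not the actual Graal or vmStructs_jvmci.cpp code. It shows why an ordinal-based fallback silently breaks once an entry (here CMS) disappears from the VM's enum but not from the compiler's.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical: translate the selected GC into the VM's numeric id.
// Returns -1 when no id can be determined.
int vm_gc_id(const std::vector<std::string>& compiler_gcs,
             const std::string& selected,
             const std::map<std::string, int>* exported_vm_ids) {
  int ordinal = -1;
  for (std::size_t i = 0; i < compiler_gcs.size(); ++i) {
    if (compiler_gcs[i] == selected) {
      ordinal = static_cast<int>(i);
      break;
    }
  }
  if (ordinal < 0) return -1;  // unknown on the compiler side

  if (exported_vm_ids != nullptr) {
    // VM exports its constants: look the id up by name, no ordering assumption.
    auto it = exported_vm_ids->find(selected);
    return it != exported_vm_ids->end() ? it->second : -1;
  }
  // Legacy fallback mirroring "ordinal() + 1": only correct while both
  // enums list the same GCs in the same order.
  return ordinal + 1;
}
```

With the illustrative ordering used in the test, the fallback maps G1 to 4, which a post-CMS VM table assigns to Epsilon; this mirrors the shape of the 'epsilon gc' vs. 'g1 gc' mismatch in the subject line.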

There is a separate issue, if you try to use a GC that JVMCI/Graal 
doesn't support:

  % jaotc -J-XX:+UseZGC java.lang.String

JVMCI Compiler does not support selected GC: z gc

dl
> Thanks,
> Vladimir
>
> On 11/22/19 5:55 PM, Dean Long wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8234432
>> http://cr.openjdk.java.net/~dlong/8234432/webrev/
>>
>> The change fixes AOT after CMS was removed. Previously we relied on 
>> a Graal enum matching a JDK enum, but now we map from one to the other.
>>
>> dl


From vladimir.kozlov at oracle.com  Sat Nov 23 02:54:05 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 18:54:05 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is
 different from current 'g1 gc'' after CMS removal
In-Reply-To: <6ec6734c-261b-dc65-095e-ade7dffe4e71@oracle.com>
References: <e4183319-4fe7-b392-8be6-38103f5740d5@oracle.com>
 <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com>
 <6ec6734c-261b-dc65-095e-ade7dffe4e71@oracle.com>
Message-ID: <ef29019f-ef31-e357-d40f-dc8f44bb753c@oracle.com>

Got it. My thinking was in reverse ;)

Changes are good.

Vladimir

On 11/22/19 6:47 PM, Dean Long wrote:
> On 11/22/19 6:37 PM, Vladimir Kozlov wrote:
>> Hmm. I assumed that Graal should have a GC list which is a subset of the GCs in HotSpot. But that may not be true since 
>> GraalVM has to run with JDK 8.
>>
>> Maybe we should bail out of AOT compilation if the GC is unknown to HotSpot, instead of recording in the library an enum 'def' from 
>> Graal which does not match the enum in HotSpot. And check for the GC early, before we start collecting classes to compile.
>>
> 
> Graal uses the HotSpot flags to determine which GC is being used, so there is no way for AOT to store a GC that the 
> underlying HotSpot doesn't know about. The default fall-back of ordinal() + 1 is only for pre-JDK14, which doesn't have 
> the CollectedHeap GC constants exported to JVMCI. We could get rid of that if we backport the vmStructs_jvmci.cpp 
> change to all the JDK versions that Graal supports.
> 
> There is a separate issue, if you try to use a GC that JVMCI/Graal doesn't support:
> 
>   % jaotc -J-XX:+UseZGC java.lang.String
> 
> JVMCI Compiler does not support selected GC: z gc
> 
> dl
>> Thanks,
>> Vladimir
>>
>> On 11/22/19 5:55 PM, Dean Long wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8234432
>>> http://cr.openjdk.java.net/~dlong/8234432/webrev/
>>>
>>> The change fixes AOT after CMS was removed. Previously we relied on a Graal enum matching a JDK enum, but now we map 
>>> from one to the other.
>>>
>>> dl
> 

From dean.long at oracle.com  Sat Nov 23 03:22:24 2019
From: dean.long at oracle.com (Dean Long)
Date: Fri, 22 Nov 2019 19:22:24 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is
 different from current 'g1 gc'' after CMS removal
In-Reply-To: <ef29019f-ef31-e357-d40f-dc8f44bb753c@oracle.com>
References: <e4183319-4fe7-b392-8be6-38103f5740d5@oracle.com>
 <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com>
 <6ec6734c-261b-dc65-095e-ade7dffe4e71@oracle.com>
 <ef29019f-ef31-e357-d40f-dc8f44bb753c@oracle.com>
Message-ID: <9e790dd3-aba1-ecad-5d72-583613094f2e@oracle.com>

Thanks Vladimir!

dl

On 11/22/19 6:54 PM, Vladimir Kozlov wrote:
> Got it. My thinking was in reverse ;)
>
> Changes are good.
>
> Vladimir
>
> On 11/22/19 6:47 PM, Dean Long wrote:
>> On 11/22/19 6:37 PM, Vladimir Kozlov wrote:
>>> Hmm. I assumed that Graal should have a GC list which is a subset of 
>>> the GCs in HotSpot. But that may not be true since GraalVM has to run 
>>> with JDK 8.
>>>
>>> Maybe we should bail out of AOT compilation if the GC is unknown to HotSpot 
>>> instead of recording in the library an enum 'def' from Graal which does not 
>>> match the enum in HotSpot. And check for the GC early, before we start 
>>> collecting classes to compile.
>>>
>>
>> Graal uses the HotSpot flags to determine which GC is being used, so 
>> there is no way for AOT to store a GC that the underlying HotSpot 
>> doesn't know about. The default fall-back of ordinal() + 1 is only 
>> for pre-JDK14, which doesn't have the CollectedHeap GC constants 
>> exported to JVMCI. We could get rid of that if we backport the 
>> vmStructs_jvmci.cpp change to all the JDK versions that Graal supports.
>>
>> There is a separate issue, if you try to use a GC that JVMCI/Graal 
>> doesn't support:
>>
>>   % jaotc -J-XX:+UseZGC java.lang.String
>>
>> JVMCI Compiler does not support selected GC: z gc
>>
>> dl
>>> Thanks,
>>> Vladimir
>>>
>>> On 11/22/19 5:55 PM, Dean Long wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8234432
>>>> http://cr.openjdk.java.net/~dlong/8234432/webrev/
>>>>
>>>> The change fixes AOT after CMS was removed. Previously we relied 
>>>> on a Graal enum matching a JDK enum, but now we map from one to the 
>>>> other.
>>>>
>>>> dl
>>


From lutz.schmidt at sap.com  Mon Nov 25 14:06:12 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 25 Nov 2019 14:06:12 +0000
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
Message-ID: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>

Dear all, 

may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis-<platform> library were not passed on. 

The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...

Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583 
Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ 

Thank you,
Lutz 


From vladimir.x.ivanov at oracle.com  Mon Nov 25 15:31:03 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Nov 2019 18:31:03 +0300
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum
 in C2 load barrier
In-Reply-To: <5c442ef3-be3b-e540-0097-5a5b00f5b94e@oracle.com>
References: <af3039a6-6929-25d8-47b9-afef2d6ddfa6@oracle.com>
 <D7B5FCA3-B3C5-445D-9A4F-39671E4AEFD1@amazon.com>
 <e8482585-3a67-5e07-dfc8-51550b97b650@oracle.com>
 <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com>
 <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com>
 <edb5323e-604c-7d7f-cf9b-8c9901979f9e@oracle.com>
 <5c442ef3-be3b-e540-0097-5a5b00f5b94e@oracle.com>
Message-ID: <742a43b3-0d0f-e7e3-8b16-410514cfc855@oracle.com>

Hi Erik,

>> But I'd include stubs as well. Many of them are extensively used from 
>> C2-generated code.
> 
> Okay. Any specific stubs you have in mind? If there are some critical 
> ones, we can sprinkle some scope objects like I did in the ZGC code.

There are intrinsics for compressed strings [1], numerous copy stubs 
[2], trigonometric functions [3].

It would be unfortunate if we have to go over all that code and manually 
instrument all the places where problematic instructions are issued. 
Moreover, the process has to be repeated for new code being added over time.

> I do have concerns though about injecting magic into the MacroAssembler 
> that tries to solve this automagically on the assembly level, by having 
> the assembler spit out different
> instructions than you requested.
> The following comment from assembler.hpp captures my thought exactly:
> 
> 207: // The Abstract Assembler: Pure assembler doing NO optimizations on the
> 208: // instruction level; i.e., what you write is what you get.
> 209: // The Assembler is generating code into a CodeBuffer.

While I see that Assembler follows that (instruction per method), 
MacroAssembler does not: there are cases where the generated code differs 
depending on runtime flags (e.g., verification code) or input values 
(e.g., whether AddressLiteral is reachable or not).

> I think it is desirable to keep the property that when we tell the 
> *Assembler to generate a __ cmp(); __ jcc(); it will do exactly that.
> When such assumptions break, any code that has calculated the size of 
> instructions, making assumptions about their size, will fail.
> For example, any MachNode with hardcoded size() might underestimate how 
> much memory is really needed, and code such as nmethod entry barriers
> that have calculated the offset to the cmp immediate might suddenly stop 
> working. There is similar code for oop maps where we
> calculate offsets into mach nodes with oop maps to describe the PC after 
> a call, which will stop working:
> 
> // !!!!! Special hack to get all types of calls to specify the byte offset
> //       from the start of the call to the point where the return address
> //       will point.
> int MachCallStaticJavaNode::ret_addr_offset()
> {
>   int offset = 5; // 5 bytes from start of call to where return address points
>   offset += clear_avx_size();
>   return offset;
> }
> 
> Basically, I think you might be able to mitigate more branches on the 
> MacroAssembler layer, but I think it would also be more risky, as code 
> that was
> not built for having random size will start failing, in places we didn't 
> think of. I can think of a few, and feel like there are probably other 
> places I have not thought about.
> 
> So from that point of view, I think I would rather do this on Mach nodes 
> where it is safe, and I think we can catch the most important ones there,
> and miss a few branches that the macro assembler would have found with 
> magic, than apply it to all branches and hope we find all the bugs due 
> to unexpected magic.
> 
> Do you agree? Or perhaps I misunderstood what you are suggesting.
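For context on why the emitted size can change, here is a sketch of the erratum condition as described in Intel's JCC erratum mitigation guidance: a jump, including a macro-fused cmp+jcc pair, should not cross or end on a 32-byte boundary. The helper names are hypothetical, not the ones used in the webrev. Whether padding is needed depends on where the branch lands, so any code that hard-codes the size of an instruction sequence (such as the 5-byte ret_addr_offset above) would be wrong if the assembler inserted such padding implicitly.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kBoundary = 32;

// True if an instruction occupying [start, start + size) crosses a
// 32-byte boundary or ends exactly on one: in both cases the block of
// the first byte differs from the block of start + size.
bool crosses_or_ends_at_boundary(std::uintptr_t start, std::uintptr_t size) {
  return (start / kBoundary) != ((start + size) / kBoundary);
}

// Hypothetical helper: padding bytes (e.g. NOPs) to emit before the
// branch so that it starts on the next boundary; 0 if none is needed.
std::uintptr_t padding_needed(std::uintptr_t start, std::uintptr_t size) {
  if (!crosses_or_ends_at_boundary(start, size)) return 0;
  return kBoundary - (start % kBoundary);
}
```

For example, a 5-byte cmp+jcc starting at offset 30 needs 2 bytes of padding, while the same pair at offset 40 needs none, so the total size of the surrounding code depends on its final position.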

You raise a valid point: there are places in the VM which rely on 
hard-coded instruction sequences. If such instruction changes, all 
relevant places have to be adjusted. And JVM is already very cautious 
about such cases.

I agree with you that the MacroAssembler-based approach is riskier, but IMO the risk 
is modest (few places are affected) and manageable (dedicated stress 
mode should greatly improve test effectiveness).

My opinion is that if we are satisfied with the coverage C2 CFG 
instrumentation provides and don't expect any more work on mitigations, 
then there's no motivation to invest in a MacroAssembler-based approach.

Otherwise, there are basically 2 options:

   * "opt-in": explicitly mark all the places where mitigations are 
applied, by default nothing is mitigated

   * "opt-out": mitigate everything unless mitigations are explicitly 
disabled

Both approaches provide fine-grained control over what's being 
mitigated, but with "opt-out" there's more code to care about: it's easy 
to miss important cases and too tempting to enable more than we are 100% 
certain about.

Both can be applied to individual CFG nodes and make CFG instrumentation 
redundant.

But if there's a need to instrument large portions of (macro)assembly 
code, then IMO opt-in adds too much in terms of work required, noise (on 
code level), maintenance, and burden for future code changes. So, I 
don't consider it as a feasible option in such situation.

It looks like a mixture of opt-in (explicitly enable in some context: in 
C2 during code emission, particular stub generation, etc) and opt-out 
(on the level of individual instructions) gives the best of both approaches.
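A minimal sketch of that mixture, assuming a scope object in the spirit of the ones Erik used in the ZGC code. MitigationScope, Mitigate and should_mitigate_branch are hypothetical names; a real version would keep the state per assembler or per thread rather than in one static flag.

```cpp
#include <cassert>

// Opt-in: mitigation is active only inside an explicitly marked context
// (e.g. C2 code emission or one particular stub generator).
class MitigationScope {
 public:
  MitigationScope() : _saved(_enabled) { _enabled = true; }
  ~MitigationScope() { _enabled = _saved; }  // restores the outer state
  MitigationScope(const MitigationScope&) = delete;
  MitigationScope& operator=(const MitigationScope&) = delete;
  static bool enabled() { return _enabled; }

 private:
  bool _saved;
  static bool _enabled;  // simplified: one global flag for the sketch
};
bool MitigationScope::_enabled = false;

// Opt-out: an individual call site can refuse mitigation, e.g. where a
// fixed instruction-sequence size is assumed elsewhere.
enum class Mitigate { Default, No };

bool should_mitigate_branch(Mitigate m = Mitigate::Default) {
  return MitigationScope::enabled() && m != Mitigate::No;
}
```

Nesting works because each scope saves and restores the previous state, so leaving a stub generator does not accidentally disable mitigation for an enclosing context.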

But, again, if C2 CFG instrumentation is good enough, then it'll be a 
wasted effort.

So, I envision 3 possible scenarios:

   (1) just instrument Mach IR and be done with it;

   (2) (a) start with Mach IR;
       (b) later it turns out that extensive portions of (macro)assembly 
code have to be instrumented (or, for example, C1/Interpreter)
       (c) implement MacroAssembler mitigations

   (3) start with MacroAssembler mitigations and be done with it
      * doesn't preclude a gradual rollout across different subsystems

Mach IR instrumentation (#1/#2) is the safest variant, but it may 
require more work.

#3 is broadly applicable, but also riskier.

What I don't consider as a viable option is C2 CFG instrumentation 
accompanied by numerous per-instruction mitigations scattered across the 
code base.

>>> I have made a prototype, what this might look like and it looks like 
>>> this:
>>> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/
>>
>> Just one more comment: it's weird to see intel_jcc_erratum referenced 
>> in shared code. You could #ifdef it for x86-only, but it's much better 
>> to move the code to x86-specific location.
> 
> Sure, I can move that to an x86 file and make it build only on x86_64.

Yes, sounds good. But let's agree on general direction first.

Best regards,
Vladimir Ivanov

[1] 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l1666

[2] 
http://hg.openjdk.java.net/jdk/jdk/file/623722a6aeb9/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp

[3] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/macroAssembler_x86_(sin|cos|...).cpp


From vladimir.x.ivanov at oracle.com  Mon Nov 25 16:05:26 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Nov 2019 19:05:26 +0300
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
Message-ID: <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>

Lutz,

Can you elaborate, please, how the patch fixes the problem?

Why did you decide to add the following guards?

+  if ((options() == NULL) || (strlen(options()) == 0)) {

Best regards,
Vladimir Ivanov

On 25.11.2019 17:06, Schmidt, Lutz wrote:
> Dear all,
> 
> may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis-<platform> library were not passed on.
> 
> The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
> 
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
> Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
> 
> Thank you,
> Lutz
> 

From lutz.schmidt at sap.com  Mon Nov 25 16:59:21 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 25 Nov 2019 16:59:21 +0000
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
 <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
Message-ID: <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>

Hi Vladimir, 

I'm happy to elaborate in more detail about the issue and the fix.

For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]".

Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed. 

But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-<platform>.so as well. That requires _option_buf to be filled every time a decode_env is constructed. 

Moving
  if (_optionsParsed) return;
after the collect_options() calls heals this deficiency.
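A simplified model of the intended ordering described above, assuming two option sources joined with a comma. DecodeEnvModel and parse_count are hypothetical stand-ins for decode_env and its one-time static flag parsing, not the actual disassembler code. The buffer is refilled for every instance because hsdis reads it, and only the static parsing sits behind the early return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

class DecodeEnvModel {
 public:
  static bool options_parsed;  // stands in for the static "already parsed" flag
  static int parse_count;      // counts how often the one-time parsing ran

  DecodeEnvModel(const char* cpu_opts, const char* user_opts) {
    std::memset(_option_buf, 0, sizeof(_option_buf));
    process_options(cpu_opts, user_opts);
  }

  const char* options() const { return _option_buf; }

 private:
  char _option_buf[512];  // per-instance buffer handed to the library

  // Append one option source, comma-separated, without overflowing.
  void collect_options(const char* p) {
    if (p == nullptr || p[0] == '\0') return;
    std::size_t len = std::strlen(_option_buf);
    if (len > 0) {
      std::snprintf(_option_buf + len, sizeof(_option_buf) - len, ",%s", p);
    } else {
      std::snprintf(_option_buf, sizeof(_option_buf), "%s", p);
    }
  }

  void process_options(const char* cpu_opts, const char* user_opts) {
    // Always filled: the disassembler library reads this buffer.
    collect_options(cpu_opts);
    collect_options(user_opts);

    // Static flags only need to be derived once.
    if (options_parsed) return;
    options_parsed = true;
    ++parse_count;  // e.g. scanning options() for "print-raw", "help", ...
  }
};

bool DecodeEnvModel::options_parsed = false;
int DecodeEnvModel::parse_count = 0;
```

Two instances constructed in a row both carry the full option string, while the static parsing still runs only once.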

I added the guards you question as additional "safety net". After looking at the code again I must admit the guards are not necessary. _option_buf can never be NULL and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like. 

Please let me know if there are any more questions to be answered. 

Thanks,
Lutz


On 25.11.19, 17:05, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:

    Lutz,
    
    Can you elaborate, please, how the patch fixes the problem?
    
    Why did you decide to add the following guards?
    
    +  if ((options() == NULL) || (strlen(options()) == 0)) {
    
    Best regards,
    Vladimir Ivanov
    
    On 25.11.2019 17:06, Schmidt, Lutz wrote:
    > Dear all,
    > 
    > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis-<platform> library were not passed on.
    > 
    > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
    > 
    > Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
    > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
    > 
    > Thank you,
    > Lutz
    > 
    

From vladimir.x.ivanov at oracle.com  Mon Nov 25 18:14:43 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Nov 2019 21:14:43 +0300
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
 <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>
Message-ID: <c0916ad3-e4fa-52ba-398c-cd2782ab6c4a@oracle.com>

Thanks for the clarifications, Sandhya!

Some more questions inlined.
> The bug happens like below:
> 
> * User specifies -XX:UseAVX=3 on command line.
> * On Skylake platform due to the following lines:
>      __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>      __ movl(rax, Address(rsi, 0));
>      __ cmpl(rax, 0x50654);              // If it is Skylake
>      __ jcc(Assembler::equal, legacy_setup);
>     The zmm registers are not saved/restored.
> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>    int max_vector_size = 0;
>    if (UseSSE < 2) {
>      // Vectors (in XMM) are only supported with SSE2+
>      // SSE is always 2 on x64.
>      max_vector_size = 0;
>    } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>      // 16 byte vectors (in XMM) are supported with SSE2+
>      max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
>    } else if (UseAVX == 1 || UseAVX == 2) {
>       ...
>    }
> * And so we get UseAVX=3 and max_vector_size = 16.
> 
> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2.
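The selection logic quoted above can be modeled as a pure function, assuming the elided branches assign 32 bytes (YMM) for UseAVX=1/2 and 64 bytes (ZMM) for UseAVX=3, as the thread describes. select_max_vector_size is a hypothetical name, and os_supports_avx_vectors() is reduced to a plain bool. It shows that UseAVX=3 combined with a failed OS-support check falls through to 16 bytes.

```cpp
#include <cassert>

// Hypothetical model of the MaxVectorSize choice; in the VM the bool
// comes from os_supports_avx_vectors(), which depends on the register
// state saved/verified during the startup CPUID probe.
int select_max_vector_size(int use_sse, int use_avx, bool os_supports_avx_vectors) {
  if (use_sse < 2) {
    return 0;   // vectors (in XMM) need SSE2+
  } else if (use_avx == 0 || !os_supports_avx_vectors) {
    return 16;  // XMM only: the value observed in this bug
  } else if (use_avx == 1 || use_avx == 2) {
    return 32;  // YMM
  } else {
    return 64;  // ZMM, UseAVX=3
  }
}
```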

Unfortunately, I'm still confused :-(

If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and 
-XX:UseAVX=2 aren't affected on Skylake as well?

> On your question regarding why 8221092 needs the code you conditionally exclude:
> This was introduced so as not to do any AVX512 execution if not required.  ZMM register save/restore uses AVX512 instruction.

Are you talking about completely avoiding execution of AVX512 
instructions on Skylakes if UseAVX < 3?

Considering it is in generate_get_cpu_info() which is run only once 
early at startup, what kind of effects you intend to avoid?

Best regards,
Vladimir Ivanov

> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Friday, November 22, 2019 4:39 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> Hi Sandhya,
> 
>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>
>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>
>> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
> 
> Please, elaborate how it happens and how legacy_setup affects it?
> 
> Why 8221092 needs the code you conditionally exclude.
> Why the following isn't enough?
> 
>      if (FLAG_IS_DEFAULT(UseAVX)) {
>        FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
> +    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
> +      FLAG_SET_DEFAULT(UseAVX, 2);  // Set UseAVX=2 for Skylake
> +    }
>      } else if (UseAVX > use_avx_limit) {
> 
> Best regards,
> Vladimir Ivanov
> 

From vladimir.x.ivanov at oracle.com  Mon Nov 25 18:30:54 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Nov 2019 21:30:54 +0300
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
 <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
 <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>
Message-ID: <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com>

Thanks for the clarifications, Lutz.

So, I assume you have a typo in the patch then:

+  if ((options() == NULL) || (strlen(options()) == 0)) {
+    // We need to fill the options buffer for each newly created
+    // decode_env instance. The hsdis_* library looks for options
+    // in that buffer.
+    collect_options(Disassembler::pd_cpu_opts());
+    collect_options(PrintAssemblyOptions);
+  }

It performs collect_options() calls only if _option_buf is either NULL 
or "\0".

Also, what about the following updates of instance members?

   if (strstr(options(), "print-raw")) {
     _print_raw = (strstr(options(), "xml") ? 2 : 1);
   }

   if (strstr(options(), "help")) {
     _print_help = true;
   }

BTW, would it be better to turn _print_help (along with _helpPrinted) into a 
static member?

Can we make _option_buf static as well? Or do we want to keep a 
defensive copy to pass into hsdis.so?

As an alternative approach to fix the bug, we could create a golden copy 
during parsing instead and then just copy it to _option_buf as part of 
decode_env initialization.
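
A minimal sketch of the golden-copy alternative, assuming the option sources reduce to two strings (the CPU-specific options and PrintAssemblyOptions) and do not change after the first decode_env is created. OptionCache and DecodeEnvSketch are hypothetical names, not HotSpot classes. The combined string is built once into a static buffer, and each new instance copies it into its own buffer, which keeps a defensive copy for hsdis.so.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch (not HotSpot code): build the combined option string
// once, then hand each decoding environment its own copy of it.
class OptionCache {
 public:
  static constexpr std::size_t kBufSize = 512;  // same size as _option_buf

  // Returns the golden copy, building it on the first call only.
  // Assumes the option sources do not change after the first call.
  static const char* golden(const char* cpu_opts, const char* user_opts) {
    if (!parsed_) {
      golden_[0] = '\0';
      append(cpu_opts);
      append(user_opts);
      parsed_ = true;
    }
    return golden_;
  }

 private:
  // Appends opts, comma-separated, truncating if the buffer would overflow.
  static void append(const char* opts) {
    if (opts == nullptr || opts[0] == '\0') return;
    std::size_t used = std::strlen(golden_);
    if (used > 0 && used + 1 < kBufSize) {
      golden_[used++] = ',';
      golden_[used] = '\0';
    }
    std::strncat(golden_, opts, kBufSize - used - 1);
  }

  static char golden_[kBufSize];
  static bool parsed_;
};

char OptionCache::golden_[OptionCache::kBufSize];
bool OptionCache::parsed_ = false;

// Hypothetical stand-in for decode_env: copies the golden options into its
// own buffer during initialization (the defensive copy passed to hsdis).
struct DecodeEnvSketch {
  char option_buf[OptionCache::kBufSize];

  DecodeEnvSketch(const char* cpu_opts, const char* user_opts) {
    std::strncpy(option_buf, OptionCache::golden(cpu_opts, user_opts),
                 sizeof(option_buf) - 1);
    option_buf[sizeof(option_buf) - 1] = '\0';
  }
};
```

In this sketch the parsing cost is paid once, while every instance still receives a fully populated buffer. Later calls ignore their arguments, which is only correct under the stated assumption that the options are fixed for the lifetime of the VM.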

Best regards,
Vladimir Ivanov

On 25.11.2019 19:59, Schmidt, Lutz wrote:
> Hi Vladimir,
> 
> I'm happy to elaborate in more detail about the issue and the fix.
> 
> For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]".
> 
> Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed.
> 
> But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-<platform>.so as well. That requires _option_buf to be filled every time a decode_env is constructed.
> 
> Moving
>    if (_optionsParsed) return;
> after the collect_options() calls heals this deficiency.
> 
> I added the guards you are questioning as an additional "safety net". After looking at the code again, I must admit the guards are not necessary. _option_buf can never be NULL, and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like.
> 
> Please let me know if there are any more questions to be answered.
> 
> Thanks,
> Lutz
> 
> 
> On 25.11.19, 17:05, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
> 
>      Lutz,
>      
>      Can you elaborate, please, how the patch fixes the problem?
>      
>      Why did you decide to add the following guards?
>      
>      +  if ((options() == NULL) || (strlen(options()) == 0)) {
>      
>      Best regards,
>      Vladimir Ivanov
>      
>      On 25.11.2019 17:06, Schmidt, Lutz wrote:
>      > Dear all,
>      >
>      > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis-<platform> library were not passed on.
>      >
>      > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
>      >
>      > Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
>      > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
>      >
>      > Thank you,
>      > Lutz
>      >
>      
> 

From sandhya.viswanathan at intel.com  Mon Nov 25 18:44:48 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Mon, 25 Nov 2019 18:44:48 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <c0916ad3-e4fa-52ba-398c-cd2782ab6c4a@oracle.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
 <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>
 <c0916ad3-e4fa-52ba-398c-cd2782ab6c4a@oracle.com>
Message-ID: <CH2PR11MB42806416E3D0607FDEDD1F7EEF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>

Hi Vladimir,

From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level. 
If AVX=3, it checks if save/restore of zmm worked properly.
If AVX=1 or 2, it checks if save/restore of ymm worked properly.
Since only the zmm save/restore is done conditionally on Skylake, the problem occurs only with AVX=3.  And that is what this patch is trying to fix.
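
A simplified model of this dispatch, assuming that only a feature bitmask and the results of the startup save/restore check matter, and that AVX512F is present in the feature set only when UseAVX=3. The bit values, SimdProbe and os_supports_avx_vectors_sketch() are illustrative, not HotSpot definitions. When AVX512F is present, the zmm result decides. Otherwise the ymm result decides, which is why skipping only the zmm save/restore affects only AVX=3.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative feature bits; HotSpot's CPU_AVX / CPU_AVX512F values differ.
constexpr std::uint64_t kCpuAvx     = std::uint64_t{1} << 0;
constexpr std::uint64_t kCpuAvx512F = std::uint64_t{1} << 1;

// Results of the startup check that the OS preserved the vector registers.
struct SimdProbe {
  bool ymm_preserved;  // 256-bit (ymm) save/restore verified
  bool zmm_preserved;  // 512-bit (zmm) save/restore verified; false when the
                       // zmm save/restore was skipped (Skylake legacy path)
};

// Simplified dispatch: EVEX configurations are judged by the zmm result,
// AVX-only configurations by the ymm result.
bool os_supports_avx_vectors_sketch(std::uint64_t features, const SimdProbe& probe) {
  if ((features & kCpuAvx512F) != 0) return probe.zmm_preserved;
  if ((features & kCpuAvx) != 0)     return probe.ymm_preserved;
  return false;
}
```

For example, with SimdProbe{true, false} (zmm skipped), a feature set that contains AVX512F yields false, while an AVX-only feature set yields true.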

The rest takes us into the 8221092 discussion.
I think Vivek's intent there was not to execute any AVX 512 instructions at all if AVX < 3, to overcome the performance regressions observed by Scott Oaks.

Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com> 
Sent: Monday, November 25, 2019 10:15 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092

Thanks for the clarifications, Sandhya!

Some more questions inlined.
> The bug happens as follows:
> 
> * User specifies -XX:UseAVX=3 on command line.
> * On Skylake platform due to the following lines:
>      __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>      __ movl(rax, Address(rsi, 0));
>      __ cmpl(rax, 0x50654);              // If it is Skylake
>      __ jcc(Assembler::equal, legacy_setup);
>     The zmm registers are not saved/restored.
> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>    int max_vector_size = 0;
>    if (UseSSE < 2) {
>      // Vectors (in XMM) are only supported with SSE2+
>      // SSE is always 2 on x64.
>      max_vector_size = 0;
>    } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>      // 16 byte vectors (in XMM) are supported with SSE2+
>      max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
>    } else if (UseAVX == 1 || UseAVX == 2) {
>       ...
>    }
> * And so we get UseAVX=3 and max_vector_size = 16.
> 
> So the fix is to save/restore the zmm registers when the flag is not at its default value and the user specifies UseAVX > 2.
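
A rough model of the two decisions involved. It assumes the Skylake check is exactly the quoted cpuid signature comparison (0x50654) and that other AVX-512 CPUs always perform the zmm save/restore. saves_zmm_sketch() and max_vector_size_sketch() are hypothetical helpers. The first encodes the proposed fix: on that signature, zmm is saved/restored only when UseAVX was explicitly set above 2. The second mirrors the quoted max_vector_size selection.

```cpp
#include <cassert>

// Hypothetical model of the gating in generate_get_cpu_info() with the fix:
// on the Skylake signature the zmm save/restore is skipped unless the user
// explicitly requested UseAVX > 2. Assumes other AVX-512 CPUs always do it.
bool saves_zmm_sketch(bool cpu_has_avx512f, unsigned cpuid1_eax,
                      bool use_avx_is_default, int use_avx) {
  if (!cpu_has_avx512f) return false;         // no zmm registers to check
  if (cpuid1_eax != 0x50654) return true;     // not the Skylake signature
  return !use_avx_is_default && use_avx > 2;  // explicit -XX:UseAVX=3
}

// Mirrors the quoted max_vector_size selection (result in bytes).
int max_vector_size_sketch(int use_sse, int use_avx, bool os_supports_avx_vectors) {
  if (use_sse < 2) return 0;                                // no vectors without SSE2
  if (use_avx == 0 || !os_supports_avx_vectors) return 16;  // XMM
  if (use_avx == 1 || use_avx == 2) return 32;              // YMM
  return 64;                                                // ZMM (UseAVX > 2)
}
```

Before the fix, -XX:UseAVX=3 on that signature skipped the zmm save/restore, so the OS check failed and max_vector_size_sketch(2, 3, false) gives 16, which is the reported symptom. Once the zmm save/restore is performed, the same call with true gives 64.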

Unfortunately, I'm still confused :-(

If os_supports_avx_vectors() turns out to be false, why aren't -XX:UseAVX=1 and
-XX:UseAVX=2 affected on Skylake as well?

> On your question regarding why 8221092 needs the code you conditionally exclude:
> This was introduced so as not to execute any AVX512 instructions when they are not required.  ZMM register save/restore uses AVX512 instructions.

Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3?

Considering it is in generate_get_cpu_info(), which runs only once early at startup, what kind of effects do you intend to avoid?

Best regards,
Vladimir Ivanov

> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Friday, November 22, 2019 4:39 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> Hi Sandhya,
> 
>> On the Skylake platform, the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes, as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>
>> When the user explicitly wants to use AVX3 on the Skylake platform, they need to invoke the JVM with the -XX:UseAVX=3 command line argument.
>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>
>> However, after JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on the command line, MaxVectorSize is wrongly set to 16 bytes.
> 
> Please elaborate on how it happens and how legacy_setup affects it.
> 
> Why does 8221092 need the code you conditionally exclude?
> Why isn't the following enough?
> 
>      if (FLAG_IS_DEFAULT(UseAVX)) {
>        FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
> +    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
> +      FLAG_SET_DEFAULT(UseAVX, 2);  // Set UseAVX=2 for Skylake
> +    }
>      } else if (UseAVX > use_avx_limit) {
> 
> Best regards,
> Vladimir Ivanov
> 

From lutz.schmidt at sap.com  Mon Nov 25 20:03:34 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 25 Nov 2019 20:03:34 +0000
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
 <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
 <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>
 <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com>
Message-ID: <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com>

Hi Vladimir,

you are welcome. 

And you are right: the condition should read
    if ((options() != NULL) && (strlen(options()) == 0)) {
to be correct. I suggest we end this discussion and I just remove these checks.

All your other comments are valid. There is an open bug to improve the very basic options parsing: https://bugs.openjdk.java.net/browse/JDK-8223765 (this task was split off from JDK-8213084).

I would like to cover the improvements you suggest when working on that bug. To make _print_raw work correctly, I suggest just moving 
    if (_optionsParsed) return;
a bit further down. The help text should be printed only once anyway.
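
A simplified model of the ordering suggested above, assuming a plain comma-joined buffer. DecodeEnvModel and its members are hypothetical stand-ins for decode_env, _option_buf, _optionsParsed and _print_raw, and the NULL/empty guards are left out, as proposed. The buffer and the per-instance settings are filled for every instance, and only the one-time static parsing sits behind the early return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for decode_env, showing where the early return sits.
class DecodeEnvModel {
 public:
  static bool options_parsed;  // plays the role of _optionsParsed
  static int  static_parses;   // how often the one-time part actually ran

  DecodeEnvModel(const char* cpu_opts, const char* user_opts) {
    std::memset(_option_buf, 0, sizeof(_option_buf));
    process_options(cpu_opts, user_opts);
  }

  const char* options() const { return _option_buf; }
  int print_raw() const { return _print_raw; }

 private:
  // Appends p to the buffer, comma-separated, truncating on overflow.
  void collect_options(const char* p) {
    if (p == nullptr || p[0] == '\0') return;
    std::size_t used = std::strlen(_option_buf);
    if (used > 0 && used + 1 < sizeof(_option_buf)) {
      _option_buf[used++] = ',';
      _option_buf[used] = '\0';
    }
    std::strncat(_option_buf, p, sizeof(_option_buf) - used - 1);
  }

  void process_options(const char* cpu_opts, const char* user_opts) {
    // 1. Fill the per-instance buffer every time: hsdis reads it.
    collect_options(cpu_opts);
    collect_options(user_opts);

    // 2. Per-instance settings are derived before the early return.
    if (std::strstr(_option_buf, "print-raw") != nullptr) {
      _print_raw = (std::strstr(_option_buf, "xml") != nullptr) ? 2 : 1;
    }

    // 3. Static settings only need to be derived once.
    if (options_parsed) return;
    options_parsed = true;
    ++static_parses;
  }

  char _option_buf[512];
  int  _print_raw = 0;
};

bool DecodeEnvModel::options_parsed = false;
int  DecodeEnvModel::static_parses  = 0;
```

If the return is placed before step 1, as in the regressed code, the second instance ends up with an empty buffer. That is the hsdis symptom reported in the bug.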

Here is a new webrev iteration. It reflects what I suggest: https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/ 

Thanks,
Lutz


On 25.11.19, 19:30, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:

    Thanks for the clarifications, Lutz.
    
    So, I assume you have a typo in the patch then:
    
    +  if ((options() == NULL) || (strlen(options()) == 0)) {
    +    // We need to fill the options buffer for each newly created
    +    // decode_env instance. The hsdis_* library looks for options
    +    // in that buffer.
    +    collect_options(Disassembler::pd_cpu_opts());
    +    collect_options(PrintAssemblyOptions);
    +  }
    
    It performs collect_options() calls only if _option_buf is either NULL 
    or "\0".
    
    Also, what about the following updates of instance members?
    
       if (strstr(options(), "print-raw")) {
         _print_raw = (strstr(options(), "xml") ? 2 : 1);
       }
    
       if (strstr(options(), "help")) {
         _print_help = true;
       }
    
    BTW, would it be better to turn _print_help (along with _helpPrinted) into a 
    static member?
    
    Can we make _option_buf static as well? Or do we want to keep a 
    defensive copy to pass into hsdis.so?
    
    As an alternative approach to fix the bug, we could create a golden copy 
    during parsing instead and then just copy it to _option_buf as part of 
    decode_env initialization.
    
    Best regards,
    Vladimir Ivanov
    
    On 25.11.2019 19:59, Schmidt, Lutz wrote:
    > Hi Vladimir,
    > 
    > I'm happy to elaborate in more detail about the issue and the fix.
    > 
    > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]".
    > 
    > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed.
    > 
    > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-<platform>.so as well. That requires _option_buf to be filled every time a decode_env is constructed.
    > 
    > Moving
    >    if (_optionsParsed) return;
    > after the collect_options() calls heals this deficiency.
    > 
    > I added the guards you are questioning as an additional "safety net". After looking at the code again, I must admit the guards are not necessary. _option_buf can never be NULL, and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like.
    > 
    > Please let me know if there are any more questions to be answered.
    > 
    > Thanks,
    > Lutz
    > 
    > 
    > On 25.11.19, 17:05, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
    > 
    >      Lutz,
    >      
    >      Can you elaborate, please, how the patch fixes the problem?
    >      
    >      Why did you decide to add the following guards?
    >      
    >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
    >      
    >      Best regards,
    >      Vladimir Ivanov
    >      
    >      On 25.11.2019 17:06, Schmidt, Lutz wrote:
    >      > Dear all,
    >      >
    >      > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis-<platform> library were not passed on.
    >      >
    >      > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
    >      >
    >      > Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
    >      > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
    >      >
    >      > Thank you,
    >      > Lutz
    >      >
    >      
    > 
    

From vladimir.x.ivanov at oracle.com  Mon Nov 25 20:23:04 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Nov 2019 23:23:04 +0300
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <CH2PR11MB42806416E3D0607FDEDD1F7EEF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
 <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>
 <c0916ad3-e4fa-52ba-398c-cd2782ab6c4a@oracle.com>
 <CH2PR11MB42806416E3D0607FDEDD1F7EEF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
Message-ID: <a890b2fc-7a7b-6c75-96c0-5940f0ad8692@oracle.com>


> From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level.
> If AVX=3, it checks if save/restore of zmm worked properly.
> If AVX=1 or 2, it checks if save/restore of ymm worked properly.
> Since only the zmm save/restore is done conditionally on Skylake, the problem occurs only with AVX=3.  And that is what this patch is trying to fix.

Ok, now I see how it works:

os_supports_avx_vectors() dispatches on supports_evex() and supports_avx(),
which inspect the supported CPU features:

   static bool supports_avx()      { return (_features & CPU_AVX) != 0; }

   static bool supports_evex()     { return (_features & CPU_AVX512F) != 0; }

But VM_Version::get_processor_features() clears the detected features 
depending on the UseAVX level.

So, as you described, os_supports_avx_vectors() takes different 
paths for UseAVX=3 and for UseAVX=1/2.
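
A sketch of that masking step, assuming only three illustrative feature bits. mask_features_for_use_avx(), supports_avx_in() and supports_evex_in() are hypothetical free-function versions of the static members quoted above, not the real VM_Version code. On a Skylake-like feature set, UseAVX=3 keeps AVX512F, so the zmm branch is taken. UseAVX=1/2 clears it, so the ymm branch is taken instead.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit values; HotSpot's CPU_* constants differ.
constexpr std::uint64_t kCpuAvx     = std::uint64_t{1} << 0;
constexpr std::uint64_t kCpuAvx2    = std::uint64_t{1} << 1;
constexpr std::uint64_t kCpuAvx512F = std::uint64_t{1} << 2;

// Simplified model: clear detected features that exceed the UseAVX level.
std::uint64_t mask_features_for_use_avx(std::uint64_t detected, int use_avx) {
  std::uint64_t f = detected;
  if (use_avx < 3) f &= ~kCpuAvx512F;  // EVEX only with UseAVX >= 3
  if (use_avx < 2) f &= ~kCpuAvx2;
  if (use_avx < 1) f &= ~kCpuAvx;
  return f;
}

// Free-function analogues of supports_avx() / supports_evex().
bool supports_avx_in(std::uint64_t f)  { return (f & kCpuAvx) != 0; }
bool supports_evex_in(std::uint64_t f) { return (f & kCpuAvx512F) != 0; }
```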

> The rest takes us into the 8221092 discussion.
> I think Vivek's intent there was not to execute any AVX 512 instructions at all if AVX < 3, to overcome the performance regressions observed by Scott Oaks.

My best guess is that it helped the analysis by eliminating all the problematic 
instructions.

Anyway, I'm fine with leaving that code. (Though it would be nice to 
simply get rid of it.)

It looks like all the code after start_simd_check (and the checks that branch to 
legacy_save_restore) can go under the use_evex check. That would make the 
code clearer (I spent too much time reasoning about the interactions between 
use_evex and the FLAG_IS_DEFAULT(UseAVX) check you added).

Otherwise, looks good.

Best regards,
Vladimir Ivanov

> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Monday, November 25, 2019 10:15 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> Thanks for the clarifications, Sandhya!
> 
> Some more questions inlined.
>> The bug happens as follows:
>>
>> * User specifies -XX:UseAVX=3 on command line.
>> * On Skylake platform due to the following lines:
>>       __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>>       __ movl(rax, Address(rsi, 0));
>>       __ cmpl(rax, 0x50654);              // If it is Skylake
>>       __ jcc(Assembler::equal, legacy_setup);
>>      The zmm registers are not saved/restored.
>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>>     int max_vector_size = 0;
>>     if (UseSSE < 2) {
>>       // Vectors (in XMM) are only supported with SSE2+
>>       // SSE is always 2 on x64.
>>       max_vector_size = 0;
>>     } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>>       // 16 byte vectors (in XMM) are supported with SSE2+
>>       max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
>>     } else if (UseAVX == 1 || UseAVX == 2) {
>>        ...
>>     }
>> * And so we get UseAVX=3 and max_vector_size = 16.
>>
>> So the fix is to save/restore the zmm registers when the flag is not at its default value and the user specifies UseAVX > 2.
> 
> Unfortunately, I'm still confused :-(
> 
> If os_supports_avx_vectors() turns out to be false, why aren't -XX:UseAVX=1 and
> -XX:UseAVX=2 affected on Skylake as well?
> 
>> On your question regarding why 8221092 needs the code you conditionally exclude:
>> This was introduced so as not to execute any AVX512 instructions when they are not required.  ZMM register save/restore uses AVX512 instructions.
> 
> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3?
> 
> Considering it is in generate_get_cpu_info(), which runs only once early at startup, what kind of effects do you intend to avoid?
> 
> Best regards,
> Vladimir Ivanov
> 
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> Sent: Friday, November 22, 2019 4:39 AM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
>>
>> Hi Sandhya,
>>
>>> On the Skylake platform, the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes, as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>>
>>> When the user explicitly wants to use AVX3 on the Skylake platform, they need to invoke the JVM with the -XX:UseAVX=3 command line argument.
>>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>>
>>> However, after JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on the command line, MaxVectorSize is wrongly set to 16 bytes.
>>
>> Please elaborate on how it happens and how legacy_setup affects it.
>>
>> Why does 8221092 need the code you conditionally exclude?
>> Why isn't the following enough?
>>
>>       if (FLAG_IS_DEFAULT(UseAVX)) {
>>         FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
>> +    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
>> +      FLAG_SET_DEFAULT(UseAVX, 2);  // Set UseAVX=2 for Skylake
>> +    }
>>       } else if (UseAVX > use_avx_limit) {
>>
>> Best regards,
>> Vladimir Ivanov
>>

From vladimir.x.ivanov at oracle.com  Mon Nov 25 20:26:16 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Nov 2019 23:26:16 +0300
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
 <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
 <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>
 <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com>
 <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com>
Message-ID: <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com>


> All your other comments are valid. There is an open bug to improve the very basic options parsing: https://bugs.openjdk.java.net/browse/JDK-8223765 (this task was split off from JDK-8213084).
> 
> I would like to cover the improvements you suggest when working on that bug. To make _print_raw work correctly, I suggest just moving
>      if (_optionsParsed) return;
> a bit further down. The help text should be printed only once anyway.

Ok, I'm fine with addressing it later. And thanks for taking care of it.

> Here is a new webrev iteration. It reflects what I suggest: https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/

Looks good.

Best regards,
Vladimir Ivanov

> On 25.11.19, 19:30, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
> 
>      Thanks for the clarifications, Lutz.
>      
>      So, I assume you have a typo in the patch then:
>      
>      +  if ((options() == NULL) || (strlen(options()) == 0)) {
>      +    // We need to fill the options buffer for each newly created
>      +    // decode_env instance. The hsdis_* library looks for options
>      +    // in that buffer.
>      +    collect_options(Disassembler::pd_cpu_opts());
>      +    collect_options(PrintAssemblyOptions);
>      +  }
>      
>      It performs collect_options() calls only if _option_buf is either NULL
>      or "\0".
>      
>      Also, what about the following updates of instance members?
>      
>         if (strstr(options(), "print-raw")) {
>           _print_raw = (strstr(options(), "xml") ? 2 : 1);
>         }
>      
>         if (strstr(options(), "help")) {
>           _print_help = true;
>         }
>      
>      BTW, would it be better to turn _print_help (along with _helpPrinted) into a
>      static member?
>      
>      Can we make _option_buf static as well? Or do we want to keep a
>      defensive copy to pass into hsdis.so?
>      
>      As an alternative approach to fix the bug, we could create a golden copy
>      during parsing instead and then just copy it to _option_buf as part of
>      decode_env initialization.
>      
>      Best regards,
>      Vladimir Ivanov
>      
>      On 25.11.2019 19:59, Schmidt, Lutz wrote:
>      > Hi Vladimir,
>      >
>      > I'm happy to elaborate in more detail about the issue and the fix.
>      >
>      > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]".
>      >
>      > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed.
>      >
>      > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-<platform>.so as well. That requires _option_buf to be filled every time a decode_env is constructed.
>      >
>      > Moving
>      >    if (_optionsParsed) return;
>      > after the collect_options() calls heals this deficiency.
>      >
>      > I added the guards you are questioning as an additional "safety net". After looking at the code again, I must admit the guards are not necessary. _option_buf can never be NULL, and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like.
>      >
>      > Please let me know if there are any more questions to be answered.
>      >
>      > Thanks,
>      > Lutz
>      >
>      >
>      > On 25.11.19, 17:05, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
>      >
>      >      Lutz,
>      >
>      >      Can you elaborate, please, how the patch fixes the problem?
>      >
>      >      Why did you decide to add the following guards?
>      >
>      >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
>      >
>      >      Best regards,
>      >      Vladimir Ivanov
>      >
>      >      On 25.11.2019 17:06, Schmidt, Lutz wrote:
>      >      > Dear all,
>      >      >
>      >      > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis-<platform> library were not passed on.
>      >      >
>      >      > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
>      >      >
>      >      > Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
>      >      > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
>      >      >
>      >      > Thank you,
>      >      > Lutz
>      >      >
>      >
>      >
>      
> 

From lutz.schmidt at sap.com  Mon Nov 25 20:47:57 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 25 Nov 2019 20:47:57 +0000
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
 <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
 <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>
 <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com>
 <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com>
 <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com>
Message-ID: <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com>

Thanks for the review, Vladimir!
Still one to go.
Regards,
Lutz

On 25.11.19, 21:26, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:

    
    > All your other comments are valid. There is an open bug to improve the very basic options parsing: https://bugs.openjdk.java.net/browse/JDK-8223765 (this task was split off from JDK-8213084).
    > 
    > I would like to cover the improvements you suggest when working on that bug. To make _print_raw work correctly, I suggest just moving
    >      if (_optionsParsed) return;
    > a bit further down. The help text should be printed only once anyway.
    
    Ok, I'm fine with addressing it later. And thanks for taking care of it.
    
    > Here is a new webrev iteration. It reflects what I suggest: https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/
    
    Looks good.
    
    Best regards,
    Vladimir Ivanov
    
    > On 25.11.19, 19:30, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
    > 
    >      Thanks for the clarifications, Lutz.
    >      
    >      So, I assume you have a typo in the patch then:
    >      
    >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
    >      +    // We need to fill the options buffer for each newly created
    >      +    // decode_env instance. The hsdis_* library looks for options
    >      +    // in that buffer.
    >      +    collect_options(Disassembler::pd_cpu_opts());
    >      +    collect_options(PrintAssemblyOptions);
    >      +  }
    >      
    >      It performs collect_options() calls only if _option_buf is either NULL
    >      or "\0".
    >      
    >      Also, what about the following updates of instance members?
    >      
    >         if (strstr(options(), "print-raw")) {
    >           _print_raw = (strstr(options(), "xml") ? 2 : 1);
    >         }
    >      
    >         if (strstr(options(), "help")) {
    >           _print_help = true;
    >         }
    >      
    >      BTW, would it be better to turn _print_help (along with _helpPrinted) into a
    >      static member?
    >      
    >      Can we make _option_buf static as well? Or do we want to keep a
    >      defensive copy to pass into hsdis.so?
    >      
    >      As an alternative approach to fix the bug, we could create a golden copy
    >      during parsing instead and then just copy it to _option_buf as part of
    >      decode_env initialization.
    >      
    >      Best regards,
    >      Vladimir Ivanov
    >      
    >      On 25.11.2019 19:59, Schmidt, Lutz wrote:
    >      > Hi Vladimir,
    >      >
    >      > I'm happy to elaborate in more detail about the issue and the fix.
    >      >
    >      > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]".
    >      >
    >      > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed.
    >      >
    >      > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-<platform>.so as well. That requires _option_buf to be filled every time a decode_env is constructed.
    >      >
    >      > Moving
    >      >    if (_optionsParsed) return;
    >      > after the collect_options() calls heals this deficiency.
    >      >
    >      > I added the guards you are questioning as an additional "safety net". After looking at the code again, I must admit the guards are not necessary. _option_buf can never be NULL, and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like.
    >      >
    >      > Please let me know if there are any more questions to be answered.
    >      >
    >      > Thanks,
    >      > Lutz
    >      >
    >      >
    >      > On 25.11.19, 17:05, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
    >      >
    >      >      Lutz,
    >      >
    >      >      Can you elaborate, please, how the patch fixes the problem?
    >      >
    >      >      Why did you decide to add the following guards?
    >      >
    >      >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
    >      >
    >      >      Best regards,
    >      >      Vladimir Ivanov
    >      >
    >      >      On 25.11.2019 17:06, Schmidt, Lutz wrote:
    >      >      > Dear all,
    >      >      >
    >      >      > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis-<platform> library were not passed on.
    >      >      >
    >      >      > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
    >      >      >
    >      >      > Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
    >      >      > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
    >      >      >
    >      >      > Thank you,
    >      >      > Lutz
    >      >      >
    >      >
    >      >
    >      
    > 
    

From sandhya.viswanathan at intel.com  Mon Nov 25 23:31:42 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Mon, 25 Nov 2019 23:31:42 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <a890b2fc-7a7b-6c75-96c0-5940f0ad8692@oracle.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
 <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>
 <c0916ad3-e4fa-52ba-398c-cd2782ab6c4a@oracle.com>
 <CH2PR11MB42806416E3D0607FDEDD1F7EEF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
 <a890b2fc-7a7b-6c75-96c0-5940f0ad8692@oracle.com>
Message-ID: <CH2PR11MB4280286FFA113210E565C283EF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>

Hi Vladimir,

Please find below updated webrev with your comments implemented:
http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com> 
Sent: Monday, November 25, 2019 12:23 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092


>  From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level.
> If AVX=3, it checks if save/restore of zmm worked properly.
> If AVX=1 or 2, it checks if save/restore of ymm worked properly.
> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3.  And that is what this patch is trying to fix.

Ok, now I see how it works:

os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features:

   static bool supports_avx()      { return (_features & CPU_AVX) != 0; }

   static bool supports_evex()     { return (_features & CPU_AVX512F) != 0; }

But VM_Version::get_processor_features() clears detected features depending on UseAVX level.

So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2.
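A minimal C++ model of the two paths just described. It assumes the feature bits have already been cleared according to UseAVX, so that supports_evex() is only true for UseAVX=3. CpuModel, its fields and both free functions are hypothetical stand-ins, not HotSpot code; max_vector_size() follows the branch order from get_processor_features() quoted further down in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant VM_Version state.
struct CpuModel {
  bool has_avx;       // models supports_avx() after UseAVX-based clearing
  bool has_evex;      // models supports_evex() (AVX512F) after clearing
  bool ymm_state_ok;  // result of the ymm save/restore check at startup
  bool zmm_state_ok;  // result of the zmm save/restore check (false if skipped)
};

// Mirrors the dispatch: with EVEX enabled only the zmm result matters.
bool os_supports_avx_vectors(const CpuModel& cpu) {
  if (cpu.has_evex) return cpu.zmm_state_ok;
  if (cpu.has_avx)  return cpu.ymm_state_ok;
  return false;
}

// Vector size in bytes, following the quoted branch order.
int max_vector_size(int use_sse, int use_avx, const CpuModel& cpu) {
  if (use_sse < 2) return 0;                                     // no XMM vectors
  if (use_avx == 0 || !os_supports_avx_vectors(cpu)) return 16;  // XMM
  if (use_avx == 1 || use_avx == 2) return 32;                   // YMM
  return 64;                                                     // ZMM (UseAVX > 2)
}
```

With zmm_state_ok false (the zmm check skipped via legacy_setup), UseAVX=3 falls back to 16 bytes, while UseAVX=2 still goes through the ymm check and gets 32.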

> The rest takes us into the 8221092 discussion.
> I think Vivek's intent there was not to execute any AVX-512 instructions at all if AVX < 3, to overcome performance regressions observed by Scott Oaks.

My best guess is it helped analysis by eliminating all problematic instructions.

Anyway, I'm fine with leaving that code. (Though it would be nice to simply get rid of it.)

It looks like all the code after start_simd_check (and checks to legacy_save_restore) can go under the use_evex check. That would make the code clearer (I spent too much time reasoning about the interactions between use_evex and the FLAG_IS_DEFAULT(UseAVX) check you added).

Otherwise, looks good.

Best regards,
Vladimir Ivanov

> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Monday, November 25, 2019 10:15 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir 
> Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when 
> UseAVX=3 is specified after JDK-8221092
> 
> Thanks for the clarifications, Sandhya!
> 
> Some more questions inlined.
>> The bug happens like below:
>>
>> * User specifies -XX:UseAVX=3 on command line.
>> * On Skylake platform due to the following lines:
>>       __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>>       __ movl(rax, Address(rsi, 0));
>>       __ cmpl(rax, 0x50654);              // If it is Skylake
>>       __ jcc(Assembler::equal, legacy_setup);
>>      The zmm registers are not saved/restored.
>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>>     int max_vector_size = 0;
>>     if (UseSSE < 2) {
>>       // Vectors (in XMM) are only supported with SSE2+
>>       // SSE is always 2 on x64.
>>       max_vector_size = 0;
>>     } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>>       // 16 byte vectors (in XMM) are supported with SSE2+
>>       max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
>>     } else if (UseAVX == 1 || UseAVX == 2) {
>>        ...
>>     }
>> * And so we get UseAVX=3 and max_vector_size = 16.
>>
>> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2.
> 
> Unfortunately, I'm still confused :-(
> 
> If os_supports_avx_vectors() turns out false, why aren't -XX:UseAVX=1 and
> -XX:UseAVX=2 affected on Skylake as well?
> 
>> On your question regarding why 8221092 needs the code you conditionally exclude:
>> This was introduced so as not to execute any AVX-512 instructions if not required.  ZMM register save/restore uses AVX-512 instructions.
> 
> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3?
> 
> Considering it is in generate_get_cpu_info(), which runs only once, early at startup, what kind of effects do you intend to avoid?
> 
> Best regards,
> Vladimir Ivanov
> 
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> Sent: Friday, November 22, 2019 4:39 AM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot 
>> compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>> UseAVX=3 is specified after JDK-8221092
>>
>> Hi Sandhya,
>>
>>> On the Skylake platform, the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes, as per JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092).
>>>
>>> When the user explicitly wants to use AVX3 on the Skylake platform, they need to invoke the JVM with the -XX:UseAVX=3 command-line argument.
>>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>>
>>> However, after JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092), when -XX:UseAVX=3 is given on the command line, MaxVectorSize is wrongly set to 16 bytes.
>>
>> Please, elaborate how it happens and how legacy_setup affects it?
>>
>> Why does 8221092 need the code you conditionally exclude?
>> Why isn't the following enough?
>>
>>       if (FLAG_IS_DEFAULT(UseAVX)) {
>>         FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
>> +    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
>> +      FLAG_SET_DEFAULT(UseAVX, 2);  //Set UseAVX=2 for Skylake
>> +    }
>>       } else if (UseAVX > use_avx_limit) {
>>
>> Best regards,
>> Vladimir Ivanov
>>
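The fix proposed in this thread (save/restore the zmm registers on the affected Skylake signature whenever UseAVX was explicitly set above 2) can be modeled as a small predicate. This is a hypothetical sketch: the function and parameter names are illustrative, and the real check is emitted as assembly in generate_get_cpu_info().

```cpp
#include <cassert>

// is_skylake_signature: models the cpuid check shown above (eax == 0x50654)
// use_avx_is_default:   models FLAG_IS_DEFAULT(UseAVX)
// Returns true when the zmm save/restore would be skipped (legacy_setup).
bool skip_zmm_save_restore(bool is_skylake_signature,
                           bool use_avx_is_default,
                           int use_avx) {
  // Only the Skylake signature takes the legacy_setup shortcut.
  if (!is_skylake_signature) return false;
  // An explicit -XX:UseAVX=3 (or higher) forces the zmm check to run.
  bool explicit_avx3 = !use_avx_is_default && use_avx > 2;
  return !explicit_avx3;
}
```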

From sandhya.viswanathan at intel.com  Tue Nov 26 00:08:55 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 26 Nov 2019 00:08:55 +0000
Subject: [14] RFR (L): 8234391: C2: Generic vector operands
In-Reply-To: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>
References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>
Message-ID: <CH2PR11MB42803E5932D6B4A5F991EFEBEF450@CH2PR11MB4280.namprd11.prod.outlook.com>

Could we please get one more review of this patch?

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com> 
Sent: Tuesday, November 19, 2019 6:31 AM
To: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Cc: Bhateja, Jatin <jatin.bhateja at intel.com>; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Subject: [14] RFR (L): 8234391: C2: Generic vector operands

http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8234391

Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) to generic ones.

(It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD instruction merges will be handled separately.)

At a high level, it is organized as follows:

   (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;

   (2) at runtime, right after matching is over, a special pass is performed which does:

       * replaces vecOper with vec[SDXYZ] depending on mach node type
          - vector mach nodes capture bottom_type() of their ideal prototype;

       * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec)
          - the matcher needs them, but they are useless for the register allocator (moreover, they may cause additional spills);


    (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands.
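A toy model of steps (2) and (3), assuming each node carries the byte size of its captured TypeVect. ToyNode and post_selection() are hypothetical names. The model only rewrites DEF operands and simply drops generic moves; the real pass must also rewire each move's users to the move's input and resolve USE operands from the inputs' types.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyNode {
  bool is_generic_move;     // stands in for MoveVec2Leg / MoveLeg2Vec
  int  vector_bytes;        // size of the captured bottom_type() (TypeVect)
  std::string def_operand;  // "vec" or "legVec" right after matching
};

// Fixed-size suffix keyed by vector size in bytes.
static const char* size_suffix(int bytes) {
  switch (bytes) {
    case 4:  return "S";
    case 8:  return "D";
    case 16: return "X";
    case 32: return "Y";
    default: return "Z";  // 64 bytes
  }
}

// Single pass: drop generic moves, specialize generic DEF operands.
std::vector<ToyNode> post_selection(const std::vector<ToyNode>& nodes) {
  std::vector<ToyNode> out;
  for (ToyNode n : nodes) {
    if (n.is_generic_move) continue;  // useless to the register allocator
    if (n.def_operand == "vec" || n.def_operand == "legVec") {
      n.def_operand += size_suffix(n.vector_bytes);
    }
    out.push_back(n);
  }
  return out;
}
```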


Some details:

    (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works


    (2) the new logic is guarded by a new matcher flag (Matcher::supports_generic_vector_operands), which is enabled only on x86


    (3) post-selection analysis is implemented as a single pass over the graph, processing individual nodes using either their own bottom_type() (for DEF operands) or their inputs' bottom_type() (for USE operands), which is an instance of TypeVect


    (4) most of the analysis is cross-platform and interfaces with platform-specific code through 3 methods:

      static bool is_generic_reg2reg_move(MachNode* m);
      // distinguishes MoveVec2Leg/MoveLeg2Vec nodes

      static bool is_generic_vector(MachOper* opnd);
      // distinguishes vec/legVec operands

      static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg);
      // constructs fixed-sized vector operand based on ideal reg
      //   vec    + Op_Vec[SDXYZ] =>    vec[SDXYZ]
      //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]


    (5) TEMP operands are handled specially:
      - TEMP uses max_vector_size() to determine which fixed-size operand to use
          * this is needed to cover reductions, which produce scalars rather than vectors
      - TEMP_DEF inherits fixed-sized operand type from DEF;


    (6) there is a limited number of special cases for mach nodes in Matcher::get_vector_operand_helper:

        - LShiftCntV/RShiftCntV: though they report a wide vector type as Node::bottom_type(), their ideal_reg is VecS! But for vector nodes, only Node::bottom_type() is captured during matching, not ideal_reg().

        - vshiftcntimm: chain instructions that convert a scalar to a vector don't have a vector type.


    (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see Matcher::get_vector_regmask)


    (8) operands are renamed in x86_32.ad & x86_64.ad to avoid name conflicts with the new vec/legVec operands


    (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD
       - it aligns the code between x86_64.ad and x86_32.ad
       - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0)
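The operand-kind rules in (3) and (5) can be summarized as a lookup, sketched below. It assumes sizes are given in bytes and that the result is the fixed-size operand name (vec[SDXYZ] or legVec[SDXYZ]), as in the clone_generic_vector_operand() comment in (4). OperandKind and resolve_operand() are hypothetical names, not the C2 interface.

```cpp
#include <cassert>
#include <string>

enum class OperandKind { Def, Use, Temp, TempDef };

// Appends the fixed-size suffix: vec + 16 bytes -> vecX, legVec + 64 -> legVecZ.
static std::string with_size(const std::string& generic, int bytes) {
  switch (bytes) {
    case 4:  return generic + "S";
    case 8:  return generic + "D";
    case 16: return generic + "X";
    case 32: return generic + "Y";
    default: return generic + "Z";  // 64 bytes
  }
}

// own_bytes:   size of the node's own bottom_type() (DEF, TEMP_DEF)
// input_bytes: size of the input's bottom_type() (USE)
// max_bytes:   models max_vector_size(), used for plain TEMPs
std::string resolve_operand(const std::string& generic, OperandKind kind,
                            int own_bytes, int input_bytes, int max_bytes) {
  switch (kind) {
    case OperandKind::Def:
    case OperandKind::TempDef:  // TEMP_DEF inherits the DEF's size
      return with_size(generic, own_bytes);
    case OperandKind::Use:
      return with_size(generic, input_bytes);
    case OperandKind::Temp:     // e.g. reductions that produce a scalar
      return with_size(generic, max_bytes);
  }
  return generic;  // not reached for valid kinds
}
```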


Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
Reviewed-by: vlivanov, sviswanathan, ?

Testing: tier1-tier4, jtreg compiler tests on KNL and SKL,
          performance testing (SPEC* + Octane + micros / G1 + ParGC).

Best regards,
Vladimir Ivanov

[1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html

[2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf

From nick.gasson at arm.com  Tue Nov 26 09:25:03 2019
From: nick.gasson at arm.com (Nick Gasson)
Date: Tue, 26 Nov 2019 17:25:03 +0800
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
Message-ID: <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>

Hi Andrew,

> 
> I see
> 
>    if (use_XOR_for_compressed_class_base) {
>      if (CompressedKlassPointers::shift() != 0) {
>        eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>        lsr(dst, dst, LogKlassAlignmentInBytes);
>      } else {
>        eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>      }
>      return;
>    }
> 
>    if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
>        && CompressedKlassPointers::shift() == 0) {
>      movw(dst, src);
>      return;
>    }
> 
>    ... followed by code which does use r27.
> 
> Do you ever see r27 being used? If so, I'd be interested to know how
> this gets triggered and what command-line arguments you use. It's
> rather inefficient.
> 

Oddly enough the test case runtime/memory/ReadFromNoaccessArea.java now 
hits this. I see:

CompressedKlassPointers::base() => 0xffff0b4b5000
CompressedKlassPointers::shift() => 3

The itable stub calls MacroAssembler::load_klass() twice which then 
calls the above decode_klass_not_null() with dst==src if 
UseCompressedClassPointers is true. So we do the saving/restoring 
rheapbase dance twice which blows up the size of the itable stub beyond 
the estimated 152B max size.

The key is that this test passes -XX:HeapBaseMinAddress=33G. That in 
conjunction with the recent changes to where the CDS archive is loaded 
hits this code path (I don't see this with -Xshare:off).
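Working through the shortcut conditions in the code Andrew quoted with the values above: the movw path needs the low 32 bits of the base to be zero and a zero shift, but 0xffff0b4b5000 & 0xffffffff is 0x0b4b5000 and the shift is 3. Assuming the XOR path is not enabled for this base (its eligibility test is not shown in this thread), only the general path using r27 remains. The sketch encodes just that branch order; decode_path() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

enum class DecodePath { Xor, Movw, GeneralUsesR27 };

// use_xor is taken as an input because its derivation is not shown here.
DecodePath decode_path(bool use_xor, uint64_t base, int shift) {
  if (use_xor) return DecodePath::Xor;         // eor, plus lsr when shifted
  if ((base & 0xffffffffULL) == 0 && shift == 0) {
    return DecodePath::Movw;                   // single movw
  }
  return DecodePath::GeneralUsesR27;           // may save/restore rheapbase
}
```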


Thanks,
Nick

From lutz.schmidt at sap.com  Tue Nov 26 09:25:33 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 26 Nov 2019 09:25:33 +0000
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <CALrzOLvspdhsFUhb2BWUdBa9wX6J3OjU-Ncq3iDhfEBz9Gk3Gg@mail.gmail.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
 <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
 <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>
 <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com>
 <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com>
 <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com>
 <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com>
 <CALrzOLvspdhsFUhb2BWUdBa9wX6J3OjU-Ncq3iDhfEBz9Gk3Gg@mail.gmail.com>
Message-ID: <8F130FCB-4AA3-431F-A1D8-718EF80D010F@sap.com>

Thank you, Jean-Philippe, for checking once again.
Regards,
Lutz

From: Jean-Philippe BEMPEL <jean-philippe at bempel.fr>
Date: Tuesday, 26. November 2019 at 10:19
To: Lutz Schmidt <lutz.schmidt at sap.com>
Cc: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library

Hello,

With last review, still works for me.

Thanks

On Mon, Nov 25, 2019 at 9:47 PM Schmidt, Lutz <lutz.schmidt at sap.com> wrote:
Thanks for the review, Vladimir!
Still one to go.
Regards,
Lutz

On 25.11.19, 21:26, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:


    > All your other comments are valid. There is an open bug to address and improve the very basic options parsing: https://bugs.openjdk.java.net/browse/JDK-8223765 This task was split off from JDK-8213084.
    >
    > I would like to cover the improvements you suggest when working on that bug. To make _print_raw work correctly, I suggest just moving
    >      if (_optionsParsed) return;
    > a bit further down. The help text should be printed only once anyway.

    Ok, I'm fine with addressing it later. And thanks for taking care of it.

    > Here is a new webrev iteration. It reflects what I suggest: https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/

    Looks good.

    Best regards,
    Vladimir Ivanov
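A simplified model of the reordering in webrev .01: the per-instance option buffer is refilled for every decode_env (hsdis reads it each time), while the _optionsParsed guard only protects the one-time derivation of static flags. ToyDecodeEnv and its members are hypothetical, and a std::string replaces the fixed char[512] buffer for brevity.

```cpp
#include <cassert>
#include <string>

class ToyDecodeEnv {
 public:
  static bool options_parsed;                 // models _optionsParsed
  static int  static_parse_count;             // times static flags were derived
  static std::string pd_cpu_opts;             // models Disassembler::pd_cpu_opts()
  static std::string print_assembly_options;  // models PrintAssemblyOptions

  ToyDecodeEnv() { process_options(); }  // buffer starts empty, like the memset
  const std::string& options() const { return option_buf_; }

 private:
  std::string option_buf_;  // per instance; handed to the hsdis library

  void collect_options(const std::string& p) {
    if (p.empty()) return;
    if (!option_buf_.empty()) option_buf_ += ',';
    option_buf_ += p;
  }

  void process_options() {
    // Refill the buffer for every instance.
    collect_options(pd_cpu_opts);
    collect_options(print_assembly_options);
    // Only the derivation of static flags is done once.
    if (options_parsed) return;
    options_parsed = true;
    ++static_parse_count;
  }
};

bool ToyDecodeEnv::options_parsed = false;
int  ToyDecodeEnv::static_parse_count = 0;
std::string ToyDecodeEnv::pd_cpu_opts;
std::string ToyDecodeEnv::print_assembly_options;
```

With the early return placed before the collect_options() calls instead, the second instance would hand an empty buffer to hsdis, which matches the regression described in this thread.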

    > On 25.11.19, 19:30, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
    >
    >      Thanks for the clarifications, Lutz.
    >
    >      So, I assume you have a typo in the patch then:
    >
    >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
    >      +    // We need to fill the options buffer for each newly created
    >      +    // decode_env instance. The hsdis_* library looks for options
    >      +    // in that buffer.
    >      +    collect_options(Disassembler::pd_cpu_opts());
    >      +    collect_options(PrintAssemblyOptions);
    >      +  }
    >
    >      It performs collect_options() calls only if _option_buf is either NULL
    >      or "\0".
    >
    >      Also, what about the following updates of instance members?
    >
    >         if (strstr(options(), "print-raw")) {
    >           _print_raw = (strstr(options(), "xml") ? 2 : 1);
    >         }
    >
    >         if (strstr(options(), "help")) {
    >           _print_help = true;
    >         }
    >
    >      BTW, would _print_help (along with _helpPrinted) be better turned into
    >      static members?
    >
    >      Can we make _option_buf static as well? Or do we want to keep a
    >      defensive copy to pass into hsdis.so?
    >
    >      As an alternative approach to fix the bug, we could create a golden copy
    >      during parsing instead and then just copy it to _option_buf as part of
    >      decode_env initialization.
    >
    >      Best regards,
    >      Vladimir Ivanov
    >
    >      On 25.11.2019 19:59, Schmidt, Lutz wrote:
    >      > Hi Vladimir,
    >      >
    >      > I'm happy to elaborate in more detail about the issue and the fix.
    >      >
    >      > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]".
    >      >
    >      > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed.
    >      >
    >      > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-<platform>.so as well. That requires _option_buf to be filled every time a decode_env is constructed.
    >      >
    >      > Moving
    >      >    if (_optionsParsed) return;
    >      > after the collect_options() calls heals this deficiency.
    >      >
    >      > I added the guards you question as additional "safety net". After looking at the code again I must admit the guards are not necessary. _option_buf can never be NULL and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like.
    >      >
    >      > Please let me know if there are any more questions to be answered.
    >      >
    >      > Thanks,
    >      > Lutz
    >      >
    >      >
    >      > On 25.11.19, 17:05, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
    >      >
    >      >      Lutz,
    >      >
    >      >      Can you elaborate, please, how the patch fixes the problem?
    >      >
    >      >      Why did you decide to add the following guards?
    >      >
    >      >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
    >      >
    >      >      Best regards,
    >      >      Vladimir Ivanov
    >      >
    >      >      On 25.11.2019 17:06, Schmidt, Lutz wrote:
    >      >      > Dear all,
    >      >      >
    >      >      > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis-<platform> library were not passed on.
    >      >      >
    >      >      > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
    >      >      >
    >      >      > Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
    >      >      > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
    >      >      >
    >      >      > Thank you,
    >      >      > Lutz
    >      >      >
    >      >
    >      >
    >
    >


From vladimir.x.ivanov at oracle.com  Tue Nov 26 09:26:20 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 26 Nov 2019 12:26:20 +0300
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <CH2PR11MB4280286FFA113210E565C283EF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
 <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>
 <c0916ad3-e4fa-52ba-398c-cd2782ab6c4a@oracle.com>
 <CH2PR11MB42806416E3D0607FDEDD1F7EEF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
 <a890b2fc-7a7b-6c75-96c0-5940f0ad8692@oracle.com>
 <CH2PR11MB4280286FFA113210E565C283EF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
Message-ID: <e900029f-7d42-0d01-4311-15b37f6ee15c@oracle.com>


> http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/

Looks good.

Testing results (hs-precheckin-comp,hs-tier1,hs-tier2) are clean.

Best regards,
Vladimir Ivanov

> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Monday, November 25, 2019 12:23 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> 
>>   From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level.
>> If AVX=3, it checks if save/restore of zmm worked properly.
>> If AVX=1 or 2, it checks if save/restore of ymm worked properly.
>> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3.  And that is what this patch is trying to fix.
> 
> Ok, now I see how it works:
> 
> os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features:
> 
>     static bool supports_avx()      { return (_features & CPU_AVX) != 0; }
> 
>     static bool supports_evex()     { return (_features & CPU_AVX512F) != 0; }
> 
> But VM_Version::get_processor_features() clears detected features depending on UseAVX level.
> 
> So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2.
> 
>> The rest takes us into the 8221092 discussion.
>> I think Vivek's intent there was not to execute any AVX-512 instructions at all if AVX < 3, to overcome performance regressions observed by Scott Oaks.
> 
> My best guess is it helped analysis by eliminating all problematic instructions.
> 
> Anyway, I'm fine with leaving that code. (Though it would be nice to simply get rid of it.)
> 
> It looks like all the code after start_simd_check (and checks to legacy_save_restore) can go under the use_evex check. That would make the code clearer (I spent too much time reasoning about the interactions between use_evex and the FLAG_IS_DEFAULT(UseAVX) check you added).
> 
> Otherwise, looks good.
> 
> Best regards,
> Vladimir Ivanov
> 
>> -----Original Message-----
>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> Sent: Monday, November 25, 2019 10:15 AM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir
>> Kozlov <vladimir.kozlov at oracle.com>
>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>> UseAVX=3 is specified after JDK-8221092
>>
>> Thanks for the clarifications, Sandhya!
>>
>> Some more questions inlined.
>>> The bug happens like below:
>>>
>>> * User specifies -XX:UseAVX=3 on command line.
>>> * On Skylake platform due to the following lines:
>>>        __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>>>        __ movl(rax, Address(rsi, 0));
>>>        __ cmpl(rax, 0x50654);              // If it is Skylake
>>>        __ jcc(Assembler::equal, legacy_setup);
>>>       The zmm registers are not saved/restored.
>>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>>>      int max_vector_size = 0;
>>>      if (UseSSE < 2) {
>>>        // Vectors (in XMM) are only supported with SSE2+
>>>        // SSE is always 2 on x64.
>>>        max_vector_size = 0;
>>>      } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>>>        // 16 byte vectors (in XMM) are supported with SSE2+
>>>        max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
>>>      } else if (UseAVX == 1 || UseAVX == 2) {
>>>         ...
>>>      }
>>> * And so we get UseAVX=3 and max_vector_size = 16.
>>>
>>> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2.
>>
>> Unfortunately, I'm still confused :-(
>>
>> If os_supports_avx_vectors() turns out false, why aren't -XX:UseAVX=1 and
>> -XX:UseAVX=2 affected on Skylake as well?
>>
>>> On your question regarding why 8221092 needs the code you conditionally exclude:
>>> This was introduced so as not to execute any AVX-512 instructions if not required.  ZMM register save/restore uses AVX-512 instructions.
>>
>> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3?
>>
>> Considering it is in generate_get_cpu_info(), which runs only once, early at startup, what kind of effects do you intend to avoid?
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>> Sent: Friday, November 22, 2019 4:39 AM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot
>>> compiler <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>>> UseAVX=3 is specified after JDK-8221092
>>>
>>> Hi Sandhya,
>>>
>>>> On the Skylake platform, the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes, as per JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092).
>>>>
>>>> When the user explicitly wants to use AVX3 on the Skylake platform, they need to invoke the JVM with the -XX:UseAVX=3 command-line argument.
>>>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>>>
>>>> However, after JDK-8221092 (https://bugs.openjdk.java.net/browse/JDK-8221092), when -XX:UseAVX=3 is given on the command line, MaxVectorSize is wrongly set to 16 bytes.
>>>
>>> Please, elaborate how it happens and how legacy_setup affects it?
>>>
>>> Why does 8221092 need the code you conditionally exclude?
>>> Why isn't the following enough?
>>>
>>>        if (FLAG_IS_DEFAULT(UseAVX)) {
>>>          FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
>>> +    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
>>> +      FLAG_SET_DEFAULT(UseAVX, 2);  //Set UseAVX=2 for Skylake
>>> +    }
>>>        } else if (UseAVX > use_avx_limit) {
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>

From claes.redestad at oracle.com  Tue Nov 26 09:44:36 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 26 Nov 2019 10:44:36 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
Message-ID: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>

Hi,

in various places in the hotspot we have custom code to calculate the
next power of two, some of which have potential to go into an infinite 
loop in case of an overflow.

This patch proposes adding next_power_of_two utility methods which
avoid infinite loops on overflow, while providing slightly more
efficient code in most cases.
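One loop-free, overflow-safe approach is the classic bit-smearing trick, sketched below for 32-bit values. This is not the code in the webrev. It assumes next_power_of_two means the smallest power of two strictly greater than the input (as stated later in this thread), and reporting overflow as 0 is an assumption of this sketch. By contrast, a doubling loop such as `while (p <= value) p <<= 1;` never terminates once p wraps to 0.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two strictly greater than value; 0 signals overflow.
uint32_t next_power_of_two(uint32_t value) {
  // Copy the highest set bit into every lower bit position...
  value |= value >> 1;
  value |= value >> 2;
  value |= value >> 4;
  value |= value >> 8;
  value |= value >> 16;
  // ...so adding 1 carries into the next power of two. If bit 31 was set,
  // value is now 0xFFFFFFFF and the unsigned addition wraps to 0.
  return value + 1;
}
```

For example, 7 and 8 both smear to 7 and 15 respectively, giving 8 and 16.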

Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/

Testing: tier1-3

Thanks!

/Claes

From david.holmes at oracle.com  Tue Nov 26 09:50:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Nov 2019 19:50:22 +1000
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
Message-ID: <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>

Hi Claes,

Just some high-level comments

- should next_power_of_two be defined in globalDefinitions.hpp alongside
the related functionality, i.e. is_power_of_two?

- can next_power_of_two build on the existing log2_* functions (or vice 
versa)?

- do the existing ZUtils not cover the same general area?

./share/gc/z/zUtils.inline.hpp

inline size_t ZUtils::round_up_power_of_2(size_t value) {
   assert(value != 0, "Invalid value");

   if (is_power_of_2(value)) {
     return value;
   }

   return (size_t)1 << (log2_intptr(value) + 1);
}

inline size_t ZUtils::round_down_power_of_2(size_t value) {
   assert(value != 0, "Invalid value");
   return (size_t)1 << log2_intptr(value);
}


Cheers,
David

On 26/11/2019 7:44 pm, Claes Redestad wrote:
> Hi,
> 
> in various places in the hotspot we have custom code to calculate the
> next power of two, some of which have potential to go into an infinite 
> loop in case of an overflow.
> 
> This patch proposes adding next_power_of_two utility methods which
> avoid infinite loops on overflow, while providing slightly more
> efficient code in most cases.
> 
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
> 
> Testing: tier1-3
> 
> Thanks!
> 
> /Claes

From claes.redestad at oracle.com  Tue Nov 26 10:06:29 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 26 Nov 2019 11:06:29 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>
Message-ID: <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>


On 2019-11-26 10:50, David Holmes wrote:
> Hi Claes,
> 
> Just some high-level comments
> 
> - should next_power_of_two be defined in globalDefinitions.hpp alongside
> the related functionality, i.e. is_power_of_two?

I thought we are trying to move things _out_ of globalDefinitions. I
agree align.hpp might not be the best place, either, though..

> 
> - can next_power_of_two build on the existing log2_* functions (or vice 
> versa)?

Yes, log2_intptr et al could probably be tamed to do a single step
operation, although we'd need to add 64-bit implementations in
count_leading_zeros. At least these log2_* functions already deal with
overflows without looping forever.

> 
> - do the existing ZUtils not cover the same general area?
> 
> ./share/gc/z/zUtils.inline.hpp
> 
> inline size_t ZUtils::round_up_power_of_2(size_t value) {
>    assert(value != 0, "Invalid value");
> 
>    if (is_power_of_2(value)) {
>      return value;
>    }
> 
>    return (size_t)1 << (log2_intptr(value) + 1);
> }
> 
> inline size_t ZUtils::round_down_power_of_2(size_t value) {
>    assert(value != 0, "Invalid value");
>    return (size_t)1 << log2_intptr(value);
> }

round_up_power_of_2 is similar, but not identical (next_power_of_two 
doesn't care if the value is already a power of 2, nor should it).
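To illustrate both David's log2 question and the difference pointed out here, the two semantics can be built on a floor-log2 helper, in the spirit of the ZUtils code above. floor_log2() is a hypothetical stand-in for log2_intptr (a shifting loop that always terminates because the value strictly shrinks), and returning 0 on overflow is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

int floor_log2(uint64_t value) {
  int result = 0;
  while (value >>= 1) ++result;  // terminates: value shrinks to 0
  return result;
}

bool is_power_of_2(uint64_t value) {
  return value != 0 && (value & (value - 1)) == 0;
}

// Like ZUtils::round_up_power_of_2: powers of two map to themselves.
uint64_t round_up_power_of_2(uint64_t value) {
  assert(value != 0 && "Invalid value");
  if (is_power_of_2(value)) return value;
  int lg = floor_log2(value);
  return lg >= 63 ? 0 : uint64_t(1) << (lg + 1);  // 0 on overflow
}

// next_power_of_two semantics: always strictly greater than value.
uint64_t next_power_of_two(uint64_t value) {
  if (value == 0) return 1;
  int lg = floor_log2(value);
  return lg >= 63 ? 0 : uint64_t(1) << (lg + 1);  // 0 on overflow
}
```

For 8, round_up_power_of_2 returns 8 while next_power_of_two returns 16; for 9, both return 16.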

/Claes

From david.holmes at oracle.com  Tue Nov 26 10:23:12 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Nov 2019 20:23:12 +1000
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>
 <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>
Message-ID: <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com>

On 26/11/2019 8:06 pm, Claes Redestad wrote:
> 
> 
> On 2019-11-26 10:50, David Holmes wrote:
>> Hi Claes,
>>
>> Just some high-level comments
>>
>> - should next_power_of_two be defined in globalDefinitions.hpp along 
>> side the related functionality ie is_power_of_two ?
> 
> I thought we are trying to move things _out_ of globalDefinitions. I

We are? I don't recall hearing that. But wherever these go, it seems they 
all belong together.

> agree align.hpp might not be the best place either, though.

I thought align.hpp was a strange place too. :)

>>
>> - can next_power_of_two build on the existing log2_* functions (or 
>> vice versa)?
> 
> Yes, log2_intptr et al could probably be tamed to do a single step
> operation, although we'd need to add 64-bit implementations in
> count_leading_zeros. At least these log2_* functions already deal with
> overflows without looping forever.
> 
>>
>> - do the existing ZUtils not cover the same general area?
>>
>> ./share/gc/z/zUtils.inline.hpp
>>
>> inline size_t ZUtils::round_up_power_of_2(size_t value) {
>>   assert(value != 0, "Invalid value");
>>
>>   if (is_power_of_2(value)) {
>>     return value;
>>   }
>>
>>   return (size_t)1 << (log2_intptr(value) + 1);
>> }
>>
>> inline size_t ZUtils::round_down_power_of_2(size_t value) {
>>   assert(value != 0, "Invalid value");
>>   return (size_t)1 << log2_intptr(value);
>> }
> 
> round_up_power_of_2 is similar, but not identical (next_power_of_two 
> doesn't care if the value is already a power of 2, nor should it).

Okay but seems perhaps these should also be moved out of ZUtils and 
co-located with the other "power of two" functions.

Cheers,
David
-----

> /Claes

From nils.eliasson at oracle.com  Tue Nov 26 10:49:48 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 26 Nov 2019 11:49:48 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
Message-ID: <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>

Patch 2 was not the full patch.

Use this instead:

http://cr.openjdk.java.net/~neliasso/8234520/webrev.03/

Regards,

Nils


On 2019-11-21 12:53, Nils Eliasson wrote:
> I updated this to version 2.
>
> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>
> I found a problem running 
> compiler/arguments/TestStressReflectiveCode.java
>
> Even though the clone was created as an oop clone, the src node type 
> returns isa_aryptr. This is caused by the src ptr not being the base 
> pointer. Until I fix that I wanted a more robust test.
>
> In this webrev I split up the is_clonebasic into is_clone_oop and 
> is_clone_array. (is_clone_oop_array is already there). Having a 
> complete set with the three clone types allows for a robust test and 
> easy verification. (The three variants end up in different paths with 
> different GCs).
>
> Regards,
>
> Nils
>
>
> On 2019-11-20 15:25, Nils Eliasson wrote:
>> Hi,
>>
>> I found a few bugs after the enabling of the clone intrinsic in ZGC.
>>
>> 1) The arraycopy clone_basic has the parameters adjusted to work as a 
>> memcopy. For an oop the src is pointing inside the oop to where we 
>> want to start copying. But when we want to do a runtime call to clone 
>> - the parameters are supposed to be the actual src oop and dst oop, 
>> and the size should be the instance size.
>>
>> For now I have made a workaround. What should be done later is using 
>> the offset in the arraycopy node to encode where the payload is, so 
>> that the base pointers are always correct. But that would require 
>> changes to the BarrierSet classes of all GCs. So I leave that for 
>> next release.
>>
>> 2) The size parameter of the TypeFunc for the runtime call has the 
>> wrong type. It was originally Long but missed the upper Half; it was 
>> fixed to INT (JDK-8233834), but that is wrong and causes the compiles 
>> to be skipped. We didn't notice that since they failed silently. That 
>> is also why we didn't notice problem #1.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>
>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>
>> Please review!
>>
>> Nils
>>

From vladimir.x.ivanov at oracle.com  Tue Nov 26 12:02:06 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 26 Nov 2019 15:02:06 +0300
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>
Message-ID: <58fd5565-4342-ea70-511d-bace68308391@oracle.com>


> http://cr.openjdk.java.net/~neliasso/8234520/webrev.03/

====================================================================

src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp:

+  // The currently modeled arraycopy-clone_basic doesn't have the base pointers for src and dst,
+  // rather point at the start of the payload.
+  Node* src_base = get_base_for_arracycopy_clone(phase, src);
+  Node* dst_base = get_base_for_arracycopy_clone(phase, dst);
+
+  // The size must also be increased to match the instance size.
+  int base_off = BarrierSetC2::extract_base_offset(false);
+  Node* full_size = phase->transform_later(new AddLNode(size, phase->longcon(base_off >> LogBytesPerLong)));
    Node* const call = phase->make_leaf_call(ctrl,
                                             mem,
                                             clone_type(),
                                             ZBarrierSetRuntime::clone_addr(),
                                             "ZBarrierSetRuntime::clone",
                                             TypeRawPtr::BOTTOM,
-                                           src,
-                                           dst,
-                                           size);
+                                           src_base,
+                                           dst_base,
+                                           full_size,
+                                           phase->top());


Do you see any problems with copying object header?
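The size fix in the diff above works in 8-byte words. `base_off` is a byte offset, so it is shifted right by LogBytesPerLong (3, since a long is 8 bytes) before being added to the payload word count. The sketch below shows that arithmetic standalone, together with the conceptual relation between the payload pointer and the base pointer. The constants and function names are stand-ins, not HotSpot code. The 8-byte offset in the usage note is only an example; the real value returned by BarrierSetC2::extract_base_offset(false) depends on the VM configuration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the HotSpot constants used in the diff (8-byte longs).
const int BytesPerLong    = 8;
const int LogBytesPerLong = 3;

// The arraycopy node's size counts 8-byte words starting at the payload;
// the runtime clone call wants the whole instance, header words included.
static inline int64_t full_size_in_words(int64_t payload_words, int base_off) {
  // Copying by 8-byte words requires an 8-byte-aligned payload start.
  assert(base_off % BytesPerLong == 0);
  return payload_words + (base_off >> LogBytesPerLong);
}

// Conceptually, the object base lies base_off bytes before the payload
// pointer that the clone_basic arraycopy node carries.
static inline uintptr_t base_from_payload(uintptr_t payload, int base_off) {
  return payload - static_cast<uintptr_t>(base_off);
}
```

With an 8-byte base offset, a 3-word payload becomes a 4-word copy, and a payload at address 0x1008 corresponds to a base at 0x1000. Because the copy now starts at the base, it also covers the header words, which is what the question above is about.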

====================================================================

The rest are minor comments:

src/hotspot/share/gc/shared/c2/barrierSetC2.cpp:

-void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* size, bool is_array) const {
-  // Exclude the header but include array length to copy by 8 bytes words.
-  // Can't use base_offset_in_bytes(bt) since basic type is unknown.
+int BarrierSetC2::extract_base_offset(bool is_array) {

...

+void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* size, bool is_array) const {
+  // Exclude the header but include array length to copy by 8 bytes words.
+  // Can't use base_offset_in_bytes(bt) since basic type is unknown.
+  int base_off = extract_base_offset(is_array);


I'd leave the comment in BarrierSetC2::extract_base_offset(). After the 
refactoring it looks confusing in BarrierSetC2::clone().

Also, considering it's used from zBarrierSetC2.cpp, it would be nice to 
have a more descriptive name. 
BarrierSetC2::arraycopy_payload_base_offset(bool is_array) maybe?


====================================================================

src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp:

+Node* get_base_for_arracycopy_clone(PhaseMacroExpand* phase, Node* n) {

Should get_base_for_arracycopy_clone be static? Also, would be nice if 
the name reflects that it works on instances and not arrays.

====================================================================

+  // The currently modeled arraycopy-clone_basic doesn't have the base pointers for src and dst,
+  // rather point at the start of the payload.
+  Node* src_base = get_base_for_arracycopy_clone(phase, src);
+  Node* dst_base = get_base_for_arracycopy_clone(phase, dst);

Another thing: "base" in zBarrierSetC2.cpp and in BarrierSetC2 has 
opposite meaning which is confusing.

Renaming src/dst<->src_base/dst_base in 
ZBarrierSetC2::clone_at_expansion() would improve things.

====================================================================

-  if (src->bottom_type()->isa_aryptr()) {
+  if (ac->is_clone_array()) {
      // Clone primitive array

Is the comment valid? Doesn't it cover object array case as well?


Best regards,
Vladimir Ivanov

> 
> Regards,
> 
> Nils
> 
> 
> On 2019-11-21 12:53, Nils Eliasson wrote:
>> I updated this to version 2.
>>
>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>>
>> I found a problem running 
>> compiler/arguments/TestStressReflectiveCode.java
>>
>> Even though the clone was created as an oop clone, the src node type 
>> returns isa_aryptr. This is caused by the src ptr not being the base 
>> pointer. Until I fix that I wanted a more robust test.
>>
>> In this webrev I split up the is_clonebasic into is_clone_oop and 
>> is_clone_array. (is_clone_oop_array is already there). Having a 
>> complete set with the three clone types allows for a robust test and 
>> easy verification. (The three variants end up in different paths with 
>> different GCs).
>>
>> Regards,
>>
>> Nils
>>
>>
>> On 2019-11-20 15:25, Nils Eliasson wrote:
>>> Hi,
>>>
>>> I found a few bugs after the enabling of the clone intrinsic in ZGC.
>>>
>>> 1) The arraycopy clone_basic has the parameters adjusted to work as a 
>>> memcopy. For an oop the src is pointing inside the oop to where we 
>>> want to start copying. But when we want to do a runtime call to clone 
>>> - the parameters are supposed to be the actual src oop and dst oop, 
>>> and the size should be the instance size.
>>>
>>> For now I have made a workaround. What should be done later is using 
>>> the offset in the arraycopy node to encode where the payload is, so 
>>> that the base pointers are always correct. But that would require 
>>> changes to the BarrierSet classes of all GCs. So I leave that for 
>>> next release.
>>>
>>> 2) The size parameter of the TypeFunc for the runtime call has the 
>>> wrong type. It was originally Long but missed the upper Half; it was 
>>> fixed to INT (JDK-8233834), but that is wrong and causes the compiles 
>>> to be skipped. We didn't notice that since they failed silently. That 
>>> is also why we didn't notice problem #1.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>>
>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>>
>>> Please review!
>>>
>>> Nils
>>>

From per.liden at oracle.com  Tue Nov 26 13:29:40 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 26 Nov 2019 14:29:40 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
Message-ID: <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com>

Hi Nils,

On 11/21/19 12:53 PM, Nils Eliasson wrote:
> I updated this to version 2.
> 
> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
> 
> I found a problem running compiler/arguments/TestStressReflectiveCode.java
> 
> Even though the clone was created as an oop clone, the src node type 
> returns isa_aryptr. This is caused by the src ptr not being the base 
> pointer. Until I fix that I wanted a more robust test.
> 
> In this webrev I split up the is_clonebasic into is_clone_oop and 
> is_clone_array. (is_clone_oop_array is already there). Having a complete 
> set with the three clone types allows for a robust test and easy 
> verification. (The three variants end up in different paths with 
> different GCs).

A couple of suggestions:

1) Instead of

  CloneOop
  CloneArray
  CloneOopArray

I think we should call the three types:

  CloneInstance
  CloneTypeArray
  CloneOopArray

Since CloneOop is not actually cloning an oop, but an object/instance. 
And being explicit about TypeArray seems like a good thing to avoid any 
confusion about the difference compared to CloneOopArray. I guess 
PrimArray would be an alternative to TypeArray.

And of course, if we change this then the is_clone/set_clone functions 
should follow the same naming convention.

Btw, what about CopyOf and CopyOfRange? Don't they also come in Oop and 
Type versions, or are we handling those differently in some way? Looking 
at the code it looks like they are only used for the oop array case?


2) In zBarrierSetC2.cpp, do you mind if we do it like this instead? I find 
that quite a bit easier to read.

[...]
   const Type** domain_fields = TypeTuple::fields(4);
   domain_fields[TypeFunc::Parms + 0] = TypeInstPtr::NOTNULL;  // src
   domain_fields[TypeFunc::Parms + 1] = TypeInstPtr::NOTNULL;  // dst
   domain_fields[TypeFunc::Parms + 2] = TypeLong::LONG;        // size lower
   domain_fields[TypeFunc::Parms + 3] = Type::HALF;            // size upper
   const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms + 4, domain_fields);
[...]


3) I'd also like to add some const, adjust indentation, etc, in a few 
places. Instead of listing them here I made a patch, which goes on top 
of yours. This patch also adjusts 2) above. Just shout if you have any 
objections.

http://cr.openjdk.java.net/~pliden/8234520/webrev.03-review

/Per

> 
> Regards,
> 
> Nils
> 
> 
> On 2019-11-20 15:25, Nils Eliasson wrote:
>> Hi,
>>
>> I found a few bugs after the enabling of the clone intrinsic in ZGC.
>>
>> 1) The arraycopy clone_basic has the parameters adjusted to work as a 
>> memcopy. For an oop the src is pointing inside the oop to where we 
>> want to start copying. But when we want to do a runtime call to clone 
>> - the parameters are supposed to be the actual src oop and dst oop, 
>> and the size should be the instance size.
>>
>> For now I have made a workaround. What should be done later is using 
>> the offset in the arraycopy node to encode where the payload is, so 
>> that the base pointers are always correct. But that would require 
>> changes to the BarrierSet classes of all GCs. So I leave that for next 
>> release.
>>
>> 2) The size parameter of the TypeFunc for the runtime call has the 
>> wrong type. It was originally Long but missed the upper Half; it was 
>> fixed to INT (JDK-8233834), but that is wrong and causes the compiles 
>> to be skipped. We didn't notice that since they failed silently. That 
>> is also why we didn't notice problem #1.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>
>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>
>> Please review!
>>
>> Nils
>>

From christoph.goettschkes at microdoc.com  Tue Nov 26 13:28:23 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Tue, 26 Nov 2019 14:28:23 +0100
Subject: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for client VMs
 due to Unrecognized VM option PartialPeelNewPhiDelta
Message-ID: <mailman.1.1574775015.28664.hotspot-compiler-dev@openjdk.java.net>

Hi,

please review the following small changeset which fixes the test 
test/hotspot/jtreg/compiler/loopopts/LoopRotateBadNodeBudget.java for 
client VMs.
I simply added vm.compiler2.enabled to the requires tag, since the 
original bug only appeared with the server JIT:

Bug: https://bugs.openjdk.java.net/browse/JDK-8234807
Webrev: http://cr.openjdk.java.net/~cgo/8234807/webrev.00/

Bug which introduced the issue: 
https://bugs.openjdk.java.net/browse/JDK-8231565

Thanks,
Christoph


From patric.hedlin at oracle.com  Tue Nov 26 14:29:38 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 26 Nov 2019 15:29:38 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com>
Message-ID: <ebe18158-3002-cb65-b8cf-8b0bdbcf93c7@oracle.com>

Thanks for reviewing, Nils.

/Patric

On 13/11/2019 17:12, Nils Eliasson wrote:
> Hi Patric,
>
> Looks good!
>
> (I have pre-reviewed this patch offline)
>
> Regards,
>
> Nils
>
> On 2019-11-12 15:16, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>
>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>
>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>     short-circuit for (obviously) redundant if-nodes.
>>
>> Testing: hs-tier1-4, hs-precheckin-comp
>>
>>
>> Best regards,
>> Patric


From patric.hedlin at oracle.com  Tue Nov 26 14:29:51 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 26 Nov 2019 15:29:51 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <6f1dbc0a-b1a4-43e8-65e1-1e0df2115c33@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <6f1dbc0a-b1a4-43e8-65e1-1e0df2115c33@oracle.com>
Message-ID: <416ea976-e907-ace1-f166-f16318fbe6bb@oracle.com>

Thanks Dean,

I should take a look at that for the full patch.

Best regards,
Patric

On 14/11/2019 06:12, dean.long at oracle.com wrote:
> Hi Patric.  I was expecting the fix to allow the following existing 
> logic in DivINode::Ideal to work:
>
>   // Check for excluding div-zero case
>   if (in(0) && (ti->_hi < 0 || ti->_lo > 0)) {
>     set_req(0, NULL);           // Yank control input
>     return this;
>   }
>
> by making sure the range of "ti" has been sharpened by the previous 
> if-node.  I was just wondering if you looked at that solution and 
> thought it was feasible.  I see Parse::sharpen_type_after_if() is 
> almost doing the right thing, but only handles BoolTest::eq.
>
> dl
>
> On 11/12/19 6:16 AM, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>
>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>
>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>     short-circuit for (obviously) redundant if-nodes.
>>
>> Testing: hs-tier1-4, hs-precheckin-comp
>>
>>
>> Best regards,
>> Patric
>>
>
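Dean's alternative relies on the dominating test narrowing the divisor's type range, so that the quoted DivINode::Ideal condition fires on its own. The sketch below is a toy model of that interaction, not C2 code. `IntRange` stands in for C2's TypeInt. `sharpen_after_gt_zero` is a hypothetical illustration of what handling a `> 0` test (rather than only BoolTest::eq) would contribute; it is not Parse::sharpen_type_after_if().

```cpp
#include <climits>

// Toy stand-in for C2's TypeInt: the closed integer range [lo, hi].
struct IntRange {
  int lo;
  int hi;
};

// Mirrors the quoted DivINode::Ideal test: when zero lies outside the
// divisor's range, the div-by-zero control dependency can be dropped.
constexpr bool excludes_zero(IntRange ti) {
  return ti.hi < 0 || ti.lo > 0;
}

// Hypothetical sharpening on the true branch of a dominating "x > 0" test:
// intersect the range with [1, INT_MAX]. An empty result (lo > hi) would
// mean the branch is dead; this sketch does not handle that case.
constexpr IntRange sharpen_after_gt_zero(IntRange ti) {
  return IntRange{ti.lo < 1 ? 1 : ti.lo, ti.hi};
}

// Full int range: the zero check must stay. After sharpening: [1, INT_MAX].
static_assert(!excludes_zero(IntRange{INT_MIN, INT_MAX}), "zero possible");
static_assert(excludes_zero(sharpen_after_gt_zero(IntRange{INT_MIN, INT_MAX})),
              "zero excluded after x > 0");
```

The patch under review takes the other route and removes the redundant if-node directly, so this model only illustrates the design choice Dean is asking about.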


From patric.hedlin at oracle.com  Tue Nov 26 14:30:02 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 26 Nov 2019 15:30:02 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <VI1PR0201MB2479D5A620B74C860E7595709A4D0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
 <VI1PR0201MB2479D5A620B74C860E7595709A4D0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <b6188dcb-2809-89f3-2454-b32e9472c3d3@oracle.com>

Thanks for reviewing, Martin.

Best regards,
Patric

On 18/11/2019 12:23, Doerr, Martin wrote:
> Hi Patric,
>
> I'd consider moving subsuming_bool_test_encode up to avoid the prototype.
> But I can also live with it.
>
> Looks good to me.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: Patric Hedlin <patric.hedlin at oracle.com>
>> Sent: Montag, 18. November 2019 11:06
>> To: hotspot-compiler-dev at openjdk.java.net; Nils Eliasson
>> <nils.eliasson at oracle.com>; Vladimir Ivanov
>> <vladimir.x.ivanov at oracle.com>; Doerr, Martin <martin.doerr at sap.com>
>> Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
>> check
>>
>> Dear all,
>>
>> Please review the new patch, now reduced to a "minimum" (besides the
>> table encoding).
>>
>> Updated in-place.
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>
>> Testing: hs-tier1-3
>>
>>
>> Best regards,
>> Patric
>>
>> On 12/11/2019 15:16, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>>
>>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>>
>>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>>     short-circuit for (obviously) redundant if-nodes.
>>>
>>> Testing: hs-tier1-4, hs-precheckin-comp
>>>
>>>
>>> Best regards,
>>> Patric
>>>


From patric.hedlin at oracle.com  Tue Nov 26 14:30:20 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 26 Nov 2019 15:30:20 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <4314ca44-9094-a4d5-407e-9d9eaf5d4b37@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
 <4314ca44-9094-a4d5-407e-9d9eaf5d4b37@oracle.com>
Message-ID: <44fbba54-fdd2-af82-13e7-7bd36f16b752@oracle.com>

Thanks for reviewing, Vladimir.


On 18/11/2019 23:56, Vladimir Ivanov wrote:
>
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> Looks good, Patric.
>
> Best regards,
> Vladimir Ivanov
>
> PS: some cleanup suggestions (feel free to ignore them if you don't 
> agree):
>
> src/hotspot/share/opto/ifnode.cpp:
>
> +//   \r1
> +//  r2\  eqT  eqF  neT  neF  ltT  ltF  leT  leF  gtT  gtF  geT  geF
> +//  eq    t    f    f    t    f    -    -    f    f    -    -    f
> +//  ne    f    t    t    f    t    -    -    t    t    -    -    t
> +//  lt    f    -    -    f    t    f    -    f    f    -    f    t
> +//  le    t    -    -    t    t    -    t    f    f    t    -    t
> +//  gt    f    -    -    f    f    -    f    t    t    f    -    f
> +//  ge    t    -    -    t    f    t    -    t    t    -    t    f
> +//
> +Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
> +  // Table encoding: N/A (na), True-branch (tb), False-branch (fb).
> +  static enum { na, tb, fb } s_subsume_map[6][12] = {
> +  /*rel: eq+T eq+F ne+T ne+F lt+T lt+F le+T le+F gt+T gt+F ge+T ge+F*/
> +  /*eq*/{ tb,  fb,  fb,  tb,  fb,  na,  na,  fb,  fb,  na,  na,  fb },
> +  /*ne*/{ fb,  tb,  tb,  fb,  tb,  na,  na,  tb,  tb,  na,  na,  tb },
> +  /*lt*/{ fb,  na,  na,  fb,  tb,  fb,  na,  fb,  fb,  na,  fb,  tb },
> +  /*le*/{ tb,  na,  na,  tb,  tb,  na,  tb,  fb,  fb,  tb,  na,  tb },
> +  /*gt*/{ fb,  na,  na,  fb,  fb,  na,  fb,  tb,  tb,  fb,  na,  fb },
> +  /*ge*/{ tb,  na,  na,  tb,  fb,  tb,  na,  tb,  tb,  na,  tb,  fb }};
>
> IMO you can dump the table from the comment: it mostly duplicates the 
> code. (Probably, you can use a different name for "N/A" or just refer 
> to it in numeric form (0?) to preserve clean structure of the table 
> from the comment.)
>
> ======================================
>
> +  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
> +    if (cmp->in(2) != NULL && // make sure cmp is not already dead
> +        cmp->in(2)->bottom_type() == TypePtr::NULL_PTR) {
>
> Merge nested ifs?
>
I too have problems with this part, but for other reasons. What about: (?)

-  Node* cmp;
   int dist = 4;               // Cutoff limit for search
-  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
-    if (cmp->in(2) != NULL && // make sure cmp is not already dead
+  if (is_If() && in(1)->is_Bool()) {
+    Node* cmp = in(1)->in(1);
+    if (cmp->Opcode() == Op_CmpP &&
+        cmp->in(2) != NULL && // make sure cmp is not already dead

Best regards,
Patric
> ======================================
>
> Looks like extracting the following code into a helper function (along 
> with the enum and the table) can improve readability.
>
> +  int drel = subsuming_bool_test_encode(dom->in(1));
> +  int trel = subsuming_bool_test_encode(bol);
> +  int bout = pre->is_IfFalse() ? 1 : 0;
> +
> +  if (drel < 0 || trel < 0) {
> +    return NULL;
> +  }
> +  int br = s_subsume_map[trel][2*drel+bout];
> +  if (br == na) {
> +    return NULL;
> +  }
>
> New function can return intcon(0/1) or bol(or NULL?) and the caller 
> decides whether the update is needed.
>
>> On 12/11/2019 15:16, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>>
>>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>>
>>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>>     short-circuit for (obviously) redundant if-nodes.
>>>
>>> Testing: hs-tier1-4, hs-precheckin-comp
>>>
>>>
>>> Best regards,
>>> Patric
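The subsumption table in the ifnode.cpp comment can be cross-checked from first principles. Two totally ordered operands have only three possible orderings (less, equal, greater). A dominating test plus the branch taken rules some of them out, and the dominated test is decided whenever it gives the same answer for every ordering that survives. The sketch below derives each row that way. It assumes a signed integer comparison on identical operands, and all names are hypothetical rather than taken from ifnode.cpp, which uses the precomputed static table instead.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Relations in the table's order, and the three possible orderings of two
// totally ordered operands x and y (assumption: signed integer compare).
enum Rel { EQ, NE, LT, LE, GT, GE, NUM_RELS };
enum Order { LESS, EQUAL, GREATER };

// Does relation r hold for x and y when they are ordered as o?
static bool holds(Rel r, Order o) {
  switch (r) {
    case EQ: return o == EQUAL;
    case NE: return o != EQUAL;
    case LT: return o == LESS;
    case LE: return o != GREATER;
    case GT: return o == GREATER;
    case GE: return o != LESS;
    default: return false;
  }
}

// The dominating test r1 went to its true branch iff 'taken'. Returns what
// that implies for the dominated test r2 on the same operands:
// 't' always true, 'f' always false, '-' undecided.
static char subsume(Rel r2, Rel r1, bool taken) {
  bool can_be_true = false;
  bool can_be_false = false;
  for (Order o : {LESS, EQUAL, GREATER}) {
    if (holds(r1, o) != taken) {
      continue;  // this ordering contradicts the dominating branch
    }
    if (holds(r2, o)) {
      can_be_true = true;
    } else {
      can_be_false = true;
    }
  }
  assert(can_be_true || can_be_false);  // some ordering always survives
  if (can_be_true && !can_be_false) return 't';
  if (can_be_false && !can_be_true) return 'f';
  return '-';
}

// One row of the comment table: columns eqT eqF neT neF ... geT geF.
static std::string table_row(Rel r2) {
  std::string row;
  for (int r1 = EQ; r1 < NUM_RELS; r1++) {
    row += subsume(r2, static_cast<Rel>(r1), true);
    row += subsume(r2, static_cast<Rel>(r1), false);
  }
  return row;
}
```

Each `table_row` string lists the twelve columns in the comment's order and reproduces the corresponding row. For example, table_row(EQ) is "tfftf--ff--f", matching the "eq" line.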


From ivan.gerasimov at oracle.com  Tue Nov 26 10:14:11 2019
From: ivan.gerasimov at oracle.com (Ivan Gerasimov)
Date: Tue, 26 Nov 2019 02:14:11 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
Message-ID: <603eb8ec-ea48-42ae-1c6e-92c2165133b7@oracle.com>

Hi Claes!

In the code in align.hpp it is assumed that (1U << 32) == 0, which is 
not guaranteed.

In fact, if the right argument of the shift operator is >= 32 (for 
32-bit left argument) then the behavior is undefined, and thus is 
compiler specific.
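Ivan's point can be illustrated with a guarded shift. In C++, shifting a 32-bit operand by 32 or more is undefined behavior, and results differ in practice. For instance, x86 shift instructions mask the count to its low 5 bits, so a runtime `1U << 32` commonly yields 1 rather than 0. The sketch below makes the overflow case explicit; `pow2_or_zero_u32` is a hypothetical helper, not the webrev's code.

```cpp
#include <cstdint>

// Returns 2^shift as a 32-bit value, or 0 when 2^shift does not fit.
// The range check happens before the shift, so a shift by >= 32 is never
// evaluated.
constexpr uint32_t pow2_or_zero_u32(unsigned shift) {
  return shift < 32 ? (uint32_t(1) << shift) : 0u;
}

// Boundary cases, checked at compile time. Writing (1U << 32) directly in
// a static_assert is typically rejected by the compiler, because undefined
// behavior is not allowed in a constant expression.
static_assert(pow2_or_zero_u32(31) == 0x80000000u, "top bit");
static_assert(pow2_or_zero_u32(32) == 0u, "out of range maps to 0");
```

The explicit 0 gives callers a well-defined overflow signal, which is the behavior the align.hpp code currently assumes from the raw shift.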

With kind regards,

Ivan


On 11/26/19 1:44 AM, Claes Redestad wrote:
> Hi,
>
> in various places in the hotspot we have custom code to calculate the
> next power of two, some of which have potential to go into an infinite 
> loop in case of an overflow.
>
> This patch proposes adding next_power_of_two utility methods which
> avoid infinite loops on overflow, while providing slightly more
> efficient code in most cases.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes

-- 
With kind regards,
Ivan Gerasimov


From vladimir.x.ivanov at oracle.com  Tue Nov 26 15:04:17 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 26 Nov 2019 18:04:17 +0300
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
 check
In-Reply-To: <44fbba54-fdd2-af82-13e7-7bd36f16b752@oracle.com>
References: <b7784963-c2fc-8139-06b1-a49ea486a620@oracle.com>
 <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
 <4314ca44-9094-a4d5-407e-9d9eaf5d4b37@oracle.com>
 <44fbba54-fdd2-af82-13e7-7bd36f16b752@oracle.com>
Message-ID: <3ddbec9f-63c8-1c72-6399-e0cbe1758a0b@oracle.com>


>>
>> +  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
>> +    if (cmp->in(2) != NULL && // make sure cmp is not already dead
>> +        cmp->in(2)->bottom_type() == TypePtr::NULL_PTR) {
>>
>> Merge nested ifs?
>>
> I too have problems with this part, but for other reasons. What about: (?)
> 
> -  Node* cmp;
>    int dist = 4;               // Cutoff limit for search
> -  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
> -    if (cmp->in(2) != NULL && // make sure cmp is not already dead
> +  if (is_If() && in(1)->is_Bool()) {
> +    Node* cmp = in(1)->in(1);
> +    if (cmp->Opcode() == Op_CmpP &&
> +        cmp->in(2) != NULL && // make sure cmp is not already dead

Looks even better.

Best regards,
Vladimir Ivanov

>> Looks like extracting the following code into a helper function (along 
>> with the enum and the table) can improve readability.
>>
>> +  int drel = subsuming_bool_test_encode(dom->in(1));
>> +  int trel = subsuming_bool_test_encode(bol);
>> +  int bout = pre->is_IfFalse() ? 1 : 0;
>> +
>> +  if (drel < 0 || trel < 0) {
>> +    return NULL;
>> +  }
>> +  int br = s_subsume_map[trel][2*drel+bout];
>> +  if (br == na) {
>> +    return NULL;
>> +  }
>>
>> New function can return intcon(0/1) or bol(or NULL?) and the caller 
>> decides whether the update is needed.
>>
>>> On 12/11/2019 15:16, Patric Hedlin wrote:
>>>> Dear all,
>>>>
>>>> I would like to ask for help to review the following change/update:
>>>>
>>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>>>
>>>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>>>
>>>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>>>     short-circuit for (obviously) redundant if-nodes.
>>>>
>>>> Testing: hs-tier1-4, hs-precheckin-comp
>>>>
>>>>
>>>> Best regards,
>>>> Patric
> 

From sandhya.viswanathan at intel.com  Tue Nov 26 16:08:59 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 26 Nov 2019 16:08:59 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <e900029f-7d42-0d01-4311-15b37f6ee15c@oracle.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
 <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>
 <c0916ad3-e4fa-52ba-398c-cd2782ab6c4a@oracle.com>
 <CH2PR11MB42806416E3D0607FDEDD1F7EEF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
 <a890b2fc-7a7b-6c75-96c0-5940f0ad8692@oracle.com>
 <CH2PR11MB4280286FFA113210E565C283EF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
 <e900029f-7d42-0d01-4311-15b37f6ee15c@oracle.com>
Message-ID: <CH2PR11MB42801CA4A7EBFDD891DBB25EEF450@CH2PR11MB4280.namprd11.prod.outlook.com>

Hi Vladimir,

Please do sponsor the patch. I am not a committer.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com> 
Sent: Tuesday, November 26, 2019 1:26 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092


> http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/

Looks good.

Testing results (hs-precheckin-comp,hs-tier1,hs-tier2) are clean.

Best regards,
Vladimir Ivanov

> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Monday, November 25, 2019 12:23 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir 
> Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when 
> UseAVX=3 is specified after JDK-8221092
> 
> 
>>   From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level.
>> If AVX=3, it checks if save/restore of zmm worked properly.
>> If AVX=1 or 2, it checks if save/restore of ymm worked properly.
>> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3.  And that is what this patch is trying to fix.
> 
> Ok, now I see how it works:
> 
> os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features:
> 
>     static bool supports_avx()      { return (_features & CPU_AVX) != 0; }
> 
>     static bool supports_evex()     { return (_features & CPU_AVX512F) != 0; }
> 
> But VM_Version::get_processor_features() clears detected features depending on UseAVX level.
> 
> So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2.
> 
>> The rest we are entering 8221092 discussion.
>> I think Vivek's intent there was not to do any AVX-512 instructions at all if AVX < 3, to overcome performance regressions observed by Scott Oaks.
> 
> My best guess is it helped analysis by eliminating all problematic instructions.
> 
> Anyway, I'm fine with leaving that code. (Though it would be nice to 
> simply get rid of it.)
> 
> It looks like all the code after start_simd_check (and checks to
> legacy_save_restore) can go under the use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and the FLAG_IS_DEFAULT(UseAVX) check you added).
> 
> Otherwise, looks good.
> 
> Best regards,
> Vladimir Ivanov
> 
>> -----Original Message-----
>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> Sent: Monday, November 25, 2019 10:15 AM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir 
>> Kozlov <vladimir.kozlov at oracle.com>
>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>> UseAVX=3 is specified after JDK-8221092
>>
>> Thanks for the clarifications, Sandhya!
>>
>> Some more questions inlined.
>>> The bug happens like below:
>>>
>>> * User specifies -XX:UseAVX=3 on command line.
>>> * On Skylake platform due to the following lines:
>>>        __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>>>        __ movl(rax, Address(rsi, 0));
>>>        __ cmpl(rax, 0x50654);              // If it is Skylake
>>>        __ jcc(Assembler::equal, legacy_setup);
>>>       The zmm registers are not saved/restored.
>>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>>>      int max_vector_size = 0;
>>>      if (UseSSE < 2) {
>>>        // Vectors (in XMM) are only supported with SSE2+
>>>        // SSE is always 2 on x64.
>>>        max_vector_size = 0;
>>>      } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>>>        // 16 byte vectors (in XMM) are supported with SSE2+
>>>        max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
>>>      } else if (UseAVX == 1 || UseAVX == 2) {
>>>         ...
>>>      }
>>> * And so we get UseAVX=3 and max_vector_size = 16.
>>>
>>> So the fix is to save/restore zmm registers when the flag is not default and the user specifies UseAVX > 2.
>>
>> Unfortunately, I'm still confused :-(
>>
>> If os_supports_avx_vectors() turns out false, why aren't -XX:UseAVX=1 and
>> -XX:UseAVX=2 affected on Skylake as well?
>>
>>> On your question regarding why 8221092 needs the code you conditionally exclude:
>>> This was introduced so as not to do any AVX512 execution if not required.  ZMM register save/restore uses AVX512 instructions.
>>
>> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3?
>>
>> Considering it is in generate_get_cpu_info(), which runs only once early at startup, what kind of effects do you intend to avoid?
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>> Sent: Friday, November 22, 2019 4:39 AM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot 
>>> compiler <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>>> UseAVX=3 is specified after JDK-8221092
>>>
>>> Hi Sandhya,
>>>
>>>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>>>
>>>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
>>>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>>>
>>>> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
>>>
>>> Please elaborate on how it happens and how legacy_setup affects it.
>>>
>>> Why does 8221092 need the code you conditionally exclude?
>>> Why isn't the following enough?
>>>
>>>        if (FLAG_IS_DEFAULT(UseAVX)) {
>>>          FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
>>> +    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
>>> +      FLAG_SET_DEFAULT(UseAVX, 2);  //Set UseAVX=2 for Skylake
>>> +    }
>>>        } else if (UseAVX > use_avx_limit) {
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
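A simplified model of the selection logic discussed in this thread may help show why only UseAVX=3 is affected. It is a sketch under stated assumptions, not HotSpot code: `CpuState`, `ymm_state_saved`, `zmm_state_saved` and the `_model` functions are hypothetical names, and the two flags stand in for the outcome of the save/restore check performed once by `generate_get_cpu_info()`.

```cpp
#include <cassert>

// Hypothetical, simplified model of the max_vector_size selection quoted in
// this thread. Not HotSpot code: the flags stand in for the result of the
// save/restore check done once at startup by generate_get_cpu_info().
struct CpuState {
  int use_sse;           // value of -XX:UseSSE
  int use_avx;           // value of -XX:UseAVX
  bool ymm_state_saved;  // ymm state survived save/restore
  bool zmm_state_saved;  // zmm state survived save/restore
};

// With UseAVX=3 the AVX-512 feature bits stay set, so the check consults the
// zmm result; with UseAVX=1/2 those bits are cleared and the ymm result is used.
bool os_supports_avx_vectors_model(const CpuState& s) {
  if (s.use_avx >= 3) {
    return s.zmm_state_saved;
  }
  return s.ymm_state_saved;
}

// Returns the maximum vector size in bytes.
int max_vector_size_model(const CpuState& s) {
  if (s.use_sse < 2) {
    return 0;   // vectors in XMM need SSE2+
  }
  if (s.use_avx == 0 || !os_supports_avx_vectors_model(s)) {
    return 16;  // fall back to 16-byte XMM vectors
  }
  if (s.use_avx == 1 || s.use_avx == 2) {
    return 32;  // YMM
  }
  return 64;    // ZMM with UseAVX=3
}
```

With `zmm_state_saved == false` (the Skylake path that branches to `legacy_setup`) and `use_avx == 3`, the model yields 16 bytes, the reported symptom, while `use_avx == 2` still yields 32 because only the ymm result is consulted.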

From vladimir.x.ivanov at oracle.com  Tue Nov 26 16:20:48 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 26 Nov 2019 19:20:48 +0300
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is
 specified after JDK-8221092
In-Reply-To: <CH2PR11MB42801CA4A7EBFDD891DBB25EEF450@CH2PR11MB4280.namprd11.prod.outlook.com>
References: <DM6PR11MB428392576157AE6E8AB61289EF490@DM6PR11MB4283.namprd11.prod.outlook.com>
 <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
 <CH2PR11MB4280AFBBB2A821DC9A98B36FEF490@CH2PR11MB4280.namprd11.prod.outlook.com>
 <c0916ad3-e4fa-52ba-398c-cd2782ab6c4a@oracle.com>
 <CH2PR11MB42806416E3D0607FDEDD1F7EEF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
 <a890b2fc-7a7b-6c75-96c0-5940f0ad8692@oracle.com>
 <CH2PR11MB4280286FFA113210E565C283EF4A0@CH2PR11MB4280.namprd11.prod.outlook.com>
 <e900029f-7d42-0d01-4311-15b37f6ee15c@oracle.com>
 <CH2PR11MB42801CA4A7EBFDD891DBB25EEF450@CH2PR11MB4280.namprd11.prod.outlook.com>
Message-ID: <8085a2bd-5204-e0d4-473f-39b19fcf20f4@oracle.com>

Pushed [1]

Best regards,
Vladimir Ivanov

[1] http://hg.openjdk.java.net/jdk/jdk/rev/dff8053bdb74

On 26.11.2019 19:08, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Please do sponsor the patch. I am not a committer.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Tuesday, November 26, 2019 1:26 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> 
>> http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/
> 
> Looks good.
> 
> Testing results (hs-precheckin-comp,hs-tier1,hs-tier2) are clean.
> 
> Best regards,
> Vladimir Ivanov
> 
>> -----Original Message-----
>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> Sent: Monday, November 25, 2019 12:23 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir
>> Kozlov <vladimir.kozlov at oracle.com>
>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>> UseAVX=3 is specified after JDK-8221092
>>
>>
>>>    From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level.
>>> If AVX=3, it checks if save/restore of zmm worked properly.
>>> If AVX=1 or 2, it checks if save/restore of ymm worked properly.
>>> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3.  And that is what this patch is trying to fix.
>>
>> Ok, now I see how it works:
>>
>> os_supports_avx_vectors() dispatches on supports_evex() and supports_avx(), which inspect the supported CPU features:
>>
>>      static bool supports_avx()      { return (_features & CPU_AVX) != 0; }
>>
>>      static bool supports_evex()     { return (_features & CPU_AVX512F) != 0; }
>>
>> But VM_Version::get_processor_features() clears detected features depending on the UseAVX level.
>>
>> So, as you described, os_supports_avx_vectors() takes different paths for UseAVX=3 and UseAVX=1/2.
>>
>>> The rest is entering the 8221092 discussion.
>>> I think Vivek's intent there was not to execute any AVX-512 instructions at all if AVX < 3, to overcome the performance regressions observed by Scott Oaks.
>>
>> My best guess is it helped analysis by eliminating all problematic instructions.
>>
>> Anyway, I'm fine with leaving that code. (Though it would be nice to
>> simply get rid of it.)
>>
>> It looks like all the code after start_simd_check (and checks to
>> legacy_save_restore) can go under the use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and the FLAG_IS_DEFAULT(UseAVX) check you added).
>>
>> Otherwise, looks good.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>> -----Original Message-----
>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>> Sent: Monday, November 25, 2019 10:15 AM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir
>>> Kozlov <vladimir.kozlov at oracle.com>
>>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>>> UseAVX=3 is specified after JDK-8221092
>>>
>>> Thanks for the clarifications, Sandhya!
>>>
>>> Some more questions inlined.
>>>> The bug happens like below:
>>>>
>>>> * User specifies -XX:UseAVX=3 on command line.
>>>> * On Skylake platform due to the following lines:
>>>>         __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>>>>         __ movl(rax, Address(rsi, 0));
>>>>         __ cmpl(rax, 0x50654);              // If it is Skylake
>>>>         __ jcc(Assembler::equal, legacy_setup);
>>>>        The zmm registers are not saved/restored.
>>>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>>>>       int max_vector_size = 0;
>>>>       if (UseSSE < 2) {
>>>>         // Vectors (in XMM) are only supported with SSE2+
>>>>         // SSE is always 2 on x64.
>>>>         max_vector_size = 0;
>>>>       } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>>>>         // 16 byte vectors (in XMM) are supported with SSE2+
>>>>         max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
>>>>       } else if (UseAVX == 1 || UseAVX == 2) {
>>>>          ...
>>>>       }
>>>> * And so we get UseAVX=3 and max_vector_size = 16.
>>>>
>>>> So the fix is to save/restore zmm registers when the flag is not default and the user specifies UseAVX > 2.
>>>
>>> Unfortunately, I'm still confused :-(
>>>
>>> If os_supports_avx_vectors() turns out false, why aren't -XX:UseAVX=1 and
>>> -XX:UseAVX=2 affected on Skylake as well?
>>>
>>>> On your question regarding why 8221092 needs the code you conditionally exclude:
>>>> This was introduced so as not to do any AVX512 execution if not required.  ZMM register save/restore uses AVX512 instructions.
>>>
>>> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3?
>>>
>>> Considering it is in generate_get_cpu_info(), which runs only once early at startup, what kind of effects do you intend to avoid?
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>>
>>>> Best Regards,
>>>> Sandhya
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>>> Sent: Friday, November 22, 2019 4:39 AM
>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot
>>>> compiler <hotspot-compiler-dev at openjdk.java.net>
>>>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>>>> UseAVX=3 is specified after JDK-8221092
>>>>
>>>> Hi Sandhya,
>>>>
>>>>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>>>>
>>>>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
>>>>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>>>>
>>>>> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
>>>>
>>>> Please elaborate on how it happens and how legacy_setup affects it.
>>>>
>>>> Why does 8221092 need the code you conditionally exclude?
>>>> Why isn't the following enough?
>>>>
>>>>         if (FLAG_IS_DEFAULT(UseAVX)) {
>>>>           FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
>>>> +    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
>>>> +      FLAG_SET_DEFAULT(UseAVX, 2);  //Set UseAVX=2 for Skylake
>>>> +    }
>>>>         } else if (UseAVX > use_avx_limit) {
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>

From martin.doerr at sap.com  Tue Nov 26 16:21:58 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 26 Nov 2019 16:21:58 +0000
Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not
 enough bytes
Message-ID: <VI1PR0201MB2479CD69D07E505D34E8C8329A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Christoph,

thanks for reporting the bug
https://bugs.openjdk.java.net/browse/JDK-8234645

Seems like the large offset fix in mem2reg and reg2mem missed the first patching stub in the long/double cases.
We should have nop padding there, too (for the same reason as for the 2nd patching stub).
NativeMovRegMem should always consist of 2 instructions on arm32 in order to support larger offsets.

Webrev:
http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/

May I ask you to test this fix? We don't have arm32 in our testing landscape.

Best regards,
Martin
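The idea behind the nop padding can be illustrated with a toy emitter in which a patchable field access always occupies two instruction slots, whatever the offset. This is a hypothetical sketch, not the arm32 C1 code: `Op`, `Insn`, `emit_patchable_load` and `fits_in_imm12` are invented names, instructions are abstract tags rather than real encodings, and the +/-4095 range is assumed, for illustration, from the immediate-offset form of a word `ldr`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Abstract instruction tags; real arm32 encodings are not modelled.
enum class Op { Ldr, MaterializeOffset, Nop };

struct Insn {
  Op op;
  int32_t imm;  // offset carried by the instruction, 0 if unused
};

// Assumed range check, for illustration only: a word ldr with an
// immediate offset encodes magnitudes up to 4095.
bool fits_in_imm12(int32_t offset) {
  return offset >= -4095 && offset <= 4095;
}

// Emits a field load that always occupies exactly two instruction slots,
// so code that patches the site can rely on a fixed size.
void emit_patchable_load(std::vector<Insn>& code, int32_t offset) {
  if (fits_in_imm12(offset)) {
    code.push_back({Op::Ldr, offset});
    code.push_back({Op::Nop, 0});  // padding keeps the size constant
  } else {
    code.push_back({Op::MaterializeOffset, offset});  // offset into a temp register
    code.push_back({Op::Ldr, 0});                     // load via base + temp
  }
}
```

Because both branches emit two slots, a patching stub can copy or overwrite a fixed 8 bytes without knowing which form was emitted.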


From tobias.hartmann at oracle.com  Tue Nov 26 16:30:40 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 26 Nov 2019 17:30:40 +0100
Subject: [14] RFR (L): 8234391: C2: Generic vector operands
In-Reply-To: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>
References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>
Message-ID: <0e506a31-9107-0354-ffce-308d332cbfbd@oracle.com>

Hi Vladimir,

hard to review the .ad file changes but this looks good to me.

Just noticed some code style issues:
- x86_64.ad:11284, 11346, 11410, 11426: indentation is wrong (already before your fix)
- whitespace in matcher.cpp:2598/2601 can be removed

Best regards,
Tobias

On 19.11.19 15:30, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8234391
> 
> Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ])
> to generic ones.
> 
> (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD
> instruction merges will be handled separately.)
> 
> On a high-level it is organized as follows:
> 
>   (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;
>
>   (2) at runtime, right after matching is over, a special pass is performed which does:
>
>       * replaces vecOper with vec[SDXYZ] depending on mach node type
>          - vector mach nodes capture bottom_type() of their ideal prototype;
>
>       * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec)
>          - matcher needs them, but they are useless for the register allocator (moreover, they may cause
> additional spills);
>
>
>   (3) after the post-selection pass is over, all mach nodes should have fixed-size vector operands.
>
>
> Some details:
>
>   (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works
>
>
>   (2) new logic is guarded by a new matcher flag (Matcher::supports_generic_vector_operands) which is
> enabled only on x86
>
>
>   (3) post-selection analysis is implemented as a single pass over the graph that processes
> individual nodes using their own (for DEF operands) or their inputs' (USE operands) bottom_type()
> (which is an instance of TypeVect)
>
>
>   (4) most of the analysis is cross-platform and interfaces with platform-specific code through 3
> methods:
>
>      static bool is_generic_reg2reg_move(MachNode* m);
>      // distinguishes MoveVec2Leg/MoveLeg2Vec nodes
>
>      static bool is_generic_vector(MachOper* opnd);
>      // distinguishes vec/legVec operands
>
>      static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg);
>      // constructs fixed-sized vector operand based on ideal reg
>      //   vec    + Op_Vec[SDXYZ] =>    vec[SDXYZ]
>      //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]
>
>
>   (5) TEMP operands are handled specially:
>      - TEMP uses max_vector_size() to determine what fixed-sized operand to use
>          * it is needed to cover reductions which don't produce vectors but scalars
>      - TEMP_DEF inherits fixed-sized operand type from DEF;
>
>
>   (6) there is a limited number of special cases for mach nodes in Matcher::get_vector_operand_helper:
>
>       - RShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its
> ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not
> ideal_reg().
>
>       - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type.
>
>
>   (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see
> Matcher::get_vector_regmask)
>
>
>   (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands
>
>
>   (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD
>       - it aligns the code between x86_64.ad and x86_32.ad
>       - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g.,
> string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0)
> 
> 
> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
> Reviewed-by: vlivanov, sviswanathan, ?
> 
> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL,
>          performance testing (SPEC* + Octane + micros / G1 + ParGC).
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html
> 
> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf
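The post-selection rewrite described in (2) and (4) above essentially maps a generic operand plus a vector size onto one of the fixed-size kinds (S=4, D=8, X=16, Y=32, Z=64 bytes). The following is a hypothetical model of that mapping only; it produces operand names as strings and is not the actual `clone_generic_vector_operand` implementation or its signature.

```cpp
#include <cassert>
#include <string>

// Size in bytes of each fixed-size vector kind: S=4, D=8, X=16, Y=32, Z=64.
char vector_kind_for_size(int size_in_bytes) {
  switch (size_in_bytes) {
    case 4:  return 'S';
    case 8:  return 'D';
    case 16: return 'X';
    case 32: return 'Y';
    case 64: return 'Z';
    default: return '?';  // not a legal vector size
  }
}

// Model of "vec + Op_Vec[SDXYZ] => vec[SDXYZ]" and
// "legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]".
std::string fixed_operand_name(bool is_legacy, int size_in_bytes) {
  std::string base = is_legacy ? "legVec" : "vec";
  return base + vector_kind_for_size(size_in_bytes);
}
```

For TEMP operands, item (5) says the size comes from max_vector_size() rather than from the node's own type; in this model that simply means passing that size as `size_in_bytes`.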

From vladimir.x.ivanov at oracle.com  Tue Nov 26 16:52:37 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 26 Nov 2019 19:52:37 +0300
Subject: [14] RFR (L): 8234391: C2: Generic vector operands
In-Reply-To: <0e506a31-9107-0354-ffce-308d332cbfbd@oracle.com>
References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>
 <0e506a31-9107-0354-ffce-308d332cbfbd@oracle.com>
Message-ID: <77c66795-ecd4-1461-4151-66c82b4e554c@oracle.com>

Thanks, Tobias.

> hard to review the .ad file changes but this looks good to me.

Yes, the changes are massive, but mostly straightforward. In addition to 
the code needed for generic vector operands, the changes are:

   (1) (x86.ad) switching vec[SDXYZ] => vec and legVec[SDXYZ] => legVec;

   (2) (x86.ad) after the switch, reduction instructions need additional 
checks on the input vector size (example [1]);

   (3) (x86_64.ad/x86_32.ad) rename operands with "vec" name to avoid 
name conflicts with vec operand;

   (4) (x86_64.ad) migrate compressed string instructions from legVecS 
to legRegD to keep it working when vector support is explicitly disabled 
(e.g., -XX:MaxVectorSize=0)

(1) is needed to avoid explicit moves between concrete 
(legVec/vec[SDXYZ]) and generic vectors (vec/legVec).

(2)-(4) could have been reviewed/integrated separately, but they did 
look trivial enough to avoid the effort.

> Just noticed some code style issues:
> - x86_64.ad:11284, 11346, 11410, 11426: indentation is wrong (already before your fix)
> - whitespace in matcher.cpp:2598/2601 can be removed

Good catch.

Best regards,
Vladimir Ivanov

[1]
-instruct rvadd2F_reduction_reg(regF dst, vecD src2, vecD tmp) %{
-  predicate(UseAVX > 0);
+instruct rvadd2F_reduction_reg(regF dst, vec src2, vec tmp) %{
+  predicate(UseAVX > 0 && n->in(2)->bottom_type()->is_vect()->length() == 2);
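Why the extra length check in [1] is needed can be sketched as a rule table: once several reduction rules accept the same generic `vec` operand, only a predicate on the input length makes exactly one of them match. This is a hypothetical model; `rvadd2F_reduction_reg` comes from the diff above, while the wider rule names are illustrative and not copied from x86.ad.

```cpp
#include <cassert>
#include <string>

// Each rule accepts a generic vector input; the predicate on the
// input length is what makes exactly one rule match.
struct ReductionRule {
  const char* name;
  int required_length;  // number of float lanes the rule handles
};

const ReductionRule kAddFRules[] = {
    {"rvadd2F_reduction_reg", 2},
    {"rvadd4F_reduction_reg", 4},  // illustrative name
    {"rvadd8F_reduction_reg", 8},  // illustrative name
};

// Returns the matching rule name, or an empty string if none applies.
std::string select_addf_reduction(int use_avx, int input_length) {
  if (use_avx <= 0) {
    return "";  // mirrors predicate(UseAVX > 0)
  }
  for (const ReductionRule& r : kAddFRules) {
    if (r.required_length == input_length) {
      return r.name;
    }
  }
  return "";
}
```

With fixed-size operands (vecD vs vecX) the operand type alone disambiguated the rules; with `vec` that information has to move into the predicate.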

> On 19.11.19 15:30, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8234391
>>
>> Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ])
>> to generic ones.
>>
>> (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD
>> instruction merges will be handled separately.)
>>
>> On a high-level it is organized as follows:
>>
>>   (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;
>>
>>   (2) at runtime, right after matching is over, a special pass is performed which does:
>>
>>       * replaces vecOper with vec[SDXYZ] depending on mach node type
>>          - vector mach nodes capture bottom_type() of their ideal prototype;
>>
>>       * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec)
>>          - matcher needs them, but they are useless for the register allocator (moreover, they may cause
>> additional spills);
>>
>>
>>   (3) after the post-selection pass is over, all mach nodes should have fixed-size vector operands.
>>
>>
>> Some details:
>>
>>   (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works
>>
>>
>>   (2) new logic is guarded by a new matcher flag (Matcher::supports_generic_vector_operands) which is
>> enabled only on x86
>>
>>
>>   (3) post-selection analysis is implemented as a single pass over the graph that processes
>> individual nodes using their own (for DEF operands) or their inputs' (USE operands) bottom_type()
>> (which is an instance of TypeVect)
>>
>>
>>   (4) most of the analysis is cross-platform and interfaces with platform-specific code through 3
>> methods:
>>
>>      static bool is_generic_reg2reg_move(MachNode* m);
>>      // distinguishes MoveVec2Leg/MoveLeg2Vec nodes
>>
>>      static bool is_generic_vector(MachOper* opnd);
>>      // distinguishes vec/legVec operands
>>
>>      static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg);
>>      // constructs fixed-sized vector operand based on ideal reg
>>      //   vec    + Op_Vec[SDXYZ] =>    vec[SDXYZ]
>>      //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]
>>
>>
>>   (5) TEMP operands are handled specially:
>>      - TEMP uses max_vector_size() to determine what fixed-sized operand to use
>>          * it is needed to cover reductions which don't produce vectors but scalars
>>      - TEMP_DEF inherits fixed-sized operand type from DEF;
>>
>>
>>   (6) there is a limited number of special cases for mach nodes in Matcher::get_vector_operand_helper:
>>
>>       - RShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its
>> ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not
>> ideal_reg().
>>
>>       - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type.
>>
>>
>>   (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see
>> Matcher::get_vector_regmask)
>>
>>
>>   (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands
>>
>>
>>   (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD
>>       - it aligns the code between x86_64.ad and x86_32.ad
>>       - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g.,
>> string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0)
>>
>>
>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>> Reviewed-by: vlivanov, sviswanathan, ?
>>
>> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL,
>>          performance testing (SPEC* + Octane + micros / G1 + ParGC).
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html
>>
>> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf

From vladimir.kozlov at oracle.com  Tue Nov 26 17:05:22 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 26 Nov 2019 09:05:22 -0800
Subject: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for client
 VMs due to Unrecognized VM option PartialPeelNewPhiDelta
In-Reply-To: <20191126133017.4DEC711CEFA@aojmv0009>
References: <20191126133017.4DEC711CEFA@aojmv0009>
Message-ID: <46e4c188-2d15-bd33-010b-cb203741cbd8@oracle.com>

Good.

Thanks,
Vladimir

On 11/26/19 5:28 AM, christoph.goettschkes at microdoc.com wrote:
> Hi,
> 
> please review the following small changeset which fixes the test
> test/hotspot/jtreg/compiler/loopopts/LoopRotateBadNodeBudget.java for
> client VMs.
> I simply added vm.compiler2.enabled to the requires tag, since the
> original bug only appeared with the server JIT:
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234807
> Webrev: http://cr.openjdk.java.net/~cgo/8234807/webrev.00/
> 
> Bug which introduced the issue:
> https://bugs.openjdk.java.net/browse/JDK-8231565
> 
> Thanks,
> Christoph
> 

From nils.eliasson at oracle.com  Tue Nov 26 17:59:44 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 26 Nov 2019 18:59:44 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <58fd5565-4342-ea70-511d-bace68308391@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>
 <58fd5565-4342-ea70-511d-bace68308391@oracle.com>
Message-ID: <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com>

Hi,

On 2019-11-26 13:02, Vladimir Ivanov wrote:
>
>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.03/
>
> ====================================================================
>
> src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp:
>
> +  // The currently modeled arraycopy-clone_basic doesn't have the base pointers for src and dst,
> +  // rather point at the start of the payload.
> +  Node* src_base = get_base_for_arracycopy_clone(phase, src);
> +  Node* dst_base = get_base_for_arracycopy_clone(phase, dst);
> +
> +  // The size must also be increased to match the instance size.
> +  int base_off = BarrierSetC2::extract_base_offset(false);
> +  Node* full_size = phase->transform_later(new AddLNode(size, phase->longcon(base_off >> LogBytesPerLong)));
>    Node* const call = phase->make_leaf_call(ctrl,
>                                             mem,
>                                             clone_type(),
>                                             ZBarrierSetRuntime::clone_addr(),
>                                             "ZBarrierSetRuntime::clone",
>                                             TypeRawPtr::BOTTOM,
> -                                           src,
> -                                           dst,
> -                                           size);
> +                                           src_base,
> +                                           dst_base,
> +                                           full_size,
> +                                           phase->top());
>
>
> Do you see any problems with copying object header?

It won't be copied. It's just that the runtime call expects the 
arguments to be pointers to the objects, and the size of the object. 
It's the same function that is used by a call to the native clone impl. 
(jvm.cpp:720)
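The two adjustments in the quoted hunk are plain address and size arithmetic. The sketch below restates them with hypothetical helper names; the example value `base_off == 8` is an assumption (the payload offset on a 64-bit VM with compressed class pointers, where the 4-byte klass field is included so copying stays 8-byte aligned), not something stated in this thread.

```cpp
#include <cassert>
#include <cstdint>

const int kLogBytesPerLong = 3;  // sizes are counted in 8-byte words

// Hypothetical helper: the arraycopy node points at the payload, so the
// object base that the runtime clone call expects is base_off bytes earlier.
uintptr_t object_base_from_payload(uintptr_t payload_addr, int base_off) {
  return payload_addr - static_cast<uintptr_t>(base_off);
}

// Hypothetical helper: the arraycopy size counts payload words only; the
// runtime call expects the whole instance size, so add the header words back.
int64_t full_size_in_words(int64_t payload_words, int base_off) {
  return payload_words + (base_off >> kLogBytesPerLong);
}
```

For example, with an assumed `base_off` of 8, a payload of 3 words becomes a 4-word instance size, and a payload starting at 0x1008 gives an object base of 0x1000.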

>
> ====================================================================
>
> The rest are minor comments:
>
> src/hotspot/share/gc/shared/c2/barrierSetC2.cpp:
>
> -void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* size, bool is_array) const {
> -  // Exclude the header but include array length to copy by 8 bytes words.
> -  // Can't use base_offset_in_bytes(bt) since basic type is unknown.
> +int BarrierSetC2::extract_base_offset(bool is_array) {
>
> ...
>
> +void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* size, bool is_array) const {
> +  // Exclude the header but include array length to copy by 8 bytes words.
> +  // Can't use base_offset_in_bytes(bt) since basic type is unknown.
> +  int base_off = extract_base_offset(is_array);
>
>
> I'd leave the comment in BarrierSetC2::extract_base_offset(). After 
> the refactoring it looks confusing in BarrierSetC2::clone().
>
> Also, considering it's used from zBarrierSetC2.cpp, it would be nice 
> to have a more descriptive name. 
> BarrierSetC2::arraycopy_payload_base_offset(bool is_array) maybe?

Sounds reasonable.

>
>
> ====================================================================
>
> src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp:
>
> +Node* get_base_for_arracycopy_clone(PhaseMacroExpand* phase, Node* n) {
>
> Should get_base_for_arracycopy_clone be static? Also, it would be nice if 
> the name reflected that it works on instances and not arrays.
Fixed
>
> ====================================================================
>
> +  // The currently modeled arraycopy-clone_basic doesn't have the base pointers for src and dst,
> +  // rather point at the start of the payload.
> +  Node* src_base = get_base_for_arracycopy_clone(phase, src);
> +  Node* dst_base = get_base_for_arracycopy_clone(phase, dst);
>
> Another thing: "base" in zBarrierSetC2.cpp and in BarrierSetC2 has the 
> opposite meaning, which is confusing.
>
> Renaming src/dst<->src_base/dst_base in 
> ZBarrierSetC2::clone_at_expansion() would improve things.

Done.

>
> ====================================================================
>
> -  if (src->bottom_type()->isa_aryptr()) {
> +  if (ac->is_clone_array()) {
>      // Clone primitive array
>
> Is the comment valid? Doesn't it cover object array case as well?

Nope - object arrays will be handled as clone_oop_array which uses the 
normal object copy which already applies the appropriate load barriers.

The special case for ZGC is the cloning of instances because we don't 
know where to apply load barriers without looking up the type. (Except 
for clone on small objects and short arrays that are transformed to a 
series of load-stores.)

http://cr.openjdk.java.net/~neliasso/8234520/webrev.04

Thank you for the feedback!

// Nils

>
>
> Best regards,
> Vladimir Ivanov
>
>>
>> Regards,
>>
>> Nils
>>
>>
>> On 2019-11-21 12:53, Nils Eliasson wrote:
>>> I updated this to version 2.
>>>
>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>>>
>>> I found a problem running 
>>> compiler/arguments/TestStressReflectiveCode.java
>>>
>>> Even though the clone was created as an oop clone, the node type 
>>> returns isa_aryptr. This is caused by the src ptr not being the base 
>>> pointer. Until I fix that I wanted a more robust test.
>>>
>>> In this webrev I split up the is_clonebasic into is_clone_oop and 
>>> is_clone_array. (is_clone_oop_array is already there). Having a 
>>> complete set with the three clone types allows for a robust test and 
>>> easy verification. (The three variants end up in different paths 
>>> with different GCs).
>>>
>>> Regards,
>>>
>>> Nils
>>>
>>>
>>> On 2019-11-20 15:25, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> I found a few bugs after the enabling of the clone intrinsic in ZGC.
>>>>
>>>> 1) The arraycopy clone_basic has the parameters adjusted to work as 
>>>> a memcopy. For an oop the src is pointing inside the oop to where 
>>>> we want to start copying. But when we want to do a runtime call to 
>>>> clone - the parameters are supposed to be the actual src oop and 
>>>> dst oop, and the size should be the instance size.
>>>>
>>>> For now I have made a workaround. What should be done later is 
>>>> using the offset in the arraycopy node to encode where the payload 
>>>> is, so that the base pointers are always correct. But that would 
>>>> require changes to the BarrierSet classes of all GCs. So I leave 
>>>> that for next release.
>>>>
>>>> 2) The size parameter of the TypeFunc for the runtime call has the 
>>>> wrong type. It was originally Long but was missing the upper Half; it 
>>>> was changed to INT (JDK-8233834), but that is wrong and causes the 
>>>> compiles to be skipped. We didn't notice that since they failed 
>>>> silently. That is also why we didn't notice problem #1 either.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>>>
>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>>>
>>>> Please review!
>>>>
>>>> Nils
>>>>
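Point 2 hinges on how a 64-bit argument is described in a C2 call signature: it occupies two slots, the value followed by a Type::HALF placeholder (the same layout Per proposes later in this thread for zBarrierSetC2.cpp). The model below is a hypothetical simplification; the Slot enum and both helpers are illustrative names, not HotSpot types, and only the "long is followed by its upper half" invariant is taken from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a call signature's slot list. In C2 the
// slots are Type* values such as TypeInstPtr::NOTNULL, TypeLong::LONG and
// Type::HALF; here they are just labels.
enum class Slot { Oop, Int, Long, Half };

// clone(src, dst, size) with a 64-bit size: the long takes two consecutive
// slots, the value and its upper half.
std::vector<Slot> clone_domain_with_long_size() {
  return { Slot::Oop, Slot::Oop, Slot::Long, Slot::Half };
}

// Every Long must be immediately followed by a Half, and every Half must be
// preceded by a Long. The original "Long without Half" signature fails this.
bool longs_are_paired(const std::vector<Slot>& slots) {
  for (std::size_t i = 0; i < slots.size(); i++) {
    if (slots[i] == Slot::Long &&
        (i + 1 >= slots.size() || slots[i + 1] != Slot::Half)) {
      return false;
    }
    if (slots[i] == Slot::Half && (i == 0 || slots[i - 1] != Slot::Long)) {
      return false;
    }
  }
  return true;
}
```

With this model, `{Oop, Oop, Long}` is rejected while the four-slot domain is accepted.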

From christoph.goettschkes at microdoc.com  Wed Nov 27 09:09:20 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Wed, 27 Nov 2019 10:09:20 +0100
Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not
 enough bytes
In-Reply-To: <VI1PR0201MB2479CD69D07E505D34E8C8329A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB2479CD69D07E505D34E8C8329A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <mailman.2.1574845867.28664.hotspot-compiler-dev@openjdk.java.net>

Hi Martin,

thanks for looking into the issue.
I executed the tier1 hotspot tests with your patch applied and it looks 
good. Tests which were failing before pass now. I have some unrelated 
failures, which I will start looking into now.

One small remark:
When I encountered the problem, I was very much confused by the constant 
instruction size of 8 [1]. Maybe it would be more clear to define the 
instruction size to be 4, and instead, in "num_bytes_to_end_of_patch()", 
return "instruction_size * 2"?
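The two spellings Christoph compares reserve the same 8 bytes; they differ only in what "instruction_size" is taken to mean. A hypothetical sketch of both (the namespaces and the num_bytes_to_end_of_patch bodies are illustrative, not copied from nativeInst_arm_32.hpp):

```cpp
#include <cassert>

namespace current_style {
  // instruction_size describes the whole NativeMovRegMem sequence.
  constexpr int instruction_size = 8;
  constexpr int num_bytes_to_end_of_patch() { return instruction_size; }
}

namespace suggested_style {
  // instruction_size describes one 4-byte ARM instruction,
  // and the patchable sequence is two of them.
  constexpr int instruction_size = 4;
  constexpr int num_bytes_to_end_of_patch() { return instruction_size * 2; }
}

static_assert(current_style::num_bytes_to_end_of_patch() ==
                  suggested_style::num_bytes_to_end_of_patch(),
              "both spellings reserve the same 8 bytes");
```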

Best regards,
Christoph

[1]
https://hg.openjdk.java.net/jdk/jdk/file/a2441ac23eeb/src/hotspot/cpu/arm/nativeInst_arm_32.hpp

"Doerr, Martin" <martin.doerr at sap.com> wrote on 2019-11-26 17:21:58:

> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: "christoph.goettschkes at microdoc.com" 
> <christoph.goettschkes at microdoc.com>, "'hotspot-compiler-
> dev at openjdk.java.net'" <hotspot-compiler-dev at openjdk.java.net>
> Date: 2019-11-26 17:22
> Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: 
> not enough bytes
> 
> Hi Christoph,
> 
> thanks for reporting the bug
> https://bugs.openjdk.java.net/browse/JDK-8234645
> 
> Seems like the large offset fix in mem2reg and reg2mem missed the first
> patching stub in the long/double cases.
> We should have nop padding there, too (for same reason as for the 2nd 
> patching stub).
> NativeMovRegMem should always consist of 2 instructions on arm32 in 
> order to support larger offsets.
> 
> Webrev:
> http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/
> 
> May I ask you to test this fix? We don't have arm32 in our testing 
> landscape.
> 
> Best regards,
> Martin
> 

From christoph.goettschkes at microdoc.com  Wed Nov 27 09:22:33 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Wed, 27 Nov 2019 10:22:33 +0100
Subject: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for client
 VMs due to Unrecognized VM option PartialPeelNewPhiDelta
In-Reply-To: <46e4c188-2d15-bd33-010b-cb203741cbd8@oracle.com>
References: <20191126133017.4DEC711CEFA@aojmv0009>
 <46e4c188-2d15-bd33-010b-cb203741cbd8@oracle.com>
Message-ID: <mailman.3.1574846655.28664.hotspot-compiler-dev@openjdk.java.net>

Hi Vladimir,

thanks for the review. I updated the webrev, please find the changeset 
here:
https://cr.openjdk.java.net/~cgo/8234807/webrev.02/jdk-jdk.changeset

Could you please sponsor this change for me and commit it into the 
repository?

Thanks,
Christoph

"hotspot-compiler-dev" <hotspot-compiler-dev-bounces at openjdk.java.net> 
wrote on 2019-11-26 18:05:22:

> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> To: hotspot-compiler-dev at openjdk.java.net
> Date: 2019-11-26 18:06
> Subject: Re: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for 
> client VMs due to Unrecognized VM option PartialPeelNewPhiDelta
> Sent by: "hotspot-compiler-dev" 
> <hotspot-compiler-dev-bounces at openjdk.java.net>
> 
> Good.
> 
> Thanks,
> Vladimir
> 
> On 11/26/19 5:28 AM, christoph.goettschkes at microdoc.com wrote:
> > Hi,
> > 
> > please review the following small changeset which fixes the test
> > test/hotspot/jtreg/compiler/loopopts/LoopRotateBadNodeBudget.java for
> > client VMs.
> > I simply added vm.compiler2.enabled to the requires tag, since the
> > original bug only appeared with the server JIT:
> > 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234807
> > Webrev: http://cr.openjdk.java.net/~cgo/8234807/webrev.00/
> > 
> > Bug which introduced the issue:
> > https://bugs.openjdk.java.net/browse/JDK-8231565
> > 
> > Thanks,
> > Christoph
> > 
> 


From christoph.goettschkes at microdoc.com  Wed Nov 27 09:57:27 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Wed, 27 Nov 2019 10:57:27 +0100
Subject: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails for client
 VMs due to Unrecognized VM option EliminateLocks
Message-ID: <mailman.4.1574848752.28664.hotspot-compiler-dev@openjdk.java.net>

Hi,

please review the following small changeset which fixes the test 
test/hotspot/jtreg/compiler/escapeAnalysis/TestEliminateLocksOffCrash.java 
for client VMs.
I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since 
the original bug is for C2 only. Also, the flag EliminateLocks is only 
defined in c2_globals.hpp, not for C1 or JVMCI.

Bug: https://bugs.openjdk.java.net/browse/JDK-8234894
Webrev: https://cr.openjdk.java.net/~cgo/8234894/webrev.00

Bug which introduced the issue:
https://bugs.openjdk.java.net/browse/JDK-8227384

Thanks,
Christoph


From nils.eliasson at oracle.com  Wed Nov 27 10:50:29 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 27 Nov 2019 11:50:29 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com>
Message-ID: <79bc1866-2293-37a6-9780-af24a01eb699@oracle.com>

Hi Per,

Here is an updated webrev with your fixes included:

http://cr.openjdk.java.net/~neliasso/8234520/webrev.05/

I chose to go with CloneInst, and ClonePrimArray.
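The three-way split (one kind per clone variant) can be pictured as a set of mutually exclusive predicates, which is what makes the "robust test and easy verification" possible. The class below is a hypothetical sketch, not ArrayCopyNode; only the kind names CloneInst, ClonePrimArray and CloneOopArray come from this thread, and the method spellings are illustrative.

```cpp
#include <cassert>

// Hypothetical sketch: one field records which clone variant a node is,
// so exactly one of the is_clone_* predicates can be true at a time.
class CloneKind {
 public:
  enum Kind { None, CloneInst, ClonePrimArray, CloneOopArray };

  void set_clone_inst()       { _kind = CloneInst; }
  void set_clone_prim_array() { _kind = ClonePrimArray; }
  void set_clone_oop_array()  { _kind = CloneOopArray; }

  bool is_clone_inst() const       { return _kind == CloneInst; }
  bool is_clone_prim_array() const { return _kind == ClonePrimArray; }
  bool is_clone_oop_array() const  { return _kind == CloneOopArray; }
  bool is_clone() const            { return _kind != None; }

 private:
  Kind _kind = None;
};
```

A GC barrier implementation can then branch on exactly one predicate instead of inferring the variant from the (possibly misleading) type of the src pointer.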

CopyOf and CopyOfRange, like ArrayCopy, don't have specialized versions; 
they check the type when expanding, and of course there are no versions 
for instances. But there are differences in which guards they need and 
when the guards are expanded. I would need to dig into the details to 
determine whether this could be simplified.

Regards,

Nils

On 2019-11-26 14:29, Per Liden wrote:
> Hi Nils,
>
> On 11/21/19 12:53 PM, Nils Eliasson wrote:
>> I updated this to version 2.
>>
>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>>
>> I found a problem running 
>> compiler/arguments/TestStressReflectiveCode.java
>>
>> Even though the clone was created as an oop clone, the node's type 
>> returns isa_aryptr. This is caused by the src ptr not being the base 
>> pointer. Until I fix that I wanted a more robust test.
>>
>> In this webrev I split up the is_clonebasic into is_clone_oop and 
>> is_clone_array. (is_clone_oop_array is already there). Having a 
>> complete set with the three clone types allows for a robust test and 
>> easy verification. (The three variants end up in different paths with 
>> different GCs).
>
> A couple of suggestions:
>
> 1) Instead of
>
>  CloneOop
>  CloneArray
>  CloneOopArray
>
> I think we should call the three types:
>
>  CloneInstance
>  CloneTypeArray
>  CloneOopArray
>
> Since CloneOop is not actually cloning an oop, but an object/instance. 
> And being explicit about TypeArray seems like a good thing to avoid 
> any confusion about the difference compared to CloneOopArray. I guess 
> PrimArray would be an alternative to TypeArray.
>
> And of course, if we change this then the is_clone/set_clone functions 
> should follow the same naming convention.
>
> Btw, what about CopyOf and CopyOfRange? Don't they also come in Oop 
> and Type versions, or are we handling those differently in some way? 
> Looking at the code it looks like they are only used for the oop array 
> case?

>
>
> 2) In zBarrierSetC2.cpp, do you mind if we do it like this instead? I 
> find that quite a bit easier to read.
>
> [...]
>   const Type** domain_fields = TypeTuple::fields(4);
>   domain_fields[TypeFunc::Parms + 0] = TypeInstPtr::NOTNULL;  // src
>   domain_fields[TypeFunc::Parms + 1] = TypeInstPtr::NOTNULL;  // dst
>   domain_fields[TypeFunc::Parms + 2] = TypeLong::LONG;        // size lower
>   domain_fields[TypeFunc::Parms + 3] = Type::HALF;            // size upper
>   const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms + 4, domain_fields);
> [...]
>
>
> 3) I'd also like to add some const, adjust indentation, etc, in a few 
> places. Instead of listing them here I made a patch, which goes on top 
> of yours. This patch also adjusts 2) above. Just shout if you have any 
> objections.
>
> http://cr.openjdk.java.net/~pliden/8234520/webrev.03-review
>
> /Per
>
>>
>> Regards,
>>
>> Nils
>>
>>
>> On 2019-11-20 15:25, Nils Eliasson wrote:
>>> Hi,
>>>
>>> I found a few bugs after the enabling of the clone intrinsic in ZGC.
>>>
>>> 1) The arraycopy clone_basic has the parameters adjusted to work as 
>>> a memcopy. For an oop the src is pointing inside the oop to where we 
>>> want to start copying. But when we want to do a runtime call to 
>>> clone - the parameters are supposed to be the actual src oop and dst 
>>> oop, and the size should be the instance size.
>>>
>>> For now I have made a workaround. What should be done later is using 
>>> the offset in the arraycopy node to encode where the payload is, so 
>>> that the base pointers are always correct. But that would require 
>>> changes to the BarrierSet classes of all GCs. So I leave that for 
>>> the next release.
>>>
>>> 2) The size parameter of the TypeFunc for the runtime call has the 
>>> wrong type. It was originally Long but missed the upper Half; it was 
>>> then changed to INT (JDK-8233834), but that is wrong and causes the 
>>> compiles to be skipped. We didn't notice that since they failed 
>>> silently. That is also why we didn't notice problem #1 either.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>>
>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>>
>>> Please review!
>>>
>>> Nils
>>>

From aph at redhat.com  Wed Nov 27 10:54:36 2019
From: aph at redhat.com (Andrew Haley)
Date: Wed, 27 Nov 2019 10:54:36 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
Message-ID: <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com>

On 11/26/19 9:25 AM, Nick Gasson wrote:
> Oddly enough the test case runtime/memory/ReadFromNoaccessArea.java now 
> hits this. I see:
> 
> CompressedKlassPointers::base() => 0xffff0b4b5000
> CompressedKlassPointers::shift() => 3

This is bad. Can you have a look at the allocation code to see why the search
for an appropriate address range fails?
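The two values Nick quotes are the parameters of compressed class pointer decoding, which computes the full Klass* address as base + (narrow << shift). A minimal sketch of that arithmetic applied to the reported values (the function name is hypothetical; HotSpot's own decoding lives in CompressedKlassPointers):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: full address = base + (narrow value << shift).
inline uint64_t decode_narrow_klass(uint32_t narrow, uint64_t base, int shift) {
  return base + (static_cast<uint64_t>(narrow) << shift);
}
```

With base 0xffff0b4b5000 and shift 3, narrow value 0 decodes to the base itself and narrow value 2 decodes to base + 16, so the base has to be added on every decode.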

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From martin.doerr at sap.com  Wed Nov 27 11:07:10 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 27 Nov 2019 11:07:10 +0000
Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not
 enough bytes
In-Reply-To: <14cb9da700b74a74b52d409aa1a018b7@DEROTE13EDGE02.wdf.sap.corp>
References: <VI1PR0201MB2479CD69D07E505D34E8C8329A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <14cb9da700b74a74b52d409aa1a018b7@DEROTE13EDGE02.wdf.sap.corp>
Message-ID: <VI1PR0201MB247974FFB22A7F128FC7A3119A440@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Christoph,

thank you for testing it.

> One small remark:
> When I encountered the problem, I was very much confused by the constant
> instruction size of 8 [1]. Maybe it would be more clear to define the
> instruction size to be 4, and instead, in "num_bytes_to_end_of_patch()",
> return "instruction_size * 2"?

"instruction_size" refers to the size of a "NativeMovRegMem". It's used this way on other platforms, too.
I think the confusion comes from the fact that the "NativeMovRegMem" processor instructions get emitted individually on arm32, not by a "NativeMovRegMem" emitter which doesn't exist.
Emitting fewer than 8 bytes is always a bug (see set_offset, which needs that space; you don't know in advance whether the second 4 bytes will be written or not).
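The padding requirement can be modeled in a few lines: the field offset of a patched access is unknown when the code is emitted, so the slot must always be large enough for the two-instruction form. This is a hypothetical model, not the arm32 assembler; the Insn labels, the helper names and the immediate range are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each Insn stands for one 4-byte instruction; the values are labels,
// not real encodings.
enum class Insn { LoadImmOffset, MaterializeOffset, LoadRegOffset, Nop };

constexpr int kInsnBytes = 4;
constexpr int kNativeMovRegMemBytes = 2 * kInsnBytes;  // always 8 bytes

// Assumed immediate range, for illustration only; the real limit depends
// on the load/store form (word, halfword, doubleword).
inline bool fits_in_immediate(int32_t offset) {
  return offset >= -4095 && offset <= 4095;
}

// Always occupy kNativeMovRegMemBytes: the short form is padded with a nop,
// so patching in the long form later does not overwrite following code.
inline void emit_field_access(std::vector<Insn>& code, int32_t offset) {
  if (fits_in_immediate(offset)) {
    code.push_back(Insn::LoadImmOffset);
    code.push_back(Insn::Nop);                // padding
  } else {
    code.push_back(Insn::MaterializeOffset);  // offset into a scratch register
    code.push_back(Insn::LoadRegOffset);      // load with register offset
  }
}

// Model of patching the offset in place (in the spirit of set_offset):
// it always rewrites exactly two instruction slots starting at pos.
inline void patch_offset(std::vector<Insn>& code, std::size_t pos, int32_t new_offset) {
  assert(pos + 2 <= code.size());
  std::vector<Insn> fresh;
  emit_field_access(fresh, new_offset);
  code[pos]     = fresh[0];
  code[pos + 1] = fresh[1];
}
```

Emitting only `LoadImmOffset` for the initial (unknown, zero) offset would leave a single slot, and a later patch to a large offset would spill into the next instruction.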

I need a 2nd review. I'll request an 11.0.6 backport afterwards.

Best regards,
Martin


> -----Original Message-----
> From: christoph.goettschkes at microdoc.com
> <christoph.goettschkes at microdoc.com>
> Sent: Mittwoch, 27. November 2019 10:09
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not
> enough bytes
> 
> Hi Martin,
> 
> thanks for looking into the issue.
> I executed the tier1 hotspot tests with your patch applied and it looks
> good. Tests which were failing before pass now. I have some unrelated
> failures, which I will start looking into now.
> 
> One small remark:
> When I encountered the problem, I was very much confused by the constant
> instruction size of 8 [1]. Maybe it would be more clear to define the
> instruction size to be 4, and instead, in "num_bytes_to_end_of_patch()",
> return "instruction_size * 2"?
> 
> Best regards,
> Christoph
> 
> [1]
> https://hg.openjdk.java.net/jdk/jdk/file/a2441ac23eeb/src/hotspot/cpu/arm/nativeInst_arm_32.hpp
> 
> "Doerr, Martin" <martin.doerr at sap.com> wrote on 2019-11-26 17:21:58:
> 
> > From: "Doerr, Martin" <martin.doerr at sap.com>
> > To: "christoph.goettschkes at microdoc.com"
> > <christoph.goettschkes at microdoc.com>, "'hotspot-compiler-
> > dev at openjdk.java.net'" <hotspot-compiler-dev at openjdk.java.net>
> > Date: 2019-11-26 17:22
> > Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access:
> > not enough bytes
> >
> > Hi Christoph,
> >
> > thanks for reporting the bug
> > https://bugs.openjdk.java.net/browse/JDK-8234645
> >
> > Seems like the large offset fix in mem2reg and reg2mem missed the first
> > patching stub in the long/double cases.
> > We should have nop padding there, too (for same reason as for the 2nd
> > patching stub).
> > NativeMovRegMem should always consist of 2 instructions on arm32 in
> > order to support larger offsets.
> >
> > Webrev:
> > http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/
> >
> > May I ask you to test this fix? We don't have arm32 in our testing
> > landscape.
> >
> > Best regards,
> > Martin
> >

From boris.ulasevich at bell-sw.com  Wed Nov 27 12:55:18 2019
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Wed, 27 Nov 2019 15:55:18 +0300
Subject: RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387
Message-ID: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com>

Hi,

Please review the fix in aarch64.ad to address the build issue "Ideal 
node missing: CmpOp" raised after a recent change in C2. The intuitive 
correction of the operand name's case (CmpOp->cmpOp) fixes the build, but 
results in a non-working JVM. Removing the match rule works well, and the 
jdk/hotspot tests are OK.

http://bugs.openjdk.java.net/browse/JDK-8234891
http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00

The ARM32 build fails too. I will fix the problem in the arm32.ad file separately.

thanks,
Boris

From christoph.goettschkes at microdoc.com  Wed Nov 27 12:55:47 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Wed, 27 Nov 2019 13:55:47 +0100
Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs
 due to Unrecognized VM option LoopUnrollLimit
Message-ID: <mailman.5.1574859451.28664.hotspot-compiler-dev@openjdk.java.net>

Hi,

please review the following small changeset which fixes the test 
test/hotspot/jtreg/compiler/loopopts/TestDivZeroCheckControl.java for 
client VMs.
I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since 
the original bug is for C2 only. If I understand the bug description well, 
the provided test case works solely because of the provided 
"LoopUnrollLimit" flag, which is only valid for C2.

Bug: https://bugs.openjdk.java.net/browse/JDK-8234906
Webrev: https://cr.openjdk.java.net/~cgo/8234906/webrev.00/

Bug which introduced the issue:
https://bugs.openjdk.java.net/browse/JDK-8229496

Thanks,
Christoph


From vladimir.x.ivanov at oracle.com  Wed Nov 27 13:23:57 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 27 Nov 2019 16:23:57 +0300
Subject: RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387
In-Reply-To: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com>
References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com>
Message-ID: <abea05bf-b100-83e5-0b33-cc44c16fe42c@oracle.com>

The fix looks good and trivial.

Best regards,
Vladimir Ivanov

On 27.11.2019 15:55, Boris Ulasevich wrote:
> Hi,
> 
> Please review the fix in aarch64.ad to address the build issue "Ideal 
> node missing: CmpOp" raised after a recent change in C2. The intuitive 
> correction of the operand name's case (CmpOp->cmpOp) fixes the build, but 
> results in a non-working JVM. Removing the match rule works well, and the 
> jdk/hotspot tests are OK.
> 
> http://bugs.openjdk.java.net/browse/JDK-8234891
> http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00
> 
> The ARM32 build fails too. I will fix the problem in the arm32.ad file separately.
> 
> thanks,
> Boris

From vladimir.x.ivanov at oracle.com  Wed Nov 27 13:54:38 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 27 Nov 2019 16:54:38 +0300
Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for
 T_ILLEGAL type
Message-ID: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8231430/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8231430

There's a memory stomp happening in max_array_length() for T_ILLEGAL 
type. T_ILLEGAL type arises as an element basic type for a merge of 2 
primitive arrays (bottom[]). max_array_length() does some input 
normalization (T_ILLEGAL => T_BYTE), but first it acquires a reference 
to a cache slot that is out of bounds (T_ILLEGAL = 99 vs T_CONFLICT 
= 19).

I was able to reproduce the problem as a corruption of one of the OOPs 
in Universe::_mirrors array which happened to be put close enough to 
max_array_length_cache in memory.

I propose to completely remove the cache. 
arrayOopDesc::max_array_length() doesn't look too expensive and the 
method is not used on a hot path anywhere.

Also, I put an assert for T_VOID, T_CONFLICT, T_NARROWKLASS cases, but 
left the logic there (=> T_BYTE) to get more testing before removing them.
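The failure mode can be reproduced in miniature: a cache with one slot per type up to T_CONFLICT cannot be indexed with T_ILLEGAL, and normalizing afterwards is too late. The sketch below is hypothetical; only T_CONFLICT = 19 and T_ILLEGAL = 99 come from the message, the other enum values and element sizes are assumptions, and compute_max_array_length is a stand-in, not arrayOopDesc::max_array_length().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for HotSpot's BasicType (values other than
// T_CONFLICT and T_ILLEGAL are assumptions).
enum BasicType : int {
  T_BYTE     = 8,
  T_INT      = 10,
  T_LONG     = 11,
  T_CONFLICT = 19,
  T_ILLEGAL  = 99
};

// Hypothetical element sizes for the few types modeled here.
inline int elem_bytes(BasicType t) {
  switch (t) {
    case T_BYTE: return 1;
    case T_INT:  return 4;
    case T_LONG: return 8;
    default:     return 0;
  }
}

// Stand-in computation, NOT HotSpot's formula: cheap and size-dependent.
inline int64_t compute_max_array_length(BasicType t) {
  assert(elem_bytes(t) != 0 && "unexpected element type");
  return INT32_MAX / elem_bytes(t);
}

// The problematic pattern: one cache slot per type up to T_CONFLICT.
// Forming &length_cache[T_ILLEGAL] addresses slot 99 of a 20-slot array
// before any normalization of T_ILLEGAL has a chance to run.
static int64_t length_cache[T_CONFLICT + 1];

inline bool cache_slot_in_bounds(BasicType t) {
  return t >= 0 &&
         static_cast<std::size_t>(t) < sizeof(length_cache) / sizeof(length_cache[0]);
}

// The proposed direction: no cache, normalize first, then compute.
inline int64_t max_array_length(BasicType etype) {
  if (etype == T_ILLEGAL) {
    etype = T_BYTE;  // bottom[] (merge of two primitive arrays) treated as byte[]
  }
  return compute_max_array_length(etype);
}
```

Dropping the cache removes the out-of-bounds slot entirely, which is cheap as long as the computation is not on a hot path.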

Testing: hs-precheckin-comp, tier1-5.

Best regards,
Vladimir Ivanov

From stuart.monteith at linaro.org  Wed Nov 27 16:06:44 2019
From: stuart.monteith at linaro.org (Stuart Monteith)
Date: Wed, 27 Nov 2019 16:06:44 +0000
Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure
 after JDK-8234387
In-Reply-To: <abea05bf-b100-83e5-0b33-cc44c16fe42c@oracle.com>
References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com>
 <abea05bf-b100-83e5-0b33-cc44c16fe42c@oracle.com>
Message-ID: <CAEGA6kbYiHBkrQb5vTcjPmP1J4n146nGQWoonm=8a_Tg1_6tSg@mail.gmail.com>

Thanks Boris - looks good to me.
Please ask me or my fellow Arm engineers if you should need any help
testing in future.

On Wed, 27 Nov 2019 at 13:26, Vladimir Ivanov
<vladimir.x.ivanov at oracle.com> wrote:
>
> The fix looks good and trivial.
>
> Best regards,
> Vladimir Ivanov
>
> On 27.11.2019 15:55, Boris Ulasevich wrote:
> > Hi,
> >
> > Please review the fix in aarch64.ad to address the build issue "Ideal
> > node missing: CmpOp" raised after a recent change in C2. The intuitive
> > correction of the operand name's case (CmpOp->cmpOp) fixes the build, but
> > results in a non-working JVM. Removing the match rule works well, and the
> > jdk/hotspot tests are OK.
> >
> > http://bugs.openjdk.java.net/browse/JDK-8234891
> > http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00
> >
> > The ARM32 build fails too. I will fix the problem in the arm32.ad file separately.
> >
> > thanks,
> > Boris

From vladimir.kozlov at oracle.com  Wed Nov 27 19:20:04 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 27 Nov 2019 11:20:04 -0800
Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for
 T_ILLEGAL type
In-Reply-To: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com>
References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com>
Message-ID: <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com>

There is assert in arrayOopDesc::max_array_length() which checks '< T_CONFLICT'.
Next assert 'type2aelembytes(type) != 0' will be triggered for T_VOID.
The assert in type2aelembytes() will be triggered for T_ADDRESS since allow_address argument is false by default.

That leaves T_METADATA and T_NARROWKLASS to check for, since they can't be array elements. Right?

Maybe we should have a permanent guarantee() in TypeAryPtr::max_array_length() for all types we don't expect to see, 
rather than a temporary assert().
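A single, permanent check for every element type that should never reach TypeAryPtr::max_array_length() could look roughly like the sketch below. It is hypothetical: the enum is a simplified stand-in (only T_CONFLICT = 19 and T_ILLEGAL = 99 come from the thread), and where HotSpot code would presumably fail hard via guarantee(), the sketch returns std::nullopt so it stays testable. It needs C++17.

```cpp
#include <cassert>
#include <optional>

// Simplified stand-in for BasicType; the unlisted values are placeholders.
enum BasicType : int {
  T_BYTE, T_INT, T_OBJECT,
  T_VOID, T_ADDRESS, T_METADATA, T_NARROWKLASS,
  T_CONFLICT = 19,
  T_ILLEGAL  = 99
};

// Map the element type to the one used for the length computation, or
// report "unexpected" for types that can never be array elements.
inline std::optional<BasicType> checked_elem_type(BasicType t) {
  switch (t) {
    case T_ILLEGAL:        // bottom[]: merge of two primitive arrays
      return T_BYTE;
    case T_VOID:
    case T_ADDRESS:
    case T_METADATA:
    case T_NARROWKLASS:
    case T_CONFLICT:
      return std::nullopt; // guarantee() territory in real code
    default:
      return t;
  }
}
```

Keeping all rejected types in one switch makes the policy visible in a single place instead of relying on asserts spread across helper functions.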

Thanks,
Vladimir

On 11/27/19 5:54 AM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8231430/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8231430
> 
> There's a memory stomp happening in max_array_length() for T_ILLEGAL type. T_ILLEGAL type arises as an element basic 
> type for a merge of 2 primitive arrays (bottom[]). max_array_length() does some input normalization (T_ILLEGAL => 
> T_BYTE), but first it acquires a reference to a cache slot which is out-of-bounds (T_ILLEGAL = 99 vs T_CONFLICT = 19).
> 
> I was able to reproduce the problem as a corruption of one of the OOPs in Universe::_mirrors array which happened to be 
> put close enough to max_array_length_cache in memory.
> 
> I propose to completely remove the cache. arrayOopDesc::max_array_length() doesn't look too expensive and the method is 
> not used on a hot path anywhere.
> 
> Also, I put an assert for T_VOID, T_CONFLICT, T_NARROWKLASS cases, but left the logic there (=> T_BYTE) to get more 
> testing before removing them.
> 
> Testing: hs-precheckin-comp, tier1-5.
> 
> Best regards,
> Vladimir Ivanov

From vladimir.kozlov at oracle.com  Wed Nov 27 19:54:02 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 27 Nov 2019 11:54:02 -0800
Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client
 VMs due to Unrecognized VM option LoopUnrollLimit
In-Reply-To: <20191127125735.B9BE111F377@aojmv0009>
References: <20191127125735.B9BE111F377@aojmv0009>
Message-ID: <bd187ed5-6dc3-0e43-f13e-06e9c48be13b@oracle.com>

Hi Christoph

I was about to suggest the IgnoreUnrecognizedVMOptions flag, but then remembered the discussion about the 8231954 fix.
But I think the test should be run with Graal: it does have OSR compilation and we need to test it.

We can do it by splitting the test runs (duplicating the @test block with different run flags) to have 2 tests with different 
flags and conditions. See [1].

For the existing @run block we use `@requires vm.compiler2.enabled`, and for the new one, without LoopUnrollLimit, `vm.graal.enabled`.

Thanks,
Vladimir

[1] test/hotspot/jtreg/runtime/exceptionMsgs/ArrayIndexOutOfBoundsException/ArrayIndexOutOfBoundsExceptionTest.java

On 11/27/19 4:55 AM, christoph.goettschkes at microdoc.com wrote:
> Hi,
> 
> please review the following small changeset which fixes the test
> test/hotspot/jtreg/compiler/loopopts/TestDivZeroCheckControl.java for
> client VMs.
> I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since
> the original bug is for C2 only. If I understand the bug description well,
> the provided test case works solely because of the provided
> "LoopUnrollLimit" flag, which is only valid for C2.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234906
> Webrev: https://cr.openjdk.java.net/~cgo/8234906/webrev.00/
> 
> Bug which introduced the issue:
> https://bugs.openjdk.java.net/browse/JDK-8229496
> 
> Thanks,
> Christoph
> 

From Charlie.Gracie at microsoft.com  Wed Nov 27 19:59:11 2019
From: Charlie.Gracie at microsoft.com (Charlie Gracie)
Date: Wed, 27 Nov 2019 19:59:11 +0000
Subject: scalar replacement of arrays affected by minor changes to
 surrounding code
In-Reply-To: <87a7azid39.fsf@redhat.com>
References: <CAKy2qXcocPHKcArSKu1XgPAtmmVM5PRdT0GUMgCK6qmk=_dmSg@mail.gmail.com>
 <87d0fvidxs.fsf@redhat.com> <87a7azid39.fsf@redhat.com>
Message-ID: <21F5DF77-4DA9-437C-9DEB-CFABFC4C9C19@microsoft.com>

Hi,

Recently, I was analyzing the performance of a workload and noticed a significant
amount of allocation from a single array allocation site. This allocation does not survive
the simple method which allocates it. A simplified version of the code is very similar to
the code being described by Govind. 

From this conversation I noticed JDK-8231291 and the patch provided by Roland.
I applied the patch to tip and after a few minor additions it indeed provides measurable
improvements to the workload I was measuring and passes all of my testing.

Roland, is there anything I can do to help get this type of change into tip and any
appropriate backports? I have tested some workloads with this patch and I think
I covered the required checks, but I could easily have missed something.

Here is a modified version of Roland's original patch:
diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
index 08552e0756..98bf37f6e0 100644
--- a/src/hotspot/share/opto/compile.cpp
+++ b/src/hotspot/share/opto/compile.cpp
@@ -2292,7 +2292,7 @@ void Compile::Optimize() {
     if (has_loops()) {
       // Cleanup graph (remove dead nodes).
       TracePhase tp("idealLoop", &timers[_t_idealLoop]);
-      PhaseIdealLoop::optimize(igvn, LoopOptsNone);
+      PhaseIdealLoop::optimize(igvn, LoopOptsMaxUnroll);
       if (major_progress()) print_method(PHASE_PHASEIDEAL_BEFORE_EA, 2);
       if (failing())  return;
     }
diff --git a/src/hotspot/share/opto/compile.hpp b/src/hotspot/share/opto/compile.hpp
index 50906800ba..710aec8e3a 100644
--- a/src/hotspot/share/opto/compile.hpp
+++ b/src/hotspot/share/opto/compile.hpp
@@ -93,6 +93,7 @@ struct Final_Reshape_Counts;
 enum LoopOptsMode {
   LoopOptsDefault,
   LoopOptsNone,
+  LoopOptsMaxUnroll,
   LoopOptsShenandoahExpand,
   LoopOptsShenandoahPostExpand,
   LoopOptsSkipSplitIf,
diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp
index 2038a4c8f9..3c13a9aa10 100644
--- a/src/hotspot/share/opto/loopnode.cpp
+++ b/src/hotspot/share/opto/loopnode.cpp
@@ -2796,6 +2796,7 @@ bool PhaseIdealLoop::process_expensive_nodes() {
 void PhaseIdealLoop::build_and_optimize(LoopOptsMode mode) {
   bool do_split_ifs = (mode == LoopOptsDefault);
   bool skip_loop_opts = (mode == LoopOptsNone);
+  bool do_max_unroll = (mode == LoopOptsMaxUnroll);
 
   int old_progress = C->major_progress();
   uint orig_worklist_size = _igvn._worklist.size();
@@ -2859,7 +2860,7 @@ void PhaseIdealLoop::build_and_optimize(LoopOptsMode mode) {
 
   BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
   // Nothing to do, so get out
-  bool stop_early = !C->has_loops() && !skip_loop_opts && !do_split_ifs && !_verify_me && !_verify_only &&
+  bool stop_early = !C->has_loops() && !skip_loop_opts && !do_max_unroll && !do_split_ifs && !_verify_me && !_verify_only &&
     !bs->is_gc_specific_loop_opts_pass(mode);
   bool do_expensive_nodes = C->should_optimize_expensive_nodes(_igvn);
   bool strip_mined_loops_expanded = bs->strip_mined_loops_expanded(mode);
@@ -3009,6 +3010,44 @@ void PhaseIdealLoop::build_and_optimize(LoopOptsMode mode) {
     return;
   }
 
+  if (do_max_unroll) {
+    for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) {
+      IdealLoopTree* lpt = iter.current();
+      if (lpt->is_innermost() && lpt->_allow_optimizations && !lpt->_has_call && lpt->is_counted()) {
+        lpt->compute_trip_count(this);
+
+        if (lpt->do_one_iteration_loop(this)) {
+          continue;
+        }
+
+        if (lpt->do_remove_empty_loop(this)) {
+          continue;
+        }
+        AutoNodeBudget node_budget(this);
+        CountedLoopNode *cl = lpt->_head->as_CountedLoop();
+        // Do not do anything for invalid, pre or post loops
+        if (cl->is_valid_counted_loop() && !cl->is_pre_loop() && !cl->is_post_loop()) {
+          // Compute loop trip count from profile data
+          lpt->compute_profile_trip_cnt(this);
+          if (cl->is_normal_loop()) {
+            if (lpt->policy_maximally_unroll(this)) {
+              memset(worklist.adr(), 0, worklist.Size()*sizeof(Node*));
+              do_maximally_unroll(lpt, worklist);
+            }
+          }
+        }
+      }
+    }
+
+    C->restore_major_progress(old_progress);
+    _igvn.optimize();
+
+    if (C->log() != NULL) {
+      log_loop_tree(_ltree_root, _ltree_root, C->log());
+    }
+    return;
+  }
+
   if (bs->optimize_loops(this, mode, visited, nstack, worklist)) {
     _igvn.optimize();
     if (C->log() != NULL) {

Thanks,
Charlie Gracie

On 2019-09-20, 4:38 AM, "hotspot-compiler-dev on behalf of Roland Westrelin" <hotspot-compiler-dev-bounces at openjdk.java.net on behalf of rwestrel at redhat.com> wrote:

    
    > The problem with that one is that EA would need the loop to be fully
    > unrolled to eliminate the allocation but that only happens after EA. So
    > it's a pass ordering problem. We already run a pass of loop
    > optimizations before EA so it seems we could have it take care of fully
    > unrolling the loop.
    
    I created JDK-8231291 for that one.
    
    Roland.
    

From john.r.rose at oracle.com  Thu Nov 28 03:05:11 2019
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 27 Nov 2019 19:05:11 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>
 <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>
 <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com>
Message-ID: <FB5656BF-068D-486F-BF4C-244A106BFCA5@oracle.com>

I too would expect these things to be placed in globalDefinitions.hpp
rather than align.hpp, for the reasons already given:  There are already
similar functions in there.

I'm not against cleaning up globalDefinitions.hpp, but until there's
a better proposal on the table, let's stay with it for functions like this.

-- John

On Nov 26, 2019, at 2:23 AM, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 26/11/2019 8:06 pm, Claes Redestad wrote:
>> On 2019-11-26 10:50, David Holmes wrote:
>>> Hi Claes,
>>> 
>>> Just some high-level comments
>>> 
>>> - should next_power_of_two be defined in globalDefinitions.hpp along side the related functionality ie is_power_of_two ?
>> I thought we are trying to move things _out_ of globalDefinitions. I
> 
> We are? I don't recall hearing that. But wherever these go seems they all belong together.
> 
>> agree align.hpp might not be the best place, either, though..
> 
> I thought align.hpp was a strange place too. :)
> 
>>> 
>>> - can next_power_of_two build on the existing log2_* functions (or vice versa)?
>> Yes, log2_intptr et al could probably be tamed to do a single step
>> operation, although we'd need to add 64-bit implementations in
>> count_leading_zeros. At least these log2_* functions already deal with
>> overflows without looping forever.
>>> 
>>> - do the existing ZUtils not cover the same general area?
>>> 
>>> ./share/gc/z/zUtils.inline.hpp
>>> 
>>> inline size_t ZUtils::round_up_power_of_2(size_t value) {
>>>    assert(value != 0, "Invalid value");
>>> 
>>>    if (is_power_of_2(value)) {
>>>      return value;
>>>    }
>>> 
>>>    return (size_t)1 << (log2_intptr(value) + 1);
>>> }
>>> 
>>> inline size_t ZUtils::round_down_power_of_2(size_t value) {
>>>    assert(value != 0, "Invalid value");
>>>    return (size_t)1 << log2_intptr(value);
>>> }
>> round_up_power_of_2 is similar, but not identical (next_power_of_two doesn't care if the value is already a power of 2, nor should it).
> 
> Okay but seems perhaps these should also be moved out of ZUtils and co-located with the other "power of two" functions.
> 
> Cheers,
> David
> -----
> 
>> /Claes
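David's second question is whether next_power_of_two can build on the log2_* functions. A minimal sketch, assuming a portable floor_log2 helper (hypothetical; it stands in for HotSpot's log2_* and count_leading_zeros functions, whose exact signatures are not shown in this thread), shows that both the ZUtils-style round-down and the strict "next power of two" reduce to one step around the highest set bit. A real implementation would use a count-leading-zeros instruction rather than a shift loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a HotSpot log2_* function: floor(log2(value)).
// The loop always terminates because value shrinks on every shift.
inline int floor_log2(uint64_t value) {
  assert(value != 0 && "log2 of zero is undefined");
  int log = 0;
  while (value >>= 1) {
    log++;
  }
  return log;
}

// Largest power of two <= value (same contract as ZUtils::round_down_power_of_2).
inline uint64_t round_down_power_of_2(uint64_t value) {
  return uint64_t(1) << floor_log2(value);
}

// Smallest power of two strictly > value (the next_power_of_two semantics),
// i.e. one position above the highest set bit.
inline uint64_t next_power_of_2(uint64_t value) {
  if (value == 0) {
    return 1;
  }
  int log = floor_log2(value);
  assert(log < 63 && "result would not fit in 64 bits");
  return uint64_t(1) << (log + 1);
}
```

Here next_power_of_2(16) is 32, whereas ZUtils::round_up_power_of_2(16) returns 16; that is exactly the semantic difference Claes points out.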


From thomas.stuefe at gmail.com  Thu Nov 28 07:34:49 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 28 Nov 2019 08:34:49 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
Message-ID: <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>

Hi Claes,

I think this is useful. Why not a 64bit variant too? If you do not want to
go through the hassle of providing a count_leading_zeros(uint64_t), you
could call the 32bit variant twice and take care of endianness for the
caller.
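One way to read the 64-bit suggestion, sketched in C++: build a 64-bit count of leading zeros from two 32-bit calls by examining the high half first and falling back to the low half. The names clz32, clz64 and next_power_of_two64 are hypothetical, and clz32 is a portable stand-in for the compiler intrinsic that HotSpot's count_leading_zeros would use.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable 32-bit primitive; returns 32 for an input of zero.
inline uint32_t clz32(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t bit = uint32_t(1) << 31; bit != 0 && (x & bit) == 0; bit >>= 1) {
    n++;
  }
  return n;
}

// 64-bit leading-zero count from two 32-bit calls: if the high half is
// non-zero it alone decides the answer, otherwise add 32 to the low half's count.
inline uint32_t clz64(uint64_t x) {
  uint32_t high = uint32_t(x >> 32);
  if (high != 0) {
    return clz32(high);
  }
  return 32 + clz32(uint32_t(x));
}

// Strict next power of two (always > value). Precondition: value < 2^63.
// value == 0 yields 1 because clz64(0) == 64.
inline uint64_t next_power_of_two64(uint64_t value) {
  assert(value < (uint64_t(1) << 63) && "result would overflow");
  return uint64_t(1) << (64 - clz64(value));
}
```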

--

In inline int32_t next_power_of_two(int32_t value), should we weed out
negative input values right away instead of asserting at the end of the
function?

--

The functions will always return the next power of two, even if the input
is a power of two - e.g. "2" for "1". Is that intended? It would be nice to
have an API comment in the header describing these corner cases (what
happens for negative input, what happens if input is power 2).
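A sketch of the kind of API comment and up-front input check asked for here, assuming the strict semantics (the result is always greater than the input) and a signed 32-bit argument. The name mirrors the webrev, but the body is illustrative only: it uses bit smearing rather than count_leading_zeros, and the < 2^30 bound is this sketch's own choice so that the result fits in int32_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical documented contract (a sketch, not the webrev's code):
//   Returns the smallest power of two strictly greater than value.
//   - value must be >= 0; negative input is rejected on entry.
//   - value must be < 2^30 so that the result (at most 2^30) fits in int32_t.
//   - a power-of-two input is NOT returned unchanged: 1 -> 2, 16 -> 32.
inline int32_t next_power_of_two(int32_t value) {
  assert(value >= 0 && "negative input");
  assert(value < (INT32_C(1) << 30) && "result would overflow int32_t");
  uint32_t v = static_cast<uint32_t>(value);
  // Copy the highest set bit into every lower bit, e.g. 0b10100 -> 0b11111.
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return static_cast<int32_t>(v + 1);  // one above the all-ones pattern
}
```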

--

The patch can cause subtle differences in some caller code, I think, if
input value is a power of 2 already. See e.g:

http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html

-  i=16;
-  while( i < size ) i <<= 1;
+  i = MAX2(16, (int)next_power_of_two(size));

If i == size == 16, old code would keep i==16, new code would come to
i==32, I think.

http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html

 //------------------------------round_up---------------------------------------
 // Round up to nearest power of 2
-uint NodeHash::round_up( uint x ) {
-  x += (x>>2);                  // Add 25% slop
-  if( x <16 ) return 16;        // Small stuff
-  uint i=16;
-  while( i < x ) i <<= 1;       // Double to fit
-  return i;                     // Return hash table size
+uint NodeHash::round_up(uint x) {
+  x += (x >> 2);                  // Add 25% slop
+  return MAX2(16U, next_power_of_two(x));
 }

same here. If x == 16 (after adding the slop), before we'd return 16, now 32.
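To make the behavioral change concrete, the old and new NodeHash::round_up from the diff above can be compared side by side. This is a standalone sketch: std::max replaces HotSpot's MAX2, uint32_t replaces uint, and next_power_of_two is a local strict-semantics stand-in for the webrev's function. The dict.cpp loop `while (i < size) i <<= 1;` has the same shape, so size == 16 changes from 16 to 32 in the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Strict next power of two (> x). Assumes x < 2^31 so the result fits.
inline uint32_t next_power_of_two(uint32_t x) {
  x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16;
  return x + 1;
}

// Old NodeHash::round_up, copied from the diff (reformatted).
inline uint32_t round_up_old(uint32_t x) {
  x += (x >> 2);                 // add 25% slop
  if (x < 16) return 16;         // small stuff
  uint32_t i = 16;
  while (i < x) i <<= 1;         // double to fit; loops forever if x > 2^31
  return i;
}

// New version from the webrev.
inline uint32_t round_up_new(uint32_t x) {
  x += (x >> 2);                 // add 25% slop
  return std::max<uint32_t>(16, next_power_of_two(x));
}
```

For x == 13 the slop makes it exactly 16, so the old code returns 16 and the new code 32; for inputs that do not land on a power of two, such as 100 (125 after slop), both return 128.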

---

http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html

I admit I do not understand the current coding :) I do not believe it works
for all input values: e.g. if get_java_thread_list()->length() were 1025,
we'd get 1861, if I am not mistaken. Your code is definitely clearer but
not equivalent to the old one.

---

In the end, I wonder whether we should have two kinds of APIs, or a
parameter, distinguishing between "next power of 2" and "next power of 2
unless input value is already power of 2".
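The two-API idea could look like the following sketch (both names are hypothetical). The inclusive variant is derived from the strict one by applying it to value - 1, so a power-of-two input maps to itself while every other input behaves as before.

```cpp
#include <cassert>
#include <cstdint>

// "Next power of 2": smallest power of two strictly greater than value.
// Precondition: value < 2^31.
inline uint32_t next_power_of_2(uint32_t value) {
  assert(value < (UINT32_C(1) << 31) && "result would overflow");
  value |= value >> 1;  // smear the highest set bit into all lower bits
  value |= value >> 2;
  value |= value >> 4;
  value |= value >> 8;
  value |= value >> 16;
  return value + 1;
}

// "Next power of 2 unless value is already a power of 2": smallest power
// of two greater than or equal to value. Precondition: 0 < value <= 2^31.
inline uint32_t round_up_power_of_2(uint32_t value) {
  assert(value != 0 && "zero is not a valid input in this sketch");
  assert(value <= (UINT32_C(1) << 31) && "result would overflow");
  return next_power_of_2(value - 1);
}
```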

Cheers, Thomas


On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad <claes.redestad at oracle.com>
wrote:

> Hi,
>
> in various places in the hotspot we have custom code to calculate the
> next power of two, some of which have potential to go into an infinite
> loop in case of an overflow.
>
> This patch proposes adding next_power_of_two utility methods which
> avoid infinite loops on overflow, while providing slightly more
> efficient code in most cases.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes
>

From thomas.stuefe at gmail.com  Thu Nov 28 07:44:25 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 28 Nov 2019 08:44:25 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
Message-ID: <CAA-vtUzPCHtQz3BVnLFSrW1+jUSbtwiJU9EJk5FKU2Q854O_qA@mail.gmail.com>

p.s. I think it would be good to have some gtests for these functions,
especially to test corner cases.
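A table-driven sketch of the corner cases such tests might cover. In HotSpot these would be gtests (TEST with EXPECT_EQ); to keep the example self-contained it uses a plain array and returns a failure count. The function under test is a local stand-in with the strict semantics discussed in this thread, not the webrev's code.

```cpp
#include <cassert>
#include <cstdint>

// Function under test (strict semantics); inputs must be < 2^31.
inline uint32_t next_power_of_two(uint32_t x) {
  x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16;
  return x + 1;
}

struct Case { uint32_t input; uint32_t expected; };

// Zero, one, exact powers of two (which must move up), values just below
// and above a power of two, and the largest inputs whose result still fits.
static const Case kCases[] = {
  {0, 1}, {1, 2}, {2, 4}, {3, 4},
  {15, 16}, {16, 32}, {17, 32},
  {(1u << 30) - 1, 1u << 30}, {1u << 30, 1u << 31}, {(1u << 31) - 1, 1u << 31},
};

// Returns the number of failing cases (0 means all passed).
inline int count_failures() {
  int failures = 0;
  for (const Case& c : kCases) {
    if (next_power_of_two(c.input) != c.expected) {
      failures++;
    }
  }
  return failures;
}
```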

Cheers, Thomas

On Thu, Nov 28, 2019 at 8:34 AM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Hi Claes,
>
> I think this is useful. Why not a 64bit variant too? If you do not want to
> go through the hassle of providing a count_leading_zeros(uint64_t), you
> could call the 32bit variant twice and take care of endianness for the
> caller.
>
> --
>
> In inline int32_t next_power_of_two(int32_t value), should we weed out
> negative input values right away instead of asserting at the end of the
> function?
>
> --
>
> The functions will always return the next power of two, even if the input
> is a power of two - e.g. "2" for "1". Is that intended? It would be nice to
> have an API comment in the header describing these corner cases (what
> happens for negative input, what happens if input is power 2).
>
> --
>
> The patch can cause subtle differences in some caller code, I think, if
> input value is a power of 2 already. See e.g:
>
>
> http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html
>
> -  i=16;
> -  while( i < size ) i <<= 1;
> +  i = MAX2(16, (int)next_power_of_two(size));
>
> If i == size == 16, old code would keep i==16, new code would come to
> i==32, I think.
>
>
> http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html
>
>
>  //------------------------------round_up---------------------------------------
>  // Round up to nearest power of 2
> -uint NodeHash::round_up( uint x ) {
> -  x += (x>>2);                  // Add 25% slop
> -  if( x <16 ) return 16;        // Small stuff
> -  uint i=16;
> -  while( i < x ) i <<= 1;       // Double to fit
> -  return i;                     // Return hash table size
> +uint NodeHash::round_up(uint x) {
> +  x += (x >> 2);                  // Add 25% slop
> +  return MAX2(16U, next_power_of_two(x));
>  }
>
> same here. If x == 16 (after adding the slop), before we'd return 16, now 32.
>
> ---
>
>
> http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html
>
> I admit I do not understand the current coding :) I do not believe it
> works for all input values: e.g. if
> get_java_thread_list()->length() were 1025, we'd get 1861, if I am not
> mistaken. Your code is definitely clearer but not equivalent to the old one.
>
> ---
>
> In the end, I wonder whether we should have two kinds of APIs, or a
> parameter, distinguishing between "next power of 2" and "next power of 2
> unless input value is already power of 2".
>
> Cheers, Thomas
>
>
>
>
>
> On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad <claes.redestad at oracle.com>
> wrote:
>
>> Hi,
>>
>> in various places in the hotspot we have custom code to calculate the
>> next power of two, some of which have potential to go into an infinite
>> loop in case of an overflow.
>>
>> This patch proposes adding next_power_of_two utility methods which
>> avoid infinite loops on overflow, while providing slightly more
>> efficient code in most cases.
>>
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
>> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
>>
>> Testing: tier1-3
>>
>> Thanks!
>>
>> /Claes
>>
>

From nick.gasson at arm.com  Thu Nov 28 07:50:32 2019
From: nick.gasson at arm.com (Nick Gasson)
Date: Thu, 28 Nov 2019 15:50:32 +0800
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com>
Message-ID: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com>

Hi Andrew,

>>
>> CompressedKlassPointers::base() => 0xffff0b4b5000
>> CompressedKlassPointers::shift() => 3
> 
> This is bad. Can you have a look at the allocation code to see why the search
> for an appropriate address range fails?
> 

We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs 
that searches for a 4G aligned location for the compressed class space 
on AArch64, but this search is not done if CDS is in use and the archive 
was loaded successfully, because in that case the class space has 
already been mapped (i.e. `metaspace_rs.is_reserved()' is true).

Previously it was only possible to map the CDS archive at 0x800000000. 
The compressed class base is set to the start of this region which 
happens to be 4G aligned so our MacroAssembler::load_klass optimisation 
applies and we emit the short code sequence.

With the recent change in 8231610, if the CDS archive cannot be mapped 
at that address (e.g. because of ASLR or because the heap is mapped 
there) then the CDS archive will be relocated to an arbitrary address 
decided by mmap. That's where the oddly-aligned compressed klass base 
above comes from. This causes MacroAssembler::load_klass to emit the 
inefficient sequence which then overflows the buffer for the itable stub 
(the worst-case size estimate there is wrong, which needs to be fixed 
separately).
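The alignment property relied on here is simply that the low 32 bits of the base are zero. A tiny check (hypothetical helper) confirms that the default mapping address 0x800000000 qualifies while the mmap-chosen bases quoted in this thread do not. This models only the 4G-alignment condition mentioned in the email; the actual choice of code sequence in MacroAssembler::load_klass depends on more than this one test.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kFourG = uint64_t(1) << 32;

// A base is 4G aligned when its low 32 bits are all zero.
inline bool is_4g_aligned(uint64_t base) {
  return (base & (kFourG - 1)) == 0;
}
```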

A minimal way to reproduce this is:

$ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version
...
[0.050s][info ][cds] CDS archive was created with max heap size = 128M, and the following configuration:
[0.050s][info ][cds]     narrow_klass_base = 0x0000fffec7507000, narrow_klass_shift = 3
...
#  guarantee(masm->pc() <= s->code_end()) failed: itable #2: overflowed buffer, estimated len: 180, actual len: 184, overrun: 4


I suggest we move the 4G-aligned search from 
allocate_metaspace_compressed_klass_ptrs into its own function that can 
then be called from MetaspaceShared::reserve_shared_space when 
requested_address==NULL (i.e. the fallback path when mmap at 0x800000000 
fails). If you're happy with this I'll make a patch for review?
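A rough sketch of what a factored-out 4G-aligned search might look like, under stated assumptions: try_reserve is a hypothetical callback standing in for the real reservation call, and the hint, step and attempt limit are illustrative, not the values used by Metaspace::allocate_metaspace_compressed_klass_ptrs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

constexpr uint64_t kFourG = uint64_t(1) << 32;

// Round addr up to the next 4G boundary (addr itself if already aligned).
// Assumes addr is not within 4G of UINT64_MAX, so the addition cannot wrap.
inline uint64_t align_up_4g(uint64_t addr) {
  return (addr + (kFourG - 1)) & ~(kFourG - 1);
}

// Hypothetical search: starting from 'hint', try successive 4G-aligned
// addresses until 'try_reserve' succeeds or 'max_attempts' is exhausted.
// Returns 0 on failure; the caller would then fall back to an arbitrary,
// unaligned address. Hints near the top of the address space are not handled.
inline uint64_t reserve_4g_aligned(uint64_t hint, int max_attempts,
                                   const std::function<bool(uint64_t)>& try_reserve) {
  uint64_t candidate = align_up_4g(hint);
  for (int i = 0; i < max_attempts; i++, candidate += kFourG) {
    if (candidate != 0 && try_reserve(candidate)) {
      return candidate;
    }
  }
  return 0;
}
```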


Thanks,
Nick

From ioi.lam at oracle.com  Thu Nov 28 08:19:36 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 28 Nov 2019 00:19:36 -0800
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com>
 <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com>
Message-ID: <f7aa728b-a540-67b9-39d3-8a39a3318a09@oracle.com>


On 11/27/19 11:50 PM, Nick Gasson wrote:
> Hi Andrew,
>
>>>
>>> CompressedKlassPointers::base() => 0xffff0b4b5000
>>> CompressedKlassPointers::shift() => 3
>>
>> This is bad. Can you have a look at the allocation code to see why 
>> the search
>> for an appropriate address range fails?
>>
>
> We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs 
> that searches for a 4G aligned location for the compressed class space 
> on AArch64, but this search is not done if CDS is in use and the 
> archive was loaded successfully, because in that case the class space 
> has already been mapped (i.e. `metaspace_rs.is_reserved()' is true).
>
> Previously it was only possible to map the CDS archive at 0x800000000. 
> The compressed class base is set to the start of this region which 
> happens to be 4G aligned so our MacroAssembler::load_klass 
> optimisation applies and we emit the short code sequence.
>
> With the recent change in 8231610, if the CDS archive cannot be mapped 
> at that address (e.g. because of ASLR or because the heap is mapped 
> there) then the CDS archive will be relocated to an arbitrary address 
> decided by mmap. That's where the oddly-aligned compressed klass base 
> above comes from. This causes MacroAssembler::load_klass to emit the 
> inefficient sequence which then overflows the buffer for the itable 
> stub (the worst-case size estimate there is wrong, which needs to be 
> fixed separately).
>
> A minimal way to reproduce this is:
>
> $ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version
> ...
> [0.050s][info ][cds] CDS archive was created with max heap size = 128M, and the following configuration:
> [0.050s][info ][cds]     narrow_klass_base = 0x0000fffec7507000, narrow_klass_shift = 3
> ...
> #  guarantee(masm->pc() <= s->code_end()) failed: itable #2: overflowed buffer, estimated len: 180, actual len: 184, overrun: 4
>
>
> I suggest we move the 4G-aligned search from 
> allocate_metaspace_compressed_klass_ptrs into its own function that 
> can then be called from MetaspaceShared::reserve_shared_space when 
> requested_address==NULL (i.e. the fallback path when mmap at 
> 0x800000000 fails). If you're happy with this I'll make a patch for 
> review?
>

You can also force CDS archive relocation with 
-XX:+UnlockDiagnosticVMOptions -XX:ArchiveRelocationMode=1. That way you 
can test the behavior with the default heap settings.

Thanks
- Ioi


>
> Thanks,
> Nick


From boris.ulasevich at bell-sw.com  Thu Nov 28 08:42:37 2019
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Thu, 28 Nov 2019 11:42:37 +0300
Subject: RFR(S) 8234893: ARM32: build failure after JDK-8234387
Message-ID: <ba56df0b-af44-9c1a-acfa-a87e0fbe2ba3@bell-sw.com>

Hi,

Please review the fix in arm.ad to address the ARM32 build issue "Ideal 
node missing". The fix is trivial: it adds the missing declarations 
R8RegP, R9RegP, R12RegP and SPRegP. The jdk/hotspot jtreg tests pass.

http://bugs.openjdk.java.net/browse/JDK-8234893
http://cr.openjdk.java.net/~bulasevich/8234893/webrev.00

thanks,
Boris

From boris.ulasevich at bell-sw.com  Thu Nov 28 08:42:43 2019
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Thu, 28 Nov 2019 11:42:43 +0300
Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure
 after JDK-8234387
In-Reply-To: <CAEGA6kbYiHBkrQb5vTcjPmP1J4n146nGQWoonm=8a_Tg1_6tSg@mail.gmail.com>
References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com>
 <abea05bf-b100-83e5-0b33-cc44c16fe42c@oracle.com>
 <CAEGA6kbYiHBkrQb5vTcjPmP1J4n146nGQWoonm=8a_Tg1_6tSg@mail.gmail.com>
Message-ID: <6e5d8aec-538e-20c7-a035-b04ff7e8691f@bell-sw.com>

Thank you!

On 27.11.2019 19:06, Stuart Monteith wrote:
> Thanks Boris - looks good to me.
> Please ask me or my fellow Arm engineers if you should need any help
> testing in future.
> 
> On Wed, 27 Nov 2019 at 13:26, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com> wrote:
>>
>> The fix looks good and trivial.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 27.11.2019 15:55, Boris Ulasevich wrote:
>>> Hi,
>>>
>>> Please review the fix in aarch64.ad to address the build issue "Ideal
>>> node missing: CmpOp" raised after a recent change in C2. The intuitive
>>> fix, correcting the operand name's case (CmpOp->cmpOp), makes the build
>>> succeed but leads to a non-working JVM. Removing the match rule works
>>> well, and the jdk/hotspot tests pass.
>>>
>>> http://bugs.openjdk.java.net/browse/JDK-8234891
>>> http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00
>>>
>>> ARM32 build fails too. I will fix the problem in arm32.ad file separately.
>>>
>>> thanks,
>>> Boris

From vladimir.x.ivanov at oracle.com  Thu Nov 28 08:55:49 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 28 Nov 2019 11:55:49 +0300
Subject: RFR(S) 8234893: ARM32: build failure after JDK-8234387
In-Reply-To: <ba56df0b-af44-9c1a-acfa-a87e0fbe2ba3@bell-sw.com>
References: <ba56df0b-af44-9c1a-acfa-a87e0fbe2ba3@bell-sw.com>
Message-ID: <cac2fdca-773b-b43c-fbad-a42a0debc7b7@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 28.11.2019 11:42, Boris Ulasevich wrote:
> Hi,
> 
> Please review the fix in arm.ad to address the ARM32 build issue "Ideal 
> node missing". The fix is trivial: it adds the missing declarations 
> R8RegP, R9RegP, R12RegP and SPRegP. The jdk/hotspot jtreg tests pass.
> 
> http://bugs.openjdk.java.net/browse/JDK-8234893
> http://cr.openjdk.java.net/~bulasevich/8234893/webrev.00
> 
> thanks,
> Boris

From christian.hagedorn at oracle.com  Thu Nov 28 09:24:06 2019
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Thu, 28 Nov 2019 10:24:06 +0100
Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint
Message-ID: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com>

Hi

Please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8233033
http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/

The C2 compiled code produces a wrong result for 'iFld' in the test 
case. It is -8 instead of -7. The loop in the test case is partially 
peeled and then unswitched. The wrong result is produced because a wrong 
state is transferred to the interpreter when an uncommon trap is hit in 
the C2 compiled code in the fast version of the unswitched loop.

The problem is that when unswitching the loop, we clone the original loop 
predicates for the slow and fast version of the loop [1] but we do not 
account for partially peeled statements that are control dependent on 
the loop predicates (i.e. need to be executed after the predicates). As 
a result, these are executed before the cloned loop predicates.

The situation of the test case method PartialPeelingUnswitch::test() is 
visualized in [2]. IfTrue 118, the entry control of the original loop 
which follows right after the loop predicates, has an output edge to the 
StoreI 353 node. This node belongs to the "iFld += -7" statement which 
was partially peeled before. When creating the slow version of the loop 
and cloning the predicates in 
PhaseIdealLoop::create_slow_version_of_loop(), this control dependency 
is lost. StoreI 353 is still dependent on IfTrue 118 instead of IfTrue 
472 (fast loop entry control) and IfTrue 476 (slow loop entry control). 
The original loop predicates are later removed and thus, when hitting 
the uncommon trap in the fast loop, we have already executed "iFld += 
-7" (StoreI 353), even though the interpreter assumes C2 has not 
executed any statements in the loop. As a result, "iFld += -7" is 
executed twice in a row, which produces a wrong result.

The fix is to replace the control input of all statements that have a 
control input from the original loop entry control (and are not the 
"loop selection" IfNode) with the fast and slow entry control, 
respectively. Since the statements cannot have two control inputs they 
need to be cloned together with all following nodes on a path to the 
loop phi nodes. The output of the last node before a loop phi on such a 
path needs to be adjusted to only point to the phi node belonging to the 
fast loop. The output of the last node on the cloned path is instead 
set to point to the phi node belonging to the slow loop. The fix is 
visualized in [3]. The control 
input of StoreI 353 is now the entry control of the fast loop (IfTrue 
472) and its output only points to the corresponding Phi 442 of the fast 
loop. The same was done for the cloned node StoreI 476 of StoreI 353 for 
the slow loop.
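The rewiring described above can be illustrated on a deliberately simplified toy graph. ToyNode, ToyGraph and fix_pinned_nodes are hypothetical and bear no relation to C2's Node API. In particular, the sketch only handles a pinned node that feeds a loop Phi directly, whereas the actual fix clones the whole path from the pinned node up to the Phi.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy node: one control input and a list of data users.
struct ToyNode {
  std::string name;
  ToyNode* control = nullptr;   // control input (at most one)
  std::vector<ToyNode*> outs;   // data users, e.g. a loop Phi
};

struct ToyGraph {
  std::vector<std::unique_ptr<ToyNode>> arena;  // owns every node
  ToyNode* make(const std::string& name, ToyNode* control = nullptr) {
    arena.push_back(std::make_unique<ToyNode>());
    arena.back()->name = name;
    arena.back()->control = control;
    return arena.back().get();
  }
};

// For every node pinned on the original loop entry (other than the If that
// selects between the loop versions): keep the original for the fast loop and
// create a clone for the slow loop, each wired to its own entry and own Phi.
inline void fix_pinned_nodes(ToyGraph& g, ToyNode* old_entry, ToyNode* selector,
                             ToyNode* fast_entry, ToyNode* fast_phi,
                             ToyNode* slow_entry, ToyNode* slow_phi) {
  std::vector<ToyNode*> pinned;
  for (auto& n : g.arena) {      // collect first; make() below grows the arena
    if (n->control == old_entry && n.get() != selector) {
      pinned.push_back(n.get());
    }
  }
  for (ToyNode* n : pinned) {
    ToyNode* clone = g.make(n->name + "_clone", slow_entry);
    n->control = fast_entry;
    n->outs = {fast_phi};        // original now only feeds the fast loop's Phi
    clone->outs = {slow_phi};    // clone feeds the slow loop's Phi
  }
}
```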

This bug can also be reproduced with JDK 11. Should we target this fix 
to 14 or defer it to 15 (since it's a more complex one)?


Thank you!

Best regards,
Christian


[1] 
http://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/opto/loopUnswitch.cpp#l272
[2] 
https://bugs.openjdk.java.net/secure/attachment/85593/wrong_dependencies.png
[3] 
https://bugs.openjdk.java.net/secure/attachment/85592/fixed_dependencies.png

From vladimir.x.ivanov at oracle.com  Thu Nov 28 09:46:39 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 28 Nov 2019 12:46:39 +0300
Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and
 C1/Xint
In-Reply-To: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com>
References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com>
Message-ID: <e515f650-c46c-a7c5-b08e-86c6fb02197a@oracle.com>

(while I'm looking into the proposed fix)

> This bug can also be reproduced with JDK 11. Should we target this fix 
> to 14 or defer it to 15 (since it's a more complex one)?

Silently producing erroneous code is a more serious issue than a JVM 
crash (either during compilation or while executing generated code). 
Instead of just shutting down the process, it silently continues running 
and can stay unnoticed for a long time. Also, it makes it hard to estimate 
the impact.

If we aren't confident in the stability of the fix, then it's worth 
considering implementing a stop-gap solution first (detect the 
problematic code shape and either avoid the transformation or even bail 
out of the compilation) while continuing to work on a proper fix.

But we should try our best to fix the problematic behavior in a prompt 
manner (in 14 and backported to 11).

Best regards,
Vladimir Ivanov

PS: please update the bug synopsis with a summary of the problem.

From aph at redhat.com  Thu Nov 28 10:03:18 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 28 Nov 2019 10:03:18 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com>
 <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com>
Message-ID: <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com>

On 11/28/19 7:50 AM, Nick Gasson wrote:
> Hi Andrew,
> 
>>>
>>> CompressedKlassPointers::base() => 0xffff0b4b5000
>>> CompressedKlassPointers::shift() => 3
>>
>> This is bad. Can you have a look at the allocation code to see why the search
>> for an appropriate address range fails?
> 
> We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs 
> that searches for a 4G aligned location for the compressed class space 
> on AArch64, but this search is not done if CDS is in use and the archive 
> was loaded successfully, because in that case the class space has 
> already been mapped (i.e. `metaspace_rs.is_reserved()' is true).

Right. At the time I wrote that code, CDS was not much used by anything, so
I thought of it as a marginal use case.

> Previously it was only possible to map the CDS archive at 0x800000000. 
> The compressed class base is set to the start of this region which 
> happens to be 4G aligned so our MacroAssembler::load_klass optimisation 
> applies and we emit the short code sequence.
> 
> With the recent change in 8231610, if the CDS archive cannot be mapped 
> at that address (e.g. because of ASLR or because the heap is mapped 
> there) then the CDS archive will be relocated to an arbitrary address 
> decided by mmap. That's where the oddly-aligned compressed klass base 
> above comes from. This causes MacroAssembler::load_klass to emit the 
> inefficient sequence which then overflows the buffer for the itable stub 
> (the worst-case size estimate there is wrong, which needs to be fixed 
> separately).

Correcting the stub size is a minor tidy-up which does not really need
its own Bug ID.

> A minimal way to reproduce this is:
> 
> $ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version
> ...
> [0.050s][info ][cds] CDS archive was created with max heap size = 128M, 
> and the following configuration:
> [0.050s][info ][cds]     narrow_klass_base = 0x0000fffec7507000, 
> narrow_klass_shift = 3
> ...
> #  guarantee(masm->pc() <= s->code_end()) failed: itable #2: overflowed 
> buffer, estimated len: 180, actual len: 184, overrun: 4
> 
> 
> I suggest we move the 4G-aligned search from 
> allocate_metaspace_compressed_klass_ptrs into its own function that can 
> then be called from MetaspaceShared::reserve_shared_space when 
> requested_address==NULL (i.e. the fallback path when mmap at 0x800000000 
> fails). If you're happy with this I'll make a patch for review?

Yes, that sounds excellent. We really need it to avoid compressed
class pointers becoming an expensive option.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From nick.gasson at arm.com  Thu Nov 28 10:18:38 2019
From: nick.gasson at arm.com (Nick Gasson)
Date: Thu, 28 Nov 2019 18:18:38 +0800
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com>
 <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com>
 <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com>
Message-ID: <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com>

On 28/11/2019 18:03, Andrew Haley wrote:
>> (the worst-case size estimate there is wrong, which needs to be fixed
>> separately).
> 
> Correcting the stub size is a minor tidy-up which does not really need
> its own Bug ID.
> 

OK, but I'd like to also try removing the second call to __ load_klass 
in VtableStubs::create_itable_stub as that will shave a few instructions 
even in the normal case. I'll recalculate the size estimate when I do that.


Thanks,
Nick

From aph at redhat.com  Thu Nov 28 10:28:49 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 28 Nov 2019 10:28:49 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com>
 <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com>
 <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com>
 <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com>
Message-ID: <f1ba5a8e-daaa-5165-44ad-a9ef3c7a72c8@redhat.com>

On 11/28/19 10:18 AM, Nick Gasson wrote:
> OK, but I'd like to also try removing the second call to __ load_klass 
> in VtableStubs::create_itable_stub as that will shave a few instructions 
> even in the normal case. I'll recalculate the size estimate when I do that.

OK. But beware of spending time on things that don't really matter. There's
a risk in making any change.



From christian.hagedorn at oracle.com  Thu Nov 28 10:32:28 2019
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Thu, 28 Nov 2019 11:32:28 +0100
Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and
 C1/Xint
In-Reply-To: <e515f650-c46c-a7c5-b08e-86c6fb02197a@oracle.com>
References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com>
 <e515f650-c46c-a7c5-b08e-86c6fb02197a@oracle.com>
Message-ID: <4bd7c970-f35f-008c-67a9-dac871fd6c3a@oracle.com>

Hi Vladimir

On 28.11.19 10:46, Vladimir Ivanov wrote:
>> This bug can also be reproduced with JDK 11. Should we target this fix 
>> to 14 or defer it to 15 (since it's a more complex one)?
> 
> Silently producing erroneous code is a more serious issue than a JVM 
> crash (either during compilation or while executing generated code). 
> Instead of just shutting down the process, it silently continues running 
> and can stay unnoticed for a long time. Also, it makes it hard to estimate 
> the impact.
> 
> If we aren't confident in the stability of the fix, then it's worth 
> considering implementing a stop-gap solution first (detect the 
> problematic code shape and either avoid the transformation or even bail 
> out of the compilation) while continuing to work on a proper fix.
> 
> But we should try our best to fix the problematic behavior in a prompt 
> manner (in 14 and backported to 11).

Thanks for the explanation. That sounds reasonable.

> PS: please update the bug synopsis with a summary of the problem.

I added a more precise description and changed the bug summary from
"Results of execution differ for C2 and C1/Xint" into "C2 produces wrong 
result while unswitching a loop due to lost control dependencies"

Best regards,
Christian

From goetz.lindenmaier at sap.com  Thu Nov 28 10:50:50 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Thu, 28 Nov 2019 10:50:50 +0000
Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not
 enough bytes
In-Reply-To: <VI1PR0201MB2479CD69D07E505D34E8C8329A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB2479CD69D07E505D34E8C8329A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <AM6PR02MB5347B46979F65DB7B740A7FBEC470@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi Martin, 

the change looks good. 
I had to take a second look to grok your comment. Mentioning "2nd"
is a bit shaky.
I would prefer that you either move the real comment up to the 
first occurrence in the function and add the comment "// see above" at 
the others.

Or you say:
// see comment before patching_epilog for 2nd ldr (str respectively)

No need for a new webrev for this.

Best regards,
  Goetz.


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Doerr, Martin
> Sent: Dienstag, 26. November 2019 17:22
> To: christoph.goettschkes at microdoc.com; 'hotspot-compiler-
> dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: [CAUTION] RFR(XS): 8234645: ARM32: C1: PatchingStub for field
> access: not enough bytes
> 
> Hi Christoph,
> 
> thanks for reporting the bug
> https://bugs.openjdk.java.net/browse/JDK-8234645
> 
> Seems like the large offset fix in mem2reg and reg2mem missed the first
> patching stub in the long/double cases.
> We should have nop padding there, too (for same reason as for the 2nd
> patching stub).
> NativeMovRegMem should always consist of 2 instructions on arm32 in order
> to support larger offsets.
> 
> Webrev:
> http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/
> 
> May I ask you to test this fix? We don't have arm32 in our testing landscape.
> 
> Best regards,
> Martin
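The invariant described here, that a patchable field access always occupies two instructions, can be modeled as follows. This is a toy emitter returning instruction text, not HotSpot's assembler. It assumes the ARM32 LDR/STR word immediate form (a 12-bit magnitude plus an add/subtract bit) and, for simplicity, that a large offset fits a single 16-bit movw. Its only point is that both paths have the same length, so the site can later be patched in place.

```cpp
#include <cassert>
#include <string>
#include <vector>

// ARM32 LDR/STR (word) immediate form: 12-bit magnitude plus an add/subtract bit.
inline bool fits_ldr_imm12(int offset) {
  return offset > -4096 && offset < 4096;
}

// Toy emitter: both paths produce exactly two instruction slots.
inline std::vector<std::string> emit_field_load(int offset) {
  if (fits_ldr_imm12(offset)) {
    return {"ldr rd, [rbase, #offset]", "nop"};   // nop pads to two slots
  }
  // Sketch assumption: the offset fits one 16-bit movw.
  assert(offset >= 0 && offset <= 0xFFFF && "offset must fit movw in this sketch");
  return {"movw rtmp, #offset", "ldr rd, [rbase, rtmp]"};
}
```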


From martin.doerr at sap.com  Thu Nov 28 11:08:21 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 28 Nov 2019 11:08:21 +0000
Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not
 enough bytes
In-Reply-To: <AM6PR02MB5347B46979F65DB7B740A7FBEC470@AM6PR02MB5347.eurprd02.prod.outlook.com>
References: <VI1PR0201MB2479CD69D07E505D34E8C8329A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <AM6PR02MB5347B46979F65DB7B740A7FBEC470@AM6PR02MB5347.eurprd02.prod.outlook.com>
Message-ID: <VI1PR0201MB2479F1F3A4B3A7259F34ED069A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Götz,

thanks for the review. Pushed with updated comments.

Best regards,
Martin


> -----Original Message-----
> From: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Sent: Donnerstag, 28. November 2019 11:51
> To: Doerr, Martin <martin.doerr at sap.com>;
> christoph.goettschkes at microdoc.com; 'hotspot-compiler-
> dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: RE: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not
> enough bytes
> 
> Hi Martin,
> 
> the change looks good.
> I had to take a second look to grok your comment. Mentioning "2nd"
> is a bit shaky.
> I would prefer if you either move up the real comment to the
> first occurrence in the function, and add comment "// see above" at
> the others.
> 
> Or you say:
> // see comment before patching_epilog for 2nd ldr (str respectively)
> 
> Don't need a webrev for this.
> 
> Best regards,
>   Goetz.


From per.liden at oracle.com  Thu Nov 28 11:10:29 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 28 Nov 2019 12:10:29 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <79bc1866-2293-37a6-9780-af24a01eb699@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com>
 <79bc1866-2293-37a6-9780-af24a01eb699@oracle.com>
Message-ID: <d704ccb5-98f4-cd27-526f-80fd8e80cd0c@oracle.com>

Thanks Nils!

Looks good to me.

/Per

On 11/27/19 11:50 AM, Nils Eliasson wrote:
> Hi Per,
> 
> Here is an update webrev with your fixes included:
> 
> http://cr.openjdk.java.net/~neliasso/8234520/webrev.05/
> 
> I chose to go with CloneInst and ClonePrimArray.
> 
> CopyOf and CopyOfRange, like ArrayCopy, don't have specialized versions; 
> instead they check the type when expanding, and of course there are no 
> versions for instances. But they differ in which guards they need and 
> when the guards are expanded. I would need to dig into the details to 
> determine whether this could be simplified.
> 
> Regards,
> 
> Nils
> 
> On 2019-11-26 14:29, Per Liden wrote:
>> Hi Nils,
>>
>> On 11/21/19 12:53 PM, Nils Eliasson wrote:
>>> I updated this to version 2.
>>>
>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>>>
>>> I found a problem running 
>>> compiler/arguments/TestStressReflectiveCode.java
>>>
>>> Even though the clone was created as an oop clone, the src node's type 
>>> still tests true for isa_aryptr(). This is caused by the src ptr not 
>>> being the base pointer. Until I fix that, I wanted a more robust test.
>>>
>>> In this webrev I split up the is_clonebasic into is_clone_oop and 
>>> is_clone_array. (is_clone_oop_array is already there). Having a 
>>> complete set with the three clone types allows for a robust test and 
>>> easy verification. (The three variants end up in different paths with 
>>> different GCs).
>>
>> A couple of suggestions:
>>
>> 1) Instead of
>>
>>   CloneOop
>>   CloneArray
>>   CloneOopArray
>>
>> I think we should call the three types:
>>
>>   CloneInstance
>>   CloneTypeArray
>>   CloneOopArray
>>
>> Since CloneOop is not actually cloning an oop, but an object/instance. 
>> And being explicit about TypeArray seems like a good thing to avoid 
>> any confusion about the difference compared to CloneOopArray. I guess 
>> PrimArray would be an alternative to TypeArray.
>>
>> And of course, if we change this then the is_clone/set_clone functions 
>> should follow the same naming convention.
>>
>> Btw, what about CopyOf and CopyOfRange? Don't they also come in Oop 
>> and Type versions, or are we handling those differently in some way? 
>> Looking at the code it looks like they are only used for the oop array 
>> case?
> 
>>
>>
>> 2) In zBarrierSetC2.cpp, do you mind if we do it like this instead? I 
>> find that quite a bit easier to read.
>>
>> [...]
>>   const Type** domain_fields = TypeTuple::fields(4);
>>   domain_fields[TypeFunc::Parms + 0] = TypeInstPtr::NOTNULL;  // src
>>   domain_fields[TypeFunc::Parms + 1] = TypeInstPtr::NOTNULL;  // dst
>>   domain_fields[TypeFunc::Parms + 2] = TypeLong::LONG;        // size lower
>>   domain_fields[TypeFunc::Parms + 3] = Type::HALF;            // size upper
>>   const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms + 4, domain_fields);
>> [...]
>>
>>
>> 3) I'd also like to add some const, adjust indentation, etc, in a few 
>> places. Instead of listing them here I made a patch, which goes on top 
>> of yours. This patch also adjusts 2) above. Just shout if you have any 
>> objections.
>>
>> http://cr.openjdk.java.net/~pliden/8234520/webrev.03-review
>>
>> /Per
>>
>>>
>>> Regards,
>>>
>>> Nils
>>>
>>>
>>> On 2019-11-20 15:25, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> I found a few bugs after the enabling of the clone intrinsic in ZGC.
>>>>
>>>> 1) The arraycopy clone_basic has the parameters adjusted to work as 
>>>> a memcopy. For an oop the src is pointing inside the oop to where we 
>>>> want to start copying. But when we want to do a runtime call to 
>>>> clone - the parameters are supposed to be the actual src oop and dst 
>>>> oop, and the size should be the instance size.
>>>>
>>>> For now I have made a workaround. What should be done later is using 
>>>> the offset in the arraycopy node to encode where the payload is, so 
>>>> that the base pointers are always correct. But that would require 
>>>> changes to the BarrierSet classes of all GCs. So I leave that for 
>>>> next release.
>>>>
>>>> 2) The size parameter of the TypeFunc for the runtime call has the 
>>>> wrong type. It was originally Long but was missing the upper Half; it was 
>>>> then changed to INT (JDK-8233834), but that is wrong and causes the 
>>>> compiles to be skipped. We didn't notice that since they failed 
>>>> silently. That is also why we didn't notice problem #1 either.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>>>
>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>>>
>>>> Please review!
>>>>
>>>> Nils
>>>>
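Problem 2 and the snippet in suggestion 2) both turn on one rule. In a C2 call signature, a 64-bit `long` argument takes two slots: `TypeLong::LONG` followed by `Type::HALF`. The toy `Domain` type below is a hypothetical model of that rule, not HotSpot's `TypeTuple`. It only shows why the clone call's domain has `Parms + 4` entries rather than three.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a C2 call signature ("domain"), not HotSpot's TypeTuple.
// Mirrors the quoted snippet: a long argument is LONG followed by HALF.
enum class Slot { InstPtrNotNull, Long, Half };

struct Domain {
  std::vector<Slot> fields;
  void add_oop()  { fields.push_back(Slot::InstPtrNotNull); }
  void add_long() { fields.push_back(Slot::Long); fields.push_back(Slot::Half); }
};

// clone(src, dst, size): size is a long, so the domain needs 4 slots.
Domain clone_domain() {
  Domain d;
  d.add_oop();   // src
  d.add_oop();   // dst
  d.add_long();  // size (lower slot + upper half)
  return d;
}

// Every Long must be immediately followed by a Half, and every Half must
// directly follow a Long.
bool well_formed(const Domain& d) {
  for (std::size_t i = 0; i < d.fields.size(); ++i) {
    if (d.fields[i] == Slot::Long &&
        (i + 1 >= d.fields.size() || d.fields[i + 1] != Slot::Half)) {
      return false;
    }
    if (d.fields[i] == Slot::Half && (i == 0 || d.fields[i - 1] != Slot::Long)) {
      return false;
    }
  }
  return true;
}
```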

From nils.eliasson at oracle.com  Thu Nov 28 11:25:08 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 28 Nov 2019 12:25:08 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <d704ccb5-98f4-cd27-526f-80fd8e80cd0c@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com>
 <79bc1866-2293-37a6-9780-af24a01eb699@oracle.com>
 <d704ccb5-98f4-cd27-526f-80fd8e80cd0c@oracle.com>
Message-ID: <c20e0f1c-7b22-e69d-5cb0-0b1d46dae454@oracle.com>

Thank you Per!

// Nils

On 2019-11-28 12:10, Per Liden wrote:
> Thanks Nils!
>
> Looks good to me.
>
> /Per

From vladimir.x.ivanov at oracle.com  Thu Nov 28 12:28:43 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 28 Nov 2019 15:28:43 +0300
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>
 <58fd5565-4342-ea70-511d-bace68308391@oracle.com>
 <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com>
Message-ID: <d1c6622b-e42d-412f-b32f-b9a79e3c7db1@oracle.com>


>> Do you see any problems with copying object header?
> 
> It won't be copied. It's just that the runtime call expects the 
> arguments to be pointers to the objects, and the size of the object. 
> It's the same function that is used by a call to the native clone impl. 
> (jvm.cpp:720)

Actually the header is copied, but then cleared.

 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/oops/accessBackend.inline.hpp#l363

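A toy model of the point above, using a hypothetical layout rather than `oopDesc`. The runtime-style clone receives base pointers and the full instance size. It copies the header along with the fields, then resets the copy's mark word. `kPrototypeMark` is an assumed value standing in for an unlocked, unhashed mark. `kHeaderBytes` shows the offset that separates the arraycopy-style payload pointer from the base pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy object layout (hypothetical, not HotSpot's oopDesc): a mark word
// (hash, lock and GC bits), a class pointer, then the payload fields.
struct ToyObject {
  std::uint64_t mark;
  const void*   klass;
  std::int64_t  fields[3];
};

constexpr std::uint64_t kPrototypeMark = 0x1;  // assumption: "unlocked, no hash"

// The memcpy-style view starts after the header:
// payload pointer = base + kHeaderBytes, payload size = size - kHeaderBytes.
constexpr std::size_t kHeaderBytes = offsetof(ToyObject, fields);

// Runtime-style clone: base pointers plus full instance size. The header is
// copied together with the fields, then the copy's mark word is reset so it
// does not inherit the source's hash or lock state.
void toy_clone(const ToyObject* src, ToyObject* dst, std::size_t size_in_bytes) {
  std::memcpy(dst, src, size_in_bytes);
  dst->mark = kPrototypeMark;
}
```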

>> -  if (src->bottom_type()->isa_aryptr()) {
>> +  if (ac->is_clone_array()) {
>>      // Clone primitive array
>>
>> Is the comment valid? Doesn't it cover object array case as well?
> 
> Nope - object arrays will be handled as clone_oop_array which uses the 
> normal object copy which already applies the appropriate load barriers.
> 
> The special case for ZGC is the cloning of instances because we don't 
> know where to apply load barriers without looking up the type. (Except 
> for clone on small objects and short arrays that are transformed to a 
> series of load-stores.)
> 
> http://cr.openjdk.java.net/~neliasso/8234520/webrev.04

I'm looking at webrev.05:

src/hotspot/share/gc/shared/c2/barrierSetC2.cpp:

-  ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, src_base, NULL, dst_base, NULL, countx, true, false);
-  ac->set_clonebasic();
+  ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, payload_src, NULL, payload_dst, NULL, payload_size, true, false);
+  if (is_array) {
+    ac->set_clone_prim_array();
+  } else {
+    ac->set_clone_inst();
+  }

I'm looking at LibraryCallKit::inline_native_clone():

 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/library_call.cpp#l4323

It looks like object arrays are filtered out only if 
array_copy_requires_gc_barriers() == true:

 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/library_call.cpp#l4333

But your change sets set_clone_prim_array() irrespective of whether 
object arrays are specially treated or not.

It looks like a naming problem, but still.
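A small model of the concern, using hypothetical names. If the GC does not require array-copy barriers, an object array can reach the basic clone path. A tag derived from `is_array` alone would then call it a primitive array. One possible way to keep the tag accurate is to consult the element type as well, as `tag_by_contents` does. The webrev does not do this; the sketch is only an illustration.

```cpp
#include <cassert>

enum class CloneKind { Instance, PrimArray, OopArray };

// Tagging as in the quoted webrev.05 hunk: only is_array is consulted.
CloneKind tag_by_is_array(bool is_array) {
  return is_array ? CloneKind::PrimArray : CloneKind::Instance;
}

// Hypothetical alternative: also consult the element type, so an object
// array can never be tagged as a primitive array.
CloneKind tag_by_contents(bool is_array, bool elements_are_oops) {
  if (!is_array) return CloneKind::Instance;
  return elements_are_oops ? CloneKind::OopArray : CloneKind::PrimArray;
}

// Models the filter as described above: object arrays are diverted from the
// basic clone path only when the GC requires barriers for array copies.
bool reaches_basic_clone(bool is_array, bool elements_are_oops,
                         bool gc_requires_array_barriers) {
  return !(is_array && elements_are_oops && gc_requires_array_barriers);
}
```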

PS: and extract_base_offset is still there:

-void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* size, bool is_array) const {
+int BarrierSetC2::extract_base_offset(bool is_array) {

Best regards,
Vladimir Ivanov


> 
> Thank you for the feedback!
> 
> // Nils
> 
>>
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>
>>> Regards,
>>>
>>> Nils
>>>

From christoph.goettschkes at microdoc.com  Thu Nov 28 12:44:05 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Thu, 28 Nov 2019 13:44:05 +0100
Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client
 VMs due to Unrecognized VM option LoopUnrollLimit
In-Reply-To: <bd187ed5-6dc3-0e43-f13e-06e9c48be13b@oracle.com>
References: <20191127125735.B9BE111F377@aojmv0009>
 <bd187ed5-6dc3-0e43-f13e-06e9c48be13b@oracle.com>
Message-ID: <mailman.6.1574945150.28664.hotspot-compiler-dev@openjdk.java.net>

Hi Vladimir,

Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on 2019-11-27 20:54:02:

> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> To: christoph.goettschkes at microdoc.com, hotspot-compiler-dev at openjdk.java.net
> Date: 2019-11-27 20:54
> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for 
> client VMs due to Unrecognized VM option LoopUnrollLimit
> 
> Hi Christoph
> 
> I was about suggest IgnoreUnrecognizedVMOptions flag but remembered 
> discussion about 8231954 fix.

Yes, I try to avoid "IgnoreUnrecognizedVMOptions" because of our previous 
discussion. I also think that it doesn't make sense to execute tests in VM 
configurations they were not written for. Most of the compiler 
tests simply use "IgnoreUnrecognizedVMOptions" and probably waste a good 
amount of time in certain VM configurations.

> But I think the test should be run with Graal - it does have OSR 
> compilation and we need to test it.

Sure. I disabled it because I thought that the flag "LoopUnrollLimit" is 
required to trigger the faulty behavior, but I don't know much about 
optimization in the Graal JIT.

> 
> We can do it by splitting test runs (duplicate @test block with 
> different run flags) to have 2 tests with different 
> flags and conditions. See [1].
> 
> For existing @run block we use `@requires vm.compiler2.enabled` and for
> new without LoopUnrollLimit - `vm.graal.enabled`.

I did the following:

https://cr.openjdk.java.net/~cgo/8234906/webrev.01/

Could you elaborate on how the two flags are related? I thought that if Graal is 
used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are set 
to true. Is that correct? I don't have a setup with Graal, so I cannot 
test this.

Thanks,
Christoph


From martin.doerr at sap.com  Thu Nov 28 13:10:40 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 28 Nov 2019 13:10:40 +0000
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
 <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
 <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>
 <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com>
 <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com>
 <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com>
 <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com>
Message-ID: <VI1PR0201MB2479F2511557C83176F0D28C9A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Lutz,

looks good to me, too. Thanks for fixing.

Best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz
> Sent: Montag, 25. November 2019 21:48
> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; 'hotspot-compiler-
> dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Cc: Jean-Philippe BEMPEL <jean-philippe at bempel.fr>
> Subject: [CAUTION] Re: [14] RFR(S): 8234583: PrintAssemblyOptions isn't
> passed to hsdis library
> 
> Thanks for the review, Vladimir!
> Still one to go.
> Regards,
> Lutz
> 
> On 25.11.19, 21:26, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
> 
> 
>     > All your other comments are valid. There is an open bug to address and
>     > improve the very basic options parsing:
>     > https://bugs.openjdk.java.net/browse/JDK-8223765 This task was split off
>     > from JDK-8213084.
>     >
>     > I would like to cover the improvements you suggest when working on
>     > that bug. To make _print_raw work correctly, I suggest just moving
>     >      if (_optionsParsed) return;
>     > a bit further down. The help text should be printed only once anyway.
> 
>     Ok, I'm fine with addressing it later. And thanks for taking care of it.
> 
>     > Here is a new webrev iteration. It reflects what I suggest:
>     > https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/
> 
>     Looks good.
> 
>     Best regards,
>     Vladimir Ivanov
> 
>     > On 25.11.19, 19:30, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
>     >
>     >      Thanks for the clarifications, Lutz.
>     >
>     >      So, I assume you have a typo in the patch then:
>     >
>     >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
>     >      +    // We need to fill the options buffer for each newly created
>     >      +    // decode_env instance. The hsdis_* library looks for options
>     >      +    // in that buffer.
>     >      +    collect_options(Disassembler::pd_cpu_opts());
>     >      +    collect_options(PrintAssemblyOptions);
>     >      +  }
>     >
>     >      It performs collect_options() calls only if _option_buf is either NULL
>     >      or "\0".
>     >
>     >      Also, what about the following updates of instance members?
>     >
>     >         if (strstr(options(), "print-raw")) {
>     >           _print_raw = (strstr(options(), "xml") ? 2 : 1);
>     >         }
>     >
>     >         if (strstr(options(), "help")) {
>     >           _print_help = true;
>     >         }
>     >
>     >      BTW, would _print_help (along with _helpPrinted) be better turned
>     >      into a static member?
>     >
>     >      Can we make _option_buf static as well? Or do we want to keep a
>     >      defensive copy to pass into hsdis.so?
>     >
>     >      As an alternative approach to fix the bug, we could create a golden copy
>     >      during parsing instead and then just copy it to _option_buf as part of
>     >      decode_env initialization.
>     >
>     >      Best regards,
>     >      Vladimir Ivanov
>     >
>     >      On 25.11.2019 19:59, Schmidt, Lutz wrote:
>     >      > Hi Vladimir,
>     >      >
>     >      > I'm happy to elaborate in more detail about the issue and the fix.
>     >      >
>     >      > For each decode_env instance which is constructed, process_options() is
>     >      > called. It collects the disassembly options from various sources
>     >      > (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in
>     >      > the private member "char _option_buf[512]".
>     >      >
>     >      > Further processing derives static flag settings from these options.
>     >      > Being static, these flags need to be set only once, not every time a
>     >      > decode_env is constructed.
>     >      >
>     >      > But that's just one part of the story. It was not taken into account
>     >      > that _option_buf is passed to and analyzed by hsdis-<platform>.so as well.
>     >      > That requires _option_buf to be filled every time a decode_env is
>     >      > constructed.
>     >      >
>     >      > Moving
>     >      >    if (_optionsParsed) return;
>     >      > after the collect_options() calls heals this deficiency.
>     >      >
>     >      > I added the guards you question as an additional "safety net". After
>     >      > looking at the code again, I must admit the guards are not necessary:
>     >      > _option_buf can never be NULL, and every invocation of process_options()
>     >      > is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)).
>     >      > I can remove the guards if you like.
>     >      >
>     >      > Please let me know if there are any more questions to be answered.
>     >      >
>     >      > Thanks,
>     >      > Lutz
>     >      >
>     >      >
>     >      > On 25.11.19, 17:05, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
>     >      >
>     >      >      Lutz,
>     >      >
>     >      >      Can you elaborate, please, how the patch fixes the problem?
>     >      >
>     >      >      Why did you decide to add the following guards?
>     >      >
>     >      >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
>     >      >
>     >      >      Best regards,
>     >      >      Vladimir Ivanov
>     >      >
>     >      >      On 25.11.2019 17:06, Schmidt, Lutz wrote:
>     >      >      > Dear all,
>     >      >      >
>     >      >      > may I please request reviews for this small change, fixing a
>     >      >      > regression in the disassembler. Parameters to the hsdis-<platform>
>     >      >      > library were not passed on.
>     >      >      >
>     >      >      > The change was verified to fix the issue by the reporter
>     >      >      > (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
>     >      >      >
>     >      >      > Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
>     >      >      > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
>     >      >      >
>     >      >      > Thank you,
>     >      >      > Lutz
>     >      >      >
>     >      >
>     >      >
>     >
>     >
> 
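The ordering fix Lutz describes can be sketched as follows. `ToyDecodeEnv` is a simplified, hypothetical stand-in for `decode_env`; only the 512-byte buffer and the two option sources come from the thread. The per-instance buffer handed to hsdis is filled on every construction. The static, parse-once flags are derived only the first time.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified model of the decode_env fix (not disassembler.cpp).
class ToyDecodeEnv {
 public:
  static int parse_count;   // how often the static flags were derived
  static bool print_help;   // example of a static, parse-once flag

  ToyDecodeEnv(const char* cpu_opts, const char* user_opts) {
    std::memset(option_buf_, 0, sizeof(option_buf_));
    process_options(cpu_opts, user_opts);
  }
  const char* options() const { return option_buf_; }

 private:
  static bool options_parsed_;
  char option_buf_[512];  // per instance; read by the disassembler library

  // Append p to the buffer, comma-separated, never overflowing it.
  void collect_options(const char* p) {
    if (p == nullptr || *p == '\0') return;
    std::size_t len = std::strlen(option_buf_);
    if (len > 0 && len + 1 < sizeof(option_buf_)) option_buf_[len++] = ',';
    std::strncat(option_buf_, p, sizeof(option_buf_) - len - 1);
  }

  void process_options(const char* cpu_opts, const char* user_opts) {
    // Fill the per-instance buffer first: it is needed for every decode.
    collect_options(cpu_opts);
    collect_options(user_opts);
    // The early return sits below the collect_options() calls.
    if (options_parsed_) return;
    options_parsed_ = true;
    ++parse_count;
    print_help = std::strstr(option_buf_, "help") != nullptr;
  }
};

int ToyDecodeEnv::parse_count = 0;
bool ToyDecodeEnv::print_help = false;
bool ToyDecodeEnv::options_parsed_ = false;
```

With the early return placed above the `collect_options()` calls instead, the second instance would hand an empty buffer to the library, which is the regression described in this thread.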


From adinn at redhat.com  Thu Nov 28 13:32:03 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 28 Nov 2019 13:32:03 +0000
Subject: RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387
In-Reply-To: <abea05bf-b100-83e5-0b33-cc44c16fe42c@oracle.com>
References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com>
 <abea05bf-b100-83e5-0b33-cc44c16fe42c@oracle.com>
Message-ID: <0960f240-6801-48c2-9664-c7509e90f4a5@redhat.com>

On 27/11/2019 13:23, Vladimir Ivanov wrote:
> The fix looks good and trivial.

Yes, the patch is good. The CmpOp matches are not needed and perhaps
never were.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From rwestrel at redhat.com  Thu Nov 28 13:45:26 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Nov 2019 14:45:26 +0100
Subject: scalar replacement of arrays affected by minor changes to
 surrounding code
In-Reply-To: <21F5DF77-4DA9-437C-9DEB-CFABFC4C9C19@microsoft.com>
References: <CAKy2qXcocPHKcArSKu1XgPAtmmVM5PRdT0GUMgCK6qmk=_dmSg@mail.gmail.com>
 <87d0fvidxs.fsf@redhat.com> <87a7azid39.fsf@redhat.com>
 <21F5DF77-4DA9-437C-9DEB-CFABFC4C9C19@microsoft.com>
Message-ID: <878so0azd5.fsf@redhat.com>


Hi Charlie,

> Roland is there anything I could do to help get this type of change into tip and any
> appropriate back ports? I have tested some workloads with this patch and I think
> I covered the required checks but I could have easily missed something.

Let me go over the patch again and prepare it for review.

Roland.


From tobias.hartmann at oracle.com  Thu Nov 28 14:04:01 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 28 Nov 2019 15:04:01 +0100
Subject: [XXS] C1 misses to dump a reason when it inlines successfully
In-Reply-To: <CAHPdc0nHkkqNGK=_HkwCCdHwqva-Nty9K8fMQvB28dP66N1y7Q@mail.gmail.com>
References: <CAHPdc0nHkkqNGK=_HkwCCdHwqva-Nty9K8fMQvB28dP66N1y7Q@mail.gmail.com>
Message-ID: <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com>

Hi,

I'm still not convinced that this message adds any useful information but let's see what other
reviewers think.

Anyway, with your change, all callers now pass a msg argument and therefore the null handling logic
should be removed (replaced by an assert). Also, the default value for the argument can be removed.

Best regards,
Tobias
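A hypothetical reduction of the suggested cleanup. The function name and output format are invented for illustration and do not match C1's PrintInlining code. Once every caller passes a message, the default argument can go, and the null handling becomes an assertion on the caller contract.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: no default value for msg any more, and a null msg is
// treated as a caller bug instead of being handled silently.
std::string format_inline_decision(const char* method, bool success, const char* msg) {
  assert(msg != nullptr && "all callers are expected to pass a reason");
  std::string line = method;
  line += success ? "  inline " : "  failed to inline ";
  line += "(";
  line += msg;
  line += ")";
  return line;
}
```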

On 22.11.19 19:09, Liu Xin wrote:
> hi, Reviewers,
> 
> Could you review this extremely small change?
> Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541
> Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/
> 
> When I analyzed PrintInlining, I was confused by the inline message without
> any detail. It's not easy for a developer to tell whether this method is inlined
> or not. This patch adds the comment "inline by the rules of C1".
> 
> I would like to add an explicit reason, but there's no decisive reason in
> GraphBuilder::try_inline_full. It just passes all the restriction rules. Any other
> suggestion would be appreciated.
> 
> Thanks,
> --lx
> 

From tobias.hartmann at oracle.com  Thu Nov 28 14:08:32 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 28 Nov 2019 15:08:32 +0100
Subject: [14] RFR (L): 8234391: C2: Generic vector operands
In-Reply-To: <77c66795-ecd4-1461-4151-66c82b4e554c@oracle.com>
References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>
 <0e506a31-9107-0354-ffce-308d332cbfbd@oracle.com>
 <77c66795-ecd4-1461-4151-66c82b4e554c@oracle.com>
Message-ID: <346c4324-7069-0aca-288e-26c90363b170@oracle.com>

Hi Vladimir,

thanks for the additional details. Looks all good to me.

Best regards,
Tobias
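The quoted design summary below describes a post-selection rewrite step. Here is a sketch of it: a generic `vec`/`legVec` operand is specialized from the vector size of the node's type. `specialize_vector_operand` is hypothetical. The byte sizes per operand class (vecS=4, vecD=8, vecX=16, vecY=32, vecZ=64) are assumptions about x86.ad, not something taken from the patch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sketch: map a generic vector operand plus the vector size in
// bytes (from the node's TypeVect) to a fixed-size operand class.
std::string specialize_vector_operand(const std::string& generic, int length_in_bytes) {
  if (generic != "vec" && generic != "legVec") {
    throw std::invalid_argument("not a generic vector operand");
  }
  const char* suffix = nullptr;
  switch (length_in_bytes) {
    case 4:  suffix = "S"; break;  // 32-bit
    case 8:  suffix = "D"; break;  // 64-bit
    case 16: suffix = "X"; break;  // XMM
    case 32: suffix = "Y"; break;  // YMM
    case 64: suffix = "Z"; break;  // ZMM
    default: throw std::invalid_argument("unsupported vector size");
  }
  return generic + suffix;
}
```

Keeping the instruction definitions generic and resolving the size only after matching is what lets one AD instruction cover all vector widths.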

On 26.11.19 17:52, Vladimir Ivanov wrote:
> Thanks, Tobias.
> 
>> hard to review the .ad file changes but this looks good to me.
> 
> Yes, the changes are massive, but mostly straightforward. In addition to the code needed for generic
> vector operand, the changes are:
> 
>   (1) (x86.ad) switching vec[SDXYZ] => vec and legVec[SDXYZ] => legVec;
> 
>   (2) (x86.ad) after the switch, reduction instructions need additional checks on input vector size
>   (example [1]);
> 
>   (3) (x86_64.ad/x86_32.ad) rename operands with "vec" name to avoid name conflicts with vec operand;
> 
>   (4) (x86_64.ad) migrate compressed string instructions from legVecS to legRegD to keep it working
>   when vector support is explicitly disabled (e.g., -XX:MaxVectorSize=0)
> 
> (1) is needed to avoid explicit moves between concrete (legVec/vec[SDXYZ]) and generic vectors
> (vec/legVec).
> 
> (2)-(4) could have been reviewed/integrated separately, but they did look trivial enough to avoid
> the effort.
> 
>> Just noticed some code style issues:
>> - x86_64.ad:11284, 11346, 11410, 11426: indentation is wrong (already before your fix)
>> - whitespace in matcher.cpp:2598/2601 can be removed
> 
> Good catch.
> 
> Best regards,
> Vladimir Ivanov
> 
> [1]
> -instruct rvadd2F_reduction_reg(regF dst, vecD src2, vecD tmp) %{
> -  predicate(UseAVX > 0);
> +instruct rvadd2F_reduction_reg(regF dst, vec src2, vec tmp) %{
> +  predicate(UseAVX > 0 && n->in(2)->bottom_type()->is_vect()->length() == 2);
> 
>> On 19.11.19 15:30, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8234391
>>>
>>> Introduce generic vector operands and migrate existing usages from fixed-size operands (vec[SDXYZ])
>>> to generic ones.
>>>
>>> (It's an updated version of the generic vector support posted for review in August 2019 [1] [2]. AD
>>> instruction merges will be handled separately.)
>>>
>>> At a high level, it is organized as follows:
>>>
>>>    (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;
>>>
>>>    (2) at runtime, right after matching is over, a special pass is performed which:
>>>
>>>        * replaces vecOper with vec[SDXYZ] depending on the mach node type
>>>           - vector mach nodes capture the bottom_type() of their ideal prototype;
>>>
>>>        * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec)
>>>           - the matcher needs them, but they are useless for the register allocator (moreover, they
>>>             may cause additional spills);
>>>
>>>
>>>    (3) after the post-selection pass is over, all mach nodes should have fixed-size vector operands.
>>>
>>>
>>> Some details:
>>>
>>>    (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works
>>>
>>>
>>>    (2) the new logic is guarded by a new matcher flag (Matcher::supports_generic_vector_operands),
>>>        which is enabled only on x86
>>>
>>>
>>>    (3) post-selection analysis is implemented as a single pass over the graph which processes
>>>        individual nodes using their own bottom_type() (for DEF operands) or their inputs' bottom_type()
>>>        (for USE operands); the bottom_type() is an instance of TypeVect
>>>
>>>
>>>    (4) most of the analysis is cross-platform and interfaces with platform-specific code through 3
>>>        methods:
>>>
>>>      static bool is_generic_reg2reg_move(MachNode* m);
>>>      // distinguishes MoveVec2Leg/MoveLeg2Vec nodes
>>>
>>>      static bool is_generic_vector(MachOper* opnd);
>>>      // distinguishes vec/legVec operands
>>>
>>>      static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg);
>>>      // constructs a fixed-size vector operand based on the ideal reg
>>>      //   vec    + Op_Vec[SDXYZ] =>    vec[SDXYZ]
>>>      //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]
>>>
>>>
>>>    (5) TEMP operands are handled specially:
>>>      - TEMP uses max_vector_size() to determine which fixed-size operand to use
>>>          * it is needed to cover reductions, which produce scalars rather than vectors
>>>      - TEMP_DEF inherits the fixed-size operand type from DEF;
>>>
>>>
>>>    (6) there is a limited number of special cases for mach nodes in
>>>        Matcher::get_vector_operand_helper:
>>>
>>>        - LShiftCntV/RShiftCntV: though they report a wide vector type as Node::bottom_type(), their
>>>          ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching,
>>>          not ideal_reg().
>>>
>>>        - vshiftcntimm: chain instructions which convert a scalar to a vector don't have a vector type.
>>>
>>>
>>>    (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see
>>>        Matcher::get_vector_regmask)
>>>
>>>
>>>    (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with the new vec/legVec
>>>        operands
>>>
>>>
>>>    (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD
>>>       - it aligns the code between x86_64.ad and x86_32.ad
>>>       - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g.,
>>>         string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0)
>>>
>>>
>>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>>> Reviewed-by: vlivanov, sviswanathan, ?
>>>
>>> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL,
>>>          performance testing (SPEC* + Octane + micros / G1 + ParGC).
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html
>>>
>>> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf

From tobias.hartmann at oracle.com  Thu Nov 28 14:14:13 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 28 Nov 2019 15:14:13 +0100
Subject: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for client
 VMs due to Unrecognized VM option PartialPeelNewPhiDelta
In-Reply-To: <20191127092416.6556811C437@aojmv0009>
References: <20191126133017.4DEC711CEFA@aojmv0009>
 <46e4c188-2d15-bd33-010b-cb203741cbd8@oracle.com>
 <20191127092416.6556811C437@aojmv0009>
Message-ID: <05973876-c74f-dace-4c3c-06cfdf779e48@oracle.com>

Hi Christoph,

I've sponsored it.

Best regards,
Tobias

On 27.11.19 10:22, christoph.goettschkes at microdoc.com wrote:
> Hi Vladimir,
> 
> thanks for the review. I updated the webrev; please find the changeset
> here:
> https://cr.openjdk.java.net/~cgo/8234807/webrev.02/jdk-jdk.changeset
> 
> Could you please sponsor this change for me and commit it into the 
> repository?
> 
> Thanks,
> Christoph
> 
> "hotspot-compiler-dev" <hotspot-compiler-dev-bounces at openjdk.java.net> 
> wrote on 2019-11-26 18:05:22:
> 
>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> To: hotspot-compiler-dev at openjdk.java.net
>> Date: 2019-11-26 18:06
>> Subject: Re: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for 
>> client VMs due to Unrecognized VM option PartialPeelNewPhiDelta
>> Sent by: "hotspot-compiler-dev" <hotspot-compiler-dev-bounces at openjdk.java.net>
>>
>> Good.
>>
>> Thanks,
>> Vladimir
>>
>> On 11/26/19 5:28 AM, christoph.goettschkes at microdoc.com wrote:
>>> Hi,
>>>
>>> please review the following small changeset which fixes the test
>>> test/hotspot/jtreg/compiler/loopopts/LoopRotateBadNodeBudget.java for
>>> client VMs.
>>> I simply added vm.compiler2.enabled to the requires tag, since the
>>> original bug only appeared with the server JIT:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234807
>>> Webrev: http://cr.openjdk.java.net/~cgo/8234807/webrev.00/
>>>
>>> Bug which introduced the issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8231565
>>>
>>> Thanks,
>>> Christoph
>>>
>>
> 

From tobias.hartmann at oracle.com  Thu Nov 28 14:20:20 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 28 Nov 2019 15:20:20 +0100
Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to
 missing narrowing conversion
Message-ID: <d317e7aa-92fa-c726-40dd-c004302a8b9f@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8234617
http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/

Writing an (integer) value to a boolean, byte, char or short field includes an implicit narrowing
conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit field loads by caching
and reusing the last written value. The problem is that this value is not necessarily converted to
the field type and we end up using an incorrect value.

For example, for the field store/load in testShort, C1 emits:
  [...]
  0x00007f0fc582bd6c:   mov    %dx,0x12(%rsi)
  0x00007f0fc582bd70:   mov    %rdx,%rax
  [...]

The field load has been eliminated and the non-converted integer value (%rdx) is returned.

The fix is to emit an explicit conversion to get the correct field value after the write:
  [...]
  0x00007ff07982bd6c:   mov    %dx,0x12(%rsi)
  0x00007ff07982bd70:   movswl %dx,%edx
  0x00007ff07982bd73:   mov    %rdx,%rax
  [...]

Thanks,
Tobias

[1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.putfield

From tobias.hartmann at oracle.com  Thu Nov 28 14:23:12 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 28 Nov 2019 14:23:12 +0000 (UTC)
Subject: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails for
 client VMs due to Unrecognized VM option EliminateLocks
In-Reply-To: <20191127095912.A371B11C5C1@aojmv0009>
References: <20191127095912.A371B11C5C1@aojmv0009>
Message-ID: <719a17e8-2710-a890-54df-e80714385b4d@oracle.com>

Hi Christoph,

looks good to me.

Best regards,
Tobias

On 27.11.19 10:57, christoph.goettschkes at microdoc.com wrote:
> Hi,
> 
> please review the following small changeset which fixes the test 
> test/hotspot/jtreg/compiler/escapeAnalysis/TestEliminateLocksOffCrash.java 
> for client VMs.
> I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since 
> the original bug is for C2 only. Also, the flag EliminateLocks is only 
> defined in c2_globals.hpp, and neither for C1, nor for JVMCI.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234894
> Webrev: https://cr.openjdk.java.net/~cgo/8234894/webrev.00
> 
> Bug which introduced the issue:
> https://bugs.openjdk.java.net/browse/JDK-8227384
> 
> Thanks,
> Christoph
> 

From lutz.schmidt at sap.com  Thu Nov 28 14:31:49 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Thu, 28 Nov 2019 14:31:49 +0000
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis
 library
In-Reply-To: <VI1PR0201MB2479F2511557C83176F0D28C9A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
 <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>
 <FDDC3449-891B-4133-B0E0-E46316CD8422@sap.com>
 <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com>
 <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com>
 <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com>
 <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com>
 <VI1PR0201MB2479F2511557C83176F0D28C9A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <59DDA9B7-D7CF-4BAD-9826-62186F4698C4@sap.com>

Hi Martin,
thanks for reviewing. I'll go ahead and push.
Regards,
Lutz

On 28.11.19, 14:10, "Doerr, Martin" <martin.doerr at sap.com> wrote:

    Hi Lutz,
    
    looks good to me, too. Thanks for fixing.
    
    Best regards,
    Martin
    
    
    > -----Original Message-----
    > From: hotspot-compiler-dev <hotspot-compiler-dev-
    > bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz
    > Sent: Montag, 25. November 2019 21:48
    > To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; 'hotspot-compiler-
    > dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
    > Cc: Jean-Philippe BEMPEL <jean-philippe at bempel.fr>
    > Subject: [CAUTION] Re: [14] RFR(S): 8234583: PrintAssemblyOptions isn't
    > passed to hsdis library
    > 
    > Thanks for the review, Vladimir!
    > Still one to go.
    > Regards,
    > Lutz
    > 
    > On 25.11.19, 21:26, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
    > 
    > 
    >     > All your other comments are valid. There is an open bug to address and
    >     > improve the very basic options parsing:
    >     > https://bugs.openjdk.java.net/browse/JDK-8223765
    >     > This task was split off from JDK-8213084.
    >     >
    >     > I would like to cover the improvements you suggest when working on
    >     > that bug. To make _print_raw work correctly, I suggest just moving
    >     >      if (_optionsParsed) return;
    >     > a bit further down. The help text should be printed only once anyway.
    > 
    >     Ok, I'm fine with addressing it later. And thanks for taking care of it.
    > 
    >     > Here is a new webrev iteration. It reflects what I suggest:
    >     > https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/
    > 
    >     Looks good.
    > 
    >     Best regards,
    >     Vladimir Ivanov
    > 
    >     > On 25.11.19, 19:30, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
    >     >
    >     >      Thanks for the clarifications, Lutz.
    >     >
    >     >      So, I assume you have a typo in the patch then:
    >     >
    >     >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
    >     >      +    // We need to fill the options buffer for each newly created
    >     >      +    // decode_env instance. The hsdis_* library looks for options
    >     >      +    // in that buffer.
    >     >      +    collect_options(Disassembler::pd_cpu_opts());
    >     >      +    collect_options(PrintAssemblyOptions);
    >     >      +  }
    >     >
    >     >      It performs collect_options() calls only if _option_buf is either NULL
    >     >      or "\0".
    >     >
    >     >      Also, what about the following updates of instance members?
    >     >
    >     >         if (strstr(options(), "print-raw")) {
    >     >           _print_raw = (strstr(options(), "xml") ? 2 : 1);
    >     >         }
    >     >
    >     >         if (strstr(options(), "help")) {
    >     >           _print_help = true;
    >     >         }
    >     >
    >     >      BTW, would it be better to turn _print_help (along with _helpPrinted)
    >     >      into a static member?
    >     >
    >     >      Can we make _option_buf static as well? Or do we want to keep a
    >     >      defensive copy to pass into hsdis.so?
    >     >
    >     >      As an alternative approach to fix the bug, we could create a golden copy
    >     >      during parsing instead and then just copy it to _option_buf as part of
    >     >      decode_env initialization.
    >     >
    >     >      Best regards,
    >     >      Vladimir Ivanov
    >     >
    >     >      On 25.11.2019 19:59, Schmidt, Lutz wrote:
    >     >      > Hi Vladimir,
    >     >      >
    >     >      > I'm happy to elaborate in more detail about the issue and the fix.
    >     >      >
    >     >      > For each decode_env instance which is constructed, process_options()
    >     >      > is called. It collects the disassembly options from various sources
    >     >      > (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them
    >     >      > in the private member "char _option_buf[512]".
    >     >      >
    >     >      > Further processing derives static flag settings from these options.
    >     >      > Being static, these flags need to be set only once, not every time a
    >     >      > decode_env is constructed.
    >     >      >
    >     >      > But that's just one part of the story. It was not taken into account
    >     >      > that _option_buf is passed to and analyzed by hsdis-<platform>.so as well.
    >     >      > That requires _option_buf to be filled every time a decode_env is
    >     >      > constructed.
    >     >      >
    >     >      > Moving
    >     >      >    if (_optionsParsed) return;
    >     >      > after the collect_options() calls heals this deficiency.
    >     >      >
    >     >      > I added the guards you question as an additional "safety net". After
    >     >      > looking at the code again, I must admit the guards are not necessary.
    >     >      > _option_buf can never be NULL, and every invocation of process_options()
    >     >      > is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)).
    >     >      > I can remove the guards if you like.
    >     >      >
    >     >      > Please let me know if there are any more questions to be answered.
    >     >      >
    >     >      > Thanks,
    >     >      > Lutz
    >     >      >
    >     >      >
    >     >      > On 25.11.19, 17:05, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com> wrote:
    >     >      >
    >     >      >      Lutz,
    >     >      >
    >     >      >      Can you elaborate, please, how the patch fixes the problem?
    >     >      >
    >     >      >      Why did you decide to add the following guards?
    >     >      >
    >     >      >      +  if ((options() == NULL) || (strlen(options()) == 0)) {
    >     >      >
    >     >      >      Best regards,
    >     >      >      Vladimir Ivanov
    >     >      >
    >     >      >      On 25.11.2019 17:06, Schmidt, Lutz wrote:
    >     >      >      > Dear all,
    >     >      >      >
    >     >      >      > may I please request reviews for this small change, fixing a
    >     >      >      > regression in the disassembler. Parameters to the hsdis-<platform>
    >     >      >      > library were not passed on.
    >     >      >      >
    >     >      >      > The change was verified to fix the issue by the reporter
    >     >      >      > (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
    >     >      >      >
    >     >      >      > Bug:    https://bugs.openjdk.java.net/browse/JDK-8234583
    >     >      >      > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
    >     >      >      >
    >     >      >      > Thank you,
    >     >      >      > Lutz
    >     >      >      >
    >     >      >
    >     >      >
    >     >
    >     >
    > 
    
    
From martin.doerr at sap.com  Thu Nov 28 14:53:48 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 28 Nov 2019 14:53:48 +0000
Subject: [XXS] C1 misses to dump a reason when it inlines successfully
In-Reply-To: <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com>
References: <CAHPdc0nHkkqNGK=_HkwCCdHwqva-Nty9K8fMQvB28dP66N1y7Q@mail.gmail.com>
 <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com>
Message-ID: <VI1PR0201MB2479F3C817CFDA97B7FF9AE59A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi,

I agree with Tobias. " by the rules of C1" doesn't add any useful information IMHO.
But I'd be fine with just adding "inline" to avoid the confusion.

> Anyway, with your change, all callers now pass a msg argument and therefore the null
> handling logic should be removed (replaced by an assert). Also, the default value for
> the argument can be removed.
+1

Best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Tobias Hartmann
> Sent: Donnerstag, 28. November 2019 15:04
> To: Liu Xin <navy.xliu at gmail.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully
> 
> Hi,
> 
> I'm still not convinced that this message adds any useful information, but let's see
> what other reviewers think.
> 
> Anyway, with your change, all callers now pass a msg argument and therefore the null
> handling logic should be removed (replaced by an assert). Also, the default value for
> the argument can be removed.
> 
> Best regards,
> Tobias
> 
> On 22.11.19 19:09, Liu Xin wrote:
> > hi, Reviewers,
> >
> > Could you review this extremely small change?
> > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541
> > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/
> >
> > When I analyzed PrintInlining output, I was confused by the inline message
> > without any detail. It's not easy for a developer to tell whether the method
> > is inlined or not. This patch adds the comment "inline by the rules of C1".
> >
> > I would like to add an explicit reason, but there's no decisive reason in
> > GraphBuilder::try_inline_full. It just passes all the restriction rules. Any
> > other suggestions would be appreciated.
> >
> > Thanks,
> > --lx
> >

From nils.eliasson at oracle.com  Thu Nov 28 15:15:40 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 28 Nov 2019 16:15:40 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <d1c6622b-e42d-412f-b32f-b9a79e3c7db1@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>
 <58fd5565-4342-ea70-511d-bace68308391@oracle.com>
 <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com>
 <d1c6622b-e42d-412f-b32f-b9a79e3c7db1@oracle.com>
Message-ID: <edaf2a22-b3cf-86e7-3745-283827f8d302@oracle.com>


On 2019-11-28 13:28, Vladimir Ivanov wrote:
>
>>> Do you see any problems with copying object header?
>>
>> It won't be copied. It's just that the runtime call expects the 
>> arguments to be pointers to the objects, and the size of the object. 
>> It's the same function that is used by a call to the native clone 
>> impl. (jvm.cpp:720)
>
> Actually the header is copied, but then cleared.
>
>
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/oops/accessBackend.inline.hpp#l363 
>

Let me correct myself:

For the intrinsic case, we are copying part of the header (the klass),
but not the markword.

For the runtime call that is used by native clone and ZGC's clone_inst,
the entire object is cloned. This seems unnecessary to me, but that is
how it is done when the intrinsic is disabled too. This could probably
be changed, but then we would need to verify it everywhere.


>
>
>>> -  if (src->bottom_type()->isa_aryptr()) {
>>> +  if (ac->is_clone_array()) {
>>>      // Clone primitive array
>>>
>>> Is the comment valid? Doesn't it cover object array case as well?
>>
>> Nope - object arrays will be handled as clone_oop_array which uses 
>> the normal object copy which already applies the appropriate load 
>> barriers.
>>
>> The special case for ZGC is the cloning of instances because we don't 
>> know where to apply load barriers without looking up the type. 
>> (Except for clone on small objects and short arrays that are 
>> transformed to a series of load-stores.)
>>
>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.04
>
> I'm looking at webrev.05:
>
> src/hotspot/share/gc/shared/c2/barrierSetC2.cpp:
>
> -  ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, src_base, NULL, dst_base, NULL, countx, true, false);
> -  ac->set_clonebasic();
> +  ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, payload_src, NULL, payload_dst, NULL, payload_size, true, false);
> +  if (is_array) {
> +    ac->set_clone_prim_array();
> +  } else {
> +    ac->set_clone_inst();
> +  }
>
> I'm looking at LibraryCallKit::inline_native_clone():
>
>
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/library_call.cpp#l4323 
>
>
> It looks like object arrays are filtered out only if 
> array_copy_requires_gc_barriers() == true:
>
>
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/library_call.cpp#l4333 
>
>
> But your change sets set_clone_prim_array() irrespective of whether 
> object arrays are specially treated or not.
>
> It looks like a naming problem, but still.

Yes, you are right: for GCs without barriers on oop arrays, the naming
will be off.

The right thing to do would probably be to have one state for inst or 
array, and another state for what type of array.

And it gets more complicated: at all the places where
is_clone_basic()/is_clone_inst_or_prim_array() is used, different GCs
would need different answers. ZGC can treat primitive and oop arrays the
same until it chooses the right type of arraycopy, while other GCs that
expand barriers early can't do that.

After talking to Per I suggest to revert is_clone_inst_or_prim_array() 
to is_clonebasic() and give it a proper comment.

I will need to revisit this and continue to simplify things and clean it up.

http://cr.openjdk.java.net/~neliasso/8234520/webrev.06/

One small diff - src/hotspot/share/opto/arraycopynode.cpp only had a 
formatting change - so I have reverted it completely.

// Nils


>
> PS: and extract_base_offset is still there:
>
> -void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* 
> size, bool is_array) const {
> +int BarrierSetC2::extract_base_offset(bool is_array) {

Ah, good, I missed that one when merging with Per's stuff.

Thanks for your feedback.

/ Nils


>
> Best regards,
> Vladimir Ivanov
>
>
>>
>> Thank you for the feedback!
>>
>> // Nils
>>
>>>
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>>
>>>> Regards,
>>>>
>>>> Nils
>>>>
>>>>
>>>> On 2019-11-21 12:53, Nils Eliasson wrote:
>>>>> I updated this to version 2.
>>>>>
>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>>>>>
>>>>> I found a problem running
>>>>> compiler/arguments/TestStressReflectiveCode.java
>>>>>
>>>>> Even though the clone was created as an oop clone, the node's
>>>>> type returns isa_aryptr. This is caused by the src ptr not being
>>>>> the base pointer. Until I fix that, I wanted a more robust test.
>>>>>
>>>>> In this webrev I split up the is_clonebasic into is_clone_oop and 
>>>>> is_clone_array. (is_clone_oop_array is already there). Having a 
>>>>> complete set with the three clone types allows for a robust test 
>>>>> and easy verification. (The three variants end up in different 
>>>>> paths with different GCs).
>>>>>
>>>>> Regards,
>>>>>
>>>>> Nils
>>>>>
>>>>>
>>>>> On 2019-11-20 15:25, Nils Eliasson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I found a few bugs after the enabling of the clone intrinsic in ZGC.
>>>>>>
>>>>>> 1) The arraycopy clone_basic has the parameters adjusted to work 
>>>>>> as a memcopy. For an oop the src is pointing inside the oop to 
>>>>>> where we want to start copying. But when we want to do a runtime 
>>>>>> call to clone - the parameters are supposed to be the actual src 
>>>>>> oop and dst oop, and the size should be the instance size.
>>>>>>
>>>>>> For now I have made a workaround. What should be done later is 
>>>>>> using the offset in the arraycopy node to encode where the 
>>>>>> payload is, so that the base pointers are always correct. But 
>>>>>> that would require changes to the BarrierSet classes of all GCs. 
>>>>>> So I leave that for next release.
>>>>>>
>>>>>> 2) The size parameter of the TypeFunc for the runtime call has
>>>>>> the wrong type. It was originally Long but missed the upper half;
>>>>>> it was then changed to INT (JDK-8233834), but that is wrong and causes
>>>>>> the compiles to be skipped. We didn't notice that since they
>>>>>> failed silently. That is also why we didn't notice problem #1.
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>>>>>
>>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>>>>>
>>>>>> Please review!
>>>>>>
>>>>>> Nils
>>>>>>

From christoph.goettschkes at microdoc.com  Thu Nov 28 15:17:14 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Thu, 28 Nov 2019 16:17:14 +0100
Subject: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails for
 client VMs due to Unrecognized VM option EliminateLocks
In-Reply-To: <719a17e8-2710-a890-54df-e80714385b4d@oracle.com>
References: <20191127095912.A371B11C5C1@aojmv0009>
 <719a17e8-2710-a890-54df-e80714385b4d@oracle.com>
Message-ID: <mailman.7.1574954334.28664.hotspot-compiler-dev@openjdk.java.net>

Hi Tobias,

thanks for the review. I created the changeset:

https://cr.openjdk.java.net/~cgo/8234894/webrev.01/jdk-jdk.changeset

Could you please sponsor this change for me and commit it into the 
repository?

Thanks,
Christoph

Tobias Hartmann <tobias.hartmann at oracle.com> wrote on 2019-11-28 15:23:12:

> From: Tobias Hartmann <tobias.hartmann at oracle.com>
> To: christoph.goettschkes at microdoc.com, hotspot-compiler-dev at openjdk.java.net
> Date: 2019-11-28 15:23
> Subject: Re: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails 
> for client VMs due to Unrecognized VM option EliminateLocks
> 
> Hi Christoph,
> 
> looks good to me.
> 
> Best regards,
> Tobias
> 
> On 27.11.19 10:57, christoph.goettschkes at microdoc.com wrote:
> > Hi,
> > 
> > please review the following small changeset which fixes the test
> > test/hotspot/jtreg/compiler/escapeAnalysis/TestEliminateLocksOffCrash.java
> > for client VMs.
> > I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since
> > the original bug is for C2 only. Also, the flag EliminateLocks is only
> > defined in c2_globals.hpp, and neither for C1, nor for JVMCI.
> > 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234894
> > Webrev: https://cr.openjdk.java.net/~cgo/8234894/webrev.00
> > 
> > Bug which introduced the issue:
> > https://bugs.openjdk.java.net/browse/JDK-8227384
> > 
> > Thanks,
> > Christoph
> > 
> 


From tobias.hartmann at oracle.com  Thu Nov 28 15:20:52 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 28 Nov 2019 16:20:52 +0100
Subject: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails for
 client VMs due to Unrecognized VM option EliminateLocks
In-Reply-To: <2wjg9h0fwy-1@userp2020.oracle.com>
References: <20191127095912.A371B11C5C1@aojmv0009>
 <719a17e8-2710-a890-54df-e80714385b4d@oracle.com>
 <2wjg9h0fwy-1@userp2020.oracle.com>
Message-ID: <981a697d-77c9-1c1d-bdd3-3d057ec196b7@oracle.com>

Sure, pushed.

Best regards,
Tobias

On 28.11.19 16:17, christoph.goettschkes at microdoc.com wrote:
> Hi Tobias,
> 
> thanks for the review. I created the changeset:
> 
> https://cr.openjdk.java.net/~cgo/8234894/webrev.01/jdk-jdk.changeset
> 
> Could you please sponsor this change for me and commit it into the 
> repository?
> 
> Thanks,
> Christoph
> 
> Tobias Hartmann <tobias.hartmann at oracle.com> wrote on 2019-11-28 15:23:12:
> 
>> From: Tobias Hartmann <tobias.hartmann at oracle.com>
>> To: christoph.goettschkes at microdoc.com, hotspot-compiler-dev at openjdk.java.net
>> Date: 2019-11-28 15:23
>> Subject: Re: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails 
>> for client VMs due to Unrecognized VM option EliminateLocks
>>
>> Hi Christoph,
>>
>> looks good to me.
>>
>> Best regards,
>> Tobias
>>
>> On 27.11.19 10:57, christoph.goettschkes at microdoc.com wrote:
>>> Hi,
>>>
>>> please review the following small changeset which fixes the test
>>> test/hotspot/jtreg/compiler/escapeAnalysis/TestEliminateLocksOffCrash.java
>>> for client VMs.
>>> I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since
>>> the original bug is for C2 only. Also, the flag EliminateLocks is only
>>> defined in c2_globals.hpp, and neither for C1, nor for JVMCI.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234894
>>> Webrev: https://cr.openjdk.java.net/~cgo/8234894/webrev.00
>>>
>>> Bug which introduced the issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8227384
>>>
>>> Thanks,
>>> Christoph
>>>
>>
> 

From vladimir.x.ivanov at oracle.com  Thu Nov 28 15:47:40 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 28 Nov 2019 18:47:40 +0300
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <edaf2a22-b3cf-86e7-3745-283827f8d302@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>
 <58fd5565-4342-ea70-511d-bace68308391@oracle.com>
 <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com>
 <d1c6622b-e42d-412f-b32f-b9a79e3c7db1@oracle.com>
 <edaf2a22-b3cf-86e7-3745-283827f8d302@oracle.com>
Message-ID: <63464ec9-c7fb-24b7-bf50-65b605087451@oracle.com>


> http://cr.openjdk.java.net/~neliasso/8234520/webrev.06/

Looks good.

src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp

+#include "opto/castnode.hpp"

Unused?

Best regards,
Vladimir Ivanov

>>>>> On 2019-11-21 12:53, Nils Eliasson wrote:
>>>>>> I updated this to version 2.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>>>>>>
>>>>>> I found a problem running
>>>>>> compiler/arguments/TestStressReflectiveCode.java
>>>>>>
>>>>>> Even though the clone was created as an oop clone, the node's
>>>>>> type returns isa_aryptr. This is caused by the src ptr not being
>>>>>> the base pointer. Until I fix that, I wanted a more robust test.
>>>>>>
>>>>>> In this webrev I split up the is_clonebasic into is_clone_oop and 
>>>>>> is_clone_array. (is_clone_oop_array is already there). Having a 
>>>>>> complete set with the three clone types allows for a robust test 
>>>>>> and easy verification. (The three variants end up in different 
>>>>>> paths with different GCs).
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Nils
>>>>>>
>>>>>>
>>>>>> On 2019-11-20 15:25, Nils Eliasson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I found a few bugs after enabling the clone intrinsic in ZGC.
>>>>>>>
>>>>>>> 1) The arraycopy clone_basic has its parameters adjusted to work 
>>>>>>> as a memcpy. For an oop, the src points inside the oop, to 
>>>>>>> where we want to start copying. But when we want to do a runtime 
>>>>>>> call to clone, the parameters are supposed to be the actual src 
>>>>>>> oop and dst oop, and the size should be the instance size.
>>>>>>>
>>>>>>> For now I have made a workaround. What should be done later is 
>>>>>>> using the offset in the arraycopy node to encode where the 
>>>>>>> payload is, so that the base pointers are always correct. But 
>>>>>>> that would require changes to the BarrierSet classes of all GCs, 
>>>>>>> so I leave that for the next release.
>>>>>>>
>>>>>>> 2) The size parameter of the TypeFunc for the runtime call has 
>>>>>>> the wrong type. It was originally Long but missed the upper half; 
>>>>>>> it was then changed to INT (JDK-8233834), but that is wrong and 
>>>>>>> causes the compiles to be skipped. We didn't notice because they 
>>>>>>> failed silently. That is also why we didn't notice problem #1.
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>>>>>>
>>>>>>> Please review!
>>>>>>>
>>>>>>> Nils
>>>>>>>

From nils.eliasson at oracle.com  Thu Nov 28 16:35:39 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 28 Nov 2019 17:35:39 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <63464ec9-c7fb-24b7-bf50-65b605087451@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com>
 <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
 <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>
 <58fd5565-4342-ea70-511d-bace68308391@oracle.com>
 <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com>
 <d1c6622b-e42d-412f-b32f-b9a79e3c7db1@oracle.com>
 <edaf2a22-b3cf-86e7-3745-283827f8d302@oracle.com>
 <63464ec9-c7fb-24b7-bf50-65b605087451@oracle.com>
Message-ID: <33a7c1bb-8a6b-5dc5-82a6-98c6508343fd@oracle.com>


On 2019-11-28 16:47, Vladimir Ivanov wrote:
>
>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.06/
>
> Looks good.
>
> src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp
>
> +#include "opto/castnode.hpp"
>
> Unused?

Yes, I'll remove that one.

Thanks!

// Nils

>
> Best regards,
> Vladimir Ivanov
>
>>>>>> On 2019-11-21 12:53, Nils Eliasson wrote:
>>>>>>> I updated this to version 2.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>>>>>>>
>>>>>>> I found a problem running 
>>>>>>> compiler/arguments/TestStressReflectiveCode.java
>>>>>>>
>>>>>>> Even though the clone was created as an oop clone, the node 
>>>>>>> type returns isa_aryptr. This is caused by the src ptr not being 
>>>>>>> the base pointer. Until I fix that I wanted a more robust test.
>>>>>>>
>>>>>>> In this webrev I split up the is_clonebasic into is_clone_oop 
>>>>>>> and is_clone_array. (is_clone_oop_array is already there). 
>>>>>>> Having a complete set with the three clone types allows for a 
>>>>>>> robust test and easy verification. (The three variants end up in 
>>>>>>> different paths with different GCs).
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Nils
>>>>>>>
>>>>>>>
>>>>>>> On 2019-11-20 15:25, Nils Eliasson wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I found a few bugs after enabling the clone intrinsic in 
>>>>>>>> ZGC.
>>>>>>>>
>>>>>>>> 1) The arraycopy clone_basic has its parameters adjusted to 
>>>>>>>> work as a memcpy. For an oop, the src points inside the 
>>>>>>>> oop, to where we want to start copying. But when we want to do a 
>>>>>>>> runtime call to clone, the parameters are supposed to be the 
>>>>>>>> actual src oop and dst oop, and the size should be the instance 
>>>>>>>> size.
>>>>>>>>
>>>>>>>> For now I have made a workaround. What should be done later is 
>>>>>>>> using the offset in the arraycopy node to encode where the 
>>>>>>>> payload is, so that the base pointers are always correct. But 
>>>>>>>> that would require changes to the BarrierSet classes of all 
>>>>>>>> GCs, so I leave that for the next release.
>>>>>>>>
>>>>>>>> 2) The size parameter of the TypeFunc for the runtime call has 
>>>>>>>> the wrong type. It was originally Long but missed the upper 
>>>>>>>> half; it was then changed to INT (JDK-8233834), but that is wrong and 
>>>>>>>> causes the compiles to be skipped. We didn't notice because 
>>>>>>>> they failed silently. That is also why we didn't notice problem 
>>>>>>>> #1.
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234520
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/
>>>>>>>>
>>>>>>>> Please review!
>>>>>>>>
>>>>>>>> Nils
>>>>>>>>
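Point 1 above can be pictured with a simplified standalone sketch. It assumes a memcpy-style description of an instance clone (payload pointers, a payload size, and a header offset; all names are hypothetical, not HotSpot's) and converts it into the shape a runtime clone call expects: the base pointers of both objects and the whole instance size. In line with point 2, the size stays a 64-bit value instead of being narrowed to int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Memcpy-style description: pointers aim past the object header, at the
// first byte to copy. Hypothetical layout for illustration only.
struct PayloadCopy {
  const std::uint8_t* src_payload;
  std::uint8_t*       dst_payload;
  std::size_t         payload_bytes;  // bytes after the header
  std::size_t         header_bytes;   // offset of the payload in the object
};

// Arguments in the shape a runtime clone call expects.
struct CloneCallArgs {
  const std::uint8_t* src_obj;         // base of the source object
  std::uint8_t*       dst_obj;         // base of the destination object
  std::uint64_t       instance_bytes;  // whole instance size, kept 64-bit
};

CloneCallArgs to_clone_call(const PayloadCopy& c) {
  // Step back over the header to recover the base pointers, and add the
  // header back into the size so it covers the entire instance.
  return CloneCallArgs{c.src_payload - c.header_bytes,
                       c.dst_payload - c.header_bytes,
                       static_cast<std::uint64_t>(c.header_bytes + c.payload_bytes)};
}
```

The longer-term fix described above, encoding the payload offset in the arraycopy node, would make this back-conversion unnecessary because the base pointers would always be kept.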

From john.r.rose at oracle.com  Thu Nov 28 20:23:26 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 28 Nov 2019 12:23:26 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
Message-ID: <C5533D91-B35E-4956-BD67-41FDEC505F74@oracle.com>

On Nov 27, 2019, at 11:34 PM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> 
> In the end, I wonder whether we should have two kinds of APIs, or a
> parameter, distinguishing between "next power of 2" and "next power of 2
> unless input value is already power of 2".

Naming is important for clarity.  "Round up" means if it's already "rounded"
(whatever that means) the input is returned unchanged.

The other notion is a true "next up", because it always increases,
barring overflow.  The possibility of overflow makes the "next up" function
more bug-prone than the "round up" function.

The usual trick for deriving that second function is to add one to the argument
to the first function, ensuring that the result will always increase.

If (for clarity) we implement a "next power of two" function, rather than ask
coders to use the +1 trick, the second function should be implemented in terms of
the first function using the +1 trick, maybe with an assert added against overflow.
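A minimal sketch of the two notions for 32-bit unsigned values, assuming the common bit-smearing implementation of round-up. The function names are illustrative and are not claimed to be an existing HotSpot API. next_power_of_2 is derived from round_up_power_of_2 with the +1 trick and asserts against overflow, as suggested above.

```cpp
#include <cassert>
#include <cstdint>

// "Round up": returns x unchanged if it is already a power of two.
// Precondition: 0 < x <= 2^31, otherwise the result does not fit.
std::uint32_t round_up_power_of_2(std::uint32_t x) {
  assert(x > 0 && x <= (UINT32_C(1) << 31));
  x--;              // so that exact powers of two map to themselves
  x |= x >> 1;      // smear the highest set bit into all lower bits
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return x + 1;
}

// "Next up": always strictly greater than x, derived with the +1 trick.
std::uint32_t next_power_of_2(std::uint32_t x) {
  assert(x < (UINT32_C(1) << 31) && "next_power_of_2 would overflow");
  return round_up_power_of_2(x + 1);
}
```

For example, round_up_power_of_2(8) stays 8 while next_power_of_2(8) gives 16.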

My $0.02.

John


From claes.redestad at oracle.com  Thu Nov 28 21:02:56 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 28 Nov 2019 22:02:56 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <C5533D91-B35E-4956-BD67-41FDEC505F74@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
 <C5533D91-B35E-4956-BD67-41FDEC505F74@oracle.com>
Message-ID: <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com>

I'm working on a new version, but I'm also out sick, so don't expect
anything soon.

I just want to point out that the "round up to power of 2"
implementations I've seen seem prone to the same kind of overflows as a
next up would, just not for exactly the same set of inputs.
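To illustrate that point, here is an unchecked 32-bit bit-smearing round-up and a +1-based next-up (illustrative names, asserts deliberately omitted). Both wrap to 0 on overflow, but on different inputs: round-up fails for 0 and for anything above 2^31, while next-up fails for every input from 2^31 upward.

```cpp
#include <cassert>
#include <cstdint>

// Unchecked round-up by bit smearing: the result wraps to 0 on overflow.
std::uint32_t round_up_pow2_unchecked(std::uint32_t x) {
  x--;             // an input of 0 wraps to 0xFFFFFFFF here
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return x + 1;    // 0xFFFFFFFF + 1 wraps to 0
}

// Unchecked next-up via the +1 trick.
std::uint32_t next_pow2_unchecked(std::uint32_t x) {
  return round_up_pow2_unchecked(x + 1);
}
```

So either variant needs an explicit precondition check; the ranges that must be rejected just differ.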

/Claes

On 2019-11-28 21:23, John Rose wrote:
> On Nov 27, 2019, at 11:34 PM, Thomas Stüfe <thomas.stuefe at gmail.com 
> <mailto:thomas.stuefe at gmail.com>> wrote:
>>
>> In the end, I wonder whether we should have two kinds of APIs, or a
>> parameter, distinguishing between "next power of 2" and "next power of 2
>> unless input value is already power of 2".
> 
> Naming is important for clarity.  "Round up" means if it's already "rounded"
> (whatever that means) the input is returned unchanged.
> 
> The other notion is a true "next up", because it always increases,
> barring overflow.  The possibility of overflow makes the "next up" function
> more bug-prone than the "round up" function.
> 
> The usual trick for deriving that second function is to add one to the argument
> to the first function, ensuring that the result will always increase.
> 
> If (for clarity) we implement a "next power of two" function, rather than ask
> coders to use the +1 trick, the second function should be implemented in terms of
> the first function using the +1 trick, maybe with an assert added against overflow.
> 
> My $0.02.
> 
> John
> 

From Pengfei.Li at arm.com  Fri Nov 29 03:41:56 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Fri, 29 Nov 2019 03:41:56 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com>
 <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com>
Message-ID: <DB7PR08MB31153A896B02FC1F39C498C196460@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew,

I just caught up with your discussion with Nick.

> I guess. It'd be nicer to fix CDS on AArch64 so that it doesn't cause
> performance regressions.

The 4G alignment search may still fail after the fix. Regarding my second webrev, do you agree with checking the base and shift values in a function in aarch64.ad? See my last email [1] for a detailed explanation.

[1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2019-November/008278.html

--
Thanks,
Pengfei


From nick.gasson at arm.com  Fri Nov 29 06:40:23 2019
From: nick.gasson at arm.com (Nick Gasson)
Date: Fri, 29 Nov 2019 14:40:23 +0800
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <DB7PR08MB31153A896B02FC1F39C498C196460@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com>
 <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com>
 <DB7PR08MB31153A896B02FC1F39C498C196460@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <f1d7906d-9a1e-1c35-5909-02397c651e2b@arm.com>

On 29/11/2019 11:41, Pengfei Li (Arm Technology China) wrote:
> 
> The 4G alignment search may still fail after the fix. Regarding my second webrev, do you agree with checking the base and shift values in a function in aarch64.ad? See my last email [1] for a detailed explanation.
> 

How about we exit with a fatal error if we can't find a suitably aligned 
region? Then we can remove the code in decode_klass_non_null that uses 
R27 and this patch is much simpler. That code path is poorly tested at 
the moment so it seems risky to leave it in. With a hard error at least 
users will report it to us so we can fix it.
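Why an aligned region helps can be shown with plain arithmetic. The sketch below assumes the case where the narrow klass base is 4 GB aligned and the shift is 0 (the function names are hypothetical). Then the low 32 bits of the base are zero, so base + narrow equals base | narrow, and the upper half of the base could be materialized as an immediate (on AArch64, for example, with movk) rather than kept in a reserved register such as R27. How HotSpot's decode_klass_non_null actually selects its instruction sequence is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kFourGig = UINT64_C(1) << 32;

bool is_4g_aligned(std::uint64_t base) {
  return (base & (kFourGig - 1)) == 0;
}

// General decode: needs the full base value plus an add (and a shift).
std::uint64_t decode_general(std::uint32_t narrow, std::uint64_t base, unsigned shift) {
  return base + (static_cast<std::uint64_t>(narrow) << shift);
}

// Aligned decode: with shift 0 and a 4G-aligned base the two halves never
// overlap, so the add degenerates into inserting the base's upper bits.
std::uint64_t decode_aligned(std::uint32_t narrow, std::uint64_t base) {
  assert(is_4g_aligned(base) && "requires a 4G-aligned base and shift 0");
  return base | narrow;
}
```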

Thanks,
Nick

From navy.xliu at gmail.com  Fri Nov 29 07:16:57 2019
From: navy.xliu at gmail.com (Liu Xin)
Date: Thu, 28 Nov 2019 23:16:57 -0800
Subject: [XXS] C1 misses to dump a reason when it inlines successfully
In-Reply-To: <VI1PR0201MB2479F3C817CFDA97B7FF9AE59A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <CAHPdc0nHkkqNGK=_HkwCCdHwqva-Nty9K8fMQvB28dP66N1y7Q@mail.gmail.com>
 <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com>
 <VI1PR0201MB2479F3C817CFDA97B7FF9AE59A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <CAHPdc0n0XSzvUsPe3HAunxP2tNqmpu9-hka+_9UYzcusmWbpNQ@mail.gmail.com>

Hi, Tobias & Martin,

The original message didn't even say whether inlining succeeded or not. When I see
those, I have to dig into the source code. I spent even more time because I
didn't realize that those messages are from C1. That's my standpoint and
motivation as a new HotSpot developer.
What the C2 inliner emits, "inline (hot)", is not very informative either, but
at least developers know that this callee is inlined. I gave up on "by the
rules of C1". I think "inline" is fine; developers can also tell the difference
from C2's output.

I agree with you about the default parameter. Failed attempts should
always have a reason, so developers can inspect them and make adjustments.
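The agreed cleanup (every caller passes a reason, so the default argument goes away and the null handling becomes an assert) can be sketched as a standalone function. The name format_inlining and its output layout are hypothetical, loosely modeled on the sample lines in this message, and are not C1's actual printing code.

```cpp
#include <cassert>
#include <string>

// Hypothetical formatter for one PrintInlining-style line.
// Previously msg defaulted to nullptr and had to be null-checked; now the
// reason is mandatory, so a null msg indicates a caller bug.
std::string format_inlining(const std::string& callee, const char* msg) {
  assert(msg != nullptr && "every inlining decision must carry a reason");
  return callee + "   " + msg;
}
```

Dropping the default argument turns a forgotten reason into a compile error at the call site, and the assert catches a null passed explicitly.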

Here is a new revision. Could you take a look?
https://cr.openjdk.java.net/~xliu/8234541/01/webrev/

Sample output looks like this:
                              @ 1   java.lang.String::isLatin1 (19 bytes)   inline
                              @ 12   java.lang.StringLatin1::charAt (28 bytes)   inline
                                @ 15  java/lang/StringIndexOutOfBoundsException::<init> (not loaded)   not inlineable
                              @ 21  java/lang/StringUTF16::charAt (not loaded)   not inlineable
                              @ 15  java/lang/StringIndexOutOfBoundsException::<init> (not loaded)   not inlineable
                              @ 3   java.util.stream.ReduceOps::makeInt (18 bytes)   inline
                                @ 1   java.util.Objects::requireNonNull (14 bytes)   inline
                                  @ 8   java.lang.NullPointerException::<init> (5 bytes)   don't inline Throwable constructors
                                @ 14   java.util.stream.ReduceOps$6::<init> (16 bytes)   inline
                                  @ 12   java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes)   inline
                                    @ 1   java.lang.Object::<init> (1 bytes)   inline
                              @ 6   java.util.stream.AbstractPipeline::evaluate (94 bytes)   callee is too large
thanks,
--lx


On Thu, Nov 28, 2019 at 6:53 AM Doerr, Martin <martin.doerr at sap.com> wrote:

> Hi,
>
> I agree with Tobias. " by the rules of C1" doesn't add any useful
> information IMHO.
> But I'd be fine with just adding "inline" to avoid the confusion.
>
> > Anyway, with your change, all callers now pass a msg argument and
> > therefore the null handling logic
> > should be removed (replaced by an assert). Also, the default value for the
> > argument can be removed.
> +1
>
> Best regards,
> Martin
>
>
> > -----Original Message-----
> > From: hotspot-compiler-dev <hotspot-compiler-dev-
> > bounces at openjdk.java.net> On Behalf Of Tobias Hartmann
> > Sent: Thursday, 28 November 2019 15:04
> > To: Liu Xin <navy.xliu at gmail.com>; hotspot-compiler-dev at openjdk.java.net
> > Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully
> >
> > Hi,
> >
> > I'm still not convinced that this message adds any useful information but let's
> > see what other reviewers think.
> >
> > Anyway, with your change, all callers now pass a msg argument and
> > therefore the null handling logic
> > should be removed (replaced by an assert). Also, the default value for the
> > argument can be removed.
> >
> > Best regards,
> > Tobias
> >
> > On 22.11.19 19:09, Liu Xin wrote:
> > > hi, Reviewers,
> > >
> > > Could you review this extremely small change?
> > > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541
> > > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/
> > >
> > > When I analyzed PrintInlining, I was confused by the inline message without
> > > any detail. It's not easy for developers to tell if this method is inlined
> > > or not. This patch adds a comment "inline by the rules of C1".
> > >
> > > I would like to add an explicit reason, but there's no decisive reason in
> > > GraphBuilder::try_inline_full. It just passes all restriction rules. Any other
> > > suggestion would be appreciated.
> > >
> > > Thanks,
> > > --lx
> > >
>

From martin.doerr at sap.com  Fri Nov 29 08:51:58 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 29 Nov 2019 08:51:58 +0000
Subject: [XXS] C1 misses to dump a reason when it inlines successfully
In-Reply-To: <CAHPdc0n0XSzvUsPe3HAunxP2tNqmpu9-hka+_9UYzcusmWbpNQ@mail.gmail.com>
References: <CAHPdc0nHkkqNGK=_HkwCCdHwqva-Nty9K8fMQvB28dP66N1y7Q@mail.gmail.com>
 <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com>
 <VI1PR0201MB2479F3C817CFDA97B7FF9AE59A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <CAHPdc0n0XSzvUsPe3HAunxP2tNqmpu9-hka+_9UYzcusmWbpNQ@mail.gmail.com>
Message-ID: <HE1PR0201MB24757D7F9BE562BA394ADB689A460@HE1PR0201MB2475.eurprd02.prod.outlook.com>

Hi,

looks good to me. Thanks for cleaning it up.

Best regards,
Martin


From: Liu Xin <navy.xliu at gmail.com>
Sent: Friday, 29 November 2019 08:17
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully

Hi, Tobias & Martin,

The original message didn't even say whether inlining succeeded or not. When I see those, I have to dig into the source code. I spent even more time because I didn't realize that those messages are from C1. That's my standpoint and motivation as a new HotSpot developer.
What the C2 inliner emits, "inline (hot)", is not very informative either, but at least developers know that this callee is inlined. I gave up on "by the rules of C1". I think "inline" is fine; developers can also tell the difference from C2's output.

I agree with you about the default parameter. Failed attempts should always have a reason, so developers can inspect them and make adjustments.

Here is a new revision. Could you take a look?
https://cr.openjdk.java.net/~xliu/8234541/01/webrev/

Sample output looks like this:
                              @ 1   java.lang.String::isLatin1 (19 bytes)   inline
                              @ 12   java.lang.StringLatin1::charAt (28 bytes)   inline
                                @ 15  java/lang/StringIndexOutOfBoundsException::<init> (not loaded)   not inlineable
                              @ 21  java/lang/StringUTF16::charAt (not loaded)   not inlineable
                              @ 15  java/lang/StringIndexOutOfBoundsException::<init> (not loaded)   not inlineable
                              @ 3   java.util.stream.ReduceOps::makeInt (18 bytes)   inline
                                @ 1   java.util.Objects::requireNonNull (14 bytes)   inline
                                  @ 8   java.lang.NullPointerException::<init> (5 bytes)   don't inline Throwable constructors
                                @ 14   java.util.stream.ReduceOps$6::<init> (16 bytes)   inline
                                  @ 12   java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes)   inline
                                    @ 1   java.lang.Object::<init> (1 bytes)   inline
                              @ 6   java.util.stream.AbstractPipeline::evaluate (94 bytes)   callee is too large
thanks,
--lx


On Thu, Nov 28, 2019 at 6:53 AM Doerr, Martin <martin.doerr at sap.com> wrote:
Hi,

I agree with Tobias. " by the rules of C1" doesn't add any useful information IMHO.
But I'd be fine with just adding "inline" to avoid the confusion.

> Anyway, with your change, all callers now pass a msg argument and
> therefore the null handling logic
> should be removed (replaced by an assert). Also, the default value for the
> argument can be removed.
+1

Best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of Tobias Hartmann
> Sent: Thursday, 28 November 2019 15:04
> To: Liu Xin <navy.xliu at gmail.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully
>
> Hi,
>
> I'm still not convinced that this message adds any useful information but let's
> see what other reviewers think.
>
> Anyway, with your change, all callers now pass a msg argument and
> therefore the null handling logic
> should be removed (replaced by an assert). Also, the default value for the
> argument can be removed.
>
> Best regards,
> Tobias
>
> On 22.11.19 19:09, Liu Xin wrote:
> > hi, Reviewers,
> >
> > Could you review this extremely small change?
> > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541
> > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/
> >
> > When I analyzed PrintInlining, I was confused by the inline message without
> > any detail. It's not easy for developers to tell if this method is inlined
> > or not. This patch adds a comment "inline by the rules of C1".
> >
> > I would like to add an explicit reason, but there's no decisive reason in
> > GraphBuilder::try_inline_full. It just passes all restriction rules. Any other
> > suggestion would be appreciated.
> >
> > Thanks,
> > --lx
> >

From tobias.hartmann at oracle.com  Fri Nov 29 08:56:14 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 29 Nov 2019 09:56:14 +0100
Subject: [XXS] C1 misses to dump a reason when it inlines successfully
In-Reply-To: <HE1PR0201MB24757D7F9BE562BA394ADB689A460@HE1PR0201MB2475.eurprd02.prod.outlook.com>
References: <CAHPdc0nHkkqNGK=_HkwCCdHwqva-Nty9K8fMQvB28dP66N1y7Q@mail.gmail.com>
 <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com>
 <VI1PR0201MB2479F3C817CFDA97B7FF9AE59A470@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <CAHPdc0n0XSzvUsPe3HAunxP2tNqmpu9-hka+_9UYzcusmWbpNQ@mail.gmail.com>
 <HE1PR0201MB24757D7F9BE562BA394ADB689A460@HE1PR0201MB2475.eurprd02.prod.outlook.com>
Message-ID: <dfa1a816-1935-9923-aa35-4148ed3abde8@oracle.com>

Hi,

looks good to me too.

Best regards,
Tobias

On 29.11.19 09:51, Doerr, Martin wrote:
> Hi,
> 
> looks good to me. Thanks for cleaning it up.
> 
> Best regards,
> 
> Martin
> 
> *From:*Liu Xin <navy.xliu at gmail.com>
> *Sent:* Friday, 29 November 2019 08:17
> *To:* Doerr, Martin <martin.doerr at sap.com>
> *Cc:* Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> *Subject:* Re: [XXS] C1 misses to dump a reason when it inlines successfully
> 
> Hi, Tobias & Martin,
> 
> The original message didn't even say whether inlining succeeded or not. When I see those, I have to dig into the
> source code. I spent even more time because I didn't realize that those messages are from C1. That's
> my standpoint and motivation as a new HotSpot developer.
> 
> What the C2 inliner emits, "inline (hot)", is not very informative either, but at least developers know
> that this callee is inlined. I gave up on "by the rules of C1". I think "inline" is fine; developers
> can also tell the difference from C2's output.
> 
> I agree with you about the default parameter. Failed attempts should always have a reason,
> so developers can inspect them and make adjustments.
> 
> Here is a new revision. Could you take a look?
> 
> https://cr.openjdk.java.net/~xliu/8234541/01/webrev/
> 
> Sample output looks like this:
> 
>                               @ 1   java.lang.String::isLatin1 (19 bytes)   inline
>                               @ 12   java.lang.StringLatin1::charAt (28 bytes)   inline
>                                 @ 15  java/lang/StringIndexOutOfBoundsException::<init> (not loaded)   not inlineable
>                               @ 21  java/lang/StringUTF16::charAt (not loaded)   not inlineable
>                               @ 15  java/lang/StringIndexOutOfBoundsException::<init> (not loaded)   not inlineable
>                               @ 3   java.util.stream.ReduceOps::makeInt (18 bytes)   inline
>                                 @ 1   java.util.Objects::requireNonNull (14 bytes)   inline
>                                   @ 8   java.lang.NullPointerException::<init> (5 bytes)   don't inline Throwable constructors
>                                 @ 14   java.util.stream.ReduceOps$6::<init> (16 bytes)   inline
>                                   @ 12   java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes)   inline
>                                     @ 1   java.lang.Object::<init> (1 bytes)   inline
>                               @ 6   java.util.stream.AbstractPipeline::evaluate (94 bytes)   callee is too large
> 
> thanks,
> 
> --lx
> 
> On Thu, Nov 28, 2019 at 6:53 AM Doerr, Martin <martin.doerr at sap.com> wrote:
> 
>     Hi,
> 
>     I agree with Tobias. " by the rules of C1" doesn't add any useful information IMHO.
>     But I'd be fine with just adding "inline" to avoid the confusion.
> 
>     > Anyway, with your change, all callers now pass a msg argument and
>     > therefore the null handling logic
>     > should be removed (replaced by an assert). Also, the default value for the
>     > argument can be removed.
>     +1
> 
>     Best regards,
>     Martin
> 
> 
>     > -----Original Message-----
>     > From: hotspot-compiler-dev <hotspot-compiler-dev-
>     > bounces at openjdk.java.net> On Behalf Of Tobias Hartmann
>     > Sent: Thursday, 28 November 2019 15:04
>     > To: Liu Xin <navy.xliu at gmail.com>; hotspot-compiler-dev at openjdk.java.net
>     > Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully
>     >
>     > Hi,
>     >
>     > I'm still not convinced that this message adds any useful information but let's
>     > see what other reviewers think.
>     >
>     > Anyway, with your change, all callers now pass a msg argument and
>     > therefore the null handling logic
>     > should be removed (replaced by an assert). Also, the default value for the
>     > argument can be removed.
>     >
>     > Best regards,
>     > Tobias
>     >
>     > On 22.11.19 19:09, Liu Xin wrote:
>     > > hi, Reviewers,
>     > >
>     > > Could you review this extremely small change?
>     > > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541
>     > > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/
>     > >
>     > > When I analyzed PrintInlining, I was confused by the inline message without
>     > > any detail. It's not easy for developers to tell if this method is inlined
>     > > or not. This patch adds a comment "inline by the rules of C1".
>     > >
>     > > I would like to add an explicit reason, but there's no decisive reason in
>     > > GraphBuilder::try_inline_full. It just passes all restriction rules. Any other
>     > > suggestion would be appreciated.
>     > >
>     > > Thanks,
>     > > --lx
>     > >
> 

From aph at redhat.com  Fri Nov 29 10:07:38 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 29 Nov 2019 10:07:38 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <DB7PR08MB31153A896B02FC1F39C498C196460@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com>
 <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com>
 <DB7PR08MB31153A896B02FC1F39C498C196460@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <8a0ae655-8544-a4fc-7551-d7634ebdaaa8@redhat.com>

On 11/29/19 3:41 AM, Pengfei Li (Arm Technology China) wrote:
> The 4G alignment search may still fail after the fix.

It may, but very unlikely.

> Regarding my second webrev, do you agree with checking the base and shift values in a function in aarch64.ad? See my last email [1] for a detailed explanation.
> 
> [1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2019-November/008278.html

Not really, no. A method should be called exactly once from the code
that does the memory allocation, and then set a flag to be read
thereafter. It is not ideal to do it from the MacroAssembler
constructor, because Assembler instances are created with very high
frequency. I don't understand why you simply can't do what I suggested.

You say

> But we have to do it in Metaspace::set_narrow_klass_base_and_shift()
> where the base and shift are finally determined and introduce new
> code block of "#ifdef AARCH64 #endif" in HotSpot shared code.

So do that, or perhaps introduce an overridable function in
AbstractAssembler which does nothing on other ports. But don't keep
executing the same logic again and again. Once base and shift are set
they never change.
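A minimal sketch of the compute-once pattern described above. A hypothetical KlassDecodeMode holder is set exactly once from wherever base and shift are finally determined, and assemblers only read the cached result. The class and mode names are illustrative, and the "low 32 bits clear and shift 0" criterion is an assumption for the example, not necessarily the condition the patch uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for a decision derived from the narrow klass base and shift.
class KlassDecodeMode {
 public:
  enum Mode { Unset, Movk, General };

  // Called exactly once, from the code that fixes base and shift.
  static void set_base_and_shift(std::uint64_t base, int shift) {
    assert(_mode == Unset && "base and shift are decided only once");
    const bool low_half_clear = (base & UINT64_C(0xffffffff)) == 0;
    _mode = (low_half_clear && shift == 0) ? Movk : General;
  }

  // Cheap read for every assembler instance; nothing is recomputed.
  static Mode mode() {
    assert(_mode != Unset && "queried before base and shift were set");
    return _mode;
  }

 private:
  static Mode _mode;
};

KlassDecodeMode::Mode KlassDecodeMode::_mode = KlassDecodeMode::Unset;
```

Because base and shift never change once set, the per-assembler cost drops to a single static load.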

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From aph at redhat.com  Fri Nov 29 10:10:07 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 29 Nov 2019 10:10:07 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27
 conditionally allocatable
In-Reply-To: <f1d7906d-9a1e-1c35-5909-02397c651e2b@arm.com>
References: <DB7PR08MB3115B98F12ACC996FE0EE4FA96760@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com>
 <DB7PR08MB311558A967C30C11C10CAADE96700@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
 <cb73db58-b00b-2e12-d64f-ae7671aa7cbe@arm.com>
 <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com>
 <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com>
 <DB7PR08MB31153A896B02FC1F39C498C196460@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <f1d7906d-9a1e-1c35-5909-02397c651e2b@arm.com>
Message-ID: <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com>

On 11/29/19 6:40 AM, Nick Gasson wrote:
> On 29/11/2019 11:41, Pengfei Li (Arm Technology China) wrote:
>> The 4G alignment search may still fail after the fix. Regarding my second webrev, do you agree with checking the base and shift values in a function in aarch64.ad? See my last email [1] for a detailed explanation.
>>
> How about we exit with a fatal error if we can't find a suitably aligned 
> region? Then we can remove the code in decode_klass_non_null that uses 
> R27 and this patch is much simpler. That code path is poorly tested at 
> the moment so it seems risky to leave it in. With a hard error at least 
> users will report it to us so we can fix it.

That is starting to sound very attractive. With a 64-bit address space I'm
finding it very hard to imagine a scenario in which we don't find a
suitable address. I think AOT-compiled code would still be OK, because it
generates different code, but we'd have to do some testing.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From boris.ulasevich at bell-sw.com  Fri Nov 29 10:15:48 2019
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Fri, 29 Nov 2019 13:15:48 +0300
Subject: RFR(S) 8234893: ARM32: build failure after JDK-8234387
In-Reply-To: <cac2fdca-773b-b43c-fbad-a42a0debc7b7@oracle.com>
References: <ba56df0b-af44-9c1a-acfa-a87e0fbe2ba3@bell-sw.com>
 <cac2fdca-773b-b43c-fbad-a42a0debc7b7@oracle.com>
Message-ID: <813c0785-01b7-0be4-a084-eb4306040fc3@bell-sw.com>

Thank you!

On 28.11.2019 11:55, Vladimir Ivanov wrote:
> Looks good.
> 
> Best regards,
> Vladimir Ivanov
> 
> On 28.11.2019 11:42, Boris Ulasevich wrote:
>> Hi,
>>
>> Please review the fix in arm.ad to address the ARM32 build issue 
>> "Ideal node missing". The fix simply adds the missing 
>> declarations: R8RegP, R9RegP, R12RegP, SPRegP. The jdk/hotspot jtreg tests 
>> pass.
>>
>> http://bugs.openjdk.java.net/browse/JDK-8234893
>> http://cr.openjdk.java.net/~bulasevich/8234893/webrev.00
>>
>> thanks,
>> Boris

From vladimir.x.ivanov at oracle.com  Fri Nov 29 14:19:52 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 29 Nov 2019 17:19:52 +0300
Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to
 missing narrowing conversion
In-Reply-To: <d317e7aa-92fa-c726-40dd-c004302a8b9f@oracle.com>
References: <d317e7aa-92fa-c726-40dd-c004302a8b9f@oracle.com>
Message-ID: <d3045fa5-e6dc-505b-6cce-7098aaff0656@oracle.com>


> http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/

Looks good.

Best regards,
Vladimir Ivanov

> 
> Writing an (integer) value to a boolean, byte, char or short field includes an implicit narrowing
> conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit field loads by caching
> and reusing the last written value. The problem is that this value is not necessarily converted to
> the field type and we end up using an incorrect value.
> 
> For example, for the field store/load in testShort, C1 emits:
>    [...]
>    0x00007f0fc582bd6c:   mov    %dx,0x12(%rsi)
>    0x00007f0fc582bd70:   mov    %rdx,%rax
>    [...]
> 
> The field load has been eliminated and the non-converted integer value (%rdx) is returned.
> 
> The fix is to emit an explicit conversion to get the correct field value after the write:
>    [...]
>    0x00007ff07982bd6c:   mov    %dx,0x12(%rsi)
>    0x00007ff07982bd70:   movswl %dx,%edx
>    0x00007ff07982bd73:   mov    %rdx,%rax
>    [...]
> 
> Thanks,
> Tobias
> 
> [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.putfield
> 
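
The semantics the fix restores can be sketched at the Java level. This is a hypothetical model: the body of testShort is not shown in the mail, so the class and method names below are invented. After a store to a short field, a load must observe the narrowed value (short) v. The movswl added by the fix sign-extends the low 16 bits, which in Java corresponds to (v << 16) >> 16.

```java
// Hypothetical model of the field store/load pair discussed above.
final class NarrowingSketch {
    short f; // a short field: putfield narrows the int operand to 16 bits

    // What a field load must observe after storing an int value.
    int storeThenLoad(int v) {
        f = (short) v;   // javac requires the cast; bytecode putfield narrows implicitly
        return f;        // widened back to int with sign extension
    }

    // Java equivalent of movswl: sign-extend the low 16 bits of v.
    static int signExtend16(int v) {
        return (v << 16) >> 16;
    }

    // What the buggy C1 code effectively returned: the unconverted int.
    static int buggyForwardedValue(int v) {
        return v;
    }
}
```

For v = 0x18000 the correct result is -32768. Forwarding the unconverted value, as the eliminated load did, yields 98304 instead.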

From vladimir.x.ivanov at oracle.com  Fri Nov 29 15:28:08 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 29 Nov 2019 18:28:08 +0300
Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for
 T_ILLEGAL type
In-Reply-To: <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com>
References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com>
 <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com>
Message-ID: <a356313c-a7e7-fd36-5a3f-cde4bf799a8c@oracle.com>

Thanks for the review, Vladimir.

> Maybe we should have a permanent guarantee() in
> TypeAryPtr::max_array_length() for all types which we don't expect to
> see, rather than a temporary assert().

Makes sense. What do you think about the following?

   http://cr.openjdk.java.net/~vlivanov/8231430/webrev.01/

Best regards,
Vladimir Ivanov

> On 11/27/19 5:54 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8231430/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8231430
>>
>> There's a memory stomp happening in max_array_length() for T_ILLEGAL 
>> type. T_ILLEGAL type arises as an element basic type for a merge of 2 
>> primitive arrays (bottom[]). max_array_length() does some input 
>> normalization (T_ILLEGAL => T_BYTE), but first it acquires a reference
>> to a cache slot which is out of bounds (T_ILLEGAL = 99 vs
>> T_CONFLICT = 19).
>>
>> I was able to reproduce the problem as a corruption of one of the OOPs 
>> in Universe::_mirrors array which happened to be put close enough to 
>> max_array_length_cache in memory.
>>
>> I propose to completely remove the cache. 
>> arrayOopDesc::max_array_length() doesn't look too expensive and the 
>> method is not used on a hot path anywhere.
>>
>> Also, I put an assert for T_VOID, T_CONFLICT, T_NARROWKLASS cases, but 
>> left the logic there (=> T_BYTE) to get more testing before removing 
>> them.
>>
>> Testing: hs-precheckin-comp, tier1-5.
>>
>> Best regards,
>> Vladimir Ivanov
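
The ordering bug and one possible shape of the fix, along the lines of the guarantee() suggestion, can be modelled in Java. This is a sketch, not HotSpot source. T_ILLEGAL = 99 and T_CONFLICT = 19 come from the mail. T_BYTE = 8, T_VOID = 14 and T_NARROWKLASS = 18 are assumed from HotSpot's BasicType enum. computeMaxLength is a hypothetical placeholder for arrayOopDesc::max_array_length(). Java bounds-checks arrays, so the out-of-range slot surfaces as an exception, whereas the C++ code silently touched neighbouring memory.

```java
// Hypothetical model of TypeAryPtr::max_array_length(); not HotSpot source.
final class MaxArrayLengthSketch {
    static final int T_BYTE = 8, T_VOID = 14, T_NARROWKLASS = 18; // assumed values
    static final int T_CONFLICT = 19, T_ILLEGAL = 99;

    // Cache with one slot per BasicType up to T_CONFLICT.
    static final int[] cache = new int[T_CONFLICT + 1];

    // Buggy shape: the slot is selected before the type is normalized.
    static int cachedBuggy(int type) {
        int slot = cache[type];                  // out of bounds for T_ILLEGAL
        int t = (type == T_ILLEGAL) ? T_BYTE : type;
        if (slot == 0) {
            slot = computeMaxLength(t);
            cache[type] = slot;
        }
        return slot;
    }

    // Shape after the fix: no cache; unexpected types fail a permanent check.
    static int maxArrayLength(int type) {
        if (type == T_ILLEGAL) {
            type = T_BYTE;                       // bottom[] is treated like byte[]
        }
        if (type == T_VOID || type == T_CONFLICT || type == T_NARROWKLASS) {
            throw new IllegalStateException("unexpected type " + type); // models guarantee()
        }
        return computeMaxLength(type);
    }

    // Hypothetical placeholder for arrayOopDesc::max_array_length(type).
    static int computeMaxLength(int type) {
        return Integer.MAX_VALUE - 2 * type;
    }
}
```

Dropping the cache removes the out-of-bounds read entirely. This is only safe because, as noted above, the underlying computation is cheap and is not on a hot path.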

From vladimir.x.ivanov at oracle.com  Fri Nov 29 15:42:14 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 29 Nov 2019 18:42:14 +0300
Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap
 unsafe accesses
Message-ID: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8226411

There were a number of fixes in C2 support for unsafe accesses recently 
which led to additional memory barriers around them. It improved 
stability, but in some cases they were redundant. One of the important use
cases which regressed is off-heap accesses [1]. The barriers around them 
are redundant because they are serialized on raw memory and don't 
intersect with any on-heap accesses.

Proposed fix skips memory barriers around unsafe accesses which are 
provably off-heap (base == NULL).
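
For context, an off-heap unsafe access is one whose base object is null, so the offset is an absolute address. The sketch below shows the kind of access whose surrounding barriers the fix would skip. It mirrors the swap in TestUnsafeOffheapSwap, but the class name is hypothetical. It also uses the public sun.misc.Unsafe rather than the jdk.internal.misc.Unsafe used by the JDK tests. Reading the theUnsafe field reflectively is a common workaround, not a supported API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch of off-heap accesses: base == null, offset == raw address.
final class OffHeapSwapSketch {
    static final Unsafe U = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Swap two ints in native memory starting at 'address'.
    static void swap(long address, int i, int j) {
        long a = address + 4L * i;
        long b = address + 4L * j;
        int tmp = U.getInt(null, a);            // null base: provably off-heap
        U.putInt(null, a, U.getInt(null, b));
        U.putInt(null, b, tmp);
    }
}
```

All three accesses go through raw memory only, which is why ordering them against on-heap accesses adds nothing.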

It (almost completely) recovers performance on the microbenchmark 
provided in JDK-8224182 [1].

Testing: tier1-6.

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8224182

From adinn at redhat.com  Fri Nov 29 15:48:45 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 29 Nov 2019 15:48:45 +0000
Subject: RFR: (T) AArch64: compiler/c2/aarch64/TestVolatilesG1.java fails with
 "Missing expected output membar_volatile..."
Message-ID: <be6441ad-e90b-55bc-00c4-7a10579a5b70@redhat.com>

Could I please have a review of the following fix to
TestVolatiles.java, which is currently broken after the commit of the fix
for JDK-8225776 (Optimize branch frequency of G1's write
post-barrier in C2).

JIRA:   https://bugs.openjdk.java.net/browse/JDK-8225776
webrev: http://cr.openjdk.java.net/~adinn/8232828/webrev.00

The test parses AArch64 PrintAssembly output for a variety of
volatile read, write and CAS operations to check that membars are added
or omitted appropriately when using, respectively, acquire/release
accesses vs unordered accesses supplemented with barriers. n.b. the test
only runs on a debug build JVM.

The current parse check expects to see G1 barrier operations between the
access and the trailing barrier + return. JDK-8225776 relocates the
barrier operations out of line after the trailing barrier + return. The
test has been updated so that the expected pattern of instructions
reflects this new order.
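
The kind of check involved can be sketched as an in-order pattern match over disassembly lines. This is not the TestVolatiles code. The helper and the placeholder tokens ("access", "membar", "ret", "g1 post-barrier") are hypothetical. They only illustrate why moving the G1 barrier out of line, after the trailing barrier and return, breaks an order-sensitive expectation.

```java
import java.util.List;

// Hypothetical in-order matcher: each expected token must appear on a line
// after the line that matched the previous token; other lines are skipped.
final class OrderedPatternSketch {
    static boolean matchesInOrder(List<String> lines, List<String> expected) {
        int next = 0;
        for (String line : lines) {
            if (next < expected.size() && line.contains(expected.get(next))) {
                next++;
            }
        }
        return next == expected.size();
    }
}
```

With output ordered as access, trailing barrier, return, out-of-line barrier, the old expectation (barrier before the trailing membar) no longer matches, while the reordered one does.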


Testing:

Before the fix the test fails. After the fix it succeeds.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From vladimir.x.ivanov at oracle.com  Fri Nov 29 15:55:43 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 29 Nov 2019 18:55:43 +0300
Subject: [14] RFR (S): 8234923: Missed call_site_target nmethod dependency for
 non-fully initialized ConstantCallSite instance
Message-ID: <7d4c2ab1-f8ec-8ccc-a442-8401a048b353@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8234923/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8234923

The fix for 8234401 is incomplete: though it does notify the JVM about the
call site target update (by calling setTargetNormal [2]), on the JVM side
the JITs skip nmethod dependencies for ConstantCallSites irrespective of
whether they are fully initialized or not. So, affected nmethods aren't 
invalidated.

The fix is to make the JITs aware of the initialization status of a
ConstantCallSite instance by inspecting its isFrozen field, and to register
the proper nmethod dependency while the call site is still being initialized.
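
The decision the fix changes can be modelled as a small predicate. This is a hypothetical Java model, not the JIT dependency code. CallSiteModel and its fields are invented for illustration, and frozen stands in for ConstantCallSite.isFrozen as described above.

```java
// Hypothetical model of whether a JIT should record a call_site_target dependency.
final class CallSiteDependencySketch {
    static final class CallSiteModel {
        final boolean isConstantCallSite;
        final boolean frozen;   // models ConstantCallSite.isFrozen

        CallSiteModel(boolean isConstantCallSite, boolean frozen) {
            this.isConstantCallSite = isConstantCallSite;
            this.frozen = frozen;
        }
    }

    // Before the fix: every ConstantCallSite was treated as immutable.
    static boolean needsDependencyBefore(CallSiteModel cs) {
        return !cs.isConstantCallSite;
    }

    // After the fix: only a fully initialized (frozen) ConstantCallSite skips it.
    static boolean needsDependencyAfter(CallSiteModel cs) {
        return !(cs.isConstantCallSite && cs.frozen);
    }
}
```

A ConstantCallSite can be observed before it is frozen, for example from the createTargetHook passed to its protected ConstantCallSite(MethodType, MethodHandle) constructor. That is the case in which the old predicate dropped the dependency.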

Testing: tier1-6

Best regards,
Vladimir Ivanov

[1] 
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html

[2] https://hg.openjdk.java.net/jdk/jdk/rev/a6e25566cb56#l1.26

From aph at redhat.com  Fri Nov 29 15:56:57 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 29 Nov 2019 15:56:57 +0000
Subject: RFR: (T) AArch64: compiler/c2/aarch64/TestVolatilesG1.java fails
 with "Missing expected output membar_volatile..."
In-Reply-To: <be6441ad-e90b-55bc-00c4-7a10579a5b70@redhat.com>
References: <be6441ad-e90b-55bc-00c4-7a10579a5b70@redhat.com>
Message-ID: <b4356e57-0c76-8097-d9c1-40a24f62317e@redhat.com>

On 11/29/19 3:48 PM, Andrew Dinn wrote:
> JIRA:   https://bugs.openjdk.java.net/browse/JDK-8225776
> webrev: http://cr.openjdk.java.net/~adinn/8232828/webrev.00
> 
> The test parses AArch64 PrintAssembly output for a variety of
> volatile read, write and CAS operations to check that membars are added
> or omitted appropriately when using, respectively, acquire/release
> accesses vs unordered accesses supplemented with barriers. n.b. the test
> only runs on a debug build JVM.

Cool, thanks. The endless game of whack-a-mole. :-)

Trouble is, there are many possible correct sequences. Still, we can't
check for them all without making this test AI-complete!

-- 
Andrew Haley  (he/him)


From rkennke at redhat.com  Fri Nov 29 15:59:26 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 29 Nov 2019 16:59:26 +0100
Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap
 unsafe accesses
In-Reply-To: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com>
References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com>
Message-ID: <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com>

Hi Vladimir,

This is cool.

Does it affect this:
https://bugs.openjdk.java.net/browse/JDK-8220714

Thanks,
Roman

> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8226411
> 
> There were a number of fixes in C2 support for unsafe accesses recently
> which led to additional memory barriers around them. It improved
> stability, but in some cases they were redundant. One of the important use
> cases which regressed is off-heap accesses [1]. The barriers around them
> are redundant because they are serialized on raw memory and don't
> intersect with any on-heap accesses.
> 
> Proposed fix skips memory barriers around unsafe accesses which are
> provably off-heap (base == NULL).
> 
> It (almost completely) recovers performance on the microbenchmark
> provided in JDK-8224182 [1].
> 
> Testing: tier1-6.
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8224182
> 


From adinn at redhat.com  Fri Nov 29 16:12:14 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 29 Nov 2019 16:12:14 +0000
Subject: RFR: (T) AArch64: compiler/c2/aarch64/TestVolatilesG1.java fails
 with "Missing expected output membar_volatile..."
In-Reply-To: <b4356e57-0c76-8097-d9c1-40a24f62317e@redhat.com>
References: <be6441ad-e90b-55bc-00c4-7a10579a5b70@redhat.com>
 <b4356e57-0c76-8097-d9c1-40a24f62317e@redhat.com>
Message-ID: <eab13fe3-27f4-e360-abee-a95834aa7b92@redhat.com>

On 29/11/2019 15:56, Andrew Haley wrote:
> On 11/29/19 3:48 PM, Andrew Dinn wrote:
>> JIRA:   https://bugs.openjdk.java.net/browse/JDK-8225776
>> webrev: http://cr.openjdk.java.net/~adinn/8232828/webrev.00
>>
>> The test parses AArch64 PrintAssembly output for a variety of
>> volatile read, write and CAS operations to check that membars are added
>> or omitted appropriately when using, respectively, acquire/release
>> accesses vs unordered accesses supplemented with barriers. n.b. the test
>> only runs on a debug build JVM.
> 
> Cool, thanks. The endless game of whack-a-mole. :-)
> 
> Trouble is, there are many possible correct sequences. Still, we can't
> check for them all without making this test AI-complete!
There are in general many correct sequences for all programs. In this
case we have a very straightforward program so there is no room for
variation in the generated code -- modulo code inline/out of line
scheduling fixes like 8232828, that is.

So, although this test in no way guarantees that every volatile/CAS will
have the correct sequence of generated/elided barriers it will reliably
check that the barrier elision/generation has not been messed up
wholesale (the more logically minded readers will already have
marshalled their quantifiers and negation operators accordingly).

Anyway, I'll take your 'cool, thanks' as a confirmation that this is
indeed trivial and also as a license to push.

regards,


Andrew Dinn


From aph at redhat.com  Fri Nov 29 16:14:08 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 29 Nov 2019 16:14:08 +0000
Subject: RFR: (T) AArch64: compiler/c2/aarch64/TestVolatilesG1.java fails
 with "Missing expected output membar_volatile..."
In-Reply-To: <eab13fe3-27f4-e360-abee-a95834aa7b92@redhat.com>
References: <be6441ad-e90b-55bc-00c4-7a10579a5b70@redhat.com>
 <b4356e57-0c76-8097-d9c1-40a24f62317e@redhat.com>
 <eab13fe3-27f4-e360-abee-a95834aa7b92@redhat.com>
Message-ID: <bd254da0-02e0-a744-1e34-7e3d989bd899@redhat.com>

On 11/29/19 4:12 PM, Andrew Dinn wrote:
> Anyway, I'll take your 'cool, thanks' as a confirmation that this is
> indeed trivial and also as a license to push.

Yes, indeed. Sorry, I hadn't intended to be vague.

-- 
Andrew Haley  (he/him)


From vladimir.x.ivanov at oracle.com  Fri Nov 29 17:18:30 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 29 Nov 2019 20:18:30 +0300
Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap
 unsafe accesses
In-Reply-To: <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com>
References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com>
 <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com>
Message-ID: <f327a08a-b74d-32b9-2caf-e1ad9145361f@oracle.com>


> Does it affect this:
> https://bugs.openjdk.java.net/browse/JDK-8220714

Good point, Roman. Proposed patch breaks the fix for JDK-8220714.

I'll investigate what happens there and come back with a revised fix.

Best regards,
Vladimir Ivanov

>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8226411
>>
>> There were a number of fixes in C2 support for unsafe accesses recently
>> which led to additional memory barriers around them. It improved
>> stability, but in some cases they were redundant. One of the important use
>> cases which regressed is off-heap accesses [1]. The barriers around them
>> are redundant because they are serialized on raw memory and don't
>> intersect with any on-heap accesses.
>>
>> Proposed fix skips memory barriers around unsafe accesses which are
>> provably off-heap (base == NULL).
>>
>> It (almost completely) recovers performance on the microbenchmark
>> provided in JDK-8224182 [1].
>>
>> Testing: tier1-6.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182
>>
> 

From vladimir.x.ivanov at oracle.com  Fri Nov 29 18:22:39 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 29 Nov 2019 21:22:39 +0300
Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap
 unsafe accesses
In-Reply-To: <f327a08a-b74d-32b9-2caf-e1ad9145361f@oracle.com>
References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com>
 <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com>
 <f327a08a-b74d-32b9-2caf-e1ad9145361f@oracle.com>
Message-ID: <0330b8f0-28c7-1372-2d18-2b5d0ced05c3@oracle.com>

Roman,

JDK-8220714 looks like a bug in Shenandoah barrier expansion.

I slightly modified the test to simplify the analysis [1].

While running the test [2] I'm seeing the following:

   127 LoadI  === 44   7 125
   160 StoreI === 44   7  91 127
    94 LoadI  === 44   7  91
   193 StoreI === 44 160 125  94

After expansion is over, it looks as follows:

   127 LoadI  ===  44 209 125
   160 StoreI ===  44 209  91 127
    94 LoadI  ===  44 160  91
   193 StoreI ===  44 160 125  94

Note that 94 LoadI depends on 160 StoreI memory now. Before the 
expansion they were independent (7 Parm == initial memory state).

And then 94 goes away, since it now reads updated value:

   <      < 94   LoadI  === _ _ _  [[]]  [3200094]
   > int    127  LoadI  ===  44  209  125  [[ 160  193 ]]
       @rawptr:BotPTR, idx=Raw; unsafe #int (does not depend only on test)
       !orig=[94] !jvms: Unsafe::getInt @ bci:3
       TestUnsafeOffheapSwap$Memory::getInt @ bci:14
       TestUnsafeOffheapSwap$Memory::swap @ bci:10
       TestUnsafeOffheapSwap::testUnsafeHelper @ bci:7

The final graph is evidently wrong:

   127 LoadI  ===  44 209 125
   160 StoreI ===  44 209  91 127
   193 StoreI ===  44 160 125 127
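
The effect of the rewired memory edge can be reproduced with a plain array standing in for raw memory. In this hypothetical model, two array slots play the role of address nodes 125 and 91. In the correct version, both loads read the initial memory state before either store, as in the graph before expansion. In the broken version, the second load is folded into the first loaded value, as in the final graph, where 193 StoreI takes 127 as its value.

```java
// Hypothetical model of the swap in testUnsafeHelper, using node numbers as labels.
final class SwapGraphSketch {
    static final int ADDR_125 = 0, ADDR_91 = 1;   // stand-ins for address nodes 125 and 91

    // Before expansion: both loads see the initial memory state (7 Parm).
    static void correctSwap(int[] mem) {
        int v127 = mem[ADDR_125];   // 127 LoadI
        int v94 = mem[ADDR_91];     // 94 LoadI
        mem[ADDR_91] = v127;        // 160 StoreI
        mem[ADDR_125] = v94;        // 193 StoreI
    }

    // After expansion: 94 reads the memory produced by 160 and folds to 127.
    static void brokenSwap(int[] mem) {
        int v127 = mem[ADDR_125];   // 127 LoadI
        mem[ADDR_91] = v127;        // 160 StoreI
        mem[ADDR_125] = v127;       // 193 StoreI now stores 127's value
    }
}
```

Starting from {1, 2}, the correct version yields {2, 1}. The broken version yields {1, 1}, so one of the swapped values is lost.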

Let me know how you want to proceed with it.

Best regards,
Vladimir Ivanov

[1]

diff --git 
a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java 
b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java
--- a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java
+++ b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java
@@ -57,6 +57,16 @@
          }
      }

+    static void testUnsafeHelper(int i) {
+        mem.swap(i - 1, i);
+    }
+
+    static void testArrayHelper(int[] arr, int i) {
+        int tmp = arr[i - 1];
+        arr[i - 1] = arr[i];
+        arr[i] = tmp;
+    }
+
      static void test() {
          Random rnd = new Random(SEED);
          for (int i = 0; i < SIZE; i++) {
@@ -72,10 +82,8 @@
          }

          for (int i = 1; i < SIZE; i++) {
-            mem.swap(i - 1, i);
-            int tmp = arr[i - 1];
-            arr[i - 1] = arr[i];
-            arr[i] = tmp;
+            testUnsafeHelper(i);
+            testArrayHelper(arr, i);
          }

          for (int i = 0; i < SIZE; i++) {

[2] $ java -cp 
JTwork/classes/gc/shenandoah/compiler/TestUnsafeOffheapSwap.d/ 
--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED 
-XX:-UseOnStackReplacement -XX:-BackgroundCompilation 
-XX:-TieredCompilation  -XX:+UnlockExperimentalVMOptions 
-XX:+UseShenandoahGC -XX:+PrintCompilation -XX:CICompilerCount=1 
-XX:CompileCommand=quiet 
-XX:CompileCommand=compileonly,*::testUnsafeHelper 
-XX:CompileCommand=print,*::testUnsafeHelper -XX:PrintIdealGraphLevel=0 
-XX:-VerifyOops -XX:-UseCompressedOops TestUnsafeOffheapSwap


Best regards,
Vladimir Ivanov

On 29.11.2019 20:18, Vladimir Ivanov wrote:
> 
>> Does it affect this:
>> https://bugs.openjdk.java.net/browse/JDK-8220714
> 
> Good point, Roman. Proposed patch breaks the fix for JDK-8220714.
> 
> I'll investigate what happens there and come back with a revised fix.
> 
> Best regards,
> Vladimir Ivanov
> 
>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8226411
>>>
>>> There were a number of fixes in C2 support for unsafe accesses recently
>>> which led to additional memory barriers around them. It improved
>>> stability, but in some cases they were redundant. One of the important use
>>> cases which regressed is off-heap accesses [1]. The barriers around them
>>> are redundant because they are serialized on raw memory and don't
>>> intersect with any on-heap accesses.
>>>
>>> Proposed fix skips memory barriers around unsafe accesses which are
>>> provably off-heap (base == NULL).
>>>
>>> It (almost completely) recovers performance on the microbenchmark
>>> provided in JDK-8224182 [1].
>>>
>>> Testing: tier1-6.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182
>>>
>>

From rkennke at redhat.com  Fri Nov 29 19:01:54 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 29 Nov 2019 20:01:54 +0100
Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap
 unsafe accesses
In-Reply-To: <0330b8f0-28c7-1372-2d18-2b5d0ced05c3@oracle.com>
References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com>
 <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com>
 <f327a08a-b74d-32b9-2caf-e1ad9145361f@oracle.com>
 <0330b8f0-28c7-1372-2d18-2b5d0ced05c3@oracle.com>
Message-ID: <7776c872-e5a1-2dc4-4cf7-b1c733b6a314@redhat.com>

Hi Vladimir,

Oh! Now that is surprising, and weird! I will have to discuss this with
Roland and likely open a new bug for this.

Thanks for figuring this out!

Please carry on with your changes then.

Thanks,
Roman

> Roman,
> 
> JDK-8220714 looks like a bug in Shenandoah barrier expansion.
> 
> I slightly modified the test to simplify the analysis [1].
> 
> While running the test [2] I'm seeing the following:
> 
>   127 LoadI  === 44   7 125
>   160 StoreI === 44   7  91 127
>    94 LoadI  === 44   7  91
>   193 StoreI === 44 160 125  94
> 
> After expansion is over, it looks as follows:
> 
>   127 LoadI  ===  44 209 125
>   160 StoreI ===  44 209  91 127
>    94 LoadI  ===  44 160  91
>   193 StoreI ===  44 160 125  94
> 
> Note that 94 LoadI depends on 160 StoreI memory now. Before the
> expansion they were independent (7 Parm == initial memory state).
> 
> And then 94 goes away, since it now reads updated value:
> 
>    <      < 94   LoadI  === _ _ _  [[]]  [3200094]
>    > int    127  LoadI  ===  44  209  125  [[ 160  193 ]]
>        @rawptr:BotPTR, idx=Raw; unsafe #int (does not depend only on test)
>        !orig=[94] !jvms: Unsafe::getInt @ bci:3
>        TestUnsafeOffheapSwap$Memory::getInt @ bci:14
>        TestUnsafeOffheapSwap$Memory::swap @ bci:10
>        TestUnsafeOffheapSwap::testUnsafeHelper @ bci:7
> 
> The final graph is evidently wrong:
> 
>   127 LoadI  ===  44 209 125
>   160 StoreI ===  44 209  91 127
>   193 StoreI ===  44 160 125 127
> 
> Let me know how you want to proceed with it.
> 
> Best regards,
> Vladimir Ivanov
> 
> [1]
> 
> diff --git
> a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java
> b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java
> --- a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java
> +++ b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java
> @@ -57,6 +57,16 @@
>          }
>      }
>
> +    static void testUnsafeHelper(int i) {
> +        mem.swap(i - 1, i);
> +    }
> +
> +    static void testArrayHelper(int[] arr, int i) {
> +        int tmp = arr[i - 1];
> +        arr[i - 1] = arr[i];
> +        arr[i] = tmp;
> +    }
> +
>      static void test() {
>          Random rnd = new Random(SEED);
>          for (int i = 0; i < SIZE; i++) {
> @@ -72,10 +82,8 @@
>          }
>
>          for (int i = 1; i < SIZE; i++) {
> -            mem.swap(i - 1, i);
> -            int tmp = arr[i - 1];
> -            arr[i - 1] = arr[i];
> -            arr[i] = tmp;
> +            testUnsafeHelper(i);
> +            testArrayHelper(arr, i);
>          }
>
>          for (int i = 0; i < SIZE; i++) {
> 
> [2] $ java -cp
> JTwork/classes/gc/shenandoah/compiler/TestUnsafeOffheapSwap.d/
> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
> -XX:-UseOnStackReplacement -XX:-BackgroundCompilation
> -XX:-TieredCompilation  -XX:+UnlockExperimentalVMOptions
> -XX:+UseShenandoahGC -XX:+PrintCompilation -XX:CICompilerCount=1
> -XX:CompileCommand=quiet
> -XX:CompileCommand=compileonly,*::testUnsafeHelper
> -XX:CompileCommand=print,*::testUnsafeHelper -XX:PrintIdealGraphLevel=0
> -XX:-VerifyOops -XX:-UseCompressedOops TestUnsafeOffheapSwap
> 
> 
> Best regards,
> Vladimir Ivanov
> 
> On 29.11.2019 20:18, Vladimir Ivanov wrote:
>>
>>> Does it affect this:
>>> https://bugs.openjdk.java.net/browse/JDK-8220714
>>
>> Good point, Roman. Proposed patch breaks the fix for JDK-8220714.
>>
>> I'll investigate what happens there and come back with a revised fix.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8226411
>>>>
>>>> There were a number of fixes in C2 support for unsafe accesses recently
>>>> which led to additional memory barriers around them. It improved
>>>> stability, but in some cases they were redundant. One of the important use
>>>> cases which regressed is off-heap accesses [1]. The barriers around
>>>> them
>>>> are redundant because they are serialized on raw memory and don't
>>>> intersect with any on-heap accesses.
>>>>
>>>> Proposed fix skips memory barriers around unsafe accesses which are
>>>> provably off-heap (base == NULL).
>>>>
>>>> It (almost completely) recovers performance on the microbenchmark
>>>> provided in JDK-8224182 [1].
>>>>
>>>> Testing: tier1-6.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182
>>>>
>>>
> 


From john.r.rose at oracle.com  Fri Nov 29 23:30:42 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 29 Nov 2019 15:30:42 -0800
Subject: [14] RFR (S): 8234923: Missed call_site_target nmethod dependency
 for non-fully initialized ConstantCallSite instance
In-Reply-To: <7d4c2ab1-f8ec-8ccc-a442-8401a048b353@oracle.com>
References: <7d4c2ab1-f8ec-8ccc-a442-8401a048b353@oracle.com>
Message-ID: <F095375D-EF1B-4CBB-BA8D-3F8E7453A6DD@oracle.com>

Reviewed.

> On Nov 29, 2019, at 7:55 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~vlivanov/8234923/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8234923


From ioi.lam at oracle.com  Sat Nov 30 01:02:29 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 29 Nov 2019 17:02:29 -0800
Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64
In-Reply-To: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com>

Hi Pengfei,

I have cc-ed hotspot-compiler-dev at openjdk.java.net.

Please do not push the patch until someone from hotspot-compiler-dev has 
looked at it.

Many people are away due to Thanksgiving in the US.

Thanks
- Ioi

On 11/28/19 7:56 PM, Pengfei Li (Arm Technology China) wrote:
> Hi,
>
> Please help review this small fix for 64-bit client build.
>
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234791
>
> The current 64-bit client VM build fails because of errors while dumping
> the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1], which
> runs "java -Xshare:dump" after linking the JDK image. But for the Client VM
> build on 64-bit platforms, the ergonomic flag UseCompressedOops is not
> set.[2] This causes the VM to exit while checking the flags for dumping
> the shared archive.[3]
>
> This change removes the "#if defined" macro so that the shared archive dump
> succeeds in the 64-bit client build. By tracking the history of the macro,
> I found it was initially added as "#ifndef COMPILER1"[4] 10 years ago,
> when C1 did not have good support for compressed oops, and was modified to
> its current shape[5] in the implementation of tiered compilation. It should
> be safe to remove it today.
>
> This patch also fixes another client build issue on AArch64.
>
> [1] http://openjdk.java.net/jeps/341
> [2] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694
> [3] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551
> [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7
> [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56
>
> --
> Thanks,
> Pengfei
>


From john.r.rose at oracle.com  Sat Nov 30 01:28:28 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 29 Nov 2019 17:28:28 -0800
Subject: [14] RFR (L): 8234391: C2: Generic vector operands
In-Reply-To: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>
References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com>
Message-ID: <375F08F3-BE51-41F0-A20C-9443CB9F0D46@oracle.com>

Wow, very impressive work, Jatin and Vladimir.

Here's an extra review in case it helps.

In machnode.cpp I think the code would be easier to read if a common
subexpression were assigned a name "num_edges" as in similar places:

   uint num_edges = _opnds[opcnt]->num_edges();

(Alternatively, you could just increment "skipped" on the fly, which
would be OK too.)  I'm being picky because it cost me a second to verify
that the set of raw edges was processed exhaustively.  This is a task
every reader of the code will have to do.

Another nit:  There is some dissonance between the seemingly general
name postselect_cleanup and supports_generic_vector_operands, which
is a very specific name.  But they refer to the same condition.  Pick the
shorter name, maybe, and convert the longer one into a nice comment.
It's also not clear that helpers like process_mach_node and get_vector_operand
are part of postselect_cleanup.  Should they be cleanup_mach_node and
cleanup_vector_operand?  It's a nit-pick, but might help future readers.

The name get_vector_operand is particularly bad, since it suggests that
it accesses something previously computed, where in fact it transforms
the graph.  Also, IMO, ?get_? is one of those noise words which offers little
help to the reader and just takes up valuable screen space.

The name clone_generic_vector_operand is confusing; I would expect it
to be called something like [specialize,cleanup,...]_generic_vector_operand.
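
For reference, the quoted proposal describes the mapping performed by the operation being renamed: vec + Op_Vec[SDXYZ] => vec[SDXYZ], and legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]. That mapping can be modelled as follows. The enum, the method names and the 4/8/16/32/64-byte size table for VecS..VecZ are illustrative assumptions, not the C2 API.

```java
// Hypothetical model of specializing a generic vector operand by ideal register.
final class GenericVectorOperandSketch {
    enum IdealReg {
        VecS(4), VecD(8), VecX(16), VecY(32), VecZ(64);

        final int bytes;

        IdealReg(int bytes) { this.bytes = bytes; }

        // "VecX" -> "X"
        String suffix() { return name().substring(3); }
    }

    // vec + VecX => vecX, legVec + VecY => legVecY, and so on.
    static String specialize(String genericOperand, IdealReg reg) {
        if (!genericOperand.equals("vec") && !genericOperand.equals("legVec")) {
            throw new IllegalArgumentException("not a generic vector operand: " + genericOperand);
        }
        return genericOperand + reg.suffix();
    }

    // Ideal register for a vector of the given size in bytes
    // (e.g. for TEMP operands sized by max_vector_size()).
    static IdealReg forSize(int bytes) {
        for (IdealReg r : IdealReg.values()) {
            if (r.bytes == bytes) {
                return r;
            }
        }
        throw new IllegalArgumentException("unsupported vector size: " + bytes);
    }
}
```

Because the operation builds a new, specialized operand rather than copying the generic one, a name like specialize_ reads more accurately than clone_.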

Thank you for the get_vector_regmask refactor.  Relatively few readers
prefer walls of repetitive code.

There's a funny condition reported in this comment:
  // RShiftCntV/RShiftCntV report wide vector type, but VecS as ideal register.

It seems to come out of the blue sky.  Maybe add a cross-referencing comment
between that and the vshiftcnt instruction in the AD file?  (Are there asserts
that would catch similar oddities if they were to arise?  Was this one caught
via an assert?  I certainly hope it was, rather than by debugging bad code!)

On a similar note (about asserts), I'm very glad to see verify_mach_nodes.
The name is a little non-specific.  Maybe verify_after_postselect_cleanup.

I skimmed the AD files and they look good.  For a nicer experience I filtered
out the trivial changes (to vec[SDXYZ]) and used this reduced patch file:
  http://cr.openjdk.java.net/~jrose/vectors/8234391-x86.ad.reduced.patch

The diff noise from s/vec/vec1/ is unfortunate.  I suppose that's the price
of adding a new type name to the AD file.  Glad to see the problem doesn't
show up in the big file x86.ad.

Why do vector[xy]_reg_legacy and vectorz_reg_legacy get different treatments
in the change set?  I'm mainly curious here.  The vectorz_reg_vl thing is a
dynamic set (?) which is fine, but why is it needed for z and not xy?  A comment
might help.  Also, this gripe is not part of this review, but I'll make it anyway:
The very very short acronym "vl" which appears here stands for "AVX512VL",
referring to "variable length", but it bumps into the phrase "vector legacy"
with an unfortunate occasion for confusion.  Suggest "_vl" be renamed to
"_512vl" or some other more specific thing.

For the string intrinsics, there's a regular replacement of legRegS by legRegD.
That strikes me as potentially a semantic change.  I suppose the register
allocator will do the same things in both cases, and there's no spill code
generated by the JIT for such a temp.  I wonder, if the change was necessary,
how do we know that all the occurrences were correctly found and changed?
(Also, the string intrinsic macros use AVX512 instructions when available,
and in theory those would require heftier register masks.)  Can someone
comment on this, for the record, maybe even in the source code?

I'm very glad to see the duplicate operand definitions hoisted up to x86.ad.
(Cut-and-paste coding makes my skin crawl.)  Ignoring those insertions into
x86.ad, and ignoring the trivial changes of vec[X...] to vec, it turns out that
there are 140 new lines added and 160 old lines removed.  (Mostly the old
lines are vector move instructions of particular sizes.)  That's a win!

I suggest that the warning comments "(leg)Vec should be used instead" could
be a little less cryptic.  An unconditional warning like this makes the reader
wonder, "so why is it here at all?"  Maybe use a cross-reference as in:
  // Replaces (leg)vec during post-selection cleanup.  See above.

So, reviewed.  I'm relieved to see lots of combinatorial complexity disappear
from the AD files.

- John

On Nov 19, 2019, at 6:30 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8234391
> 
> Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) to generic ones.
> 
> (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD instruction merges will be handled separately.)
> 
> On a high-level it is organized as follows:
> 
>  (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;
> 
>  (2) at runtime, right after matching is over, a special pass is performed which does:
> 
>      * replaces vecOper with vec[SDXYZ] depending on mach node type
>         - vector mach nodes capture the bottom_type() of their ideal prototype;
> 
>      * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec)
>         - the matcher needs them, but they are useless to the register allocator (and may even cause additional spills);
> 
> 
>   (3) after the post-selection pass is over, all mach nodes should have fixed-size vector operands.
> 
> 
> Some details:
> 
>   (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works
> 
> 
>   (2) the new logic is guarded by a new matcher flag (Matcher::supports_generic_vector_operands), which is enabled only on x86
> 
> 
>   (3) post-selection analysis is implemented as a single pass over the graph, processing individual nodes using their own bottom_type() (for DEF operands) or their inputs' bottom_type() (for USE operands), which is an instance of TypeVect
> 
> 
>   (4) most of the analysis is cross-platform and interfaces with platform-specific code through 3 methods:
> 
>     static bool is_generic_reg2reg_move(MachNode* m);
>     // distinguishes MoveVec2Leg/MoveLeg2Vec nodes
> 
>     static bool is_generic_vector(MachOper* opnd);
>     // distinguishes vec/legVec operands
> 
>     static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg);
>     // constructs fixed-sized vector operand based on ideal reg
>     //   vec    + Op_Vec[SDXYZ] =>    vec[SDXYZ]
>     //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]
> 
> 
>   (5) TEMP operands are handled specially:
>     - TEMP uses max_vector_size() to determine what fixed-sized operand to use
>         * it is needed to cover reductions which don't produce vectors but scalars
>     - TEMP_DEF inherits fixed-sized operand type from DEF;
> 
> 
>   (6) there is a limited number of special cases for mach nodes in Matcher::get_vector_operand_helper:
> 
>       - LShiftCntV/RShiftCntV: though they report a wide vector type as Node::bottom_type(), their ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching, not ideal_reg().
> 
>       - vshiftcntimm: chain instructions that convert a scalar to a vector don't have a vector type.
> 
> 
>   (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see Matcher::get_vector_regmask)
> 
> 
>   (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands
> 
> 
>   (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD
>      - it aligns the code between x86_64.ad and x86_32.ad
>      - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0)
> 
> 
> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
> Reviewed-by: vlivanov, sviswanathan, ...
> 
> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL,
>         performance testing (SPEC* + Octane + micros / G1 + ParGC).
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html
> 
> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf
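For readers following the RFR, here is a rough standalone model of how the post-selection pass picks a fixed-size operand, per points (3) through (5) above.  This is a sketch, not HotSpot code: IdealVecReg, GenericOper, OperKind, ideal_reg_for_size, fixed_operand_name and resolve_operand are hypothetical names, and the byte sizes assume x86's VecS/VecD/VecX/VecY/VecZ = 4/8/16/32/64 bytes.  It only mirrors the "vec + Op_Vec[SDXYZ] => vec[SDXYZ]" mapping and the DEF/USE/TEMP/TEMP_DEF rules as described in the mail.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical model, not HotSpot code: stand-ins for the ideal vector
// register kinds Op_Vec[SDXYZ] (4, 8, 16, 32 and 64 bytes on x86).
enum class IdealVecReg { VecS, VecD, VecX, VecY, VecZ };

// Hypothetical stand-ins for the generic operands vec and legVec.
enum class GenericOper { Vec, LegVec };

// Role the generic operand plays on the mach node.
enum class OperKind { Def, Use, Temp, TempDef };

// Map a vector length in bytes to its ideal register kind.
IdealVecReg ideal_reg_for_size(int size_in_bytes) {
  switch (size_in_bytes) {
    case 4:  return IdealVecReg::VecS;
    case 8:  return IdealVecReg::VecD;
    case 16: return IdealVecReg::VecX;
    case 32: return IdealVecReg::VecY;
    case 64: return IdealVecReg::VecZ;
    default: std::abort();  // not a supported vector size
  }
}

// Models "vec + Op_Vec[SDXYZ] => vec[SDXYZ]" and
// "legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]" by building the operand name.
std::string fixed_operand_name(GenericOper generic, IdealVecReg reg) {
  static const char* const kSuffix[] = {"S", "D", "X", "Y", "Z"};
  std::string base = (generic == GenericOper::Vec) ? "vec" : "legVec";
  return base + kSuffix[static_cast<int>(reg)];
}

// Picks the vector size that decides the fixed-size operand:
//  - DEF and TEMP_DEF use the node's own vector type,
//  - USE uses the vector type of the input feeding the operand,
//  - TEMP uses the maximum vector size, since e.g. reductions
//    produce a scalar and have no vector type of their own.
std::string resolve_operand(GenericOper generic, OperKind kind,
                            int node_bytes, int input_bytes, int max_bytes) {
  int size = 0;
  switch (kind) {
    case OperKind::Def:
    case OperKind::TempDef: size = node_bytes;  break;
    case OperKind::Use:     size = input_bytes; break;
    case OperKind::Temp:    size = max_bytes;   break;
  }
  return fixed_operand_name(generic, ideal_reg_for_size(size));
}
```

In this model a TEMP operand resolves to vecZ whenever the maximum vector size is 64 bytes, regardless of the node's own type.  That matches the RFR's rationale for covering scalar-producing reductions, at the cost of possibly reserving a wider register than needed.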


From john.r.rose at oracle.com  Sat Nov 30 07:02:33 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 29 Nov 2019 23:02:33 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
 <C5533D91-B35E-4956-BD67-41FDEC505F74@oracle.com>
 <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com>
Message-ID: <84C1F5DD-E939-49A0-A82A-258E6E864B77@oracle.com>

On Nov 28, 2019, at 1:02 PM, Claes Redestad <claes.redestad at oracle.com> wrote:
> 
> I just want to point out that the "round up to power of 2"
> implementations I've seen seem prone to the same kind of overflows as a
> next up would, just not for exactly the same set of inputs.

Thanks; I stand corrected.
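Claes's point about overflow can be seen in the common bit-smearing form of "round up to power of 2".  The sketch below is a hypothetical helper, round_up_pow2_checked, not the utility proposed in JDK-8234331.  It assumes unsigned 32-bit inputs and uses 0 as an overflow sentinel.  Without the explicit guard, an input such as 0x80000001 smears to 0xFFFFFFFF, and adding one silently wraps to 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot's utility: returns the smallest power
// of two >= value, or 0 if that power does not fit in 32 bits.
uint32_t round_up_pow2_checked(uint32_t value) {
  if (value <= 1) return 1;                    // 2^0 covers 0 and 1
  const uint32_t max_pow2 = uint32_t{1} << 31; // largest 32-bit power of two
  if (value > max_pow2) return 0;              // result would overflow
  // Smear the highest set bit of (value - 1) into all lower bits,
  // then add one to reach the next power of two.
  uint32_t v = value - 1;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;
}
```

Returning a sentinel is only one way to surface the failure.  A VM-internal utility would more plausibly assert that the input is in range.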