From dean.long at oracle.com Fri Nov 1 08:05:27 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 1 Nov 2019 01:05:27 -0700
Subject: RFR: 8231955: ARM32: Address displacement is 0 for volatile field access because of Unsafe field access.
In-Reply-To: <2vyvfdgdqk-1@aserp2050.oracle.com>
References: <20191010143426.BA4B6319F46@aojmv0009> <20191015073212.7FCCA319074@aojmv0009> <587f6363-bbdc-da12-9e50-82acc5bc5853@oracle.com> <2vyvfdgdqk-1@aserp2050.oracle.com>
Message-ID:

On 10/31/19 2:12 AM, christoph.goettschkes at microdoc.com wrote:
>> I see now that BarrierSetC1::resolve_address() is calling
>> generate_address(), at least when access isn't patched. So now I'm
>> thinking that the address passed to
>> volatile_field_load/volatile_field_store should be correct, and the call
>> to add_large_constant() isn't necessary.
> Yes, this is correct. The LIR_Address is created by
> LIRGenerator::generate_address and has a displacement of 0.
> I attached a backtrace of the failing assert at the end of this mail.
>
> Do you think the patch makes sense and can be pushed?
> The HotSpot tier1 JTreg tests are passing with this and other patches I am
> working on applied with a debug VM.

Yes, it looks fine now that I am reminded that arm32 is using ldmia/stmia in
volatile_move_op, which means the displacement must be 0.
dl

> -- Christoph
>
> #0 0x7636b860 in LIRGenerator::add_large_constant (this=0x641ae2f0, src=0xe500b, c=0, dest=0xe900b) at src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp:166
> #1 0x7636f266 in LIRGenerator::volatile_field_load (this=0x641ae2f0, address=0x6429c970, result=0xdd093, info=0x0) at src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp:1326
> #2 0x762d9806 in BarrierSetC1::load_at_resolved (this=0x7602b1f0, access=..., result=0xdd093) at src/hotspot/share/gc/shared/c1/barrierSetC1.cpp:183
> #3 0x762d929a in BarrierSetC1::load_at (this=0x7602b1f0, access=..., result=0xdd093) at src/hotspot/share/gc/shared/c1/barrierSetC1.cpp:94
> #4 0x7635f6cc in LIRGenerator::access_load_at (this=0x641ae2f0, decorators=9127331840, type=T_LONG, base=..., offset=0xd900b, result=0xdd093, patch_info=0x0, load_emit_info=0x0) at src/hotspot/share/c1/c1_LIRGenerator.cpp:1618
> #5 0x7636133e in LIRGenerator::do_UnsafeGetObject (this=0x641ae2f0, x=0x6429a0d0) at src/hotspot/share/c1/c1_LIRGenerator.cpp:2173
> #6 0x76328bdc in UnsafeGetObject::visit (this=0x6429a0d0, v=0x641ae2f0) at src/hotspot/share/c1/c1_Instruction.hpp:2407
> #7 0x7635b2d2 in LIRGenerator::do_root (this=0x641ae2f0, instr=0x6429a0d0) at src/hotspot/share/c1/c1_LIRGenerator.cpp:373
> #8 0x7635b1f2 in LIRGenerator::block_do (this=0x641ae2f0, block=0x64299788) at src/hotspot/share/c1/c1_LIRGenerator.cpp:354
> #9 0x76337d5a in BlockList::iterate_forward (this=0x6429bf00, closure=0x641ae2f4) at src/hotspot/share/c1/c1_Instruction.cpp:921
> #10 0x76332936 in IR::iterate_linear_scan_order (this=0x642994d0, closure=0x641ae2f4) at src/hotspot/share/c1/c1_IR.cpp:1221
> #11 0x7630ed10 in Compilation::emit_lir (this=0x641ae5c0) at src/hotspot/share/c1/c1_Compilation.cpp:259
> #12 0x7630f2be in Compilation::compile_java_method (this=0x641ae5c0) at src/hotspot/share/c1/c1_Compilation.cpp:398
> #13 0x7630f566 in Compilation::compile_method (this=0x641ae5c0) at src/hotspot/share/c1/c1_Compilation.cpp:460
> #14 0x7630fabc in Compilation::Compilation (this=0x641ae5c0, compiler=0x760eb610, env=0x641ae848, method=0x63d2edc8, osr_bci=-1, buffer_blob=0x73eb7448, directive=0x760cf858) at src/hotspot/share/c1/c1_Compilation.cpp:583
> #15 0x76312d6e in Compiler::compile_method (this=0x760eb610, env=0x641ae848, method=0x63d2edc8, entry_bci=-1, directive=0x760cf858) at src/hotspot/share/c1/c1_Compiler.cpp:247
> #16 0x76453704 in CompileBroker::invoke_compiler_on_method (task=0x642cfa50) at src/hotspot/share/compiler/compileBroker.cpp:2115
> #17 0x764529ba in CompileBroker::compiler_thread_loop () at src/hotspot/share/compiler/compileBroker.cpp:1800
> #18 0x7693548c in compiler_thread_entry (thread=0x6423b400, __the_thread__=0x6423b400) at src/hotspot/share/runtime/thread.cpp:3401
> #19 0x769315d4 in JavaThread::thread_main_inner (this=0x6423b400) at src/hotspot/share/runtime/thread.cpp:1917
> #20 0x769314ac in JavaThread::run (this=0x6423b400) at src/hotspot/share/runtime/thread.cpp:1900
> #21 0x7692e884 in Thread::call_run (this=0x6423b400) at src/hotspot/share/runtime/thread.cpp:398
> #22 0x768285ce in thread_native_entry (thread=0x6423b400) at src/hotspot/os/linux/os_linux.cpp:790
> #23 0x76f84568 in start_thread() from target:/usr/lib/libpthread.so.0
> #24 0x76ef8ac8 in ?? () from target:/usr/lib/libc.so.6
>

From per.liden at oracle.com Fri Nov 1 11:50:07 2019
From: per.liden at oracle.com (Per Liden)
Date: Fri, 1 Nov 2019 12:50:07 +0100
Subject: RFR(M): 8232896: ZGC: Enable C2 clone intrinsic
In-Reply-To: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com>
References: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com>
Message-ID: <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>

Hi Nils,

On 10/31/19 5:37 PM, Nils Eliasson wrote:
> Hi,
>
> This patch fixes and enables the clone intrinsic for C2 with ZGC.
>
> The main thing added is an implementation of the
> ZBarrierSetC2::clone_at_expansion method. This method handles clone
> expansion for regular objects and primitive arrays that haven't already
> been reduced by optimizations. (Oop array clones don't go through this
> path.)
>
> The code switches on the type of the source to either make a leaf call
> to a runtime clone for regular objects, or delegate to
> BarrierSetC2::clone_at_expansion for primitive arrays.
>
> Updated micro benchmark shows great gains, especially for small objects
> that now will be reduced to inlined move-store sequences.

Sweet! The speedup of micro:Clone is impressive!

>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232896
>
> Webrev: http://cr.openjdk.java.net/~neliasso/8232896/webrev.02/

My review comments are summarized in a patch on top of yours. Let me know if there's something you don't agree with. The changes are:

* Changed the type of the runtime function's size arguments from TypeInt::INT to TypeLong::LONG, to make it 64 bits, rather than 32.
* Changed the new runtime function to call HeapAccess::clone() rather than ZBarrier::clone_oop(), since ZBarrier is the backend for the access layer, not the frontend.
* Renamed clone_oop() to clone(), as we're cloning an object rather than an oop.
* Added const where applicable.
* Adjusted an include line.
* Moved the clone_at_expansion() function up in the file.

http://cr.openjdk.java.net/~pliden/8232896/webrev.pliden_review.0

I ran micro:Clone to verify my changes.

cheers,
Per

>
>
> Please review,
>
> Nils Eliasson
>

From jorn.vernee at oracle.com Fri Nov 1 15:09:51 2019
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Fri, 1 Nov 2019 16:09:51 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
Message-ID: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>

Hi,

I'd like to add PrintIdeal as a compiler directive in order to enable PrintIdeal for only a single method when combining it with the 'match' directive.
Please review the following:

Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
(Testing = tier1, manual)

As a heads-up: I'm not a committer on the jdk project, so if this sounds like a good idea, I would require a sponsor to push the changes.

Thanks,
Jorn

From Nikola.Grcevski at microsoft.com Fri Nov 1 18:26:57 2019
From: Nikola.Grcevski at microsoft.com (Nikola Grcevski)
Date: Fri, 1 Nov 2019 18:26:57 +0000
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID:

Hi Jorn and hotspot-compiler-dev!

I'm a VM engineer at Microsoft and we recently signed the OCA agreement so we can properly contribute to OpenJDK. I was recently also in need of this option, as I have been educating myself on the Ideal Graph and how the C2 optimizations work. I initially went ahead and added a very similar change in my local build to be able to log only certain methods in larger applications. However, I later discovered that a similar effect can be achieved by using the following combination of options:

1. On the main java command line you would need to add:
   -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=0

2. With the compile command (or through the .hotspot_compiler file) you can then use IGVPrintLevel to increase the level to greater than 0 on the methods you'd like, e.g.:
   option path.class_name::method_name,intx,IGVPrintLevel,2

The idea is to enable the Ideal Graph printing globally, set the global level to 0, and then control which methods get logged by setting their level through the compiler directive.

This is not as good as having a direct, properly documented option as suggested by Jorn, but it works.

Thank you, and please let me know if you think this is a valid approach and if I should add this comment to the bug report.
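The two steps above can be sketched as a small script that assembles the pieces. This is only an illustration under assumptions: the flag names and the `option` line syntax are copied from the message as written, while the helper names, the method pattern `com.example.Foo::bar`, and `app.jar` are hypothetical, and nothing here has been checked against a particular JDK build.

```python
def igv_option_line(method, level):
    """Build one per-method line in the 'option' syntax quoted in the message."""
    if level <= 0:
        # The message relies on a per-method level greater than the global 0.
        raise ValueError("level must be greater than 0")
    return f"option {method},intx,IGVPrintLevel,{level}"


def java_command(app_args):
    """Step 1: enable graph printing globally, but at level 0."""
    return ["java", "-XX:+PrintIdealGraph", "-XX:PrintIdealGraphLevel=0", *app_args]


# Hypothetical method pattern and application arguments, for illustration only.
compiler_file_text = igv_option_line("com.example.Foo::bar", 2) + "\n"
command = java_command(["-jar", "app.jar"])

print(compiler_file_text, end="")
print(" ".join(command))
```

The text in `compiler_file_text` would go into the `.hotspot_compiler` file mentioned in step 2, and `command` is the launch line from step 1.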
Nikola

From dean.long at oracle.com Fri Nov 1 22:10:24 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 1 Nov 2019 15:10:24 -0700
Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit
In-Reply-To:
References:
Message-ID: <274c3691-8e12-3425-0ee9-94da295e203f@oracle.com>

The shared changes look good to me, and I don't see any obvious problems with the cpu changes.

dl

On 10/31/19 12:01 AM, Thomas Stüfe wrote:
> Hi Martin,
>
> thanks for the review!
>
> New version:
> http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/
>
> Please find remarks inline.
>
> On Wed, Oct 30, 2019 at 1:09 PM Doerr, Martin wrote:
>
>> Hi Thomas,
>>
>> thank you for finding and fixing this issue.
>>
>> Shared code part and regression test look good to me.
>>
>> But I have a few requests and questions related to the platform code.
>>
>> Arm32:
>>
>> Breaks build. The local variable p needs a scope. Can be fixed by:
>>
>> - case T_METADATA:
>> + case T_METADATA: {
>>     // We only need, for now, comparison with NULL for metadata.
>>     ...
>>     break;
>> + }
>>
> Ouch. Fixed. Actually, I discarded my coding completely and took over the
> code from T_OBJECT to keep in line with the rest of the coding here.
>
>> S390:
>>
>> Using z_cgfi is correct, but there are equivalent instructions with
>> shorter opcode.
>>
>> For comparing to 0, z_ltgr(reg1, reg1) or z_cghi(reg1, 0) may be
>> preferred. But that's not a big deal.
>>
> Okay, I switched to cghi. ltgr sounds cool but would be difficult to
> integrate into the shared part since that one does first a move, then a
> compare.
>
>> I wonder why you have added includes to some platform files. Isn't that
>> redundant?
>>
>> "utilities/debug.hpp" comes via shared assembler.hpp.
>>
> Did this because I added asserts and the rule is that every file should
> include what it needs and not rely on other includes including it (save for
> umbrella includes like globalDefinitions.hpp). But okay, I removed the
> added includes again to keep the patch small.
>
>> I'd probably choose Unimplemented() instead of ShouldNotReachHere() for
>> non-null cases because it's not bad in general, it's just currently not
>> used and therefore not yet implemented.
>>
>> But you can keep that as it is. I'm ok with that, too.
>>
> I'd rather keep it to stay in line with the rest of the code (see e.g. the
> default: branches).
>
>> Best regards,
>>
>> Martin
>>
> Thanks Martin!
>
> ..Thomas
>
>> *From:* Thomas Stüfe
>> *Sent:* Wednesday, 30 October 2019 11:48
>> *To:* hotspot compiler
>> *Cc:* Doerr, Martin; Schmidt, Lutz <lutz.schmidt at sap.com>
>> *Subject:* RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns
>> wrong result if Klass* is aligned to 32bit
>>
>> Hi all,
>>
>> second attempt at a fix (please find first review thread here:
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-October/035608.html)
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8233019
>> webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.00/webrev/
>>
>> In short, the C1 intrinsic for jlC::isPrimitive does a compare with the Klass*
>> pointer for the class to find out if it's NULL and hence a primitive type.
>> That compare is done using a 32bit cmp and so it gives wrong results when the
>> Klass* pointer is aligned to 32bit.
>>
>> In the generator I changed the comparison constant type from intConst(0)
>> to metadataConst(0) and implemented the missing code paths for all CPUs.
>> Since on most architectures we do not seem to have a comparison with a
>> 64bit immediate (at least I could not find one) I kept the change simple
>> and only implemented comparison with NULL for now.
>>
>> I tested the fix in our nightlies (jtreg tier1, jck and others) as well as
>> manually testing it.
>>
>> I did not test on aarch64 and arm though and would be thankful if someone
>> knowledgeable about these platforms could take a look.
>>
>> Thanks to Martin and Lutz for eyeballing the ppc and s390 parts.
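The failure mode described in the quoted message can be illustrated with plain integer arithmetic. The following is a minimal sketch, not HotSpot code: the helper names `cmp32_is_null` and `cmp64_is_null` and the sample addresses are made up for illustration. It models a 32-bit compare by looking only at the low 32 bits of a 64-bit pointer value, assuming "aligned to 32bit" means that those low 32 bits are all zero.

```python
MASK32 = 0xFFFFFFFF


def cmp32_is_null(ptr):
    """Model of a 32-bit compare against 0: only the low 32 bits are examined."""
    return (ptr & MASK32) == 0


def cmp64_is_null(ptr):
    """Model of a full-width (64-bit) compare against 0."""
    return ptr == 0


# Hypothetical Klass* values; the first has all-zero low 32 bits.
aligned_klass = 0x0000000800000000
ordinary_klass = 0x0000000800001000

for name, ptr in [("NULL", 0), ("aligned", aligned_klass), ("ordinary", ordinary_klass)]:
    print(f"{name:8} cmp32 -> {cmp32_is_null(ptr)}  cmp64 -> {cmp64_is_null(ptr)}")
```

In this model the narrow compare reports the non-NULL `aligned_klass` as NULL, which corresponds to isPrimitive() returning the wrong answer for such a class, while the full-width compare does not.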
>>
>> Thanks, Thomas
>>

From jorn.vernee at oracle.com Fri Nov 1 23:03:16 2019
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Sat, 2 Nov 2019 00:03:16 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To:
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID: <6e40a291-2628-9787-8b1b-92d6950cec27@oracle.com>

Hi Nikola,

Thanks for the suggestion, and welcome to OpenJDK!

The PrintIdeal option is slightly different from PrintIdealGraph + IGVPrintLevel. The latter outputs the graph in XML format (or directly to the visualizer tool), while the former calls Node::dump on the root node of the graph, which outputs a plain text representation of the graph to the console instead. IGVPrintLevel is already supported as a compiler directive, but I'd like to add PrintIdeal as well, since that doesn't require using the visualizer tool :)

In case you were unaware of the feature: I'm using the compiler directives JSON file support which was added by JEP 165 (https://openjdk.java.net/jeps/165) in Java 9. This allows me to use "-XX:CompilerDirectivesFile=compile.txt" and then have a compile.txt file with something like:

```
[
    {
        match: "main.Main::invoke",
        c2: {
            inline: "-main.Main::invoke",
            Log: true,
            PrintAssembly: true,
            PrintInlining: true,
            PrintIdeal: true
        }
    }
]
```

As a more structured way of defining the compile commands I need.

Cheers,
Jorn

From Nikola.Grcevski at microsoft.com Fri Nov 1 23:39:14 2019
From: Nikola.Grcevski at microsoft.com (Nikola Grcevski)
Date: Fri, 1 Nov 2019 23:39:14 +0000
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <6e40a291-2628-9787-8b1b-92d6950cec27@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> <6e40a291-2628-9787-8b1b-92d6950cec27@oracle.com>
Message-ID:

Hi Jorn,

This is great, thanks so much for the explanation.
I wasn't aware of the new directives format, I'll check it out.

Cheers,
Nikola

From thomas.stuefe at gmail.com Sat Nov 2 05:28:25 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Sat, 2 Nov 2019 06:28:25 +0100
Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit
In-Reply-To: <274c3691-8e12-3425-0ee9-94da295e203f@oracle.com>
References: <274c3691-8e12-3425-0ee9-94da295e203f@oracle.com>
Message-ID:

Thank you, Dean.

On Fri 1. Nov 2019 at 23:10, wrote:

> The shared changes look good to me, and I don't see any obvious problems
> with the cpu changes.
>
> dl

From dean.long at oracle.com Sat Nov 2 07:35:46 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Sat, 2 Nov 2019 00:35:46 -0700
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To:
References: <3f80fc0b-2388-d4be-3c84-4af516e9635f@oracle.com> <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com> <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com>
Message-ID: <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>

Hi Martin,

On 10/30/19 3:18 AM, Doerr, Martin wrote:
> Hi David,
>
>> I don't think factoring out CompileBroker::clear_compiler2_object when
>> it is only used once was warranted, but that's call for compiler team to
>> make.
> I did that because _compiler2_objects is private and there's currently no setter available.
> But let's see what the compiler folks think.

How about changing can_remove() to CompileBroker::can_remove()? Then you can access _compiler2_objects directly, right?

dl

>> Otherwise changes seem fine and I have noted the use of the
>> MutexUnlocker as per your direct email.
> Thanks a lot for reviewing. It was not a trivial one.
>
> You had noticed an incorrect usage of the CHECK macro. I've created a new bug for that:
> https://bugs.openjdk.java.net/browse/JDK-8233193
> It would be great if you could take a look to see if that's what you meant and make adaptations if needed.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: David Holmes
>> Sent: Wednesday, 30 October 2019 05:47
>> To: Doerr, Martin; Kim Barrett
>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com); hotspot-compiler-dev at openjdk.java.net; David Holmes
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> Hi Martin,
>>
>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
>>> Hi David and Kim,
>>>
>>> I think it's easier to talk about code.
So here's a new webrev:
>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.03/
>>
>> I don't think factoring out CompileBroker::clear_compiler2_object when
>> it is only used once was warranted, but that's a call for the compiler team to
>> make. Otherwise changes seem fine and I have noted the use of the
>> MutexUnlocker as per your direct email.
>>
>> Thanks,
>> David
>> -----
>>
>>> @Kim:
>>> Thanks for looking at the handle related parts. It's ok if you don't want to be a reviewer of the whole change.
>>>> I think it's weird that can_remove() is a predicate with optional side
>>>> effects. I think it would be simpler to have it be a pure predicate,
>>>> and have the one caller with do_it = true perform the updates. That
>>>> should include NULLing out the handle pointer (perhaps debug-only, but
>>>> it doesn't cost much to cleanly maintain the data structure).
>>> Nevertheless, it has the advantage that it enforces the update to be consistent.
>>> A caller could use it without holding the lock or mess it up otherwise.
>>> In addition, I don't want to change that as part of this fix.
>>>
>>>> So far as I can tell, THREAD == NULL here.
>>> This is a very tricky part (not my invention):
>>> EXCEPTION_MARK contains an ExceptionMark constructor call which sets __the_thread__ to Thread::current().
>>> I don't want to publish my opinion about this.
>>>
>>> @David:
>>> Seems like this option is preferred over option 3 (possibly_add_compiler_threads part of webrev.02 and leave the initialization as is).
>>> So when you're ok with it, I'll request a 2nd review from the compiler folks (I should change the subject to contain RFR).
>>> Thanks,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes
>>>> Sent: Montag, 28.
Oktober 2019 05:04
>>>> To: Kim Barrett
>>>> Cc: Doerr, Martin ; Vladimir Kozlov
>>>> (vladimir.kozlov at oracle.com) ; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>
>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes wrote:
>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
>>>>>>> Hi Kim,
>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have not seen how to allocate a global handle and add the oop later.
>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of that.
>>>>>>> So I can imagine 3 ways to implement it:
>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just added that to
>>>>>>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.01/
>>>>>>> We may want to improve that further by setting the handle pointer to NULL and asserting that it is NULL before adding the new one.
>>>>>>> I had been concerned about NULLs in the array, but looks like the existing code can deal with that.
>>>>>> I think it would be cleaner to both destroy the global handle and NULL it in the array at the same time.
>>>>>> This comment
>>>>>>
>>>>>> 325 // Old j.l.Thread object can die here.
>>>>>>
>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct->threadObj() so can't "die" until that is also cleared during the actual termination process.
>>>>> I think if there is such a thread here that it can't die, because the
>>>>> death predicate (the can_remove stuff) won't see that old thread as
>>>>> the last thread in _compiler2_objects.
That's what I meant by this:
>>>>>
>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett wrote:
>>>>>> I also think that here:
>>>>>>
>>>>>> 947 jobject thread_handle = JNIHandles::make_global(thread_oop);
>>>>>> 948 _compiler2_objects[i] = thread_handle;
>>>>>>
>>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a valid
>>>>>> assertion then I think there are other problems.
>>>>> I think either that comment about an old thread is wrong (and the NULL
>>>>> assertion I suggested is okay), or I think the whole mechanism here
>>>>> has problems. Or at least I was unable to figure out how it could work...
>>>>>
>>>> I'm not following sorry. You can't assert NULL unless it's actually set
>>>> to NULL which it presently isn't. But it could be set NULL as Martin
>>>> suggested:
>>>>
>>>> "We may want to improve that further by setting the handle pointer to
>>>> NULL and asserting that it is NULL before adding the new one."
>>>>
>>>> and which I also supported. But that aside once the delete_global has
>>>> been called that JNIHandle no longer references the j.l.Thread that it
>>>> did, at which point it is only reachable via the threadObj() of the
>>>> CompilerThread.
>>>>
>>>> David

From fujie at loongson.cn Sat Nov 2 09:29:01 2019
From: fujie at loongson.cn (Jie Fu)
Date: Sat, 2 Nov 2019 17:29:01 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
Message-ID: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>

Hi all,

May I get reviews for this small fix?

JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/

Thanks a lot.
Best regards,
Jie

From bsrbnd at gmail.com Sat Nov 2 17:18:29 2019
From: bsrbnd at gmail.com (B.
Blaser)
Date: Sat, 2 Nov 2019 18:18:29 +0100
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits
Message-ID:

Hi,

I experimented, some time ago, with an optimization of several common
flag patterns (see also JBS) using BTR/BTS instead of AND/OR
instructions on x86_64 xeon:

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Thread)
    public class BitSetAndReset {
        private static final int COUNT = 10_000;

        private static final long MASK63 = 0x8000_0000_0000_0000L;
        private static final long MASK31 = 0x0000_0000_8000_0000L;
        private static final long MASK15 = 0x0000_0000_0000_8000L;
        private static final long MASK00 = 0x0000_0000_0000_0001L;

        private long andq, orq;
        private boolean success = true;

        @TearDown(Level.Iteration)
        public void finish() {
            if (!success)
                throw new AssertionError("Failure while setting or clearing long vector bits!");
        }

        @Benchmark
        public void bitSet(Blackhole bh) {
            for (int i=0; i 28 bytes

03c     xorl    RAX, RAX        # long
03e     movq    R10, #-2147483649       # long
048     andq    [RSI + #16 (8-bit)], R10        # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
04c     movl    R10, #2147483648        # long (unsigned 32-bit)
052     orq     [RSI + #24 (8-bit)], R10        # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
056     ...
=> 26 bytes

03c     andq    [RSI + #16 (8-bit)], #-32769    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
044     orq     [RSI + #24 (8-bit)], #32768     # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
04c     ...

03c     andq    [RSI + #16 (8-bit)], #-2        # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
041     orq     [RSI + #24 (8-bit)], #1 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
046     ...

Benchmark              Mode  Cnt      Score      Error  Units
BitSetAndReset.bitSet  avgt    9  78083.773 ± 2182.692  ns/op

And we would have after:

03c     btrq    [RSI + #16 (8-bit)], log2(not(#9223372036854775807))    # long !
Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
042     btsq    [RSI + #24 (8-bit)], log2(#-9223372036854775808)        # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
048     ...
=> 12 bytes

03c     btrq    [RSI + #16 (8-bit)], log2(not(#-2147483649))    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
042     xorl    RAX, RAX        # long
044     movl    R10, #2147483648        # long (unsigned 32-bit)
04a     orq     [RSI + #24 (8-bit)], R10        # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
04e     ...
=> 18 bytes

03c     andq    [RSI + #16 (8-bit)], #-32769    # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
044     orq     [RSI + #24 (8-bit)], #32768     # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
04c     ...

03c     andq    [RSI + #16 (8-bit)], #-2        # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
041     orq     [RSI + #24 (8-bit)], #1 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
046     ...

Benchmark              Mode  Cnt      Score     Error  Units
BitSetAndReset.bitSet  avgt    9  77355.154 ± 252.503  ns/op

We see a tiny performance gain with BTR/BTS but the major interest
remains the much better encoding with up to 16 bytes saving for pure
64-bit immediates along with a lower register consumption.

Does the patch below look reasonable enough to eventually rebase and
push it to jdk/submit and to post a RFR maybe soon if all goes well?

Thanks,
Bernard

diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
--- a/src/hotspot/cpu/x86/x86_64.ad
+++ b/src/hotspot/cpu/x86/x86_64.ad
@@ -2069,6 +2069,16 @@
     }
   %}
 
+  enc_class Log2L(immPow2L imm)
+  %{
+    emit_d8(cbuf, log2_long($imm$$constant));
+  %}
+
+  enc_class Log2NotL(immPow2NotL imm)
+  %{
+    emit_d8(cbuf, log2_long(~$imm$$constant));
+  %}
+
   enc_class opc2_reg(rRegI dst)
   %{
     // BSWAP
@@ -3131,6 +3141,28 @@
   interface(CONST_INTER);
 %}
 
+operand immPow2L()
+%{
+  // n should be a pure 64-bit power of 2 immediate.
+  predicate(is_power_of_2_long(n->get_long()) && log2_long(n->get_long()) > 31);
+  match(ConL);
+
+  op_cost(15);
+  format %{ %}
+  interface(CONST_INTER);
+%}
+
+operand immPow2NotL()
+%{
+  // n should be a pure 64-bit immediate given that not(n) is a power of 2.
+  predicate(is_power_of_2_long(~n->get_long()) && log2_long(~n->get_long()) > 30);
+  match(ConL);
+
+  op_cost(15);
+  format %{ %}
+  interface(CONST_INTER);
+%}
+
 // Long Immediate zero
 operand immL0()
 %{
@@ -9740,6 +9772,19 @@
   ins_pipe(ialu_mem_imm);
 %}
 
+instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr)
+%{
+  match(Set dst (StoreL dst (AndL (LoadL dst) src)));
+  effect(KILL cr);
+
+  ins_cost(125);
+  format %{ "btrq $dst, log2(not($src))\t# long" %}
+  opcode(0x0F, 0xBA, 0x06);
+  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
+             RM_opc_mem(tertiary, dst), Log2NotL(src));
+  ins_pipe(ialu_mem_imm);
+%}
+
 // BMI1 instructions
 instruct andnL_rReg_rReg_mem(rRegL dst, rRegL src1, memory src2, immL_M1 minus_1, rFlagsReg cr) %{
   match(Set dst (AndL (XorL src1 minus_1) (LoadL src2)));
@@ -9933,6 +9978,19 @@
   ins_pipe(ialu_mem_imm);
 %}
 
+instruct btsL_mem_imm(memory dst, immPow2L src, rFlagsReg cr)
+%{
+  match(Set dst (StoreL dst (OrL (LoadL dst) src)));
+  effect(KILL cr);
+
+  ins_cost(125);
+  format %{ "btsq $dst, log2($src)\t# long" %}
+  opcode(0x0F, 0xBA, 0x05);
+  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
+             RM_opc_mem(tertiary, dst), Log2L(src));
+  ins_pipe(ialu_mem_imm);
+%}
+
 // Xor Instructions
 // Xor Register with Register
 instruct xorL_rReg(rRegL dst, rRegL src, rFlagsReg cr)

From tobias.hartmann at oracle.com Mon Nov 4 07:25:20 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Nov 2019 08:25:20 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <878spbc0c8.fsf@redhat.com>
References: <878spbc0c8.fsf@redhat.com>
Message-ID: <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>

Hi Roland,

this seems reasonable to me but I'm concerned that it might cause
performance regressions. I'll run some tests in our system.

Best regards,
Tobias

On 23.10.19 10:50, Roland Westrelin wrote:
>
> http://cr.openjdk.java.net/~roland/8232539/webrev.00/
>
> I couldn't come up with a test case because node processing order during
> IGVN matters. Bug was reported against 11 but I see no reason it
> wouldn't apply to current code as well.
>
> At parse time, predicates are added by
> Parse::maybe_add_predicate_after_if() but no loop is actually
> created. Compile::_major_progress is cleared. On the next round of IGVN,
> one input of a region points to predicates. The same region has an if as
> use that can be split through phi during IGVN. The predicates are going
> to be removed by IGVN. But that happens in multiple steps because there
> are several predicates (for reason Deoptimization::Reason_predicate,
> Deoptimization::Reason_loop_limit_check etc.) and because for each
> predicate one IGVN iteration must first remove the Opaque1 node, then
> another kill the IfFalse projection, finally another replace the IfTrue
> projection by the If control input.
>
> Split if occurs while predicates are in the process of being removed. It
> sees predicates, tries to walk over them, encounters a predicate that's
> been half removed (false projection removed) and we hit the assert/crash.
>
> I propose we simply not apply IGVN split if if we're splitting through a
> loop or if there's a predicate input to a region because:
>
> - Making split if robust to dying predicates is not straightforward as
> far as I can tell
>
> - Loop opts split if doesn't split through loop header so why would it
> make sense for IGVN split if?
>
> - I'm wondering if there are other cases where handling of predicates in
> split if could be wrong (and so more trouble ahead):
>
> + What if we split through a Loop region, predicates were added by
> loop optimizations, loop opts are now over so the predicates added at
> parse time were removed: then PhaseIdealLoop::find_predicate()
> wouldn't report a predicate but cloning predicates would still be
> required for correctness?
>
> + What if we have no loop, a region has predicates as input,
> predicates are going to die but have not yet been processed, split if
> uselessly duplicates predicates but one of them is control dependent
> on the branch it is in so cloning predicates actually causes a broken
> graph?
>
> So overall it feels safer to me to simply bail out from split if for
> loops/predicates.
>
> Roland.
>

From tobias.hartmann at oracle.com Mon Nov 4 07:33:06 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Nov 2019 08:33:06 +0100
Subject: RFR(S): 8233081: C1: PatchingStub for field access copies too much
In-Reply-To:
References:
Message-ID: <70bf20d2-a1f4-2e02-7c7a-62f986eaf7c0@oracle.com>

Hi Martin,

nice cleanup, x86/Sparc looks good to me.

Best regards,
Tobias

On 30.10.19 16:38, Doerr, Martin wrote:
> Hi,
>
> I'd like to fix an issue in C1's PatchingStub implementation for "access_field_id".
>
> We had noticed that the code in the template exceeded the 255 byte limitation when switching on
> VerifyOops on PPC64.
>
> I'd like to improve the situation for all platforms.
>
> More detailed bug description:
>
> https://bugs.openjdk.java.net/browse/JDK-8233081
>
> I need a function to determine how many bytes are needed for the NativeMovRegMem.
>
> x86 has next_instruction_address() which could in theory be used, but I noticed that it's dead code
> which is no longer correct.
>
> Is it ok to remove it?
>
> I'd also like to remove the constant instruction_size from NativeMovRegMem because it's not constant.
>
> I'd prefer to introduce num_bytes_to_end_of_patch() for the purpose of determining how many bytes to
> copy for the "access_field_id" PatchingStub.
>
> We can factor out the offset computation from offset() and set_offset() and reuse it. This enforces
> consistency.
>
> Webrev:
>
> http://cr.openjdk.java.net/~mdoerr/8233081_C1_access_field_patching/webrev.00/
>
> Please review.
>
> Best regards,
>
> Martin
>

From tobias.hartmann at oracle.com Mon Nov 4 07:38:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Nov 2019 08:38:41 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID:

Hi Jorn,

looks good to me. I can sponsor the change once you got a second review.

Best regards,
Tobias

On 01.11.19 16:09, Jorn Vernee wrote:
> Hi,
>
> I'd like to add PrintIdeal as a compiler directive in order to enable PrintIdeal for only a single
> method when combining it with the 'match' directive.
>
> Please review the following:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
> (Testing = tier1, manual)
>
> As a heads-up; I'm not a committer on the jdk project, so if this sounds like a good idea, I would
> require a sponsor to push the changes.
>
> Thanks,
> Jorn
>

From rwestrel at redhat.com Mon Nov 4 08:16:30 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 04 Nov 2019 09:16:30 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com>
Message-ID: <877e4g83a9.fsf@redhat.com>

> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/

Looks good to me.

Roland.
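
To make the PrintIdeal request in this thread concrete, here is a minimal Java sketch that writes a compiler directives file restricting PrintIdeal to a single method via 'match'. It rests on assumptions not confirmed by the thread: that the directive ends up being accepted inside the "c2" block under the name PrintIdeal, and that the VM in use actually supports it (it may be limited to debug builds). The class name `PrintIdealDirective`, the helper `directiveFor`, and the example method pattern are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: generates a directives file for -XX:CompilerDirectivesFile.
final class PrintIdealDirective {

    // Builds the directive text for one method pattern.
    // Assumption: PrintIdeal is a valid option inside the "c2" block.
    static String directiveFor(String methodPattern) {
        return "[\n"
             + "  {\n"
             + "    match: \"" + methodPattern + "\",\n"
             + "    c2: {\n"
             + "      PrintIdeal: true\n"
             + "    }\n"
             + "  }\n"
             + "]\n";
    }

    public static void main(String[] args) throws IOException {
        // Write the directive to a temporary file and show how it could be passed to the VM.
        Path file = Files.createTempFile("print_ideal", ".json");
        Files.writeString(file, directiveFor("java/lang/String.indexOf"));
        System.out.println("java -XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile="
                + file + " <main class>");
    }
}
```

The point of going through 'match' rather than a global flag is that only the selected method's ideal graph would be printed, which keeps the output manageable.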
From nils.eliasson at oracle.com Mon Nov 4 09:16:22 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 4 Nov 2019 10:16:22 +0100
Subject: RFR(M): 8232896: ZGC: Enable C2 clone intrinsic
In-Reply-To: <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>
References: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com> <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>
Message-ID:

On 2019-11-01 12:50, Per Liden wrote:
> Hi Nils,
>
> On 10/31/19 5:37 PM, Nils Eliasson wrote:
>> Hi,
>>
>> This patch fixes and enables the clone intrinsic for C2 with ZGC.
>>
>> The main thing added is an implementation of the
>> ZBarrierSetC2::clone_at_expansion method. This method handles clone
>> expansion for regular objects and primitive arrays that haven't
>> already been reduced by optimizations. (Oop array clones don't go
>> through this path.)
>>
>> The code switches on the type of the source to either make a leaf
>> call to a runtime clone for regular objects, or delegate to
>> BarrierSetC2::clone_at_expansion for primitive arrays.
>>
>> Updated micro benchmark shows great gains, especially for small
>> objects that now will be reduced to inlined move-store sequences.
>
> Sweet! The speedup of micro:Clone is impressive!
>
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232896
>>
>> Webrev: http://cr.openjdk.java.net/~neliasso/8232896/webrev.02/
>
> My review comments summarized in a patch on top of yours. Let me know
> if there's something you don't agree with. The changes are:
> * Changed the type of the runtime function's size arguments from
> TypeInt::INT to TypeLong::LONG, to make it 64 bits, rather than 32.
> * Changed the new runtime function to call HeapAccess::clone() rather
> than ZBarrier::clone_oop(), since ZBarrier is the backend for the
> access layer, not the frontend.
> * Renamed clone_oop() to clone(), as we're cloning an object rather
> than an oop.

Overall, I like your suggestions.

One problem with using "clone" is that it suggests that it will create a
new Object too (like how java.lang.Object.clone does). But this function
only copies the contents. I am open to suggestions, and even keeping the
clone name, but feel a slight bit of unease.

Thanks!

/Nils

> * Added const where applicable.
> * Adjusted an include line.
> * Moved the clone_at_expansion() function up in the file.
>
> http://cr.openjdk.java.net/~pliden/8232896/webrev.pliden_review.0
>
> I ran micro:Clone to verify my changes.
>
> cheers,
> Per
>
>>
>>
>> Please review,
>>
>> Nils Eliasson
>>

From shade at redhat.com Mon Nov 4 08:53:47 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 4 Nov 2019 09:53:47 +0100
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn>
Message-ID: <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>

On 11/2/19 10:29 AM, Jie Fu wrote:
> Hi all,
>
> May I get reviews for this small fix?
>
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/

Looks fine to me.

The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
would be more awkward than effectively using the "default" mode for minimal and zero VMs.

--
Thanks,
-Aleksey

From tobias.hartmann at oracle.com Mon Nov 4 08:55:50 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Nov 2019 09:55:50 +0100
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn> <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
Message-ID: <20c3a0ff-3b96-3fc8-f175-73eda72885cb@oracle.com>

+1

Best regards,
Tobias

On 04.11.19 09:53, Aleksey Shipilev wrote:
> On 11/2/19 10:29 AM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for this small fix?
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>
> Looks fine to me.
>
> The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>

From per.liden at oracle.com Mon Nov 4 09:26:14 2019
From: per.liden at oracle.com (Per Liden)
Date: Mon, 4 Nov 2019 10:26:14 +0100
Subject: RFR(M): 8232896: ZGC: Enable C2 clone intrinsic
In-Reply-To:
References: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com> <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>
Message-ID:

Hi Nils,

On 11/4/19 10:16 AM, Nils Eliasson wrote:
>
> On 2019-11-01 12:50, Per Liden wrote:
>> Hi Nils,
>>
>> On 10/31/19 5:37 PM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> This patch fixes and enables the clone intrinsic for C2 with ZGC.
>>>
>>> The main thing added is an implementation of the
>>> ZBarrierSetC2::clone_at_expansion method. This method handles clone
>>> expansion for regular objects and primitive arrays that haven't
>>> already been reduced by optimizations. (Oop array clones don't go
>>> through this path.)
>>>
>>> The code switches on the type of the source to either make a leaf
>>> call to a runtime clone for regular objects, or delegate to
>>> BarrierSetC2::clone_at_expansion for primitive arrays.
>>>
>>> Updated micro benchmark shows great gains, especially for small
>>> objects that now will be reduced to inlined move-store sequences.
>>
>> Sweet! The speedup of micro:Clone is impressive!
>>
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232896
>>>
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8232896/webrev.02/
>>
>> My review comments summarized in a patch on top of yours. Let me know
>> if there's something you don't agree with. The changes are:
>> * Changed the type of the runtime function's size arguments from
>> TypeInt::INT to TypeLong::LONG, to make it 64 bits, rather than 32.
>> * Changed the new runtime function to call HeapAccess::clone() rather
>> than ZBarrier::clone_oop(), since ZBarrier is the backend for the
>> access layer, not the frontend.
>> * Renamed clone_oop() to clone(), as we're cloning an object rather
>> than an oop.
>
> Overall, I like your suggestions.
>
> One problem with using "clone" is that it suggests that it will create a
> new Object too (like how java.lang.Object.clone does). But this function
> only copies the contents. I am open to suggestions, and even keeping the
> clone name, but feel a slight bit of unease.

I see your points. However, I see two reasons to keep that name:
1) In the Access API it's called Access::clone(src, dst, size), and
since ZBarrierRuntime::clone() is a bridge to that function it might be
good to use the same name.
2) The clone() function has a dst argument, implying that dst already
exists.

Objections?

cheers,
Per

>
> Thanks!
>
> /Nils
>
>> * Added const where applicable.
>> * Adjusted an include line.
>> * Moved the clone_at_expansion() function up in the file.
>>
>> http://cr.openjdk.java.net/~pliden/8232896/webrev.pliden_review.0
>>
>> I ran micro:Clone to verify my changes.
>>
>> cheers,
>> Per
>>
>>>
>>>
>>> Please review,
>>>
>>> Nils Eliasson
>>>

From fujie at loongson.cn Mon Nov 4 09:33:46 2019
From: fujie at loongson.cn (Jie Fu)
Date: Mon, 4 Nov 2019 17:33:46 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn> <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
Message-ID:

Hi Aleksey and Tobias,

Thanks for your review and valuable comments.

I'm sorry to mention that Igor is teaching me how to fix the bug off the list these days.

What do you think of this version?
  http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/

I prefer webrev.01.

Thanks a lot.
Best regards,
Jie

On 2019/11/4 4:53 PM, Aleksey Shipilev wrote:
> On 11/2/19 10:29 AM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for this small fix?
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233429
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
> Looks fine to me.
>
> The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>

From nils.eliasson at oracle.com Mon Nov 4 09:52:15 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 4 Nov 2019 10:52:15 +0100
Subject: RFR(M): 8232896: ZGC: Enable C2 clone intrinsic
In-Reply-To:
References: <2d509788-35f0-1fed-c305-d98d76583c66@oracle.com> <3e148258-8c89-afdd-bf4f-585be5e09c42@oracle.com>
Message-ID: <9f1fe58a-17b9-899a-ffd9-f743e8cb1551@oracle.com>

On 2019-11-04 10:26, Per Liden wrote:
> Hi Nils,
>
> On 11/4/19 10:16 AM, Nils Eliasson wrote:
>>
>> On 2019-11-01 12:50, Per Liden wrote:
>>> Hi Nils,
>>>
>>> On 10/31/19 5:37 PM, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> This patch fixes and enables the clone intrinsic for C2 with ZGC.
>>>>
>>>> The main thing added is an implementation of the
>>>> ZBarrierSetC2::clone_at_expansion method. This method handles clone
>>>> expansion for regular objects and primitive arrays that haven't
>>>> already been reduced by optimizations. (Oop array clones don't go
>>>> through this path.)
>>>>
>>>> The code switches on the type of the source to either make a leaf
>>>> call to a runtime clone for regular objects, or delegate to
>>>> BarrierSetC2::clone_at_expansion for primitive arrays.
>>>>
>>>> Updated micro benchmark shows great gains, especially for small
>>>> objects that now will be reduced to inlined move-store sequences.
>>>
>>> Sweet! The speedup of micro:Clone is impressive!
>>>
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232896
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8232896/webrev.02/
>>>
>>> My review comments summarized in a patch on top of yours. Let me
>>> know if there's something you don't agree with. The changes are:
>>> * Changed the type of the runtime function's size arguments from
>>> TypeInt::INT to TypeLong::LONG, to make it 64 bits, rather than 32.
>>> * Changed the new runtime function to call HeapAccess::clone()
>>> rather than ZBarrier::clone_oop(), since ZBarrier is the backend for
>>> the access layer, not the frontend.
>>> * Renamed clone_oop() to clone(), as we're cloning an object rather
>>> than an oop.
>>
>> Overall, I like your suggestions.
>>
>> One problem with using "clone" is that it suggests that it will create
>> a new Object too (like how java.lang.Object.clone does). But this
>> function only copies the contents. I am open to suggestions, and even
>> keeping the clone name, but feel a slight bit of unease.
>
> I see your points. However, I see two reasons to keep that name:
> 1) In the Access API it's called Access::clone(src, dst, size), and
> since ZBarrierRuntime::clone() is a bridge to that function it might
> be good to use the same name.
> 2) The clone() function has a dst argument, implying that dst already
> exists.
>
> Objections?
>
> cheers,
> Per

No, let's go with this.

Thanks!

// Nils

>
>>
>> Thanks!
>>
>> /Nils
>>
>>> * Added const where applicable.
>>> * Adjusted an include line.
>>> * Moved the clone_at_expansion() function up in the file.
>>>
>>> http://cr.openjdk.java.net/~pliden/8232896/webrev.pliden_review.0
>>>
>>> I ran micro:Clone to verify my changes.
>>>
>>> cheers,
>>> Per
>>>
>>>>
>>>>
>>>> Please review,
>>>>
>>>> Nils Eliasson
>>>>

From adinn at redhat.com Mon Nov 4 10:08:33 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 4 Nov 2019 10:08:33 +0000
Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks
In-Reply-To: <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com>
Message-ID:

Hi Lutz,

I'll summarize my thoughts here rather than answer point by point.

The patch successfully addresses the worst case performance but it seems
to me extremely unlikely that we will see anything that approaches that
case in real applications. So, that doesn't argue for pushing the patch.

The patch does not seem to make a significant difference to the stress
test. This test is also not necessarily 'representative' of real cases
but it is much more likely to be so than the worst case test. That
suggests to me that the current patch is perhaps not worth pursuing (it
ain't really broke so ...). Especially so given that it is not possible to
distinguish any benefit when running the Spec benchmark apps.

One could argue that the patch looks like it will do no harm and may do
good in pathological cases but that's not really good enough reason to
make a change. We really need evidence that this is worth doing.

The free list 'search bottleneck' certainly looks like a more promising
problem to tackle than the 'merge problem'. However, once again this
'problem' may just be an artefact of running this specific test rather
than anything that might happen in real life.

I think the only way to find out for sure whether the current patch or a
patch that addresses the 'search bottleneck' is going to be beneficial
is to instrument the JVM to record traces for code-cache use from real
apps and then replay allocations/frees based on those traces to see what
difference a patch makes and how much this might help the overall
execution time.

regards,

Andrew Dinn
-----------

On 31/10/2019 16:55, Schmidt, Lutz wrote:
> Hi Andrew, (and hi to the interested crowd),
>
> Please accept my apologies for taking so long to get back.
>
> These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing me quite some headaches. Some layer between me and the test prevents the vm (in particular: the VMThread) from terminating normally. The final output from my time measurements is therefore not generated or thrown away. Adding to that were some test machine unavailabilities and a bug in my measurement code, causing crashes.
>
> Anyway, I added some on-the-fly output, printing the timer values after 10k measurement intervals. This reveals some interesting, additional facts about the tests and the CodeHeap management methods. For detailed numbers, refer to the files attached to the bug (https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail, I can provide the jtr files on request.
>
>
> OverflowCodeCacheTest
> =====================
> This test runs (in my setup) with a 1GB CodeCache.
>
> For this test, CodeHeap::mark_segmap_as_used() is THE performance hog.
40% of all calls have to mark more than 16k segment map entries (in the not optimized case). Basically all of these calls convert to len=1 calls with the optimization turned on. Note that during FreeBlock joining, the segment count is forced to 1 (one). No wonder the time spent in CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test runtime) to <100msec.
>
> CodeHeap::add_to_freelist() on the other hand, is almost not observable. Average free list length is at two elements, making even linear search really quick.
>
>
> StressCodeCacheTest
> ===================
> With a 1GB CodeCache, this test runs into a 12 min timeout, set by our internal test environment. Scaling back to 300MB prevents the test from timing out.
>
> For this test, CodeHeap::mark_segmap_as_used() is not a factor. From 200,000 calls, only a few (less than 3%) had to process a block consisting of more than 16 segments. Note that during FreeBlock joining, the segment count is forced to 1 (one).
>
> Another method is popping up as performance hog instead: CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime (before optimization) are spent in this method, for just 160,000 calls. The test seems to create a long list of non-contiguous free blocks (around 5,500 on average). This list is linearly scanned to find the insert point for the free block at hand.
>
> Suffering as well from the long free block list is CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000 calls.
>
>
> SPECjvm2008 suite
> =================
> With respect to the task at hand, this is a well-behaved test suite. Timing shows some before/after difference, but nothing spectacular. The measurements do not provide evidence of a performance bottleneck.
>
>
> There were some minor adjustments to the code. Unused code blocks have been removed as well. I have therefore created a new webrev.
You can find it here: > http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/ > > Thanks for investing your time! > Lutz > > > On 21.10.19, 15:06, "Andrew Dinn" wrote: > > Hi Lutz, > > On 21/10/2019 13:37, Schmidt, Lutz wrote: > > I understand what you are interested in. And I was hoping to be able > > to provide some (first) numbers by today. Unfortunately, the > > measurement code I activated last Friday was buggy and blew most of > > the tests I had hoped to run over the weekend. > > > > I will take your modified test and run it with and without my > > optimization. In parallel, I will try to generate some (non-random) > > numbers for other tests. > > > > I'll be back as soon as I have results. > > Thanks for trying the test and also for deriving some call stats from a > real example. I'm keen to see how much your patch improves things. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > > > > From martin.doerr at sap.com Mon Nov 4 11:12:42 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 4 Nov 2019 11:12:42 +0000 Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit In-Reply-To: References: Message-ID: Hi Thomas, looks good. Thanks, Martin From: Thomas St?fe Sent: Donnerstag, 31. Oktober 2019 08:02 To: Doerr, Martin Cc: hotspot compiler ; Schmidt, Lutz Subject: Re: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit Hi Martin, thanks for the review! New version: http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/ pls find remarks inline. On Wed, Oct 30, 2019 at 1:09 PM Doerr, Martin > wrote: Hi Thomas, thank you for finding and fixing this issue. 
Shared code part and regression test look good to me. But I have a few requests and questions related to the platform code. Arm32: Breaks build. The local variable p needs a scope. Can be fixed by: - case T_METADATA: + case T_METADATA: { // We only need, for now, comparison with NULL for metadata. ... break; + } Ouch. Fixed. Actually, I discarded my coding completely and took over the code from T_OBJECT to keep in line with the rest of the coding here. S390: Using z_cgfi is correct, but there are equivalent instructions with shorter opcode. For comparing to 0, z_ltgr(reg1, reg1) or z_cghi(reg1, 0) may be preferred. But that's not a big deal. Okay, I switched to cghi. ltgr sounds cool but would be difficult to integrate into the shared part since that one does first a move, then a compare. I wonder why you have added includes to some platform files. Isn't that redundant? "utilities/debug.hpp" comes via shared assembler.hpp. Did this because I added asserts and the rule is that every file should include what it needs and not rely on other includes including it (save for umbrella includes like globalDefinitions.hpp). But okay, I removed the added includes again to keep the patch small. I'd probably choose Unimplemented() instead of ShouldNotReachHere() for non-null cases because it's not bad in general, it's just currently not used and therefore not yet implemented. But you can keep that as it is. I'm ok with that, too. I'd rather keep it to keep in line with the rest of the code (see e.g. the default: branches). Best regards, Martin Thanks Martin! ..Thomas From: Thomas Stüfe > Sent: Mittwoch, 30.
Oktober 2019 11:48 To: hotspot compiler > Cc: Doerr, Martin >; Schmidt, Lutz > Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit Hi all, second attempt at a fix (please find first review thread here: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-October/035608.html) Issue: https://bugs.openjdk.java.net/browse/JDK-8233019 webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.00/webrev/ In short, C1 intrinsic for jlC::isPrimitive does a compare with the Klass* pointer for the class to find out if its NULL and hence a primitive type. That compare is done using 32bit cmp and so it gives wrong results when the Klass* pointer is aligned to 32bit. In the generator I changed the comparison constant type from intConst(0) to metadataConst(0) and implemented the missing code paths for all CPUs. Since on most architectures we do not seem to have a comparison with a 64bit immediate (at least I could not find one) I kept the change simple and only implemented comparison with NULL for now. I tested the fix in our nightlies (jtreg tier1, jck and others) as well as manually testing it. I did not test on aarch64 and arm though and would be thankful if someone knowledgeable to these platforms could take a look. Thanks to Martin and Lutz for eyeballing the ppc and s390 parts. 
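To illustrate the failure mode described above, here is a minimal sketch in plain C++ (not HotSpot code). The function names and the example address are assumptions chosen for illustration; the sketch only shows why testing the low 32 bits of a 64-bit pointer against zero can misreport a non-NULL Klass* as NULL.

```cpp
#include <cstdint>

// Illustrative only: plain C++ stand-ins, not HotSpot code. The function
// names and the example address are assumptions made for this sketch.

// Mimics a 32-bit compare: only the low 32 bits of the pointer are tested.
constexpr bool looks_null_cmp32(std::uint64_t klass_ptr) {
  return static_cast<std::uint32_t>(klass_ptr) == 0;
}

// Mimics a full-width (64-bit) compare against NULL.
constexpr bool looks_null_cmp64(std::uint64_t klass_ptr) {
  return klass_ptr == 0;
}

// A non-NULL address whose low 32 bits happen to be zero.
constexpr std::uint64_t kAlignedKlass = 0x0000000800000000ULL;

// The narrow compare wrongly reports NULL (i.e. "primitive type") ...
static_assert(looks_null_cmp32(kAlignedKlass), "32-bit compare misreads it as NULL");
// ... while the full-width compare gives the right answer.
static_assert(!looks_null_cmp64(kAlignedKlass), "64-bit compare sees a non-NULL pointer");
// Both agree for a real NULL pointer.
static_assert(looks_null_cmp32(0) && looks_null_cmp64(0), "both agree on NULL");
```

With a pointer-width compare, as the metadataConst(0) path is meant to emit, the alignment of the Klass* no longer matters.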
Thanks, Thomas From martin.doerr at sap.com Mon Nov 4 11:12:45 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 4 Nov 2019 11:12:45 +0000 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> References: <3f80fc0b-2388-d4be-3c84-4af516e9635f@oracle.com> <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com> <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> Message-ID: Hi all, @Dean > changing can_remove() to CompileBroker::can_remove()? Yes. That would be an option. @Kim, David I think there's another problem with this implementation. It introduces a use-after-free pattern due to concurrency. Compiler threads may still read the oops from the handles after one of them has called destroy_global until next safepoint. It doesn't matter which values they get in this case, but the VM should not crash. I believe that OopStorage allows freeing storage without safepoints, so this may be unsafe. Right? If so, I think replacing the oops in the handles (and keeping the handles alive) would be better. And also much more simple. Best regards, Martin > -----Original Message----- > From: dean.long at oracle.com > Sent: Samstag, 2. November 2019 08:36 > To: Doerr, Martin ; David Holmes > ; Kim Barrett > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > Hi Martin, > > On 10/30/19 3:18 AM, Doerr, Martin wrote: > > Hi David, > > > >> I don't think factoring out CompileBroker::clear_compiler2_object when > >> it is only used once was warranted, but that's call for compiler team to > >> make. > > I did that because _compiler2_objects is private and there's currently no > setter available. > > But let's see what the compiler folks think. 
> > how about changing can_remove() to CompileBroker::can_remove()? Then > you > can access _compiler2_objects directly, right? > > dl > >> Otherwise changes seem fine and I have noted the use of the > >> MutexUnlocker as per your direct email. > > Thanks a lot for reviewing. It was not a trivial one ?? > > > > You had noticed an incorrect usage of the CHECK macro. I've created a new > bug for that: > > https://bugs.openjdk.java.net/browse/JDK-8233193 > > Would be great if you could take a look if that's what you meant and made > adaptions if needed. > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Mittwoch, 30. Oktober 2019 05:47 > >> To: Doerr, Martin ; Kim Barrett > >> > >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > >> ; hotspot-compiler-dev at openjdk.java.net; > >> David Holmes > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > >> > >> Hi Martin, > >> > >> On 29/10/2019 12:06 am, Doerr, Martin wrote: > >>> Hi David and Kim, > >>> > >>> I think it's easier to talk about code. So here's a new webrev: > >>> > >> > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > >> ev.03/ > >> > >> I don't think factoring out CompileBroker::clear_compiler2_object when > >> it is only used once was warranted, but that's call for compiler team to > >> make. Otherwise changes seem fine and I have noted the use of the > >> MutexUnlocker as per your direct email. > >> > >> Thanks, > >> David > >> ----- > >> > >>> @Kim: > >>> Thanks for looking at the handle related parts. It's ok if you don't want to > >> be a reviewer of the whole change. > >>>> I think it's weird that can_remove() is a predicate with optional side > >>>> effects. I think it would be simpler to have it be a pure predicate, > >>>> and have the one caller with do_it = true perform the updates. 
That > >>>> should include NULLing out the handle pointer (perhaps debug-only, but > >>>> it doesn't cost much to cleanly maintain the data structure). > >>> Nevertheless, it has the advantage that it enforces the update to be consistent. > >>> A caller could use it without holding the lock or mess it up otherwise. > >>> In addition, I don't want to change that as part of this fix. > >>> > >>>> So far as I can tell, THREAD == NULL here. > >>> This is a very tricky part (not my invention): > >>> EXCEPTION_MARK contains an ExceptionMark constructor call which sets __the_thread__ to Thread::current(). > >>> I don't want to publish my opinion about this. > >>> > >>> @David: > >>> Seems like this option is preferred over option 3 (possibly_add_compiler_threads part of webrev.02 and leave the initialization as is). > >>> So when you're ok with it, I'll request a 2nd review from the compiler folks (I should change the subject to contain RFR). > >>> Thanks, > >>> Martin > >>> > >>> > >>>> -----Original Message----- > >>>> From: David Holmes > >>>> Sent: Montag, 28. Oktober 2019 05:04 > >>>> To: Kim Barrett > >>>> Cc: Doerr, Martin ; Vladimir Kozlov > >>>> (vladimir.kozlov at oracle.com) ; hotspot- > >>>> compiler-dev at openjdk.java.net > >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > >>>> > >>>> On 28/10/2019 1:42 pm, Kim Barrett wrote: > >>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes > >> > >>>> wrote: > >>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote: > >>>>>>> Hi Kim, > >>>>>>> I didn't like using the OopStorage stuff directly, either. I just have not seen how to allocate a global handle and add the oop later. > >>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of that. > >>>>>>> So I can imagine 3 ways to implement it: > >>>>>>> 1. Use JNIHandles::destroy_global to release the handles.
I just > added > >>>> that to > >> > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > >>>> ev.01/ > >>>>>>> We may want to improve that further by setting the handle pointer > to > >>>> NULL and asserting that it is NULL before adding the new one. > >>>>>>> I had been concerned about NULLs in the array, but looks like the > >>>> existing code can deal with that. > >>>>>> I think it would be cleaner to both destroy the global handle and > NULL it > >> in > >>>> the array at the same time. > >>>>>> This comment > >>>>>> > >>>>>> 325 // Old j.l.Thread object can die here. > >>>>>> > >>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct- > >threadObj() > >> so > >>>> can't "die" until that is also cleared during the actual termination > process. > >>>>> I think if there is such a thread here that it can't die, because the > >>>>> death predicate (the can_remove stuff) won't see that old thread as > >>>>> the last thread in _compiler2_objects. That's what I meant by this: > >>>>> > >>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett > >>>> wrote: > >>>>>> I also think that here: > >>>>>> > >>>>>> 947 jobject thread_handle = > >> JNIHandles::make_global(thread_oop); > >>>>>> 948 _compiler2_objects[i] = thread_handle; > >>>>>> > >>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a valid > >>>>>> assertion then I think there are other problems. > >>>>> I think either that comment about an old thread is wrong (and the > NULL > >>>>> assertion I suggested is okay), or I think the whole mechanism here > >>>>> has problems. Or at least I was unable to figure out how it could > work... > >>>>> > >>>> I'm not following sorry. You can't assert NULL unless it's actually set > >>>> to NULL which it presently isn't. But it could be set NULL as Martin > >>>> suggested: > >>>> > >>>> "We may want to improve that further by setting the handle pointer to > >>>> NULL and asserting that it is NULL before adding the new one." 
> >>>> > >>>> and which I also supported. But that aside once the delete_global has > >>>> been called that JNIHandle no longer references the j.l.Thread that it > >>>> did, at which point it is only reachable via the threadObj() of the > >>>> CompilerThread. > >>>> > >>>> David From thomas.stuefe at gmail.com Mon Nov 4 11:17:27 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 4 Nov 2019 12:17:27 +0100 Subject: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns wrong result if Klass* is aligned to 32bit In-Reply-To: References: Message-ID: Thanks, Martin! On Mon, Nov 4, 2019 at 12:12 PM Doerr, Martin wrote: > Hi Thomas, > > > > looks good. > > > > Thanks, > > Martin > > > > > > *From:* Thomas St?fe > *Sent:* Donnerstag, 31. Oktober 2019 08:02 > *To:* Doerr, Martin > *Cc:* hotspot compiler ; Schmidt, > Lutz > *Subject:* Re: RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) > returns wrong result if Klass* is aligned to 32bit > > > > Hi Martin, > > > > thanks for the review! > > > > New version: > http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.01/webrev/ > > > > pls find remarks inline. > > > > On Wed, Oct 30, 2019 at 1:09 PM Doerr, Martin > wrote: > > Hi Thomas, > > > > thank you for finding and fixing this issue. > > > > Shared code part and regression test look good to me. > > But I have a few requests and questions related to the platform code. > > > > Arm32: > > Breaks build. The local variable p needs a scope. Can be fixed by: > > - case T_METADATA: > > + case T_METADATA: { > > // We only need, for now, comparison with NULL for metadata. > > ? > > break; > > + } > > > > Ouch. Fixed. Actually, I discarded my coding completely and took over the > code from T_OBJECT to keep in line with the rest of the coding here. > > > > > > S390: > > Using z_cgfi is correct, but there are equivalent instructions with > shorter opcode. 
> > For comparing to 0, z_ltgr(reg1, reg1) or z_cghi(reg1, 0) may be > preferred. But that?s not a big deal. > > > > > > Okay I switched to cghi. ltgr sounds cool but would be difficult to > integrate into the shared part since that one does first a move, then a > compare. > > > > I wonder why you have added includes to some platform files. Isn?t that > redundant? > > "utilities/debug.hpp" comes via shared assembler.hpp. > > > > Did this because I added asserts and the rule is that every file should > include what it needs and not rely on other includes including it (save for > umbrella includes like globalDefinitions.hpp). But okay, I removed the > added includes again to keep the patch small. > > > > > > I?d probably choose Unimplemented() instead of ShouldNotReachHere() for > non-null cases because it?s not bad in general, it?s just currently not > used and therefore not yet implemented. > > But you can keep that as it is. I?m ok with that, too. > > > > I rather keep it to keep in line with the rest of the code (see e.g. the > default: branches). > > > > > > Best regards, > > Martin > > > > > > Thanks Martin! > > > > ..Thomas > > > > > > *From:* Thomas St?fe > *Sent:* Mittwoch, 30. Oktober 2019 11:48 > *To:* hotspot compiler > *Cc:* Doerr, Martin ; Schmidt, Lutz < > lutz.schmidt at sap.com> > *Subject:* RFR(xs): 8233019: java.lang.Class.isPrimitive() (C1) returns > wrong result if Klass* is aligned to 32bit > > > > Hi all, > > > > second attempt at a fix (please find first review thread here: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-October/035608.html > ) > > > > Issue: https://bugs.openjdk.java.net/browse/JDK-8233019 > > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8233019--c1-intrinsic-for-java.lang.class.isprimitive()-does-32bit-compare/webrev.00/webrev/ > > > > In short, C1 intrinsic for jlC::isPrimitive does a compare with the Klass* > pointer for the class to find out if its NULL and hence a primitive type. 
> That compare is done using 32bit cmp and so it gives wrong results when the > Klass* pointer is aligned to 32bit. > > > > In the generator I changed the comparison constant type from intConst(0) > to metadataConst(0) and implemented the missing code paths for all CPUs. > Since on most architectures we do not seem to have a comparison with a > 64bit immediate (at least I could not find one) I kept the change simple > and only implemented comparison with NULL for now. > > > > I tested the fix in our nightlies (jtreg tier1, jck and others) as well as > manually testing it. > > > > I did not test on aarch64 and arm though and would be thankful if someone > knowledgeable to these platforms could take a look. > > > > Thanks to Martin and Lutz for eyeballing the ppc and s390 parts. > > > > Thanks, Thomas > > From martin.doerr at sap.com Mon Nov 4 11:20:05 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 4 Nov 2019 11:20:05 +0000 Subject: RFR(S): 8233081: C1: PatchingStub for field access copies too much In-Reply-To: <70bf20d2-a1f4-2e02-7c7a-62f986eaf7c0@oracle.com> References: <70bf20d2-a1f4-2e02-7c7a-62f986eaf7c0@oracle.com> Message-ID: Hi Tobias, thanks for the review. Best regards, Martin > -----Original Message----- > From: Tobias Hartmann > Sent: Montag, 4. November 2019 08:33 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > Cc: Lindenmaier, Goetz > Subject: Re: RFR(S): 8233081: C1: PatchingStub for field access copies too > much > > Hi Martin, > > nice cleanup, x86/Sparc looks good to me. > > Best regards, > Tobias > > On 30.10.19 16:38, Doerr, Martin wrote: > > Hi, > > > > > > > > I'd like to fix an issue in C1's PatchingStub implementation for > "access_field_id". > > > > We had noticed that the code in the template exceeded the 255 byte > limitation when switching on > > VerifyOops on PPC64. > > > > I'd like to improve the situation for all platforms. 
> > > > > > > > More detailed bug description: > > > > https://bugs.openjdk.java.net/browse/JDK-8233081 > > > > > > > > I need a function to determine how many bytes are needed for the > NativeMovRegMem. > > > > x86 has next_instruction_address() which could in theory be used, but I > noticed that it's dead code > > which is no longer correct. > > > > Is it ok to remove it? > > > > I'd also like to remove the constant instruction_size from > NativeMovRegMem because it's not constant. > > > > I'd prefer to introduce num_bytes_to_end_of_patch() for the purpose of > determining how many bytes to > > copy for the "access_field_id" PatchingStub. > > > > We can factor out the offset computation from offset() and set_offset() > and reuse it. This enforces > > consistency. > > > > > > > > Webrev: > > > > > http://cr.openjdk.java.net/~mdoerr/8233081_C1_access_field_patching/webrev.00/ > > > > > > > > Please review. > > > > > > > > Best regards, > > > > Martin > > > > > > From nils.eliasson at oracle.com Mon Nov 4 12:54:55 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 4 Nov 2019 13:54:55 +0100 Subject: RFR 8233389: Add PrintIdeal to compiler directives In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> Message-ID: <54ce54dd-5b68-311a-e055-3db815c081bd@oracle.com> Hi Jorn, One nitpick - in compilerDirectives.hpp: you should surround the line with NOT_PRODUCT(...) similar to how it's done for TraceOptoOutput. You have already made sure that the use is guarded, which is good, but it should show in the list too. In the future I hope we can make this flag diagnostic. Otherwise good. Regards, Nils On 2019-11-01 16:09, Jorn Vernee wrote: > Hi, > > I'd like to add PrintIdeal as a compiler directive in order to enable > PrintIdeal for only a single method when combining it with the 'match' > directive.
> > Please review the following: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8233389 > Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/ > (Testing = tier1, manual) > > As a heads-up; I'm not a committer on the jdk project, so if this > sounds like a good idea, I would require a sponsor to push the changes. > > Thanks, > Jorn > From patric.hedlin at oracle.com Mon Nov 4 13:19:25 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Mon, 4 Nov 2019 14:19:25 +0100 Subject: RFR(S/T): 8233498: [SPARC] Remove dead code Message-ID: Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8233498 Webrev: http://cr.openjdk.java.net/~phedlin/tr8233498/ 8233498: [SPARC] Remove dead code Remove a number of dead/unused/non-implemented methods in the SPARC version of the MacroAssembler. Testing: SPARC build on Solaris and Linux (thanks to Adrian G. [1]) hs-tier1-3 (sparcv9) [1]. John Paul Adrian Glaubitz Best regards, Patric From nils.eliasson at oracle.com Mon Nov 4 13:52:44 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 4 Nov 2019 14:52:44 +0100 Subject: RFR(S/T): 8233498: [SPARC] Remove dead code In-Reply-To: References: Message-ID: Looks good! Regards, Nils On 2019-11-04 14:19, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8233498 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8233498/ > > 8233498: [SPARC] Remove dead code > > Remove a number of dead/unused/non-implemented methods in the > SPARC version of the MacroAssembler. > > > Testing: SPARC build on Solaris and Linux (thanks to Adrian G. [1]) > hs-tier1-3 (sparcv9) > > [1].
John Paul Adrian Glaubitz > > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Mon Nov 4 14:45:03 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 4 Nov 2019 17:45:03 +0300 Subject: RFR 8233389: Add PrintIdeal to compiler directives In-Reply-To: <54ce54dd-5b68-311a-e055-3db815c081bd@oracle.com> References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> <54ce54dd-5b68-311a-e055-3db815c081bd@oracle.com> Message-ID: <7deb65c5-822c-36db-bb97-8f2e3edbac3b@oracle.com> On 04.11.2019 15:54, Nils Eliasson wrote: > Hi Jorn, > > One nitpick - in compilerDirectives.hpp: you should surround the line > with NOT_PRODUCT(...) similar to how it's done for TraceOptoOutput. You > have already made sure that the use is guarded, which is good, but it > should show in the list too. FTR the same applies to IGVPrintLevel: src/hotspot/share/compiler/compilerDirectives.hpp: cflags(IGVPrintLevel, intx, PrintIdealGraphLevel, IGVPrintLevel) \ src/hotspot/share/opto/c2_globals.hpp: notproduct(intx, PrintIdealGraphLevel, 0, Best regards, Vladimir Ivanov > > In the future I hope we can make this flag diagnostic. > > Otherwise good. > > Regards, > > Nils > > On 2019-11-01 16:09, Jorn Vernee wrote: >> Hi, >> >> I'd like to add PrintIdeal as a compiler directive in order to enable >> PrintIdeal for only a single method when combining it with the 'match' >> directive. >> >> Please review the following: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389 >> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/ >> (Testing = tier1, manual) >> >> As a heads-up; I'm not a committer on the jdk project, so if this >> sounds like a good idea, I would require a sponsor to push the changes.
>> >> Thanks, >> Jorn >> From vladimir.x.ivanov at oracle.com Mon Nov 4 14:57:06 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 4 Nov 2019 17:57:06 +0300 Subject: RFR 8233389: Add PrintIdeal to compiler directives In-Reply-To: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> Message-ID: <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com> Hi Jorn, src\hotspot\share\opto\compile.hpp: + bool _print_ideal; // True if we should dump node IR for this compilation Since the only usage is in non-product code, I suggest to put _print_ideal into #ifndef PRODUCT, so you don't need to initialize it in product build. Also, it'll allow you to just put it on initializer list instead of doing it in the ctor body (akin to how _trace_opto_output is handled): src\hotspot\share\opto\compile.cpp: Compile::Compile( ciEnv* ci_env, ... : Phase(Compiler), ... _has_reserved_stack_access(false), #ifndef PRODUCT _trace_opto_output(directive->TraceOptoOutputOption), #endif _has_method_handle_invokes(false), Overall, I don't see much value in PrintIdeal: PrintIdealGraph provides much more detailed information (even though in XML format) and IdealGraphVisualizer is better at browsing the graph. The only thing I'm usually missing is full text dump output on individual nodes (they are shown pruned in IGV; not sure whether it's IGV fault or the info is missing in the dump). Best regards, Vladimir Ivanov On 01.11.2019 18:09, Jorn Vernee wrote: > Hi, > > I'd like to add PrintIdeal as a compiler directive in order to enable > PrintIdeal for only a single method when combining it with the 'match' > directive. 
> > Please review the following: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8233389 > Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/ > (Testing = tier1, manual) > > As a heads-up; I'm not a committer on the jdk project, so if this sounds > like a good idea, I would require a sponsor to push the changes. > > Thanks, > Jorn > From lutz.schmidt at sap.com Mon Nov 4 15:35:30 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 4 Nov 2019 15:35:30 +0000 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> Message-ID: <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> Hi Andrew, thank you for your thoughts. I do not agree to your conclusion, though. There are two bottlenecks in the CodeHeap management code. One is in CodeHeap::mark_segmap_as_used(), uncovered by OverflowCodeCacheTest.java. The other is in CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java. Both bottlenecks are tackled by the recommended changeset. CodeHeap::mark_segmap_as_used() is no longer O(n*n) for the critical "FreeBlock-join" case. It actually is O(1) now. The time reduction from > 80 seconds to just a few milliseconds is proof of that statement. CodeHeap::add_to_freelist() is still O(n*n), with n being the free list length. But the kick-in point of the non-linearity could be significantly shifted towards larger n. The time reduction from approx. 8 seconds to 160 milliseconds supports this statement. I agree it would be helpful to have a "real-world" example showing some improvement. 
Providing such evidence is hard, though. I could instrument the code and print some values from time to time. It's certain this additional output will mess up success/failure decisions in our test environment. Not sure everybody likes that. But I will give it a try and take the hits. This will be a multi-day effort. On a general note, I am always uncomfortable knowing of an O(n*n) effort, in particular when it could be removed or at least tamed considerably. Experience tells me that, at some point in time, n will be large enough to hurt. I'll be back. Thanks, Lutz On 04.11.19, 11:08, "Andrew Dinn" wrote: Hi Lutz, I'll summarize my thoughts here rather than answer point by point. The patch successfully addresses the worst case performance but it seems to me extremely unlikely that we will see anything that approaches that case in real applications. So, that doesn't argue for pushing the patch. The patch does not seem to make a significant difference to the stress test. This test is also not necessarily 'representative' of real cases but it is much more likely to be so than the worst case test. That suggests to me that the current patch is perhaps not worth pursuing (it ain't really broke so ...). Especially so given that it is not possible to distinguish any benefit when running the Spec benchmark apps. One could argue that the patch looks like it will do no harm and may do good in pathological cases but that's not really good enough reason to make a change. We really need evidence that this is worth doing. The free list 'search bottleneck' certainly looks like a more promising problem to tackle than the 'merge problem'. However, once again this 'problem' may just be an artefact of running this specific test rather than anything that might happen in real life.
I think the only way to find out for sure whether the current patch or a patch that addresses the 'search bottleneck' is going to be beneficial is to instrument the JVM to record traces for code-cache use from real apps and then replay allocations/frees based on those traces to see what difference a patch makes and how much this might help the overall execution time. regards, Andrew Dinn ----------- On 31/10/2019 16:55, Schmidt, Lutz wrote: > Hi Andrew, (and hi to the interested crowd), > > Please accept my apologies for taking so long to get back. > > These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing me quite some headaches. Some layer between me and the test prevents the vm (in particular: the VMThread) from terminating normally. The final output from my time measurements is therefore not generated or thrown away. Adding to that were some test machine unavailabilities and a bug in my measurement code, causing crashes. > > Anyway, I added some on-the-fly output, printing the timer values after 10k measurement intervals. This reveals some interesting, additional facts about the tests and the CodeHeap management methods. For detailed numbers, refer to the files attached to the bug (https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail, I can provide the jtr files on request. > > > OverflowCodeCacheTest > ===================== > This test runs (in my setup) with a 1GB CodeCache. > > For this test, CodeHeap::mark_segmap_as_used() is THE performance hog. 40% of all calls have to mark more than 16k segment map entries (in the not optimized case). Basically all of these calls convert to len=1 calls with the optimization turned on. Note that during FreeBlock joining, the segment count is forced to 1(one). No wonder the time spent in CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test runtime) to <100msec. > > CodeHeap::add_to_freelist() on the other hand, is almost not observable. 
Average free list length is at two elements, making even linear search really quick. > > > StressCodeCacheTest > =================== > With a 1GB CodeCache, this test runs into a 12 min timeout, set by our internal test environment. Scaling back to 300MB prevents the test from timing out. > > For this test, CodeHeap::mark_segmap_as_used() is not a factor. Of 200,000 calls, only a few (less than 3%) had to process a block consisting of more than 16 segments. Note that during FreeBlock joining, the segment count is forced to 1 (one). > > Another method is popping up as performance hog instead: CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime (before optimization) are spent in this method, for just 160,000 calls. The test seems to create a long list of non-contiguous free blocks (around 5,500 on average). This list is linearly scanned to find the insert point for the free block at hand. > > Suffering as well from the long free block list is CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000 calls. > > > SPECjvm2008 suite > ================= > With respect to the task at hand, this is a well-behaved test suite. Timing shows some before/after difference, but nothing spectacular. The measurements do not provide evidence of a performance bottleneck. > > > There were some minor adjustments to the code. Unused code blocks have been removed as well. I have therefore created a new webrev. You can find it here: > http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/ > > Thanks for investing your time! > Lutz > > > On 21.10.19, 15:06, "Andrew Dinn" wrote: > > Hi Lutz, > > On 21/10/2019 13:37, Schmidt, Lutz wrote: > > I understand what you are interested in. And I was hoping to be able > > to provide some (first) numbers by today. Unfortunately, the > > measurement code I activated last Friday was buggy and blew most of > > the tests I had hoped to run over the weekend.
> > > > I will take your modified test and run it with and without my > > optimization. In parallel, I will try to generate some (non-random) > > numbers for other tests. > > > > I'll be back as soon as I have results. > > Thanks for trying the test and also for deriving some call stats from a > real example. I'm keen to see how much your patch improves things. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > > > > From martin.doerr at sap.com Mon Nov 4 16:43:01 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 4 Nov 2019 16:43:01 +0000 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <3f80fc0b-2388-d4be-3c84-4af516e9635f@oracle.com> <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com> <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> Message-ID: Hi again, it is possible to release the handles, but it comes with a much higher complexity: http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/ If we can replace oops in handles like NativeAccess<>::oop_store((oop*)compiler2_object(i), thread_oop()); we only need to change one place (possibly_add_compiler_threads) and that's it. Best regards, Martin > -----Original Message----- > From: Doerr, Martin > Sent: Montag, 4. November 2019 12:13 > To: 'dean.long at oracle.com' ; David Holmes > ; Kim Barrett > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: JDK-8230459: Test failed to resume JVMCI CompilerThread > > Hi all, > > @Dean > > changing can_remove() to CompileBroker::can_remove()? > Yes. That would be an option. 
> > @Kim, David > I think there's another problem with this implementation. > It introduces a use-after-free pattern due to concurrency. > Compiler threads may still read the oops from the handles after one of them > has called destroy_global until next safepoint. It doesn't matter which values > they get in this case, but the VM should not crash. I believe that OopStorage > allows freeing storage without safepoints, so this may be unsafe. Right? > > If so, I think replacing the oops in the handles (and keeping the handles alive) > would be better. And also much more simple. > > Best regards, > Martin > > > > -----Original Message----- > > From: dean.long at oracle.com > > Sent: Samstag, 2. November 2019 08:36 > > To: Doerr, Martin ; David Holmes > > ; Kim Barrett > > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > > ; hotspot-compiler-dev at openjdk.java.net > > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > > > Hi Martin, > > > > On 10/30/19 3:18 AM, Doerr, Martin wrote: > > > Hi David, > > > > > >> I don't think factoring out CompileBroker::clear_compiler2_object when > > >> it is only used once was warranted, but that's call for compiler team to > > >> make. > > > I did that because _compiler2_objects is private and there's currently no > > setter available. > > > But let's see what the compiler folks think. > > > > how about changing can_remove() to CompileBroker::can_remove()? Then > > you > > can access _compiler2_objects directly, right? > > > > dl > > >> Otherwise changes seem fine and I have noted the use of the > > >> MutexUnlocker as per your direct email. > > > Thanks a lot for reviewing. It was not a trivial one ?? > > > > > > You had noticed an incorrect usage of the CHECK macro. I've created a > new > > bug for that: > > > https://bugs.openjdk.java.net/browse/JDK-8233193 > > > Would be great if you could take a look if that's what you meant and > made > > adaptions if needed. 
> > > > > > Best regards, > > > Martin > > > > > > > > >> -----Original Message----- > > >> From: David Holmes > > >> Sent: Mittwoch, 30. Oktober 2019 05:47 > > >> To: Doerr, Martin ; Kim Barrett > > >> > > >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > > >> ; hotspot-compiler- > dev at openjdk.java.net; > > >> David Holmes > > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > >> > > >> Hi Martin, > > >> > > >> On 29/10/2019 12:06 am, Doerr, Martin wrote: > > >>> Hi David and Kim, > > >>> > > >>> I think it's easier to talk about code. So here's a new webrev: > > >>> > > >> > > > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > > >> ev.03/ > > >> > > >> I don't think factoring out CompileBroker::clear_compiler2_object when > > >> it is only used once was warranted, but that's call for compiler team to > > >> make. Otherwise changes seem fine and I have noted the use of the > > >> MutexUnlocker as per your direct email. > > >> > > >> Thanks, > > >> David > > >> ----- > > >> > > >>> @Kim: > > >>> Thanks for looking at the handle related parts. It's ok if you don't want > to > > >> be a reviewer of the whole change. > > >>>> I think it's weird that can_remove() is a predicate with optional side > > >>>> effects. I think it would be simpler to have it be a pure predicate, > > >>>> and have the one caller with do_it = true perform the updates. That > > >>>> should include NULLing out the handle pointer (perhaps debug-only, > > but > > >>>> it doesn't cost much to cleanly maintain the data structure). > > >>> Nevertheless, it has the advantage that it enforces the update to be > > >> consistent. > > >>> A caller could use it without holding the lock or mess it up otherwise. > > >>> In addition, I don't what to change that as part of this fix. > > >>> > > >>>> So far as I can tell, THREAD == NULL here. 
> > >>> This is a very tricky part (not my invention): > > >>> EXCEPTION_MARK contains an ExceptionMark constructor call which > > sets > > >> __the_thread__ to Thread::current(). > > >>> I don't want to publish my opinion about this ?? > > >>> > > >>> @David: > > >>> Seems like this option is preferred over option 3 > > >> (possibly_add_compiler_threads part of webrev.02 and leave the > > >> initialization as is). > > >>> So when you're ok with it, I'll request a 2nd review from the compiler > > folks > > >> (I should change the subject to contain RFR). > > >>> Thanks, > > >>> Martin > > >>> > > >>> > > >>>> -----Original Message----- > > >>>> From: David Holmes > > >>>> Sent: Montag, 28. Oktober 2019 05:04 > > >>>> To: Kim Barrett > > >>>> Cc: Doerr, Martin ; Vladimir Kozlov > > >>>> (vladimir.kozlov at oracle.com) ; > hotspot- > > >>>> compiler-dev at openjdk.java.net > > >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI > CompilerThread > > >>>> > > >>>> On 28/10/2019 1:42 pm, Kim Barrett wrote: > > >>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes > > >> > > >>>> wrote: > > >>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote: > > >>>>>>> Hi Kim, > > >>>>>>> I didn't like using the OopStorage stuff directly, either. I just have > > not > > >>>> seen how to allocate a global handle and add the oop later. > > >>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware > of > > >> that. > > >>>>>>> So I can imagine 3 ways to implement it: > > >>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just > > added > > >>>> that to > > >> > > > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > > >>>> ev.01/ > > >>>>>>> We may want to improve that further by setting the handle > pointer > > to > > >>>> NULL and asserting that it is NULL before adding the new one. > > >>>>>>> I had been concerned about NULLs in the array, but looks like the > > >>>> existing code can deal with that. 
> > >>>>>> I think it would be cleaner to both destroy the global handle and > > NULL it > > >> in > > >>>> the array at the same time. > > >>>>>> This comment > > >>>>>> > > >>>>>> 325 // Old j.l.Thread object can die here. > > >>>>>> > > >>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct- > > >threadObj() > > >> so > > >>>> can't "die" until that is also cleared during the actual termination > > process. > > >>>>> I think if there is such a thread here that it can't die, because the > > >>>>> death predicate (the can_remove stuff) won't see that old thread as > > >>>>> the last thread in _compiler2_objects. That's what I meant by this: > > >>>>> > > >>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett > > > >>>> wrote: > > >>>>>> I also think that here: > > >>>>>> > > >>>>>> 947 jobject thread_handle = > > >> JNIHandles::make_global(thread_oop); > > >>>>>> 948 _compiler2_objects[i] = thread_handle; > > >>>>>> > > >>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a valid > > >>>>>> assertion then I think there are other problems. > > >>>>> I think either that comment about an old thread is wrong (and the > > NULL > > >>>>> assertion I suggested is okay), or I think the whole mechanism here > > >>>>> has problems. Or at least I was unable to figure out how it could > > work... > > >>>>> > > >>>> I'm not following sorry. You can't assert NULL unless it's actually set > > >>>> to NULL which it presently isn't. But it could be set NULL as Martin > > >>>> suggested: > > >>>> > > >>>> "We may want to improve that further by setting the handle pointer > to > > >>>> NULL and asserting that it is NULL before adding the new one." > > >>>> > > >>>> and which I also supported. But that aside once the delete_global has > > >>>> been called that JNIHandle no longer references the j.l.Thread that it > > >>>> did, at which point it is only reachable via the threadObj() of the > > >>>> CompilerThread. 
> > >>>> > > >>>> David From thomas.stuefe at gmail.com Mon Nov 4 17:02:26 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 4 Nov 2019 18:02:26 +0100 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> Message-ID: Hi Lutz, I like this patch and have a number of remarks: Segment map splicing: - If the last byte of the leading block happens to contain 0xFD, you do not need to increment the fragmentation counter after splicing since the leading hop would be fully formed. - If patch complexity is a concern, I would maybe leave out the segmap_template. That would simplify the coding a bit and I am not sure how much this actually brings. I really am curious. Is it better to let the compiler store immediates or to memcpy the template? Would the latter not mean I pay for the loads too? - In ::allocate(), I like that you re-use block. This is just bikeshedding, but you also could reshape this function a tiny bit for a clean single exit at the end of the function. - Comment nits: - * code block (e.g. nmethod) when only a pointer to somewhere inside the + * code block (e.g. nmethod) when only a pointer to a location inside the - * - Segment addresses should be aligned to be multiples of CodeCacheSegmentSize. + * - Segment start addresses should be aligned to be multiples of CodeCacheSegmentSize. - * - Allocation in the code cache can only happen at segment boundaries. + * - Allocation in the code cache can only happen at segment start addresses. 
- I would like comments for the newly added functions in heap.hpp, not in the cpp file, because my IDE expands those comments from the hpp. But that's just me :=)
- block_at() could be implemented now using address_for and a cast.
- fragmentation_limit, freelist_limit: I thought using enums for constants is not done anymore, and the preferred way is to use static const. See e.g. markWord.hpp.
-
  + // Boundaries of committed space.
  + // Boundaries of reserved space.
  Thanks for the comments. Out of curiosity, can the lower part be uncommitted? Or would low() == low_boundary() always be true?
-
  + bool contains(const void* p) const { return low() <= p && p < high(); }
  Is comparing with void* legal C++?
- CodeHeap::merge_right():
  - Nits, I would maybe rename "follower" to "beg_follower", or, alternatively, "beg" to "leader".
  - I wonder whether we can have a "wipe header function" which only wipes the FreeBlock header of the follower, instead of wiping the whole segment.
- CodeHeap::add_to_freelist(): I am astonished that this brings such a performance increase. I would naively have thought the deallocation to be pretty random, and therefore this technique to have an improvement of factor 2 in general, but we also need to find the start of the last_insert_free_block which may take some hops, and then, the block might not even be free...

-- General remarks, not necessarily for your patch:
- This code could really gain readability if the segment id had its own type, e.g. "typedef int segid_t". APIs like "mark_segmap_as_used(segid_t beg, segid_t end)" are immediately clearer than when size_t is used.
- In CodeHeap::verify(), do we actually check that the segment map is correctly formed? That from every point in the map, we reach the correct start of the associated code blob? I could not find this but I may be blind.
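To make the last two remarks concrete, here is a minimal, standalone sketch of what a dedicated segment id type and a segment map check could look like. It is not the HotSpot code: the vector-based map, the names segid_t, invalid_seg, find_block_start and verify_block, and the choice of std::size_t as the underlying type are assumptions made for illustration only. The hop encoding (0 marks a block start, 0xFF marks a free segment, other values are a distance back toward the start) is a simplified model of the scheme discussed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dedicated type for segment indices (assumption, not HotSpot's).
typedef std::size_t segid_t;

static const std::uint8_t free_sentinel = 0xFF;                 // marks a free segment
static const segid_t invalid_seg = static_cast<segid_t>(-1);    // "no block start found"

// Mark segments [beg, end) as one used block starting at beg.
// Entry 0 marks the block start; every other entry stores how many
// segments to hop back toward the start. The counter wraps to 1 so
// that the sentinel value is never stored for a used segment.
void mark_segmap_as_used(std::vector<std::uint8_t>& segmap, segid_t beg, segid_t end) {
  std::uint8_t hop = 0;
  for (segid_t i = beg; i < end; i++) {
    segmap[i] = hop++;
    if (hop == free_sentinel) hop = 1;
  }
}

// Follow the hops from any segment back to the start of its block.
// Returns invalid_seg for a free segment or a malformed map.
segid_t find_block_start(const std::vector<std::uint8_t>& segmap, segid_t seg) {
  while (segmap[seg] > 0) {
    if (segmap[seg] == free_sentinel || segmap[seg] > seg) return invalid_seg;
    seg -= segmap[seg];
  }
  return seg;
}

// Check that every segment in [beg, end) leads back to beg.
bool verify_block(const std::vector<std::uint8_t>& segmap, segid_t beg, segid_t end) {
  for (segid_t i = beg; i < end; i++) {
    if (find_block_start(segmap, i) != beg) return false;
  }
  return true;
}
```

A check of this kind is linear in the block size times the hop chain length, so in a real VM it would presumably belong in debug-only verification code rather than on any allocation path.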
Thanks, Thomas On Thu, Oct 31, 2019 at 5:55 PM Schmidt, Lutz wrote: > Hi Andrew, (and hi to the interested crowd), > > Please accept my apologies for taking so long to get back. > > These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing > me quite some headaches. Some layer between me and the test prevents the vm > (in particular: the VMThread) from terminating normally. The final output > from my time measurements is therefore not generated or thrown away. Adding > to that were some test machine unavailabilities and a bug in my measurement > code, causing crashes. > > Anyway, I added some on-the-fly output, printing the timer values after > 10k measurement intervals. This reveals some interesting, additional facts > about the tests and the CodeHeap management methods. For detailed numbers, > refer to the files attached to the bug ( > https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail, > I can provide the jtr files on request. > > > OverflowCodeCacheTest > ===================== > This test runs (in my setup) with a 1GB CodeCache. > > For this test, CodeHeap::mark_segmap_as_used() is THE performance hog. 40% > of all calls have to mark more than 16k segment map entries (in the not > optimized case). Basically all of these calls convert to len=1 calls with > the optimization turned on. Note that during FreeBlock joining, the segment > count is forced to 1(one). No wonder the time spent in > CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test > runtime) to <100msec. > > CodeHeap::add_to_freelist() on the other hand, is almost not observable. > Average free list length is at two elements, making even linear search > really quick. > > > StressCodeCacheTest > =================== > With a 1GB CodeCache, this test runs into a 12 min timeout, set by our > internal test environment. Scaling back to 300MB prevents the test from > timing out. > > For this test, CodeHeap::mark_segmap_as_used() is not a factor. 
From > 200,000 calls, only a few (less than 3%) had to process a block consisting > of more than 16 segments. Note that during FreeBlock joining, the segment > count is forced to 1(one). > > Another method is popping up as performance hog instead: > CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime > (before optimization) are spent in this method, for just 160,000 calls. The > test seems to create a long list of non-contiguous free blocks (around > 5,500 on average). This list is linearly scanned to find the insert point > for the free block at hand. > > Suffering as well from the long free block list is > CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000 > calls. > > > SPECjvm2008 suite > ================= > With respect to the task at hand, this is a well-behaved test suite. > Timing shows some before/after difference, but nothing spectacular. The > measurements do not provide evidence of a performance bottleneck. > > > There were some minor adjustments to the code. Unused code blocks have > been removed as well. I have therefore created a new webrev. You can find > it here: > http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/ > > Thanks for investing your time! > Lutz > > > On 21.10.19, 15:06, "Andrew Dinn" wrote: > > Hi Lutz, > > On 21/10/2019 13:37, Schmidt, Lutz wrote: > > I understand what you are interested in. And I was hoping to be able > > to provide some (first) numbers by today. Unfortunately, the > > measurement code I activated last Friday was buggy and blew most of > > the tests I had hoped to run over the weekend. > > > > I will take your modified test and run it with and without my > > optimization. In parallel, I will try to generate some (non-random) > > numbers for other tests. > > > > I'll be back as soon as I have results. > > Thanks for trying the test and also for deriving some call stats from a > real example. I'm keen to see how much your patch improves things.
> > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > > > > > From thomas.stuefe at gmail.com Mon Nov 4 17:04:41 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 4 Nov 2019 18:04:41 +0100 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> Message-ID: Hi Andrew, Lutz, I agree with Lutz in this case. I think the patch complexity could be reduced if needed (see my mail to Lutz) if complexity is a concern, but I like most of these changes and the comment improvements are nice. Just my 5 cent. Cheers, Thomas On Mon, Nov 4, 2019 at 4:36 PM Schmidt, Lutz wrote: > Hi Andrew, > > thank you for your thoughts. I do not agree to your conclusion, though. > > There are two bottlenecks in the CodeHeap management code. One is in > CodeHeap::mark_segmap_as_used(), uncovered by OverflowCodeCacheTest.java. > The other is in CodeHeap::add_to_freelist(), uncovered by > StressCodeCacheTest.java. > > Both bottlenecks are tackled by the recommended changeset. > > CodeHeap::mark_segmap_as_used() is no longer O(n*n) for the critical > "FreeBlock-join" case. It actually is O(1) now. The time reduction from > > 80 seconds to just a few milliseconds is proof of that statement. > > CodeHeap::add_to_freelist() is still O(n*n), with n being the free list > length. 
But the kick-in point of the non-linearity could be significantly > shifted towards larger n. The time reduction from approx. 8 seconds to 160 > milliseconds supports this statement. > > I agree it would be helpful to have a "real-world" example showing some > improvement. Providing such evidence is hard, though. I could instrument > the code and print some values from time to time. It's certain this > additional output will mess up success/failure decisions in our test > environment. Not sure everybody likes that. But I will give it a try and > take the hits. This will be a multi-day effort. > > On a general note, I am always uncomfortable knowing of an O(n*n) effort, > in particular when it could be removed or at least tamed considerably. > Experience tells (at least to me) that, at some point in time, n will be > large enough to hurt. > > I'll be back. > > Thanks, > Lutz > > > On 04.11.19, 11:08, "Andrew Dinn" wrote: > > Hi Lutz, > > I'll summarize my thoughts here rather than answer point by point. > > The patch successfully addresses the worst case performance but it > seems > to me extremely unlikely that we will see anything that approaches that > case in real applications. So, that doesn't argue for pushing the > patch. > > The patch does not seem to make a significant difference to the stress > test. This test is also not necessarily 'representative' of real cases > but it is much more likely to be so than the worst case test. That > suggests to me that the current patch is perhaps not worth pursuing (it > ain't really broke so ...). Especially so given that it is not > possible > to distinguish any benefit when running the Spec benchmark apps. One > could argue that the patch looks like it will do no harm and may do > good > in pathological cases but that's not really good enough reason to make > a change. We really need evidence that this is worth doing.
> > The free list 'search bottleneck' certainly looks like a more promising > problem to tackle than the 'merge problem'. However, once again this > 'problem' may just be an artefact of running this specific test rather > than anything that might happen in real life. > > I think the only way to find out for sure whether the current patch or > a > patch that addresses the 'search bottleneck' is going to be beneficial > is to instrument the JVM to record traces for code-cache use from real > apps and then replay allocations/frees based on those traces to see > what > difference a patch makes and how much this might help the overall > execution time. > > regards, > > > Andrew Dinn > ----------- > > On 31/10/2019 16:55, Schmidt, Lutz wrote: > > Hi Andrew, (and hi to the interested crowd), > > > > Please accept my apologies for taking so long to get back. > > > > These tests (OverflowCodeCacheTest and StressCodeCacheTest) were > causing me quite some headaches. Some layer between me and the test > prevents the vm (in particular: the VMThread) from terminating normally. > The final output from my time measurements is therefore not generated or > thrown away. Adding to that were some test machine unavailabilities and a > bug in my measurement code, causing crashes. > > > > Anyway, I added some on-the-fly output, printing the timer values > after 10k measurement intervals. This reveals some interesting, additional > facts about the tests and the CodeHeap management methods. For detailed > numbers, refer to the files attached to the bug ( > https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail, > I can provide the jtr files on request. > > > > > > OverflowCodeCacheTest > > ===================== > > This test runs (in my setup) with a 1GB CodeCache. > > > > For this test, CodeHeap::mark_segmap_as_used() is THE performance > hog. 40% of all calls have to mark more than 16k segment map entries (in > the not optimized case). 
Basically all of these calls convert to len=1 > calls with the optimization turned on. Note that during FreeBlock joining, > the segment count is forced to 1(one). No wonder the time spent in > CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test > runtime) to <100msec. > > > > CodeHeap::add_to_freelist() on the other hand, is almost not > observable. Average free list length is at two elements, making even linear > search really quick. > > > > > > StressCodeCacheTest > > =================== > > With a 1GB CodeCache, this test runs into a 12 min timeout, set by > our internal test environment. Scaling back to 300MB prevents the test from > timing out. > > > > For this test, CodeHeap::mark_segmap_as_used() is not a factor. From > 200,000 calls, only a few (less than 3%) had to process a block consisting > of more than 16 segments. Note that during FreeBlock joining, the segment > count is forced to 1(one). > > > > Another method is popping up as performance hog instead: > CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime > (before optimization) are spent in this method, for just 160,000 calls. The > test seems to create a long list of non-contiguous free blocks (around > 5,500 on average). This list is linearly scanned to find the insert point > for the free block at hand. > > > > Suffering as well from the long free block list is > CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000 > calls. > > > > > > SPECjvm2008 suite > > ================= > > With respect to the task at hand, this is a well-behaved test suite. > Timing shows some before/after difference, but nothing spectacular. The > measurements do not provide evidence of a performance bottleneck. > > > > > > There were some minor adjustments to the code. Unused code blocks > have been removed as well. I have therefore created a new webrev.
You can > find it here: > > http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/ > > > > Thanks for investing your time! > > Lutz > > > > > > On 21.10.19, 15:06, "Andrew Dinn" wrote: > > > > Hi Lutz, > > > > On 21/10/2019 13:37, Schmidt, Lutz wrote: > > > I understand what you are interested in. And I was hoping to > be able > > > to provide some (first) numbers by today. Unfortunately, the > > > measurement code I activated last Friday was buggy and blew > most of > > > the tests I had hoped to run over the weekend. > > > > > > I will take your modified test and run it with and without my > > > optimization. In parallel, I will try to generate some > (non-random) > > > numbers for other tests. > > > > > > I'll be back as soon as I have results. > > > > Thanks for trying the test and also for deriving some call stats > from a > > real example. I'm keen to see how much your patch improves > things. > > > > regards, > > > > > > Andrew Dinn > > ----------- > > Senior Principal Software Engineer > > Red Hat UK Ltd > > Registered in England and Wales under Company Registration No. > 03798903 > > Directors: Michael Cunningham, Michael ("Mike") O'Neill > > > > > > > > > > > > From igor.veresov at oracle.com Mon Nov 4 17:57:13 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 4 Nov 2019 09:57:13 -0800 Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003 In-Reply-To: References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn> <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com> Message-ID: This look good to me. igor > On Nov 4, 2019, at 1:33 AM, Jie Fu wrote: > > Hi Aleksey and Tobias, > > Thanks for your review and valuable comments. > > I'm sorry to mention that Igor is teaching me how to fix the bug off the list these days. > > What do you think of this version? > http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/ > > I prefer webrev.01. > > Thanks a lot. 
> Best regards,
> Jie
>
>
> On 2019/11/4 4:53, Aleksey Shipilev wrote:
>> On 11/2/19 10:29 AM, Jie Fu wrote:
>>> Hi all,
>>>
>>> May I get reviews for this small fix?
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233429
>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>> Looks fine to me.
>>
>> The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
>> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>>
>

From shade at redhat.com Mon Nov 4 18:02:15 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 4 Nov 2019 19:02:15 +0100
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To:
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn> <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
Message-ID: <268b6cd8-7b66-5e5a-4933-e67197ae34ec@redhat.com>

On 11/4/19 10:33 AM, Jie Fu wrote:
> What do you think of this version?
> http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/

Alright, this looks fine too.

--
Thanks,
-Aleksey

From dean.long at oracle.com Mon Nov 4 18:58:50 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 4 Nov 2019 10:58:50 -0800
Subject: RFR(S): 8233081: C1: PatchingStub for field access copies too much
In-Reply-To:
References:
Message-ID:

Looks good. If I am reading it right, you also fixed a (latent?) bug in Sparc by changing the max size return from 7 words to 8 words.

dl

On 10/30/19 8:38 AM, Doerr, Martin wrote:
> Hi,
>
> I'd like to fix an issue in C1's PatchingStub implementation for "access_field_id".
> We had noticed that the code in the template exceeded the 255 byte limitation when switching on VerifyOops on PPC64.
> I'd like to improve the situation for all platforms.
>
> More detailed bug description:
> https://bugs.openjdk.java.net/browse/JDK-8233081
>
> I need a function to determine how many bytes are needed for the NativeMovRegMem.
> x86 has next_instruction_address() which could in theory be used, but I noticed that it's dead code which is no longer correct. > Is it ok to remove it? > I'd also like to remove the constant instruction_size from NativeMovRegMem because it's not constant. > I'd prefer to introduce num_bytes_to_end_of_patch() for the purpose of determining how many bytes to copy for the "access_field_id" PatchingStub. > We can factor out the offset computation from offset() and set_offset() and reuse it. This enforces consistency. > > Webrev: > http://cr.openjdk.java.net/~mdoerr/8233081_C1_access_field_patching/webrev.00/ > > Please review. > > Best regards, > Martin > From igor.ignatyev at oracle.com Mon Nov 4 21:33:31 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 4 Nov 2019 13:33:31 -0800 Subject: RFR(S) : 8233496 : AOT tests failures with 'java.lang.RuntimeException: Failed to find sun/hotspot/WhiteBox.class' Message-ID: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com> http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html > 42 lines changed: 0 ins; 6 del; 36 mod; Hi all, could you please review this small patch for compiler/aot tests? the tests run 'ClassFileInstaller sun.hotspot.WhiteBox' w/o having any preceding actions which build s.h.WhiteBox class, the fix adds sun.hotspot.WhiteBox to the explicit build action and also removes unneeded classes (compiler.aot.AotCompiler and compiler.calls.common.InvokeDynamicPatcher as they are built implicitly by @run) from it. 
JBS: https://bugs.openjdk.java.net/browse/JDK-8233496 webrev: http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html testing: - all compiler/aot tests together on Oracle platforms - each changed test separately on linux-x64 Thanks, -- Igor From vladimir.kozlov at oracle.com Mon Nov 4 21:45:00 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Nov 2019 13:45:00 -0800 Subject: RFR(S) : 8233496 : AOT tests failures with 'java.lang.RuntimeException: Failed to find sun/hotspot/WhiteBox.class' In-Reply-To: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com> References: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com> Message-ID: <3845ADEC-2657-421F-B35F-C12962A31452@oracle.com> Looks good. Thanks Vladimir > On Nov 4, 2019, at 1:33 PM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html >> 42 lines changed: 0 ins; 6 del; 36 mod; > > Hi all, > > could you please review this small patch for compiler/aot tests? the tests run 'ClassFileInstaller sun.hotspot.WhiteBox' w/o having any preceding actions which build s.h.WhiteBox class, the fix adds sun.hotspot.WhiteBox to the explicit build action and also removes unneeded classes (compiler.aot.AotCompiler and compiler.calls.common.InvokeDynamicPatcher as they are built implicitly by @run) from it. 
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8233496
> webrev: http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html
> testing:
> - all compiler/aot tests together on Oracle platforms
> - each changed test separately on linux-x64
>
> Thanks,
> -- Igor
>
>
>

From gromero at linux.vnet.ibm.com Mon Nov 4 22:32:39 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 4 Nov 2019 19:32:39 -0300
Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More generic vector CRC implementation (v2)
In-Reply-To:
References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com> <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com>
Message-ID:

Hello Martin,

On 10/24/2019 07:17 AM, Doerr, Martin wrote:
> Hi Gustavo,
>
> I think removing invertCRC is an unnecessary manual change.
> We should minimize that as far as possible. They may create merge conflicts for future backports.

Thanks a lot for the review. I agree I should minimize the changes as far as possible.

I added back invertCRC and tried to follow your advice, so the final clean-up patch is almost similar to the one found on jdk/jdk, for instance.

Please find v2 for the patchset below. v2 changes affect only 3/4 and 4/4.

[PPC64] More generic vector CRC implementation (1/4)
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/
v2:
- Adapt file names to OpenJDK 8u
- Remove CRC32C part, leaving only CRC32 part, since OpenJDK 8u has no CRC32C
- Add Assembler::add_const_optimized() from "8077838: Recent developments for ppc" [0]
- Fix vpermxor() opcode, replacing VPMSUMW_OPCODE with VPERMXOR_OPCODE, according to the fix in "8190781: ppc64 + s390: Fix CriticalJNINatives" [1]
- Adapt signatures for the following functions and their callers, according to "8175369: [ppc] Provide intrinsic implementation for CRC32C" [2]:
  a. MacroAssembler::update_byteLoop_crc32(), removing 'invertCRC' parameter
  b. MacroAssembler::kernel_crc32_1word(), adding 'invertCRC' parameter

[PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4)
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/
v2:
- Adapt file names to OpenJDK 8u
- Remove CRC32C code

[PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays (3/4)
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/
v2:
- Remove CRC32C code, keeping is_crc32c in crc32(), code related to is_crc32c and invertCRC, like code in kernel_crc32_vpmsum(), and not touching stub code mark in generate_CRC32_updateBytes() to avoid merge conflicts in future backports.

[PPC64] Cleanup non-vector version of CRC32 (4/4)
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/
v2:
- Add {BIG,LITTLE}_ENDIAN_ONLY to src/share/vm/utilities/macros.hpp
- Add kernel_crc32_singleByteReg from change 8175369 [2] as the clean-up uses it in InterpreterGenerator::generate_CRC32_update_entry().

--
Best regards,
Gustavo

[0] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/88847a1b3718
[1] http://hg.openjdk.java.net/jdk/jdk/rev/5a69ba3a4fd1#l1.7
[2] https://bugs.openjdk.java.net/browse/JDK-8175369

From david.holmes at oracle.com Mon Nov 4 23:18:48 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 09:18:48 +1000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To:
References: <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com> <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com>
Message-ID: <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>

On 4/11/2019 9:12 pm, Doerr, Martin wrote:
> Hi all,
>
> @Dean
>> changing can_remove() to CompileBroker::can_remove()?
> Yes. That would be an option.
>
> @Kim, David
> I think there's another problem with this implementation.
> It introduces a use-after-free pattern due to concurrency. > Compiler threads may still read the oops from the handles after one of them has called destroy_global until next safepoint. It doesn't matter which values they get in this case, but the VM should not crash. I believe that OopStorage allows freeing storage without safepoints, so this may be unsafe. Right? I don't understand what you mean. If a compiler thread holds an oop, any oop, it must hold it in a Handle to ensure it can't be gc'd. David > If so, I think replacing the oops in the handles (and keeping the handles alive) would be better. And also much more simple. > > Best regards, > Martin > > >> -----Original Message----- >> From: dean.long at oracle.com >> Sent: Samstag, 2. November 2019 08:36 >> To: Doerr, Martin ; David Holmes >> ; Kim Barrett >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >> >> Hi Martin, >> >> On 10/30/19 3:18 AM, Doerr, Martin wrote: >>> Hi David, >>> >>>> I don't think factoring out CompileBroker::clear_compiler2_object when >>>> it is only used once was warranted, but that's call for compiler team to >>>> make. >>> I did that because _compiler2_objects is private and there's currently no >> setter available. >>> But let's see what the compiler folks think. >> >> how about changing can_remove() to CompileBroker::can_remove()? Then >> you >> can access _compiler2_objects directly, right? >> >> dl >>>> Otherwise changes seem fine and I have noted the use of the >>>> MutexUnlocker as per your direct email. >>> Thanks a lot for reviewing. It was not a trivial one ?? >>> >>> You had noticed an incorrect usage of the CHECK macro. I've created a new >> bug for that: >>> https://bugs.openjdk.java.net/browse/JDK-8233193 >>> Would be great if you could take a look if that's what you meant and made >> adaptions if needed. 
>>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Mittwoch, 30. Oktober 2019 05:47 >>>> To: Doerr, Martin ; Kim Barrett >>>> >>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) >>>> ; hotspot-compiler-dev at openjdk.java.net; >>>> David Holmes >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >>>> >>>> Hi Martin, >>>> >>>> On 29/10/2019 12:06 am, Doerr, Martin wrote: >>>>> Hi David and Kim, >>>>> >>>>> I think it's easier to talk about code. So here's a new webrev: >>>>> >>>> >> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr >>>> ev.03/ >>>> >>>> I don't think factoring out CompileBroker::clear_compiler2_object when >>>> it is only used once was warranted, but that's call for compiler team to >>>> make. Otherwise changes seem fine and I have noted the use of the >>>> MutexUnlocker as per your direct email. >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> @Kim: >>>>> Thanks for looking at the handle related parts. It's ok if you don't want to >>>> be a reviewer of the whole change. >>>>>> I think it's weird that can_remove() is a predicate with optional side >>>>>> effects. I think it would be simpler to have it be a pure predicate, >>>>>> and have the one caller with do_it = true perform the updates. That >>>>>> should include NULLing out the handle pointer (perhaps debug-only, >> but >>>>>> it doesn't cost much to cleanly maintain the data structure). >>>>> Nevertheless, it has the advantage that it enforces the update to be >>>> consistent. >>>>> A caller could use it without holding the lock or mess it up otherwise. >>>>> In addition, I don't what to change that as part of this fix. >>>>> >>>>>> So far as I can tell, THREAD == NULL here. >>>>> This is a very tricky part (not my invention): >>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which >> sets >>>> __the_thread__ to Thread::current(). 
>>>>> I don't want to publish my opinion about this ?? >>>>> >>>>> @David: >>>>> Seems like this option is preferred over option 3 >>>> (possibly_add_compiler_threads part of webrev.02 and leave the >>>> initialization as is). >>>>> So when you're ok with it, I'll request a 2nd review from the compiler >> folks >>>> (I should change the subject to contain RFR). >>>>> Thanks, >>>>> Martin >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Montag, 28. Oktober 2019 05:04 >>>>>> To: Kim Barrett >>>>>> Cc: Doerr, Martin ; Vladimir Kozlov >>>>>> (vladimir.kozlov at oracle.com) ; hotspot- >>>>>> compiler-dev at openjdk.java.net >>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >>>>>> >>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote: >>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes >>>> >>>>>> wrote: >>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote: >>>>>>>>> Hi Kim, >>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have >> not >>>>>> seen how to allocate a global handle and add the oop later. >>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of >>>> that. >>>>>>>>> So I can imagine 3 ways to implement it: >>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just >> added >>>>>> that to >>>> >> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr >>>>>> ev.01/ >>>>>>>>> We may want to improve that further by setting the handle pointer >> to >>>>>> NULL and asserting that it is NULL before adding the new one. >>>>>>>>> I had been concerned about NULLs in the array, but looks like the >>>>>> existing code can deal with that. >>>>>>>> I think it would be cleaner to both destroy the global handle and >> NULL it >>>> in >>>>>> the array at the same time. >>>>>>>> This comment >>>>>>>> >>>>>>>> 325 // Old j.l.Thread object can die here. >>>>>>>> >>>>>>>> Isn't quite accurate. 
The j.l.Thread is still referenced via ct- >>> threadObj() >>>> so >>>>>> can't "die" until that is also cleared during the actual termination >> process. >>>>>>> I think if there is such a thread here that it can't die, because the >>>>>>> death predicate (the can_remove stuff) won't see that old thread as >>>>>>> the last thread in _compiler2_objects. That's what I meant by this: >>>>>>> >>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett >>>>>> wrote: >>>>>>>> I also think that here: >>>>>>>> >>>>>>>> 947 jobject thread_handle = >>>> JNIHandles::make_global(thread_oop); >>>>>>>> 948 _compiler2_objects[i] = thread_handle; >>>>>>>> >>>>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a valid >>>>>>>> assertion then I think there are other problems. >>>>>>> I think either that comment about an old thread is wrong (and the >> NULL >>>>>>> assertion I suggested is okay), or I think the whole mechanism here >>>>>>> has problems. Or at least I was unable to figure out how it could >> work... >>>>>>> >>>>>> I'm not following sorry. You can't assert NULL unless it's actually set >>>>>> to NULL which it presently isn't. But it could be set NULL as Martin >>>>>> suggested: >>>>>> >>>>>> "We may want to improve that further by setting the handle pointer to >>>>>> NULL and asserting that it is NULL before adding the new one." >>>>>> >>>>>> and which I also supported. But that aside once the delete_global has >>>>>> been called that JNIHandle no longer references the j.l.Thread that it >>>>>> did, at which point it is only reachable via the threadObj() of the >>>>>> CompilerThread. 
>>>>>>
>>>>>> David
>

From fujie at loongson.cn Tue Nov 5 01:43:02 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 5 Nov 2019 09:43:02 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To:
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn> <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com>
Message-ID: <28b9a36b-b2ea-8186-0750-65685238cc36@loongson.cn>

Hi Igor,

Thanks for your help and review.

Updated: http://cr.openjdk.java.net/~jiefu/8233429/webrev.02/
 - Added the reviewers in it.

Hope you can sponsor it.

Thanks a lot.
Best regards,
Jie

On 2019/11/5 1:57, Igor Veresov wrote:
> This looks good to me.
>
> igor
>
>
>
>> On Nov 4, 2019, at 1:33 AM, Jie Fu wrote:
>>
>> Hi Aleksey and Tobias,
>>
>> Thanks for your review and valuable comments.
>>
>> I'm sorry to mention that Igor is teaching me how to fix the bug off
>> the list these days.
>>
>> What do you think of this version?
>> http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
>>
>> I prefer webrev.01.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>>
>> On 2019/11/4 4:53, Aleksey Shipilev wrote:
>>> On 11/2/19 10:29 AM, Jie Fu wrote:
>>>> Hi all,
>>>>
>>>> May I get reviews for this small fix?
>>>>
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233429
>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>>> Looks fine to me.
>>>
>>> The alternative is to stub out CompilationModeFlag::*() definitions
>>> under TIERED define, but that
>>> would be more awkward than effectively using the "default" mode for
>>> minimal and zero VMs.
>>>
>>
>

From fujie at loongson.cn Tue Nov 5 01:44:33 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 5 Nov 2019 09:44:33 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <268b6cd8-7b66-5e5a-4933-e67197ae34ec@redhat.com>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn> <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com> <268b6cd8-7b66-5e5a-4933-e67197ae34ec@redhat.com>
Message-ID:

Thanks Aleksey for your review.

On 2019/11/5 2:02, Aleksey Shipilev wrote:
> On 11/4/19 10:33 AM, Jie Fu wrote:
>> What do you think of this version?
>> http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
> Alright, this looks fine too.
>

From martin.doerr at sap.com Tue Nov 5 08:40:02 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Nov 2019 08:40:02 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
References: <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com> <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
Message-ID:

Hi David,

> I don't understand what you mean. If a compiler thread holds an oop, any
> oop, it must hold it in a Handle to ensure it can't be gc'd.

The problem is not related to gc.
My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread. Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
I can't see how OopStorage supports reading from handles which were freed by destroy_global.
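
The alternative raised earlier in this thread (replacing the oops in the handles and keeping the handles alive) could look roughly like the sketch below. This is only an illustration under stated assumptions: `ThreadObj`, `StableHandle` and `demo_replace_in_place` are hypothetical names invented here, not HotSpot types, and the sketch does not use JNIHandles or OopStorage. It just shows that a reader of a slot that is never released sees either the old or the new value, but never touches freed slot memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a java.lang.Thread object; not a HotSpot type.
struct ThreadObj {
  int id;
};

// Hypothetical handle: the slot is allocated once and is not released while
// readers may exist. Only the pointer stored in the slot changes.
class StableHandle {
 public:
  StableHandle() : _obj(nullptr) {}

  // Writer side: store a new object (or nullptr) and return the old one.
  ThreadObj* replace(ThreadObj* new_obj) {
    return _obj.exchange(new_obj, std::memory_order_acq_rel);
  }

  // Reader side: may observe the old or the new value (stale data is
  // acceptable here), but always reads from a slot that is still alive.
  ThreadObj* resolve() const {
    return _obj.load(std::memory_order_acquire);
  }

 private:
  std::atomic<ThreadObj*> _obj;
};

// Returns true if a reader gets a usable value before and after the
// replacement, and the writer gets the previous value back.
bool demo_replace_in_place() {
  static ThreadObj old_thread = {1};
  static ThreadObj new_thread = {2};
  StableHandle handle;
  handle.replace(&old_thread);
  ThreadObj* before = handle.resolve();
  ThreadObj* previous = handle.replace(&new_thread);
  ThreadObj* after = handle.resolve();
  return before == &old_thread && previous == &old_thread &&
         after == &new_thread;
}
```

In this toy version the referenced objects are statics, so a stale pointer stays valid; in the VM the corresponding assumption would be that the old object is kept reachable by other means while it may still be read.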
I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case. Best regards, Martin > -----Original Message----- > From: David Holmes > Sent: Dienstag, 5. November 2019 00:19 > To: Doerr, Martin ; dean.long at oracle.com; Kim > Barrett > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > On 4/11/2019 9:12 pm, Doerr, Martin wrote: > > Hi all, > > > > @Dean > >> changing can_remove() to CompileBroker::can_remove()? > > Yes. That would be an option. > > > > @Kim, David > > I think there's another problem with this implementation. > > It introduces a use-after-free pattern due to concurrency. > > Compiler threads may still read the oops from the handles after one of > them has called destroy_global until next safepoint. It doesn't matter which > values they get in this case, but the VM should not crash. I believe that > OopStorage allows freeing storage without safepoints, so this may be > unsafe. Right? > > I don't understand what you mean. If a compiler thread holds an oop, any > oop, it must hold it in a Handle to ensure it can't be gc'd. > > David > > > If so, I think replacing the oops in the handles (and keeping the handles > alive) would be better. And also much more simple. > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: dean.long at oracle.com > >> Sent: Samstag, 2. 
November 2019 08:36 > >> To: Doerr, Martin ; David Holmes > >> ; Kim Barrett > >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > >> ; hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > >> > >> Hi Martin, > >> > >> On 10/30/19 3:18 AM, Doerr, Martin wrote: > >>> Hi David, > >>> > >>>> I don't think factoring out CompileBroker::clear_compiler2_object > when > >>>> it is only used once was warranted, but that's call for compiler team to > >>>> make. > >>> I did that because _compiler2_objects is private and there's currently no > >> setter available. > >>> But let's see what the compiler folks think. > >> > >> how about changing can_remove() to CompileBroker::can_remove()? > Then > >> you > >> can access _compiler2_objects directly, right? > >> > >> dl > >>>> Otherwise changes seem fine and I have noted the use of the > >>>> MutexUnlocker as per your direct email. > >>> Thanks a lot for reviewing. It was not a trivial one ?? > >>> > >>> You had noticed an incorrect usage of the CHECK macro. I've created a > new > >> bug for that: > >>> https://bugs.openjdk.java.net/browse/JDK-8233193 > >>> Would be great if you could take a look if that's what you meant and > made > >> adaptions if needed. > >>> > >>> Best regards, > >>> Martin > >>> > >>> > >>>> -----Original Message----- > >>>> From: David Holmes > >>>> Sent: Mittwoch, 30. Oktober 2019 05:47 > >>>> To: Doerr, Martin ; Kim Barrett > >>>> > >>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > >>>> ; hotspot-compiler- > dev at openjdk.java.net; > >>>> David Holmes > >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > >>>> > >>>> Hi Martin, > >>>> > >>>> On 29/10/2019 12:06 am, Doerr, Martin wrote: > >>>>> Hi David and Kim, > >>>>> > >>>>> I think it's easier to talk about code. 
So here's a new webrev: > >>>>> > >>>> > >> > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > >>>> ev.03/ > >>>> > >>>> I don't think factoring out CompileBroker::clear_compiler2_object > when > >>>> it is only used once was warranted, but that's call for compiler team to > >>>> make. Otherwise changes seem fine and I have noted the use of the > >>>> MutexUnlocker as per your direct email. > >>>> > >>>> Thanks, > >>>> David > >>>> ----- > >>>> > >>>>> @Kim: > >>>>> Thanks for looking at the handle related parts. It's ok if you don't want > to > >>>> be a reviewer of the whole change. > >>>>>> I think it's weird that can_remove() is a predicate with optional side > >>>>>> effects. I think it would be simpler to have it be a pure predicate, > >>>>>> and have the one caller with do_it = true perform the updates. That > >>>>>> should include NULLing out the handle pointer (perhaps debug-only, > >> but > >>>>>> it doesn't cost much to cleanly maintain the data structure). > >>>>> Nevertheless, it has the advantage that it enforces the update to be > >>>> consistent. > >>>>> A caller could use it without holding the lock or mess it up otherwise. > >>>>> In addition, I don't what to change that as part of this fix. > >>>>> > >>>>>> So far as I can tell, THREAD == NULL here. > >>>>> This is a very tricky part (not my invention): > >>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which > >> sets > >>>> __the_thread__ to Thread::current(). > >>>>> I don't want to publish my opinion about this ?? > >>>>> > >>>>> @David: > >>>>> Seems like this option is preferred over option 3 > >>>> (possibly_add_compiler_threads part of webrev.02 and leave the > >>>> initialization as is). > >>>>> So when you're ok with it, I'll request a 2nd review from the compiler > >> folks > >>>> (I should change the subject to contain RFR). 
> >>>>> Thanks, > >>>>> Martin > >>>>> > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: David Holmes > >>>>>> Sent: Montag, 28. Oktober 2019 05:04 > >>>>>> To: Kim Barrett > >>>>>> Cc: Doerr, Martin ; Vladimir Kozlov > >>>>>> (vladimir.kozlov at oracle.com) ; > hotspot- > >>>>>> compiler-dev at openjdk.java.net > >>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI > CompilerThread > >>>>>> > >>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote: > >>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes > >>>> > >>>>>> wrote: > >>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote: > >>>>>>>>> Hi Kim, > >>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have > >> not > >>>>>> seen how to allocate a global handle and add the oop later. > >>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware > of > >>>> that. > >>>>>>>>> So I can imagine 3 ways to implement it: > >>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just > >> added > >>>>>> that to > >>>> > >> > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > >>>>>> ev.01/ > >>>>>>>>> We may want to improve that further by setting the handle > pointer > >> to > >>>>>> NULL and asserting that it is NULL before adding the new one. > >>>>>>>>> I had been concerned about NULLs in the array, but looks like > the > >>>>>> existing code can deal with that. > >>>>>>>> I think it would be cleaner to both destroy the global handle and > >> NULL it > >>>> in > >>>>>> the array at the same time. > >>>>>>>> This comment > >>>>>>>> > >>>>>>>> 325 // Old j.l.Thread object can die here. > >>>>>>>> > >>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct- > >>> threadObj() > >>>> so > >>>>>> can't "die" until that is also cleared during the actual termination > >> process. 
> >>>>>>> I think if there is such a thread here that it can't die, because the > >>>>>>> death predicate (the can_remove stuff) won't see that old thread > as > >>>>>>> the last thread in _compiler2_objects. That's what I meant by this: > >>>>>>> > >>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett > > >>>>>> wrote: > >>>>>>>> I also think that here: > >>>>>>>> > >>>>>>>> 947 jobject thread_handle = > >>>> JNIHandles::make_global(thread_oop); > >>>>>>>> 948 _compiler2_objects[i] = thread_handle; > >>>>>>>> > >>>>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a > valid > >>>>>>>> assertion then I think there are other problems. > >>>>>>> I think either that comment about an old thread is wrong (and the > >> NULL > >>>>>>> assertion I suggested is okay), or I think the whole mechanism here > >>>>>>> has problems. Or at least I was unable to figure out how it could > >> work... > >>>>>>> > >>>>>> I'm not following sorry. You can't assert NULL unless it's actually set > >>>>>> to NULL which it presently isn't. But it could be set NULL as Martin > >>>>>> suggested: > >>>>>> > >>>>>> "We may want to improve that further by setting the handle pointer > to > >>>>>> NULL and asserting that it is NULL before adding the new one." > >>>>>> > >>>>>> and which I also supported. But that aside once the delete_global > has > >>>>>> been called that JNIHandle no longer references the j.l.Thread that it > >>>>>> did, at which point it is only reachable via the threadObj() of the > >>>>>> CompilerThread. 
> >>>>>>
> >>>>>> David
> >

From tobias.hartmann at oracle.com Tue Nov 5 08:43:45 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Nov 2019 09:43:45 +0100
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To: <28b9a36b-b2ea-8186-0750-65685238cc36@loongson.cn>
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn> <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com> <28b9a36b-b2ea-8186-0750-65685238cc36@loongson.cn>
Message-ID:

Looks good to me too. Pushed.

Best regards,
Tobias

On 05.11.19 02:43, Jie Fu wrote:
> Hi Igor,
>
> Thanks for your help and review.
>
> Updated: http://cr.openjdk.java.net/~jiefu/8233429/webrev.02/
>  - Added the reviewers in it.
>
> Hope you can sponsor it.
>
> Thanks a lot.
> Best regards,
> Jie
>
> On 2019/11/5 1:57, Igor Veresov wrote:
>> This looks good to me.
>>
>> igor
>>
>>
>>
>>> On Nov 4, 2019, at 1:33 AM, Jie Fu wrote:
>>>
>>> Hi Aleksey and Tobias,
>>>
>>> Thanks for your review and valuable comments.
>>>
>>> I'm sorry to mention that Igor is teaching me how to fix the bug off the list these days.
>>>
>>> What do you think of this version?
>>> http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
>>>
>>> I prefer webrev.01.
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>>
>>> On 2019/11/4 4:53, Aleksey Shipilev wrote:
>>>> On 11/2/19 10:29 AM, Jie Fu wrote:
>>>>> Hi all,
>>>>>
>>>>> May I get reviews for this small fix?
>>>>>
>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233429
>>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>>>> Looks fine to me.
>>>>
>>>> The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
>>>> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>>>>
>>>
>>

From fujie at loongson.cn Tue Nov 5 08:48:40 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 5 Nov 2019 16:48:40 +0800
Subject: RFR: 8233429: Minimal and zero VM build broken after JDK-8227003
In-Reply-To:
References: <8fdd568d-4a91-47dd-22a7-abe44b547716@loongson.cn> <4cfea00d-e41d-cb08-16da-8bbc908ec59c@redhat.com> <28b9a36b-b2ea-8186-0750-65685238cc36@loongson.cn>
Message-ID: <110c9573-f9b2-0bdd-04d3-7086d7c2ad6c@loongson.cn>

Thank you so much, Tobias.

On 2019/11/5 4:43, Tobias Hartmann wrote:
> Looks good to me too. Pushed.
>
> Best regards,
> Tobias
>
> On 05.11.19 02:43, Jie Fu wrote:
>> Hi Igor,
>>
>> Thanks for your help and review.
>>
>> Updated: http://cr.openjdk.java.net/~jiefu/8233429/webrev.02/
>>  - Added the reviewers in it.
>>
>> Hope you can sponsor it.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> On 2019/11/5 1:57, Igor Veresov wrote:
>>> This looks good to me.
>>>
>>> igor
>>>
>>>
>>>
>>>> On Nov 4, 2019, at 1:33 AM, Jie Fu wrote:
>>>>
>>>> Hi Aleksey and Tobias,
>>>>
>>>> Thanks for your review and valuable comments.
>>>>
>>>> I'm sorry to mention that Igor is teaching me how to fix the bug off the list these days.
>>>>
>>>> What do you think of this version?
>>>> http://cr.openjdk.java.net/~jiefu/8233429/webrev.01/
>>>>
>>>> I prefer webrev.01.
>>>>
>>>> Thanks a lot.
>>>> Best regards,
>>>> Jie
>>>>
>>>>
>>>> On 2019/11/4 4:53, Aleksey Shipilev wrote:
>>>>> On 11/2/19 10:29 AM, Jie Fu wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> May I get reviews for this small fix?
>>>>>>
>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233429
>>>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233429/webrev.00/
>>>>> Looks fine to me.
>>>>>
>>>>> The alternative is to stub out CompilationModeFlag::*() definitions under TIERED define, but that
>>>>> would be more awkward than effectively using the "default" mode for minimal and zero VMs.
>>>>>

From tobias.hartmann at oracle.com Tue Nov 5 09:01:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 5 Nov 2019 10:01:41 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com>
Message-ID: <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>

Performance results look good.

Best regards,
Tobias

On 04.11.19 08:25, Tobias Hartmann wrote:
> Hi Roland,
>
> this seems reasonable to me but I'm concerned that it might cause performance regressions. I'll run
> some tests in our system.
>
> Best regards,
> Tobias
>
> On 23.10.19 10:50, Roland Westrelin wrote:
>>
>> http://cr.openjdk.java.net/~roland/8232539/webrev.00/
>>
>> I couldn't come up with a test case because node processing order during
>> IGVN matters. Bug was reported against 11 but I see no reason it
>> wouldn't apply to current code as well.
>>
>> At parse time, predicates are added by
>> Parse::maybe_add_predicate_after_if() but no loop is actually
>> created. Compile::_major_progress is cleared. On the next round of IGVN,
>> one input of a region points to predicates. The same region has an if as
>> use that can be split through phi during IGVN. The predicates are going
>> to be removed by IGVN. But that happens in multiple steps because there
>> are several predicates (for reason Deoptimization::Reason_predicate,
>> Deoptimization::Reason_loop_limit_check etc.) and because for each
>> predicate one IGVN iteration must first remove the Opaque1 node, then
>> another kill the IfFalse projection, finally another replace the IfTrue
>> projection by the If control input.
>>
>> Split if occurs while predicates are in the process of being removed. It
>> sees predicates, tries to walk over them, encounters a predicate that's
>> been half removed (false projection removed) and we hit the assert/crash.
>>
>> I propose we simply not apply IGVN split if if we're splitting through a
>> loop or if there's a predicate input to a region because:
>>
>> - Making split if robust to dying predicates is not straightforward as
>> far as I can tell
>>
>> - Loop opts split if doesn't split through loop header so why would it
>> make sense for IGVN split if?
>>
>> - I'm wondering if there are other cases where handling of predicates in
>> split if could be wrong (and so more trouble ahead):
>>
>>   + What if we split through a Loop region, predicates were added by
>>   loop optimizations, loop opts are now over so the predicates added at
>>   parse time were removed: then PhaseIdealLoop::find_predicate()
>>   wouldn't report a predicate but cloning predicates would still be
>>   required for correctness?
>>
>>   + What if we have no loop, a region has predicates as input,
>>   predicates are going to die but have not yet been processed, split if
>>   uselessly duplicates predicates but one of them is control dependent
>>   on the branch it is in so cloning predicates actually causes a broken
>>   graph?
>>
>> So overall it feels safer to me to simply bail out from split if for
>> loops/predicates.
>>
>> Roland.
>>

From rwestrel at redhat.com Tue Nov 5 09:09:56 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 05 Nov 2019 10:09:56 +0100
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>
References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com> <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com>
Message-ID: <87y2wu7kpn.fsf@redhat.com>

Hi Tobias,

Thanks for the review and for performance testing.

Roland.

From martin.doerr at sap.com Tue Nov 5 09:12:56 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Nov 2019 09:12:56 +0000
Subject: RFR(S): 8233081: C1: PatchingStub for field access copies too much
In-Reply-To:
References:
Message-ID:

Hi Dean,

thank you for the review.
You can call it a bug fix for SPARC, but I'd rather call it removal of dead and incorrect code.

Best regards,
Martin

> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of dean.long at oracle.com
> Sent: Monday, 4 November 2019 19:59
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(S): 8233081: C1: PatchingStub for field access copies too
> much
>
> Looks good. If I am reading it right, you also fixed a (latent?) bug in
> Sparc by changing the max size return from 7 words to 8 words.
>
> dl
>
> On 10/30/19 8:38 AM, Doerr, Martin wrote:
> > Hi,
> >
> > I'd like to fix an issue in C1's PatchingStub implementation for
> "access_field_id".
> > We had noticed that the code in the template exceeded the 255 byte
> limitation when switching on VerifyOops on PPC64.
> > I'd like to improve the situation for all platforms.
> >
> > More detailed bug description:
> > https://bugs.openjdk.java.net/browse/JDK-8233081
> >
> > I need a function to determine how many bytes are needed for the
> NativeMovRegMem.
> > x86 has next_instruction_address() which could in theory be used, but I
> noticed that it's dead code which is no longer correct.
> > Is it ok to remove it?
> > I'd also like to remove the constant instruction_size from
> NativeMovRegMem because it's not constant.
> > I'd prefer to introduce num_bytes_to_end_of_patch() for the purpose of
> determining how many bytes to copy for the "access_field_id" PatchingStub.
> > We can factor out the offset computation from offset() and set_offset()
> and reuse it. This enforces consistency.
> >
> > Webrev:
> > http://cr.openjdk.java.net/~mdoerr/8233081_C1_access_field_patching/webrev.00/
> >
> > Please review.
> > > > Best regards, > > Martin > > From martin.doerr at sap.com Tue Nov 5 11:07:17 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 5 Nov 2019 11:07:17 +0000 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com> <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> Message-ID: Btw. this seems to be the issue Vladimir has found by running more tests: https://bugs.openjdk.java.net/secure/attachment/85306/hs_err_pid93932.log > -----Original Message----- > From: Doerr, Martin > Sent: Dienstag, 5. November 2019 09:40 > To: David Holmes ; dean.long at oracle.com; Kim > Barrett > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: JDK-8230459: Test failed to resume JVMCI CompilerThread > > Hi David > > > I don't understand what you mean. If a compiler thread holds an oop, any > > oop, it must hold it in a Handle to ensure it can't be gc'd. > > The problem is not related to gc. > My change introduces destroy_global for the handles. This means that the > OopStorage portion which has held the oop can get freed. > However, other compiler threads are running concurrently. They may > execute code which reads the oop from the handle which is freed by this > thread. Reading stale data is not a problem here, but reading freed memory > may assert or even crash in general. > I can't see how OopStorage supports reading from handles which were freed > by destroy_global. > > I think it would be safe if the freeing only occurred at safepoints, but I don't > think this is the case. > > Best regards, > Martin > > > > -----Original Message----- > > From: David Holmes > > Sent: Dienstag, 5. 
November 2019 00:19 > > To: Doerr, Martin ; dean.long at oracle.com; Kim > > Barrett > > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > > ; hotspot-compiler-dev at openjdk.java.net > > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > > > On 4/11/2019 9:12 pm, Doerr, Martin wrote: > > > Hi all, > > > > > > @Dean > > >> changing can_remove() to CompileBroker::can_remove()? > > > Yes. That would be an option. > > > > > > @Kim, David > > > I think there's another problem with this implementation. > > > It introduces a use-after-free pattern due to concurrency. > > > Compiler threads may still read the oops from the handles after one of > > them has called destroy_global until next safepoint. It doesn't matter > which > > values they get in this case, but the VM should not crash. I believe that > > OopStorage allows freeing storage without safepoints, so this may be > > unsafe. Right? > > > > I don't understand what you mean. If a compiler thread holds an oop, any > > oop, it must hold it in a Handle to ensure it can't be gc'd. > > > > David > > > > > If so, I think replacing the oops in the handles (and keeping the handles > > alive) would be better. And also much more simple. > > > > > > Best regards, > > > Martin > > > > > > > > >> -----Original Message----- > > >> From: dean.long at oracle.com > > >> Sent: Samstag, 2. November 2019 08:36 > > >> To: Doerr, Martin ; David Holmes > > >> ; Kim Barrett > > >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > > >> ; hotspot-compiler- > dev at openjdk.java.net > > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > >> > > >> Hi Martin, > > >> > > >> On 10/30/19 3:18 AM, Doerr, Martin wrote: > > >>> Hi David, > > >>> > > >>>> I don't think factoring out CompileBroker::clear_compiler2_object > > when > > >>>> it is only used once was warranted, but that's call for compiler team to > > >>>> make. 
> > >>> I did that because _compiler2_objects is private and there's currently > no > > >> setter available. > > >>> But let's see what the compiler folks think. > > >> > > >> how about changing can_remove() to CompileBroker::can_remove()? > > Then > > >> you > > >> can access _compiler2_objects directly, right? > > >> > > >> dl > > >>>> Otherwise changes seem fine and I have noted the use of the > > >>>> MutexUnlocker as per your direct email. > > >>> Thanks a lot for reviewing. It was not a trivial one ?? > > >>> > > >>> You had noticed an incorrect usage of the CHECK macro. I've created a > > new > > >> bug for that: > > >>> https://bugs.openjdk.java.net/browse/JDK-8233193 > > >>> Would be great if you could take a look if that's what you meant and > > made > > >> adaptions if needed. > > >>> > > >>> Best regards, > > >>> Martin > > >>> > > >>> > > >>>> -----Original Message----- > > >>>> From: David Holmes > > >>>> Sent: Mittwoch, 30. Oktober 2019 05:47 > > >>>> To: Doerr, Martin ; Kim Barrett > > >>>> > > >>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) > > >>>> ; hotspot-compiler- > > dev at openjdk.java.net; > > >>>> David Holmes > > >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI > CompilerThread > > >>>> > > >>>> Hi Martin, > > >>>> > > >>>> On 29/10/2019 12:06 am, Doerr, Martin wrote: > > >>>>> Hi David and Kim, > > >>>>> > > >>>>> I think it's easier to talk about code. So here's a new webrev: > > >>>>> > > >>>> > > >> > > > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > > >>>> ev.03/ > > >>>> > > >>>> I don't think factoring out CompileBroker::clear_compiler2_object > > when > > >>>> it is only used once was warranted, but that's call for compiler team to > > >>>> make. Otherwise changes seem fine and I have noted the use of the > > >>>> MutexUnlocker as per your direct email. 
> > >>>> > > >>>> Thanks, > > >>>> David > > >>>> ----- > > >>>> > > >>>>> @Kim: > > >>>>> Thanks for looking at the handle related parts. It's ok if you don't > want > > to > > >>>> be a reviewer of the whole change. > > >>>>>> I think it's weird that can_remove() is a predicate with optional side > > >>>>>> effects. I think it would be simpler to have it be a pure predicate, > > >>>>>> and have the one caller with do_it = true perform the updates. > That > > >>>>>> should include NULLing out the handle pointer (perhaps debug- > only, > > >> but > > >>>>>> it doesn't cost much to cleanly maintain the data structure). > > >>>>> Nevertheless, it has the advantage that it enforces the update to be > > >>>> consistent. > > >>>>> A caller could use it without holding the lock or mess it up otherwise. > > >>>>> In addition, I don't what to change that as part of this fix. > > >>>>> > > >>>>>> So far as I can tell, THREAD == NULL here. > > >>>>> This is a very tricky part (not my invention): > > >>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which > > >> sets > > >>>> __the_thread__ to Thread::current(). > > >>>>> I don't want to publish my opinion about this ?? > > >>>>> > > >>>>> @David: > > >>>>> Seems like this option is preferred over option 3 > > >>>> (possibly_add_compiler_threads part of webrev.02 and leave the > > >>>> initialization as is). > > >>>>> So when you're ok with it, I'll request a 2nd review from the > compiler > > >> folks > > >>>> (I should change the subject to contain RFR). > > >>>>> Thanks, > > >>>>> Martin > > >>>>> > > >>>>> > > >>>>>> -----Original Message----- > > >>>>>> From: David Holmes > > >>>>>> Sent: Montag, 28. 
Oktober 2019 05:04 > > >>>>>> To: Kim Barrett > > >>>>>> Cc: Doerr, Martin ; Vladimir Kozlov > > >>>>>> (vladimir.kozlov at oracle.com) ; > > hotspot- > > >>>>>> compiler-dev at openjdk.java.net > > >>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI > > CompilerThread > > >>>>>> > > >>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote: > > >>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes > > >>>> > > >>>>>> wrote: > > >>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote: > > >>>>>>>>> Hi Kim, > > >>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just > have > > >> not > > >>>>>> seen how to allocate a global handle and add the oop later. > > >>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not > aware > > of > > >>>> that. > > >>>>>>>>> So I can imagine 3 ways to implement it: > > >>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just > > >> added > > >>>>>> that to > > >>>> > > >> > > > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > > >>>>>> ev.01/ > > >>>>>>>>> We may want to improve that further by setting the handle > > pointer > > >> to > > >>>>>> NULL and asserting that it is NULL before adding the new one. > > >>>>>>>>> I had been concerned about NULLs in the array, but looks like > > the > > >>>>>> existing code can deal with that. > > >>>>>>>> I think it would be cleaner to both destroy the global handle and > > >> NULL it > > >>>> in > > >>>>>> the array at the same time. > > >>>>>>>> This comment > > >>>>>>>> > > >>>>>>>> 325 // Old j.l.Thread object can die here. > > >>>>>>>> > > >>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct- > > >>> threadObj() > > >>>> so > > >>>>>> can't "die" until that is also cleared during the actual termination > > >> process. 
> > >>>>>>> I think if there is such a thread here that it can't die, because the > > >>>>>>> death predicate (the can_remove stuff) won't see that old thread > > as > > >>>>>>> the last thread in _compiler2_objects. That's what I meant by > this: > > >>>>>>> > > >>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett > > > > >>>>>> wrote: > > >>>>>>>> I also think that here: > > >>>>>>>> > > >>>>>>>> 947 jobject thread_handle = > > >>>> JNIHandles::make_global(thread_oop); > > >>>>>>>> 948 _compiler2_objects[i] = thread_handle; > > >>>>>>>> > > >>>>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a > > valid > > >>>>>>>> assertion then I think there are other problems. > > >>>>>>> I think either that comment about an old thread is wrong (and the > > >> NULL > > >>>>>>> assertion I suggested is okay), or I think the whole mechanism > here > > >>>>>>> has problems. Or at least I was unable to figure out how it could > > >> work... > > >>>>>>> > > >>>>>> I'm not following sorry. You can't assert NULL unless it's actually set > > >>>>>> to NULL which it presently isn't. But it could be set NULL as Martin > > >>>>>> suggested: > > >>>>>> > > >>>>>> "We may want to improve that further by setting the handle > pointer > > to > > >>>>>> NULL and asserting that it is NULL before adding the new one." > > >>>>>> > > >>>>>> and which I also supported. But that aside once the delete_global > > has > > >>>>>> been called that JNIHandle no longer references the j.l.Thread that > it > > >>>>>> did, at which point it is only reachable via the threadObj() of the > > >>>>>> CompilerThread. 
> > >>>>>>
> > >>>>>> David
> > >

From jorn.vernee at oracle.com Tue Nov 5 12:03:20 2019
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Tue, 5 Nov 2019 13:03:20 +0100
Subject: RFR 8233389: Add PrintIdeal to compiler directives
In-Reply-To: <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com>
References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com>
Message-ID: <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com>

Hi,

I've updated the patch per your suggestions:
http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.02/
(Testing = tier1, manual)

- Moved the PrintIdeal flag (as well as IGVPrintLevel) into a NOT_PRODUCT in compilerDirectives.hpp.
- Moved _print_ideal field in Compile into an #ifndef PRODUCT block and moved the initialization to the initializer list.

W.r.t. usefulness of PrintIdeal vs PrintIdealGraph: the obvious thing is that PrintIdeal doesn't require IGV, which might be more useful if PrintIdeal were a diagnostic flag instead (as suggested by Nils), so it could be used from a standard JDK build which doesn't come with IGV. Another advantage that comes to mind is that PrintIdeal output is easier to share as text; you can just copy a few lines into the body of an email.

That said, I haven't used either of these flags extensively, so I find it hard to judge whether one is clearly better than the other. But it seems at least unfortunate that we have the PrintIdeal flag, but cannot use it in compiler directives to filter the output.

Jorn

On 04/11/2019 15:57, Vladimir Ivanov wrote:
> Hi Jorn,
>
> src\hotspot\share\opto\compile.hpp:
> +   bool                  _print_ideal;           // True if we should dump node IR for this compilation
>
> Since the only usage is in non-product code, I suggest to put
> _print_ideal into #ifndef PRODUCT, so you don't need to initialize it
> in product build.
>
> Also, it'll allow you to just put it on initializer list instead of
> doing it in the ctor body (akin to how _trace_opto_output is handled):
>
> src\hotspot\share\opto\compile.cpp:
>
> Compile::Compile( ciEnv* ci_env,
> ...
>   : Phase(Compiler),
> ...
>     _has_reserved_stack_access(false),
> #ifndef PRODUCT
>     _trace_opto_output(directive->TraceOptoOutputOption),
> #endif
>     _has_method_handle_invokes(false),
>
>
> Overall, I don't see much value in PrintIdeal: PrintIdealGraph
> provides much more detailed information (even though in XML format)
> and IdealGraphVisualizer is better at browsing the graph. The only
> thing I'm usually missing is full text dump output on individual nodes
> (they are shown pruned in IGV; not sure whether it's IGV fault or the
> info is missing in the dump).
>
> Best regards,
> Vladimir Ivanov
>
> On 01.11.2019 18:09, Jorn Vernee wrote:
>> Hi,
>>
>> I'd like to add PrintIdeal as a compiler directive in order to enable
>> PrintIdeal for only a single method when combining it with the
>> 'match' directive.
>>
>> Please review the following:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389
>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/
>> (Testing = tier1, manual)
>>
>> As a heads-up; I'm not a committer on the jdk project, so if this
>> sounds like a good idea, I would require a sponsor to push the changes.
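For illustration only, a compiler directives file that restricts the output to a single method might look like the sketch below. It assumes a JDK build that contains the patch under review, so that `PrintIdeal` is accepted as a c2 directive option, and a debug build, since the flag is non-product. The matched method and the file name are made up.

```
[
  {
    // hypothetical method pattern; replace with the method of interest
    match: "java/lang/String.indexOf",
    c2: {
      // assumed option name, as proposed by the patch under review
      PrintIdeal: true
    }
  }
]
```

Such a file would typically be passed with `-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=print_ideal.json` (the file name being the assumption above).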
>>
>> Thanks,
>> Jorn
>>

From david.holmes at oracle.com Tue Nov 5 12:33:21 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 22:33:21 +1000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To:
References: <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com>
Message-ID: <6ac1f31e-61a6-fa92-75f6-cae0732915e3@oracle.com>

On 5/11/2019 6:40 pm, Doerr, Martin wrote:
> Hi David
>
>> I don't understand what you mean. If a compiler thread holds an oop, any
>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>
> The problem is not related to gc.
> My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
> However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread. Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
> I can't see how OopStorage supports reading from handles which were freed by destroy_global.

With JVMCI compiler threads, each getting a new j.l.Thread oop that lasts for the lifetime of that compiler thread (just like a regular JavaThread) do we even actually need these arrays? I'm unclear what purpose they serve when we are not trying to reuse the oops stored in the array.

David
-----

> I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.
>
> Best regards,
> Martin

From martin.doerr at sap.com Tue Nov 5 13:37:09 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Nov 2019 13:37:09 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <6ac1f31e-61a6-fa92-75f6-cae0732915e3@oracle.com>
References: <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <6ac1f31e-61a6-fa92-75f6-cae0732915e3@oracle.com>
Message-ID:

Hi David

> With JVMCI compiler threads, each getting a new j.l.Thread oop that
> lasts for the lifetime of that compiler thread (just like a regular
> JavaThread) do we even actually need these arrays? I'm unclear what
> purpose they serve when we are not trying to reuse the oops stored in
> the array.

Compiler threads can look up j.l.Thread objects of live compilers by iterating over the arrays. That's used to find the last compiler alive or to find a log instance for a compiler. It could be designed differently, but that would make the change even bigger.

Best regards,
Martin

From rwestrel at redhat.com Tue Nov 5 13:42:20 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 05 Nov 2019 14:42:20 +0100
Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() || phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no mismatched stores, except on raw memory
In-Reply-To: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com>
Message-ID: <87tv7ie8xv.fsf@redhat.com>

Hi Severin,

Thanks for taking care of this.

> Could I please get a review of this 8u only issue? The reason a
> fastdebug build of latest OpenJDK 8u asserts for the dec-tree benchmark
> of the renaissance suite is because the 8u backport of JDK-8140309 was
> missing this hunk from JDK 9[1]:
>
> + (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode
> + (is_mismatched_access() || st->as_Store()->is_mismatched_access()),
>
> I had a closer look and there doesn't seem to be missing anything else.
> The proposed fix is to amend the assert condition in the appropriate
> place, which brings 8u in line with JDK 9 code where the failure isn't
> observed.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8233023 > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/01/webrev/ Isn't this: @@ -3213,6 +3221,9 @@ // within the initialized memory. intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) { const int FAIL = 0; + if (st->is_unaligned_access()) { + return FAIL; + } if (st->req() != MemNode::ValueIn + 1) return FAIL; // an inscrutable StoreNode (card mark?) Node* ctl = st->in(MemNode::Control); also missing from the 8140309? It must be harmless because nothing sets _unaligned_access AFAICT but given the unaligned access part of the patch was backported I think we should keep it consistent. Also, + (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode is not from 8140309 but from 8080289. We're not going to backport it and that line is unrelated to the change so backporting it sounds good. But while doing this, we should backport the other changes to that assert from 8080289 as well. + st->Opcode() == Op_StoreVector || + Opcode() == Op_StoreVector || Roland. From lutz.schmidt at sap.com Tue Nov 5 14:54:25 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 5 Nov 2019 14:54:25 +0000 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> Message-ID: <09A11702-3B0B-4FA2-A79E-EE61779D311F@sap.com> Hi Thomas, thank you very much for your elaborate comments and suggestions! Please find my comments inline below.
Regards, Lutz On 04.11.19, 18:02, "Thomas Stüfe" wrote: Hi Lutz, I like this patch and have a number of remarks: Segment map splicing: - If the last byte of the leading block happens to contain 0xFD, you do not need to increment the fragmentation counter after splicing since the leading hop would be fully formed. R: You are right. I'm not sure if this extra check will pay off performance-wise, though. - If patch complexity is a concern, I would maybe leave out the segmap_template. That would simplify the coding a bit and I am not sure how much this actually brings. I really am curious. Is it better to let the compiler store immediates or to memcpy the template? Would the latter not mean I pay for the loads too? R: Introducing the segmap_template was my first attempt to boost performance. Improvement was visible, but of course in no way comparable with the effect we see now. Unfortunately, I did not archive the numbers. The improvement over a byte loop depends on the compiler, the memcpy implementation, and CPU capabilities. On s390x, for example, there exists special hardware support for mem->mem moves of up to 256 bytes. - In ::allocate(), I like that you re-use block. This is just bikeshedding, but you also could reshape this function a tiny bit for a clean single exit at the end of the function. R: Where to exit a function is personal taste. Or is there an OpenJDK coding guideline for that? In this particular case, an additional local variable would be needed to hold the function result (block->allocated_space() or NULL). I do not really like that. - Comment nits: - * code block (e.g. nmethod) when only a pointer to somewhere inside the + * code block (e.g. nmethod) when only a pointer to a location inside the - * - Segment addresses should be aligned to be multiples of CodeCacheSegmentSize. + * - Segment start addresses should be aligned to be multiples of CodeCacheSegmentSize. - * - Allocation in the code cache can only happen at segment boundaries.
+ * - Allocation in the code cache can only happen at segment start addresses. R: agreed. Will be in next webrev. - I would like comments for the newly added functions in heap.hpp, not in the cpp file, because my IDE expands those comments from the hpp. But that's just me :=) R: I prefer interface documentation in *.hpp (where you look up what's available), functional description in *.cpp (where the code is). - block_at() could be implemented now using address_for and a cast R: I tried to keep the diff in check. - fragmentation_limit, freelist_limit: I thought using enums for constants is not done anymore, and the preferred way is to use static const. See e.g. markWord.hpp. R: If static const is the way to go, I'll go that way. - + // Boundaries of committed space. + // Boundaries of reserved space. Thanks for the comments. Out of curiosity, can the lower part be uncommitted? Or would low() == low_boundary() always be true? R: The interface distinguishes low() and low_boundary(). I did not dive into VirtualSpace to find out if the two values are always the same. Instead of relying on implementation details, I prefer to adhere to the API. - + bool contains(const void* p) const { return low() <= p && p < high(); } Is comparing with void* legal C++? R: None of our various compilers complained... - CodeHeap::merge_right(): - Nits, I would maybe rename "follower" to "beg_follower", or, alternatively, "beg" to "leader". - I wonder whether we can have a "wipe header function" which only wipes the FreeBlock header of the follower, instead of wiping the whole segment. R: If naming is to be changed, I'd rather do something like size_t end_ix = segment_for(a) + a->length(); - CodeHeap::add_to_freelist(): I am astonished that this brings such a performance increase.
I would naively have thought the deallocation to be pretty random, and therefore this technique to have an improvement of factor 2 in general, but we also need to find the start of the last_insert_free_block which may take some hops, and then, the block might not even be free... R: Well, it all depends on the deallocation pattern. If you deallocate with ascending block addresses, the optimization is perfect. If you deallocate in the other direction, the optimization is not so good. But it will help as long as the deallocated block is at a higher address than the remembered free block. -- General remarks, not necessarily for your patch: - This code could really gain readability if the segment id were its own type, e.g. "typedef int segid_t". APIs like "mark_segmap_as_used(segid_t beg, segid_t end)" are immediately clearer than when size_t is used. R: Same as above: wanted to keep the diff in check. - In CodeHeap::verify(), do we actually check that the segment map is correctly formed? That from every point in the map, we reach the correct start of the associated code blob? I could not find this but I may be blind. R: I had that, but it was way too expensive, even for a fastdebug build. Check webrev version *.00 Thanks, Thomas On Thu, Oct 31, 2019 at 5:55 PM Schmidt, Lutz wrote: Hi Andrew, (and hi to the interested crowd), Please accept my apologies for taking so long to get back. These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing me quite some headaches. Some layer between me and the test prevents the VM (in particular: the VMThread) from terminating normally. The final output from my time measurements is therefore not generated or thrown away. Adding to that were some test machine unavailabilities and a bug in my measurement code, causing crashes. Anyway, I added some on-the-fly output, printing the timer values after 10k measurement intervals.
This reveals some interesting, additional facts about the tests and the CodeHeap management methods. For detailed numbers, refer to the files attached to the bug (https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail, I can provide the jtr files on request. OverflowCodeCacheTest ===================== This test runs (in my setup) with a 1GB CodeCache. For this test, CodeHeap::mark_segmap_as_used() is THE performance hog. 40% of all calls have to mark more than 16k segment map entries (in the not optimized case). Basically all of these calls convert to len=1 calls with the optimization turned on. Note that during FreeBlock joining, the segment count is forced to 1(one). No wonder the time spent in CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test runtime) to <100msec. CodeHeap::add_to_freelist() on the other hand, is almost not observable. Average free list length is at two elements, making even linear search really quick. StressCodeCacheTest =================== With a 1GB CodeCache, this test runs into a 12 min timeout, set by our internal test environment. Scaling back to 300MB prevents the test from timing out. For this test, CodeHeap::mark_segmap_as_used() is not a factor. From 200,000 calls, only a few (less than 3%) had to process a block consisting of more than 16 segments. Note that during FreeBlock joining, the segment count is forced to 1(one). Another method is popping up as performance hog instead: CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime (before optimization) are spent in this method, for just 160,000 calls. The test seems to create a long list of non-contiguous free blocks (around 5,500 on average). This list is linearly scanned to find the insert point for the free block at hand. Suffering as well from the long free block list is CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000 calls. 
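To illustrate why a long, address-ordered free list makes the insert-point search expensive, and why remembering the last insert position helps when blocks are freed at ascending addresses, here is a minimal sketch. It is not the CodeHeap code from the webrev: `FreeNode`, `FreeList`, `insert()` and `reset_hint()` are hypothetical names, the sketch never removes or merges blocks (so the remembered node is always still in the list), and the hop counter exists only to make the effect visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a free block; not HotSpot's FreeBlock.
struct FreeNode {
  std::uintptr_t addr;  // start address of the free block
  FreeNode*      next;  // next free block (at a higher address)
};

// Address-ordered, singly linked free list with a "last insert" hint.
// Assumption: nodes are never removed, so the hint never goes stale.
class FreeList {
 public:
  // Links node into the list, keeping ascending address order.
  // Returns the number of list hops needed to find the insert point.
  std::size_t insert(FreeNode* node) {
    assert(node != nullptr);
    std::size_t hops = 0;
    FreeNode* prev = nullptr;
    // Start at the remembered node if the new block lies above it;
    // otherwise fall back to a full scan from the list head.
    if (_hint != nullptr && _hint->addr < node->addr) {
      prev = _hint;
    }
    FreeNode* cur = (prev != nullptr) ? prev->next : _head;
    while (cur != nullptr && cur->addr < node->addr) {
      prev = cur;
      cur = cur->next;
      ++hops;
    }
    node->next = cur;
    if (prev != nullptr) {
      prev->next = node;
    } else {
      _head = node;
    }
    _hint = node;  // remember where we inserted
    return hops;
  }

  // Forget the hint, forcing the next insert to scan from the head.
  void reset_hint() { _hint = nullptr; }

  const FreeNode* head() const { return _head; }

 private:
  FreeNode* _head = nullptr;
  FreeNode* _hint = nullptr;
};
```

With ascending deallocation addresses the hinted search needs no hops at all, while a scan from the head walks the whole list on every insert. With descending addresses the hint is of no use and the search starts at the head, which matches the behaviour described for the remembered free block.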
SPECjvm2008 suite ================= With respect to the task at hand, this is a well-behaved test suite. Timing shows some before/after difference, but nothing spectacular. The measurements do not provide evidence of a performance bottleneck. There were some minor adjustments to the code. Unused code blocks have been removed as well. I have therefore created a new webrev. You can find it here: http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/ Thanks for investing your time! Lutz On 21.10.19, 15:06, "Andrew Dinn" wrote: Hi Lutz, On 21/10/2019 13:37, Schmidt, Lutz wrote: > I understand what you are interested in. And I was hoping to be able > to provide some (first) numbers by today. Unfortunately, the > measurement code I activated last Friday was buggy and blew most of > the tests I had hoped to run over the weekend. > > I will take your modified test and run it with and without my > optimization. In parallel, I will try to generate some (non-random) > numbers for other tests. > > I'll be back as soon as I have results. Thanks for trying the test and also for deriving some call stats from a real example. I'm keen to see how much your patch improves things. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From tobias.hartmann at oracle.com Tue Nov 5 16:00:40 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Nov 2019 17:00:40 +0100 Subject: RFR(S) : 8233496 : AOT tests failures with 'java.lang.RuntimeException: Failed to find sun/hotspot/WhiteBox.class' In-Reply-To: <3845ADEC-2657-421F-B35F-C12962A31452@oracle.com> References: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com> <3845ADEC-2657-421F-B35F-C12962A31452@oracle.com> Message-ID: <5ca28de0-2a9b-3762-e9a8-12cde97b828c@oracle.com> +1 Best regards, Tobias On 04.11.19 22:45, Vladimir Kozlov wrote: > Looks good.
> > Thanks > Vladimir > >> On Nov 4, 2019, at 1:33 PM, Igor Ignatyev wrote: >> >> http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html >>> 42 lines changed: 0 ins; 6 del; 36 mod; >> >> Hi all, >> >> could you please review this small patch for compiler/aot tests? the tests run 'ClassFileInstaller sun.hotspot.WhiteBox' w/o having any preceding actions which build s.h.WhiteBox class, the fix adds sun.hotspot.WhiteBox to the explicit build action and also removes unneeded classes (compiler.aot.AotCompiler and compiler.calls.common.InvokeDynamicPatcher as they are built implicitly by @run) from it. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8233496 >> webrev: http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html >> testing: >> - all compiler/aot tests together on Oracle platforms >> - each changed test separately on linux-x64 >> >> Thanks, >> -- Igor >> >> >> > From vladimir.x.ivanov at oracle.com Tue Nov 5 16:54:30 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 5 Nov 2019 19:54:30 +0300 Subject: RFR 8233389: Add PrintIdeal to compiler directives In-Reply-To: <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com> References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com> <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com> Message-ID: <975bbd8f-8021-8a5c-544b-123b2a2e08d7@oracle.com> > http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.02/ Looks good. > W.r.t. usefulness of PrintIdeal vs PrintIdealGraph; The obvious thing is > that PrintIdeal doesn't require IGV, which might be more useful if > PrintIdeal were a diagnostic flag instead (as suggested by Nils), so it > could be used from a standard JDK build which doesn't come with IGV. > Another advantage that comes to mind is that PrintIdeal output is easier > to share as text; you can just copy a few lines into the body of an > email. 
That said, I haven't used either of these flags extensively, so I > find it hard to judge whether one is clearly better than the other. But, > it seems at least unfortunate that we have the PrintIdeal flag, but can > not use it in compiler directives to filter the output. It's a long-standing issue with tracing functionality in C2: both options depend on the ability to dump Node and Type instances in textual form which is absent in product binaries. It would be nice to bundle that code in product binaries as well and turn both options into diagnostic ones, but nobody has taken care of it yet. Also, at some point, IGV had a text view of the graph (which was pretty close to PrintIdeal output), but I can't find it there anymore. Best regards, Vladimir Ivanov > On 04/11/2019 15:57, Vladimir Ivanov wrote: >> Hi Jorn, >> >> src\hotspot\share\opto\compile.hpp: >> +   bool _print_ideal; // True if we should >> dump node IR for this compilation >> >> Since the only usage is in non-product code, I suggest to put >> _print_ideal into #ifndef PRODUCT, so you don't need to initialize it >> in product build. >> >> Also, it'll allow you to just put it on initializer list instead of >> doing it in the ctor body (akin to how _trace_opto_output is handled): >> >> src\hotspot\share\opto\compile.cpp: >> >> Compile::Compile( ciEnv* ci_env, >> ... >>   : Phase(Compiler), >> ... >>     _has_reserved_stack_access(false), >> #ifndef PRODUCT >>     _trace_opto_output(directive->TraceOptoOutputOption), >> #endif >>     _has_method_handle_invokes(false), >> >> >> Overall, I don't see much value in PrintIdeal: PrintIdealGraph >> provides much more detailed information (even though in XML format) >> and IdealGraphVisualizer is better at browsing the graph. The only >> thing I'm usually missing is full text dump output on individual nodes >> (they are shown pruned in IGV; not sure whether it's IGV fault or the >> info is missing in the dump).
>> >> Best regards, >> Vladimir Ivanov >> >> On 01.11.2019 18:09, Jorn Vernee wrote: >>> Hi, >>> >>> I'd like to add PrintIdeal as a compiler directive in order to enable >>> PrintIdeal for only a single method when combining it with the >>> 'match' directive. >>> >>> Please review the following: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389 >>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/ >>> (Testing = tier1, manual) >>> >>> As a heads-up; I'm not a committer on the jdk project, so if this >>> sounds like a good idea, I would require a sponsor to push the changes. >>> >>> Thanks, >>> Jorn >>> From igor.ignatyev at oracle.com Tue Nov 5 16:58:47 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 5 Nov 2019 08:58:47 -0800 Subject: RFR(S) : 8233496 : AOT tests failures with 'java.lang.RuntimeException: Failed to find sun/hotspot/WhiteBox.class' In-Reply-To: <5ca28de0-2a9b-3762-e9a8-12cde97b828c@oracle.com> References: <87B508A4-0274-4084-B8B2-1A23FB9B8D26@oracle.com> <3845ADEC-2657-421F-B35F-C12962A31452@oracle.com> <5ca28de0-2a9b-3762-e9a8-12cde97b828c@oracle.com> Message-ID: <920D44DE-9AE1-4304-9682-782F585E4913@oracle.com> Vladimir, Tobias, thanks for your review, pushed. -- Igor > On Nov 5, 2019, at 8:00 AM, Tobias Hartmann wrote: > > +1 > > Best regards, > Tobias > > On 04.11.19 22:45, Vladimir Kozlov wrote: >> Looks good. >> >> Thanks >> Vladimir >> >>> On Nov 4, 2019, at 1:33 PM, Igor Ignatyev wrote: >>> >>> http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html >>>> 42 lines changed: 0 ins; 6 del; 36 mod; >>> >>> Hi all, >>> >>> could you please review this small patch for compiler/aot tests? 
the tests run 'ClassFileInstaller sun.hotspot.WhiteBox' w/o having any preceding actions which build s.h.WhiteBox class, the fix adds sun.hotspot.WhiteBox to the explicit build action and also removes unneeded classes (compiler.aot.AotCompiler and compiler.calls.common.InvokeDynamicPatcher as they are built implicitly by @run) from it. >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233496 >>> webrev: http://cr.openjdk.java.net/~iignatyev//8233496/webrev.00/index.html >>> testing: >>> - all compiler/aot tests together on Oracle platforms >>> - each changed test separately on linux-x64 >>> >>> Thanks, >>> -- Igor >>> >>> >>> >> From igor.veresov at oracle.com Tue Nov 5 17:32:39 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Nov 2019 09:32:39 -0800 Subject: RFR(XS) 8233590: Compiler thread creation fails with assert(_c2_count > 0 || _c1_count > 0) failed: No compilers? Message-ID: This fixes a regression introduced by JDK-8233429. JBS: https://bugs.openjdk.java.net/browse/JDK-8233590 Webrev: http://cr.openjdk.java.net/~iveresov/8233590/webrev.00/ igor From shade at redhat.com Tue Nov 5 17:38:10 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 5 Nov 2019 18:38:10 +0100 Subject: RFR(XS) 8233590: Compiler thread creation fails with assert(_c2_count > 0 || _c1_count > 0) failed: No compilers? In-Reply-To: References: Message-ID: <294cb7fb-8277-1f06-ef68-5a6a9b5f348f@redhat.com> On 11/5/19 6:32 PM, Igor Veresov wrote: > JBS: https://bugs.openjdk.java.net/browse/JDK-8233590 > Webrev: http://cr.openjdk.java.net/~iveresov/8233590/webrev.00/ Awwww, took me a while to see how the original code was broken. Looks good! -- Thanks, -Aleksey From tobias.hartmann at oracle.com Tue Nov 5 17:56:08 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Nov 2019 18:56:08 +0100 Subject: RFR(XS) 8233590: Compiler thread creation fails with assert(_c2_count > 0 || _c1_count > 0) failed: No compilers? 
In-Reply-To: <294cb7fb-8277-1f06-ef68-5a6a9b5f348f@redhat.com> References: <294cb7fb-8277-1f06-ef68-5a6a9b5f348f@redhat.com> Message-ID: <8013806e-b863-a82b-93bd-4f25870b7de8@oracle.com> +1, ship it! Best regards, Tobias On 05.11.19 18:38, Aleksey Shipilev wrote: > On 11/5/19 6:32 PM, Igor Veresov wrote: >> JBS: https://bugs.openjdk.java.net/browse/JDK-8233590 >> Webrev: http://cr.openjdk.java.net/~iveresov/8233590/webrev.00/ > > Awwww, took me a while to see how the original code was broken. Looks good! > From igor.veresov at oracle.com Tue Nov 5 17:57:24 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Nov 2019 09:57:24 -0800 Subject: RFR(XS) 8233590: Compiler thread creation fails with assert(_c2_count > 0 || _c1_count > 0) failed: No compilers? In-Reply-To: <8013806e-b863-a82b-93bd-4f25870b7de8@oracle.com> References: <294cb7fb-8277-1f06-ef68-5a6a9b5f348f@redhat.com> <8013806e-b863-a82b-93bd-4f25870b7de8@oracle.com> Message-ID: <33788B72-540B-4F41-BEEB-A38FC005DAA6@oracle.com> Thanks Aleksey and Tobias! igor > On Nov 5, 2019, at 9:56 AM, Tobias Hartmann wrote: > > +1, ship it! > > Best regards, > Tobias > > On 05.11.19 18:38, Aleksey Shipilev wrote: >> On 11/5/19 6:32 PM, Igor Veresov wrote: >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233590 >>> Webrev: http://cr.openjdk.java.net/~iveresov/8233590/webrev.00/ >> >> Awwww, took me a while to see how the original code was broken. Looks good! 
>> From sgehwolf at redhat.com Tue Nov 5 19:18:06 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Tue, 05 Nov 2019 20:18:06 +0100 Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() || phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no mismatched stores, except on raw memory In-Reply-To: <87tv7ie8xv.fsf@redhat.com> References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com> <87tv7ie8xv.fsf@redhat.com> Message-ID: <66e485457444840d563685516765e31304691932.camel@redhat.com> Hi Roland, On Tue, 2019-11-05 at 14:42 +0100, Roland Westrelin wrote: > Hi Severin, > > Thanks for taking care of this. Thanks for the review! > > Could I please get a review of this 8u only issue? The reason a > > fastdebug build of latest OpenJDK 8u asserts for the dec-tree benchmark > > of the renaissance suite is because the 8u backport of JDK-8140309 was > > missing this hunk from JDK 9[1]: > > > > + (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode > > + (is_mismatched_access() || st->as_Store()->is_mismatched_access()), > > > > I had a closer look and there doesn't seem to be missing anything else. > > The proposed fix is to amend the assert condition in the appropriate > > place, which brings 8u in line with JDK 9 code where the failure isn't > > observed. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8233023 > > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/01/webrev/ > > Isn't this: > > @@ -3213,6 +3221,9 @@ > // within the initialized memory. > intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) { > const int FAIL = 0; > + if (st->is_unaligned_access()) { > + return FAIL; > + } > if (st->req() != MemNode::ValueIn + 1) > return FAIL; // an inscrutable StoreNode (card mark?) > Node* ctl = st->in(MemNode::Control); > > also missing from the 8140309? It wasn't missing from the 8u backport of 8140309. 
See: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/0ffee573412b It was removed later by JDK-8202414. > It must be harmless because nothing sets _unaligned_access AFAICT but > given the unaligned access part of the patch was backported I think we > should keep it consistent. > > Also, > > + (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode > > is not from 8140309 but from 8080289. Aah, good catch. I didn't mean to include part of 8080289. Updated webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/02/webrev/ > We're not going to backport it and > that line is unrelated to the change so backporting it sounds good. But > while doing this, we should backport the other changes to that assert > from 8080289 as well. > > + st->Opcode() == Op_StoreVector || > + Opcode() == Op_StoreVector || Right. I've opted for not including any parts of 8080289, so we should be consistent. Thoughts? Thanks, Severin From david.holmes at oracle.com Tue Nov 5 23:48:13 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 6 Nov 2019 09:48:13 +1000 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <6ac1f31e-61a6-fa92-75f6-cae0732915e3@oracle.com> Message-ID: <7f792953-112d-40bc-70c9-02323fa8f892@oracle.com> Hi Martin, On 5/11/2019 11:37 pm, Doerr, Martin wrote: > Hi David > >> With JVMCI compiler threads, each getting a new j.l.Thread oop that >> lasts for the lifetime of that compiler thread (just like a regular >> JavaThread) do we even actually need these arrays? I'm unclear what >> purpose they serve when we are not trying to reuse the oops stored in >> the array. ?? > > Compiler threads can look up j.l.Thread objects of live compilers by iterating over the arrays.
> That's used to find the last compiler alive or to find a log instance for a compiler. > Could get designed differently, but that would make the change even bigger. Yes but given we don't have a clean working fix for current arrangement ... Is hacking into oopStorage internals the only way forward? Thanks, David > > Best regards, > Martin > > >> -----Original Message----- >> From: David Holmes >> Sent: Dienstag, 5. November 2019 13:33 >> To: Doerr, Martin ; dean.long at oracle.com; Kim >> Barrett >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >> >> On 5/11/2019 6:40 pm, Doerr, Martin wrote: >>> Hi David >>> >>>> I don't understand what you mean. If a compiler thread holds an oop, any >>>> oop, it must hold it in a Handle to ensure it can't be gc'd. >>> >>> The problem is not related to gc. >>> My change introduces destroy_global for the handles. This means that the >> OopStorage portion which has held the oop can get freed. >>> However, other compiler threads are running concurrently. They may >> execute code which reads the oop from the handle which is freed by this >> thread. Reading stale data is not a problem here, but reading freed memory >> may assert or even crash in general. >>> I can't see how OopStorage supports reading from handles which were >> freed by destroy_global. >> >> With JVMCI compiler threads, each getting a new j.l.Thread oop that >> lasts for the lifetime of that compiler thread (just like a regular >> JavaThread) do we even actually need these arrays? I'm unclear what >> purpose they serve when we are not trying to reuse the oops stored in >> the array. ?? >> >> David >> ----- >> >>> I think it would be safe if the freeing only occurred at safepoints, but I don't >> think this is the case. >>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Dienstag, 5. 
November 2019 00:19 >>>> To: Doerr, Martin ; dean.long at oracle.com; Kim >>>> Barrett >>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) >>>> ; hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >>>> >>>> On 4/11/2019 9:12 pm, Doerr, Martin wrote: >>>>> Hi all, >>>>> >>>>> @Dean >>>>>> changing can_remove() to CompileBroker::can_remove()? >>>>> Yes. That would be an option. >>>>> >>>>> @Kim, David >>>>> I think there's another problem with this implementation. >>>>> It introduces a use-after-free pattern due to concurrency. >>>>> Compiler threads may still read the oops from the handles after one of >>>> them has called destroy_global until next safepoint. It doesn't matter >> which >>>> values they get in this case, but the VM should not crash. I believe that >>>> OopStorage allows freeing storage without safepoints, so this may be >>>> unsafe. Right? >>>> >>>> I don't understand what you mean. If a compiler thread holds an oop, any >>>> oop, it must hold it in a Handle to ensure it can't be gc'd. >>>> >>>> David >>>> >>>>> If so, I think replacing the oops in the handles (and keeping the handles >>>> alive) would be better. And also much more simple. >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: dean.long at oracle.com >>>>>> Sent: Samstag, 2. November 2019 08:36 >>>>>> To: Doerr, Martin ; David Holmes >>>>>> ; Kim Barrett >>>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) >>>>>> ; hotspot-compiler- >> dev at openjdk.java.net >>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >>>>>> >>>>>> Hi Martin, >>>>>> >>>>>> On 10/30/19 3:18 AM, Doerr, Martin wrote: >>>>>>> Hi David, >>>>>>> >>>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object >>>> when >>>>>>>> it is only used once was warranted, but that's call for compiler team >> to >>>>>>>> make. 
>>>>>>> I did that because _compiler2_objects is private and there's currently >> no >>>>>> setter available. >>>>>>> But let's see what the compiler folks think. >>>>>> >>>>>> how about changing can_remove() to CompileBroker::can_remove()? >>>> Then >>>>>> you >>>>>> can access _compiler2_objects directly, right? >>>>>> >>>>>> dl >>>>>>>> Otherwise changes seem fine and I have noted the use of the >>>>>>>> MutexUnlocker as per your direct email. >>>>>>> Thanks a lot for reviewing. It was not a trivial one ?? >>>>>>> >>>>>>> You had noticed an incorrect usage of the CHECK macro. I've created a >>>> new >>>>>> bug for that: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8233193 >>>>>>> Would be great if you could take a look if that's what you meant and >>>> made >>>>>> adaptions if needed. >>>>>>> >>>>>>> Best regards, >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes >>>>>>>> Sent: Mittwoch, 30. Oktober 2019 05:47 >>>>>>>> To: Doerr, Martin ; Kim Barrett >>>>>>>> >>>>>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com) >>>>>>>> ; hotspot-compiler- >>>> dev at openjdk.java.net; >>>>>>>> David Holmes >>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI >> CompilerThread >>>>>>>> >>>>>>>> Hi Martin, >>>>>>>> >>>>>>>> On 29/10/2019 12:06 am, Doerr, Martin wrote: >>>>>>>>> Hi David and Kim, >>>>>>>>> >>>>>>>>> I think it's easier to talk about code. So here's a new webrev: >>>>>>>>> >>>>>>>> >>>>>> >>>> >> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr >>>>>>>> ev.03/ >>>>>>>> >>>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object >>>> when >>>>>>>> it is only used once was warranted, but that's call for compiler team >> to >>>>>>>> make. Otherwise changes seem fine and I have noted the use of the >>>>>>>> MutexUnlocker as per your direct email. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> @Kim: >>>>>>>>> Thanks for looking at the handle related parts. It's ok if you don't >> want >>>> to >>>>>>>> be a reviewer of the whole change. >>>>>>>>>> I think it's weird that can_remove() is a predicate with optional >> side >>>>>>>>>> effects. I think it would be simpler to have it be a pure predicate, >>>>>>>>>> and have the one caller with do_it = true perform the updates. >> That >>>>>>>>>> should include NULLing out the handle pointer (perhaps debug- >> only, >>>>>> but >>>>>>>>>> it doesn't cost much to cleanly maintain the data structure). >>>>>>>>> Nevertheless, it has the advantage that it enforces the update to >> be >>>>>>>> consistent. >>>>>>>>> A caller could use it without holding the lock or mess it up >> otherwise. >>>>>>>>> In addition, I don't want to change that as part of this fix. >>>>>>>>> >>>>>>>>>> So far as I can tell, THREAD == NULL here. >>>>>>>>> This is a very tricky part (not my invention): >>>>>>>>> EXCEPTION_MARK contains an ExceptionMark constructor call >> which >>>>>> sets >>>>>>>> __the_thread__ to Thread::current(). >>>>>>>>> I don't want to publish my opinion about this. >>>>>>>>> >>>>>>>>> @David: >>>>>>>>> Seems like this option is preferred over option 3 >>>>>>>> (possibly_add_compiler_threads part of webrev.02 and leave the >>>>>>>> initialization as is). >>>>>>>>> So when you're ok with it, I'll request a 2nd review from the >> compiler >>>>>> folks >>>>>>>> (I should change the subject to contain RFR). >>>>>>>>> Thanks, >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes >>>>>>>>>> Sent: Montag, 28.
Oktober 2019 05:04 >>>>>>>>>> To: Kim Barrett >>>>>>>>>> Cc: Doerr, Martin ; Vladimir Kozlov >>>>>>>>>> (vladimir.kozlov at oracle.com) ; >>>> hotspot- >>>>>>>>>> compiler-dev at openjdk.java.net >>>>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI >>>> CompilerThread >>>>>>>>>> >>>>>>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote: >>>>>>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes >>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote: >>>>>>>>>>>>> Hi Kim, >>>>>>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just >> have >>>>>> not >>>>>>>>>> seen how to allocate a global handle and add the oop later. >>>>>>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not >> aware >>>> of >>>>>>>> that. >>>>>>>>>>>>> So I can imagine 3 ways to implement it: >>>>>>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I >> just >>>>>> added >>>>>>>>>> that to >>>>>>>> >>>>>> >>>> >> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr >>>>>>>>>> ev.01/ >>>>>>>>>>>>> We may want to improve that further by setting the handle >>>> pointer >>>>>> to >>>>>>>>>> NULL and asserting that it is NULL before adding the new one. >>>>>>>>>>>>> I had been concerned about NULLs in the array, but looks like >>>> the >>>>>>>>>> existing code can deal with that. >>>>>>>>>>>> I think it would be cleaner to both destroy the global handle >> and >>>>>> NULL it >>>>>>>> in >>>>>>>>>> the array at the same time. >>>>>>>>>>>> This comment >>>>>>>>>>>> >>>>>>>>>>>> 325 // Old j.l.Thread object can die here. >>>>>>>>>>>> >>>>>>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct- >>>>>>> threadObj() >>>>>>>> so >>>>>>>>>> can't "die" until that is also cleared during the actual termination >>>>>> process. 
>>>>>>>>>>> I think if there is such a thread here that it can't die, because the >>>>>>>>>>> death predicate (the can_remove stuff) won't see that old >> thread >>>> as >>>>>>>>>>> the last thread in _compiler2_objects. That's what I meant by >> this: >>>>>>>>>>> >>>>>>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett >>>> >>>>>>>>>> wrote: >>>>>>>>>>>> I also think that here: >>>>>>>>>>>> >>>>>>>>>>>> 947 jobject thread_handle = >>>>>>>> JNIHandles::make_global(thread_oop); >>>>>>>>>>>> 948 _compiler2_objects[i] = thread_handle; >>>>>>>>>>>> >>>>>>>>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a >>>> valid >>>>>>>>>>>> assertion then I think there are other problems. >>>>>>>>>>> I think either that comment about an old thread is wrong (and >> the >>>>>> NULL >>>>>>>>>>> assertion I suggested is okay), or I think the whole mechanism >> here >>>>>>>>>>> has problems. Or at least I was unable to figure out how it could >>>>>> work... >>>>>>>>>>> >>>>>>>>>> I'm not following sorry. You can't assert NULL unless it's actually >> set >>>>>>>>>> to NULL which it presently isn't. But it could be set NULL as Martin >>>>>>>>>> suggested: >>>>>>>>>> >>>>>>>>>> "We may want to improve that further by setting the handle >> pointer >>>> to >>>>>>>>>> NULL and asserting that it is NULL before adding the new one." >>>>>>>>>> >>>>>>>>>> and which I also supported. But that aside once the delete_global >>>> has >>>>>>>>>> been called that JNIHandle no longer references the j.l.Thread >> that it >>>>>>>>>> did, at which point it is only reachable via the threadObj() of the >>>>>>>>>> CompilerThread. 
>>>>>>>>>> >>>>>>>>>> David >>>>> From kim.barrett at oracle.com Wed Nov 6 03:09:01 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 5 Nov 2019 22:09:01 -0500 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com> <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> Message-ID: <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> > On Nov 5, 2019, at 3:40 AM, Doerr, Martin wrote: Coming back in, because this seems to be going off into the weeds again. >> I don't understand what you mean. If a compiler thread holds an oop, any >> oop, it must hold it in a Handle to ensure it can't be gc'd. > > The problem is not related to gc. > My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed. > However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread. > Reading stale data is not a problem here, but reading freed memory may assert or even crash in general. > I can't see how OopStorage supports reading from handles which were freed by destroy_global. So don't do that! OopStorage isn't magic. If you are going to look at an OopStorage handle, you have to ensure there won't be concurrent deletion. Use locks or some safe memory reclamation protocol. (GlobalCounter might be used here, but it depends a lot on what the iterations are doing. A reference counting mechanism is another possibility.) This is no different from any other resource management. > I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case. 
Assuming the iteration didn't happen at safepoints (which is just a way to make the iteration and deletion not concurrent). And I agree that isn't the case with the current code. From rwestrel at redhat.com Wed Nov 6 08:42:48 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 06 Nov 2019 09:42:48 +0100 Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() || phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no mismatched stores, except on raw memory In-Reply-To: <66e485457444840d563685516765e31304691932.camel@redhat.com> References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com> <87tv7ie8xv.fsf@redhat.com> <66e485457444840d563685516765e31304691932.camel@redhat.com> Message-ID: <87r22le6pj.fsf@redhat.com> >> Isn't this: >> >> @@ -3213,6 +3221,9 @@ >> // within the initialized memory. >> intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) { >> const int FAIL = 0; >> + if (st->is_unaligned_access()) { >> + return FAIL; >> + } >> if (st->req() != MemNode::ValueIn + 1) >> return FAIL; // an inscrutable StoreNode (card mark?) >> Node* ctl = st->in(MemNode::Control); >> >> also missing from the 8140309? > > It wasn't missing from the 8u backport of 8140309. See: > http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/0ffee573412b > > It was removed later by JDK-8202414. Ok. > Aah, good catch. I didn't mean to include part of 8080289. Updated > webrev: > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/02/webrev/ Looks good to me. Roland.
From martin.doerr at sap.com Wed Nov 6 09:12:28 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 6 Nov 2019 09:12:28 +0000 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> References: <15a92da5-c5ba-ce55-341d-5f60acf14c3a@oracle.com> <2C595A53-202F-4552-96AB-73FF2B922120@oracle.com> <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> Message-ID: Hi Kim, thanks for confirming. http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/ already avoids access to freed handles. I don't really like the complexity of this code. Replacing oops in handles would have been much more simple. But I can live with either version. Best regards, Martin > -----Original Message----- > From: Kim Barrett > Sent: Mittwoch, 6. November 2019 04:09 > To: Doerr, Martin > Cc: David Holmes ; dean.long at oracle.com; > Vladimir Kozlov (vladimir.kozlov at oracle.com) > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > > On Nov 5, 2019, at 3:40 AM, Doerr, Martin wrote: > > Coming back in, because this seems to be going off into the weeds again. > > >> I don't understand what you mean. If a compiler thread holds an oop, any > >> oop, it must hold it in a Handle to ensure it can't be gc'd. > > > > The problem is not related to gc. > > My change introduces destroy_global for the handles. This means that the > OopStorage portion which has held the oop can get freed. > > However, other compiler threads are running concurrently. They may > execute code which reads the oop from the handle which is freed by this > thread. > > Reading stale data is not a problem here, but reading freed memory may > assert or even crash in general. 
> > I can't see how OopStorage supports reading from handles which were > freed by destroy_global. > > So don't do that! > > OopStorage isn't magic. If you are going to look at an OopStorage > handle, you have to ensure there won't be concurrent deletion. Use > locks or some safe memory reclamation protocol. (GlobalCounter might > be used here, but it depends a lot on what the iterations are doing. A > reference counting mechanism is another possibility.) This is no > different from any other resource management. > > > I think it would be safe if the freeing only occurred at safepoints, but I don't > think this is the case. > > Assuming the iteration didn't happen at safepoints (which is just a way to > make the iteration and > deletion not concurrent). And I agree that isn't the case with the current > code. From sgehwolf at redhat.com Wed Nov 6 09:28:19 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Wed, 06 Nov 2019 10:28:19 +0100 Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() || phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no mismatched stores, except on raw memory In-Reply-To: <87r22le6pj.fsf@redhat.com> References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com> <87tv7ie8xv.fsf@redhat.com> <66e485457444840d563685516765e31304691932.camel@redhat.com> <87r22le6pj.fsf@redhat.com> Message-ID: <2e6ec654637b3ceac2af2d7cf02c72398bda0d11.camel@redhat.com> On Wed, 2019-11-06 at 09:42 +0100, Roland Westrelin wrote: > > > Isn't this: > > > > > > @@ -3213,6 +3221,9 @@ > > > // within the initialized memory. > > > intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) { > > > const int FAIL = 0; > > > + if (st->is_unaligned_access()) { > > > + return FAIL; > > > + } > > > if (st->req() != MemNode::ValueIn + 1) > > > return FAIL; // an inscrutable StoreNode (card mark?) > > > Node* ctl = st->in(MemNode::Control); > > > > > > also missing from the 8140309?
> > > > It wasn't missing from the 8u backport of 8140309. See: > > http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/0ffee573412b > > > > It was removed later by JDK-8202414. > > Ok. > > > Aah, good catch. I didn't mean to include part of 8080289. Updated > > webrev: > > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/02/webrev/ > > Looks good to me. Thanks again for the review, Roland. Cheers, Severin From christian.hagedorn at oracle.com Wed Nov 6 10:13:03 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 6 Nov 2019 11:13:03 +0100 Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR compilation Message-ID: Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8229694 http://cr.openjdk.java.net/~chagedorn/8229694/webrev.00/ The JVM crashes in the testcase when trying to dereference mem at [1] which is NULL. This happens when converting packs into vector nodes in SuperWord::output() by reading an unexpected NULL value with SuperWord::align_to_ref() [2]. The corresponding field _align_to_ref is set to NULL at [3] since best_align_to_mem_ref was assigned NULL just before at [4]. _packset only contained one pack and there were no memory operations left to be processed (memops is empty). As a result SuperWord::find_align_to_ref() will return NULL since it needs at least two operations to find an alignment. The fix is straight forward to directly use the alignment of the only pack remaining if there are no memory operations left in memops to find another alignment. The testcase creates such a situation where only one pack remains at [4] when the loop is unrolled four times. 
When calling SuperWord::find_adjacent_refs() there are:

- 4 StoreI for intArr[j-1] = 400
- 4 StoreC for shortArr[j] = 30
- 2 StoreI for intArr[7] = 260 // Initially 4 but 2 are removed by IGVN in Ideal()
- 2 StoreC for shortArr[10] = 10 // Initially 4 but 2 are removed by IGVN in Ideal()
- 2 LoadI (and 2 StoreI) for iFld = intArr[j] // Initially 4 each but 2 of each are removed by IGVN in Ideal()

The field stores are ignored by the superword algorithm. intArr[j-1] aligns with intArr[7] and therefore create_pack is true. The only pack created is one with two immediately following stores for intArr[j-1]. The IGVN algorithm is not able to remove the first redundant store to intArr[7] when the loop is unrolled the first time. Only when the loop is unrolled a second time is it able to remove the two newly created redundant stores to intArr[7]. This leaves us with the following dependencies of stores: "intArr[j-1] -> intArr[j-1] -> intArr[7] -> intArr[j-1] -> intArr[7] -> intArr[j-1]" from which only the first two operations can be used to create a pack. The very same applies to the StoreC nodes. As a result, one pack for StoreI and one for StoreC are created in total.

There are now only the two LoadI nodes of intArr[j] left which are not aligned with intArr[j-1]. Therefore, all StoreI packs are removed at [5]. This leaves us with exactly one StoreC pack and an empty memops list which sets the alignment to NULL and eventually lets the JVM crash at [1].

We might want to file an RFE to investigate further why IGVN cannot remove the first redundant stores to intArr[7], shortArr[10], and iFld, respectively (even though it's quite useless to keep setting the same values in a loop). This problem can also be observed if the loop only contains the statement "iFld = intArr[j]". But I think even if those redundant stores had been optimized away we should have this fix to handle the situation with only one pack and no memory operations remaining.

Thank you!
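For readers following the analysis, the shape of the proposed fix can be sketched as a small standalone model. This is not the code from the webrev: MemOp, Pack and choose_align_to_ref are hypothetical stand-ins, and find_align_to_ref here only models the behaviour described in this mail (it needs at least two memory operations, otherwise it yields NULL). The sketch assumes the fallback is simply "take the alignment reference from the single remaining pack".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not the SuperWord data structures.
struct MemOp { int id; };
typedef std::vector<const MemOp*> Pack;

// Models the behaviour described above: with fewer than two remaining
// memory operations no alignment reference can be found.
const MemOp* find_align_to_ref(const std::vector<const MemOp*>& memops) {
  if (memops.size() < 2) {
    return nullptr;
  }
  return memops.front();
}

// Assumed shape of the fix: if memops is empty and exactly one pack is left,
// fall back to that pack's first member instead of returning NULL.
const MemOp* choose_align_to_ref(const std::vector<const MemOp*>& memops,
                                 const std::vector<Pack>& packset) {
  const MemOp* best = find_align_to_ref(memops);
  if (best == nullptr && memops.empty() && packset.size() == 1) {
    assert(!packset[0].empty() && "a pack is expected to have members");
    best = packset[0].front();
  }
  return best;
}
```

With an empty memops list and one remaining pack, choose_align_to_ref returns the first member of that pack, so a later dereference of the alignment reference no longer sees NULL. With no packs at all it still returns NULL, which a caller would have to handle separately.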
Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l3608 [2] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l2328 [3] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l732 [4] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l708 [5] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l688 From david.holmes at oracle.com Wed Nov 6 10:14:30 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 6 Nov 2019 20:14:30 +1000 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> Message-ID: <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com> Hi Martin, On 6/11/2019 7:12 pm, Doerr, Martin wrote: > Hi Kim, > > thanks for confirming. > > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/ > already avoids access to freed handles. Sorry I missed your earlier reference to this version. So the expectation here is that all accesses to these arrays are guarded by the CompileThread_lock, but that doesn't seem to hold for get_log ? Thanks, David ----- > I don't really like the complexity of this code. > Replacing oops in handles would have been much more simple. > But I can live with either version. > > Best regards, > Martin > > >> -----Original Message----- >> From: Kim Barrett >> Sent: Mittwoch, 6. 
November 2019 04:09 >> To: Doerr, Martin >> Cc: David Holmes ; dean.long at oracle.com; >> Vladimir Kozlov (vladimir.kozlov at oracle.com) >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >> >>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin wrote: >> >> Coming back in, because this seems to be going off into the weeds again. >> >>>> I don't understand what you mean. If a compiler thread holds an oop, any >>>> oop, it must hold it in a Handle to ensure it can't be gc'd. >>> >>> The problem is not related to gc. >>> My change introduces destroy_global for the handles. This means that the >> OopStorage portion which has held the oop can get freed. >>> However, other compiler threads are running concurrently. They may >> execute code which reads the oop from the handle which is freed by this >> thread. >>> Reading stale data is not a problem here, but reading freed memory may >> assert or even crash in general. >>> I can't see how OopStorage supports reading from handles which were >> freed by destroy_global. >> >> So don't do that! >> >> OopStorage isn't magic. If you are going to look at an OopStorage >> handle, you have to ensure there won't be concurrent deletion. Use >> locks or some safe memory reclamation protocol. (GlobalCounter might >> be used here, but it depends a lot on what the iterations are doing. A >> reference counting mechanism is another possibility.) This is no >> different from any other resource management. >> >>> I think it would be safe if the freeing only occurred at safepoints, but I don't >> think this is the case. >> >> Assuming the iteration didn't happen at safepoints (which is just a way to >> make the iteration and >> deletion not concurrent). And I agree that isn't the case with the current >> code.
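The locking discipline discussed in this thread (destroy a handle and clear its slot together, assert the slot is empty before reuse, and never read a slot without the lock) can be sketched outside of HotSpot as follows. This is only an illustration under assumptions: HandleTable and its methods are hypothetical names, std::mutex stands in for CompileThread_lock, and a heap-allocated std::string stands in for a global JNI handle. It is not the code from webrev.04.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Toy stand-in for an array of global handles such as _compiler2_objects.
// All names are hypothetical; this is not HotSpot code.
class HandleTable {
 public:
  explicit HandleTable(std::size_t n) : _slots(n) {}

  // Install a new object. The slot must have been cleared before reuse.
  void install(std::size_t i, const std::string& name) {
    std::lock_guard<std::mutex> guard(_lock);
    assert(_slots[i] == nullptr && "slot must be cleared before reuse");
    _slots[i].reset(new std::string(name));
  }

  // Free the object and clear the slot in one step, under the lock.
  void destroy(std::size_t i) {
    std::lock_guard<std::mutex> guard(_lock);
    _slots[i].reset();
  }

  // Readers copy the value out while holding the lock, so they can never
  // touch storage that a concurrent destroy() has already released.
  bool read(std::size_t i, std::string* out) const {
    std::lock_guard<std::mutex> guard(_lock);
    if (_slots[i] == nullptr) {
      return false;
    }
    *out = *_slots[i];
    return true;
  }

 private:
  mutable std::mutex _lock;
  std::vector<std::unique_ptr<std::string> > _slots;
};
```

Any accessor that bypasses the lock (the concern raised about get_log above) would break this scheme, because it could observe a slot while destroy() is releasing it. The alternative mentioned in the thread, keeping the handle alive and only replacing the oop stored in it, avoids the freed-memory problem without requiring every reader to take the lock.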
> From tobias.hartmann at oracle.com Wed Nov 6 13:34:15 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Nov 2019 14:34:15 +0100 Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter with CDS due to code cache exhaustion Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8233491 http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/ When running a stress test with CDS, we fail to create adapters when linking a method from a shared class because the code cache is full. This case is not properly handled by the CDS specific code and instead of throwing a VirtualMachineError, we crash because "entry" is NULL. I'm able to spuriously reproduce this with a test (see [1]) but since the problem depends on the class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg test. However, I've verified that the patch fixes the problem. Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462 From vitalyd at gmail.com Wed Nov 6 13:45:35 2019 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Nov 2019 08:45:35 -0500 Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out In-Reply-To: <87y2wu7kpn.fsf@redhat.com> References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com> <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com> Message-ID: Hi Roland, Thanks for fixing this! :) Perhaps a bit too premature to ask but: any chance this will get backported to 11? Thanks On Tue, Nov 5, 2019 at 4:10 AM Roland Westrelin wrote: > > Hi Tobias, > > Thanks for the review and for performance testing. > > Roland. 
> From claes.redestad at oracle.com Wed Nov 6 14:42:50 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 6 Nov 2019 15:42:50 +0100 Subject: RFR: 8233708: VectorSet cleanup Message-ID: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com> Hi, VectorSet needs a cleanup: - Extravagant use of operator overloading (<<= adds items to the set, >>= removes them :eyeroll:) - Plenty of unused methods - Since VectorSet is the only implementation in the code base, the abstract base class Set is unnecessary - Various method names in conflict with HotSpot code "standard" Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8233708 Testing: tier1-3, built and sanity tested with Shenandoah due changes in some Shenandoah C2 support Thanks! /Claes From rwestrel at redhat.com Wed Nov 6 14:54:34 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 06 Nov 2019 15:54:34 +0100 Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out In-Reply-To: References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com> <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com> Message-ID: <87o8xpdphx.fsf@redhat.com> Hi Vitaly, > Thanks for fixing this! :) Perhaps a bit too premature to ask but: any > chance this will get backported to 11? (I still need an extra review in order to push this) I'll get it backported to 11u (that is openjdk 11u, I can't comment about Oracle 11u). Roland. 
From vitalyd at gmail.com Wed Nov 6 15:04:37 2019 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Nov 2019 10:04:37 -0500 Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out In-Reply-To: <87o8xpdphx.fsf@redhat.com> References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com> <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com> <87o8xpdphx.fsf@redhat.com> Message-ID: Hey Roland, On Wed, Nov 6, 2019 at 9:54 AM Roland Westrelin wrote: > > Hi Vitaly, > > > Thanks for fixing this! :) Perhaps a bit too premature to ask but: any > > chance this will get backported to 11? > > (I still need an extra review in order to push this) > > I'll get it backported to 11u (that is openjdk 11u, I can't comment > about Oracle 11u). > Perfect! I'm actually interested in openjdk11 - should've made that clear, sorry. Thanks again for tackling this so quickly! > > Roland. > From nils.eliasson at oracle.com Wed Nov 6 15:08:51 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 6 Nov 2019 16:08:51 +0100 Subject: RFR: 8233708: VectorSet cleanup In-Reply-To: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com> References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com> Message-ID: Hi Claes, Excellent cleanup! // Nils On 2019-11-06 15:42, Claes Redestad wrote: > Hi, > > VectorSet needs a cleanup: > > - Extravagant use of operator overloading (<<= adds items to the set, > >>= removes them :eyeroll:) > - Plenty of unused methods > - Since VectorSet is the only implementation in the code base, the > abstract base class Set is unnecessary > - Various method names in conflict with HotSpot code "standard" > > Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8233708 > > Testing: tier1-3, built and sanity tested with Shenandoah due changes > in some Shenandoah C2 support > > Thanks!
> > /Claes From martin.doerr at sap.com Wed Nov 6 15:15:30 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 6 Nov 2019 15:15:30 +0000 Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More generic vector CRC implementation (v2) In-Reply-To: References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com> <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com> Message-ID: Hi Gustavo, > [PPC64] More generic vector CRC implementation (1/4) > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/ This seems to be the version I had already reviewed. I'm still ok with it. > [PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4) > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/ Normally, backports should get handled in separate backport RFRs, but the manual change in this version could be considered trivial, so I don't insist on separate RFR. Looks good, now. > [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/ (3/4) Almost like above. I'd leave at least the CRC32C defines in the code (stubRoutines_ppc). They don't disturb. Why should we introduce additional diffs? > [PPC64] Cleanup non-vector version of CRC32 > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/ (4/4) This one contains a part of another shared code change. So I can't review it as part of this RFR. Needs to get reviewed separately. Backport of the original change which introduced LITTLE_ENDIAN_ONLY etc. should get evaluated. Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Montag, 4. 
November 2019 23:33 > To: Doerr, Martin ; hotspot-compiler- > dev at openjdk.java.net > Cc: jdk8u-dev at openjdk.java.net > Subject: Re: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More > generic vector CRC implementation (v2) > > Hello Martin, > > On 10/24/2019 07:17 AM, Doerr, Martin wrote: > > Hi Gustavo, > > > > I think removing invertCRC is an unnecessary manual change. > > We should minimize that as far as possible. They may create merge > conflicts for future backports. > > Thanks a lot for the review. > > I agree I should minimize the changes as far as possible. I added back > invertCRC > and tried to follow your advice, so the final clean-up patch is almost similar > to the one found on jdk/jdk, for instance. > > Please find v2 for the patchset below. v2 changes affect only 3/4 and 4/4. > > > [PPC64] More generic vector CRC implementation (1/4) > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/ > > v2: > - Adapt file names to OpenJDK 8u > - Remove CRC32C part, leaving only CRC32 part, since OpenJDK 8u has no > CRC32C > - Add Assembler::add_const_optimized() from "8077838: Recent > developments for ppc" [0] > - Fix vpermxor() opcode, replacing VPMSUMW_OPCODE by > VPERMXOR_OPCODE, > accordingly to fix in "8190781: ppc64 + s390: Fix CriticalJNINatives" [1] > - Adapt signatures for the following functions and their callers, accordingly to > "8175369: [ppc] Provide intrinsic implementation for CRC32C" [2]: > a. MacroAssembler::update_byteLoop_crc32(), removing 'invertCRC' > parameter > b. 
MacroAssembler::kernel_crc32_1word(), adding 'invertCRC' parameter > > > [PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4) > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/ > > v2: > - Adapt file names to OpenJDK 8u > - Remove CRC32C code > > > [PPC64] Vector CRC implementation should be used by interpreter and be > faster for short arrays > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/ > (3/4) > > v2: > - Remove CRC32C code, keeping is_crc32c in crc32(), code related to is_crc32c > and invertCRC, like code in kernel_crc32_vpmsum(), and not touching stub > code > mark in generate_CRC32_updateBytes() to avoid merge conflicts in future > backports. > > > [PPC64] Cleanup non-vector version of CRC32 > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/ > (4/4) > > v2: > - Add {BIG,LITTLE}_ENDIAN_ONLY to src/share/vm/utilities/macros.hpp > - Add kernel_crc32_singleByteReg from change 8175369 [2] as the clean-up > uses it > in InterpreterGenerator::generate_CRC32_update_entry(). > > > -- > > Best regards, > Gustavo > > [0] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/88847a1b3718 > [1] http://hg.openjdk.java.net/jdk/jdk/rev/5a69ba3a4fd1#l1.7 > [2] https://bugs.openjdk.java.net/browse/JDK-8175369 From shade at redhat.com Wed Nov 6 15:49:26 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 6 Nov 2019 16:49:26 +0100 Subject: RFR: 8233708: VectorSet cleanup In-Reply-To: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com> References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com> Message-ID: <336853d7-3047-0194-4e36-9ee968e07a86@redhat.com> On 11/6/19 3:42 PM, Claes Redestad wrote: > Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/ Shenandoah part looks fine. The rest looks fine too. 
-- Thanks, -Aleksey From tobias.hartmann at oracle.com Wed Nov 6 15:57:13 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Nov 2019 16:57:13 +0100 Subject: RFR: 8233708: VectorSet cleanup In-Reply-To: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com> References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com> Message-ID: Hi Claes, very nice. While you're at it, could you add brackets to the for-loops/ifs in vectset.cpp:70, vectset.hpp:89, 98 and fix the whitespacing? No new webrev required. Thanks, Tobias On 06.11.19 15:42, Claes Redestad wrote: > Hi, > > VectorSet needs a cleanup: > > - Extravagant use of operator overloading (<<= adds items to the set, > >>= removes them :eyeroll:) > - Plenty of unused methods > - Since VectorSet is the only implementation in the code base, the > abstract base class Set is unnecessary > - Various method names in conflict with HotSpot code "standard" > > Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8233708 > > Testing: tier1-3, built and sanity tested with Shenandoah due changes > in some Shenandoah C2 support > > Thanks! > > /Claes From gnu.andrew at redhat.com Wed Nov 6 16:51:47 2019 From: gnu.andrew at redhat.com (Andrew John Hughes) Date: Wed, 6 Nov 2019 16:51:47 +0000 Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() || phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no mismatched stores, except on raw memory In-Reply-To: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com> References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com> Message-ID: On 30/10/2019 09:41, Severin Gehwolf wrote: > Hi, > > Could I please get a review of this 8u only issue?
The reason a > fastdebug build of latest OpenJDK 8u asserts for the dec-tree benchmark > of the renaissance suite is because the 8u backport of JDK-8140309 was > missing this hunk from JDK 9[1]: > > + (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode > + (is_mismatched_access() || st->as_Store()->is_mismatched_access()), > > I had a closer look and there doesn't seem to be missing anything else. > The proposed fix is to amend the assert condition in the appropriate > place, which brings 8u in line with JDK 9 code where the failure isn't > observed. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8233023 > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/01/webrev/ > > Testing: 8u tier1 test set with fastdebug build on x86_64 Linux. No new > failures. dec-tree benchmark now runs successfully on an 8u fastdebug > build. > > Thoughts? > > Thanks, > Severin > > [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4bee38ba018c > I compared the two patches and this missing hunk does stand out. So the patch looks fine in that respect. I notice they also didn't backport the testcase to 8u. Any thoughts on including that? Thanks, -- Andrew :) Senior Free Java Software Engineer Red Hat, Inc. (http://www.redhat.com) PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net) Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 https://keybase.io/gnu_andrew From vladimir.kozlov at oracle.com Wed Nov 6 18:01:44 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Nov 2019 10:01:44 -0800 Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter with CDS due to code cache exhaustion In-Reply-To: References: Message-ID: CC to runtime group too. Looks good to me. 
Thanks, Vladimir On 11/6/19 5:34 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8233491 > http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/ > > When running a stress test with CDS, we fail to create adapters when linking a method from a shared > class because the code cache is full. This case is not properly handled by the CDS specific code and > instead of throwing a VirtualMachineError, we crash because "entry" is NULL. > > I'm able to spuriously reproduce this with a test (see [1]) but since the problem depends on the > class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg > test. However, I've verified that the patch fixes the problem. > > Thanks, > Tobias > > [1] > https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462 > From vladimir.kozlov at oracle.com Wed Nov 6 18:05:40 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Nov 2019 10:05:40 -0800 Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out In-Reply-To: <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com> <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> Message-ID: I am fine with this conservative fix (bailout optimization). It is good that performance is not affected. Thanks, Vladimir On 11/5/19 1:01 AM, Tobias Hartmann wrote: > Performance results look good. > > Best regards, > Tobias > > On 04.11.19 08:25, Tobias Hartmann wrote: >> Hi Roland, >> >> this seems reasonable to me but I'm concerned that it might cause performance regressions. I'll run >> some tests in our system. 
>> >> Best regards, >> Tobias >> >> On 23.10.19 10:50, Roland Westrelin wrote: >>> >>> http://cr.openjdk.java.net/~roland/8232539/webrev.00/ >>> >>> I couldn't come up with a test case because node processing order during >>> IGVN matters. Bug was reported against 11 but I see no reason it >>> wouldn't apply to current code as well. >>> >>> At parse time, predicates are added by >>> Parse::maybe_add_predicate_after_if() but not loop is actually >>> created. Compile::_major_progress is cleared. On the next round of IGVN, >>> one input of a region points to predicates. The same region has an if as >>> use that can be split through phi during IGVN. The predicates are going >>> to be removed by IGVN. But that happens in multiple steps because there >>> are several predicates (for reason Deoptimization::Reason_predicate, >>> Deoptimization::Reason_loop_limit_check etc.) and because for each >>> predicate one IGVN iteration must first remove the Opaque1 node, then >>> another kill the IfFalse projection, finally another replace the IfTrue >>> projection by the If control input. >>> >>> Split if occurs while predicates are in the process of being removed. It >>> sees predicates, tries to walk over them, encounters a predicates that's >>> been half removed (false projection removed) and we hit the assert/crash. >>> >>> I propose we simply not apply IGVN split if if we're splitting through a >>> loop or if there's a predicate input to a region because: >>> >>> - Making split if robust to dying predicates is not straightforward as >>> far as I can tell >>> >>> - Loop opts split if doesn't split through loop header so why would it >>> make sense for IGVN split if? 
>>> >>> - I'm wondering if there are other cases where handling of predicates in >>> split if could be wrong (and so more trouble ahead): >>> >>> + What if we split through a Loop region, predicates were added by >>> loop optimizations, loop opts are now over so the predicates added at >>> parse time were removed: then PhaseIdealLoop::find_predicate() >>> wouldn't report a predicate but cloning predicates would still be >>> required for correctness? >>> >>> + What if we have no loop, a region has predicates as input, >>> predicates are going to die but have not yet been processed, split if >>> uselessly duplicates predicates but one of then is control dependent >>> on the branch it is in so cloning predicates actually causes a broken >>> graph? >>> >>> So overall it feels safer to me to simply bail out from split if for >>> loops/predicates. >>> >>> Roland. >>> From vladimir.kozlov at oracle.com Wed Nov 6 18:44:47 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Nov 2019 10:44:47 -0800 Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: Message-ID: Hi Bernard, It is interesting suggestion. I don't see we use BTR and BTS currently. Sandhya, do these instructions has some limitations/restrictions? Regarding changes. For new code we prefer to have new encoding of macroasm instructions used in .ad files instead of opcodes [1]. This way we make sure correct encoding is used on different CPUs. Thanks, Vladimir [1] http://hg.openjdk.java.net/jdk/jdk/file/38d4202154f2/src/hotspot/cpu/x86/x86_64.ad#l10051 On 11/2/19 10:18 AM, B. 
Blaser wrote:
> Hi,
>
> I experimented, some time ago, with an optimization of several common
> flag patterns (see also JBS) using BTR/BTS instead of AND/OR
> instructions on x86_64 xeon:
>
>     @BenchmarkMode(Mode.AverageTime)
>     @OutputTimeUnit(TimeUnit.NANOSECONDS)
>     @State(Scope.Thread)
>     public class BitSetAndReset {
>         private static final int COUNT = 10_000;
>
>         private static final long MASK63 = 0x8000_0000_0000_0000L;
>         private static final long MASK31 = 0x0000_0000_8000_0000L;
>         private static final long MASK15 = 0x0000_0000_0000_8000L;
>         private static final long MASK00 = 0x0000_0000_0000_0001L;
>
>         private long andq, orq;
>         private boolean success = true;
>
>         @TearDown(Level.Iteration)
>         public void finish() {
>             if (!success)
>                 throw new AssertionError("Failure while setting or clearing long vector bits!");
>         }
>
>         @Benchmark
>         public void bitSet(Blackhole bh) {
>             for (int i=0; i<COUNT; i++) {
>                 andq = MASK63 | MASK31 | MASK15 | MASK00;
>                 orq = 0;
>                 bh.consume(test63());
>                 bh.consume(test31());
>                 bh.consume(test15());
>                 bh.consume(test00());
>                 success &= andq == 0 && orq == (MASK63 | MASK31 | MASK15 | MASK00);
>             }
>         }
>
>         private long test63() {
>             andq &= ~MASK63;
>             orq |= MASK63;
>             return 0L;
>         }
>         private long test31() {
>             andq &= ~MASK31;
>             orq |= MASK31;
>             return 0L;
>         }
>         private long test15() {
>             andq &= ~MASK15;
>             orq |= MASK15;
>             return 0L;
>         }
>         private long test00() {
>             andq &= ~MASK00;
>             orq |= MASK00;
>             return 0L;
>         }
>     }
>
> Running the benchmark this way:
>
> $ make test TEST="micro:vm.compiler.BitSetAndReset"
> MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/BitSetAndReset.*test*';FORK=3;WARMUP_ITER=1;ITER=3"
>
> We had before:
>
>   03e  movq  R10, #9223372036854775807  # long
>   048  andq  [RSI + #16 (8-bit)], R10  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
>   04c  movq  R10, #-9223372036854775808  # long
>   056  orq   [RSI + #24 (8-bit)], R10  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
>   05a  ...
>
> => 28 bytes
>
>   03c  xorl  RAX, RAX  # long
>   03e  movq  R10, #-2147483649  # long
>   048  andq  [RSI + #16 (8-bit)], R10  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
>   04c  movl  R10, #2147483648  # long (unsigned 32-bit)
>   052  orq   [RSI + #24 (8-bit)], R10  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
>   056  ...
>
> => 26 bytes
>
>   03c  andq  [RSI + #16 (8-bit)], #-32769  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
>   044  orq   [RSI + #24 (8-bit)], #32768  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
>   04c  ...
>
>   03c  andq  [RSI + #16 (8-bit)], #-2  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
>   041  orq   [RSI + #24 (8-bit)], #1  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
>   046  ...
>
> Benchmark              Mode  Cnt      Score      Error  Units
> BitSetAndReset.bitSet  avgt    9  78083.773 ± 2182.692  ns/op
>
> And we would have after:
>
>   03c  btrq  [RSI + #16 (8-bit)], log2(not(#9223372036854775807))  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
>   042  btsq  [RSI + #24 (8-bit)], log2(#-9223372036854775808)  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
>   048  ...
>
> => 12 bytes
>
>   03c  btrq  [RSI + #16 (8-bit)], log2(not(#-2147483649))  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
>   042  xorl  RAX, RAX  # long
>   044  movl  R10, #2147483648  # long (unsigned 32-bit)
>   04a  orq   [RSI + #24 (8-bit)], R10  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
>   04e  ...
>
> => 18 bytes
>
>   03c  andq  [RSI + #16 (8-bit)], #-32769  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
>   044  orq   [RSI + #24 (8-bit)], #32768  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
>   04c  ...
>
>   03c  andq  [RSI + #16 (8-bit)], #-2  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq
>   041  orq   [RSI + #24 (8-bit)], #1  # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq
>   046  ...
>
> Benchmark              Mode  Cnt      Score     Error  Units
> BitSetAndReset.bitSet  avgt    9  77355.154 ± 252.503  ns/op
>
> We see a tiny performance gain with BTR/BTS but the major interest
> remains the much better encoding with up to 16 bytes saving for pure
> 64-bit immediates along with a lower register consumption.
>
> Does the patch below look reasonable enough to eventually rebase and
> push it to jdk/submit and to post a RFR maybe soon if all goes well?
>
> Thanks,
> Bernard
>
> diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
> --- a/src/hotspot/cpu/x86/x86_64.ad
> +++ b/src/hotspot/cpu/x86/x86_64.ad
> @@ -2069,6 +2069,16 @@
>      }
>    %}
>
> +  enc_class Log2L(immPow2L imm)
> +  %{
> +    emit_d8(cbuf, log2_long($imm$$constant));
> +  %}
> +
> +  enc_class Log2NotL(immPow2NotL imm)
> +  %{
> +    emit_d8(cbuf, log2_long(~$imm$$constant));
> +  %}
> +
>    enc_class opc2_reg(rRegI dst)
>    %{
>      // BSWAP
> @@ -3131,6 +3141,28 @@
>    interface(CONST_INTER);
>  %}
>
> +operand immPow2L()
> +%{
> +  // n should be a pure 64-bit power of 2 immediate.
> +  predicate(is_power_of_2_long(n->get_long()) && log2_long(n->get_long()) > 31);
> +  match(ConL);
> +
> +  op_cost(15);
> +  format %{ %}
> +  interface(CONST_INTER);
> +%}
> +
> +operand immPow2NotL()
> +%{
> +  // n should be a pure 64-bit immediate given that not(n) is a power of 2.
> +  predicate(is_power_of_2_long(~n->get_long()) && log2_long(~n->get_long()) > 30);
> +  match(ConL);
> +
> +  op_cost(15);
> +  format %{ %}
> +  interface(CONST_INTER);
> +%}
> +
>  // Long Immediate zero
>  operand immL0()
>  %{
> @@ -9740,6 +9772,19 @@
>    ins_pipe(ialu_mem_imm);
>  %}
>
> +instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr)
> +%{
> +  match(Set dst (StoreL dst (AndL (LoadL dst) src)));
> +  effect(KILL cr);
> +
> +  ins_cost(125);
> +  format %{ "btrq $dst, log2(not($src))\t# long" %}
> +  opcode(0x0F, 0xBA, 0x06);
> +  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
> +             RM_opc_mem(tertiary, dst), Log2NotL(src));
> +  ins_pipe(ialu_mem_imm);
> +%}
> +
>  // BMI1 instructions
>  instruct andnL_rReg_rReg_mem(rRegL dst, rRegL src1, memory src2, immL_M1 minus_1, rFlagsReg cr) %{
>    match(Set dst (AndL (XorL src1 minus_1) (LoadL src2)));
> @@ -9933,6 +9978,19 @@
>    ins_pipe(ialu_mem_imm);
>  %}
>
> +instruct btsL_mem_imm(memory dst, immPow2L src, rFlagsReg cr)
> +%{
> +  match(Set dst (StoreL dst (OrL (LoadL dst) src)));
> +  effect(KILL cr);
> +
> +  ins_cost(125);
> +  format %{ "btsq $dst, log2($src)\t# long" %}
> +  opcode(0x0F, 0xBA, 0x05);
> +  ins_encode(REX_mem_wide(dst), OpcP, OpcS,
> +             RM_opc_mem(tertiary, dst), Log2L(src));
> +  ins_pipe(ialu_mem_imm);
> +%}
> +
>  // Xor Instructions
>  // Xor Register with Register
>  instruct xorL_rReg(rRegL dst, rRegL src, rFlagsReg cr)
>

From claes.redestad at oracle.com Wed Nov 6 19:17:16 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 6 Nov 2019 20:17:16 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To: <336853d7-3047-0194-4e36-9ee968e07a86@redhat.com>
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com> <336853d7-3047-0194-4e36-9ee968e07a86@redhat.com>
Message-ID: <5af59bb9-6d1a-1a69-0693-a27ee9a2d56c@oracle.com>

Hi Aleksey,

On 2019-11-06 16:49, Aleksey Shipilev wrote:
> On 11/6/19 3:42 PM, Claes Redestad wrote:
>> Webrev: http://cr.openjdk.java.net/~redestad/8233708/open.00/
>
> Shenandoah part looks fine.

thanks for checking!

>
> The rest looks fine too.

Thanks!

/Claes

From claes.redestad at oracle.com Wed Nov 6 19:17:39 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 6 Nov 2019 20:17:39 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To:
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
Message-ID: <531ad263-331c-dcca-b63d-f99648d64618@oracle.com>

Hi Nils,

On 2019-11-06 16:08, Nils Eliasson wrote:
> Hi Claes,
>
> Excellent cleanup!

thanks for reviewing!

/Claes

From claes.redestad at oracle.com Wed Nov 6 19:18:01 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Wed, 6 Nov 2019 20:18:01 +0100
Subject: RFR: 8233708: VectorSet cleanup
In-Reply-To:
References: <641814b2-3b10-06d6-39f8-b02d096832b5@oracle.com>
Message-ID: <7cf8fffc-4198-11af-86a8-3db575524e79@oracle.com>

Hi Tobias,

On 2019-11-06 16:57, Tobias Hartmann wrote:
> Hi Claes,
>
> very nice. While you're at it, could you add brackets to the for-loops/ifs in vectset.cpp:70,
> vectset.hpp:89, 98 and fix the whitespacing? No new webrev required.

will do!

/Claes

From sandhya.viswanathan at intel.com Thu Nov 7 00:34:32 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Thu, 7 Nov 2019 00:34:32 +0000
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits
In-Reply-To:
References:
Message-ID:

Hi Vladimir/Bernard,

I don't see any restrictions/limitations on these instructions other than the fact that the "long" operation is only supported on 64-bit format as usual so should be restricted to 64-bit JVM only.

The code size improvement that Bernard demonstrates is significant for operation on longs.

It looks like the throughput for AND/OR is better than BTR/BTS (0.25 vs 0.5) though.
Please refer Table C-17 in the document below:
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Best Regards,
Sandhya

From gromero at linux.vnet.ibm.com Thu Nov 7 00:53:32 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 6 Nov 2019 21:53:32 -0300
Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More generic vector CRC implementation (v2)
In-Reply-To:
References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com> <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com>
Message-ID:

Hi Martin,

On 11/06/2019 12:15 PM, Doerr, Martin wrote:
> Hi Gustavo,
>
>> [PPC64] More generic vector CRC implementation (1/4)
>> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8198894/
>
> This seems to be the version I had already reviewed. I'm still ok with it.

Yes, nothing changed. I posted it again for completeness, but I see now it can cause confusing. I'll avoid doing it in the future.

>> [PPC64] Possibly unreliable stack frame resizing in template interpreter (2/4)
>> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216376/
>
> Normally, backports should get handled in separate backport RFRs, but the manual change in this version could be considered trivial, so I don't insist on separate RFR.
> Looks good, now.

I see, initially I posted the patches separately. I should have kept them separated.

>> [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
>> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8216060/ (3/4)
>
> Almost like above. I'd leave at least the CRC32C defines in the code (stubRoutines_ppc). They don't disturb. Why should we introduce additional diffs?

Got it. I'll send a separate RFR.

>> [PPC64] Cleanup non-vector version of CRC32
>> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v2/8217459/ (4/4)
>
> This one contains a part of another shared code change. So I can't review it as part of this RFR.
> Needs to get reviewed separately. Backport of the original change which introduced LITTLE_ENDIAN_ONLY etc. should get evaluated.

Good point, Martin. I overlooked the original change. It's a fix from Volker to
consider big-endian function descriptors when walking the stack:

https://bugs.openjdk.java.net/browse/JDK-8206173

Fix applies cleanly (except for the path adjustment). It's shared code but in
effect it's PPC64-only. I'll take care of backporting it separate.

And a nit in the test pointed out by Volker:

http://hg.openjdk.java.net/jdk/jdk/file/5bc2e9c9604d/test/hotspot/jtreg/runtime/ElfDecoder/TestElfDirectRead.java#l39

It should read "function descriptors" instead of "file descriptors", right?

I'll send a patch to jdk/jdk fixing that comment too.

Thank you!

Best regards,
Gustavo

From john.r.rose at oracle.com Thu Nov 7 01:01:34 2019
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 6 Nov 2019 17:01:34 -0800
Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits
In-Reply-To:
References:
Message-ID: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com>

I recently saw LLVM compile a classification switch into a really tidy BTR
instruction, something like this:

    switch (ch) {
    case ';': case '/': case '.': case '[': return 0;
    default: return 1;
    }
    =>
    ... range check ...
    movabsq 0x200000002003, %rcx
    btq %rdi, %rcx

It made me wish for this change, plus some more to switch itself.

Given Sandhya's report, though, BTR may only be helpful in limited cases.
In the case above, it subsumes a shift instruction.

Bernard's JMH experiment suggests something else is going on besides the
throughput difference which Sandhya cites. Maybe it's a benchmark artifact,
or maybe it's a good effect from smaller code.
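
For readers who want to see why the switch above collapses to a range check plus one bit test, here is a minimal Java sketch. It is an illustration only, not code from LLVM or HotSpot. The class and method names are hypothetical. The mask is the 0x200000002003 constant from the listing, and the sketch assumes that bit i of the mask stands for the character '.' + i, which is consistent with the four case labels but is not stated in the mail.

```java
// Hypothetical sketch: a four-case classification switch versus a bit test on a mask.
final class BitClassify {
    // Assumed numbering: bit i is set when the character ('.' + i) is in the set.
    // '.' -> bit 0, '/' -> bit 1, ';' -> bit 13, '[' -> bit 45.
    static final long MASK = 0x2000_0000_2003L;

    // The switch from the mail, written out as plain Java.
    static int classifySwitch(char ch) {
        switch (ch) {
            case ';':
            case '/':
            case '.':
            case '[':
                return 0;
            default:
                return 1;
        }
    }

    // Same result: range check first, then test a single bit of the mask.
    static int classifyMask(char ch) {
        int idx = ch - '.';
        if (idx < 0 || idx > '[' - '.') {
            return 1; // outside the range the mask covers
        }
        // Bit idx of ~MASK is 1 exactly when ch is NOT one of the four characters.
        return (int) ((~MASK >>> idx) & 1L);
    }

    public static void main(String[] args) {
        for (char ch = 0; ch < 256; ch++) {
            if (classifySwitch(ch) != classifyMask(ch)) {
                throw new AssertionError("mismatch at " + (int) ch);
            }
        }
        System.out.println("switch and mask agree for all chars below 256");
    }
}
```

The shift and mask in `classifyMask` are the two operations that a single bit-test instruction can cover, which is the "subsumes a shift instruction" point made above.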

I suggest jamming more back-to-back BTRs together, to see if the throughput
effect appears.

-- John

On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya wrote:
>
> Hi Vladimir/Bernard,
>
>
>
> I don't see any restrictions/limitations on these instructions other than the fact that the "long" operation is only supported on 64-bit format as usual so should be restricted to 64-bit JVM only.
>
> The code size improvement that Bernard demonstrates is significant for operation on longs.
>
> It looks like the throughput for AND/OR is better than BTR/BTS (0.25 vs 0.5) though. Please refer Table C-17 in the document below:

From igor.ignatyev at oracle.com Thu Nov 7 04:23:24 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 6 Nov 2019 20:23:24 -0800
Subject: RFR(S) : 8230364 : [JVMCI] a number of JVMCI tests are not jtreg enabled
Message-ID: <82DF702D-19C9-486B-8B5D-82C4F94D0A95@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/
> 102 lines changed: 72 ins; 9 del; 21 mod;

Hi all,

could you please review this small patch which adds jtreg test descriptions to all tests in compiler/jvmci/jdk.vm.ci.hotspot.test/src?
to make it work, the patch also:
- replaces junit ceremonies w/ testng ceremonies;
- changes TestHotSpotJVMCIRuntime to use platform classLoader instead of ext. loader b/c ext. loader (and internal classes used by the test) got removed in jdk9;
- temporary excludes TestTranslatedException. the test fails b/c decoded exception doesn't have information about modules. 8233745 is going to update jdk.vm.ci.hotspot.TranslatedException and remove @ignore from the test.
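
As background for the class loader item in the list above: the extension class loader no longer exists as of JDK 9, and `ClassLoader.getPlatformClassLoader()` is the standard way to reach its successor. The snippet below is only a sketch of that API under that assumption. It is not the actual change made to TestHotSpotJVMCIRuntime, and the class and method names are made up for illustration.

```java
// Hypothetical sketch: resolving a class through the platform class loader (JDK 9+).
final class PlatformLoaderDemo {
    // Looks up a class by name via the platform loader without initializing it.
    static Class<?> loadViaPlatformLoader(String name) {
        try {
            return Class.forName(name, false, ClassLoader.getPlatformClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("not visible to the platform loader: " + name, e);
        }
    }

    public static void main(String[] args) {
        // The platform loader delegates to the bootstrap loader, so core classes resolve too.
        Class<?> c = loadViaPlatformLoader("java.lang.String");
        System.out.println(c.getName() + " resolved via " + ClassLoader.getPlatformClassLoader());
    }
}
```

Passing `false` for the initialize argument keeps the lookup free of side effects, which is usually what a test wants when it only needs the `Class` object.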
JBS: https://bugs.openjdk.java.net/browse/JDK-8230364 webrev: http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/ testing: "added" tests (test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/) Thanks, -- Igor From vladimir.kozlov at oracle.com Thu Nov 7 05:20:40 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Nov 2019 21:20:40 -0800 Subject: RFR(S) : 8230364 : [JVMCI] a number of JVMCI tests are not jtreg enabled In-Reply-To: <82DF702D-19C9-486B-8B5D-82C4F94D0A95@oracle.com> References: <82DF702D-19C9-486B-8B5D-82C4F94D0A95@oracle.com> Message-ID: <09AB4909-D013-44C3-8A80-4334A5F52902@oracle.com> Looks good. Thanks Vladimir > On Nov 6, 2019, at 8:23 PM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/ >> 102 lines changed: 72 ins; 9 del; 21 mod; > > Hi all, > > could you please review this small patch which adds jtreg test descriptions to all tests in compiler/jvmci/jdk.vm.ci.hotspot.test/src? > to make it work, the patch also: > - replaces junit ceremonies w/ testng ceremonies; > - changes TestHotSpotJVMCIRuntime to use platform classLoader instead of ext. loader b/c ext. loader (and internal classes used by the test) got removed in jdk9; > - temporary excludes TestTranslatedException. the test fails b/c decoded exception doesn't have information about modules. 8233745 is going to update jdk.vm.ci.hotspot.TranslatedException and remove @ignore from the test. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8230364 > webrev: http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/ > testing: "added" tests (test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/) > > Thanks, > -- Igor From tobias.hartmann at oracle.com Thu Nov 7 05:54:45 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 7 Nov 2019 06:54:45 +0100 Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter with CDS due to code cache exhaustion In-Reply-To: References: Message-ID: <3e45e244-31a0-3bd2-4b6c-acd1478ace5f@oracle.com> Thanks Vladimir. Best regards, Tobias On 06.11.19 19:01, Vladimir Kozlov wrote: > CC to runtime group too. > > Looks good to me. > > Thanks, > Vladimir > > On 11/6/19 5:34 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8233491 >> http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/ >> >> When running a stress test with CDS, we fail to create adapters when linking a method from a shared >> class because the code cache is full. This case is not properly handled by the CDS specific code and >> instead of throwing a VirtualMachineError, we crash because "entry" is NULL. >> >> I'm able to spuriously reproduce this with a test (see [1]) but since the problem depends on the >> class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg >> test. However, I've verified that the patch fixes the problem. 
>>
>> Thanks,
>> Tobias
>>
>> [1]
>> https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462
>>
>>

From martin.doerr at sap.com Thu Nov 7 09:03:50 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 7 Nov 2019 09:03:50 +0000
Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More generic vector CRC implementation (v2)
In-Reply-To:
References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com> <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo,

> Good point, Martin. I overlooked the original change. It's a fix from Volker to
> consider big-endian function descriptors when walking the stack:
>
> https://bugs.openjdk.java.net/browse/JDK-8206173
>
> Fix applies cleanly (except for the path adjustment). It's shared code but in
> effect it's PPC64-only. I'll take care of backporting it separate.
>
> And a nit in the test pointed out by Volker:
>
> http://hg.openjdk.java.net/jdk/jdk/file/5bc2e9c9604d/test/hotspot/jtreg/runtime/ElfDecoder/TestElfDirectRead.java#l39
>
> It should read "function descriptors" instead of "file descriptors", right?

Comment fixes should not be done in backports. If you would like to fix it, it should get fixed in jdk/jdk and then backported.

Please backport as it is. You won't need a review if you don't change anything except the trivial path adaptations.

Best regards,
Martin

From martin.doerr at sap.com Thu Nov 7 09:08:38 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 7 Nov 2019 09:08:38 +0000
Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread
In-Reply-To: <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>
References: <10f415b2-a56b-8e74-ba12-b971db310c8c@oracle.com> <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com>
Message-ID:

Hi David,

get_log only accesses the executing thread's own oop and the ones before it. So it's ensured by the algorithm that all accessed oops are in live handles.

The problem is in can_remove when not holding the lock. For that, webrev.04 avoids accessing the oop of the last compiler thread in the case in which the lock is not held.

Best regards,
Martin

> -----Original Message-----
> From: David Holmes
> Sent: Mittwoch, 6.
November 2019 11:15 > To: Doerr, Martin ; Kim Barrett > > Cc: dean.long at oracle.com; Vladimir Kozlov (vladimir.kozlov at oracle.com) > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > Hi Martin, > > On 6/11/2019 7:12 pm, Doerr, Martin wrote: > > Hi Kim, > > > > thanks for confirming. > > > > > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > ev.04/ > > already avoids access to freed handles. > > Sorry I missed your earlier reference to this version. > > So the expectation here is that all accesses to these arrays are guarded > by the CompileThread_lock, but that doesn't seem to hold for get_log ? > > Thanks, > David > ----- > > > I don't really like the complexity of this code. > > Replacing oops in handles would have been much more simple. > > But I can live with either version. > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: Kim Barrett > >> Sent: Mittwoch, 6. November 2019 04:09 > >> To: Doerr, Martin > >> Cc: David Holmes ; dean.long at oracle.com; > >> Vladimir Kozlov (vladimir.kozlov at oracle.com) > >> ; hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > >> > >>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin > wrote: > >> > >> Coming back in, because this seems to be going off into the weeds again. > >> > >>>> I don't understand what you mean. If a compiler thread holds an oop, > any > >>>> oop, it must hold it in a Handle to ensure it can't be gc'd. > >>> > >>> The problem is not related to gc. > >>> My change introduces destroy_global for the handles. This means that > the > >> OopStorage portion which has held the oop can get freed. > >>> However, other compiler threads are running concurrently. They may > >> execute code which reads the oop from the handle which is freed by this > >> thread. 
> >>> Reading stale data is not a problem here, but reading freed memory may > >> assert or even crash in general. > >>> I can't see how OopStorage supports reading from handles which were > >> freed by destroy_global. > >> > >> So don't do that! > >> > >> OopStorage isn't magic. If you are going to look at an OopStorage > >> handle, you have to ensure there won't be concurrent deletion. Use > >> locks or some safe memory reclamation protocol. (GlobalCounter might > >> be used here, but it depends a lot on what the iterations are doing. A > >> reference counting mechanism is another possibility.) This is no > >> different from any other resource management. > >> > >>> I think it would be safe if the freeing only occurred at safepoints, but I > don't > >> think this is the case. > >> > >> Assuming the iteration didn't happen at safepoints (which is just a way to > >> make the iteration and > >> deletion not concurrent). And I agree that isn't the case with the current > >> code. > > From rwestrel at redhat.com Thu Nov 7 09:46:26 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 07 Nov 2019 10:46:26 +0100 Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out In-Reply-To: References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com> <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> Message-ID: <87imnwdnnx.fsf@redhat.com> > I am fine with this conservative fix (bailout optimization). It is good that performance is not affected. Thanks for the review. Roland. 
From david.holmes at oracle.com Thu Nov 7 09:51:10 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 7 Nov 2019 19:51:10 +1000 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com> Message-ID: On 7/11/2019 7:08 pm, Doerr, Martin wrote: > Hi David, > > get_log only accesses the executing thread's own oop and the ones before it. So it's ensured by the algorithm that all accessed oops are in live handles. Okay I see that now. Thanks, David > The problem is in can_remove when not holding the lock. For that, webrev.04 avoids accessing the oop of the last compiler thread in the case in which the lock is not held. > > Best regards, > Martin > > >> -----Original Message----- >> From: David Holmes >> Sent: Mittwoch, 6. November 2019 11:15 >> To: Doerr, Martin ; Kim Barrett >> >> Cc: dean.long at oracle.com; Vladimir Kozlov (vladimir.kozlov at oracle.com) >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >> >> Hi Martin, >> >> On 6/11/2019 7:12 pm, Doerr, Martin wrote: >>> Hi Kim, >>> >>> thanks for confirming. >>> >>> >> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr >> ev.04/ >>> already avoids access to freed handles. >> >> Sorry I missed your earlier reference to this version. >> >> So the expectation here is that all accesses to these arrays are guarded >> by the CompileThread_lock, but that doesn't seem to hold for get_log ? >> >> Thanks, >> David >> ----- >> >>> I don't really like the complexity of this code. >>> Replacing oops in handles would have been much more simple. >>> But I can live with either version. 
>>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: Kim Barrett >>>> Sent: Mittwoch, 6. November 2019 04:09 >>>> To: Doerr, Martin >>>> Cc: David Holmes ; dean.long at oracle.com; >>>> Vladimir Kozlov (vladimir.kozlov at oracle.com) >>>> ; hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >>>> >>>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin >> wrote: >>>> >>>> Coming back in, because this seems to be going off into the weeds again. >>>> >>>>>> I don't understand what you mean. If a compiler thread holds an oop, >> any >>>>>> oop, it must hold it in a Handle to ensure it can't be gc'd. >>>>> >>>>> The problem is not related to gc. >>>>> My change introduces destroy_global for the handles. This means that >> the >>>> OopStorage portion which has held the oop can get freed. >>>>> However, other compiler threads are running concurrently. They may >>>> execute code which reads the oop from the handle which is freed by this >>>> thread. >>>>> Reading stale data is not a problem here, but reading freed memory may >>>> assert or even crash in general. >>>>> I can't see how OopStorage supports reading from handles which were >>>> freed by destroy_global. >>>> >>>> So don't do that! >>>> >>>> OopStorage isn't magic. If you are going to look at an OopStorage >>>> handle, you have to ensure there won't be concurrent deletion. Use >>>> locks or some safe memory reclamation protocol. (GlobalCounter might >>>> be used here, but it depends a lot on what the iterations are doing. A >>>> reference counting mechanism is another possibility.) This is no >>>> different from any other resource management. >>>> >>>>> I think it would be safe if the freeing only occurred at safepoints, but I >> don't >>>> think this is the case. >>>> >>>> Assuming the iteration didn't happen at safepoints (which is just a way to >>>> make the iteration and >>>> deletion not concurrent). 
And I agree that isn't the case with the current >>>> code. >>> From tobias.hartmann at oracle.com Thu Nov 7 11:34:40 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 7 Nov 2019 12:34:40 +0100 Subject: [14] RFR(S): 8233788: Remove useless asserts in PhaseCFG::insert_anti_dependences Message-ID: Hi, please review the following cleanup: https://bugs.openjdk.java.net/browse/JDK-8233788 http://cr.openjdk.java.net/~thartmann/8233788/webrev.00/ load_alias_idx can never be 0 and even if it could be, one of the asserts would fail because the opcode check handles only a single type. In fact, all these intrinsic nodes have adr_type() == NULL which maps to AliasIdxTop. Thanks, Tobias From erik.osterlund at oracle.com Thu Nov 7 13:49:00 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Thu, 7 Nov 2019 14:49:00 +0100 Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs Message-ID: Hi, We have noticed problems with the one single place where we let C2 touch the graph while injecting our load barriers. Right before tagging a load as needing a load barrier, it is GVN transformed. The problem with this is that if we emit a strong field load, the GVN transformation can see through that load, by following it through the store that set its value, and find that the field value really came from the load of a weak Reference.get intrinsic. In this scenario, the load we get out from the GVN transformation needs weak barriers, yet we override it with strong barriers, as that was the semantics of the access being parsed. Sigh. We already have code that tries to determine if the load we got out from the GVN transformation looks like a load that was created in the BarrierSetC2 factory function, so one way of solving this is to refine that logic that tries to determine if this was the load we created before the transformation or not. 
But I felt like a better solution is to finish constructing the access with all the intended properties *before* transformation. I massaged the code so that the GC barrier data of accesses with load barriers gets passed in to the factory functions that create the access, right before the transformation. This way, we construct the access with the intended semantics where it is being created (parser or macro expansion for field accesses in clone intrinsics). Then we do not have to touch it after the GVN transformation. It does seem like there could be similar problems from other GCs, but in e.g. G1, the consequences are weird suboptimal code instead of anything dangerous happening. For example, we can generate SATB buffering code required by G1 Reference.get() intrinsics for strong accesses, due to GVN handing out earlier accesses with different semantics. Perhaps that should be looked into separately as well. But that investigation is outside of the scope of this bug fix. Webrev: http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8233506 Thanks, /Erik From sgehwolf at redhat.com Thu Nov 7 15:02:36 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Thu, 07 Nov 2019 16:02:36 +0100 Subject: [8u] RFR: 8233023: assert(Opcode() == mem->Opcode() || phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw) failed: no mismatched stores, except on raw memory In-Reply-To: References: <7a2d50793120bdeb86d0047fd09db18c725328e0.camel@redhat.com> Message-ID: <7aaffb34ba9af2e5508f423f7f3dddf75e5ea9cc.camel@redhat.com> On Wed, 2019-11-06 at 16:51 +0000, Andrew John Hughes wrote: > On 30/10/2019 09:41, Severin Gehwolf wrote: > > Hi, > > > > Could I please get a review of this 8u only issue? 
The reason a > > fastdebug build of latest OpenJDK 8u asserts for the dec-tree benchmark > > of the renaissance suite is because the 8u backport of JDK-8140309 was > > missing this hunk from JDK 9[1]: > > > > + (Opcode() == Op_StoreL && st->Opcode() == Op_StoreI) || // expanded ClearArrayNode > > + (is_mismatched_access() || st->as_Store()->is_mismatched_access()), > > > > I had a closer look and there doesn't seem to be missing anything else. > > The proposed fix is to amend the assert condition in the appropriate > > place, which brings 8u in line with JDK 9 code where the failure isn't > > observed. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8233023 > > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8233023/01/webrev/ > > > > Testing: 8u tier1 test set with fastdebug build on x86_64 Linux. No new > > failures. dec-tree benchmark now runs successfully on an 8u fastdebug > > build. > > > > Thoughts? > > > > Thanks, > > Severin > > > > [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4bee38ba018c > > > > I compared the two patches and this missing hunk does stand out. So the > patch looks fine in that respect. Thanks. > I notice they also didn't backport the testcase to 8u. Any thoughts on > including that? The test uses Unsafe.putXXXUnaligned() which aren't available in JDK 8. So I'm not sure how the test would work. More thoughts? 
Thanks, Severin From adinn at redhat.com Thu Nov 7 16:34:15 2019 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 7 Nov 2019 16:34:15 +0000 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> Message-ID: On 04/11/2019 15:35, Schmidt, Lutz wrote: > thank you for your thoughts. I do not agree to your conclusion, > though. > > There are two bottlenecks in the CodeHeap management code. One is in > CodeHeap::mark_segmap_as_used(), uncovered by > OverflowCodeCacheTest.java. The other is in > CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java. > > Both bottlenecks are tackled by the recommended changeset. . . . > CodeHeap::add_to_freelist() is still O(n*n), with n being the free > list length. But the kick-in point of the non-linearity could be > significantly shifted towards larger n. The time reduction from > approx. 8 seconds to 160 milliseconds supports this statement. Ah sorry, I was not clear from your original post that the proposed change had significantly improved the time spent in free list management in the second test by significantly cutting down the free list size. As you say, a reduction factor of 1/K in list size will give a 1/K*K reduction in execution time. Since this test is a lot nearer to reality than the overflow test I think the current result is perhaps enough to justify its value. > I agree it would be helpful to have a "real-world" example showing > some improvement. 
Providing such evidence is hard, though. I could > instrument the code and print some values form time to time. It's > certain this additional output will mess up success/failure decisions > in our test environment. Not sure everybody likes that. But I will > give it a try and take the hits. This will be a multi-day effort. Well, that would be nice to have but not if it stops other work. The one thing about the Stress test that I fear may be 'unreal' is the potentially over-high probability of generating long(ish) runs of adjacent free segments. That might be giving an artificial win that we will not in fact see. However, given the current numbers I'd be happy to risk that and let this patch go in as is. > On a general note, I am always uncomfortable knowing of a O(n*n) > effort, in particular when it could be removed or at least tamed > considerably. Experience tells (at least to me) that, at some point > in time, n will be large enough to hurt. Well, yes, although salesman do travel /and/ make money ... ;-) > I'll be back. Sure, thanks for following up. This is all very interesting. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From gromero at linux.vnet.ibm.com Thu Nov 7 18:36:46 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 7 Nov 2019 15:36:46 -0300 Subject: [8u] RFR for backport of 8198894 (CRC32 1/4): [PPC64] More generic vector CRC implementation (v2) In-Reply-To: References: <0dc83fcb-4e09-5841-04be-aee615e5a7fd@linux.vnet.ibm.com> <67e6e482-df56-27d0-da20-7968615f3ea1@linux.vnet.ibm.com> Message-ID: <26fba4a1-0d53-bf0c-42cd-09c724bf833b@linux.vnet.ibm.com> Hello Martin, On 11/07/2019 06:03 AM, Doerr, Martin wrote: > Hi Gustavo, > >> Good point, Martin. I overlooked the original change. 
It's a fix from Volker to >> consider big-endian function descriptors when walking the stack: >> >> https://bugs.openjdk.java.net/browse/JDK-8206173 >> >> Fix applies cleanly (except for the path adjustment). It's shared code but in >> effect it's PPC64-only. I'll take care of backporting it separate. >> >> And a nit in the test pointed out by Volker: >> >> http://hg.openjdk.java.net/jdk/jdk/file/5bc2e9c9604d/test/hotspot/jtreg/ru >> ntime/ElfDecoder/TestElfDirectRead.java#l39 >> >> It should read "function descriptors" instead of "file descriptors", right? > > Comment fixes should not be done in backports. If you would like to fix it, it should get fixed in jdk/jdk and then backported. Yeah, that's what I meant (implicitly at least). I'm aware of that. I plan to fix it only in jdk/jdk, also the test is not part of Volker's fix, anyway I think that nit is not worth a backport. > Please backport as it is. You won't need a review if you don't change anything except the trivial path adaptations. Yeah, that's why I mentioned it applied cleanly, except for path adaptations :) Thanks. Best regards, Gustavo From vladimir.kozlov at oracle.com Thu Nov 7 19:28:13 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Nov 2019 11:28:13 -0800 Subject: [14] RFR(S): 8233788: Remove useless asserts in PhaseCFG::insert_anti_dependences In-Reply-To: References: Message-ID: Looks good. Thanks, Vladimir On 11/7/19 3:34 AM, Tobias Hartmann wrote: > Hi, > > please review the following cleanup: > https://bugs.openjdk.java.net/browse/JDK-8233788 > http://cr.openjdk.java.net/~thartmann/8233788/webrev.00/ > > load_alias_idx can never be 0 and even if it could be, one of the asserts would fail because the > opcode check handles only a single type. In fact, all these intrinsic nodes have adr_type() == NULL > which maps to AliasIdxTop. > > Thanks, > Tobias > From bsrbnd at gmail.com Thu Nov 7 19:30:09 2019 From: bsrbnd at gmail.com (B. 
Blaser) Date: Thu, 7 Nov 2019 20:30:09 +0100 Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com> References: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com> Message-ID: Hi Vladimir, Sandhya and John, Thanks for your respective answers. The suggested fix focuses on x86_64 and pure 64-bit immediates which means that all other cases are left unchanged as shown by the initial benchmark, for example: andq &= ~MASK00; orq |= MASK00; would still give: 03c andq [RSI + #16 (8-bit)], #-2 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq 041 orq [RSI + #24 (8-bit)], #1 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq 046 ... Now, the interesting point is that pure 64-bit immediates (which cannot be treated as sign-extended 8/32-bit values) are assembled using two instructions (not one) because AND/OR cannot be used directly in such cases, for example: andq &= ~MASK63; orq |= MASK63; gives: 03e movq R10, #9223372036854775807 # long 048 andq [RSI + #16 (8-bit)], R10 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq 04c movq R10, #-9223372036854775808 # long 056 orq [RSI + #24 (8-bit)], R10 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq 05a ... So, even though Sandhya mentioned a better throughput for AND/OR, the additional MOV cost (I didn't find it in table C-17 but I assume something close to MOVS/Z with latency=1/throughput=0.25) seems to be in favor of a sole BTR/BTS instruction as shown by the initial benchmark. 
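To make the distinction above concrete, here is a minimal Java sketch, not the predicate used in the proposed x86_64.ad patterns. It assumes only what is described above: a mask is interesting for BTS/BTR when it has a single set (or single cleared) bit and cannot be encoded as a sign-extended 32-bit immediate. The class and method names are hypothetical and chosen for illustration.

```java
public class BitImmediateSketch {

    // True if the value can be encoded as a sign-extended 32-bit immediate,
    // i.e. AND/OR can take it directly without a separate MOV.
    public static boolean fitsInSignExtended32(long value) {
        return value == (long) (int) value;
    }

    // True if exactly one bit is set in the value.
    public static boolean isSingleBit(long value) {
        return Long.bitCount(value) == 1;
    }

    // An OR mask with a single set bit that needs a full 64-bit immediate:
    // the case where one BTS could replace MOV + OR.
    public static boolean isBtsCandidate(long orMask) {
        return isSingleBit(orMask) && !fitsInSignExtended32(orMask);
    }

    // An AND mask with a single cleared bit that needs a full 64-bit immediate:
    // the case where one BTR could replace MOV + AND.
    public static boolean isBtrCandidate(long andMask) {
        return isSingleBit(~andMask) && !fitsInSignExtended32(andMask);
    }

    // Index of the single set bit (the log2 shown in the listings).
    public static int bitIndex(long singleBitMask) {
        return Long.numberOfTrailingZeros(singleBitMask);
    }

    public static void main(String[] args) {
        long mask00 = 1L << 0;   // fits in an 8/32-bit immediate
        long mask63 = 1L << 63;  // pure 64-bit immediate

        System.out.println(isBtsCandidate(mask00));   // false
        System.out.println(isBtrCandidate(~mask00));  // false
        System.out.println(isBtsCandidate(mask63));   // true
        System.out.println(isBtrCandidate(~mask63));  // true
        System.out.println(bitIndex(mask63));         // 63
    }
}
```

With these assumptions, MASK00 stays on the existing AND/OR path, while MASK63 falls into the two-instruction case that a single BTR/BTS would cover.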
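However, as John suggested, I tried another benchmark which focuses on the throughput to make sure there isn't any regression in such situations: private long orq63, orq62, orq61, orq60; @Benchmark public void throughput(Blackhole bh) { for (int i=0; i orq63 = orq62 = orq61 = orq60 = 0; bh.consume(testTp()); } } private long testTp() { orq63 |= MASK63; orq62 |= MASK62; orq61 |= MASK61; orq60 |= MASK60; return 0L; } Before, we had: 03e movq R10, #-9223372036854775808 # long 048 orq [RSI + #32 (8-bit)], R10 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63 04c movq R10, #4611686018427387904 # long 056 orq [RSI + #40 (8-bit)], R10 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62 05a movq R10, #2305843009213693952 # long 064 orq [RSI + #48 (8-bit)], R10 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61 068 movq R10, #1152921504606846976 # long 072 orq [RSI + #56 (8-bit)], R10 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60 Benchmark Mode Cnt Score Error Units BitSetAndReset.throughput avgt 9 25912.455 ± 2527.041 ns/op And after, we would have: 03c btsq [RSI + #32 (8-bit)], log2(#-9223372036854775808) # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63 042 btsq [RSI + #40 (8-bit)], log2(#4611686018427387904) # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62 048 btsq [RSI + #48 (8-bit)], log2(#2305843009213693952) # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61 04e btsq [RSI + #56 (8-bit)], log2(#1152921504606846976) # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60 Benchmark Mode Cnt Score Error Units BitSetAndReset.throughput avgt 9 25803.195 ± 2434.009 ns/op Fortunately, we still see a tiny performance gain along with the large size reduction and register saving. Should we go ahead with this optimization? If so, I'll post a RFR with Vladimir's requested changes soon. Thanks, Bernard On Thu, 7 Nov 2019 at 02:02, John Rose wrote: > > I recently saw LLVM compile a classification switch into a really tidy BTR instruction, > something like this: > > switch (ch) { > case ';': case '/': case '.': case '[': return 0; > default: return 1; > } > => > ... range check ... > movabsq 0x200000002003, %rcx > btq %rdi, %rcx > > It made me wish for this change, plus some more to switch itself. > Given Sandhya's report, though, BTR may only be helpful in limited > cases. In the case above, it subsumes a shift instruction. > > Bernard's JMH experiment suggests something else is going on besides > the throughput difference which Sandhya cites. Maybe it's a benchmark > artifact, or maybe it's a good effect from smaller code. I suggest jamming > more back-to-back BTRs together, to see if the throughput effect appears. > > -- John > > On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya wrote: > > > > Hi Vladimir/Bernard, > > > > > > > > I don't see any restrictions/limitations on these instructions other than the fact that the "long" operation is only supported on 64-bit format as usual so should be restricted to 64-bit JVM only. > > > > The code size improvement that Bernard demonstrates is significant for operation on longs. > > > > It looks like the throughput for AND/OR is better than BTR/BTS (0.25 vs 0.5) though. 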
Please refer Table C-17 in the document below: > From vladimir.kozlov at oracle.com Thu Nov 7 19:51:08 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Nov 2019 11:51:08 -0800 Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com> Message-ID: <5fac20bf-d5d6-3380-894e-0cb488fb0dca@oracle.com> I agree with you, Bernard. I think throughput performance is limited by memory accesses which is the same in both cases. But code reduction is very nice improvement. We can squeeze more code into CPU buffer which is very good for small loops. Please, send official RFR and to testing. Also would be nice to have a test which verifies result of these operations. Thanks, Vladimir On 11/7/19 11:30 AM, B. Blaser wrote: > Hi Vladimir, Sandhya and John, > > Thanks for your respective answers. > > The suggested fix focuses on x86_64 and pure 64-bit immediates which > means that all other cases are left unchanged as shown by the initial > benchmark, for example: > > andq &= ~MASK00; > orq |= MASK00; > > would still give: > > 03c andq [RSI + #16 (8-bit)], #-2 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.andq > 041 orq [RSI + #24 (8-bit)], #1 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq > 046 ... > > Now, the interesting point is that pure 64-bit immediates (which > cannot be treated as sign-extended 8/32-bit values) are assembled > using two instructions (not one) because AND/OR cannot be used > directly in such cases, for example: > > andq &= ~MASK63; > orq |= MASK63; > > gives: > > 03e movq R10, #9223372036854775807 # long > 048 andq [RSI + #16 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.andq > 04c movq R10, #-9223372036854775808 # long > 056 orq [RSI + #24 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq > 05a ... 
> > So, even though Sandhya mentioned a better throughput for AND/OR, the > additional MOV cost (I didn't find it in table C-17 but I assume > something close to MOVS/Z with latency=1/throughput=0.25) seems to be > in favor of a sole BTR/BTS instruction as shown by the initial > benchmark. > > However, as John suggested, I tried another benchmark which focuses on > the throughput to make sure there isn't any regression in such > situations: > > private long orq63, orq62, orq61, orq60; > > @Benchmark > public void throughput(Blackhole bh) { > for (int i=0; i orq63 = orq62 = orq61 = orq60 = 0; > bh.consume(testTp()); > } > } > > private long testTp() { > orq63 |= MASK63; > orq62 |= MASK62; > orq61 |= MASK61; > orq60 |= MASK60; > return 0L; > } > > Before, we had: > > 03e movq R10, #-9223372036854775808 # long > 048 orq [RSI + #32 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq63 > 04c movq R10, #4611686018427387904 # long > 056 orq [RSI + #40 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq62 > 05a movq R10, #2305843009213693952 # long > 064 orq [RSI + #48 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq61 > 068 movq R10, #1152921504606846976 # long > 072 orq [RSI + #56 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq60 > > Benchmark Mode Cnt Score Error Units > BitSetAndReset.throughput avgt 9 25912.455 ± 2527.041 ns/op > > And after, we would have: > > 03c btsq [RSI + #32 (8-bit)], log2(#-9223372036854775808) > # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63 > 042 btsq [RSI + #40 (8-bit)], log2(#4611686018427387904) # > long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62 > 048 btsq [RSI + #48 (8-bit)], log2(#2305843009213693952) # > long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61 > 04e btsq [RSI + #56 (8-bit)], log2(#1152921504606846976) # > long ! 
Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60 > > Benchmark Mode Cnt Score Error Units > BitSetAndReset.throughput avgt 9 25803.195 ± 2434.009 ns/op > > Fortunately, we still see a tiny performance gain along with the large > size reduction and register saving. > Should we go ahead with this optimization? If so, I'll post a RFR with > Vladimir's requested changes soon. > > Thanks, > Bernard > > On Thu, 7 Nov 2019 at 02:02, John Rose wrote: >> >> I recently saw LLVM compile a classification switch into a really tidy BTR instruction, >> something like this: >> >> switch (ch) { >> case ';': case '/': case '.': case '[': return 0; >> default: return 1; >> } >> => >> ... range check ... >> movabsq 0x200000002003, %rcx >> btq %rdi, %rcx >> >> It made me wish for this change, plus some more to switch itself. >> Given Sandhya's report, though, BTR may only be helpful in limited >> cases. In the case above, it subsumes a shift instruction. >> >> Bernard's JMH experiment suggests something else is going on besides >> the throughput difference which Sandhya cites. Maybe it's a benchmark >> artifact, or maybe it's a good effect from smaller code. I suggest jamming >> more back-to-back BTRs together, to see if the throughput effect appears. >> >> -- John >> >> On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya wrote: >>> >>> Hi Vladimir/Bernard, >>> >>> >>> >>> I don't see any restrictions/limitations on these instructions other than the fact that the "long" operation is only supported on 64-bit format as usual so should be restricted to 64-bit JVM only. >>> >>> The code size improvement that Bernard demonstrates is significant for operation on longs. >>> >>> It looks like the throughput for AND/OR is better than BTR/BTS (0.25 vs 0.5) though. 
Please refer Table C-17 in the document below: >> From smita.kamath at intel.com Thu Nov 7 20:12:18 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Thu, 7 Nov 2019 20:12:18 +0000 Subject: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using AVX512 + VAES instructions Message-ID: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com> Hi Vladimir, As per Intel Architecture Instruction Set Reference [1] Vector AES (VAES) Operations will be supported in future Intel ISA. I would like to contribute an optimization for AES-CTR algorithm using AVX512+VAES instructions. This optimization is for x86_64 architecture that have AVX512-VAES enabled. I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. I, smita.kamath at intel.com , Regev Shemy (regev.shemy at intel.com) and Shay Gueron, (shay.gueron at intel.com) are contributors to this code. Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8233741 Link to webrev: https://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.01 [1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf (Pages 156 - 159) [2] https://software.intel.com/en-us/articles/intel-software-development-emulator Regards, Smita Kamath From vladimir.kozlov at oracle.com Thu Nov 7 21:09:29 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Nov 2019 13:09:29 -0800 Subject: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using AVX512 + VAES instructions In-Reply-To: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com> References: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com> Message-ID: Hi Smita, You don't need #ifdef _LP64 in stubGenerator_x86_64.cpp. This file is compiled only for 64-bit JVM. You also have trailing spaces - please remove them. Changes seem fine otherwise. I submit tier1 testing to make sure it builds. 
Thanks, Vladimir On 11/7/19 12:12 PM, Kamath, Smita wrote: > Hi Vladimir, > > > As per Intel Architecture Instruction Set Reference [1] Vector AES (VAES) Operations will be supported in future Intel ISA. I would like to contribute an optimization for AES-CTR algorithm using AVX512+VAES instructions. This optimization is for x86_64 architecture that have AVX512-VAES enabled. I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. > > > I, smita.kamath at intel.com , Regev Shemy (regev.shemy at intel.com) and Shay Gueron, (shay.gueron at intel.com) are contributors to this code. > > Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8233741 > > Link to webrev: https://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.01 > > > [1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf (Pages 156 - 159) > > [2] https://software.intel.com/en-us/articles/intel-software-development-emulator > > > Regards, > Smita Kamath > From lutz.schmidt at sap.com Thu Nov 7 21:33:49 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Thu, 7 Nov 2019 21:33:49 +0000 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> Message-ID: Hi Andrew, thanks for spending more thoughts on this matter - and for updating your opinion. The instrumentation and measurement of other tests will take longer than expected. It got delayed by JDK-8233787. The fix for this bug will enable my timing code to run smoother. 
Side note: this timing code I have mentioned now several times is nothing secret. It's just not suitable to contribute, among other reasons because it's only available for ppc and s390. I can give you more information in case you are interested - no problem if you say "ahhh, never mind...". Thanks, Lutz On 07.11.19, 17:34, "Andrew Dinn" wrote: On 04/11/2019 15:35, Schmidt, Lutz wrote: > thank you for your thoughts. I do not agree to your conclusion, > though. > > There are two bottlenecks in the CodeHeap management code. One is in > CodeHeap::mark_segmap_as_used(), uncovered by > OverflowCodeCacheTest.java. The other is in > CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java. > > Both bottlenecks are tackled by the recommended changeset. . . . > CodeHeap::add_to_freelist() is still O(n*n), with n being the free > list length. But the kick-in point of the non-linearity could be > significantly shifted towards larger n. The time reduction from > approx. 8 seconds to 160 milliseconds supports this statement. Ah sorry, I was not clear from your original post that the proposed change had significantly improved the time spent in free list management in the second test by significantly cutting down the free list size. As you say, a reduction factor of 1/K in list size will give a 1/(K*K) reduction in execution time. Since this test is a lot nearer to reality than the overflow test I think the current result is perhaps enough to justify its value. > I agree it would be helpful to have a "real-world" example showing > some improvement. Providing such evidence is hard, though. I could > instrument the code and print some values from time to time. It's > certain this additional output will mess up success/failure decisions > in our test environment. Not sure everybody likes that. But I will > give it a try and take the hits. This will be a multi-day effort. Well, that would be nice to have but not if it stops other work.
The one thing about the Stress test that I fear may be 'unreal' is the potentially over-high probability of generating long(ish) runs of adjacent free segments. That might be giving an artificial win that we will not in fact see. However, given the current numbers I'd be happy to risk that and let this patch go in as is. > On a general note, I am always uncomfortable knowing of a O(n*n) > effort, in particular when it could be removed or at least tamed > considerably. Experience tells (at least to me) that, at some point > in time, n will be large enough to hurt. Well, yes, although salesman do travel /and/ make money ... ;-) > I'll be back. Sure, thanks for following up. This is all very interesting. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From igor.ignatyev at oracle.com Thu Nov 7 21:41:10 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 7 Nov 2019 13:41:10 -0800 Subject: RFR(S) : 8230364 : [JVMCI] a number of JVMCI tests are not jtreg enabled In-Reply-To: <09AB4909-D013-44C3-8A80-4334A5F52902@oracle.com> References: <82DF702D-19C9-486B-8B5D-82C4F94D0A95@oracle.com> <09AB4909-D013-44C3-8A80-4334A5F52902@oracle.com> Message-ID: Thanks Vladimir, pushed. -- Igor > On Nov 6, 2019, at 9:20 PM, Vladimir Kozlov wrote: > > Looks good. > > Thanks > Vladimir > >> On Nov 6, 2019, at 8:23 PM, Igor Ignatyev wrote: >> >> http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/ >>> 102 lines changed: 72 ins; 9 del; 21 mod; >> >> Hi all, >> >> could you please review this small patch which adds jtreg test descriptions to all tests in compiler/jvmci/jdk.vm.ci.hotspot.test/src? >> to make it work, the patch also: >> - replaces junit ceremonies w/ testng ceremonies; >> - changes TestHotSpotJVMCIRuntime to use platform classLoader instead of ext. loader b/c ext. 
loader (and internal classes used by the test) got removed in jdk9; >> - temporary excludes TestTranslatedException. the test fails b/c decoded exception doesn't have information about modules. 8233745 is going to update jdk.vm.ci.hotspot.TranslatedException and remove @ignore from the test. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8230364 >> webrev: http://cr.openjdk.java.net/~iignatyev//8230364/webrev.02/ >> testing: "added" tests (test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/) >> >> Thanks, >> -- Igor > From john.r.rose at oracle.com Thu Nov 7 21:58:45 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 7 Nov 2019 13:58:45 -0800 Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com> Message-ID: Would you consider adding patterns for non-constant masks also? It would be something like (And (LShift n) x), etc. It could be in this set or in a follow-on. Thanks (says John who always wants more). > On Nov 7, 2019, at 11:30 AM, B. Blaser wrote: > > Hi Vladimir, Sandhya and John, > > Thanks for your respective answers. > > The suggested fix focuses on x86_64 and pure 64-bit immediates which > means that all other cases are left unchanged as shown by the initial > benchmark, for example: > > andq &= ~MASK00; > orq |= MASK00; > > would still give: > > 03c andq [RSI + #16 (8-bit)], #-2 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.andq > 041 orq [RSI + #24 (8-bit)], #1 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq > 046 ...
> > Now, the interesting point is that pure 64-bit immediates (which > cannot be treated as sign-extended 8/32-bit values) are assembled > using two instructions (not one) because AND/OR cannot be used > directly in such cases, for example: > > andq &= ~MASK63; > orq |= MASK63; > > gives: > > 03e movq R10, #9223372036854775807 # long > 048 andq [RSI + #16 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.andq > 04c movq R10, #-9223372036854775808 # long > 056 orq [RSI + #24 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq > 05a ... > > So, even though Sandhya mentioned a better throughput for AND/OR, the > additional MOV cost (I didn't find it in table C-17 but I assume > something close to MOVS/Z with latency=1/throughput=0.25) seems to be > in favor of a sole BTR/BTS instruction as shown by the initial > benchmark. > > However, as John suggested, I tried another benchmark which focuses on > the throughput to make sure there isn't any regression in such > situations: > > private long orq63, orq62, orq61, orq60; > > @Benchmark > public void throughput(Blackhole bh) { > for (int i=0; i orq63 = orq62 = orq61 = orq60 = 0; > bh.consume(testTp()); > } > } > > private long testTp() { > orq63 |= MASK63; > orq62 |= MASK62; > orq61 |= MASK61; > orq60 |= MASK60; > return 0L; > } > > Before, we had: > > 03e movq R10, #-9223372036854775808 # long > 048 orq [RSI + #32 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq63 > 04c movq R10, #4611686018427387904 # long > 056 orq [RSI + #40 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq62 > 05a movq R10, #2305843009213693952 # long > 064 orq [RSI + #48 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq61 > 068 movq R10, #1152921504606846976 # long > 072 orq [RSI + #56 (8-bit)], R10 # long ! 
Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq60 > > Benchmark Mode Cnt Score Error Units > BitSetAndReset.throughput avgt 9 25912.455 ± 2527.041 ns/op > > And after, we would have: > > 03c btsq [RSI + #32 (8-bit)], log2(#-9223372036854775808) > # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63 > 042 btsq [RSI + #40 (8-bit)], log2(#4611686018427387904) # > long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62 > 048 btsq [RSI + #48 (8-bit)], log2(#2305843009213693952) # > long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61 > 04e btsq [RSI + #56 (8-bit)], log2(#1152921504606846976) # > long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60 > > Benchmark Mode Cnt Score Error Units > BitSetAndReset.throughput avgt 9 25803.195 ± 2434.009 ns/op > > Fortunately, we still see a tiny performance gain along with the large > size reduction and register saving. > Should we go ahead with this optimization? If so, I'll post a RFR with > Vladimir's requested changes soon. > > Thanks, > Bernard > > On Thu, 7 Nov 2019 at 02:02, John Rose wrote: >> >> I recently saw LLVM compile a classification switch into a really tidy BTR instruction, >> something like this: >> >> switch (ch) { >> case ';': case '/': case '.': case '[': return 0; >> default: return 1; >> } >> => >> ... range check ... >> movabsq 0x200000002003, %rcx >> btq %rdi, %rcx >> >> It made me wish for this change, plus some more to switch itself. >> Given Sandhya's report, though, BTR may only be helpful in limited >> cases. In the case above, it subsumes a shift instruction. >> >> Bernard's JMH experiment suggests something else is going on besides >> the throughput difference which Sandhya cites. Maybe it's a benchmark >> artifact, or maybe it's a good effect from smaller code. I suggest jamming >> more back-to-back BTRs together, to see if the throughput effect appears. >> >> --
John >> >> On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya wrote: >>> >>> Hi Vladimir/Bernard, >>> >>> >>> >>> I don't see any restrictions/limitations on these instructions other than the fact that the "long" operation is only supported on 64-bit format as usual so should be restricted to 64-bit JVM only. >>> >>> The code size improvement that Bernard demonstrates is significant for operation on longs. >>> >>> It looks like the throughput for AND/OR is better than BTR/BTS (0.25 vs 0.5) though. Please refer Table C-17 in the document below: >> From vladimir.kozlov at oracle.com Thu Nov 7 22:14:35 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Nov 2019 14:14:35 -0800 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com> Message-ID: I resubmitted testing. Vladimir On 11/7/19 1:51 AM, David Holmes wrote: > On 7/11/2019 7:08 pm, Doerr, Martin wrote: >> Hi David, >> >> get_log only accesses the executing thread's own oop and the ones before it. So it's ensured by >> the algorithm that all accessed oops are in live handles. > > Okay I see that now. > > Thanks, > David > >> The problem is in can_remove when not holding the lock. For that, webrev.04 avoids accessing the >> oop of the last compiler thread in the case in which the lock is not held. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Mittwoch, 6.
November 2019 11:15 >>> To: Doerr, Martin ; Kim Barrett >>> >>> Cc: dean.long at oracle.com; Vladimir Kozlov (vladimir.kozlov at oracle.com) >>> ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >>> >>> Hi Martin, >>> >>> On 6/11/2019 7:12 pm, Doerr, Martin wrote: >>>> Hi Kim, >>>> >>>> thanks for confirming. >>>> >>>> >>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr >>> ev.04/ >>>> already avoids access to freed handles. >>> >>> Sorry I missed your earlier reference to this version. >>> >>> So the expectation here is that all accesses to these arrays are guarded >>> by the CompileThread_lock, but that doesn't seem to hold for get_log ? >>> >>> Thanks, >>> David >>> ----- >>> >>>> I don't really like the complexity of this code. >>>> Replacing oops in handles would have been much more simple. >>>> But I can live with either version. >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>>> -----Original Message----- >>>>> From: Kim Barrett >>>>> Sent: Mittwoch, 6. November 2019 04:09 >>>>> To: Doerr, Martin >>>>> Cc: David Holmes ; dean.long at oracle.com; >>>>> Vladimir Kozlov (vladimir.kozlov at oracle.com) >>>>> ; hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >>>>> >>>>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin >>> wrote: >>>>> >>>>> Coming back in, because this seems to be going off into the weeds again. >>>>> >>>>>>> I don't understand what you mean. If a compiler thread holds an oop, >>> any >>>>>>> oop, it must hold it in a Handle to ensure it can't be gc'd. >>>>>> >>>>>> The problem is not related to gc. >>>>>> My change introduces destroy_global for the handles. This means that >>> the >>>>> OopStorage portion which has held the oop can get freed. >>>>>> However, other compiler threads are running concurrently. 
They may >>>>> execute code which reads the oop from the handle which is freed by this >>>>> thread. >>>>>> Reading stale data is not a problem here, but reading freed memory may >>>>> assert or even crash in general. >>>>>> I can't see how OopStorage supports reading from handles which were >>>>> freed by destroy_global. >>>>> >>>>> So don't do that! >>>>> >>>>> OopStorage isn't magic. If you are going to look at an OopStorage >>>>> handle, you have to ensure there won't be concurrent deletion. Use >>>>> locks or some safe memory reclamation protocol. (GlobalCounter might >>>>> be used here, but it depends a lot on what the iterations are doing. A >>>>> reference counting mechanism is another possibility.) This is no >>>>> different from any other resource management. >>>>> >>>>>> I think it would be safe if the freeing only occurred at safepoints, but I >>> don't >>>>> think this is the case. >>>>> >>>>> Assuming the iteration didn?t happen at safepoints (which is just a way to >>>>> make the iteration and >>>>> deletion not concurrent).? And I agree that isn?t the case with the current >>>>> code. >>>> From igor.ignatyev at oracle.com Thu Nov 7 23:15:24 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 7 Nov 2019 15:15:24 -0800 Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize classloader and module info Message-ID: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html > 71 lines changed: 50 ins; 14 del; 7 mod; Hi all, could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)? 
I wasn't able to make deserialize StackTraceElement::toString to return the same string representation as original ones b/c StackTraceElement::declaringClassObject won't be set, as a result, JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either and StackTraceElement::toString will have classloader names even for built-it loader (won't be in original b/c dropClassLoaderName() is true) and version of system modules (won't be in original b/c dropModuleVersion() is true); so I changed how TestTranslatedException compares original and decoded exceptions. webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00 JBS: https://bugs.openjdk.java.net/browse/JDK-8233745 testing: compiler/jvmci/ + graal tiers Thanks, -- Igor From vladimir.kozlov at oracle.com Thu Nov 7 23:33:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Nov 2019 15:33:21 -0800 Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize classloader and module info In-Reply-To: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> Message-ID: Good. Tom and Doug should look on this. Thanks, Vladimir On 11/7/19 3:15 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html >> 71 lines changed: 50 ins; 14 del; 7 mod; > > Hi all, > > could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)? 
> > I wasn't able to make deserialize StackTraceElement::toString to return the same string representation as original ones b/c StackTraceElement::declaringClassObject won't be set, as a result, JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either and StackTraceElement::toString will have classloader names even for built-it loader (won't be in original b/c dropClassLoaderName() is true) and version of system modules (won't be in original b/c dropModuleVersion() is true); so I changed how TestTranslatedException compares original and decoded exceptions. > > webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8233745 > testing: compiler/jvmci/ + graal tiers > > Thanks, > -- Igor > > From sandhya.viswanathan at intel.com Fri Nov 8 00:57:28 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 8 Nov 2019 00:57:28 +0000 Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: <5fac20bf-d5d6-3380-894e-0cb488fb0dca@oracle.com> References: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com> <5fac20bf-d5d6-3380-894e-0cb488fb0dca@oracle.com> Message-ID: Thanks a lot Bernard, for identifying these optimizations. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov Sent: Thursday, November 07, 2019 11:51 AM To: B. Blaser ; John Rose Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits I agree with you, Bernard. I think throughput performance is limited by memory accesses which is the same in both cases. But code reduction is very nice improvement. We can squeeze more code into CPU buffer which is very good for small loops. Please, send official RFR and to testing. Also would be nice to have a test which verifies result of these operations. Thanks, Vladimir On 11/7/19 11:30 AM, B. 
Blaser wrote: > Hi Vladimir, Sandhya and John, > > Thanks for your respective answers. > > The suggested fix focuses on x86_64 and pure 64-bit immediates which > means that all other cases are left unchanged as shown by the initial > benchmark, for example: > > andq &= ~MASK00; > orq |= MASK00; > > would still give: > > 03c andq [RSI + #16 (8-bit)], #-2 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.andq > 041 orq [RSI + #24 (8-bit)], #1 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq > 046 ... > > Now, the interesting point is that pure 64-bit immediates (which > cannot be treated as sign-extended 8/32-bit values) are assembled > using two instructions (not one) because AND/OR cannot be used > directly in such cases, for example: > > andq &= ~MASK63; > orq |= MASK63; > > gives: > > 03e movq R10, #9223372036854775807 # long > 048 andq [RSI + #16 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.andq > 04c movq R10, #-9223372036854775808 # long > 056 orq [RSI + #24 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq > 05a ... > > So, even though Sandhya mentioned a better throughput for AND/OR, the > additional MOV cost (I didn't find it in table C-17 but I assume > something close to MOVS/Z with latency=1/throughput=0.25) seems to be > in favor of a sole BTR/BTS instruction as shown by the initial > benchmark. > > However, as John suggested, I tried another benchmark which focuses on > the throughput to make sure there isn't any regression in such > situations: > > private long orq63, orq62, orq61, orq60; > > @Benchmark > public void throughput(Blackhole bh) { > for (int i=0; i orq63 = orq62 = orq61 = orq60 = 0; > bh.consume(testTp()); > } > } > > private long testTp() { > orq63 |= MASK63; > orq62 |= MASK62; > orq61 |= MASK61; > orq60 |= MASK60; > return 0L; > } > > Before, we had: > > 03e movq R10, #-9223372036854775808 # long > 048 orq [RSI + #32 (8-bit)], R10 # long ! 
Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq63 > 04c movq R10, #4611686018427387904 # long > 056 orq [RSI + #40 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq62 > 05a movq R10, #2305843009213693952 # long > 064 orq [RSI + #48 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq61 > 068 movq R10, #1152921504606846976 # long > 072 orq [RSI + #56 (8-bit)], R10 # long ! Field: > org/openjdk/bench/vm/compiler/BitSetAndReset.orq60 > > Benchmark Mode Cnt Score Error Units > BitSetAndReset.throughput avgt 9 25912.455 ± 2527.041 ns/op > > And after, we would have: > > 03c btsq [RSI + #32 (8-bit)], log2(#-9223372036854775808) > # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq63 > 042 btsq [RSI + #40 (8-bit)], log2(#4611686018427387904) # > long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq62 > 048 btsq [RSI + #48 (8-bit)], log2(#2305843009213693952) # > long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq61 > 04e btsq [RSI + #56 (8-bit)], log2(#1152921504606846976) # > long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq60 > > Benchmark Mode Cnt Score Error Units > BitSetAndReset.throughput avgt 9 25803.195 ± 2434.009 ns/op > > Fortunately, we still see a tiny performance gain along with the large > size reduction and register saving. > Should we go ahead with this optimization? If so, I'll post a RFR with > Vladimir's requested changes soon. > > Thanks, > Bernard > > On Thu, 7 Nov 2019 at 02:02, John Rose wrote: >> >> I recently saw LLVM compile a classification switch into a really >> tidy BTR instruction, something like this: >> >> switch (ch) { >> case ';': case '/': case '.': case '[': return 0; >> default: return 1; >> } >> => >> ... range check ... >> movabsq 0x200000002003, %rcx >> btq %rdi, %rcx >> >> It made me wish for this change, plus some more to switch itself. >> Given Sandhya's report, though, BTR may only be helpful in limited >> cases.
In the case above, it subsumes a shift instruction. >> >> Bernard's JMH experiment suggests something else is going on besides >> the throughput difference which Sandhya cites. Maybe it's a >> benchmark artifact, or maybe it's a good effect from smaller code. I >> suggest jamming more back-to-back BTRs together, to see if the throughput effect appears. >> >> -- John >> >> On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya wrote: >>> >>> Hi Vladimir/Bernard, >>> >>> >>> >>> I don't see any restrictions/limitations on these instructions other than the fact that the "long" operation is only supported on 64-bit format as usual so should be restricted to 64-bit JVM only. >>> >>> The code size improvement that Bernard demonstrates is significant for operation on longs. >>> >>> It looks like the throughput for AND/OR is better than BTR/BTS (0.25 vs 0.5) though. Please refer Table C-17 in the document below: >> From smita.kamath at intel.com Fri Nov 8 01:34:52 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Fri, 8 Nov 2019 01:34:52 +0000 Subject: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using AVX512 + VAES instructions In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com> Message-ID: <6563F381B547594081EF9DE181D07912B2D4495E@fmsmsx121.amr.corp.intel.com> Hi Vladimir, I have made the changes. Please find the updated webrev at: http://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.02/ Thanks and Regards, Smita Kamath -----Original Message----- From: Vladimir Kozlov Sent: Thursday, November 07, 2019 1:09 PM To: Kamath, Smita Cc: 'hotspot compiler' ; Shemy, Regev Subject: Re: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using AVX512 + VAES instructions Hi Smita, You don't need #ifdef _LP64 in stubGenerator_x86_64.cpp. This file is compiled only for 64-bit JVM. You also have trailing spaces - please remove them. Changes seem fine otherwise. I submit tier1 testing to make sure it builds.
Thanks, Vladimir On 11/7/19 12:12 PM, Kamath, Smita wrote: > Hi Vladimir, > > > As per Intel Architecture Instruction Set Reference [1] Vector AES (VAES) Operations will be supported in future Intel ISA. I would like to contribute an optimization for AES-CTR algorithm using AVX512+VAES instructions. This optimization is for x86_64 architecture that have AVX512-VAES enabled. I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. > > > I, smita.kamath at intel.com , Regev Shemy (regev.shemy at intel.com) and Shay Gueron, (shay.gueron at intel.com) are contributors to this code. > > Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8233741 > > Link to webrev: https://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.01 > > > [1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf (Pages 156 - 159) > > [2] https://software.intel.com/en-us/articles/intel-software-development-emulator > > > Regards, > Smita Kamath > From vladimir.kozlov at oracle.com Fri Nov 8 01:51:01 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Nov 2019 17:51:01 -0800 Subject: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using AVX512 + VAES instructions In-Reply-To: <6563F381B547594081EF9DE181D07912B2D4495E@fmsmsx121.amr.corp.intel.com> References: <6563F381B547594081EF9DE181D07912B2D4418D@fmsmsx121.amr.corp.intel.com> <6563F381B547594081EF9DE181D07912B2D4495E@fmsmsx121.amr.corp.intel.com> Message-ID: <522c1d75-c0ce-146f-9838-3445d6638a91@oracle.com> Looks good. I pushed 02 version since tests passed with same changes - I removed #ifdef _LP64 in stubGenerator_x86_64.cpp before testing. And I compared patches - in new one spaces were removed as asked. This should not affect tests results. thanks, Vladimir On 11/7/19 5:34 PM, Kamath, Smita wrote: > Hi Vladimir, > > I have made the changes. 
Please find the updated webrev at: > http://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.02/ > > Thanks and Regards, > Smita Kamath > > > -----Original Message----- > From: Vladimir Kozlov > Sent: Thursday, November 07, 2019 1:09 PM > To: Kamath, Smita > Cc: 'hotspot compiler' ; Shemy, Regev > Subject: Re: RFR(S) JDK-8233741: AES Countermode (AES-CTR) optimization using AVX512 + VAES instructions > > Hi Smita, > > You don't need #ifdef _LP64 in stubGenerator_x86_64.cpp. This file is compiled only for 64-bit JVM. > You also have trailing spaces - please remove them. > > Changes seem fine otherwise. I submit tier1 testing to make sure it builds. > > Thanks, > Vladimir > > On 11/7/19 12:12 PM, Kamath, Smita wrote: >> Hi Vladimir, >> >> >> As per Intel Architecture Instruction Set Reference [1] Vector AES (VAES) Operations will be supported in future Intel ISA. I would like to contribute an optimization for AES-CTR algorithm using AVX512+VAES instructions. This optimization is for x86_64 architecture that have AVX512-VAES enabled. I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. >> >> >> I, smita.kamath at intel.com , Regev Shemy (regev.shemy at intel.com) and Shay Gueron, (shay.gueron at intel.com) are contributors to this code. 
>> >> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8233741 >> >> Link to webrev: https://cr.openjdk.java.net/~srukmannagar/AESCTR/webrev.01 >> >> >> [1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf (Pages 156 - 159) >> >> [2] https://software.intel.com/en-us/articles/intel-software-development-emulator >> >> >> Regards, >> Smita Kamath >> From tobias.hartmann at oracle.com Fri Nov 8 07:56:10 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Nov 2019 08:56:10 +0100 Subject: [14] RFR(S): 8233788: Remove useless asserts in PhaseCFG::insert_anti_dependences In-Reply-To: References: Message-ID: <88db889c-4601-e654-ec03-973b1f381f98@oracle.com> Thanks Vladimir. Best regards, Tobias On 07.11.19 20:28, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 11/7/19 3:34 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following cleanup: >> https://bugs.openjdk.java.net/browse/JDK-8233788 >> http://cr.openjdk.java.net/~thartmann/8233788/webrev.00/ >> >> load_alias_idx can never be 0 and even if it could be, one of the asserts would fail because the >> opcode check handles only a single type. In fact, all these intrinsic nodes have adr_type() == NULL >> which maps to AliasIdxTop. >> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Fri Nov 8 08:38:06 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Nov 2019 09:38:06 +0100 Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs In-Reply-To: References: Message-ID: <575000ef-fc06-c33e-07e3-d8046db1ea43@oracle.com> Hi Erik, On 07.11.19 14:49, Erik Österlund wrote: > We have noticed problems with the one single place where we let C2 touch the graph while injecting > our load barriers. Right before tagging a load as needing a load barrier, it is GVN transformed.
The > problem with this is that if we emit a strong field load, the GVN transformation can see through > that load, by following it through the store that set its value, and find that the field value > really came from the load of a weak Reference.get intrinsic. In this scenario, the load we get out > from the GVN transformation needs weak barriers, yet we override it with strong barriers, as that > was the semantics of the access being parsed. Sigh. We already have code that tries to determine if > the load we got out from the GVN transformation looks like a load that was created in the > BarrierSetC2 factory function, so one way of solving this is to refine that logic that tries to > determine if this was the load we created before the transformation or not. Do we still need that logic with your change? > But I felt like a better > solution is to finish constructing the access with all the intended properties *before* transformation Yes, that's much better. > I massaged the code so that the GC barrier data of accesses with load barriers gets passed in to the > factory functions that create the access, right before the transformation. This way, we construct > the access with the intended semantics where it is being created (parser or macro expansion for > field accesses in clone intrinsics). Then we do not have to touch it after the GVN transformation. > > It does seem like there could be similar problems from other GCs, but in e.g. G1, the consequences > are weird suboptimal code instead of anything dangerous happening. For example, we can generate SATB > buffering code required by G1 Reference.get() intrinsics for strong accesses, due to GVN handing out > earlier accesses with different semantics. Perhaps that should be looked into separately as well. > But that investigation is outside of the scope of this bug fix. Could you please file an RFE for that? I also wonder if we could assert that the barrier data is set when GVN is performed? 
That way we would catch problems like the one you've described above early. > Webrev: > http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/ Looks good to me! Best regards, Tobias From tobias.hartmann at oracle.com Fri Nov 8 09:46:30 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Nov 2019 10:46:30 +0100 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8233656 http://cr.openjdk.java.net/~thartmann/8233656/webrev.00/ During IGVN, we process a CastII node that carries a non-zero dependency from GraphKit::cast_not_null [1]. ConstraintCastNode::dominating_cast then finds another CastII and checks if it's dominating. We assert in PhaseGVN::is_dominator_helper because the other CastII has a ProjNode as control input that has !is_CFG() because it's input is TOP [2]. The input has been replaced in the same round of IGVN and the projection is already on the IGVN worklist but hasn't been processed yet (it will go away). I propose to simply check the control inputs for is_CFG(). I can reproduce the issue with a complex Javafuzzer generated test (attached to the bug) but minimal changes/simplifications to the test cause the issue to not reproduce anymore because it depends on the order in which nodes are processed by IGVN. So I don't think it makes sense to include that fragile test. This has been triggered by my fix for 8229496 [3] which added additional Cast nodes but I believe it can also happen without these changes. 
Thanks, Tobias [1] https://hg.openjdk.java.net/jdk/jdk/rev/86b95fc6ca32#l12.40 [2] https://hg.openjdk.java.net/jdk/jdk/file/47c20fc6a517/src/hotspot/share/opto/multnode.cpp#l83 [3] https://bugs.openjdk.java.net/browse/JDK-8229496 From tobias.hartmann at oracle.com Fri Nov 8 09:54:41 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Nov 2019 10:54:41 +0100 Subject: RFR 8233389: Add PrintIdeal to compiler directives In-Reply-To: <975bbd8f-8021-8a5c-544b-123b2a2e08d7@oracle.com> References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com> <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com> <975bbd8f-8021-8a5c-544b-123b2a2e08d7@oracle.com> Message-ID: <19f1d1e3-74c5-7228-6f5d-47f616323d03@oracle.com> Hi Jorn, looks good to me too. I'll sponsor. Best regards, Tobias On 05.11.19 17:54, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.02/ > > Looks good. > >> W.r.t. usefulness of PrintIdeal vs PrintIdealGraph; The obvious thing is that PrintIdeal doesn't >> require IGV, which might be more useful if PrintIdeal were a diagnostic flag instead (as suggested >> by Nils), so it could be used from a standard JDK build which doesn't come with IGV. Another >> advantage that comes to mind is that PrintIdeal output is easier to share as text; you can just >> copy a few lines into the body of an email. That said, I haven't used either of these flags >> extensively, so I find it hard to judge whether one is clearly better than the other. But, it >> seems at least unfortunate that we have the PrintIdeal flag, but can not use it in compiler >> directives to filter the output. > > It's a long-standing issue with tracing functionality in C2: both options depend on ability to dump > Node and Type instances in textual form which is absent in product binaries. 
It would be nice to > bundle that code in product binaries as well and turn both options into diagnostic ones, but nobody > has taken care of it yet. > > Also, at some point, IGV had a text view of the graph (which was pretty close to PrintIdeal output), > but I can't find it there anymore. > > Best regards, > Vladimir Ivanov > >> On 04/11/2019 15:57, Vladimir Ivanov wrote: >>> Hi Jorn, >>> >>> src\hotspot\share\opto\compile.hpp: >>> +   bool                  _print_ideal;           // True if we should dump node IR for this >>> compilation >>> >>> Since the only usage is in non-product code, I suggest putting _print_ideal into #ifndef PRODUCT, >>> so you don't need to initialize it in product build. >>> >>> Also, it'll allow you to just put it on initializer list instead of doing it in the ctor body >>> (akin to how _trace_opto_output is handled): >>> >>> src\hotspot\share\opto\compile.cpp: >>> >>> Compile::Compile( ciEnv* ci_env, >>> ... >>>   : Phase(Compiler), >>> ... >>>     _has_reserved_stack_access(false), >>> #ifndef PRODUCT >>>     _trace_opto_output(directive->TraceOptoOutputOption), >>> #endif >>>     _has_method_handle_invokes(false), >>> >>> >>> Overall, I don't see much value in PrintIdeal: PrintIdealGraph provides much more detailed >>> information (even though in XML format) and IdealGraphVisualizer is better at browsing the graph. >>> The only thing I'm usually missing is full text dump output on individual nodes (they are shown >>> pruned in IGV; not sure whether it's IGV's fault or the info is missing in the dump). >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> On 01.11.2019 18:09, Jorn Vernee wrote: >>>> Hi, >>>> >>>> I'd like to add PrintIdeal as a compiler directive in order to enable PrintIdeal for only a >>>> single method when combining it with the 'match' directive.
>>>> >>>> Please review the following: >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389 >>>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/ >>>> (Testing = tier1, manual) >>>> >>>> As a heads-up; I'm not a committer on the jdk project, so if this sounds like a good idea, I >>>> would require a sponsor to push the changes. >>>> >>>> Thanks, >>>> Jorn >>>> From jorn.vernee at oracle.com Fri Nov 8 10:05:15 2019 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Fri, 8 Nov 2019 11:05:15 +0100 Subject: RFR 8233389: Add PrintIdeal to compiler directives In-Reply-To: <19f1d1e3-74c5-7228-6f5d-47f616323d03@oracle.com> References: <3579cd4e-cccc-d77b-7a7d-14f9483dfa80@oracle.com> <21ab250e-0564-69e4-62c0-07bb4dce9082@oracle.com> <6547a22a-47e2-822f-0772-c2b0a7599088@oracle.com> <975bbd8f-8021-8a5c-544b-123b2a2e08d7@oracle.com> <19f1d1e3-74c5-7228-6f5d-47f616323d03@oracle.com> Message-ID: <4b014e01-2233-77da-f3d3-7407f226833b@oracle.com> Thanks Tobias! Jorn On 08/11/2019 10:54, Tobias Hartmann wrote: > Hi Jorn, > > looks good to me too. I'll sponsor. > > Best regards, > Tobias > > On 05.11.19 17:54, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.02/ >> Looks good. >> >>> W.r.t. usefulness of PrintIdeal vs PrintIdealGraph; The obvious thing is that PrintIdeal doesn't >>> require IGV, which might be more useful if PrintIdeal were a diagnostic flag instead (as suggested >>> by Nils), so it could be used from a standard JDK build which doesn't come with IGV. Another >>> advantage that comes to mind is that PrintIdeal output is easier to share as text; you can just >>> copy a few lines into the body of an email. That said, I haven't used either of these flags >>> extensively, so I find it hard to judge whether one is clearly better than the other. But, it >>> seems at least unfortunate that we have the PrintIdeal flag, but can not use it in compiler >>> directives to filter the output. 
>> It's a long-standing issue with tracing functionality in C2: both options depend on ability to dump >> Node and Type instances in textual form which is absent in product binaries. It would be nice to >> bundle that code in product binaries as well and turn both options into diagnostic ones, but nobody >> has taken care of it yet. >> >> Also, at some point, IGV had a text view of the graph (which was pretty close to PrintIdeal output), >> but I can't find it there anymore. >> >> Best regards, >> Vladimir Ivanov >> >>> On 04/11/2019 15:57, Vladimir Ivanov wrote: >>>> Hi Jorn, >>>> >>>> src\hotspot\share\opto\compile.hpp: >>>> +   bool                  _print_ideal;           // True if we should dump node IR for this >>>> compilation >>>> >>>> Since the only usage is in non-product code, I suggest putting _print_ideal into #ifndef PRODUCT, >>>> so you don't need to initialize it in product build. >>>> >>>> Also, it'll allow you to just put it on initializer list instead of doing it in the ctor body >>>> (akin to how _trace_opto_output is handled): >>>> >>>> src\hotspot\share\opto\compile.cpp: >>>> >>>> Compile::Compile( ciEnv* ci_env, >>>> ... >>>>   : Phase(Compiler), >>>> ... >>>>     _has_reserved_stack_access(false), >>>> #ifndef PRODUCT >>>>     _trace_opto_output(directive->TraceOptoOutputOption), >>>> #endif >>>>     _has_method_handle_invokes(false), >>>> >>>> >>>> Overall, I don't see much value in PrintIdeal: PrintIdealGraph provides much more detailed >>>> information (even though in XML format) and IdealGraphVisualizer is better at browsing the graph. >>>> The only thing I'm usually missing is full text dump output on individual nodes (they are shown >>>> pruned in IGV; not sure whether it's IGV's fault or the info is missing in the dump).
>>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> On 01.11.2019 18:09, Jorn Vernee wrote: >>>>> Hi, >>>>> >>>>> I'd like to add PrintIdeal as a compiler directive in order to enable PrintIdeal for only a >>>>> single method when combining it with the 'match' directive. >>>>> >>>>> Please review the following: >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233389 >>>>> Webrev: http://cr.openjdk.java.net/~jvernee/print_ideal/webrev.00/ >>>>> (Testing = tier1, manual) >>>>> >>>>> As a heads-up; I'm not a committer on the jdk project, so if this sounds like a good idea, I >>>>> would require a sponsor to push the changes. >>>>> >>>>> Thanks, >>>>> Jorn >>>>> From doug.simon at oracle.com Fri Nov 8 11:18:40 2019 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 8 Nov 2019 12:18:40 +0100 Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize classloader and module info In-Reply-To: References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> Message-ID: <327E76EB-4257-449F-8648-0546203D8216@oracle.com> Hi Igor, To understand the bits lost in the translation as you describe below, can you please paste here or in the issue a before-and-after example of a translated exception that loses info in the translation. -Doug > On 8 Nov 2019, at 00:33, Vladimir Kozlov wrote: > > Good. > > Tom and Doug should look at this. > > Thanks, > Vladimir > > On 11/7/19 3:15 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html >>> 71 lines changed: 50 ins; 14 del; 7 mod; >> Hi all, >> could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK 9 (classloader name, module name and version fields)?
>> I wasn't able to make the deserialized StackTraceElement::toString return the same string representation as the original ones because StackTraceElement::declaringClassObject won't be set. As a result, the JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either, and StackTraceElement::toString will include classloader names even for built-in loaders (absent in the original because dropClassLoaderName() is true) and the version of system modules (absent in the original because dropModuleVersion() is true); so I changed how TestTranslatedException compares original and decoded exceptions. >> webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00 >> JBS: https://bugs.openjdk.java.net/browse/JDK-8233745 >> testing: compiler/jvmci/ + graal tiers >> Thanks, >> -- Igor From bsrbnd at gmail.com Fri Nov 8 11:40:49 2019 From: bsrbnd at gmail.com (B. Blaser) Date: Fri, 8 Nov 2019 12:40:49 +0100 Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com> Message-ID: Yes, I'll also consider adding patterns for non-constant masks which might help in collections like BitSet or EnumSet (see [1] & [2]) by subsuming the shift operation, provided that the experiments are successful. The range check of LLVM classification switch is somewhat different in the sense that it uses BT (bit test) which unfortunately sets uncommon flags (CF) instead of regular AND/TEST flags (ZF,...), which might require some additional work to replace the existing operations. However, since the current patch is almost ready to be pushed, I'll look at these questions in separate issues.
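For reference, the non-constant-mask shape mentioned here is what BitSet and RegularEnumSet compute at the Java level: a one-bit mask built with a variable shift and then combined with a long word. The following is only a minimal sketch; the class and method names are made up for illustration, the node shapes in the comments are approximate, and whether C2 would match these expressions with a single bit-set or bit-reset instruction is the open question, not something the code demonstrates.

```java
// Hypothetical illustration; not code from the JDK or from the patch under review.
final class BitMaskSketch {
    // Set bit n of word: roughly (OrL word (LShiftL 1 n)) in the ideal graph.
    static long setBit(long word, int n) {
        return word | (1L << n);   // for long, the shift count is taken modulo 64
    }

    // Clear bit n of word: AND with the complement of the one-bit mask.
    static long clearBit(long word, int n) {
        return word & ~(1L << n);
    }

    // Test bit n of word.
    static boolean testBit(long word, int n) {
        return (word & (1L << n)) != 0;
    }

    public static void main(String[] args) {
        long w = setBit(0L, 40);
        System.out.println(Long.toHexString(w));
        System.out.println(testBit(w, 40));
        System.out.println(Long.toHexString(clearBit(w, 40)));
    }
}
```

The shift count n is a runtime value in all three methods, which is what distinguishes them from the constant-mask patterns covered by the current patch.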
Thanks for these suggestions, Bernard [1] http://hg.openjdk.java.net/jdk/jdk/file/c709424ad48f/src/java.base/share/classes/java/util/BitSet.java#l452 [2] http://hg.openjdk.java.net/jdk/jdk/file/c709424ad48f/src/java.base/share/classes/java/util/RegularEnumSet.java#l165 On Thu, 7 Nov 2019 at 22:59, John Rose wrote: > > Would you consider adding patterns for non-constant masks also? > It would be something like (And (LShift n) x), etc. > It could be in this set or in a follow-on. > Thanks (says John who always wants more). From tobias.hartmann at oracle.com Fri Nov 8 12:52:04 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Nov 2019 13:52:04 +0100 Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error: assert(p_f->Opcode() == Op_IfFalse) failed Message-ID: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8233529 http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/ We have two loops (see TestRemoveMainPostLoops.java): Loop A with an inner loop followed by Loop B. (1) OSR compilation is triggered in loop B. (2) Pre-/main-/post loops are created for loop B. (3) Main and post loops of B are found empty and are removed. (4) Inner loop A is fully unrolled and removed. (5) Only main and post loops are created for A (no pre loop -> "PeelMainPost") and main is unrolled. (6) Pre loop of A is found empty, attempt to remove main and post loop then incorrectly selects main loop from A. The loop layout looks like this: Loop: N0/N0 has_sfpt Loop: N383/N718 limit_check sfpts={ 160 } Loop: N512/N517 counted [int,int),+1 (4 iters) pre has_sfpt <- belongs to A Loop: N760/N338 counted [1,100),+2 (102 iters) main has_sfpt <- belongs to B Loop: N713/N716 counted [int,101),+1 (4 iters) post has_sfpt <- belongs to B Please note that the order of the two loops is not like in the Java code because it's an OSR compilation that starts execution in the second loop.
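The source-level shape described above (loop A with an inner loop, followed by loop B) can be sketched in Java as follows. This is not the actual TestRemoveMainPostLoops.java: the class name, bounds, and loop bodies are invented for illustration, and running it does not by itself reproduce the assert, which depends on an OSR compilation of the second loop and on how C2 transforms both loops.

```java
// Hypothetical shape only; not the regression test attached to the change.
final class LoopShapeSketch {
    static int run() {
        int sum = 0;
        // Loop A: an outer loop with a small inner loop that could be fully unrolled.
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 4; j++) {
                sum += j;
            }
        }
        // Loop B: a second, long-running loop where an OSR compilation could start.
        for (int k = 0; k < 100_000; k++) {
            sum += 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```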
I've strengthened the asserts in locate_pre_from_main() and added a check for is_main_no_pre_loop() in the caller. The code has been introduced by JDK-8085832 [1]. Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8085832 From nils.eliasson at oracle.com Fri Nov 8 14:58:06 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 8 Nov 2019 15:58:06 +0100 Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs In-Reply-To: References: Message-ID: Hi Erik, Looks good! Regards, Nils On 2019-11-07 14:49, Erik Österlund wrote: > Hi, > > We have noticed problems with the one single place where we let C2 > touch the graph while injecting our load barriers. Right before > tagging a load as needing a load barrier, it is GVN transformed. The > problem with this is that if we emit a strong field load, the GVN > transformation can see through that load, by following it through the > store that set its value, and find that the field value really came > from the load of a weak Reference.get intrinsic. In this scenario, the > load we get out from the GVN transformation needs weak barriers, yet > we override it with strong barriers, as that was the semantics of the > access being parsed. Sigh. We already have code that tries to > determine if the load we got out from the GVN transformation looks > like a load that was created in the BarrierSetC2 factory function, so > one way of solving this is to refine that logic that tries to > determine if this was the load we created before the transformation or > not. But I felt like a better solution is to finish constructing the > access with all the intended properties *before* transformation. > > I massaged the code so that the GC barrier data of accesses with load > barriers gets passed in to the factory functions that create the > access, right before the transformation.
This way, we construct the > access with the intended semantics where it is being created (parser > or macro expansion for field accesses in clone intrinsics). Then we do > not have to touch it after the GVN transformation. > > It does seem like there could be similar problems from other GCs, but > in e.g. G1, the consequences are weird suboptimal code instead of > anything dangerous happening. For example, we can generate SATB > buffering code required by G1 Reference.get() intrinsics for strong > accesses, due to GVN handing out earlier accesses with different > semantics. Perhaps that should be looked into separately as well. But > that investigation is outside of the scope of this bug fix. > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8233506 > > Thanks, > /Erik From erik.osterlund at oracle.com Fri Nov 8 15:00:42 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Fri, 8 Nov 2019 16:00:42 +0100 Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs In-Reply-To: References: Message-ID: <9f458896-caea-cb93-6b64-f300cf116d0f@oracle.com> Hi Nils, Thanks! /Erik On 11/8/19 3:58 PM, Nils Eliasson wrote: > Hi Erik, > > Looks good! > > Regards, > > Nils > > On 2019-11-07 14:49, Erik Österlund wrote: >> Hi, >> >> We have noticed problems with the one single place where we let C2 >> touch the graph while injecting our load barriers. Right before >> tagging a load as needing a load barrier, it is GVN transformed. The >> problem with this is that if we emit a strong field load, the GVN >> transformation can see through that load, by following it through the >> store that set its value, and find that the field value really came >> from the load of a weak Reference.get intrinsic.
In this scenario, >> the load we get out from the GVN transformation needs weak barriers, >> yet we override it with strong barriers, as that was the semantics of >> the access being parsed. Sigh. We already have code that tries to >> determine if the load we got out from the GVN transformation looks >> like a load that was created in the BarrierSetC2 factory function, so >> one way of solving this is to refine that logic that tries to >> determine if this was the load we created before the transformation >> or not. But I felt like a better solution is to finish constructing >> the access with all the intended properties *before* transformation. >> >> I massaged the code so that the GC barrier data of accesses with load >> barriers gets passed in to the factory functions that create the >> access, right before the transformation. This way, we construct the >> access with the intended semantics where it is being created (parser >> or macro expansion for field accesses in clone intrinsics). Then we >> do not have to touch it after the GVN transformation. >> >> It does seem like there could be similar problems from other GCs, but >> in e.g. G1, the consequences are weird suboptimal code instead of >> anything dangerous happening. For example, we can generate SATB >> buffering code required by G1 Reference.get() intrinsics for strong >> accesses, due to GVN handing out earlier accesses with different >> semantics. Perhaps that should be looked into separately as well. But >> that investigation is outside of the scope of this bug fix. 
>> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/ >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8233506 >> >> Thanks, >> /Erik From erik.osterlund at oracle.com Fri Nov 8 15:15:48 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Fri, 8 Nov 2019 16:15:48 +0100 Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs In-Reply-To: <575000ef-fc06-c33e-07e3-d8046db1ea43@oracle.com> References: <575000ef-fc06-c33e-07e3-d8046db1ea43@oracle.com> Message-ID: <1e3fd82d-2235-e5c6-3e41-4d5b5242ced4@oracle.com> Hi Tobias, On 11/8/19 9:38 AM, Tobias Hartmann wrote: > Hi Erik, > > On 07.11.19 14:49, Erik Österlund wrote: >> We have noticed problems with the one single place where we let C2 touch the graph while injecting >> our load barriers. Right before tagging a load as needing a load barrier, it is GVN transformed. The >> problem with this is that if we emit a strong field load, the GVN transformation can see through >> that load, by following it through the store that set its value, and find that the field value >> really came from the load of a weak Reference.get intrinsic. In this scenario, the load we get out >> from the GVN transformation needs weak barriers, yet we override it with strong barriers, as that >> was the semantics of the access being parsed. Sigh. We already have code that tries to determine if >> the load we got out from the GVN transformation looks like a load that was created in the >> BarrierSetC2 factory function, so one way of solving this is to refine that logic that tries to >> determine if this was the load we created before the transformation or not. > Do we still need that logic with your change? Nope! :) Because the access nodes have their barrier data populated before transformation, describing the semantics of the produced access, we don't care what GVN gives us after transformation.
Whatever it gives us has the correct semantics for the corresponding access that produced it. >> But I felt like a better >> solution is to finish constructing the access with all the intended properties *before* transformation > Yes, that's much better. > >> I massaged the code so that the GC barrier data of accesses with load barriers gets passed in to the >> factory functions that create the access, right before the transformation. This way, we construct >> the access with the intended semantics where it is being created (parser or macro expansion for >> field accesses in clone intrinsics). Then we do not have to touch it after the GVN transformation. >> >> It does seem like there could be similar problems from other GCs, but in e.g. G1, the consequences >> are weird suboptimal code instead of anything dangerous happening. For example, we can generate SATB >> buffering code required by G1 Reference.get() intrinsics for strong accesses, due to GVN handing out >> earlier accesses with different semantics. Perhaps that should be looked into separately as well. >> But that investigation is outside of the scope of this bug fix. > Could you please file an RFE for that? I also wonder if we could assert that the barrier data is set > when GVN is performed? That way we would catch problems like the one you've described above early. Sure, will file another RFE for that. Regarding the assert, it's not obvious what it would look like, since the assert in the transformation code has to know exactly what nodes are expected to have GC data. For example, a LoadP might be used to read any pointer, including but not limited to oops. And some oops don't need barriers like the threadOop due to being processed in safepoints. So given a LoadP node for example, I don't know if we can determine whether it should or should not have GC data. >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/ > Looks good to me! Thanks Tobias! 
/Erik > Best regards, > Tobias From goetz.lindenmaier at sap.com Fri Nov 8 15:32:48 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 8 Nov 2019 15:32:48 +0000 Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. In-Reply-To: <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com> References: <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com> <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com> Message-ID: Hi, I waited for https://bugs.openjdk.java.net/browse/JDK-8233081 which makes one of the fixes unnecessary. Also, I had to fix the argument of verify_oop_helper from oop to oopDesc* for the fastdebug build. New webrev: http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/ Best regards, Goetz. > -----Original Message----- > From: David Holmes > Sent: Friday, 18 October 2019 01:38 > To: Lindenmaier, Goetz; hotspot-runtime-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net > Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > > On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote: > > Hi David, > > > > you are right, thanks for pointing me to that! > > Doing one test for vm.bits=64 and one for 32 should fix it: > > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ > > s/01/02/ :) > > For the 32-bit case you can delete the line: > > * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9") > > For the 64-bit case you can delete the "sparc" check from the same line. > > Thanks, > David > > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Thursday, 17 October 2019 13:18 > >> To: Lindenmaier, Goetz; hotspot-runtime-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >> > >> Hi Goetz, > >> > >> UseCompressedOops is a 64-bit flag only so your change will break the > >> test on 32-bit systems. > >> > >> David > >> > >> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote: > >>> Hi, > >>> > >>> 8231058 introduced a test that enables +VerifyOops. > >>> This fails on ppc, because this was not used in a very > >>> long time. > >>> > >>> The crash is caused by passing compressed oops from > >>> LIR_Assembler::store() to the checker routine. > >>> I fix this by implementing a checker routine verify_coop > >>> that first decompresses the coop. This makes the new > >>> test pass. > >>> > >>> Further testing showed that the additional checker > >>> coding makes Patching Stubs overflow. These > >>> can not be increased in size to fit the code. I > >>> disable generating verify_oop code in LIRAssembler::load() > >>> which fixes the issue. > >>> > >>> Further I extended the message printed when verification > >>> of an oop failed. First, I print the location in the source > >>> code where the checker code was generated. Second, > >>> I print the faulty oop. > >>> > >>> I also improved the message printed when PatchingStubs > >>> overflow. > >>> > >>> Finally, I improve the test to run with and without compressed > >>> Oops. > >>> > >>> Please review: > >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ > >>> > >>> @runtime as I modify the test introduced there > >>> @compiler as the error is in C1. > >>> > >>> Best regards, > >>> Goetz. > >>> From vladimir.kozlov at oracle.com Fri Nov 8 16:50:50 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Nov 2019 08:50:50 -0800 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: References: Message-ID: Good. 
thanks, Vladimir On 11/8/19 1:46 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8233656 > http://cr.openjdk.java.net/~thartmann/8233656/webrev.00/ > > During IGVN, we process a CastII node that carries a non-zero dependency from > GraphKit::cast_not_null [1]. ConstraintCastNode::dominating_cast then finds another CastII and > checks if it's dominating. We assert in PhaseGVN::is_dominator_helper because the other CastII has a > ProjNode as control input that has !is_CFG() because its input is TOP [2]. The input has been > replaced in the same round of IGVN and the projection is already on the IGVN worklist but hasn't > been processed yet (it will go away). > > I propose to simply check the control inputs for is_CFG(). > > I can reproduce the issue with a complex Javafuzzer-generated test (attached to the bug) but minimal > changes/simplifications to the test cause the issue to not reproduce anymore because it depends on > the order in which nodes are processed by IGVN. So I don't think it makes sense to include that > fragile test. > > This has been triggered by my fix for 8229496 [3] which added additional Cast nodes but I believe it > can also happen without these changes. > > Thanks, > Tobias > > [1] https://hg.openjdk.java.net/jdk/jdk/rev/86b95fc6ca32#l12.40 > [2] https://hg.openjdk.java.net/jdk/jdk/file/47c20fc6a517/src/hotspot/share/opto/multnode.cpp#l83 > [3] https://bugs.openjdk.java.net/browse/JDK-8229496 > From vladimir.kozlov at oracle.com Fri Nov 8 17:00:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Nov 2019 09:00:42 -0800 Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error: assert(p_f->Opcode() == Op_IfFalse) failed In-Reply-To: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com> References: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com> Message-ID: <1de496aa-f57b-1ce1-5fa2-5d3488217887@oracle.com> Looks good.
How much time does Graal take to run the test (you switched off TieredCompilation)? Vladimir On 11/8/19 4:52 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8233529 > http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/ > > We have two loops (see TestRemoveMainPostLoops.java): Loop A with an inner loop followed by Loop B. > > (1) OSR compilation is triggered in loop B. > (2) Pre-/main-/post loops are created for loop B. > (3) Main and post loops of B are found empty and are removed. > (4) Inner loop A is fully unrolled and removed. > (5) Only main and post loops are created for A (no pre loop -> "PeelMainPost") and main is unrolled. > (6) Pre loop of A is found empty, attempt to remove main and post loop then incorrectly selects main > loop from A. > > The loop layout looks like this: > Loop: N0/N0 has_sfpt > Loop: N383/N718 limit_check sfpts={ 160 } > Loop: N512/N517 counted [int,int),+1 (4 iters) pre has_sfpt <- belongs to A > Loop: N760/N338 counted [1,100),+2 (102 iters) main has_sfpt <- belongs to B > Loop: N713/N716 counted [int,101),+1 (4 iters) post has_sfpt <- belongs to B > > Please note that the order of the two loops is not like in the Java code because it's an OSR > compilation that starts execution in the second loop. > > I've strengthened the asserts in locate_pre_from_main() and added a check for is_main_no_pre_loop() > in the caller. > > The code has been introduced by JDK-8085832 [1]. > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8085832 > From vladimir.kozlov at oracle.com Fri Nov 8 17:16:01 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Nov 2019 09:16:01 -0800 Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR compilation In-Reply-To: References: Message-ID: Looks good. It looks like you now know our vector optimizations ;) Please file an RFE, as you suggested, to clean this up later.
Thanks, Vladimir On 11/6/19 2:13 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8229694 > http://cr.openjdk.java.net/~chagedorn/8229694/webrev.00/ > > The JVM crashes in the testcase when trying to dereference mem at [1] which is NULL. This happens > when converting packs into vector nodes in SuperWord::output() by reading an unexpected NULL value > with SuperWord::align_to_ref() [2]. The corresponding field _align_to_ref is set to NULL at [3] > since best_align_to_mem_ref was assigned NULL just before at [4]. _packset only contained one pack > and there were no memory operations left to be processed (memops is empty). As a result > SuperWord::find_align_to_ref() will return NULL since it needs at least two operations to find an > alignment. > > The fix is straight forward to directly use the alignment of the only pack remaining if there are no > memory operations left in memops to find another alignment. > > > The testcase creates such a situation where only one pack remains at [4] when the loop is unrolled > four times. When calling SuperWord::find_adjacent_refs() there are: > - 4 StoreI for intArr[j-1] = 400 > - 4 StoreC for shortArr[j] = 30 > - 2 StoreI for intArr[7] = 260 // Initially 4 but 2 are removed by IGVN in Ideal() > - 2 StoreC for shortArr[10] = 10 // Initially 4 but 2 are removed by IGVN in Ideal() > - 2 LoadI (and 2 StoreI) for iFld = intArr[j] // Initially 4 each but 2 of each are removed by IGVN > in Ideal() > > The field stores are obviously ignored for the superword algorithm. intArr[j-1] aligns with > intArr[7] and therefore create_pack is true. The only pack created is one with two immediately > following stores for intArr[j-1]. The IGVN algorithm is not able to remove the first redundant store > to intArr[7] when the loop is unrolled the first time. Only when unrolling it again the second time, > it is able to remove the two newly created redundant stores to intArr[7]. 
This leaves us with the > following dependencies of stores: "intArr[j-1] -> intArr[j-1] -> intArr[7] -> intArr[j-1] -> > intArr[7] -> intArr[j-1]" from which only the first two operations can be used to create a pack. > > The very same applies to the StoreC nodes. As a result, one pack for StoreI and one for StoreC are > created in total. There are now only the two LoadI nodes of intArr[j] left which are not aligned > with intArr[j-1]. Therefore, all StoreI packs are removed at [5]. This leaves us with exactly one > StoreC pack and an empty memops list which sets the alignment to NULL and eventually lets the JVM > crash at [1]. > > We might want to file an RFE to investigate further why IGVN cannot remove the first redundant > stores to intArr[7], shortArr[10], and iFld, respectively (even though it's quite useless to keep > setting the same values in a loop). This problem can also be observed if the loop only contains the > statement "iFld = intArr[j]". But I think even if those redundant stores had been optimized > away, we should have this fix to handle the situation with only one pack and no memory operations > remaining. > > > Thank you!
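Pieced together from the stores listed in this message, the loop body of the test case would look roughly like the Java sketch below. The class name, array sizes, element type of shortArr, and loop bounds are assumptions and are not taken from the actual test; the sketch only shows the access pattern and does not by itself trigger the OSR compilation or the crash.

```java
// Hypothetical reconstruction of the access pattern; not the test from the webrev.
final class SuperWordShapeSketch {
    static int iFld;

    static void loop(int[] intArr, short[] shortArr) {
        for (int j = 1; j < intArr.length; j++) {
            intArr[j - 1] = 400;   // StoreI, index depends on j
            shortArr[j] = 30;      // StoreC (assuming a short[]), index depends on j
            intArr[7] = 260;       // StoreI to a loop-invariant index
            shortArr[10] = 10;     // StoreC to a loop-invariant index
            iFld = intArr[j];      // LoadI from the array, then a field store
        }
    }

    public static void main(String[] args) {
        int[] intArr = new int[20];
        short[] shortArr = new short[20];
        loop(intArr, shortArr);
        System.out.println(intArr[7] + " " + shortArr[10] + " " + iFld);
    }
}
```

The stores to intArr[7] and shortArr[10] sit between the j-dependent stores in every unrolled copy of the body, which matches the interleaved dependency chain quoted in the message.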
> > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l3608 > [2] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l2328 > [3] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l732 > [4] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l708 > [5] http://hg.openjdk.java.net/jdk/jdk/file/f61eea1869e4/src/hotspot/share/opto/superword.cpp#l688 From igor.ignatyev at oracle.com Fri Nov 8 19:33:32 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 8 Nov 2019 11:33:32 -0800 Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize classloader and module info In-Reply-To: <327E76EB-4257-449F-8648-0546203D8216@oracle.com> References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> <327E76EB-4257-449F-8648-0546203D8216@oracle.com> Message-ID: <5716B217-5CAE-4512-A063-CC416336B064@oracle.com> Hi Doug, I've added the difference in string representations to the bug report; for convenience, here is part of the diff: > < at app//jdk.vm.ci.hotspot.test.TestTranslatedException.encodeDecodeTest(TestTranslatedException.java:73) > < at java.base@14-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) <...> > --- > > at jdk.vm.ci.hotspot.test.TestTranslatedException.encodeDecodeTest(TestTranslatedException.java:73) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) As you can see, there is 'app/' for classes loaded by the application loader (and another '/' since the tests are in the unnamed module) and '@14-internal' for classes from system modules. 
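The two prefixes in the diff above can be reproduced with the seven-argument `StackTraceElement` constructor that JDK 9 added. What follows is a minimal sketch, not code from the webrev: the class name `StackTraceElementDemo` and the helper `rebuild` are made up for illustration, and it assumes (as described in the review request) that an element built from plain strings has no declaring class object attached, so `toString()` keeps the loader name and module version.

```java
// Sketch only: StackTraceElementDemo and rebuild(...) are illustrative names,
// not code from the webrev under review.
class StackTraceElementDemo {
    // Rebuilds a frame from plain strings, the way a decoder has to.
    // No declaring Class object is attached, so toString() prints the
    // class loader name and the module version whenever they are non-empty.
    static StackTraceElement rebuild(String loader, String module, String version,
                                     String cls, String method, String file, int line) {
        return new StackTraceElement(loader, module, version, cls, method, file, line);
    }

    public static void main(String[] args) {
        // Class in the unnamed module, loaded by the "app" loader:
        // prints the "app//" prefix seen in the diff.
        System.out.println(rebuild("app", null, null,
                "jdk.vm.ci.hotspot.test.TestTranslatedException",
                "encodeDecodeTest", "TestTranslatedException.java", 73));
        // Native method in a system module; a line number of -2 marks a
        // native frame. Prints the "java.base@14-internal/" prefix.
        System.out.println(rebuild(null, "java.base", "14-internal",
                "jdk.internal.reflect.NativeMethodAccessorImpl",
                "invoke0", "NativeMethodAccessorImpl.java", -2));
    }
}
```

A stack trace captured by the VM would print the same frames without `app/` and without `@14-internal`, which is why a plain string comparison of original and decoded traces cannot succeed.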
Thanks, -- Igor > On Nov 8, 2019, at 3:18 AM, Doug Simon wrote: > > Hi Igor, > > To understand the bits lost in the translation as you describe below, can you please paste here or in the issue an example of before and after of a translated exception that loses info in the translation. > > -Doug > >> On 8 Nov 2019, at 00:33, Vladimir Kozlov wrote: >> >> Good. >> >> Tom and Doug should look at this. >> >> Thanks, >> Vladimir >> >> On 11/7/19 3:15 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html >>>> 71 lines changed: 50 ins; 14 del; 7 mod; >>> Hi all, >>> could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)? >>> I wasn't able to make the deserialized StackTraceElement::toString return the same string representation as the original ones b/c StackTraceElement::declaringClassObject won't be set; as a result, JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either and StackTraceElement::toString will have classloader names even for the built-in loader (won't be in original b/c dropClassLoaderName() is true) and version of system modules (won't be in original b/c dropModuleVersion() is true); so I changed how TestTranslatedException compares original and decoded exceptions. 
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00 >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233745 >>> testing: compiler/jvmci/ + graal tiers >>> Thanks, >>> -- Igor > From doug.simon at oracle.com Fri Nov 8 22:38:58 2019 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 8 Nov 2019 23:38:58 +0100 Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize classloader and module info In-Reply-To: <5716B217-5CAE-4512-A063-CC416336B064@oracle.com> References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> <327E76EB-4257-449F-8648-0546203D8216@oracle.com> <5716B217-5CAE-4512-A063-CC416336B064@oracle.com> Message-ID: <416DB2FA-EEC4-4AB4-8265-C166F68EBEB9@oracle.com> Ok, I think the translated exceptions still convert the most important information. Looks good to me. -Doug > On 8 Nov 2019, at 20:33, Igor Ignatyev wrote: > > Hi Doug, > > I've added the difference in string representations to the bug report; for convenience, here is part of the diff: >> < at app//jdk.vm.ci.hotspot.test.TestTranslatedException.encodeDecodeTest(TestTranslatedException.java:73) >> < at java.base@14-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > <...> >> --- >>> at jdk.vm.ci.hotspot.test.TestTranslatedException.encodeDecodeTest(TestTranslatedException.java:73) >>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > As you can see, there is 'app/' for classes loaded by the application loader (and another '/' since the tests are in the unnamed module) and '@14-internal' for classes from system modules. > > Thanks, > -- Igor > >> On Nov 8, 2019, at 3:18 AM, Doug Simon wrote: >> >> Hi Igor, >> >> To understand the bits lost in the translation as you describe below, can you please paste here or in the issue an example of before and after of a translated exception that loses info in the translation. >> >> -Doug >> >>> On 8 Nov 2019, at 00:33, Vladimir Kozlov wrote: >>> >>> Good. 
>>> >>> Tom and Doug should look at this. >>> >>> Thanks, >>> Vladimir >>> >>> On 11/7/19 3:15 PM, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html >>>>> 71 lines changed: 50 ins; 14 del; 7 mod; >>>> Hi all, >>>> could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)? >>>> I wasn't able to make the deserialized StackTraceElement::toString return the same string representation as the original ones b/c StackTraceElement::declaringClassObject won't be set; as a result, JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either and StackTraceElement::toString will have classloader names even for the built-in loader (won't be in original b/c dropClassLoaderName() is true) and version of system modules (won't be in original b/c dropModuleVersion() is true); so I changed how TestTranslatedException compares original and decoded exceptions. >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00 >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233745 >>>> testing: compiler/jvmci/ + graal tiers >>>> Thanks, >>>> -- Igor >> > From per.liden at oracle.com Fri Nov 8 22:45:41 2019 From: per.liden at oracle.com (Per Liden) Date: Fri, 8 Nov 2019 23:45:41 +0100 Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs In-Reply-To: References: Message-ID: <3d08b7f4-68a4-5e69-1e39-cf3725adc8da@oracle.com> On 11/7/19 2:49 PM, Erik Österlund wrote: > Hi, > > We have noticed problems with the one single place where we let C2 touch > the graph while injecting our load barriers. Right before tagging a load > as needing a load barrier, it is GVN transformed. 
The problem with this > is that if we emit a strong field load, the GVN transformation can see > through that load, by following it through the store that set its value, > and find that the field value really came from the load of a weak > Reference.get intrinsic. In this scenario, the load we get out from the > GVN transformation needs weak barriers, yet we override it with strong > barriers, as that was the semantics of the access being parsed. Sigh. We > already have code that tries to determine if the load we got out from > the GVN transformation looks like a load that was created in the > BarrierSetC2 factory function, so one way of solving this is to refine > that logic that tries to determine if this was the load we created > before the transformation or not. But I felt like a better solution is > to finish constructing the access with all the intended properties > *before* transformation. > > I massaged the code so that the GC barrier data of accesses with load > barriers gets passed in to the factory functions that create the access, > right before the transformation. This way, we construct the access with > the intended semantics where it is being created (parser or macro > expansion for field accesses in clone intrinsics). Then we do not have > to touch it after the GVN transformation. > > It does seem like there could be similar problems from other GCs, but in > e.g. G1, the consequences are weird suboptimal code instead of anything > dangerous happening. For example, we can generate SATB buffering code > required by G1 Reference.get() intrinsics for strong accesses, due to > GVN handing out earlier accesses with different semantics. Perhaps that > should be looked into separately as well. But that investigation is > outside of the scope of this bug fix. > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/ Looks good! 
As discussed off-line, even if it's not a problem for ZGC, we should fix it so that we never call access.set_raw_access() *after* GVN transformation. But let's do that as a separate fix. /Per > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8233506 > > Thanks, > /Erik From john.r.rose at oracle.com Fri Nov 8 23:00:52 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 8 Nov 2019 15:00:52 -0800 Subject: JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <58CC373D-E70D-4503-8578-6110F3FC04EE@oracle.com> Message-ID: <74901E7C-D45E-47FF-A61B-913F91CC7EEC@oracle.com> On Nov 8, 2019, at 3:40 AM, B. Blaser wrote: > > Yes, I'll also consider adding patterns for non-constant masks which > might help in collections like BitSet or EnumSet (see [1] & [2]) by > subsuming the shift operation provided that we do successful > experiments. Yes. There's lots of code like that out there. If you make a micro of enum performance, please be aware of this missing bit: https://bugs.openjdk.java.net/browse/JDK-8161245 I added you as a watcher. The problem is that, although an enum *object* often constant folds, its *ordinal field* fails to constant fold (last time I looked). The above is a point fix, but we also need a more comprehensive fix. > The range check of LLVM classification switch is somewhat different in > the sense that it uses BT (bit test) which unfortunately sets uncommon > flags (CF) instead of regular AND/TEST flags (ZF,...) which might > require some additional work to replace the existing operations? That may be. Flag handling is a tricky part of C2. I think there are pre-existing instructions that work with CF which can serve as examples. FWIW I noticed this as a possibly relevant change to LLVM: https://reviews.llvm.org/D48606 > However, the current patch being almost ready to be pushed, I'll look > at these questions in separate issues. That makes perfect sense; don't delay what you have. 
If you get stalled on the follow-up work, please do post your learnings on JBS (like JDK-8214239). > Thanks for these suggestions, > Bernard Thank you for taking this on! -- John > > [1] http://hg.openjdk.java.net/jdk/jdk/file/c709424ad48f/src/java.base/share/classes/java/util/BitSet.java#l452 > [2] http://hg.openjdk.java.net/jdk/jdk/file/c709424ad48f/src/java.base/share/classes/java/util/RegularEnumSet.java#l165 > > On Thu, 7 Nov 2019 at 22:59, John Rose wrote: >> >> Would you consider adding patterns for non-constant masks also? >> It would be something like (And (LShift n) x), etc. >> It could be in this set or in a follow-on. >> Thanks (says John who always wants more).

From john.r.rose at oracle.com Sat Nov 9 01:24:10 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 8 Nov 2019 17:24:10 -0800 Subject: final field values should be trusted as constant (filed as JDK-8233873) Message-ID:

https://bugs.openjdk.java.net/browse/JDK-8233873

# Problem

The JVM JITs routinely optimize references to final fields as constant values, when a JIT can deduce a constant containing object. This is a fundamental capability for producing good code. Currently, though, only a small number of "white listed" fields are treated in this way, since vigorously optimizing _all_ final fields is thought to have unknown risky consequences. The white listing logic is defined using the function `trust_final_non_static_fields` and similar logic setting ` as part of changes like JDK-6912065 and JDK-8140483.

# Proposal

The JVM should support an option `FoldConstantFields` which bypasses the above "white list" and uses a "black list" instead as needed. Initially this option should be turned off by default. Turning it on should, initially, also turn on a new option `VerifyConstantFields` which detects updates to final fields and diagnoses them with some selectable mix of warnings or errors. (See below for discussion of how updates to final fields can occur.
The short summary is "reflection, JNI, or Unsafe". Each of these requires a different remediation.)

This feature will not solve the problem of full optimization of constant fields all at once, but will set the stage for finding and fixing problems caused by such optimizations.

The support for `FoldConstantFields` should include (either initially or as follow-on work) the following functions:

- Dependency recording in the JIT, whenever a final field value is used. At first this should be recorded per field declaration, not per individual field instance, on the assumption that invalidation will be very rare. This assumption may need to be revised.
- Updates to final fields via reflection must be trapped and must trigger deoptimization of dependent JIT.
- Updates to final fields via JNI must be trapped similarly.
- Updates to final fields via other users of `Unsafe` must be trapped similarly. This addresses uses of `Unsafe` _that the JDK knows about and controls_.
- Encourage other users of `Unsafe` to perform similar notifications, and document how to do so. Perhaps there are additional `Unsafe` API points to notify the JIT.
- Placing the checking logic inside `Unsafe` is the wrong answer in most cases, since it would penalize well-behaved users of `Unsafe`. Perhaps a separate flag `VerifyUnsafeUpdates` would be applicable, for stress tests where performance can be sacrificed.
- Define an API for use by privileged frameworks (including those in the JDK) for creating objects in a "larval" state, apart from normal constructor invocation. (Possibly `Unsafe.allocateInstance` is such an API point; see also JNI AllocObject.) These are released from the constraints on final field writing, including JIT invalidation. If a JIT encounters an object in the larval state, the JIT will simply refrain from constant-folding its fields.
- Define an API for promoting larval objects to a normal "adult" state, at which point the normal JIT optimizations would apply.
  If this isn't done, performance will be lost only regarding the larval objects created by old frameworks, so perhaps this isn't needed.
- It seems likely that the larval and adult states would need to be reflected in a bit pattern in the object header. As an optimization, normally constructed objects would probably not need to have this state change in their header bits, unless perhaps they "escape" during their constructor call.

# Discussion

A final field can in some cases be assigned a new value. If a JIT has already observed the previous value of that final field, and incorporated it into object code as a constant, then (after the assignment of a new value to that field), the optimized object code will execute wrongly. We call such wrongly executing code "invalid", and the JVM takes great care to avoid executing invalid code in similar cases involving speculative optimizations, such as devirtualized method calls or uncommon traps.

The basic reason for this is that the Java Memory Model requires that all fields (including changed final fields) must be read accurately. An accurate read yields a value that is appropriate to the current thread, as defined by a web of "happens-before" relations. (It is not entirely wrong to think of these relations as a linear set, although concurrency and races are also part of the JMM.)

But final fields _must_ be changed when an object is initialized, and _may rarely_ change in other circumstances. There are a number of ways to change the current value of a final field:

0. In a constructor, a final field may be changed from its current value (typically initial default value) to a new (possibly non-default) value. The JVM (per specification) allows this to occur _multiple times_ although most sources of bytecode are thought to avoid such behavior.
1. When a field is reflected, and `setAccessible(true)` is called, the value may be set. This "hook" is intended for use by deserializers and other low-level facilities.
   It is thought to be used as a simulation of case #0 above, when an object's constructor cannot be conveniently invoked. In a real sense, holding this option open for serialization frameworks harms the optimization of the entire ecosystem.
2. JNI functions such as SetBooleanField can be used to smash new values into fields even if they are final.
3. Good old `Unsafe.putInt` can also be used to smash new values into fields (or parts of fields or groups of fields) even if they are final.

Although a debugger can forcibly change the value of a field from outside the JVM, via APIs in the `jdk.jdi` module, it appears to be impossible to use those APIs to change final fields.

It is unknown what libraries or bytecode spinners "in the wild" are using any of the four options above in ways that would invalidate JIT-compiled code. Setting the JITs free to optimize fully requires a plan for mitigating the impact of final field changes both in known code (in the JDK) and in unknown "wild" code.

## Side note on races

Although race conditions (on non-volatile fields) allow the JVM some latitude to return "stale" values for field references, such latitude would usually be quite narrow, since an execution of the invalid optimized method is likely to occur downstream of the invalidating field update (as determined by the happens-before relation of the JMM).

The JMM itself would have to be updated to either relax happens-before relations pertaining to final field updates, or else allow special race conditions that allow the JIT to use stale values of final fields (in effect, looking backward in time, past events visible through the relevant happens-before events). There are no active proposals to update the JMM in this way, and it seems easier to take the JMM as a given, or (at most) make very small changes to it to further specialize the treatment of final fields.
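The reflective route (case 1 in the list of ways to change a final field) can be sketched in a few lines. This is an illustration only: `FinalFieldUpdateDemo`, `Point` and `overwriteViaReflection` are hypothetical names, and the sketch assumes a JDK that still permits reflective writes to final instance fields of ordinary classes (records and hidden classes are excluded on newer releases). The value is read back reflectively, so the sketch says nothing about what already-compiled code would observe.

```java
import java.lang.reflect.Field;

// Sketch only: all names here are made up for illustration.
class FinalFieldUpdateDemo {
    static final class Point {
        final int x;
        Point(int x) { this.x = x; }   // case 0: the write inside the constructor
    }

    // Case 1: overwrite a final instance field through reflection and
    // return the value the field holds afterwards (read reflectively).
    static int overwriteViaReflection(Point p, int newValue) {
        try {
            Field f = Point.class.getDeclaredField("x");
            f.setAccessible(true);     // lifts the "final" write restriction
            f.setInt(p, newValue);
            return f.getInt(p);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective update rejected", e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1);
        System.out.println(overwriteViaReflection(p, 2));
    }
}
```

A JIT that had already folded `p.x` to the constant 1 for a known `Point` instance would need to be told about this write, which is the trapping and deoptimization work the proposal lists for reflection.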
From vladimir.x.ivanov at oracle.com Sat Nov 9 12:38:10 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 9 Nov 2019 15:38:10 +0300 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: References: Message-ID: > http://cr.openjdk.java.net/~thartmann/8233656/webrev.00/ > I propose to simply check the control inputs for is_CFG(). Can you just check control for TOP instead? Also, is it worth putting an assert to ensure the node is already on worklist and will be eventually eliminated? Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Sat Nov 9 12:40:52 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 9 Nov 2019 15:40:52 +0300 Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error: assert(p_f->Opcode() == Op_IfFalse) failed In-Reply-To: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com> References: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com> Message-ID: <3d3aec9d-5cfb-4664-cd67-b2f8b84f319f@oracle.com> > http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/ Looks good. Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Sat Nov 9 12:47:25 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 9 Nov 2019 15:47:25 +0300 Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR compilation In-Reply-To: References: Message-ID: <48dea421-af44-047a-05d8-26c2c37358b7@oracle.com> Nice analysis, Christian! > http://cr.openjdk.java.net/~chagedorn/8229694/webrev.00/ Looks good. > We might want to file an RFE to investigate further why IGVN cannot > remove the first redundant stores to intArr[7], shortArr[10], and iFld, > respectively (even though it's quite useless to keep setting the same > values in a loop). This problem can also be observed if the loop only > contains the statement "iFld = intArr[j]". 
But I think even if those > redundant stores would have been optimized away we should have this fix > to handle the situation with only one pack and no memory operations > remaining. Agree. And, please, file the RFE. Best regards, Vladimir Ivanov From igor.ignatyev at oracle.com Sat Nov 9 18:29:20 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Sat, 9 Nov 2019 10:29:20 -0800 Subject: RFR(S) : 8233745 : [JVMCI] TranslatedException should serialize classloader and module info In-Reply-To: <416DB2FA-EEC4-4AB4-8265-C166F68EBEB9@oracle.com> References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> <327E76EB-4257-449F-8648-0546203D8216@oracle.com> <5716B217-5CAE-4512-A063-CC416336B064@oracle.com> <416DB2FA-EEC4-4AB4-8265-C166F68EBEB9@oracle.com> Message-ID: Doug, Vladimir, thanks for your review, pushed. -- Igor > On Nov 8, 2019, at 2:38 PM, Doug Simon wrote: > > Ok, I think the translated exceptions still convert the most important information. Looks good to me. > > -Doug > >> On 8 Nov 2019, at 20:33, Igor Ignatyev wrote: >> >> Hi Doug, >> >> I've added the difference in string representations to the bug report; for convenience, here is part of the diff: >>> < at app//jdk.vm.ci.hotspot.test.TestTranslatedException.encodeDecodeTest(TestTranslatedException.java:73) >>> < at java.base@14-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> <...> >>> --- >>>> at jdk.vm.ci.hotspot.test.TestTranslatedException.encodeDecodeTest(TestTranslatedException.java:73) >>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> >> As you can see, there is 'app/' for classes loaded by the application loader (and another '/' since the tests are in the unnamed module) and '@14-internal' for classes from system modules. 
>> >> Thanks, >> -- Igor >> >>> On Nov 8, 2019, at 3:18 AM, Doug Simon wrote: >>> >>> Hi Igor, >>> >>> To understand the bits lost in the translation as you describe below, can you please paste here or in the issue an example of before and after of a translated exception that loses info in the translation. >>> >>> -Doug >>> >>>> On 8 Nov 2019, at 00:33, Vladimir Kozlov wrote: >>>> >>>> Good. >>>> >>>> Tom and Doug should look at this. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 11/7/19 3:15 PM, Igor Ignatyev wrote: >>>>> http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00/index.html >>>>>> 71 lines changed: 50 ins; 14 del; 7 mod; >>>>> Hi all, >>>>> could you please review the small patch which updates jdk/vm/ci/hotspot/TranslatedException to encode/decode StackTraceElement fields which were introduced in JDK9 (classloader name, module name and version fields)? >>>>> I wasn't able to make the deserialized StackTraceElement::toString return the same string representation as the original ones b/c StackTraceElement::declaringClassObject won't be set; as a result, JDK_NON_UPGRADEABLE_MODULE and BUILTIN_CLASS_LOADER bits won't be set either and StackTraceElement::toString will have classloader names even for the built-in loader (won't be in original b/c dropClassLoaderName() is true) and version of system modules (won't be in original b/c dropModuleVersion() is true); so I changed how TestTranslatedException compares original and decoded exceptions. 
>>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8233745/webrev.00 >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8233745 >>>>> testing: compiler/jvmci/ + graal tiers >>>>> Thanks, >>>>> -- Igor >>> >> > From dl at cs.oswego.edu Sat Nov 9 19:21:00 2019 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 9 Nov 2019 14:21:00 -0500 Subject: final field values should be trusted as constant (filed as JDK-8233873) In-Reply-To: References: Message-ID: <69de5d88-2487-850b-5388-6363d88d2b5b@cs.oswego.edu> On 11/8/19 8:24 PM, John Rose wrote: > ## Side note on races > > Although race conditions (on non-volatile fields) allow the JVM some > latitude to return "stale" values for field references, such latitude > would usually be quite narrow, since an execution of the invalid > optimized method is likely to occur downstream of the invalidating > field update (as determined by the happens-before relation of the > JMM). Ever since the initial revisions of the JLS1 version, the intent of JMM specs (including current) is to allow compilers to believe that the value they see in initial reads of a final field is the only value they will ever see. So no revision is necessary on these grounds (although one of these days there will be one that accommodates VarHandle modes etc, formalizing http://gee.cs.oswego.edu/dl/html/j9mm.html). Some of the spec messiness exists just to explain why compilers are allowed not to believe this as well, because of reflection etc. In other words, don't let JMM concerns stop you from this worthwhile effort. -Doug From fujie at loongson.cn Mon Nov 11 03:56:32 2019 From: fujie at loongson.cn (Jie Fu) Date: Mon, 11 Nov 2019 11:56:32 +0800 Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler Message-ID: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> Hi all, May I get reviews for this small change? This bug was found by running compiler/profiling/spectrapredefineclass/Launcher.java with -Xcomp. 
The fix sets CompLevel_initial_compile to CompLevel_full_optimization if CompilationMode=high-only is specified. JBS: https://bugs.openjdk.java.net/browse/JDK-8233885 Webrev: http://cr.openjdk.java.net/~jiefu/8233885/webrev.00/ Testing: - make test TEST="test/hotspot/jtreg/compiler" CONF=fastdebug on Linux/x64 Thanks a lot. Best regards, Jie From fujie at loongson.cn Mon Nov 11 05:43:58 2019 From: fujie at loongson.cn (Jie Fu) Date: Mon, 11 Nov 2019 13:43:58 +0800 Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler In-Reply-To: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> Message-ID: <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn> Sorry, I missed the test case in webrev.00. I just added it in webrev.01. Please review this version: http://cr.openjdk.java.net/~jiefu/8233885/webrev.01/ Thanks a lot. Best regards, Jie On 2019/11/11 11:56 AM, Jie Fu wrote: > Hi all, > > May I get reviews for this small change? > This bug was found by running > compiler/profiling/spectrapredefineclass/Launcher.java with -Xcomp. > The fix sets CompLevel_initial_compile to CompLevel_full_optimization > if CompilationMode=high-only is specified. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8233885 > Webrev: http://cr.openjdk.java.net/~jiefu/8233885/webrev.00/ > > Testing: > - make test TEST="test/hotspot/jtreg/compiler" CONF=fastdebug on > Linux/x64 > > Thanks a lot. > Best regards, > Jie > From tobias.hartmann at oracle.com Mon Nov 11 05:57:33 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Nov 2019 06:57:33 +0100 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: References: Message-ID: <574c76eb-3613-13d6-bb4d-288e5f509426@oracle.com> Thanks Vladimir. Best regards, Tobias On 08.11.19 17:50, Vladimir Kozlov wrote: > Good. 
> > thanks, > Vladimir > > On 11/8/19 1:46 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8233656 >> http://cr.openjdk.java.net/~thartmann/8233656/webrev.00/ >> >> During IGVN, we process a CastII node that carries a non-zero dependency from >> GraphKit::cast_not_null [1]. ConstraintCastNode::dominating_cast then finds another CastII and >> checks if it's dominating. We assert in PhaseGVN::is_dominator_helper because the other CastII has a >> ProjNode as control input that has !is_CFG() because its input is TOP [2]. The input has been >> replaced in the same round of IGVN and the projection is already on the IGVN worklist but hasn't >> been processed yet (it will go away). >> >> I propose to simply check the control inputs for is_CFG(). >> >> I can reproduce the issue with a complex Javafuzzer generated test (attached to the bug) but minimal >> changes/simplifications to the test cause the issue to not reproduce anymore because it depends on >> the order in which nodes are processed by IGVN. So I don't think it makes sense to include that >> fragile test. >> >> This has been triggered by my fix for 8229496 [3] which added additional Cast nodes but I believe it >> can also happen without these changes. 
>> >> Thanks, >> Tobias >> >> [1] https://hg.openjdk.java.net/jdk/jdk/rev/86b95fc6ca32#l12.40 >> [2] https://hg.openjdk.java.net/jdk/jdk/file/47c20fc6a517/src/hotspot/share/opto/multnode.cpp#l83 >> [3] https://bugs.openjdk.java.net/browse/JDK-8229496 >> From tobias.hartmann at oracle.com Mon Nov 11 06:10:58 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Nov 2019 07:10:58 +0100 Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error: assert(p_f->Opcode() == Op_IfFalse) failed In-Reply-To: <1de496aa-f57b-1ce1-5fa2-5d3488217887@oracle.com> References: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com> <1de496aa-f57b-1ce1-5fa2-5d3488217887@oracle.com> Message-ID: <29541caa-f0aa-d8f6-56ea-cb350944c2e5@oracle.com> Thanks Vladimir. On my machine, it takes 30s with Graal as JIT vs. 10s with C2. Best regards, Tobias On 08.11.19 18:00, Vladimir Kozlov wrote: > Looks good. How much time does it take for Graal to run the test (you switched off TieredCompilation)? > > Vladimir > > On 11/8/19 4:52 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8233529 >> http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/ >> >> We have two loops (see TestRemoveMainPostLoops.java): Loop A with an inner loop followed by Loop B. >> >> (1) OSR compilation is triggered in loop B. >> (2) Pre-/main-/post loops are created for loop B. >> (3) Main and post loops of B are found empty and are removed. >> (4) Inner loop A is fully unrolled and removed. >> (5) Only main and post loops are created for A (no pre loop -> "PeelMainPost") and main is unrolled. >> (6) Pre loop of A is found empty, attempt to remove main and post loop then incorrectly selects main >> loop from A. >> >> The loop layout looks like this:
>>    Loop: N0/N0  has_sfpt
>>      Loop: N383/N718  limit_check sfpts={ 160 }
>>        Loop: N512/N517  counted [int,int),+1 (4 iters)  pre has_sfpt   <- belongs to A
>>        Loop: N760/N338 
counted [1,100),+2 (102 iters)  main has_sfpt  <- belongs to B
>>        Loop: N713/N716  counted [int,101),+1 (4 iters)  post has_sfpt  <- belongs to B
>> >> Please note that the order of the two loops is not like in the Java code because it's an OSR >> compilation that starts execution in the second loop. >> >> I've strengthened the asserts in locate_pre_from_main() and added a check for is_main_no_pre_loop() >> in the caller. >> >> The code has been introduced by JDK-8085832 [1]. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8085832 >> From tobias.hartmann at oracle.com Mon Nov 11 06:13:30 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Nov 2019 07:13:30 +0100 Subject: [14] RFR(S): 8233529: loopTransform.cpp:2984: Error: assert(p_f->Opcode() == Op_IfFalse) failed In-Reply-To: <3d3aec9d-5cfb-4664-cd67-b2f8b84f319f@oracle.com> References: <472426eb-a89a-dd79-c196-4a2e4fa6a2e2@oracle.com> <3d3aec9d-5cfb-4664-cd67-b2f8b84f319f@oracle.com> Message-ID: <7d4ab00c-e3c6-044d-45a0-c1de60dddcd0@oracle.com> Thanks Vladimir. Best regards, Tobias On 09.11.19 13:40, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~thartmann/8233529/webrev.00/ > Looks good. > > Best regards, > Vladimir Ivanov From tobias.hartmann at oracle.com Mon Nov 11 07:07:23 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Nov 2019 08:07:23 +0100 Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR compilation In-Reply-To: <48dea421-af44-047a-05d8-26c2c37358b7@oracle.com> References: <48dea421-af44-047a-05d8-26c2c37358b7@oracle.com> Message-ID: On 09.11.19 13:47, Vladimir Ivanov wrote: > Nice analysis, Christian! I agree, very nice. Looks good to me too. I'll sponsor. 
Best regards, Tobias From tobias.hartmann at oracle.com Mon Nov 11 07:12:24 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Nov 2019 08:12:24 +0100 Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs In-Reply-To: <1e3fd82d-2235-e5c6-3e41-4d5b5242ced4@oracle.com> References: <575000ef-fc06-c33e-07e3-d8046db1ea43@oracle.com> <1e3fd82d-2235-e5c6-3e41-4d5b5242ced4@oracle.com> Message-ID: <17f9f020-f064-6d00-2f11-c71b35832611@oracle.com> Hi Erik, On 08.11.19 16:15, erik.osterlund at oracle.com wrote: > On 11/8/19 9:38 AM, Tobias Hartmann wrote: >> Do we still need that logic with your change? > > Nope! :) Because the access nodes have their barrier data populated before transformation, > describing the semantics of the produced access, we don't care what GVN gives us after > transformation. Whatever it gives us has the correct semantics for the corresponding access that > produced it. Can we remove that code then? I mean the one you referred to with "We already have code that tries to determine if the load we got out from the GVN transformation looks like a load that was created in the BarrierSetC2 factory function". > Sure, will file another RFE for that. > Regarding the assert, it's not obvious what it would look like, since the assert in the > transformation code has to know exactly what nodes are expected to have GC data. For example, a > LoadP might be used to read any pointer, including but not limited to oops. And some oops don't need > barriers like the threadOop due to being processed in safepoints. So given a LoadP node for example, > I don't know if we can determine whether it should or should not have GC data. Right, that doesn't seem feasible. Ship it! 
Best regards,
Tobias

From tobias.hartmann at oracle.com Mon Nov 11 07:44:04 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 08:44:04 +0100
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler
In-Reply-To: <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn>
Message-ID: <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>

Hi Jie,

what about the high-only-quick-internal mode?

While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
Maybe you can fix that as well with your patch.

Thanks,
Tobias

On 11.11.19 06:43, Jie Fu wrote:
> Sorry, I missed the test case in webrev.00
> I just added it in webrev.01
>
> Please review this version: http://cr.openjdk.java.net/~jiefu/8233885/webrev.01/
>
> Thanks a lot.
> Best regards,
> Jie
>
> On 2019/11/11 11:56 AM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for this small change?
>> This bug was found by running compiler/profiling/spectrapredefineclass/Launcher.java with -Xcomp.
>> The fix sets CompLevel_initial_compile to CompLevel_full_optimization if CompilationMode=high-only
>> is specified.
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233885
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233885/webrev.00/
>>
>> Testing:
>>   - make test TEST="test/hotspot/jtreg/compiler" CONF=fastdebug on Linux/x64
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>

From erik.osterlund at oracle.com Mon Nov 11 07:53:26 2019
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 11 Nov 2019 08:53:26 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs
In-Reply-To: <3d08b7f4-68a4-5e69-1e39-cf3725adc8da@oracle.com>
References: <3d08b7f4-68a4-5e69-1e39-cf3725adc8da@oracle.com>
Message-ID: <66003FC0-87B2-432F-8151-87952B3115E2@oracle.com>

Hi Per,

Thanks for the review! Will file a cleanup RFE.

Thanks,
/Erik

> On 8 Nov 2019, at 23:45, Per Liden wrote:
>
> On 11/7/19 2:49 PM, Erik Österlund wrote:
>> Hi,
>> We have noticed problems with the one single place where we let C2 touch the graph while injecting our load barriers. Right before tagging a load as needing a load barrier, it is GVN transformed. The problem with this is that if we emit a strong field load, the GVN transformation can see through that load, by following it through the store that set its value, and find that the field value really came from the load of a weak Reference.get intrinsic. In this scenario, the load we get out from the GVN transformation needs weak barriers, yet we override it with strong barriers, as that was the semantics of the access being parsed. Sigh. We already have code that tries to determine if the load we got out from the GVN transformation looks like a load that was created in the BarrierSetC2 factory function, so one way of solving this is to refine that logic that tries to determine if this was the load we created before the transformation or not. But I felt like a better solution is to finish constructing the access with all the intended properties *before* transformation.
>> I massaged the code so that the GC barrier data of accesses with load barriers gets passed in to the factory functions that create the access, right before the transformation. This way, we construct the access with the intended semantics where it is being created (parser or macro expansion for field accesses in clone intrinsics). Then we do not have to touch it after the GVN transformation.
>> It does seem like there could be similar problems from other GCs, but in e.g. G1, the consequences are weird suboptimal code instead of anything dangerous happening. For example, we can generate SATB buffering code required by G1 Reference.get() intrinsics for strong accesses, due to GVN handing out earlier accesses with different semantics. Perhaps that should be looked into separately as well. But that investigation is outside of the scope of this bug fix.
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8233506/webrev.00/
>
> Looks good!
>
> As discussed off-line, even if it's not a problem for ZGC, we should fix so that we never call access.set_raw_access() *after* GVN transformation. But let's do that as a separate fix.
>
> /Per
>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8233506
>> Thanks,
>> /Erik

From david.holmes at oracle.com Mon Nov 11 07:56:21 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 11 Nov 2019 17:56:21 +1000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To:
References: <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com> <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
Message-ID:

Hi Goetz,

Please note I only looked at the test initially and have not reviewed this overall fix as I don't know the PPC code.

The updated test seems fine.

Thanks,
David

On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> Hi,
>
> I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> which makes one of the fixes unnecessary.
> Also, I had to fix the argument of verify_oop_helper
> from oop to oopDesc* for the fastdebug build.
>
> New webrev:
> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
>
> Best regards,
> Goetz.
> >> -----Original Message----- >> From: David Holmes >> Sent: Freitag, 18. Oktober 2019 01:38 >> To: Lindenmaier, Goetz ; hotspot-runtime- >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' > compiler-dev at openjdk.java.net> >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. >> >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote: >>> Hi David, >>> >>> you are right, thanks for pointing me to that! >>> Doing one test for vm.bits=64 and one for 32 should fix it: >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ >> >> s/01/02/ :) >> >> For the 32-bit case you can delete the line: >> >> * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9") >> >> For the 64-but case you can delete the "sparc" check from the same line. >> >> Thanks, >> David >> >>> >>> Best regards, >>> Goetz. >>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Donnerstag, 17. Oktober 2019 13:18 >>>> To: Lindenmaier, Goetz ; hotspot-runtime- >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' >>> compiler-dev at openjdk.java.net> >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. >>>> >>>> Hi Goetz, >>>> >>>> UseCompressedOops is a 64-bit flag only so your change will break the >>>> test on 32-bit systems. >>>> >>>> David >>>> >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote: >>>>> Hi, >>>>> >>>>> 8231058 introduced a test that enables +VerifyOops. >>>>> This fails on ppc, because this was not used in a very >>>>> long time. >>>>> >>>>> The crash is caused by passing compressed oops from >>>>> LIR_Assembler::store() to the checker routine. >>>>> I fix this by implementing a checker routine verify_coop >>>>> that first decompresses the coop. This makes the new >>>>> test pass. >>>>> >>>>> Further testing showed that the additional checker >>>>> coding makes Patching Stubs overflow. These >>>>> can not be increased in size to fit the code. 
I
>>>>> disable generating verify_oop code in LIRAssembler::load()
>>>>> which fixes the issue.
>>>>>
>>>>> Further I extended the message printed when verification
>>>>> of an oop failed. First, I print the location in the source
>>>>> code where the checker code was generated. Second,
>>>>> I print the faulty oop.
>>>>>
>>>>> I also improved the message printed when PatchingStubs
>>>>> overflow.
>>>>>
>>>>> Finally, I improve the test to run with and without compressed
>>>>> Oops.
>>>>>
>>>>> Please review:
>>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
>>>>>
>>>>> @runtime as I modify the test introduced there
>>>>> @compiler as the error is in C1.
>>>>>
>>>>> Best regards,
>>>>> Goetz.
>>>>>

From erik.osterlund at oracle.com Mon Nov 11 07:58:16 2019
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 11 Nov 2019 08:58:16 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs
In-Reply-To: <17f9f020-f064-6d00-2f11-c71b35832611@oracle.com>
References: <17f9f020-f064-6d00-2f11-c71b35832611@oracle.com>
Message-ID: <75C2E95C-1D75-4AD6-8056-28323ED85507@oracle.com>

Hi Tobias,

> On 11 Nov 2019, at 08:12, Tobias Hartmann wrote:
>
> Hi Erik,
>
>> On 08.11.19 16:15, erik.osterlund at oracle.com wrote:
>>> On 11/8/19 9:38 AM, Tobias Hartmann wrote:
>>> Do we still need that logic with your change?
>>
>> Nope! :) Because the access nodes have their barrier data populated before transformation,
>> describing the semantics of the produced access, we don't care what GVN gives us after
>> transformation. Whatever it gives us has the correct semantics for the corresponding access that
>> produced it.
>
> Can we remove that code then? I mean the one you referred to with "We already have code that tries
> to determine if the load we got out from the GVN transformation looks like a load that was created
> in the BarrierSetC2 factory function".

Ahh.
To be clear: I meant the code in the backend that first calls e.g. BarrierSetC2::load_at_resolved, and then checks in ZBarrierSetC2, if the resulting raw_access() is really a load before sprinkling barriers. This change deletes all that.

>> Sure, will file another RFE for that.
>> Regarding the assert, it's not obvious what it would look like, since the assert in the
>> transformation code has to know exactly what nodes are expected to have GC data. For example, a
>> LoadP might be used to read any pointer, including but not limited to oops. And some oops don't need
>> barriers like the threadOop due to being processed in safepoints. So given a LoadP node for example,
>> I don't know if we can determine whether it should or should not have GC data.
>
> Right, that doesn't seem feasible.
>
> Ship it!

Thanks Tobias!

/Erik

> Best regards,
> Tobias

From tobias.hartmann at oracle.com Mon Nov 11 08:01:18 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 09:01:18 +0100
Subject: RFR: 8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs
In-Reply-To: <75C2E95C-1D75-4AD6-8056-28323ED85507@oracle.com>
References: <17f9f020-f064-6d00-2f11-c71b35832611@oracle.com> <75C2E95C-1D75-4AD6-8056-28323ED85507@oracle.com>
Message-ID:

On 11.11.19 08:58, Erik Österlund wrote:
> Ahh. To be clear: I meant the code in the backend that first calls e.g. BarrierSetC2::load_at_resolved, and then checks in ZBarrierSetC2, if the resulting raw_access() is really a load before sprinkling barriers. This change deletes all that.

Okay, got it. I thought there's more.

Thanks,
Tobias

From tobias.hartmann at oracle.com Mon Nov 11 08:03:35 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 11 Nov 2019 09:03:35 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes
In-Reply-To:
References:
Message-ID:

Hi Vladimir,

thanks for the review.
On 09.11.19 13:38, Vladimir Ivanov wrote: > Can you just check control for TOP instead? In the failing case, the control input is not TOP but it's a ProjNode with a TOP input. We hit the assert because the is_CFG() method returns false for these: https://hg.openjdk.java.net/jdk/jdk/file/47c20fc6a517/src/hotspot/share/opto/multnode.cpp#l83 Or do you mean checking ctl->in(0) for TOP? > Also, is it worth putting an assert to ensure the node is already on worklist and will be eventually > eliminated? I don't think it's worth it. Since 8040213 [1] we have code that ensures that all modified nodes are added to the worklist (see Compile::record_modified_node()). I've verified that the ProjNode with TOP input is covered by that. Best regards, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8040213 From christian.hagedorn at oracle.com Mon Nov 11 09:17:36 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 11 Nov 2019 10:17:36 +0100 Subject: [14] RFR(S): 8229694: JVM crash in SWPointer during C2 OSR compilation In-Reply-To: References: <48dea421-af44-047a-05d8-26c2c37358b7@oracle.com> Message-ID: <931f7ed6-5f1b-9669-a5fc-c0902dcd9b08@oracle.com> Thank you very much Vladimir K., Vladimir I. and Tobias! I filed an RFE [1] and assigned it to me to further investigate. Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8233895 On 11.11.19 08:07, Tobias Hartmann wrote: > > On 09.11.19 13:47, Vladimir Ivanov wrote: >> Nice analysis, Christian! > > I agree, very nice. > > Looks good to me too. I'll sponsor. 
> > Best regards, > Tobias > From martin.doerr at sap.com Mon Nov 11 10:39:31 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 11 Nov 2019 10:39:31 +0000 Subject: RFR(S): Test crashed with assert(phi->operand_count() != 1 || phi->subst() != phi) failed: missed trivial simplification Message-ID: Hi, some C1 assertions currently don't deal correctly with illegal phi functions (phi function with illegal type due to type conflict). Since JDK-8214352 we bail out in fewer situations, so C1 needs to deal with a few more illegal phi cases. The assertion (see headline) in PhiSimplifier doesn't support illegal phi, but the simplify call before it does (by just skipping). Another assertion which needs to skip illegal phi is in c1_Optimizer where we merge a block with its unique successor. If a local value of the succeeding block is coming from an illegal phi function, we drop it and take the value from the first block. This is correct, only the assertion doesn't expect that at the moment. So I'd like to fix these 2 assertions I found: http://cr.openjdk.java.net/~mdoerr/8233820_C1_illegal_phi/webrev.00/ Best regards, Martin From doug.simon at oracle.com Mon Nov 11 10:54:08 2019 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 11 Nov 2019 11:54:08 +0100 Subject: RFR(S) : 8233900: [JVMCI] improve help text for EnableJVMCIProduct option In-Reply-To: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> Message-ID: Hi, Please review this change to improve the help text for EnableJVMCIProduct and related options. 
https://dougxc.github.io/webrevs/8233900
https://bugs.openjdk.java.net/browse/JDK-8233900

-Doug

From vladimir.x.ivanov at oracle.com Mon Nov 11 12:06:45 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 11 Nov 2019 15:06:45 +0300
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes
In-Reply-To:
References:
Message-ID: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>

Hi Tobias,

> Or do you mean checking ctl->in(0) for TOP?

Yes, I'm curious whether it's enough to check that in(0) is NULL, TOP, or Proj(_, TOP). Are there any other important cases left?

My concern is that other usages of PhaseGVN::is_dominator() may be affected the same way as well.

It looks like PhaseGVN::is_dominator_helper() would benefit from additional checks:

  bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
    if (d->is_top() || n->is_top()) {
      return false;
    }
    assert(d->is_CFG() && n->is_CFG(), "must have CFG nodes");
    ...

As an example, InitializeNode::detect_init_independence() does some control normalization first:

  bool InitializeNode::detect_init_independence(Node* value, PhaseGVN* phase) {
    ...
      if (n->is_Proj())   n = n->in(0);
    ...
      if (n->is_CFG() && phase->is_dominator(n, allocation())) {
        continue;
      }

      Node* ctl = n->in(0);
      if (ctl != NULL && !ctl->is_top()) {
        if (ctl->is_Proj())  ctl = ctl->in(0);
        if (ctl == this)  return false;
    ...
        if (!MemNode::all_controls_dominate(n, this))
    ...

  bool MemNode::all_controls_dominate(Node* dom, Node* sub) {
    if (dom == NULL || dom->is_top() || sub == NULL || sub->is_top())
      return false; // Conservative answer for dead code

>> Also, is it worth putting an assert to ensure the node is already on worklist and will be eventually
>> eliminated?
>
> I don't think it's worth it. Since 8040213 [1] we have code that ensures that all modified nodes are
> added to the worklist (see Compile::record_modified_node()). I've verified that the ProjNode with
> TOP input is covered by that.

Got it.
Sounds good.

Best regards,
Vladimir Ivanov

From fujie at loongson.cn Mon Nov 11 13:28:13 2019
From: fujie at loongson.cn (Jie Fu)
Date: Mon, 11 Nov 2019 21:28:13 +0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler
In-Reply-To: <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn> <016d980a-1197-be1e-6078-23290e9ae365@oracle.com>
Message-ID:

Hi Tobias,

Thank you for your review and valuable comments.

Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/

Thanks a lot.
Best regards,
Jie

On 2019/11/11 3:44 PM, Tobias Hartmann wrote:
> Hi Jie,
>
> what about the high-only-quick-internal mode?

It's really a nice catch.
Fixed. Thanks.

> While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
> Maybe you can fix that as well with your patch.

Done.

From martin.doerr at sap.com Mon Nov 11 15:06:54 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 11 Nov 2019 15:06:54 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To:
References: <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com> <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
Message-ID:

Hi Götz,

the PPC64 code looks good, too.
Thanks for fixing and improving it.

Best regards,
Martin

> -----Original Message-----
> From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of David Holmes
> Sent: Monday, 11 November 2019 08:56
> To: Lindenmaier, Goetz ; hotspot-runtime-
> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' compiler-dev at openjdk.java.net>
> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
>
> Hi Goetz,
>
> Please note I only looked at the test initially and have not reviewed
> this overall fix as I don't know the PPC code.
>
> The updated test seems fine.
> > Thanks, > David > > On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote: > > Hi, > > > > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081 > > which makes one of the fixes unnecessary. > > Also, I had to fix the argument of verify_oop_helper > > from oop to oopDesc* for the fastdebug build. > > > > New webrev: > > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/ > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Freitag, 18. Oktober 2019 01:38 > >> To: Lindenmaier, Goetz ; hotspot-runtime- > >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' > >> compiler-dev at openjdk.java.net> > >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > >> > >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote: > >>> Hi David, > >>> > >>> you are right, thanks for pointing me to that! > >>> Doing one test for vm.bits=64 and one for 32 should fix it: > >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ > >> > >> s/01/02/ :) > >> > >> For the 32-bit case you can delete the line: > >> > >> * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9") > >> > >> For the 64-but case you can delete the "sparc" check from the same line. > >> > >> Thanks, > >> David > >> > >>> > >>> Best regards, > >>> Goetz. > >>> > >>>> -----Original Message----- > >>>> From: David Holmes > >>>> Sent: Donnerstag, 17. Oktober 2019 13:18 > >>>> To: Lindenmaier, Goetz ; hotspot- > runtime- > >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' > >>>> compiler-dev at openjdk.java.net> > >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > >>>> > >>>> Hi Goetz, > >>>> > >>>> UseCompressedOops is a 64-bit flag only so your change will break the > >>>> test on 32-bit systems. 
> >>>> > >>>> David > >>>> > >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote: > >>>>> Hi, > >>>>> > >>>>> 8231058 introduced a test that enables +VerifyOops. > >>>>> This fails on ppc, because this was not used in a very > >>>>> long time. > >>>>> > >>>>> The crash is caused by passing compressed oops from > >>>>> LIR_Assembler::store() to the checker routine. > >>>>> I fix this by implementing a checker routine verify_coop > >>>>> that first decompresses the coop. This makes the new > >>>>> test pass. > >>>>> > >>>>> Further testing showed that the additional checker > >>>>> coding makes Patching Stubs overflow. These > >>>>> can not be increased in size to fit the code. I > >>>>> disable generating verify_oop code in LIRAssembler::load() > >>>>> which fixes the issue. > >>>>> > >>>>> Further I extended the message printed when verification > >>>>> of an oop failed. First, I print the location in the source > >>>>> code where the checker code was generated. Second, > >>>>> I print the faulty oop. > >>>>> > >>>>> I also improved the message printed when PatchingStubs > >>>>> overflow. > >>>>> > >>>>> Finally, I improve the test to run with and without compressed > >>>>> Oops. > >>>>> > >>>>> Please review: > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ > >>>>> > >>>>> @runtime as I modify the test introduced there > >>>>> @compiler as the error is in C1. > >>>>> > >>>>> Best regards, > >>>>> Goetz. > >>>>> From goetz.lindenmaier at sap.com Mon Nov 11 15:17:23 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 11 Nov 2019 15:17:23 +0000 Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. In-Reply-To: References: <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com> <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com> Message-ID: Hi David, thanks for looking again. Martin checked the PPC code. Best regards, Goetz. > -----Original Message----- > From: David Holmes > Sent: Montag, 11. 
November 2019 08:56 > To: Lindenmaier, Goetz ; hotspot-runtime- > dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' compiler-dev at openjdk.java.net> > Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > > Hi Goetz, > > Please note I only looked at the test initially and have not reviewed > this overall fix as I don't know the PPC code. > > The updated test seems fine. > > Thanks, > David > > On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote: > > Hi, > > > > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081 > > which makes one of the fixes unnecessary. > > Also, I had to fix the argument of verify_oop_helper > > from oop to oopDesc* for the fastdebug build. > > > > New webrev: > > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/ > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Freitag, 18. Oktober 2019 01:38 > >> To: Lindenmaier, Goetz ; hotspot-runtime- > >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' >> compiler-dev at openjdk.java.net> > >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > >> > >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote: > >>> Hi David, > >>> > >>> you are right, thanks for pointing me to that! > >>> Doing one test for vm.bits=64 and one for 32 should fix it: > >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ > >> > >> s/01/02/ :) > >> > >> For the 32-bit case you can delete the line: > >> > >> * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9") > >> > >> For the 64-but case you can delete the "sparc" check from the same line. > >> > >> Thanks, > >> David > >> > >>> > >>> Best regards, > >>> Goetz. > >>> > >>>> -----Original Message----- > >>>> From: David Holmes > >>>> Sent: Donnerstag, 17. 
Oktober 2019 13:18 > >>>> To: Lindenmaier, Goetz ; hotspot-runtime- > >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' > >>>> compiler-dev at openjdk.java.net> > >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > >>>> > >>>> Hi Goetz, > >>>> > >>>> UseCompressedOops is a 64-bit flag only so your change will break the > >>>> test on 32-bit systems. > >>>> > >>>> David > >>>> > >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote: > >>>>> Hi, > >>>>> > >>>>> 8231058 introduced a test that enables +VerifyOops. > >>>>> This fails on ppc, because this was not used in a very > >>>>> long time. > >>>>> > >>>>> The crash is caused by passing compressed oops from > >>>>> LIR_Assembler::store() to the checker routine. > >>>>> I fix this by implementing a checker routine verify_coop > >>>>> that first decompresses the coop. This makes the new > >>>>> test pass. > >>>>> > >>>>> Further testing showed that the additional checker > >>>>> coding makes Patching Stubs overflow. These > >>>>> can not be increased in size to fit the code. I > >>>>> disable generating verify_oop code in LIRAssembler::load() > >>>>> which fixes the issue. > >>>>> > >>>>> Further I extended the message printed when verification > >>>>> of an oop failed. First, I print the location in the source > >>>>> code where the checker code was generated. Second, > >>>>> I print the faulty oop. > >>>>> > >>>>> I also improved the message printed when PatchingStubs > >>>>> overflow. > >>>>> > >>>>> Finally, I improve the test to run with and without compressed > >>>>> Oops. > >>>>> > >>>>> Please review: > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ > >>>>> > >>>>> @runtime as I modify the test introduced there > >>>>> @compiler as the error is in C1. > >>>>> > >>>>> Best regards, > >>>>> Goetz. 
> >>>>> From goetz.lindenmaier at sap.com Mon Nov 11 15:18:03 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 11 Nov 2019 15:18:03 +0000 Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. In-Reply-To: References: <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com> <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com> Message-ID: Hi Martin, thanks for looking at this, and thanks for resolving the patching stub issue! Best regards, Goetz. > -----Original Message----- > From: Doerr, Martin > Sent: Montag, 11. November 2019 16:07 > To: David Holmes ; Lindenmaier, Goetz > ; hotspot-runtime-dev at openjdk.java.net; > 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RE: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > > Hi G?tz, > > the PPC64 code looks good, too. > Thanks for fixing and improving it. > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of David Holmes > > Sent: Montag, 11. November 2019 08:56 > > To: Lindenmaier, Goetz ; hotspot-runtime- > > dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' > compiler-dev at openjdk.java.net> > > Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > > > > Hi Goetz, > > > > Please note I only looked at the test initially and have not reviewed > > this overall fix as I don't know the PPC code. > > > > The updated test seems fine. > > > > Thanks, > > David > > > > On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote: > > > Hi, > > > > > > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081 > > > which makes one of the fixes unnecessary. > > > Also, I had to fix the argument of verify_oop_helper > > > from oop to oopDesc* for the fastdebug build. > > > > > > New webrev: > > > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/ > > > > > > Best regards, > > > Goetz. 
> > > > > >> -----Original Message----- > > >> From: David Holmes > > >> Sent: Freitag, 18. Oktober 2019 01:38 > > >> To: Lindenmaier, Goetz ; hotspot-runtime- > > >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' > > > >> compiler-dev at openjdk.java.net> > > >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > > >> > > >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote: > > >>> Hi David, > > >>> > > >>> you are right, thanks for pointing me to that! > > >>> Doing one test for vm.bits=64 and one for 32 should fix it: > > >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ > > >> > > >> s/01/02/ :) > > >> > > >> For the 32-bit case you can delete the line: > > >> > > >> * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9") > > >> > > >> For the 64-but case you can delete the "sparc" check from the same line. > > >> > > >> Thanks, > > >> David > > >> > > >>> > > >>> Best regards, > > >>> Goetz. > > >>> > > >>>> -----Original Message----- > > >>>> From: David Holmes > > >>>> Sent: Donnerstag, 17. Oktober 2019 13:18 > > >>>> To: Lindenmaier, Goetz ; hotspot- > > runtime- > > >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' > > > >>>> compiler-dev at openjdk.java.net> > > >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058. > > >>>> > > >>>> Hi Goetz, > > >>>> > > >>>> UseCompressedOops is a 64-bit flag only so your change will break the > > >>>> test on 32-bit systems. > > >>>> > > >>>> David > > >>>> > > >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote: > > >>>>> Hi, > > >>>>> > > >>>>> 8231058 introduced a test that enables +VerifyOops. > > >>>>> This fails on ppc, because this was not used in a very > > >>>>> long time. > > >>>>> > > >>>>> The crash is caused by passing compressed oops from > > >>>>> LIR_Assembler::store() to the checker routine. 
> > >>>>> I fix this by implementing a checker routine verify_coop > > >>>>> that first decompresses the coop. This makes the new > > >>>>> test pass. > > >>>>> > > >>>>> Further testing showed that the additional checker > > >>>>> coding makes Patching Stubs overflow. These > > >>>>> can not be increased in size to fit the code. I > > >>>>> disable generating verify_oop code in LIRAssembler::load() > > >>>>> which fixes the issue. > > >>>>> > > >>>>> Further I extended the message printed when verification > > >>>>> of an oop failed. First, I print the location in the source > > >>>>> code where the checker code was generated. Second, > > >>>>> I print the faulty oop. > > >>>>> > > >>>>> I also improved the message printed when PatchingStubs > > >>>>> overflow. > > >>>>> > > >>>>> Finally, I improve the test to run with and without compressed > > >>>>> Oops. > > >>>>> > > >>>>> Please review: > > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/ > > >>>>> > > >>>>> @runtime as I modify the test introduced there > > >>>>> @compiler as the error is in C1. > > >>>>> > > >>>>> Best regards, > > >>>>> Goetz. > > >>>>> From richard.reingruber at sap.com Mon Nov 11 15:29:36 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 11 Nov 2019 15:29:36 +0000 Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement Message-ID: Hi, I have created https://bugs.openjdk.java.net/browse/JDK-8233915 In short, a set of live objects L is not found using JVMTI FollowReferences() if L is only reachable from a scalar replaced object in a frame of a C2 compiled method. If L happens to be a growing leak, then a dynamically loaded JVMTI agent (note: can_tag_objects is an always capability) for heap diagnostics won't discover L as live and it won't be able to find root references that lead to L. I'd like to suggest the implementation for the proposed enhancement JDK-8227745 as bug-fix. 
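For illustration only (this sketch is not taken from the bug report; the class and method names are hypothetical): a minimal Java shape of the situation described above, assuming C2 compiles work() and its escape analysis replaces the non-escaping Holder with its scalar fields. Whether that actually happens depends on the JVM version, flags and inlining decisions, so it is not guaranteed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names here are hypothetical and do not come from JDK-8233915.
final class ScalarReplacementSketch {

    // Small wrapper that never escapes work(). C2's escape analysis may
    // (but is not guaranteed to) replace it with its scalar fields.
    static final class Holder {
        final List<byte[]> chunks = new ArrayList<>();
    }

    // Grows a set of live objects (the "L" of the report) that is reachable
    // only through the local variable 'holder'.
    static long work(int iterations) {
        Holder holder = new Holder();
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            holder.chunks.add(new byte[16]); // L keeps growing
            total += holder.chunks.size();
        }
        return total;
    }
}
```

If Holder is scalar replaced while work() runs as C2-compiled code, there is no Holder object on the heap for a heap walk to start from, which is the condition the report describes. Running with -XX:-EliminateAllocations (a C2 flag) is one way to compare against the behaviour without scalar replacement.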
RFE: https://bugs.openjdk.java.net/browse/JDK-8227745
Webrev(*): http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/

Please comment on the suggestion. Do you see other solutions that allow an agent to discover the chain of references to L?

I'd like to work on the complexity as well. One significant simplification could be, if it was possible to reallocate scalar replaced objects at safepoints (i.e. allow the VM thread to call Deoptimization::realloc_objects()). The GC interface does not seem to allow this.

Thanks, Richard.

(*) Not yet accepted, because deemed too complex for the performance gain. Note that I was able to reduce webrev.1 in size compared to webrev.0

From fweimer at redhat.com Mon Nov 11 15:40:21 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 11 Nov 2019 16:40:21 +0100
Subject: adlc-generated operator= for Pipeline_Use_Cycle_Mask
Message-ID: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>

adlc generates an assignment operator for the Pipeline_Use_Cycle_Mask
class, like this:

  Pipeline_Use_Cycle_Mask& operator=(const Pipeline_Use_Cycle_Mask &in) {
    _mask = in._mask;
    return *this;
  }

GCC 10 takes this as an indicator that objects of this class should not
be modified with memcpy or memset and issues -Wclass-memaccess warnings:

…/src/hotspot/share/opto/output.cpp: In constructor 'Scheduling::Scheduling(Arena*, Compile&)':
…/src/hotspot/share/opto/output.cpp:1745:108: error: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class Pipeline_Use_Element' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Werror=class-memaccess]
 1745 | memcpy(_bundle_use_elements, Pipeline_Use::elaborated_elements, sizeof(Pipeline_Use::elaborated_elements));
      | ^
In file included from …/src/hotspot/share/opto/ad.hpp:31,
                 from …/src/hotspot/share/opto/output.cpp:37:
ad_x86.hpp:6196:7: note: 'class Pipeline_Use_Element' declared here
…/src/hotspot/share/opto/output.cpp: In member function 'void
Scheduling::step_and_clear()':
…/src/hotspot/share/opto/output.cpp:1797:51: error: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class Pipeline_Use_Element' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Werror=class-memaccess]
 1797 | sizeof(Pipeline_Use::elaborated_elements));
      | ^

Looking at the code generator, it seems that the default assignment
operator will do the right thing in all cases. Can we remove generation
of this C++ code fragment?

Thanks,
Florian

From patric.hedlin at oracle.com Mon Nov 11 16:49:04 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Mon, 11 Nov 2019 17:49:04 +0100
Subject: RFR(XS): 8233918: 8233498 broke build on SPARC
Message-ID: <6fc02880-958f-1e0b-bb9a-a40cc8fb6cfa@oracle.com>

Dear all,

I would like to ask for help to review the following change/update:

Issue:  https://bugs.openjdk.java.net/browse/JDK-8233918

8233918: 8233498 broke build on SPARC

    Pushed poor patch that broke debug builds.


Testing: SPARC build on Solaris (sparcv9, sparcv9-debug, sparcv9-slowdebug)

Best regards,
Patric

-----8<-----

diff -r e4d7fcab43d7 -r 0c993f1305eb src/hotspot/cpu/sparc/interp_masm_sparc.hpp
--- a/src/hotspot/cpu/sparc/interp_masm_sparc.hpp       Tue Apr 24 13:59:02 2018 +0200
+++ b/src/hotspot/cpu/sparc/interp_masm_sparc.hpp       Mon Nov 11 16:59:42 2019 +0100
@@ -321,6 +321,7 @@
   // Debugging
   void interp_verify_oop(Register reg, TosState state, const char * file, int line);    // only if +VerifyOops && state == atos
   void verify_oop_or_return_address(Register reg, Register rtmp); // for astore
+  void verify_FPU(int stack_depth, TosState state = ftos) {} // No-op.

   // support for JVMTI/Dtrace
   typedef enum { NotifyJVMTI, SkipNotifyJVMTI } NotifyMethodExitMode;

From erik.osterlund at oracle.com Mon Nov 11 16:54:57 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Mon, 11 Nov 2019 17:54:57 +0100
Subject: RFR(XS): 8233918: 8233498 broke build on SPARC
In-Reply-To: <6fc02880-958f-1e0b-bb9a-a40cc8fb6cfa@oracle.com>
References: <6fc02880-958f-1e0b-bb9a-a40cc8fb6cfa@oracle.com>
Message-ID:

Hi Patric,

Looks good, and trivial. Ship it!

Thanks,
/Erik

On 11/11/19 5:49 PM, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8233918
>
> 8233918: 8233498 broke build on SPARC
>
>     Pushed poor patch that broke debug builds.
>
>
> Testing: SPARC build on Solaris (sparcv9, sparcv9-debug,
> sparcv9-slowdebug)
>
> Best regards,
> Patric
>
> -----8<-----
>
> diff -r e4d7fcab43d7 -r 0c993f1305eb
> src/hotspot/cpu/sparc/interp_masm_sparc.hpp
> --- a/src/hotspot/cpu/sparc/interp_masm_sparc.hpp       Tue Apr 24
> 13:59:02 2018 +0200
> +++ b/src/hotspot/cpu/sparc/interp_masm_sparc.hpp       Mon Nov 11
> 16:59:42 2019 +0100
> @@ -321,6 +321,7 @@
>    // Debugging
>    void interp_verify_oop(Register reg, TosState state, const char *
> file, int line);    // only if +VerifyOops && state == atos
>    void verify_oop_or_return_address(Register reg, Register rtmp); //
> for astore
> +  void verify_FPU(int stack_depth, TosState state = ftos) {} // No-op.
>
>    // support for JVMTI/Dtrace
>    typedef enum { NotifyJVMTI, SkipNotifyJVMTI } NotifyMethodExitMode;
>

From vladimir.kozlov at oracle.com Mon Nov 11 16:59:00 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 08:59:00 -0800
Subject: RFR(S) : 8233900: [JVMCI] improve help text for EnableJVMCIProduct option
In-Reply-To:
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com>
Message-ID: <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com>

Looks good.
Do you need this be backported to 11u? Add affected versions and fix version to rfe. Thanks Vladimir > On Nov 11, 2019, at 2:54 AM, Doug Simon wrote: > > Hi, > > Please review this change to improve the help text for EnableJVMCIProduct and related options. > > https://dougxc.github.io/webrevs/8233900 > https://bugs.openjdk.java.net/browse/JDK-8233900 > > -Doug From bsrbnd at gmail.com Mon Nov 11 17:23:32 2019 From: bsrbnd at gmail.com (B. Blaser) Date: Mon, 11 Nov 2019 18:23:32 +0100 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits Message-ID: Hi, Please review the following fix for [1] which has been extensively discussed in [2]: http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/ It includes: * Vladimir's requested changes about instruction encoding and testing along with: * John's suggested complementary benchmark following Sandhya's note regarding throughput. Thanks (hotspot:tier1 is OK on Linux/x86_64), Bernard [1] https://bugs.openjdk.java.net/browse/JDK-8214239 [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html From vladimir.kozlov at oracle.com Mon Nov 11 20:14:41 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Nov 2019 12:14:41 -0800 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: Message-ID: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com> Very good. What testing you did? Thanks Vladimir > On Nov 11, 2019, at 9:23 AM, B. Blaser wrote: > > Hi, > > Please review the following fix for [1] which has been extensively > discussed in [2]: > > http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/ > > It includes: > * Vladimir's requested changes about instruction encoding and testing > along with: > * John's suggested complementary benchmark following Sandhya's note > regarding throughput. 
> > Thanks (hotspot:tier1 is OK on Linux/x86_64), > Bernard > > [1] https://bugs.openjdk.java.net/browse/JDK-8214239 > [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html From vladimir.kozlov at oracle.com Mon Nov 11 20:16:03 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Nov 2019 12:16:03 -0800 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com> References: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com> Message-ID: On 11/11/19 12:14 PM, Vladimir Kozlov wrote: > Very good. What testing you did? I mean in addition to tier1. Vladimir > > Thanks > Vladimir > >> On Nov 11, 2019, at 9:23 AM, B. Blaser wrote: >> >> Hi, >> >> Please review the following fix for [1] which has been extensively >> discussed in [2]: >> >> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/ >> >> It includes: >> * Vladimir's requested changes about instruction encoding and testing >> along with: >> * John's suggested complementary benchmark following Sandhya's note >> regarding throughput. 
>>
>> Thanks (hotspot:tier1 is OK on Linux/x86_64),
>> Bernard
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
>> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
>

From kim.barrett at oracle.com Mon Nov 11 20:20:24 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 11 Nov 2019 15:20:24 -0500
Subject: adlc-generated operator= for Pipeline_Use_Cycle_Mask
In-Reply-To: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com>
Message-ID: <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com>

> On Nov 11, 2019, at 10:40 AM, Florian Weimer wrote:
>
> adlc generates an assignment operator for the Pipeline_Use_Cycle_Mask
> class, like this:
>
>   Pipeline_Use_Cycle_Mask& operator=(const Pipeline_Use_Cycle_Mask &in) {
>     _mask = in._mask;
>     return *this;
>   }
>
> GCC 10 takes this as an indicator that objects of this class should not
> be modified with memcpy or memset and issues -Wclass-memaccess warnings:
>
> …/src/hotspot/share/opto/output.cpp: In constructor 'Scheduling::Scheduling(Arena*, Compile&)':
> …/src/hotspot/share/opto/output.cpp:1745:108: error: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class Pipeline_Use_Element' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Werror=class-memaccess]
> 1745 | memcpy(_bundle_use_elements, Pipeline_Use::elaborated_elements, sizeof(Pipeline_Use::elaborated_elements));
> | ^
> In file included from …/src/hotspot/share/opto/ad.hpp:31,
> from …/src/hotspot/share/opto/output.cpp:37:
> ad_x86.hpp:6196:7: note: 'class Pipeline_Use_Element' declared here
> …/src/hotspot/share/opto/output.cpp: In member function 'void Scheduling::step_and_clear()':
> …/src/hotspot/share/opto/output.cpp:1797:51: error: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class Pipeline_Use_Element' with no trivial copy-assignment; use copy-assignment or copy-initialization
instead [-Werror=class-memaccess]
> 1797 | sizeof(Pipeline_Use::elaborated_elements));
> | ^
>
> Looking at the code generator, it seems that the default assignment
> operator will do the right thing in all cases. Can we remove generation
> of this C++ code fragment?
>
> Thanks,
> Florian

I agree with that analysis and the suggested solution approach. I've filed a bug for this issue:
https://bugs.openjdk.java.net/browse/JDK-8233941

From bsrbnd at gmail.com Mon Nov 11 20:24:24 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Mon, 11 Nov 2019 21:24:24 +0100
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits
In-Reply-To:
References: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com>
Message-ID:

I pushed it to jdk/submit and all seems to be OK:

http://hg.openjdk.java.net/jdk/submit/rev/cbe81ae81095

Should we run more tests?

Bernard

On Mon, 11 Nov 2019 at 21:16, Vladimir Kozlov wrote:
>
> On 11/11/19 12:14 PM, Vladimir Kozlov wrote:
> > Very good. What testing you did?
>
> I mean in addition to tier1.
>
> Vladimir
>
> >
> > Thanks
> > Vladimir
> >
> >> On Nov 11, 2019, at 9:23 AM, B. Blaser wrote:
> >>
> >> Hi,
> >>
> >> Please review the following fix for [1] which has been extensively
> >> discussed in [2]:
> >>
> >> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
> >>
> >> It includes:
> >> * Vladimir's requested changes about instruction encoding and testing
> >> along with:
> >> * John's suggested complementary benchmark following Sandhya's note
> >> regarding throughput.
> >> > >> Thanks (hotspot:tier1 is OK on Linux/x86_64), > >> Bernard > >> > >> [1] https://bugs.openjdk.java.net/browse/JDK-8214239 > >> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html > > From fweimer at redhat.com Mon Nov 11 20:29:01 2019 From: fweimer at redhat.com (Florian Weimer) Date: Mon, 11 Nov 2019 21:29:01 +0100 Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator= (was: Re: adlc-generated operator= for Pipeline_Use_Cycle_Mask) In-Reply-To: <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> (Kim Barrett's message of "Mon, 11 Nov 2019 15:20:24 -0500") References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> Message-ID: <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com> * Kim Barrett: > I agree with that analysis and the suggested solution approach. I?ve > filed a bug for this issue: > https://bugs.openjdk.java.net/browse/JDK-8233941 Thanks, here's a webrev: I have put this through some testing on x86-64 (although that exercises only one branch). Thanks, Florian From vladimir.kozlov at oracle.com Mon Nov 11 20:31:37 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Nov 2019 12:31:37 -0800 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com> Message-ID: It is fine. I will submit our internal testing (more tiers) and let you know results. Vladimir On 11/11/19 12:24 PM, B. Blaser wrote: > I pushed it to jdk/submit and all seems to be OK: > > http://hg.openjdk.java.net/jdk/submit/rev/cbe81ae81095 > > Should we run more tests? > > Bernard > > On Mon, 11 Nov 2019 at 21:16, Vladimir Kozlov > wrote: >> >> On 11/11/19 12:14 PM, Vladimir Kozlov wrote: >>> Very good. What testing you did? >> >> I mean in addition to tier1. >> >> Vladimir >> >>> >>> Thanks >>> Vladimir >>> >>>> On Nov 11, 2019, at 9:23 AM, B. 
Blaser wrote:
>>>>
>>>> Hi,
>>>>
>>>> Please review the following fix for [1] which has been extensively
>>>> discussed in [2]:
>>>>
>>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>>>>
>>>> It includes:
>>>> * Vladimir's requested changes about instruction encoding and testing
>>>> along with:
>>>> * John's suggested complementary benchmark following Sandhya's note
>>>> regarding throughput.
>>>>
>>>> Thanks (hotspot:tier1 is OK on Linux/x86_64),
>>>> Bernard
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
>>>> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
>>>

From vladimir.kozlov at oracle.com Mon Nov 11 20:37:18 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 12:37:18 -0800
Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com>
Message-ID: <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com>

Patch is empty.

Vladimir

On 11/11/19 12:29 PM, Florian Weimer wrote:
> * Kim Barrett:
>
>> I agree with that analysis and the suggested solution approach. I've
>> filed a bug for this issue:
>
>> https://bugs.openjdk.java.net/browse/JDK-8233941
>
> Thanks, here's a webrev:
>
>
>
> I have put this through some testing on x86-64 (although that exercises
> only one branch).
> > Thanks, > Florian > From fweimer at redhat.com Mon Nov 11 21:36:18 2019 From: fweimer at redhat.com (Florian Weimer) Date: Mon, 11 Nov 2019 22:36:18 +0100 Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator= In-Reply-To: <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> (Vladimir Kozlov's message of "Mon, 11 Nov 2019 12:37:18 -0800") References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com> <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> Message-ID: <87k186rt7x.fsf@oldenburg2.str.redhat.com> * Vladimir Kozlov: > Patch is empty. Ugh. Is it better now? Thanks, Florian > > Vladimir > > On 11/11/19 12:29 PM, Florian Weimer wrote: >> * Kim Barrett: >> >>> I agree with that analysis and the suggested solution approach. I?ve >>> filed a bug for this issue: >> >>> https://bugs.openjdk.java.net/browse/JDK-8233941 >> >> Thanks, here's a webrev: >> >> >> >> I have put this through some testing on x86-64 (although that exercises >> only one branch). >> >> Thanks, >> Florian >> From doug.simon at oracle.com Mon Nov 11 21:44:37 2019 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 11 Nov 2019 22:44:37 +0100 Subject: RFR(S) : 8233900: [JVMCI] improve help text for EnableJVMCIProduct option In-Reply-To: <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com> References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com> Message-ID: <23E9D7B9-F573-4BD7-BF7A-9434AECCE30C@oracle.com> Thanks for the review Vladimir. -Doug > On 11 Nov 2019, at 17:59, Vladimir Kozlov wrote: > > Looks good. Do you need this be backported to 11u? Add affected versions and fix version to rfe. > > Thanks > Vladimir > >> On Nov 11, 2019, at 2:54 AM, Doug Simon wrote: >> >> Hi, >> >> Please review this change to improve the help text for EnableJVMCIProduct and related options. 
>> >> https://dougxc.github.io/webrevs/8233900 >> https://bugs.openjdk.java.net/browse/JDK-8233900 >> >> -Doug > From vladimir.kozlov at oracle.com Mon Nov 11 23:35:45 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Nov 2019 15:35:45 -0800 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com> Message-ID: <6ebb5cee-04ca-4cb8-5c6a-6dc29f52d73a@oracle.com> Bernard, Testing passed clean. After second review it can be pushed. Thanks, Vladimir On 11/11/19 12:31 PM, Vladimir Kozlov wrote: > It is fine. I will submit our internal testing (more tiers) and let you know results. > > Vladimir > > On 11/11/19 12:24 PM, B. Blaser wrote: >> I pushed it to jdk/submit and all seems to be OK: >> >> http://hg.openjdk.java.net/jdk/submit/rev/cbe81ae81095 >> >> Should we run more tests? >> >> Bernard >> >> On Mon, 11 Nov 2019 at 21:16, Vladimir Kozlov >> wrote: >>> >>> On 11/11/19 12:14 PM, Vladimir Kozlov wrote: >>>> Very good. What testing you did? >>> >>> I mean in addition to tier1. >>> >>> Vladimir >>> >>>> >>>> Thanks >>>> Vladimir >>>> >>>>> On Nov 11, 2019, at 9:23 AM, B. Blaser wrote: >>>>> >>>>> Hi, >>>>> >>>>> Please review the following fix for [1] which has been extensively >>>>> discussed in [2]: >>>>> >>>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/ >>>>> >>>>> It includes: >>>>> * Vladimir's requested changes about instruction encoding and testing >>>>> along with: >>>>> * John's suggested complementary benchmark following Sandhya's note >>>>> regarding throughput. 
>>>>>
>>>>> Thanks (hotspot:tier1 is OK on Linux/x86_64),
>>>>> Bernard
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
>>>>> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
>>>>

From vladimir.kozlov at oracle.com Mon Nov 11 23:37:10 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Nov 2019 15:37:10 -0800
Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <87k186rt7x.fsf@oldenburg2.str.redhat.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com> <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> <87k186rt7x.fsf@oldenburg2.str.redhat.com>
Message-ID: <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>

Yes, it looks good.

Someone have to test it to make sure it builds and runs on all supported platforms (and compilers).

Thanks,
vladimir

On 11/11/19 1:36 PM, Florian Weimer wrote:
> * Vladimir Kozlov:
>
>> Patch is empty.
>
> Ugh. Is it better now?
>
> Thanks,
> Florian
>
>>
>> Vladimir
>>
>> On 11/11/19 12:29 PM, Florian Weimer wrote:
>>> * Kim Barrett:
>>>
>>>> I agree with that analysis and the suggested solution approach. I've
>>>> filed a bug for this issue:
>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8233941
>>>
>>> Thanks, here's a webrev:
>>>
>>>
>>>
>>> I have put this through some testing on x86-64 (although that exercises
>>> only one branch).
>>> >>> Thanks, >>> Florian >>> > From gromero at linux.vnet.ibm.com Tue Nov 12 02:54:17 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 11 Nov 2019 23:54:17 -0300 Subject: [8u] RFR for backport of 8216060 (CRC32 3/4): [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays In-Reply-To: <675a6c68-ca08-27ba-d9cb-8fa02efc5102@linux.vnet.ibm.com> References: <675a6c68-ca08-27ba-d9cb-8fa02efc5102@linux.vnet.ibm.com> Message-ID: <92c29733-7916-3069-b913-4c51fc5424df@linux.vnet.ibm.com> Hi Martin, Please find v3 for CRC32 3/4 accordingly to your last review in: http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v3_/8216060/ I kept the at the CRC32C defines. Thank you & best regards, Gustavo From gromero at linux.vnet.ibm.com Tue Nov 12 02:58:48 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 11 Nov 2019 23:58:48 -0300 Subject: [8u] RFR for backport of 8217459 (CRC32 4/4): [PPC64] Cleanup non-vector version of CRC32 In-Reply-To: <06bc93e6-59b4-95ec-7a27-ef789ac51564@linux.vnet.ibm.com> References: <06bc93e6-59b4-95ec-7a27-ef789ac51564@linux.vnet.ibm.com> Message-ID: Hi Martin, Change 8206173 is now backported to jdk8u-dev: http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/9148fcba5de9 So could you please review v3 CRC32 4/4 accordingly to your last review? The change now is PPC64-only since shared code part was pushed with 8206173. 
Please find v3 in:
http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v3_/8217459/

Thank you & best regards,
Gustavo

From tobias.hartmann at oracle.com Tue Nov 12 08:05:56 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 09:05:56 +0100
Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com> <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> <87k186rt7x.fsf@oldenburg2.str.redhat.com> <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com>
Message-ID: <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>

Looks good to me too. I'll run some testing and sponsor if everything passes.

Best regards,
Tobias

On 12.11.19 00:37, Vladimir Kozlov wrote:
> Yes, it looks good.
>
> Someone have to test it to make sure it builds and runs on all supported platforms (and compilers).
>
> Thanks,
> vladimir
>
> On 11/11/19 1:36 PM, Florian Weimer wrote:
>> * Vladimir Kozlov:
>>
>>> Patch is empty.
>>
>> Ugh.  Is it better now?
>>
>> Thanks,
>> Florian
>>
>>>
>>> Vladimir
>>>
>>> On 11/11/19 12:29 PM, Florian Weimer wrote:
>>>> * Kim Barrett:
>>>>
>>>>> I agree with that analysis and the suggested solution approach.  I've
>>>>> filed a bug for this issue:
>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8233941
>>>>
>>>> Thanks, here's a webrev:
>>>>
>>>>
>>>>
>>>> I have put this through some testing on x86-64 (although that exercises
>>>> only one branch).
>>>> >>>> Thanks, >>>> Florian >>>> >> From tobias.hartmann at oracle.com Tue Nov 12 08:30:28 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Nov 2019 09:30:28 +0100 Subject: RFR(S): Test crashed with assert(phi->operand_count() != 1 || phi->subst() != phi) failed: missed trivial simplification In-Reply-To: References: Message-ID: <9f3b6da1-9d86-5850-14b7-ea53d99e54ab@oracle.com> Hi Martin, looks reasonable to me. Best regards, Tobias On 11.11.19 11:39, Doerr, Martin wrote: > Hi, > > some C1 assertions currently don't deal correctly with illegal phi functions (phi function with illegal type due to type conflict). > Since JDK-8214352 we bail out in fewer situations, so C1 needs to deal with a few more illegal phi cases. > > The assertion (see headline) in PhiSimplifier doesn't support illegal phi, but the simplify call before it does (by just skipping). > > Another assertion which needs to skip illegal phi is in c1_Optimizer where we merge a block with its unique successor. > If a local value of the succeeding block is coming from an illegal phi function, we drop it and take the value from the first block. > This is correct, only the assertion doesn't expect that at the moment. > > So I'd like to fix these 2 assertions I found: > http://cr.openjdk.java.net/~mdoerr/8233820_C1_illegal_phi/webrev.00/ > > Best regards, > Martin > From tobias.hartmann at oracle.com Tue Nov 12 08:36:59 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Nov 2019 09:36:59 +0100 Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler In-Reply-To: References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn> <016d980a-1197-be1e-6078-23290e9ae365@oracle.com> Message-ID: <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com> Hi Jie, seems reasonable to me but Igor should have a look as well. 
Thanks,
Tobias

On 11.11.19 14:28, Jie Fu wrote:
> Hi Tobias,
>
> Thank you for your review and valuable comments.
> Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/
>
> Thanks a lot.
> Best regards,
> Jie
>
> On 2019/11/11 3:44 PM, Tobias Hartmann wrote:
>> Hi Jie,
>>
>> what about the high-only-quick-internal mode?
> It's really a nice catch.
> Fixed. Thanks.
>
>
>> While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
>> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
>> Maybe you can fix that as well with your patch.
> Done.
>

From vladimir.x.ivanov at oracle.com Tue Nov 12 09:00:21 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 12:00:21 +0300
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits
In-Reply-To:
References:
Message-ID: <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com>

Hi Bernard,

Nice improvement!

> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/

I don't see cases for non-constant masks John suggested covered. Have you tried to implement them? Any problems encountered or did you just leave them for future improvement?

====================================================================
I'd like to see asserts in new MacroAssembler methods that imm8 fits into a byte, like:

+void Assembler::btsq(Address dst, int imm8) {
+  assert(isByte(imm8), "not a byte");
+  InstructionMark im(this);

====================================================================
src/hotspot/cpu/x86/x86_64.ad:

+operand immPow2L()
+%{
+  // n should be a pure 64-bit power of 2 immediate.
+  predicate(is_power_of_2_long(n->get_long()) && log2_long(n->get_long()) > 31);

+operand immPow2NotL()
+%{
+  // n should be a pure 64-bit immediate given that not(n) is a power of 2.

Why do you limit the optimization to bits in upper half? Is it because ordinary andq/orq instructions work well for the rest? If that's the case, it deserves a comment.
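
For readers following the question above, here is a minimal sketch of the arithmetic that may sit behind the `> 31` cut-off. It is not taken from the webrev. It only assumes the documented x86-64 encodings, where andq/orq accept a 32-bit immediate that is sign-extended to 64 bits, while btsq/btrq take an 8-bit bit index. The helper names (`fits_simm32`, `bit_mask`, `or_imm32_can_set`, `and_imm32_can_clear`) are made up for this illustration, and whether this is the actual reason for the predicate is for the patch author to confirm.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only: these helpers are hypothetical, not HotSpot code.

// True if v can be encoded as a 32-bit immediate that is sign-extended to 64 bits.
constexpr bool fits_simm32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// 64-bit mask with only bit k set (0 <= k <= 63).
constexpr uint64_t bit_mask(unsigned k) {
  return uint64_t{1} << k;
}

// Setting bit k with an OR-immediate requires the mask itself to fit.
constexpr bool or_imm32_can_set(unsigned k) {
  return fits_simm32(static_cast<int64_t>(bit_mask(k)));
}

// Clearing bit k with an AND-immediate requires the complemented mask to fit.
constexpr bool and_imm32_can_clear(unsigned k) {
  return fits_simm32(static_cast<int64_t>(~bit_mask(k)));
}

static_assert(or_imm32_can_set(30) && and_imm32_can_clear(30), "bit 30 fits");
static_assert(!or_imm32_can_set(31) && !and_imm32_can_clear(31), "bit 31 does not fit");
static_assert(!or_imm32_can_set(63) && !and_imm32_can_clear(63), "bit 63 does not fit");
```

Under these assumptions, single bits 0..30 can be set or cleared with an immediate orq/andq, while bits 31..63 cannot be expressed as a sign-extended 32-bit immediate, which is where a bit-index instruction would help.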
(immPow2NotL is a bit misleading: I read it as "power of 2, but not a long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to encode that it's > 2^32, but I would just skip it for now.)

====================================================================
+instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr) %{
+  match(Set dst (StoreL dst (AndL (LoadL dst) src)));

+instruct btsL_mem_imm(memory dst, immPow2L src, rFlagsReg cr) %{
+  match(Set dst (StoreL dst (OrL (LoadL dst) src)));

Does it make sense to cover 32-/16-bit cases the same way? (Relates to the earlier question on bits from upper half.)

Do you leave out in-register updates because they don't give any benefits compared to andq/orq?

Also, please, use "con" instead of "src". It's easier to read when the name reflects that the value is a constant.

====================================================================
test/hotspot/jtreg/compiler/c2/TestBitSetAndReset.java:

As I understand, the only test case which exercises new code is:

  64     private static void test63() {
  65         andq &= ~MASK63;
  66         orq |= MASK63;
  67     }

Please, at least, add a case for MASK32.

  29  * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions
  30  *      -XX:-TieredCompilation -XX:CompileThresholdScaling=0.1 -XX:-Inline
  31  *      -XX:CompileCommand=print,compiler/c2/TestBitSetAndReset.test*
  32  *      -XX:CompileCommand=compileonly,compiler/c2/TestBitSetAndReset.test*
  33  *      compiler.c2.TestBitSetAndReset

Since you explicitly disable tiered, you can just directly set the threshold instead (-XX:CompileThreshold=1000).

-XX:-Inline is redundant and you can replace compileonly directive with dontinline to speed up the test.

Best regards,
Vladimir Ivanov

>
> It includes:
> * Vladimir's requested changes about instruction encoding and testing
> along with:
> * John's suggested complementary benchmark following Sandhya's note
> regarding throughput.
>
> Thanks (hotspot:tier1 is OK on Linux/x86_64),
> Bernard
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8214239
> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
>

From vladimir.x.ivanov at oracle.com Tue Nov 12 09:02:27 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 12:02:27 +0300
Subject: RFR(S): Test crashed with assert(phi->operand_count() != 1 || phi->subst() != phi) failed: missed trivial simplification
In-Reply-To:
References:
Message-ID:

Hi Martin,

> http://cr.openjdk.java.net/~mdoerr/8233820_C1_illegal_phi/webrev.00/

Looks good.

Best regards,
Vladimir Ivanov

From igor.veresov at oracle.com Tue Nov 12 09:11:44 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 12 Nov 2019 01:11:44 -0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler
In-Reply-To: <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn> <016d980a-1197-be1e-6078-23290e9ae365@oracle.com> <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com>
Message-ID:

Looks good to me.

igor

> On Nov 12, 2019, at 12:36 AM, Tobias Hartmann wrote:
>
> Hi Jie,
>
> seems reasonable to me but Igor should have a look as well.
>
> Thanks,
> Tobias
>
> On 11.11.19 14:28, Jie Fu wrote:
>> Hi Tobias,
>>
>> Thank you for your review and valuable comments.
>> Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> On 2019/11/11 3:44 PM, Tobias Hartmann wrote:
>>> Hi Jie,
>>>
>>> what about the high-only-quick-internal mode?
>> It's really a nice catch.
>> Fixed. Thanks.
>>
>>
>>> While looking at the fix for 8227003, I spotted a little typo here ("mininum"):
>>> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8
>>> Maybe you can fix that as well with your patch.
>> Done.
>>

From tobias.hartmann at oracle.com Tue Nov 12 09:14:05 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 10:14:05 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes
In-Reply-To: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com>
Message-ID: <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com>

Hi Vladimir,

On 11.11.19 13:06, Vladimir Ivanov wrote:
> Yes, I'm curious whether it's enough to check that in(0) is NULL, TOP, or Proj(_, TOP). Are there
> any other important cases left?

I don't think in(0) == NULL can/should happen and I don't think there are any other cases left.

> My concerns is that other usages of PhaseGVN::is_dominator() may be affected the same way as well.

The method currently has two implementations:
- PhaseIdealLoop::is_dominator
- PhaseGVN/PhaseIterGVN:is_dominator -> PhaseGVN::is_dominator_helper

Both assert is_CFG() for the arguments so it's the callers responsibility to ensure that.

> It looks like PhaseGVN::is_dominator_helper() would benefit from additional checks:
>
> bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
>   if (d->is_top() || n->is_top()) {
>     return false;
>   }
>   assert(d->is_CFG() && n->is_CFG(), "must have CFG nodes");
>   ...

Do you mean simply converting the assert to a check or adding additional asserts?

> As an example, InitializeNode::detect_init_independence() does some control normalization first:

Yes but that method processes data and control edges.
Thanks, Tobias From fujie at loongson.cn Tue Nov 12 09:29:57 2019 From: fujie at loongson.cn (Jie Fu) Date: Tue, 12 Nov 2019 17:29:57 +0800 Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler In-Reply-To: References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn> <016d980a-1197-be1e-6078-23290e9ae365@oracle.com> <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com> Message-ID: <86d80738-aef5-3d6d-fe40-f226575bbcb9@loongson.cn> Thanks Igor for your review. Hope you can sponsor it. Thanks a lot. Best regards, Jie On 2019/11/12 ??5:11, Igor Veresov wrote: > Looks good to me. > > igor > > > >> On Nov 12, 2019, at 12:36 AM, Tobias Hartmann >> > wrote: >> >> Hi Jie, >> >> seems reasonable to me but Igor should have a look as well. >> >> Thanks, >> Tobias >> >> On 11.11.19 14:28, Jie Fu wrote: >>> Hi Tobias, >>> >>> Thank you for your review and valuable comments. >>> Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/ >>> >>> Thanks a lot. >>> Best regards, >>> Jie >>> >>> On 2019/11/11 ??3:44, Tobias Hartmann wrote: >>>> Hi Jie, >>>> >>>> what about the high-only-quick-internal mode? >>> It's really a nice catch. >>> Fixed. Thanks. >>> >>> >>>> While looking at the fix for 8227003, I spotted a little typo here >>>> ("mininum"): >>>> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8 >>>> Maybe you can fix that as well with your patch. >>> Done. 
>>> > From vladimir.x.ivanov at oracle.com Tue Nov 12 09:39:31 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 12 Nov 2019 12:39:31 +0300 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> Message-ID: <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> Hi Tobias, >> My concerns is that other usages of PhaseGVN::is_dominator() may be affected the same way as well. > > The method currently has two implementations: > - PhaseIdealLoop::is_dominator > - PhaseGVN/PhaseIterGVN:is_dominator -> PhaseGVN::is_dominator_helper > > Both assert is_CFG() for the arguments so it's the callers responsibility to ensure that. > >> It looks like PhaseGVN::is_dominator_helper() would benefit from additional checks: >> >> bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) { >> ? if (d->is_top() || n->is_top()) { >> ??? return false; >> ? } >> ? assert(d->is_CFG() && n->is_CFG(), "must have CFG nodes"); >> ? ... > > Do you mean simply converting the assert to a check or adding additional asserts? (Assuming in(0) == NULL or TOP or Proj(_, TOP) are the only cases we need to care about.) Currently, ConstraintCastNode::dominating_cast() does NULL check, PhaseGVN::is_dominator_helper() does TOP check, but Proj(TOP) is left uncovered and it reaches d->is_CFG() check. Your fix covers both TOP and Proj(_, TOP) on ConstraintCastNode::dominating_cast() by doing is_CFG() check. What I'm in favor of is to handle Proj(TOP) case explicitly and there are other places in the code base which do that. (It may sound too subtle, but it doesn't look right when the code performs in(0)->is_CFG() check outside of an assert.) I mentioned InitializeNode::detect_init_independence() as an example how control info processing can be done [1]. 
It covers NULL, TOP, and Proj(TOP) cases, but without is_CFG() check.

Considering current shape of ConstraintCastNode::dominating_cast() and that
PhaseGVN::is_dominator_helper() already assumes non-NULL inputs, putting control normalization
before TOP checks should solve the problem as well:

  bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
  + if (d->is_Proj())  d = d->in(0);
  + if (n->is_Proj())  n = n->in(0);
    if (d->is_top() || n->is_top()) {
      return false;
    }

Best regards,
Vladimir Ivanov

[1]
  Node* ctl = n->in(0);
  if (ctl != NULL && !ctl->is_top()) {
    if (ctl->is_Proj())  ctl = ctl->in(0);
    ...
    if (!MemNode::all_controls_dominate(n, this))

bool MemNode::all_controls_dominate(Node* dom, Node* sub) {
  if (dom == NULL || dom->is_top() || sub == NULL || sub->is_top())
    return false; // Conservative answer for dead code

>> As an example, InitializeNode::detect_init_independence() does some control normalization first:
>
> Yes but that method processes data and control edges.
>
> Thanks,
> Tobias
>

From tobias.hartmann at oracle.com Tue Nov 12 09:46:22 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 10:46:22 +0100
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes
In-Reply-To: <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com>
Message-ID: <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>

On 12.11.19 10:39, Vladimir Ivanov wrote:
> (Assuming in(0) == NULL or TOP or Proj(_, TOP) are the only cases we need to care about.)
>
> Currently, ConstraintCastNode::dominating_cast() does NULL check, PhaseGVN::is_dominator_helper()
> does TOP check, but Proj(TOP) is left uncovered and it reaches d->is_CFG() check.
>
> Your fix covers both TOP and Proj(_, TOP) on ConstraintCastNode::dominating_cast() by doing is_CFG()
> check.
>
> What I'm in favor of is to handle Proj(TOP) case explicitly and there are other places in the code
> base which do that. (It may sound too subtle, but it doesn't look right when the code performs
> in(0)->is_CFG() check outside of an assert.)
>
> I mentioned InitializeNode::detect_init_independence() as an example how control info processing can
> be done [1]. It covers NULL, TOP, and Proj(TOP) cases, but without is_CFG() check.
>
> Considering current shape of ConstraintCastNode::dominating_cast() and that
> PhaseGVN::is_dominator_helper() already assumes non-NULL inputs, putting control normalization
> before TOP checks should solve the problem as well:
>
>   bool PhaseGVN::is_dominator_helper(Node *d, Node *n, bool linear_only) {
>   + if (d->is_Proj())  d = d->in(0);
>   + if (n->is_Proj())  n = n->in(0);
>     if (d->is_top() || n->is_top()) {
>       return false;
>     }

Okay, let's go with that version then:
http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/

I've verified that it still fixes the problem.

Thanks,
Tobias

From vladimir.x.ivanov at oracle.com Tue Nov 12 09:52:08 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Nov 2019 12:52:08 +0300
Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes
In-Reply-To: <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com>
Message-ID:

> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/

Looks good.
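
For readers following the thread, the effect of the projection normalization shown above can be tried on a toy control-flow graph. This is only a sketch in plain Java, not HotSpot code: every class and method name is invented for illustration, and dominance is approximated by walking a single control input per node. It compares the answer with and without stepping from a projection to its parent, which is the difference Tobias points out later in the thread.

```java
// Toy model only; names are hypothetical and unrelated to the real C2 classes.
final class ProjNormalizationDemo {
    enum Kind { START, IF, IF_TRUE, IF_FALSE, BLOCK }

    static final class Node {
        final String name;
        final Kind kind;
        final Node control; // single control input, playing the role of in(0)

        Node(String name, Kind kind, Node control) {
            this.name = name;
            this.kind = kind;
            this.control = control;
        }

        boolean isProj() {
            return kind == Kind.IF_TRUE || kind == Kind.IF_FALSE;
        }
    }

    /** Linear walk: d dominates n if d is reached by following control inputs up from n. */
    static boolean dominates(Node d, Node n) {
        for (Node cur = n; cur != null; cur = cur.control) {
            if (cur == d) {
                return true;
            }
        }
        return false;
    }

    /** Same walk, but projections are first replaced by their parent node. */
    static boolean dominatesAfterNormalization(Node d, Node n) {
        if (d.isProj()) d = d.control;
        if (n.isProj()) n = n.control;
        return dominates(d, n);
    }

    public static void main(String[] args) {
        Node start = new Node("Start", Kind.START, null);
        Node iff = new Node("If", Kind.IF, start);
        Node ifTrue = new Node("IfTrue", Kind.IF_TRUE, iff);
        Node ifFalse = new Node("IfFalse", Kind.IF_FALSE, iff);
        Node onFalsePath = new Node("Block", Kind.BLOCK, ifFalse);

        // IfTrue is not on the control path of a node below IfFalse.
        System.out.println("plain:      " + dominates(ifTrue, onFalsePath));                   // false
        // After normalization d becomes the If node, which is on that path.
        System.out.println("normalized: " + dominatesAfterNormalization(ifTrue, onFalsePath)); // true
    }
}
```

In this model the two answers differ only when the dominator candidate is a projection and the other node sits on the sibling branch.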
Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Tue Nov 12 09:52:27 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Nov 2019 10:52:27 +0100 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com> Message-ID: <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com> Thanks Vladimir! Best regards, Tobias On 12.11.19 10:52, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/ > > Looks good. > > Best regards, > Vladimir Ivanov From tobias.hartmann at oracle.com Tue Nov 12 10:49:53 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Nov 2019 11:49:53 +0100 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com> References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com> <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com> Message-ID: Okay, this is actually not correct: If d is an IfTrue projection, we would change d to the corresponding If node which could be a dominator of n while the IfTrue projection is not. Best regards, Tobias On 12.11.19 10:52, Tobias Hartmann wrote: > Thanks Vladimir! > > Best regards, > Tobias > > On 12.11.19 10:52, Vladimir Ivanov wrote: >> >>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/ >> >> Looks good. 
>> >> Best regards, >> Vladimir Ivanov From tobias.hartmann at oracle.com Tue Nov 12 10:58:50 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Nov 2019 11:58:50 +0100 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com> <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com> Message-ID: <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com> Here's a new version: http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/ Thanks, Tobias On 12.11.19 11:49, Tobias Hartmann wrote: > Okay, this is actually not correct: > If d is an IfTrue projection, we would change d to the corresponding If node which could be a > dominator of n while the IfTrue projection is not. > > Best regards, > Tobias > > On 12.11.19 10:52, Tobias Hartmann wrote: >> Thanks Vladimir! >> >> Best regards, >> Tobias >> >> On 12.11.19 10:52, Vladimir Ivanov wrote: >>> >>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/ >>> >>> Looks good. 
>>> >>> Best regards, >>> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Tue Nov 12 11:00:31 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 12 Nov 2019 14:00:31 +0300 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com> References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com> <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com> <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com> Message-ID: <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com> > http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/ Looks good. (And sorry for the misleading suggestion.) Best regards, Vladimir Ivanov > On 12.11.19 11:49, Tobias Hartmann wrote: >> Okay, this is actually not correct: >> If d is an IfTrue projection, we would change d to the corresponding If node which could be a >> dominator of n while the IfTrue projection is not. >> >> Best regards, >> Tobias >> >> On 12.11.19 10:52, Tobias Hartmann wrote: >>> Thanks Vladimir! >>> >>> Best regards, >>> Tobias >>> >>> On 12.11.19 10:52, Vladimir Ivanov wrote: >>>> >>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/ >>>> >>>> Looks good. 
>>>> >>>> Best regards, >>>> Vladimir Ivanov From tobias.hartmann at oracle.com Tue Nov 12 11:07:40 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Nov 2019 12:07:40 +0100 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com> References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com> <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com> <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com> <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com> Message-ID: <2bde2c43-1230-3d7b-ebd5-7befd340a801@oracle.com> Thanks again, Vladimir. Best regards, Tobias On 12.11.19 12:00, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/ > > Looks good. (And sorry for the misleading suggestion.) > > Best regards, > Vladimir Ivanov > >> On 12.11.19 11:49, Tobias Hartmann wrote: >>> Okay, this is actually not correct: >>> If d is an IfTrue projection, we would change d to the corresponding If node which could be a >>> dominator of n while the IfTrue projection is not. >>> >>> Best regards, >>> Tobias >>> >>> On 12.11.19 10:52, Tobias Hartmann wrote: >>>> Thanks Vladimir! >>>> >>>> Best regards, >>>> Tobias >>>> >>>> On 12.11.19 10:52, Vladimir Ivanov wrote: >>>>> >>>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/ >>>>> >>>>> Looks good. 
>>>>> >>>>> Best regards, >>>>> Vladimir Ivanov From tobias.hartmann at oracle.com Tue Nov 12 11:13:35 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Nov 2019 12:13:35 +0100 Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler In-Reply-To: <86d80738-aef5-3d6d-fe40-f226575bbcb9@loongson.cn> References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn> <016d980a-1197-be1e-6078-23290e9ae365@oracle.com> <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com> <86d80738-aef5-3d6d-fe40-f226575bbcb9@loongson.cn> Message-ID: <6f97cdb5-ce66-ed35-83e4-37149d2740be@oracle.com> Hi Jie, I've pushed the fix. Best regards, Tobias On 12.11.19 10:29, Jie Fu wrote: > Thanks Igor for your review. > > Hope you can sponsor it. > > Thanks a lot. > Best regards, > Jie > > On 2019/11/12 ??5:11, Igor Veresov wrote: >> Looks good to me. >> >> igor >> >> >> >>> On Nov 12, 2019, at 12:36 AM, Tobias Hartmann >> > wrote: >>> >>> Hi Jie, >>> >>> seems reasonable to me but Igor should have a look as well. >>> >>> Thanks, >>> Tobias >>> >>> On 11.11.19 14:28, Jie Fu wrote: >>>> Hi Tobias, >>>> >>>> Thank you for your review and valuable comments. >>>> Updated: http://cr.openjdk.java.net/~jiefu/8233885/webrev.02/ >>>> >>>> Thanks a lot. >>>> Best regards, >>>> Jie >>>> >>>> On 2019/11/11 ??3:44, Tobias Hartmann wrote: >>>>> Hi Jie, >>>>> >>>>> what about the high-only-quick-internal mode? >>>> It's really a nice catch. >>>> Fixed. Thanks. >>>> >>>> >>>>> While looking at the fix for 8227003, I spotted a little typo here ("mininum"): >>>>> https://hg.openjdk.java.net/jdk/jdk/rev/b95bead30957#l6.8 >>>>> Maybe you can fix that as well with your patch. >>>> Done. 
>>>>
>>

From fujie at loongson.cn Tue Nov 12 11:18:43 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 12 Nov 2019 19:18:43 +0800
Subject: RFR: 8233885: Test fails with assert(comp != __null) failed: Ensure we have a compiler
In-Reply-To: <6f97cdb5-ce66-ed35-83e4-37149d2740be@oracle.com>
References: <9a4e6cd5-f5c2-877c-658f-47bd62c59d41@loongson.cn> <11304b85-3a28-017b-645e-e76fe4c2d0f8@loongson.cn> <016d980a-1197-be1e-6078-23290e9ae365@oracle.com> <9967d8f1-19d1-6ca1-5ad2-490c36801544@oracle.com> <86d80738-aef5-3d6d-fe40-f226575bbcb9@loongson.cn> <6f97cdb5-ce66-ed35-83e4-37149d2740be@oracle.com>
Message-ID: <19c26acb-e5be-bba4-1ae6-bf2f2137bcd9@loongson.cn>

Thank you so much, Tobias.

On 2019/11/12 7:13 PM, Tobias Hartmann wrote:
> I've pushed the fix.

From tobias.hartmann at oracle.com Tue Nov 12 11:20:03 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 12 Nov 2019 12:20:03 +0100
Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator=
In-Reply-To: <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>
References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com> <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> <87k186rt7x.fsf@oldenburg2.str.redhat.com> <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com> <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com>
Message-ID: <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com>

On 12.11.19 09:05, Tobias Hartmann wrote:
> Looks good to me too. I'll run some testing and sponsor if everything passes.

All tests passed. Pushed.

Best regards,
Tobias

From christoph.goettschkes at microdoc.com Tue Nov 12 12:07:47 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Tue, 12 Nov 2019 13:07:47 +0100
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs.
Message-ID: Hi, The test "compiler/codegen/TestCharVect2.java" uses the VM flag "MaxVectorSize" which is not defined for all supported VM configurations. Client VMs exit with an "Unrecognized VM option" error. Bug: https://bugs.openjdk.java.net/browse/JDK-8231954 Webrev: https://cr.openjdk.java.net/~bulasevich/8231954/webrev.00 Igor suggested to use the tag '@requires vm.flavor == "server"', but since other tests like [1] and [2] are also using my suggested approach, and I am uncertain how the flag MaxVectorSize plays together with the JVMCI, I would currently stick with my approach and would like to get more feedback on this topic. Also, the @requires tag would disable the test altogether, but the first run (without the MaxVectorSize option) works in client VMs and might be a viable test case, which is lost if the tag is added. Thanks, Christoph [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java From bsrbnd at gmail.com Tue Nov 12 12:49:31 2019 From: bsrbnd at gmail.com (B. Blaser) Date: Tue, 12 Nov 2019 13:49:31 +0100 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com> References: <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com> Message-ID: Hi Vladimir Ivanov, On Tue, 12 Nov 2019 at 10:00, Vladimir Ivanov wrote: > > Hi Bernard, > > Nice improvement! Thanks. > > http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/ > > I don't see cases for non-constant masks John suggested covered. Have > you tried to implement them? Any problems encountered or did you just > leave them for future improvement? I didn't experiment with non-constant masks yet, which is why I left them for future improvements (as told to John). 
> ====================================================================
>
> I'd like to see asserts in new MacroAssembler methods that imm8 fits
> into a byte, like:
>
> +void Assembler::btsq(Address dst, int imm8) {
> +  assert(isByte(imm8), "not a byte");
> +  InstructionMark im(this);

I'll do this.

> ====================================================================
>
> src/hotspot/cpu/x86/x86_64.ad:
>
> +operand immPow2L()
> +%{
> +  // n should be a pure 64-bit power of 2 immediate.
> +  predicate(is_power_of_2_long(n->get_long()) &&
> log2_long(n->get_long()) > 31);
>
> +operand immPow2NotL()
> +%{
> +  // n should be a pure 64-bit immediate given that not(n) is a power of 2.
>
> Why do you limit the optimization to bits in upper half? Is it because
> ordinary andq/orq instructions work well for the rest? If that's the
> case, it deserves a comment.

On a pure specification basis (Intel optimization manual that Sandhya
pointed me to), AND/OR and BTR/BTS have the same latency=1 but a
slightly better throughput for the former and when experimenting with
values <= 32-bit, I didn't observed much difference or quite
imperceptibly in favor of AND/OR. But with pure 64-bit values, the
benefit is much more evident because BTR/BTS replaces both a MOV and
an AND/OR which is simply better on specification basis (latency=1 for
BTR/BTS vs latency=1+1 for MOV + AND/OR). So, I'll update the comments
as next:

// n should be a pure 64-bit power of 2 immediate because AND/OR works
well enough for 8/32-bit values.
// n should be a pure 64-bit immediate given that not(n) is a power of
2 because AND/OR works well enough for 8/32-bit values.

> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
> encode that it's > 2^32, but I would just skip it for now.)

I agree with immL_NotPow2/immL_Pow2, for the encoding, see below.
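
The MOV + AND/OR versus BTR/BTS comparison above depends on whether a 64-bit mask can be encoded directly as an immediate. As a rough illustration only (plain Java, not HotSpot code; the class and method names are made up for this sketch), the checks below classify a few masks by whether they survive sign extension or zero extension from 32 bits, on the assumption that x86-64 AND/OR accept at most a sign-extended 32-bit immediate.

```java
// Hypothetical helper written for illustration; not part of the webrev.
final class Imm32Demo {
    /** True if v equals its low 32 bits sign-extended to 64 bits. */
    static boolean fitsSignExtendedImm32(long v) {
        return v == (long) (int) v;
    }

    /** True if v equals its low 32 bits zero-extended to 64 bits. */
    static boolean fitsZeroExtendedImm32(long v) {
        return (v >>> 32) == 0;
    }

    public static void main(String[] args) {
        long[] masks = {
            1L << 30, ~(1L << 30),
            1L << 31, ~(1L << 31),
            1L << 63, ~(1L << 63)
        };
        for (long m : masks) {
            System.out.printf("%016x  sign-extended imm32: %-5b  zero-extended imm32: %b%n",
                    m, fitsSignExtendedImm32(m), fitsZeroExtendedImm32(m));
        }
    }
}
```

Masks for which the first check fails are the ones that would otherwise need a separate MOV into a register before the AND/OR.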
> ====================================================================
>
> +instruct btrL_mem_imm(memory dst, immPow2NotL src, rFlagsReg cr) %{
> +  match(Set dst (StoreL dst (AndL (LoadL dst) src)));
>
> +instruct btsL_mem_imm(memory dst, immPow2L src, rFlagsReg cr) %{
> +  match(Set dst (StoreL dst (OrL (LoadL dst) src)));
>
> Does it make sense to cover 32-/16-bit cases the same way? (Relates to
> the earlier question on bits from upper half.)

Given my above answer, I don't think this would make sense (but I'll
do some more experiments as part of further enhancement issues as
discussed with John).

> Do you leave out in-register updates because they don't give any
> benefits compared to andq/orq?

Same answer, I'd have to do further experiments as part of
complementary issues. Note that this very one is focusing on the most
frequent cases which will likely to become even more common with final
fields trusted as constants.

> Also, please, use "con" instead of "src". It's easier to read when the
> name reflects that the value is a constant.

Yes, I'll do this.

> ====================================================================
>
> test/hotspot/jtreg/compiler/c2/TestBitSetAndReset.java:
>
> As I understand, the only test case which exercises new code is:
>
> 64 private static void test63() {
> 65     andq &= ~MASK63;
> 66     orq |= MASK63;
> 67 }
>
> Please, at least, add a case for MASK32.

The existing MASK31 exercises the new code because ffff_ffff_7fff_ffff
cannot be written as an AND/OR sign-extended 32-bit value:

03c btrq [RSI + #16 (8-bit)], log2(not(#-2147483649)) # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.andq

See 'immPow2NotL' predicate 'log2_long(~n->get_long()) > 30'.

Note that 8000_0000 is also a corner case as it can be written as a
MOV zero-extended 32-bit value which doesn't seem to be worse than
BTR/BTS on experiment basis:

044 movl R10, #2147483648 # long (unsigned 32-bit)
04a orq [RSI + #24 (8-bit)], R10 # long ! Field: org/openjdk/bench/vm/compiler/BitSetAndReset.orq

That's why I kept it, see 'immPow2L' predicate
'log2_long(n->get_long()) > 31'.

But I agree that a test case for MASK32 would be nice in this
situation and I'll add it.

> 29 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions
> -XX:+UnlockDiagnosticVMOptions
> 30 * -XX:-TieredCompilation
> -XX:CompileThresholdScaling=0.1 -XX:-Inline
> 31 *
> -XX:CompileCommand=print,compiler/c2/TestBitSetAndReset.test*
> 32 *
> -XX:CompileCommand=compileonly,compiler/c2/TestBitSetAndReset.test*
> 33 * compiler.c2.TestBitSetAndReset
>
> Since you explicitly disable tiered, you can just directly set the
> threshold instead (-XX:CompileThreshold=1000).
>
> -XX:-Inline is redundant and you can replace compileonly directive with
> dontinline to speed up the test.

Yes, thanks, I'll do this too.

Best regards,
Bernard

> Best regards,
> Vladimir Ivanov
>
>
> > It includes:
> > * Vladimir's requested changes about instruction encoding and testing
> > along with:
> > * John's suggested complementary benchmark following Sandhya's note
> > regarding throughput.
> >
> > Thanks (hotspot:tier1 is OK on Linux/x86_64),
> > Bernard
> >
> > [1] https://bugs.openjdk.java.net/browse/JDK-8214239
> > [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/035699.html
> >

From bsrbnd at gmail.com Tue Nov 12 13:05:33 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Tue, 12 Nov 2019 14:05:33 +0100
Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits
In-Reply-To: <6ebb5cee-04ca-4cb8-5c6a-6dc29f52d73a@oracle.com>
References: <07845286-F5FB-4A7C-925B-8C5488EA7A02@oracle.com> <6ebb5cee-04ca-4cb8-5c6a-6dc29f52d73a@oracle.com>
Message-ID:

Thanks for testing and reviewing.

I'll do Vladimir Ivanov's non-functional changes and I'll push it
again to jdk/submit to make sure all is right.
You're welcome to do additional internal testing if necessary, Bernard On Tue, 12 Nov 2019 at 00:35, Vladimir Kozlov wrote: > > Bernard, > > Testing passed clean. After second review it can be pushed. > > Thanks, > Vladimir > > On 11/11/19 12:31 PM, Vladimir Kozlov wrote: > > It is fine. I will submit our internal testing (more tiers) and let you know results. > > > > Vladimir > > > > On 11/11/19 12:24 PM, B. Blaser wrote: > >> I pushed it to jdk/submit and all seems to be OK: > >> > >> http://hg.openjdk.java.net/jdk/submit/rev/cbe81ae81095 > >> > >> Should we run more tests? > >> > >> Bernard From vladimir.x.ivanov at oracle.com Tue Nov 12 13:32:40 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 12 Nov 2019 16:32:40 +0300 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com> Message-ID: Thanks for the clarifications, Bernard. >>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/ >> >> I don't see cases for non-constant masks John suggested covered. Have >> you tried to implement them? Any problems encountered or did you just >> leave them for future improvement? > > I didn't experiment with non-constant masks yet, which is why I left > them for future improvements (as told to John). Sounds good. >> Why do you limit the optimization to bits in upper half? Is it because >> ordinary andq/orq instructions work well for the rest? If that's the >> case, it deserves a comment. > > On a pure specification basis (Intel optimization manual that Sandhya > pointed me to), AND/OR and BTR/BTS have the same latency=1 but a > slightly better throughput for the former and when experimenting with > values <= 32-bit, I didn't observed much difference or quite > imperceptibly in favor of AND/OR. 
But with pure 64-bit values, the
> benefit is much more evident because BTR/BTS replaces both a MOV and
> an AND/OR which is simply better on specification basis (latency=1 for
> BTR/BTS vs latency=1+1 for MOV + AND/OR). So, I'll update the comments
> as next:
>
> // n should be a pure 64-bit power of 2 immediate because AND/OR works
> well enough for 8/32-bit values.
> // n should be a pure 64-bit immediate given that not(n) is a power of
> 2 because AND/OR works well enough for 8/32-bit values.

Looks good.

>
>> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
>> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
>> encode that it's > 2^32, but I would just skip it for now.)
>
> I agree with immL_NotPow2/immL_Pow2, for the encoding, see below.

One idea to try: you can move "log2_long(n->get_long()) > ..." check
from operand declaration to the instruction.

operand immL_Pow2() %{
  // ...
  predicate(is_power_of_2_long(n->get_long()));
  ...

operand immL_NotPow2() %{
  // ...
  predicate(is_power_of_2_long(~n->get_long()));
  ...

instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{
  predicate(log2_long(~in(2)->in(2)->get_long()) > 30);
  match(Set dst (StoreL dst (AndL (LoadL dst) con)));
  ...

instruct btsL_mem_imm(memory dst, immPow2L con, rFlagsReg cr) %{
  predicate(log2_long(in(2)->in(2)->get_long()) > 31);
  match(Set dst (StoreL dst (OrL (LoadL dst) con)));
  ...

It looks more natural (but also it requires more code) to do such
operation-specific dispatching on instructions than on operands.
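
To make the proposed split concrete, here is a small model in plain Java of the two predicate levels: a generic power-of-two check standing in for the operand predicate, and an operation-specific bit-position check standing in for the instruction predicate. The thresholds 30 and 31 are taken from the predicates quoted above. The class and method names are invented for this sketch, and the helpers only approximate what is_power_of_2_long and log2_long are assumed to compute.

```java
// Illustrative model only; this is not ADL and not HotSpot code.
final class BitInsnPredicates {
    /** Operand-level check: exactly one bit set (bit 63 included). */
    static boolean isPowerOf2(long v) {
        return v != 0 && (v & (v - 1)) == 0;
    }

    /** Index of the highest set bit; only meaningful for non-zero input. */
    static int log2(long v) {
        return 63 - Long.numberOfLeadingZeros(v);
    }

    /** Instruction-level check for the OR (bit set) pattern. */
    static boolean matchesBts(long con) {
        return isPowerOf2(con) && log2(con) > 31;
    }

    /** Instruction-level check for the AND (bit clear) pattern. */
    static boolean matchesBtr(long con) {
        return isPowerOf2(~con) && log2(~con) > 30;
    }

    public static void main(String[] args) {
        for (int bit : new int[] { 30, 31, 32, 63 }) {
            long mask = 1L << bit;
            System.out.printf("bit %2d  orq |= mask -> bts: %-5b  andq &= ~mask -> btr: %b%n",
                    bit, matchesBts(mask), matchesBtr(~mask));
        }
    }
}
```

Keeping the power-of-two test separate from the position test mirrors the idea that the operand describes the constant while the instruction decides when the pattern is worthwhile.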
Best regards, Vladimir Ivanov From martin.doerr at sap.com Tue Nov 12 13:54:31 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 12 Nov 2019 13:54:31 +0000 Subject: [8u] RFR for backport of 8216060 (CRC32 3/4): [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays In-Reply-To: <92c29733-7916-3069-b913-4c51fc5424df@linux.vnet.ibm.com> References: <675a6c68-ca08-27ba-d9cb-8fa02efc5102@linux.vnet.ibm.com> <92c29733-7916-3069-b913-4c51fc5424df@linux.vnet.ibm.com> Message-ID: Hi Gustavo, looks good, now. Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Dienstag, 12. November 2019 03:54 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [8u] RFR for backport of 8216060 (CRC32 3/4): [PPC64] Vector > CRC implementation should be used by interpreter and be faster for short > arrays > > Hi Martin, > > Please find v3 for CRC32 3/4 accordingly to your last review in: > > http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v3_/8216060/ > > I kept the at the CRC32C defines. > > > Thank you & best regards, > Gustavo From martin.doerr at sap.com Tue Nov 12 13:58:10 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 12 Nov 2019 13:58:10 +0000 Subject: [8u] RFR for backport of 8217459 (CRC32 4/4): [PPC64] Cleanup non-vector version of CRC32 In-Reply-To: References: <06bc93e6-59b4-95ec-7a27-ef789ac51564@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for backporting JDK-8206173 separately. Looks good to me. Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Dienstag, 12. 
November 2019 03:59
> To: Doerr, Martin
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [8u] RFR for backport of 8217459 (CRC32 4/4): [PPC64] Cleanup
> non-vector version of CRC32
>
> Hi Martin,
>
> Change 8206173 is now backported to jdk8u-dev:
>
> http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/9148fcba5de9
>
> So could you please review v3 CRC32 4/4 accordingly to your last review? The
> change now is PPC64-only since shared code part was pushed with 8206173.
>
> Please find v3 in:
>
> http://cr.openjdk.java.net/~gromero/crc32_jdk8u/for-review/v3_/8217459/
>
> Thank you & best regards,
> Gustavo

From patric.hedlin at oracle.com Tue Nov 12 14:16:13 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 12 Nov 2019 15:16:13 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
Message-ID:

Dear all,

I would like to ask for help to review the following change/update:

Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/

8220376: C2: Int >0 not recognized as !=0 for div by 0 check

    Adding a simple subsumption test to IfNode::Ideal to enable a local
    short-circuit for (obviously) redundant if-nodes.

Testing: hs-tier1-4, hs-precheckin-comp

Best regards,
Patric

From bob.vandette at oracle.com Tue Nov 12 14:33:21 2019
From: bob.vandette at oracle.com (Bob Vandette)
Date: Tue, 12 Nov 2019 09:33:21 -0500
Subject: RFR(S) : 8233900: [JVMCI] improve help text for EnableJVMCIProduct option
In-Reply-To: <23E9D7B9-F573-4BD7-BF7A-9434AECCE30C@oracle.com>
References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com> <23E9D7B9-F573-4BD7-BF7A-9434AECCE30C@oracle.com>
Message-ID:

The changes look fine.

Bob.

> On Nov 11, 2019, at 4:44 PM, Doug Simon wrote:
>
> Thanks for the review Vladimir.
>
> -Doug
>
>> On 11 Nov 2019, at 17:59, Vladimir Kozlov wrote:
>>
>> Looks good. Do you need this be backported to 11u?
Add affected versions and fix version to rfe. >> >> Thanks >> Vladimir >> >>> On Nov 11, 2019, at 2:54 AM, Doug Simon wrote: >>> >>> Hi, >>> >>> Please review this change to improve the help text for EnableJVMCIProduct and related options. >>> >>> https://dougxc.github.io/webrevs/8233900 >>> https://bugs.openjdk.java.net/browse/JDK-8233900 >>> >>> -Doug >> > From doug.simon at oracle.com Tue Nov 12 14:33:59 2019 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 12 Nov 2019 15:33:59 +0100 Subject: RFR(S) : 8233900: [JVMCI] improve help text for EnableJVMCIProduct option In-Reply-To: References: <4693D193-D1FD-413E-B344-3EB066B66FC4@oracle.com> <3C974BFC-3CF7-4214-B4F3-EE630C32EBF3@oracle.com> <23E9D7B9-F573-4BD7-BF7A-9434AECCE30C@oracle.com> Message-ID: Thanks Bob. -Doug > On 12 Nov 2019, at 15:33, Bob Vandette wrote: > > The changes look fine. > > Bob. > >> On Nov 11, 2019, at 4:44 PM, Doug Simon wrote: >> >> Thanks for the review Vladimir. >> >> -Doug >> >>> On 11 Nov 2019, at 17:59, Vladimir Kozlov wrote: >>> >>> Looks good. Do you need this be backported to 11u? Add affected versions and fix version to rfe. >>> >>> Thanks >>> Vladimir >>> >>>> On Nov 11, 2019, at 2:54 AM, Doug Simon wrote: >>>> >>>> Hi, >>>> >>>> Please review this change to improve the help text for EnableJVMCIProduct and related options. >>>> >>>> https://dougxc.github.io/webrevs/8233900 >>>> https://bugs.openjdk.java.net/browse/JDK-8233900 >>>> >>>> -Doug >>> >> > From gromero at linux.vnet.ibm.com Tue Nov 12 15:54:12 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 12 Nov 2019 12:54:12 -0300 Subject: [8u] RFR for backport of 8217459 (CRC32 4/4): [PPC64] Cleanup non-vector version of CRC32 In-Reply-To: References: <06bc93e6-59b4-95ec-7a27-ef789ac51564@linux.vnet.ibm.com> Message-ID: Hi Martin, On 11/12/2019 10:58 AM, Doerr, Martin wrote: > Hi Gustavo, > > thanks for backporting JDK-8206173 separately. > Looks good to me. Thanks a lot for reviewing the whole patchset. 
I'll proceed to get the approval to push them, and once it's approved I'll push them all at once. Best regards, Gustavo From fweimer at redhat.com Tue Nov 12 16:02:45 2019 From: fweimer at redhat.com (Florian Weimer) Date: Tue, 12 Nov 2019 17:02:45 +0100 Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator= In-Reply-To: <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com> (Tobias Hartmann's message of "Tue, 12 Nov 2019 12:20:03 +0100") References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com> <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> <87k186rt7x.fsf@oldenburg2.str.redhat.com> <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com> <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com> <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com> Message-ID: <87zhh1nkuy.fsf@oldenburg2.str.redhat.com> * Tobias Hartmann: > On 12.11.19 09:05, Tobias Hartmann wrote: >> Looks good to me too. I'll run some testing and sponsor if everything passes. > > All tests passed. Pushed. Thanks! Florian From kim.barrett at oracle.com Tue Nov 12 17:31:51 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 12 Nov 2019 12:31:51 -0500 Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator= In-Reply-To: <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com> References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com> <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> <87k186rt7x.fsf@oldenburg2.str.redhat.com> <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com> <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com> <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com> Message-ID: > On Nov 12, 2019, at 6:20 AM, Tobias Hartmann wrote: > > > On 12.11.19 09:05, Tobias Hartmann wrote: >> Looks good to me too. I'll run some testing and sponsor if everything passes. > > All tests passed. 
Pushed. > > Best regards, > Tobias For the record, the change looked good to me too. Did the copyright year get updated in the changed file? From martin.doerr at sap.com Tue Nov 12 15:40:46 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 12 Nov 2019 15:40:46 +0000 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <09de4093-bcb2-7b96-b660-bd19abfe3e76@oracle.com> <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com> Message-ID: Hi Vladimir et al., thanks a lot for running these tests. I've seen that they have passed. I haven't got any more comments, so should I push the latest version? Are you ok with it? Best regards, Martin > -----Original Message----- > From: Vladimir Kozlov > Sent: Donnerstag, 7. November 2019 23:15 > To: David Holmes ; Doerr, Martin > ; Kim Barrett > Cc: dean.long at oracle.com; hotspot-compiler-dev at openjdk.java.net > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > > I resubmitted testing. > > Vladimir > > On 11/7/19 1:51 AM, David Holmes wrote: > > On 7/11/2019 7:08 pm, Doerr, Martin wrote: > >> Hi David, > >> > >> get_log only accesses the executing thread's own oop and the ones > before it. So it's ensured by > >> the algorithm that all accessed oops are in live handles. > > > > Okay I see that now. > > > > Thanks, > > David > > > >> The problem is in can_remove when not holding the lock. For that, > webrev.04 avoids accessing the > >> oop of the last compiler thread in the case in which the lock is not held. > >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: David Holmes > >>> Sent: Mittwoch, 6. 
November 2019 11:15 > >>> To: Doerr, Martin ; Kim Barrett > >>> > >>> Cc: dean.long at oracle.com; Vladimir Kozlov > (vladimir.kozlov at oracle.com) > >>> ; hotspot-compiler- > dev at openjdk.java.net > >>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread > >>> > >>> Hi Martin, > >>> > >>> On 6/11/2019 7:12 pm, Doerr, Martin wrote: > >>>> Hi Kim, > >>>> > >>>> thanks for confirming. > >>>> > >>>> > >>> > http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr > >>> ev.04/ > >>>> already avoids access to freed handles. > >>> > >>> Sorry I missed your earlier reference to this version. > >>> > >>> So the expectation here is that all accesses to these arrays are guarded > >>> by the CompileThread_lock, but that doesn't seem to hold for get_log ? > >>> > >>> Thanks, > >>> David > >>> ----- > >>> > >>>> I don't really like the complexity of this code. > >>>> Replacing oops in handles would have been much more simple. > >>>> But I can live with either version. > >>>> > >>>> Best regards, > >>>> Martin > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: Kim Barrett > >>>>> Sent: Mittwoch, 6. November 2019 04:09 > >>>>> To: Doerr, Martin > >>>>> Cc: David Holmes ; > dean.long at oracle.com; > >>>>> Vladimir Kozlov (vladimir.kozlov at oracle.com) > >>>>> ; hotspot-compiler- > dev at openjdk.java.net > >>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI > CompilerThread > >>>>> > >>>>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin > >>> wrote: > >>>>> > >>>>> Coming back in, because this seems to be going off into the weeds > again. > >>>>> > >>>>>>> I don't understand what you mean. If a compiler thread holds an > oop, > >>> any > >>>>>>> oop, it must hold it in a Handle to ensure it can't be gc'd. > >>>>>> > >>>>>> The problem is not related to gc. > >>>>>> My change introduces destroy_global for the handles. This means > that > >>> the > >>>>> OopStorage portion which has held the oop can get freed. 
> >>>>>> However, other compiler threads are running concurrently. They > may > >>>>> execute code which reads the oop from the handle which is freed by > this > >>>>> thread. > >>>>>> Reading stale data is not a problem here, but reading freed memory > may > >>>>> assert or even crash in general. > >>>>>> I can't see how OopStorage supports reading from handles which > were > >>>>> freed by destroy_global. > >>>>> > >>>>> So don't do that! > >>>>> > >>>>> OopStorage isn't magic. If you are going to look at an OopStorage > >>>>> handle, you have to ensure there won't be concurrent deletion. Use > >>>>> locks or some safe memory reclamation protocol. (GlobalCounter > might > >>>>> be used here, but it depends a lot on what the iterations are doing. A > >>>>> reference counting mechanism is another possibility.) This is no > >>>>> different from any other resource management. > >>>>> > >>>>>> I think it would be safe if the freeing only occurred at safepoints, but > I > >>> don't > >>>>> think this is the case. > >>>>> > >>>>> Assuming the iteration didn't happen at safepoints (which is just a > way to > >>>>> make the iteration and > >>>>> deletion not concurrent). And I agree that isn't the case with the > current > >>>>> code. > >>>> From martin.doerr at sap.com Tue Nov 12 15:17:58 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 12 Nov 2019 15:17:58 +0000 Subject: RFR(S): Test crashed with assert(phi->operand_count() != 1 || phi->subst() != phi) failed: missed trivial simplification In-Reply-To: References: Message-ID: Hi Tobias and Vladimir, thanks for the reviews. Pushed to jdk/jdk. We'll request an 11u backport later. Best regards, Martin > -----Original Message----- > From: Vladimir Ivanov > Sent: Dienstag, 12.
November 2019 10:02 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(S): Test crashed with assert(phi->operand_count() != 1 || > phi->subst() != phi) failed: missed trivial simplification > > Hi Martin, > > > http://cr.openjdk.java.net/~mdoerr/8233820_C1_illegal_phi/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Nov 12 19:07:30 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Nov 2019 11:07:30 -0800 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <20191112120936.1D826D285F@aojmv0009> References: <20191112120936.1D826D285F@aojmv0009> Message-ID: <58942EBB-CEC9-4B24-9917-0BE3C6AA96F2@oracle.com> I agree with this fix. Reviewed. Thanks Vladimir > On Nov 12, 2019, at 4:07 AM, christoph.goettschkes at microdoc.com wrote: > > Hi, > > The test "compiler/codegen/TestCharVect2.java" uses the VM flag > "MaxVectorSize" which is not defined for all supported VM configurations. > Client VMs exit with an "Unrecognized VM option" error. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231954 > Webrev: https://cr.openjdk.java.net/~bulasevich/8231954/webrev.00 > > Igor suggested to use the tag '@requires vm.flavor == "server"', but since > other tests like [1] and [2] are also using my suggested approach, and I > am uncertain how the flag MaxVectorSize plays together with the JVMCI, I > would currently stick with my approach and would like to get more feedback > on this topic. Also, the @requires tag would disable the test altogether, > but the first run (without the MaxVectorSize option) works in client VMs > and might be a viable test case, which is lost if the tag is added. 
> > Thanks, > Christoph > > [1] > https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java > [2] > https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java > From leonid.mesnik at oracle.com Tue Nov 12 19:33:49 2019 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Tue, 12 Nov 2019 11:33:49 -0800 Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement In-Reply-To: References: Message-ID: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com> Hi I didn't do a complete review, I just sanity-checked your test headers. I see a couple of potential issues with them. 1) Using Xmx32M could cause OOME failures if the test is executed with ZGC. I think that at least 256M should be set. Could you please verify that your tests pass with ZGC enabled. 2) I think it makes sense to add requires vm.opt.TieredCompilation != true to just skip tests if anyone runs them with tiered compilation disabled explicitly. Leonid On 11/11/19 7:29 AM, Reingruber, Richard wrote: > Hi, > > I have created https://bugs.openjdk.java.net/browse/JDK-8233915 > > In short, a set of live objects L is not found using JVMTI FollowReferences() if L is only reachable > from a scalar replaced object in a frame of a C2 compiled method. If L happens to be a growing leak, > then a dynamically loaded JVMTI agent (note: can_tag_objects is an always capability) for heap > diagnostics won't discover L as live and it won't be able to find root references that lead to L. > > I'd like to suggest the implementation for the proposed enhancement JDK-8227745 as bug-fix. > > RFE: https://bugs.openjdk.java.net/browse/JDK-8227745 > Webrev(*): http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/ > > Please comment on the suggestion. Do you see other solutions that allow an agent to discover the > chain of references to L?
> > I'd like to work on the complexity as well. One significant simplification could be, if it was > possible to reallocate scalar replaced objects at safepoints (i.e. allow the VM thread to call > Deoptimization::realloc_objects()). The GC interface does not seem to allow this. > > Thanks, Richard. > > (*) Not yet accepted, because deemed too complex for the performance gain. Note that I was able to > reduce webrev.1 in size compared to webrev.0 From vladimir.kozlov at oracle.com Tue Nov 12 19:35:04 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Nov 2019 11:35:04 -0800 Subject: JDK-8230459: Test failed to resume JVMCI CompilerThread In-Reply-To: References: <07e2dd1f-dd01-07e2-42e7-90ea5fdc96f3@oracle.com> <5591d551-9b1c-aef8-4f28-d87ab8953cab@oracle.com> <8BCAE092-9E05-4B38-9D41-2F3AC889C0AA@oracle.com> <84ef3a8c-5005-6529-5192-b9214e0348ac@oracle.com> Message-ID: <459ded9d-f5da-bfbd-6e80-3501e1443b09@oracle.com> On 11/12/19 7:40 AM, Doerr, Martin wrote: > Hi Vladimir et al., > > thanks a lot for running these tests. I've seen that they have passed. > I haven't got any more comments, so should I push the latest version? > Are you ok with it? Yes, please push it. Vladimir > > Best regards, > Martin > > >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Donnerstag, 7. November 2019 23:15 >> To: David Holmes ; Doerr, Martin >> ; Kim Barrett >> Cc: dean.long at oracle.com; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >> >> I resubmitted testing. >> >> Vladimir >> >> On 11/7/19 1:51 AM, David Holmes wrote: >>> On 7/11/2019 7:08 pm, Doerr, Martin wrote: >>>> Hi David, >>>> >>>> get_log only accesses the executing thread's own oop and the ones >> before it. So it's ensured by >>>> the algorithm that all accessed oops are in live handles. >>> >>> Okay I see that now. >>> >>> Thanks, >>> David >>> >>>> The problem is in can_remove when not holding the lock. 
For that, >> webrev.04 avoids accessing the >>>> oop of the last compiler thread in the case in which the lock is not held. >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Mittwoch, 6. November 2019 11:15 >>>>> To: Doerr, Martin ; Kim Barrett >>>>> >>>>> Cc: dean.long at oracle.com; Vladimir Kozlov >> (vladimir.kozlov at oracle.com) >>>>> ; hotspot-compiler- >> dev at openjdk.java.net >>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread >>>>> >>>>> Hi Martin, >>>>> >>>>> On 6/11/2019 7:12 pm, Doerr, Martin wrote: >>>>>> Hi Kim, >>>>>> >>>>>> thanks for confirming. >>>>>> >>>>>> >>>>> >> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr >>>>> ev.04/ >>>>>> already avoids access to freed handles. >>>>> >>>>> Sorry I missed your earlier reference to this version. >>>>> >>>>> So the expectation here is that all accesses to these arrays are guarded >>>>> by the CompileThread_lock, but that doesn't seem to hold for get_log ? >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> I don't really like the complexity of this code. >>>>>> Replacing oops in handles would have been much more simple. >>>>>> But I can live with either version. >>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Kim Barrett >>>>>>> Sent: Mittwoch, 6. November 2019 04:09 >>>>>>> To: Doerr, Martin >>>>>>> Cc: David Holmes ; >> dean.long at oracle.com; >>>>>>> Vladimir Kozlov (vladimir.kozlov at oracle.com) >>>>>>> ; hotspot-compiler- >> dev at openjdk.java.net >>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI >> CompilerThread >>>>>>> >>>>>>>> On Nov 5, 2019, at 3:40 AM, Doerr, Martin >>>>> wrote: >>>>>>> >>>>>>> Coming back in, because this seems to be going off into the weeds >> again. >>>>>>> >>>>>>>>> I don't understand what you mean. 
If a compiler thread holds an >> oop, >>>>> any >>>>>>>>> oop, it must hold it in a Handle to ensure it can't be gc'd. >>>>>>>> >>>>>>>> The problem is not related to gc. >>>>>>>> My change introduces destroy_global for the handles. This means >> that >>>>> the >>>>>>> OopStorage portion which has held the oop can get freed. >>>>>>>> However, other compiler threads are running concurrently. They >> may >>>>>>> execute code which reads the oop from the handle which is freed by >> this >>>>>>> thread. >>>>>>>> Reading stale data is not a problem here, but reading freed memory >> may >>>>>>> assert or even crash in general. >>>>>>>> I can't see how OopStorage supports reading from handles which >> were >>>>>>> freed by destroy_global. >>>>>>> >>>>>>> So don't do that! >>>>>>> >>>>>>> OopStorage isn't magic. If you are going to look at an OopStorage >>>>>>> handle, you have to ensure there won't be concurrent deletion. Use >>>>>>> locks or some safe memory reclamation protocol. (GlobalCounter >> might >>>>>>> be used here, but it depends a lot on what the iterations are doing. A >>>>>>> reference counting mechanism is another possibility.) This is no >>>>>>> different from any other resource management. >>>>>>> >>>>>>>> I think it would be safe if the freeing only occurred at safepoints, but >> I >>>>> don't >>>>>>> think this is the case. >>>>>>> >>>>>>> Assuming the iteration didn't happen at safepoints (which is just a >> way to >>>>>>> make the iteration and >>>>>>> deletion not concurrent). And I agree that isn't the case with the >> current >>>>>>> code. >>>>>> From igor.ignatyev at oracle.com Tue Nov 12 19:40:46 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 12 Nov 2019 11:40:46 -0800 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs.
In-Reply-To: <20191112120936.1D826D285F@aojmv0009> References: <20191112120936.1D826D285F@aojmv0009> Message-ID: <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> Hi Christoph, we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as in most cases, it causes wasted compute time (as in this test) and can also lead to wrong/deprecated/deleted flags sneaking into the testbase, so I'd like you to reconsider your decision. below are my comments on your concerns regarding @requires approach. > but since other tests like [1] and [2] are also using my suggested approach I'm sorry but this is not an argument. > how the flag MaxVectorSize plays together with the JVMCI as '@requires vm.flavor == "server"' filters configurations based on vm build type, it will still allow execution on JVM w/ JVMCI and when JVMCI compiler is selected, as it will still be Server VM build. so, in a sense, the test will be run w/ JVMCI in the same way as w/ your approach. > @requires tag would disable the test altogether, but the first run this is the known limitation of jtreg/@requires, and our current way to work around it is to split a test description based on @requires values, so in this case it will be: > /** > * @test > * @bug 8001183 > * @summary incorrect results of char vectors right shift operaiton > * > * @run main/othervm/timeout=400 -Xbatch -Xmx128m compiler.codegen.TestCharVect2 > */ > /** > * @test > * @bug 8001183 > * @summary incorrect results of char vectors right shift operaiton with different MaxVectorSize > * > * @comment only server VM has MaxVectorSize > * @requires vm.flavor == "server" > * @run main/othervm/timeout=400 -Xbatch -Xmx128m -XX:+IgnoreUnrecognizedVMOptions -XX:MaxVectorSize=8 compiler.codegen.TestCharVect2 > * @run main/othervm/timeout=400 -Xbatch -Xmx128m -XX:+IgnoreUnrecognizedVMOptions -XX:MaxVectorSize=16 compiler.codegen.TestCharVect2 > * @run main/othervm/timeout=400 -Xbatch -Xmx128m -XX:+IgnoreUnrecognizedVMOptions
-XX:MaxVectorSize=32 compiler.codegen.TestCharVect2 > */ Thanks, -- Igor > On Nov 12, 2019, at 4:07 AM, christoph.goettschkes at microdoc.com wrote: > > Hi, > > The test "compiler/codegen/TestCharVect2.java" uses the VM flag > "MaxVectorSize" which is not defined for all supported VM configurations. > Client VMs exit with an "Unrecognized VM option" error. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231954 > Webrev: https://cr.openjdk.java.net/~bulasevich/8231954/webrev.00 > > Igor suggested to use the tag '@requires vm.flavor == "server"', but since > other tests like [1] and [2] are also using my suggested approach, and I > am uncertain how the flag MaxVectorSize plays together with the JVMCI, I > would currently stick with my approach and would like to get more feedback > on this topic. Also, the @requires tag would disable the test altogether, > but the first run (without the MaxVectorSize option) works in client VMs > and might be a viable test case, which is lost if the tag is added. 
> > Thanks, > Christoph > > [1] > https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java > [2] > https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java > From vladimir.kozlov at oracle.com Tue Nov 12 19:42:59 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Nov 2019 11:42:59 -0800 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com> References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com> <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com> <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com> <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com> Message-ID: +1 (good) Thanks, Vladimir K. On 11/12/19 3:00 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/ > > Looks good. (And sorry for the misleading suggestion.) > > Best regards, > Vladimir Ivanov > >> On 12.11.19 11:49, Tobias Hartmann wrote: >>> Okay, this is actually not correct: >>> If d is an IfTrue projection, we would change d to the corresponding If node which could be a >>> dominator of n while the IfTrue projection is not. >>> >>> Best regards, >>> Tobias >>> >>> On 12.11.19 10:52, Tobias Hartmann wrote: >>>> Thanks Vladimir! >>>> >>>> Best regards, >>>> Tobias >>>> >>>> On 12.11.19 10:52, Vladimir Ivanov wrote: >>>>> >>>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/ >>>>> >>>>> Looks good. 
>>>>> >>>>> Best regards, >>>>> Vladimir Ivanov From fweimer at redhat.com Tue Nov 12 20:18:32 2019 From: fweimer at redhat.com (Florian Weimer) Date: Tue, 12 Nov 2019 21:18:32 +0100 Subject: RFR 8233941: adlc should not generate Pipeline_Use_Cycle_Mask::operator= In-Reply-To: (Kim Barrett's message of "Tue, 12 Nov 2019 12:31:51 -0500") References: <87o8xiv2u2.fsf@oldenburg2.str.redhat.com> <53916D64-0515-40F4-B606-56DB47FAF9EA@oracle.com> <87v9rqtawi.fsf_-_@oldenburg2.str.redhat.com> <44025d2f-9d0d-b42f-816f-582e67df62e2@oracle.com> <87k186rt7x.fsf@oldenburg2.str.redhat.com> <72d6aa27-014c-7344-b782-c7f9b9bfd5ed@oracle.com> <8743f6f3-37a5-e1f7-e607-5a7cfe75c5d5@oracle.com> <635763e6-b435-6028-22a0-9eb5c586df35@oracle.com> Message-ID: <87y2wklug7.fsf@oldenburg2.str.redhat.com> * Kim Barrett: >> On Nov 12, 2019, at 6:20 AM, Tobias Hartmann wrote: >> >> >> On 12.11.19 09:05, Tobias Hartmann wrote: >>> Looks good to me too. I'll run some testing and sponsor if everything passes. >> >> All tests passed. Pushed. >> >> Best regards, >> Tobias > > For the record, the change looked good to me too. > > Did the copyright year get updated in the changed file? No, it did not. 8-( I forgot that this is necessary in OpenJDK, and this is a file that is only changed rarely. What shall we do now? Thanks, Florian From bsrbnd at gmail.com Tue Nov 12 21:13:52 2019 From: bsrbnd at gmail.com (B. Blaser) Date: Tue, 12 Nov 2019 22:13:52 +0100 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com> Message-ID: Hi Vladimir Kozlov and Ivanov, Please review the updated patch according to Vladimir Ivanov comments: http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.01/ I've pushed it to jdk/submit as second changeset on branch "JDK-8214239" and tests are OK: http://hg.openjdk.java.net/jdk/submit/rev/f961f7a454e4 Any feedback is welcome. 
Thanks, Bernard On Tue, 12 Nov 2019 at 14:32, Vladimir Ivanov wrote: > > Thanks for the clarifications, Bernard. > > >>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/ > >> > >> I don't see cases for non-constant masks John suggested covered. Have > >> you tried to implement them? Any problems encountered or did you just > >> leave them for future improvement? > > > > I didn't experiment with non-constant masks yet, which is why I left > > them for future improvements (as told to John). > > Sounds good. > > > >> Why do you limit the optimization to bits in upper half? Is it because > >> ordinary andq/orq instructions work well for the rest? If that's the > >> case, it deserves a comment. > > > > On a pure specification basis (Intel optimization manual that Sandhya > > pointed me to), AND/OR and BTR/BTS have the same latency=1 but a > > slightly better throughput for the former and when experimenting with > > values <= 32-bit, I didn't observed much difference or quite > > imperceptibly in favor of AND/OR. But with pure 64-bit values, the > > benefit is much more evident because BTR/BTS replaces both a MOV and > > an AND/OR which is simply better on specification basis (latency=1 for > > BTR/BTS vs latency=1+1 for MOV + AND/OR). So, I'll update the comments > > as next: > > > > // n should be a pure 64-bit power of 2 immediate because AND/OR works > > well enough for 8/32-bit values. > > // n should be a pure 64-bit immediate given that not(n) is a power of > > 2 because AND/OR works well enough for 8/32-bit values. > > Looks good. > > > > >> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a > >> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to > >> encode that it's > 2^32, but I would just skip it for now.) > > > > I agree with immL_NotPow2/immL_Pow2, for the encoding, see below. > > One idea to try: you can move "log2_long(n->get_long()) > ..." check > from operand declaration to the instruction. 
> > operand immL_Pow2() %{ > // ... > predicate(is_power_of_2_long(n->get_long())); > ... > > operand immL_NotPow2() %{ > // ... > predicate(is_power_of_2_long(~n->get_long())); > ... > > instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{ > predicate(log2_long(~in(2)->in(2)->get_long()) > 30); > match(Set dst (StoreL dst (AndL (LoadL dst) con))); > ... > > instruct btsL_mem_imm(memory dst, immPow2L con, rFlagsReg cr) %{ > predicate(log2_long(in(2)->in(2)->get_long()) > 31); > match(Set dst (StoreL dst (OrL (LoadL dst) con))); > ... > > It looks more natural (but also it requires more code) to do such > operation-specific dispatching on instructions than on operands. > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Tue Nov 12 22:11:13 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 13 Nov 2019 01:11:13 +0300 Subject: RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits In-Reply-To: References: <15fe28f9-e3e9-2766-2287-0fb1762b4414@oracle.com> Message-ID: <534e81e9-9eac-6631-805a-7e5616b2f940@oracle.com> > http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.01/ Looks good. PS: it would be nice to be able to directly reference instruction arguments from predicates (in a similar way it is supported in ins_encode section). Then: + predicate(log2_long(~n->in(3)->in(2)->get_long()) > 30); + predicate(log2_long( n->in(3)->in(2)->get_long()) > 31); could be turned into: + predicate(log2_long(~$con->get_long()) > 30); + predicate(log2_long( $con->get_long()) > 31); or even: + predicate(log2_long(~$con$$constant) > 30); + predicate(log2_long( $con$$constant) > 31); Best regards, Vladimir Ivanov > > I've pushed it to jdk/submit as second changeset on branch > "JDK-8214239" and tests are OK: > > http://hg.openjdk.java.net/jdk/submit/rev/f961f7a454e4 > > Any feedback is welcome. 
> > Thanks, > Bernard > > On Tue, 12 Nov 2019 at 14:32, Vladimir Ivanov > wrote: >> >> Thanks for the clarifications, Bernard. >> >>>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/ >>>> >>>> I don't see cases for non-constant masks John suggested covered. Have >>>> you tried to implement them? Any problems encountered or did you just >>>> leave them for future improvement? >>> >>> I didn't experiment with non-constant masks yet, which is why I left >>> them for future improvements (as told to John). >> >> Sounds good. >> >> >>>> Why do you limit the optimization to bits in upper half? Is it because >>>> ordinary andq/orq instructions work well for the rest? If that's the >>>> case, it deserves a comment. >>> >>> On a pure specification basis (Intel optimization manual that Sandhya >>> pointed me to), AND/OR and BTR/BTS have the same latency=1 but a >>> slightly better throughput for the former and when experimenting with >>> values <= 32-bit, I didn't observed much difference or quite >>> imperceptibly in favor of AND/OR. But with pure 64-bit values, the >>> benefit is much more evident because BTR/BTS replaces both a MOV and >>> an AND/OR which is simply better on specification basis (latency=1 for >>> BTR/BTS vs latency=1+1 for MOV + AND/OR). So, I'll update the comments >>> as next: >>> >>> // n should be a pure 64-bit power of 2 immediate because AND/OR works >>> well enough for 8/32-bit values. >>> // n should be a pure 64-bit immediate given that not(n) is a power of >>> 2 because AND/OR works well enough for 8/32-bit values. >> >> Looks good. >> >>> >>>> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a >>>> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to >>>> encode that it's > 2^32, but I would just skip it for now.) >>> >>> I agree with immL_NotPow2/immL_Pow2, for the encoding, see below. >> >> One idea to try: you can move "log2_long(n->get_long()) > ..." 
check >> from operand declaration to the instruction. >> >> operand immL_Pow2() %{ >> // ... >> predicate(is_power_of_2_long(n->get_long())); >> ... >> >> operand immL_NotPow2() %{ >> // ... >> predicate(is_power_of_2_long(~n->get_long())); >> ... >> >> instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{ >> predicate(log2_long(~in(2)->in(2)->get_long()) > 30); >> match(Set dst (StoreL dst (AndL (LoadL dst) con))); >> ... >> >> instruct btsL_mem_imm(memory dst, immPow2L con, rFlagsReg cr) %{ >> predicate(log2_long(in(2)->in(2)->get_long()) > 31); >> match(Set dst (StoreL dst (OrL (LoadL dst) con))); >> ... >> >> It looks more natural (but also it requires more code) to do such >> operation-specific dispatching on instructions than on operands. >> >> Best regards, >> Vladimir Ivanov From claes.redestad at oracle.com Tue Nov 12 23:09:44 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 13 Nov 2019 00:09:44 +0100 Subject: RFR: 8234003: Improve IndexSet iteration Message-ID: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com> Hi, a significant portion of work done during register allocation in C2 is iterating over IndexSets. A few small optimizations show a ~4% decrease in instructions retired by register allocation when instrumenting, and up to 3% fewer instructions retired in total on startup tests. Bug: https://bugs.openjdk.java.net/browse/JDK-8234003 Webrev: http://cr.openjdk.java.net/~redestad/8234003/open.00/ The biggest improvement comes from avoiding iterating over empty sets altogether. A smaller improvement from adding a water mark to avoid iterating over all the blocks in the IndexSet. Testing: tier1-3, verified improvements on large and tiny startup tests, checked that any increased inlining is footprint neutral on Linux. Thanks! 
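[Archive note, not part of the original message.] The two optimizations described above (skipping empty sets entirely, and keeping a water mark so iteration stops before trailing unused blocks) can be illustrated with a minimal sketch. This is not HotSpot's IndexSet and not the code in the webrev; the class SimpleIndexSet, its fields, and for_each are hypothetical names chosen only for illustration, and the layout (flat 64-bit blocks) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified block-based index set (NOT HotSpot's IndexSet).
class SimpleIndexSet {
  static const std::size_t kBitsPerBlock = 64;
  std::vector<std::uint64_t> _blocks;  // one bit per possible element
  std::size_t _count;                  // number of elements in the set
  std::size_t _max_block;              // water mark: one past the highest block written

public:
  explicit SimpleIndexSet(std::size_t max_elements)
    : _blocks((max_elements + kBitsPerBlock - 1) / kBitsPerBlock, 0),
      _count(0), _max_block(0) {}

  void insert(std::size_t idx) {
    std::size_t b = idx / kBitsPerBlock;
    assert(b < _blocks.size());
    std::uint64_t bit = std::uint64_t(1) << (idx % kBitsPerBlock);
    if ((_blocks[b] & bit) == 0) {
      _blocks[b] |= bit;
      _count++;
      if (b + 1 > _max_block) _max_block = b + 1;  // raise the water mark
    }
  }

  std::size_t count() const { return _count; }

  // Calls visit(element) in ascending order and returns how many blocks
  // were inspected, so the effect of both shortcuts is observable.
  template <typename F>
  std::size_t for_each(F visit) const {
    if (_count == 0) return 0;                       // shortcut 1: empty set
    std::size_t inspected = 0;
    for (std::size_t b = 0; b < _max_block; b++) {   // shortcut 2: stop at water mark
      inspected++;
      std::uint64_t word = _blocks[b];
      for (std::size_t i = 0; word != 0; i++, word >>= 1) {
        if (word & 1) visit(b * kBitsPerBlock + i);
      }
    }
    return inspected;
  }
};
```

In this sketch a set sized for 1024 elements that only contains 3 and 70 inspects two blocks instead of sixteen, and an empty set inspects none. The sketch never lowers the water mark on removal, which is a simplification.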
/Claes From igor.ignatyev at oracle.com Tue Nov 12 23:24:38 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 12 Nov 2019 15:24:38 -0800 Subject: RFR(T) : 8226795 : compiler/tiered/Level2RecompilationTest.java fails when XX:TieredStopAtLevel=1/2/3 is set Message-ID: http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html > 13 lines changed: 12 ins; 0 del; 1 mod; Hi all, could you please review this small and trivial patch which adds @requires to exclude Level2RecompilationTest, OSRFailureLevel4Test, and TestTypeProfiling tests from execution w/ non default values of TieredStopAtLevel? the tests expect to have tiered compilation enabled and level 4 compiler available, so they can't be run w/ TieredStopAtLevel < 4. webrev: http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html JBS: https://bugs.openjdk.java.net/browse/JDK-8226795 testing: verified that changed tests are run w/o TieredStopAtLevel, w/ TieredStopAtLevel=4 and aren't run w/ TieredStopAtLevel != 4 Thanks, -- Igor From claes.redestad at oracle.com Tue Nov 12 23:47:59 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 13 Nov 2019 00:47:59 +0100 Subject: RFR(T) : 8226795 : compiler/tiered/Level2RecompilationTest.java fails when XX:TieredStopAtLevel=1/2/3 is set In-Reply-To: References: Message-ID: Hi Igor, fix looks good to me. A more open question: do tests that need to specifically compile with C2 also need to be made aware of the recently added -XX:CompilationMode=quick-only flag? IIRC this would disable C2, but not in exactly the same way as TieredStopAtLevel=1/2/3 would (TieredStopAtLevel is ergonomically set to 4 in this case). 
/Claes On 2019-11-13 00:24, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html >> 13 lines changed: 12 ins; 0 del; 1 mod; > > Hi all, > > could you please review this small and trivial patch which adds @requires to exclude Level2RecompilationTest, OSRFailureLevel4Test, and TestTypeProfiling tests from execution w/ non default values of TieredStopAtLevel? > the tests expect to have tiered compilation enabled and level 4 compiler available, so they can't be run w/ TieredStopAtLevel < 4. > > webrev: http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8226795 > testing: verified that changed tests are run w/o TieredStopAtLevel, w/ TieredStopAtLevel=4 and aren't run w/ TieredStopAtLevel != 4 > > Thanks, > -- Igor > From igor.ignatyev at oracle.com Tue Nov 12 23:52:55 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 12 Nov 2019 15:52:55 -0800 Subject: RFR(T) : 8226795 : compiler/tiered/Level2RecompilationTest.java fails when XX:TieredStopAtLevel=1/2/3 is set In-Reply-To: References: Message-ID: <68140795-6949-4B4B-9108-13F656F504E9@oracle.com> > On Nov 12, 2019, at 3:47 PM, Claes Redestad wrote: > > Hi Igor, > > fix looks good to me. thanks. > > A more open question: do tests that need to specifically compile with C2 > also need to be made aware of the recently added > -XX:CompilationMode=quick-only flag? IIRC this would disable C2, but not > in exactly the same way as TieredStopAtLevel=1/2/3 would > (TieredStopAtLevel is ergonomically set to 4 in this case). I think they do, I'll file an RFE to improve how we mark c2-only tests. 
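For readers following the @requires discussion above, here is a rough sketch of what such a test header could look like. The webrev itself is not reproduced in this thread, so the header below is an assumption rather than the actual patch, and the class name RequiresC2Sketch and the helper shouldRun are made up for illustration. The helper only models the filter expression in plain Java, extended with the quick-only case Claes raises; it does not query the VM.

```java
/*
 * @test
 * @summary Hypothetical header, not the actual 8226795 patch.
 * @requires vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4
 * @run main RequiresC2Sketch
 */
class RequiresC2Sketch {

    // Plain-Java model of the filter: run only if TieredStopAtLevel is unset
    // or 4, and (an extra assumption) CompilationMode is not quick-only.
    // Both parameters are null when the flag was not given on the command line.
    public static boolean shouldRun(Integer tieredStopAtLevel, String compilationMode) {
        boolean levelOk = tieredStopAtLevel == null || tieredStopAtLevel == 4;
        boolean modeOk = !"quick-only".equals(compilationMode);
        return levelOk && modeOk;
    }

    public static void main(String[] args) {
        System.out.println("default flags:        " + shouldRun(null, null));
        System.out.println("TieredStopAtLevel=1:  " + shouldRun(1, null));
        System.out.println("quick-only:           " + shouldRun(null, "quick-only"));
    }
}
```

The second condition is the part that a TieredStopAtLevel-only filter would miss, which is why a dedicated way of marking c2-only tests may be the cleaner fix.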
> > /Claes > > On 2019-11-13 00:24, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html >>> 13 lines changed: 12 ins; 0 del; 1 mod; >> Hi all, >> could you please review this small and trivial patch which adds @requires to exclude Level2RecompilationTest, OSRFailureLevel4Test, and TestTypeProfiling tests from execution w/ non default values of TieredStopAtLevel? >> the tests expect to have tiered compilation enabled and level 4 compiler available, so they can't be run w/ TieredStopAtLevel < 4. >> webrev: http://cr.openjdk.java.net/~iignatyev//8226795/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8226795 >> testing: verified that changed tests are run w/o TieredStopAtLevel, w/ TieredStopAtLevel=4 and aren't run w/ TieredStopAtLevel != 4 >> Thanks, >> -- Igor From igor.ignatyev at oracle.com Wed Nov 13 04:18:15 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 12 Nov 2019 20:18:15 -0800 Subject: RFR(S) : 8225756 : [testbug] compiler/loopstripmining/CheckLoopStripMining.java sets too short a SafepointTimeoutDelay Message-ID: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/ > 34 lines changed: 24 ins; 3 del; 7 mod; Hi all, could you please review this patch adjust SafepointTimeoutDelay and GuaranteedSafepointInterval in CheckLoopStripMining test according to time out factor? 
webrev: http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8225756 testing: the changed test on windows-x64-debug (where all the failures were seen) 100 times Thanks, -- Igor From vladimir.kozlov at oracle.com Wed Nov 13 05:05:34 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Nov 2019 21:05:34 -0800 Subject: RFR(S) : 8225756 : [testbug] compiler/loopstripmining/CheckLoopStripMining.java sets too short a SafepointTimeoutDelay In-Reply-To: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> Message-ID: Good. thanks, Vladimir On 11/12/19 8:18 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/ >> 34 lines changed: 24 ins; 3 del; 7 mod; > > Hi all, > > could you please review this patch adjust SafepointTimeoutDelay and GuaranteedSafepointInterval in CheckLoopStripMining test according to time out factor? > > webrev: http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8225756 > testing: the changed test on windows-x64-debug (where all the failures were seen) 100 times > > Thanks, > -- Igor > From ekaterina.pavlova at oracle.com Wed Nov 13 05:54:35 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Tue, 12 Nov 2019 21:54:35 -0800 Subject: RFR(S) : 8225756 : [testbug] compiler/loopstripmining/CheckLoopStripMining.java sets too short a SafepointTimeoutDelay In-Reply-To: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> Message-ID: <6a25864e-40c2-d8b6-e166-1ec39b81639b@oracle.com> Looks good. 
thanks, -katya On 11/12/19 8:18 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/ >> 34 lines changed: 24 ins; 3 del; 7 mod; > > Hi all, > > could you please review this patch adjust SafepointTimeoutDelay and GuaranteedSafepointInterval in CheckLoopStripMining test according to time out factor? > > webrev: http://cr.openjdk.java.net/~iignatyev/8225756/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8225756 > testing: the changed test on windows-x64-debug (where all the failures were seen) 100 times > > Thanks, > -- Igor > From tobias.hartmann at oracle.com Wed Nov 13 07:40:51 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 13 Nov 2019 08:40:51 +0100 Subject: [14] RFR(S): 8233656: assert(d->is_CFG() && n->is_CFG()) failed: must have CFG nodes In-Reply-To: References: <720f25b4-e153-652a-cef3-f41084d30662@oracle.com> <0ecca86f-b999-2131-4907-a75e2f653c73@oracle.com> <72928e9c-6b9d-bfc0-5cca-7afea8e1ac35@oracle.com> <6419beb6-af60-a36a-5029-458f4de7e19b@oracle.com> <47863dda-8509-2d94-83b5-56ad76c178e2@oracle.com> <6608380f-065f-a839-2e40-a58e43cfc7ec@oracle.com> <55581f09-b442-b379-23ce-236ae46e9fff@oracle.com> Message-ID: <755ad6bd-e29a-3295-329f-452d695515d6@oracle.com> Thanks Vladimir. Best regards, Tobias On 12.11.19 20:42, Vladimir Kozlov wrote: > +1 (good) > > Thanks, > Vladimir K. > > On 11/12/19 3:00 AM, Vladimir Ivanov wrote: >> >>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.02/ >> >> Looks good. (And sorry for the misleading suggestion.) >> >> Best regards, >> Vladimir Ivanov >> >>> On 12.11.19 11:49, Tobias Hartmann wrote: >>>> Okay, this is actually not correct: >>>> If d is an IfTrue projection, we would change d to the corresponding If node which could be a >>>> dominator of n while the IfTrue projection is not. >>>> >>>> Best regards, >>>> Tobias >>>> >>>> On 12.11.19 10:52, Tobias Hartmann wrote: >>>>> Thanks Vladimir! 
>>>>> >>>>> Best regards, >>>>> Tobias >>>>> >>>>> On 12.11.19 10:52, Vladimir Ivanov wrote: >>>>>> >>>>>>> http://cr.openjdk.java.net/~thartmann/8233656/webrev.01/ >>>>>> >>>>>> Looks good. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov From rwestrel at redhat.com Wed Nov 13 09:19:44 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 13 Nov 2019 10:19:44 +0100 Subject: RFR(S) : 8225756 : [testbug] compiler/loopstripmining/CheckLoopStripMining.java sets too short a SafepointTimeoutDelay In-Reply-To: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> Message-ID: <87a790dtfz.fsf@redhat.com> Looks reasonable to me. Thanks for fixing this, Igor. Roland. From Pengfei.Li at arm.com Wed Nov 13 09:55:48 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Wed, 13 Nov 2019 09:55:48 +0000 Subject: RFR(M): 8233743: AArch64: Make r27 conditionally allocatable Message-ID: Hi, JBS: https://bugs.openjdk.java.net/browse/JDK-8233743 Webrev: http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.00/ This is a follow-up patch of JDK-8217909[1] to make the AArch64 register r27 allocatable when CompressedOops and CompressedClassPointers are both turned off. Below changes have been made: - Massage the RegMask(s) in reg_mask_init() at C2 initialization and remove r27 from some of the masks conditionally to make it allocatable. - Also make r29 conditionally reserved in this similar way. - Make r29 allocatable for pointers as well as integers. - Replace an rheapbase use to rscratch1 in AArch64 ZGC. - Revert JDK-8231754[2] which makes r27 always reserved in JVMCI. This patch aligns with the implementation in [1] which makes the x86_64 r12 register allocatable. Please let me know if I have missed anything for AArch64. Tests: Full jtreg with default options and extra options "-XX:-UseCompressedOops -XX:+PreserveFramePointer". No new failure is found. 
[1] https://hg.openjdk.java.net/jdk/jdk/rev/48b50573dee4 [2] https://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de -- Thanks, Pengfei From martin.doerr at sap.com Wed Nov 13 11:37:05 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 13 Nov 2019 11:37:05 +0000 Subject: RFR(S) : 8225756 : [testbug] compiler/loopstripmining/CheckLoopStripMining.java sets too short a SafepointTimeoutDelay In-Reply-To: <87a790dtfz.fsf@redhat.com> References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> <87a790dtfz.fsf@redhat.com> Message-ID: Hi Igor, thanks for improving it. Please note that this test was derived from test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java JDK-8227528 has added -XX:-UseBiasedLocking. Would you mind adding that to your new version, too? Please also remove double-whitespace before Utils.adjustTimeout(500). Thanks and best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Roland Westrelin > Sent: Mittwoch, 13. November 2019 10:20 > To: Igor Ignatyev ; hotspot compiler compiler-dev at openjdk.java.net> > Subject: Re: RFR(S) : 8225756 : [testbug] > compiler/loopstripmining/CheckLoopStripMining.java sets too short a > SafepointTimeoutDelay > > > Looks reasonable to me. > > Thanks for fixing this, Igor. > > Roland. From richard.reingruber at sap.com Wed Nov 13 12:24:41 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Wed, 13 Nov 2019 12:24:41 +0000 Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement In-Reply-To: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com> References: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com> Message-ID: Hi Leonid, these are valid points. Thanks for making me aware of them. I've increased the maximum heap size in my tests as suggested, and I've also run them with ZGC enabled. I've also added the vm.opt.TieredCompilation != true requirement. 
I've done the changes in place. Thanks, Richard. -----Original Message----- From: hotspot-compiler-dev On Behalf Of Leonid Mesnik Sent: Dienstag, 12. November 2019 20:34 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement Hi I don't make complete review just sanity verified your test headers. I see a couple of potential issues with them. 1) The using Xmx32M could cause OOME failures if test is executed with ZGC. I think that at least 256M should be set. Could you please verify that your tests pass with ZGC enabled. 2) I think it makes sense to add requires vm.opt.TieredCompilation != true to just skip tests if anyone runs them with tiered compilation disabled explicitly. Leonid On 11/11/19 7:29 AM, Reingruber, Richard wrote: > Hi, > > I have created https://bugs.openjdk.java.net/browse/JDK-8233915 > > In short, a set of live objects L is not found using JVMTI FollowReferences() if L is only reachable > from a scalar replaced object in a frame of a C2 compiled method. If L happens to be a growing leak, > then a dynamically loaded JVMTI agent (note: can_tag_objects is an always capability) for heap > diagnostics won't discover L as live and it won't be able to find root references that lead to L. > > I'd like to suggest the implementation for the proposed enhancement JDK-8227745 as bug-fix. > > RFE: https://bugs.openjdk.java.net/browse/JDK-8227745 > Webrev(*): http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/ > > Please comment on the suggestion. Do you see other solutions that allow an agent to discover the > chain of references to L? > > I'd like to work on the complexity as well. One significant simplification could be, if it was > possible to reallocate scalar replaced objects at safepoints (i.e. allow the VM thread to call > Deoptimization::realloc_objects()). The GC interface does not seem to allow this. > > Thanks, Richard.
> > (*) Not yet accepted, because deemed too complex for the performance gain. Note that I was able to > reduce webrev.1 in size compared to webrev.0 From christoph.goettschkes at microdoc.com Wed Nov 13 12:42:58 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Wed, 13 Nov 2019 13:42:58 +0100 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> Message-ID: Hi Igor, thanks for your explanation. Igor Ignatyev wrote on 2019-11-12 20:40:46: > we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as > in most cases, it causes wasted compute time (as in this test) and can > also lead to wrong/deprecated/deleted flags sneaking into the testbase Agreed. I also wanted to discuss this, since I think that your solution is better than mine, but at the same time, I saw possible problems with it, see below. > as '@requires vm.flavor == "server"' filters configurations based vm > build type, it will still allow execution on JVM w/ JVMCI and when JVMCI > compiler is selected, as it will still be Server VM build. so, in a > sense, the test will be w/ JVMCI in the same way as w/ your approach. My concern is not about server VMs with JVMCI, but client VMs with JVMCI enabled. Is this a valid configuration? The MaxVectorSize option is defined in [1] as well as in [2], so for me it looks like MaxVectorSize can be used for any VM variant as long as JVMCI is enabled. 
The configure script also states that both compilers are possible (if you configure with --with-jvm-features='jvmci'): configure: error: Specified JVM feature 'jvmci' requires feature 'compiler2' or 'compiler1' Should maybe the requires tag "vm.jvmci" be used as well, like: @requires vm.flavor == "server" | vm.jvmci > this is the known limitation of jtreg/@requires, and our current way to > workaround it is to split a test description based on @requires values Yes, if the @requires tag is used, splitting up the test looks like a good idea. I didn't know that it is possible to have multiple test descriptions in one test file. I created a new webrev with the new ideas: https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ I tested with an amd64 client and server VM and it looks good. I am currently unable to build a client VM with JVMCI enabled, hence no test for that yet. I get compile errors and as soon as I resolve those, runtime errors occur. Before I look into that, I would like to know if client VMs with JVMCI enabled are supported or not. Thanks, Christoph [1] https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp [2] https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp From nils.eliasson at oracle.com Wed Nov 13 16:08:10 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 13 Nov 2019 17:08:10 +0100 Subject: RFR: 8234003: Improve IndexSet iteration In-Reply-To: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com> References: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com> Message-ID: <7edbcedf-b24c-fd8b-9aac-f3a87aaf1134@oracle.com> Hi Claes, Thanks for cleaning up some of the surrounding code too. Looks good, Nils Eliasson On 2019-11-13 00:09, Claes Redestad wrote: > Hi, > > a significant portion of work done during register allocation in C2 is > iterating over IndexSets. 
> > A few small optimizations show a ~4% decrease in instructions retired by > register allocation when instrumenting, and up to 3% fewer instructions > retired in total on startup tests. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234003 > Webrev: http://cr.openjdk.java.net/~redestad/8234003/open.00/ > > The biggest improvement comes from avoiding iterating over empty sets > altogether. A smaller improvement from adding a water mark to avoid > iterating over all the blocks in the IndexSet. > > Testing: tier1-3, verified improvements on large and tiny startup tests, > checked that any increased inlining is footprint neutral on Linux. > > Thanks! > > /Claes > From nils.eliasson at oracle.com Wed Nov 13 16:12:57 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 13 Nov 2019 17:12:57 +0100 Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check In-Reply-To: References: Message-ID: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com> Hi Patric, Looks good! (I have pre-reviewed this patch offline) Regards, Nils On 2019-11-12 15:16, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8220376 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/ > > 8220376: C2: Int >0 not recognized as !=0 for div by 0 check > >     Adding a simple subsumption test to IfNode::Ideal to enable a local >     short-circuit for (obviously) redundant if-nodes. > > Testing: hs-tier1-4, hs-precheckin-comp > > > Best regards, > Patric > From leonid.mesnik at oracle.com Wed Nov 13 16:42:18 2019 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 13 Nov 2019 08:42:18 -0800 Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement In-Reply-To: References: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com> Message-ID: Thank you for fixing this.
Leonid On 11/13/19 4:24 AM, Reingruber, Richard wrote: > Hi Leonid, > > these are valid points. Thanks for making me aware of them. > > I've increased the maximum heap size in my tests as suggested, and I've also run them with ZGC > enabled. > > I've also added the vm.opt.TieredCompilation != true requirement. > > I've done the changes in place. > > Thanks, Richard. > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Leonid Mesnik > Sent: Dienstag, 12. November 2019 20:34 > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement > > Hi > > I don't make complete review just sanity verified your test headers. I > see a couple of potential issues with them. > > 1) The using Xmx32M could cause OOME failures if test is executed with > ZGC. I think that at least 256M should be set. Could you please verify > that your tests pass with ZGC enabled. > > > 2) I think it makes sense to add requires > > vm.opt.TieredCompilation != true > > to just skip tests if anyone runs them with tiered compilation disabled > explicitly. > > Leonid > > On 11/11/19 7:29 AM, Reingruber, Richard wrote: >> Hi, >> >> I have created https://bugs.openjdk.java.net/browse/JDK-8233915 >> >> In short, a set of live objects L is not found using JVMTI FollowReferences() if L is only reachable >> from a scalar replaced object in a frame of a C2 compiled method. If L happens to be a growing leak, >> then a dynamically loaded JVMTI agent (note: can_tag_objects is an always capability) for heap >> diagnostics won't discover L as live and it won't be able to find root references that lead to L. >> >> I'd like to suggest the implementation for the proposed enhancement JDK-8227745 as bug-fix. >> >> RFE: https://bugs.openjdk.java.net/browse/JDK-8227745 >> Webrev(*): http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/ >> >> Please comment on the suggestion. 
Do you see other solutions that allow an agent to discover the >> chain of references to L? >> >> I'd like to work on the complexity as well. One significant simplification could be, if it was >> possible to reallocate scalar replaced objects at safepoints (i.e. allow the VM thread to call >> Deoptimization::realloc_objects()). The GC interface does not seem to allow this. >> >> Thanks, Richard. >> >> (*) Not yet accepted, because deemed too complex for the performance gain. Note that I was able to >> reduce webrev.1 in size compared to webrev.0 From igor.ignatyev at oracle.com Wed Nov 13 19:11:14 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 13 Nov 2019 11:11:14 -0800 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <2w8hcs8xd1-1@aserp2030.oracle.com> References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> <2w8hcs8xd1-1@aserp2030.oracle.com> Message-ID: <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> @Christoph, webrev.01 looks good to me. I always thought that the jvmci feature can be built only when the compiler2 feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp suggests that jvmci can be used w/o compiler2; I don't think we have ever built/tested, let alone supported, this configuration. @Vladimir, did/do we plan to support compiler1 + jvmci w/o compiler2 configuration? Thanks, -- Igor > On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote: > > Hi Igor, > > thanks for your explanation. > > Igor Ignatyev wrote on 2019-11-12 20:40:46: > >> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as >> in most cases, it causes wasted compute time (as in this test) and can >> also lead to wrong/deprecated/deleted flags sneaking into the testbase > > Agreed.
I also wanted to discuss this, since I think that your solution > is better than mine, but at the same time, I saw possible problems with > it, see below. > >> as '@requires vm.flavor == "server"' filters configurations based vm >> build type, it will still allow execution on JVM w/ JVMCI and when JVMCI >> compiler is selected, as it will still be Server VM build. so, in a >> sense, the test will be w/ JVMCI in the same way as w/ your approach. > > My concern is not about server VMs with JVMCI, but client VMs with JVMCI > enabled. Is this a valid configuration? The MaxVectorSize option is > defined in [1] as well as in [2], so for me it looks like MaxVectorSize > can be used for any VM variant as long as JVMCI is enabled. The > configure script also states that both compilers are possible (if you > configure with --with-jvm-features='jvmci'): > > configure: error: Specified JVM feature 'jvmci' requires feature > 'compiler2' or 'compiler1' > > Should maybe the requires tag "vm.jvmci" be used as well, like: > > @requires vm.flavor == "server" | vm.jvmci > >> this is the known limitation of jtreg/@requires, and our current way to >> workaround it is to split a test description based on @requires values > > Yes, if the @requires tag is used, splitting up the test looks like a good > idea. I didn't know that it is possible to have multiple test descriptions > in one test file. > > I created a new webrev with the new ideas: > > https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ > > I tested with an amd64 client and server VM and it looks good. I am > currently unable to build a client VM with JVMCI enabled, hence no test > for that yet. I get compile errors and as soon as I resolve those, > runtime errors occur. Before I look into that, I would like to know if > client VMs with JVMCI enabled are supported or not. 
> > Thanks, > Christoph > > [1] > https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp > > [2] > https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp > From martin.doerr at sap.com Wed Nov 13 19:26:43 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 13 Nov 2019 19:26:43 +0000 Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check In-Reply-To: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com> References: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com> Message-ID: Hi Patric, thanks for addressing this issue. There seems to be a small issue with the webrev: - if (bol->is_Bool()) { + if (!bol->is_Bool()) { I guess you have tested it already this way and just something with the webrev went wrong. You recognize and transform a specific pattern you have described in the comment. It is appropriate for fixing this example and it looks good to me. But I'm curious how often this matches. Maybe we can see a performance improvement. Thanks and best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Nils Eliasson > Sent: Mittwoch, 13. November 2019 17:13 > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 > check > > Hi Patric, > > Looks good! > > (I have pre-reviewed this patch offline) > > Regards, > > Nils > > On 2019-11-12 15:16, Patric Hedlin wrote: > > Dear all, > > > > I would like to ask for help to review the following change/update: > > > > Issue: https://bugs.openjdk.java.net/browse/JDK-8220376 > > Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/ > > > > 8220376: C2: Int >0 not recognized as !=0 for div by 0 check > > > >     Adding a simple subsumption test to IfNode::Ideal to enable a local > >     short-circuit for (obviously) redundant if-nodes.
> > > > Testing: hs-tier1-4, hs-precheckin-comp > > > > > > Best regards, > > Patric > > From igor.ignatyev at oracle.com Wed Nov 13 19:30:44 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 13 Nov 2019 11:30:44 -0800 Subject: RFR(S) : 8225756 : [testbug] compiler/loopstripmining/CheckLoopStripMining.java sets too short a SafepointTimeoutDelay In-Reply-To: References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> <87a790dtfz.fsf@redhat.com> Message-ID: @Martin, I've updated the test according to your comments and also added 'to prevent biased locking handshakes from changing the timing' comment preceding '-XX:-UseBiasedLocking'. @all, thanks for your review, pushed. -- Igor > On Nov 13, 2019, at 3:37 AM, Doerr, Martin wrote: > > Hi Igor, > > thanks for improving it. > > Please note that this test was derived from > test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java > JDK-8227528 has added -XX:-UseBiasedLocking. > > Would you mind adding that to your new version, too? > > Please also remove double-whitespace before Utils.adjustTimeout(500). > > Thanks and best regards, > Martin > > >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Roland Westrelin >> Sent: Mittwoch, 13. November 2019 10:20 >> To: Igor Ignatyev ; hotspot compiler > compiler-dev at openjdk.java.net> >> Subject: Re: RFR(S) : 8225756 : [testbug] >> compiler/loopstripmining/CheckLoopStripMining.java sets too short a >> SafepointTimeoutDelay >> >> >> Looks reasonable to me. >> >> Thanks for fixing this, Igor. >> >> Roland. > From vladimir.kozlov at oracle.com Wed Nov 13 19:32:18 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Nov 2019 11:32:18 -0800 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. 
In-Reply-To: <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> <2w8hcs8xd1-1@aserp2030.oracle.com> <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> Message-ID: <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com> On 11/13/19 11:11 AM, Igor Ignatyev wrote: > @Christoph, > > webrev.01 looks good to me. > I always thought that the jvmci feature can be built only when the compiler2 feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp suggests that jvmci can be used w/o compiler2; I don't think we have ever built/tested, let alone supported, this configuration. > > @Vladimir, > did/do we plan to support compiler1 + jvmci w/o compiler2 configuration? Yes. It could be a configuration when we start looking at replacing C1 with Graal. I think several people were interested in a "Client VM"-like configuration. Also a Server configuration without C2 (with Graal or another jvmci compiler), which would be our configuration in the future. But I would prefer to be more explicit in these changes: @requires vm.compiler2.enabled | vm.graal.enabled Thanks, Vladimir > > Thanks, > -- Igor > >> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote: >> >> Hi Igor, >> >> thanks for your explanation. >> >> Igor Ignatyev wrote on 2019-11-12 20:40:46: >> >>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as >>> in most cases, it causes wasted compute time (as in this test) and can >>> also lead to wrong/deprecated/deleted flags sneaking into the testbase >> >> Agreed. I also wanted to discuss this, since I think that your solution >> is better than mine, but at the same time, I saw possible problems with >> it, see below. >> >>> as '@requires vm.flavor == "server"' filters configurations based vm >>> build type, it will still allow execution on JVM w/ JVMCI and when JVMCI >>> compiler is selected, as it will still be Server VM build.
so, in a >>> sense, the test will be w/ JVMCI in the same way as w/ your approach. >> >> My concern is not about server VMs with JVMCI, but client VMs with JVMCI >> enabled. Is this a valid configuration? The MaxVectorSize option is >> defined in [1] as well as in [2], so for me it looks like MaxVectorSize >> can be used for any VM variant as long as JVMCI is enabled. The >> configure script also states that both compilers are possible (if you >> configure with --with-jvm-features='jvmci'): >> >> configure: error: Specified JVM feature 'jvmci' requires feature >> 'compiler2' or 'compiler1' >> >> Should maybe the requires tag "vm.jvmci" be used as well, like: >> >> @requires vm.flavor == "server" | vm.jvmci >> >>> this is the known limitation of jtreg/@requires, and our current way to >>> workaround it is to split a test description based on @requires values >> >> Yes, if the @requires tag is used, splitting up the test looks like a good >> idea. I didn't know that it is possible to have multiple test descriptions >> in one test file. >> >> I created a new webrev with the new ideas: >> >> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ >> >> I tested with an amd64 client and server VM and it looks good. I am >> currently unable to build a client VM with JVMCI enabled, hence no test >> for that yet. I get compile errors and as soon as I resolve those, >> runtime errors occur. Before I look into that, I would like to know if >> client VMs with JVMCI enabled are supported or not. 
>> >> Thanks, >> Christoph >> >> [1] >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp >> >> [2] >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp >> > From john.r.rose at oracle.com Wed Nov 13 21:23:35 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 13 Nov 2019 13:23:35 -0800 Subject: final field values should be trusted as constant (filed as JDK-8233873) In-Reply-To: <69de5d88-2487-850b-5388-6363d88d2b5b@cs.oswego.edu> References: <69de5d88-2487-850b-5388-6363d88d2b5b@cs.oswego.edu> Message-ID: <06DC8BED-60F2-438C-82FD-1578B7215D80@oracle.com> On Nov 9, 2019, at 11:21 AM, Doug Lea
wrote: > > On 11/8/19 8:24 PM, John Rose wrote: > >> ## Side note on races >> >> Although race conditions (on non-volatile fields) allow the JVM some >> latitude to return "stale" values for field references, such latitude >> would usually be quite narrow, since an execution of the invalid >> optimized method is likely to occur downstream of the invalidating >> field update (as determined by the happens-before relation of the >> JMM). > > Ever since initial revisions of JLS1 version, the intent of JMM specs > (including current) is to allow compilers to believe that the value they > see in initial reads of a final field is the only value they will ever > see. So no revision is necessary on these grounds (although one of these > days there will be one that accommodates VarHandle modes etc, > formalizing http://gee.cs.oswego.edu/dl/html/j9mm.html). Some of the > spec messiness exists just to explain why compilers are allowed not to > believe this as well, because of reflection etc. > > In other words, don't let JMM concerns stop you from this worthwhile effort. Thanks for the encouragement, Doug. The JMM probably doesn't require enhancement, but we do need some elucidation of dark corners around settings of final fields outside of constructors. Maybe it's already been written; I'm sure it's already been thought about, by you at least. Setting a final inside a constructor is not problematic, as long as a JIT can prove statically that a (non-static) field it proposes to constant fold is in an instance whose constructor has returned (normally). This all by itself is tricky to do (see also B, C below), but let's set that aside. Also tricky is smashing of finals of normally-constructed objects, which is allowed by the JVM. This is allowed even if heinous. We need some Big Hammer to turn off optimization when it happens.
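To make the "smashing" case concrete, here is a minimal sketch, assuming only standard core reflection (java.lang.reflect.Field). The names FinalSmashSketch, Box and smashAndRead are made up for illustration and do not come from this thread. It shows a non-static final field of a normally constructed object being overwritten after the constructor has returned, which is the situation a JIT would have to guard against before constant folding such a field.

```java
import java.lang.reflect.Field;

class FinalSmashSketch {

    // Hypothetical holder class with one non-static final field.
    static final class Box {
        final int value;   // assigned in the constructor, so not a compile-time constant

        Box(int value) {
            this.value = value;
        }
    }

    // Constructs a Box normally, overwrites its final field via reflection,
    // then reads the field back reflectively and returns what it sees.
    public static int smashAndRead(int initial, int replacement) {
        try {
            Box box = new Box(initial);
            Field f = Box.class.getDeclaredField("value");
            f.setAccessible(true);        // permits writes to non-static final fields
            f.setInt(box, replacement);   // the "smash"
            return f.getInt(box);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(smashAndRead(1, 42));
    }
}
```

The read-back deliberately goes through reflection as well. What a direct read of box.value in compiled code would observe after the write is exactly the question under discussion, so the sketch makes no claim about it.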
If we were really slick we might try to amend the JMM to declare that the JIT can retain a previously folded value, even after somebody smashed it, on the grounds that this is a valid race condition, where the JITted code is racily reading back in time. (Is this already true? Probably not. It is worth considering as a JMM enhancement; the idea is of races which are allowed to retain sticky values that would ordinarily be updated. I'm thinking @Stable and final both.) But set that aside for now also.

Here's the hard part, I think. When an instance is born without running a constructor, then we know somebody did something off-label to set any non-static final fields it has. This means the JMM makes no guarantees about safe publication of those fields. This in turn means somebody must do something:

A. Give up on all static final fields, because somebody might smash a properly constructed object. This is our decade-plus status quo, which I'm very tired of.

B. Have the JVM leave enough clues to distinguish properly constructed objects from the off-label ones. (This is the "larval/adult" distinction.) Then the JIT can optimize only the normal cases and avoid the others.

C. Demand that frameworks which make off-label objects be upgraded to include a memory fence after they are done setting their finals, so that they conform to the JMM behaviors required of regular objects. Make the fence be checkable by the JIT so it can see when the framework has done its duty, so the JIT can treat the object normally.

D. Have the JMM make special rules for final fields, that their values are "stickier" or more "memorable" than normal fields, so races can see old values *if the JVM wants to*.

E. Some balanced combination of the above. I.e., give frameworks a carrot for upgrading their construction sequences with better performance, and a stick of emitting log entries or warnings when they spew out objects in the larval state.

F. Your idea here, please!
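As one illustration of an instance "born without running a constructor", here is a small sketch using standard Java serialization, where the deserializer populates the final field of a `Serializable` class without invoking that class's constructor. The names `OffLabelBirth`, `Box`, `roundTrip` and `constructorRuns` are hypothetical. The `VarHandle.storeStoreFence()` call only marks where a fence in the spirit of option C might sit; that the JIT could recognize or check such a fence is an assumption of the proposal, not existing behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.invoke.VarHandle;

public class OffLabelBirth {
    // Counts how many times Box's own constructor has run.
    public static int constructorRuns = 0;

    // Hypothetical example class with one non-static final field.
    public static final class Box implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int value;
        public Box(int value) { this.value = value; constructorRuns++; }
    }

    // Serializes a Box and reads it back. The copy's final field is filled in
    // by the deserializer; Box's constructor is not invoked for the copy.
    public static Box roundTrip(Box original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Box copy = (Box) in.readObject();
                // Illustration of option C only: a framework that sets finals
                // off-label would fence after its last final-field store and
                // before publishing the object to other threads.
                VarHandle.storeStoreFence();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Box copy = roundTrip(new Box(7));
        System.out.println(copy.value + " (constructor runs: " + constructorRuns + ")");
    }
}
```

Under option B the copy would be the kind of object the JVM has to tell apart from a normally constructed `Box`; under option C the fence is what would let the JIT treat the two alike.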
John From ekaterina.pavlova at oracle.com Wed Nov 13 21:28:58 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 13 Nov 2019 13:28:58 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 Message-ID: Hi, please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of time to execute. JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html testing: tier1 thanks, -katya From igor.ignatyev at oracle.com Wed Nov 13 21:30:59 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 13 Nov 2019 13:30:59 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: References: Message-ID: Hi Katya, shouldn't this group be also into tier1_compiler group? -- Igor > On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote: > > Hi, > > please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it > as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on > bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of > time to execute. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 > webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html > testing: tier1 > > > thanks, > -katya From ekaterina.pavlova at oracle.com Wed Nov 13 21:38:19 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 13 Nov 2019 13:38:19 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: References: Message-ID: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> The tests require -XX:+EnableJVMCI to be run with, so this is why I created separate group. On 11/13/19 1:30 PM, Igor Ignatyev wrote: > Hi Katya, > > shouldn't this group be also into tier1_compiler group? > > -- Igor > >> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote: >> >> Hi, >> >> please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it >> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on >> bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of >> time to execute. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 >> webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html >> testing: tier1 >> >> >> thanks, >> -katya > From igor.ignatyev at oracle.com Wed Nov 13 21:45:55 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 13 Nov 2019 13:45:55 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> References: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> Message-ID: <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com> all tier1 groups are expected to runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawn JVM, and updated the test descriptions (and generateTests.sh) to require JVM w/ jvmci feature (@requires vm.jvmci)? 
then this will be a proper tier1 group. -- Igor > On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova wrote: > > The tests require -XX:+EnableJVMCI to be run with, so this is why I created separate group. > > On 11/13/19 1:30 PM, Igor Ignatyev wrote: >> Hi Katya, >> shouldn't this group be also into tier1_compiler group? >> -- Igor >>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote: >>> >>> Hi, >>> >>> please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it >>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on >>> bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of >>> time to execute. >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 >>> webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html >>> testing: tier1 >>> >>> >>> thanks, >>> -katya > From dean.long at oracle.com Thu Nov 14 04:15:40 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 13 Nov 2019 20:15:40 -0800 Subject: RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: Message-ID: Hi Pengfei, I took a quick look and didn't notice any problems.? Nice work! This seems to match the x64 approach, however please get other reviews. dl On 11/13/19 1:55 AM, Pengfei Li (Arm Technology China) wrote: > Hi, > > JBS: https://bugs.openjdk.java.net/browse/JDK-8233743 > Webrev: http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.00/ > > This is a follow-up patch of JDK-8217909[1] to make the AArch64 register > r27 allocatable when CompressedOops and CompressedClassPointers are both > turned off. > > Below changes have been made: > - Massage the RegMask(s) in reg_mask_init() at C2 initialization and > remove r27 from some of the masks conditionally to make it allocatable. > - Also make r29 conditionally reserved in this similar way. > - Make r29 allocatable for pointers as well as integers. 
> - Replace an rheapbase use to rscratch1 in AArch64 ZGC. > - Revert JDK-8231754[2] which makes r27 always reserved in JVMCI. > > This patch aligns with the implementation in [1] which makes the x86_64 > r12 register allocatable. Please let me know if I have missed anything > for AArch64. > > Tests: > Full jtreg with default options and extra options "-XX:-UseCompressedOops > -XX:+PreserveFramePointer". No new failure is found. > > [1] https://hg.openjdk.java.net/jdk/jdk/rev/48b50573dee4 > [2] https://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de > > -- > Thanks, > Pengfei > From ekaterina.pavlova at oracle.com Thu Nov 14 04:52:10 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 13 Nov 2019 20:52:10 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com> References: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com> Message-ID: <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com> -XX:+EnableJVMCI is already passed to the spawn JVM by GraalUnitTestLauncher. However -XX:+EnableJVMCI is also required for GraalUnitTestLauncher itself so getModuleExports() function works properly for graal modules. Also note that we can't pass '-XX:+EnableJVMCI' to GraalUnitTestLauncher in jtreg directive as we use '@run main compiler.graalunit.common.GraalUnitTestLauncher' to launch it. See also discussion regarding this issue in JDK-8216551. Anyway, I understand the point regarding tier1 and will see what can be done. thanks, -katya On 11/13/19 1:45 PM, Igor Ignatyev wrote: > all tier1 groups are expected to runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawn JVM, > and updated the test descriptions (and generateTests.sh) to require JVM w/ jvmci feature (@requires vm.jvmci)? then this will be a proper tier1 group. 
> > -- Igor > >> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova wrote: >> >> The tests require -XX:+EnableJVMCI to be run with, so this is why I created separate group. >> >> On 11/13/19 1:30 PM, Igor Ignatyev wrote: >>> Hi Katya, >>> shouldn't this group be also into tier1_compiler group? >>> -- Igor >>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote: >>>> >>>> Hi, >>>> >>>> please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it >>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on >>>> bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of >>>> time to execute. >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 >>>> webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html >>>> testing: tier1 >>>> >>>> >>>> thanks, >>>> -katya >> > From igor.ignatyev at oracle.com Thu Nov 14 04:59:37 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 13 Nov 2019 20:59:37 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com> References: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com> <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com> Message-ID: if GraalUnitTestLauncher has to be run w/ -XX:+EnableJVMCI, I guess we just have to switch back to othervm mode. -- Igor > On Nov 13, 2019, at 8:52 PM, Ekaterina Pavlova wrote: > > -XX:+EnableJVMCI is already passed to the spawn JVM by GraalUnitTestLauncher. > However -XX:+EnableJVMCI is also required for GraalUnitTestLauncher itself so > getModuleExports() function works properly for graal modules. > Also note that we can't pass '-XX:+EnableJVMCI' to GraalUnitTestLauncher in jtreg > directive as we use '@run main compiler.graalunit.common.GraalUnitTestLauncher' to launch it. 
> See also discussion regarding this issue in JDK-8216551.
>
> Anyway, I understand the point regarding tier1 and will see what can be done.
>
> thanks,
> -katya
>
> On 11/13/19 1:45 PM, Igor Ignatyev wrote:
>> all tier1 groups are expected to runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawn JVM, and updated the test descriptions (and generateTests.sh) to require JVM w/ jvmci feature (@requires vm.jvmci)? then this will be a proper tier1 group.
>> -- Igor
>>> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova wrote:
>>>
>>> The tests require -XX:+EnableJVMCI to be run with, so this is why I created separate group.
>>>
>>> On 11/13/19 1:30 PM, Igor Ignatyev wrote:
>>>> Hi Katya,
>>>> shouldn't this group be also into tier1_compiler group?
>>>> -- Igor
>>>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it
>>>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on
>>>>> bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of
>>>>> time to execute.
>>>>>
>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8215728
>>>>> webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html
>>>>> testing: tier1
>>>>>
>>>>>
>>>>> thanks,
>>>>> -katya
>>>
>

From dean.long at oracle.com Thu Nov 14 05:12:22 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 13 Nov 2019 21:12:22 -0800
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To:
References:
Message-ID: <6f1dbc0a-b1a4-43e8-65e1-1e0df2115c33@oracle.com>

Hi Patric.  I was expecting the fix to allow the following existing logic in DivINode::Ideal to work:

  // Check for excluding div-zero case
  if (in(0) && (ti->_hi < 0 || ti->_lo > 0)) {
    set_req(0, NULL);           // Yank control input
    return this;
  }

by making sure the range of "ti" has been sharpened by the previous if-node.  I was just wondering if you looked at that solution and thought it was feasible.  I see Parse::sharpen_type_after_if() is almost doing the right thing, but only handles BoolTest::eq.

dl

On 11/12/19 6:16 AM, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>
>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>     short-circuit for (obviously) redundant if-nodes.
>
> Testing: hs-tier1-4, hs-precheckin-comp
>
>
> Best regards,
> Patric
>

From vladimir.x.ivanov at oracle.com Thu Nov 14 08:31:12 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 14 Nov 2019 11:31:12 +0300
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To:
References:
Message-ID: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com>

> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/

I just briefly looked through the patch and have a quick high-level question:

+ // Rewrite:
+ // cmp cmp
+ // / \ |
+ // (r1) bool \ bool (r1)
+ // / bool (r2) \
+ // (dom) if \ ==> if
+ // \ ) \
+ // (pre) if[TF] / if[TF]X
+ // \ /
+ // if (this)
+ // / \
+ // ifT ifF [X]

Why do you do complex graph surgery instead of simply adjusting condition at redundant If (to 0/1) and let existing logic to eliminate it?

Best regards,
Vladimir Ivanov

>
> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>
>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>     short-circuit for (obviously) redundant if-nodes.
> > Testing: hs-tier1-4, hs-precheckin-comp > > > Best regards, > Patric > From tobias.hartmann at oracle.com Thu Nov 14 09:08:25 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 14 Nov 2019 10:08:25 +0100 Subject: RFR: 8234003: Improve IndexSet iteration In-Reply-To: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com> References: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com> Message-ID: <48900cd6-3616-7e5c-dd26-f8475f4f0eb4@oracle.com> Hi Claes, nice cleanup, looks good to me! Just noticed that you've (intentionally?) changed the indentation of the comment in live.cpp:259. Best regards, Tobias On 13.11.19 00:09, Claes Redestad wrote: > Hi, > > a significant portion of work done during register allocation in C2 is > iterating over IndexSets. > > A few small optimizations show a ~4% decrease in instructions retired by > register allocation when instrumenting, and up to 3% fewer instructions > retired in total on startup tests. > > Bug:??? https://bugs.openjdk.java.net/browse/JDK-8234003 > Webrev: http://cr.openjdk.java.net/~redestad/8234003/open.00/ > > The biggest improvement comes from avoiding iterating over empty sets > altogether. A smaller improvement from adding a water mark to avoid > iterating over all the blocks in the IndexSet. > > Testing: tier1-3, verified improvements on large and tiny startup tests, > checked that any increased inlining is footprint neutral on Linux. > > Thanks! > > /Claes > From martin.doerr at sap.com Thu Nov 14 09:29:08 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 14 Nov 2019 09:29:08 +0000 Subject: RFR(S) : 8225756 : [testbug] compiler/loopstripmining/CheckLoopStripMining.java sets too short a SafepointTimeoutDelay In-Reply-To: References: <800A7F02-2372-4837-A5DF-D0B513A2280F@oracle.com> <87a790dtfz.fsf@redhat.com> Message-ID: Excellent. Thank you. Martin > -----Original Message----- > From: Igor Ignatyev > Sent: Mittwoch, 13. 
November 2019 20:31 > To: Doerr, Martin ; Ekaterina Pavlova > ; Vladimir Kozlov > ; Roland Westrelin > Cc: hotspot compiler > Subject: Re: RFR(S) : 8225756 : [testbug] > compiler/loopstripmining/CheckLoopStripMining.java sets too short a > SafepointTimeoutDelay > > @Martin, > I've updated the test according to your comments and also added 'to prevent > biased locking handshakes from changing the timing' comment preceding '- > XX:-UseBiasedLocking'. > > @all, thanks for your review, pushed. > > -- Igor > > > > On Nov 13, 2019, at 3:37 AM, Doerr, Martin > wrote: > > > > Hi Igor, > > > > thanks for improving it. > > > > Please note that this test was derived from > > > test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.ja > va > > JDK-8227528 has added -XX:-UseBiasedLocking. > > > > Would you mind adding that to your new version, too? > > > > Please also remove double-whitespace before Utils.adjustTimeout(500). > > > > Thanks and best regards, > > Martin > > > > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Roland Westrelin > >> Sent: Mittwoch, 13. November 2019 10:20 > >> To: Igor Ignatyev ; hotspot compiler > >> compiler-dev at openjdk.java.net> > >> Subject: Re: RFR(S) : 8225756 : [testbug] > >> compiler/loopstripmining/CheckLoopStripMining.java sets too short a > >> SafepointTimeoutDelay > >> > >> > >> Looks reasonable to me. > >> > >> Thanks for fixing this, Igor. > >> > >> Roland. > > From aph at redhat.com Thu Nov 14 10:40:53 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 14 Nov 2019 10:40:53 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: Message-ID: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> On 11/13/19 9:55 AM, Pengfei Li (Arm Technology China) wrote: > This patch aligns with the implementation in [1] which makes the x86_64 > r12 register allocatable. 
Please let me know if I have missed anything > for AArch64. We don't generally use r27 for compressed class pointers. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christoph.goettschkes at microdoc.com Thu Nov 14 11:20:33 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Thu, 14 Nov 2019 12:20:33 +0100 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com> References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> <2w8hcs8xd1-1@aserp2030.oracle.com> <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com> Message-ID: Thanks for your feedback, this resolves my concerns and I am happy with the solution. I integrated the suggestions from Vladimir, here is the latest webrev: https://cr.openjdk.java.net/~cgo/8231954/webrev.02/ I re-tested and it works as expected. Please give your consent if this is fine for you as well. -- Christoph Vladimir Kozlov wrote on 2019-11-13 20:32:18: > From: Vladimir Kozlov > To: Igor Ignatyev , christoph.goettschkes at microdoc.com > Cc: hotspot compiler > Date: 2019-11-13 20:32 > Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ > TestCharVect2.java only works with server VMs. > > On 11/13/19 11:11 AM, Igor Ignatyev wrote: > > @Christoph, > > > > webrev.01 looks good to me. > > I always thought that jvmci feature can be built only when compiler2 > feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp > suggests that jvmci can be used w/o compiler2; I don't think we have > ever build/test, let alone support, this configuration. > > > > @Vladimir, > > did/do we plan to support compiler1 + jvmci w/o compiler2 configuration? > > Yes. 
It could be configuration when we start looking on replacing C1 > with Graal. I think several people were interested > in "Client VM" like configuration. > Also Server configuration without C2 (with Graal or other jvmci > compiler) which would be out configuration in a future. > > But I would prefer to be more explicit in these changes: > > @requires vm.compiler2.enabled | vm.graal.enabled > > Thanks, > Vladimir > > > > > Thanks, > > -- Igor > > > >> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote: > >> > >> Hi Igor, > >> > >> thanks for your explanation. > >> > >> Igor Ignatyev wrote on 2019-11-12 20:40:46: > >> > >>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as > >>> in most cases, it causes wasted compute time (as in this test) and can > >>> also lead to wrong/deprecated/deleted flags sneaking into the testbase > >> > >> Agreed. I also wanted to discuss this, since I think that your solution > >> is better than mine, but at the same time, I saw possible problems with > >> it, see below. > >> > >>> as '@requires vm.flavor == "server"' filters configurations based vm > >>> build type, it will still allow execution on JVM w/ JVMCI and when JVMCI > >>> compiler is selected, as it will still be Server VM build. so, in a > >>> sense, the test will be w/ JVMCI in the same way as w/ your approach. > >> > >> My concern is not about server VMs with JVMCI, but client VMs with JVMCI > >> enabled. Is this a valid configuration? The MaxVectorSize option is > >> defined in [1] as well as in [2], so for me it looks like MaxVectorSize > >> can be used for any VM variant as long as JVMCI is enabled. 
The > >> configure script also states that both compilers are possible (if you > >> configure with --with-jvm-features='jvmci'): > >> > >> configure: error: Specified JVM feature 'jvmci' requires feature > >> 'compiler2' or 'compiler1' > >> > >> Should maybe the requires tag "vm.jvmci" be used as well, like: > >> > >> @requires vm.flavor == "server" | vm.jvmci > >> > >>> this is the known limitation of jtreg/@requires, and our current way to > >>> workaround it is to split a test description based on @requires values > >> > >> Yes, if the @requires tag is used, splitting up the test looks like a good > >> idea. I didn't know that it is possible to have multiple test descriptions > >> in one test file. > >> > >> I created a new webrev with the new ideas: > >> > >> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ > >> > >> I tested with an amd64 client and server VM and it looks good. I am > >> currently unable to build a client VM with JVMCI enabled, hence no test > >> for that yet. I get compile errors and as soon as I resolve those, > >> runtime errors occur. Before I look into that, I would like to know if > >> client VMs with JVMCI enabled are supported or not. 
> >> > >> Thanks, > >> Christoph > >> > >> [1] > >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ > share/opto/c2_globals.hpp > >> > >> [2] > >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ > share/jvmci/jvmci_globals.hpp > >> > > > From claes.redestad at oracle.com Thu Nov 14 13:36:37 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 14 Nov 2019 14:36:37 +0100 Subject: RFR: 8234003: Improve IndexSet iteration In-Reply-To: <48900cd6-3616-7e5c-dd26-f8475f4f0eb4@oracle.com> References: <0f55bfa2-f6e7-9c9e-e10f-a97a909eeb98@oracle.com> <48900cd6-3616-7e5c-dd26-f8475f4f0eb4@oracle.com> Message-ID: <1db8cd03-7553-416e-abfe-1e50ed9b82e3@oracle.com> Nils, Tobias, On 2019-11-14 10:08, Tobias Hartmann wrote: > Hi Claes, > > nice cleanup, looks good to me! thank you for reviewing! > > Just noticed that you've (intentionally?) changed the indentation of the comment in live.cpp:259. Yes, seems like my IDE stumbled a bit there.. will fix before push. /Claes From martin.doerr at sap.com Thu Nov 14 16:18:23 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 14 Nov 2019 16:18:23 +0000 Subject: RFR(S): 8233193: Incorrect bailout from possibly_add_compiler_threads Message-ID: Hi, I'd like to cleanup exception handling in CompileBroker a little bit. Here's my proposal: - Use THREAD instead of CHECK where no exceptions get thrown. - Remove unused preload_classes. - make_thread: Rename thread to new_thread to avoid confusion with THREAD (the current compiler thread). - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + CHECK because this functions is not supposed to kill the VM on exceptions. Add assertion to caller. Webrev: http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/ @David: You didn't like usage of the CHECK macro in the initialization functions, but I think they are ok. Not very nice to read, but the behavior looks ok to me. 
At least, I didn't find a better replacement for them. Maybe you have a proposal? Best regards, Martin From igor.ignatyev at oracle.com Thu Nov 14 16:44:38 2019 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Thu, 14 Nov 2019 08:44:38 -0800 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <2w947d9dgm-1@aserp2050.oracle.com> References: <2w947d9dgm-1@aserp2050.oracle.com> Message-ID: <9725CF6F-64B1-4101-BB3A-801C841A8441@oracle.com> LGTM ? Igor > On Nov 14, 2019, at 3:22 AM, christoph.goettschkes at microdoc.com wrote: > > ?Thanks for your feedback, this resolves my concerns and I am happy with > the solution. I integrated the suggestions from Vladimir, here is the > latest webrev: > > https://cr.openjdk.java.net/~cgo/8231954/webrev.02/ > > I re-tested and it works as expected. > Please give your consent if this is fine for you as well. > > -- Christoph > > Vladimir Kozlov wrote on 2019-11-13 20:32:18: > >> From: Vladimir Kozlov >> To: Igor Ignatyev , > christoph.goettschkes at microdoc.com >> Cc: hotspot compiler >> Date: 2019-11-13 20:32 >> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ >> TestCharVect2.java only works with server VMs. >> >>> On 11/13/19 11:11 AM, Igor Ignatyev wrote: >>> @Christoph, >>> >>> webrev.01 looks good to me. >>> I always thought that jvmci feature can be built only when compiler2 >> feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp >> suggests that jvmci can be used w/o compiler2; I don't think we have >> ever build/test, let alone support, this configuration. >>> >>> @Vladimir, >>> did/do we plan to support compiler1 + jvmci w/o compiler2 > configuration? >> >> Yes. It could be configuration when we start looking on replacing C1 >> with Graal. I think several people were interested >> in "Client VM" like configuration. 
>> Also Server configuration without C2 (with Graal or other jvmci >> compiler) which would be out configuration in a future. >> >> But I would prefer to be more explicit in these changes: >> >> @requires vm.compiler2.enabled | vm.graal.enabled >> >> Thanks, >> Vladimir >> >>> >>> Thanks, >>> -- Igor >>> >>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com > wrote: >>>> >>>> Hi Igor, >>>> >>>> thanks for your explanation. >>>> >>>> Igor Ignatyev wrote on 2019-11-12 > 20:40:46: >>>> >>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our > tests, as >>>>> in most cases, it causes wasted compute time (as in this test) and > can >>>>> also lead to wrong/deprecated/deleted flags sneaking into the > testbase >>>> >>>> Agreed. I also wanted to discuss this, since I think that your > solution >>>> is better than mine, but at the same time, I saw possible problems > with >>>> it, see below. >>>> >>>>> as '@requires vm.flavor == "server"' filters configurations based vm >>>>> build type, it will still allow execution on JVM w/ JVMCI and when > JVMCI >>>>> compiler is selected, as it will still be Server VM build. so, in a >>>>> sense, the test will be w/ JVMCI in the same way as w/ your > approach. >>>> >>>> My concern is not about server VMs with JVMCI, but client VMs with > JVMCI >>>> enabled. Is this a valid configuration? The MaxVectorSize option is >>>> defined in [1] as well as in [2], so for me it looks like > MaxVectorSize >>>> can be used for any VM variant as long as JVMCI is enabled. 
The >>>> configure script also states that both compilers are possible (if you >>>> configure with --with-jvm-features='jvmci'): >>>> >>>> configure: error: Specified JVM feature 'jvmci' requires feature >>>> 'compiler2' or 'compiler1' >>>> >>>> Should maybe the requires tag "vm.jvmci" be used as well, like: >>>> >>>> @requires vm.flavor == "server" | vm.jvmci >>>> >>>>> this is the known limitation of jtreg/@requires, and our current way > to >>>>> workaround it is to split a test description based on @requires > values >>>> >>>> Yes, if the @requires tag is used, splitting up the test looks like a > good >>>> idea. I didn't know that it is possible to have multiple test > descriptions >>>> in one test file. >>>> >>>> I created a new webrev with the new ideas: >>>> >>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ >>>> >>>> I tested with an amd64 client and server VM and it looks good. I am >>>> currently unable to build a client VM with JVMCI enabled, hence no > test >>>> for that yet. I get compile errors and as soon as I resolve those, >>>> runtime errors occur. Before I look into that, I would like to know > if >>>> client VMs with JVMCI enabled are supported or not. >>>> >>>> Thanks, >>>> Christoph >>>> >>>> [1] >>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ >> share/opto/c2_globals.hpp >>>> >>>> [2] >>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ >> share/jvmci/jvmci_globals.hpp >>>> >>> >> > From dean.long at oracle.com Thu Nov 14 17:23:25 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 14 Nov 2019 09:23:25 -0800 Subject: RFR(M) 8233841: Update Graal Message-ID: https://bugs.openjdk.java.net/browse/JDK-8233841 http://cr.openjdk.java.net/~dlong/8233841/webrev/ This is a Graal update.? Changes since the last update (JDK-8231973) are listed in the bug description. 
dl From vladimir.kozlov at oracle.com Thu Nov 14 17:53:20 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Nov 2019 09:53:20 -0800 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <9725CF6F-64B1-4101-BB3A-801C841A8441@oracle.com> References: <2w947d9dgm-1@aserp2050.oracle.com> <9725CF6F-64B1-4101-BB3A-801C841A8441@oracle.com> Message-ID: <79dba9cf-46c4-1a3d-aba5-28622f2bd505@oracle.com> +1 Vladimir On 11/14/19 8:44 AM, Igor Ignatev wrote: > LGTM > > ? Igor > >> On Nov 14, 2019, at 3:22 AM, christoph.goettschkes at microdoc.com wrote: >> >> ?Thanks for your feedback, this resolves my concerns and I am happy with >> the solution. I integrated the suggestions from Vladimir, here is the >> latest webrev: >> >> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/ >> >> I re-tested and it works as expected. >> Please give your consent if this is fine for you as well. >> >> -- Christoph >> >> Vladimir Kozlov wrote on 2019-11-13 20:32:18: >> >>> From: Vladimir Kozlov >>> To: Igor Ignatyev , >> christoph.goettschkes at microdoc.com >>> Cc: hotspot compiler >>> Date: 2019-11-13 20:32 >>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ >>> TestCharVect2.java only works with server VMs. >>> >>>> On 11/13/19 11:11 AM, Igor Ignatyev wrote: >>>> @Christoph, >>>> >>>> webrev.01 looks good to me. >>>> I always thought that jvmci feature can be built only when compiler2 >>> feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp >>> suggests that jvmci can be used w/o compiler2; I don't think we have >>> ever build/test, let alone support, this configuration. >>>> >>>> @Vladimir, >>>> did/do we plan to support compiler1 + jvmci w/o compiler2 >> configuration? >>> >>> Yes. It could be configuration when we start looking on replacing C1 >>> with Graal. I think several people were interested >>> in "Client VM" like configuration. 
>>> Also Server configuration without C2 (with Graal or other jvmci >>> compiler) which would be out configuration in a future. >>> >>> But I would prefer to be more explicit in these changes: >>> >>> @requires vm.compiler2.enabled | vm.graal.enabled >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Thanks, >>>> -- Igor >>>> >>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com >> wrote: >>>>> >>>>> Hi Igor, >>>>> >>>>> thanks for your explanation. >>>>> >>>>> Igor Ignatyev wrote on 2019-11-12 >> 20:40:46: >>>>> >>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our >> tests, as >>>>>> in most cases, it causes wasted compute time (as in this test) and >> can >>>>>> also lead to wrong/deprecated/deleted flags sneaking into the >> testbase >>>>> >>>>> Agreed. I also wanted to discuss this, since I think that your >> solution >>>>> is better than mine, but at the same time, I saw possible problems >> with >>>>> it, see below. >>>>> >>>>>> as '@requires vm.flavor == "server"' filters configurations based vm >>>>>> build type, it will still allow execution on JVM w/ JVMCI and when >> JVMCI >>>>>> compiler is selected, as it will still be Server VM build. so, in a >>>>>> sense, the test will be w/ JVMCI in the same way as w/ your >> approach. >>>>> >>>>> My concern is not about server VMs with JVMCI, but client VMs with >> JVMCI >>>>> enabled. Is this a valid configuration? The MaxVectorSize option is >>>>> defined in [1] as well as in [2], so for me it looks like >> MaxVectorSize >>>>> can be used for any VM variant as long as JVMCI is enabled. 
The >>>>> configure script also states that both compilers are possible (if you >>>>> configure with --with-jvm-features='jvmci'): >>>>> >>>>> configure: error: Specified JVM feature 'jvmci' requires feature >>>>> 'compiler2' or 'compiler1' >>>>> >>>>> Should maybe the requires tag "vm.jvmci" be used as well, like: >>>>> >>>>> @requires vm.flavor == "server" | vm.jvmci >>>>> >>>>>> this is the known limitation of jtreg/@requires, and our current way >> to >>>>>> workaround it is to split a test description based on @requires >> values >>>>> >>>>> Yes, if the @requires tag is used, splitting up the test looks like a >> good >>>>> idea. I didn't know that it is possible to have multiple test >> descriptions >>>>> in one test file. >>>>> >>>>> I created a new webrev with the new ideas: >>>>> >>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ >>>>> >>>>> I tested with an amd64 client and server VM and it looks good. I am >>>>> currently unable to build a client VM with JVMCI enabled, hence no >> test >>>>> for that yet. I get compile errors and as soon as I resolve those, >>>>> runtime errors occur. Before I look into that, I would like to know >> if >>>>> client VMs with JVMCI enabled are supported or not. >>>>> >>>>> Thanks, >>>>> Christoph >>>>> >>>>> [1] >>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ >>> share/opto/c2_globals.hpp >>>>> >>>>> [2] >>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ >>> share/jvmci/jvmci_globals.hpp >>>>> >>>> >>> >> > From vladimir.kozlov at oracle.com Thu Nov 14 17:57:17 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Nov 2019 09:57:17 -0800 Subject: RFR(M) 8233841: Update Graal In-Reply-To: References: Message-ID: <5f3d6026-2bd8-0385-06e1-506c2aad4c7e@oracle.com> Looks good. Tests results good too. 
Thanks, Vladimir On 11/14/19 9:23 AM, dean.long at oracle.com wrote: > https://bugs.openjdk.java.net/browse/JDK-8233841 > http://cr.openjdk.java.net/~dlong/8233841/webrev/ > > This is a Graal update. Changes since the last update (JDK-8231973) are listed in the bug description. > > dl From tom.rodriguez at oracle.com Thu Nov 14 18:17:13 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 14 Nov 2019 10:17:13 -0800 Subject: RFR(XS) 8233888: jdk.vm.ci.hotspot.test.VirtualObjectLayoutTest.testFormat(): Unexpected error verifying Message-ID: <02ef4b41-a693-e671-b549-a7e007fb549d@oracle.com> http://cr.openjdk.java.net/~never/8233888/webrev https://bugs.openjdk.java.net/browse/JDK-8233888 A recently added unit test was making assumptions about field alignment that didn't hold when compressed oops were disabled. The fix is to adjust the starting point for the debug info depending on the alignment of the first field. Tested locally with and without compressed oops. tom From igor.ignatyev at oracle.com Thu Nov 14 18:28:32 2019 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Thu, 14 Nov 2019 10:28:32 -0800 Subject: RFR(XS) 8233888: jdk.vm.ci.hotspot.test.VirtualObjectLayoutTest.testFormat(): Unexpected error verifying In-Reply-To: <02ef4b41-a693-e671-b549-a7e007fb549d@oracle.com> References: <02ef4b41-a693-e671-b549-a7e007fb549d@oracle.com> Message-ID: <3FECF92F-62B8-4964-B8CA-664F6C832A0A@oracle.com> Hi Tom, LGTM -- Igor > On Nov 14, 2019, at 10:17 AM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8233888/webrev > https://bugs.openjdk.java.net/browse/JDK-8233888 > > A recently added unit test was making assumptions about field alignment that didn't hold when compressed oops were disabled. The fix is to adjust the starting point for the debug info depending on the alignment of the first field. Tested locally with and without compressed oops.
> > tom From vladimir.kozlov at oracle.com Thu Nov 14 18:46:43 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Nov 2019 10:46:43 -0800 Subject: RFR(XS) 8233888: jdk.vm.ci.hotspot.test.VirtualObjectLayoutTest.testFormat(): Unexpected error verifying In-Reply-To: <3FECF92F-62B8-4964-B8CA-664F6C832A0A@oracle.com> References: <02ef4b41-a693-e671-b549-a7e007fb549d@oracle.com> <3FECF92F-62B8-4964-B8CA-664F6C832A0A@oracle.com> Message-ID: <0f87dac2-d383-4d0c-bfb0-1d9bcdc48d10@oracle.com> +1. Tom, or may be Igor better, can you run the test in the same configuration as in bug report to make sure it is fixed? Thanks, Vladimir On 11/14/19 10:28 AM, Igor Ignatev wrote: > Hi Tom, > > LGTM > > -- Igor > >> On Nov 14, 2019, at 10:17 AM, Tom Rodriguez wrote: >> >> http://cr.openjdk.java.net/~never/8233888/webrev >> https://bugs.openjdk.java.net/browse/JDK-8233888 >> >> A recently added unit test was making assumptions about field alignment that didn't hold when compressed oops were disabled. The fix is to adjust the starting point for the debug info depending on the alignment of the first field. Tested locally with and without compressed oops. >> >> tom > From dean.long at oracle.com Thu Nov 14 20:06:54 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 14 Nov 2019 12:06:54 -0800 Subject: RFR(M) 8233841: Update Graal In-Reply-To: <5f3d6026-2bd8-0385-06e1-506c2aad4c7e@oracle.com> References: <5f3d6026-2bd8-0385-06e1-506c2aad4c7e@oracle.com> Message-ID: <86c3c5e6-191e-e92e-df28-52baab350937@oracle.com> Thanks Vladimir. dl On 11/14/19 9:57 AM, Vladimir Kozlov wrote: > Looks good. Tests results good too. > > Thanks, > Vladimir > > On 11/14/19 9:23 AM, dean.long at oracle.com wrote: >> https://bugs.openjdk.java.net/browse/JDK-8233841 >> http://cr.openjdk.java.net/~dlong/8233841/webrev/ >> >> This is a Graal update. Changes since the last update (JDK-8231973) >> are listed in the bug description.
>> >> dl From dl at cs.oswego.edu Fri Nov 15 00:46:34 2019 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 14 Nov 2019 19:46:34 -0500 Subject: final field values should be trusted as constant (filed as JDK-8233873) In-Reply-To: <06DC8BED-60F2-438C-82FD-1578B7215D80@oracle.com> References: <69de5d88-2487-850b-5388-6363d88d2b5b@cs.oswego.edu> <06DC8BED-60F2-438C-82FD-1578B7215D80@oracle.com> Message-ID: <6b015267-9249-9b49-ced0-ec6b8e34a5fa@cs.oswego.edu> On 11/13/19 4:23 PM, John Rose wrote: > Also tricky is smashing of finals of normally-constructed objects, which is > allowed by the JVM. This is allowed even if heinous. We need some > Big Hammer to turn off optimization when it happens. If we were really > slick we might try to amend the JMM to declare that the JIT can retain > a previously folded value, even after somebody smashed it, on the grounds > that this is a valid race condition, where the JITted code is racily reading > back in time. (Is this already true? I strongly suspect it is or could be sometimes true already, for example when reads are hoisted out of loops. So there is not much of an argument against the simplest interpretation of the final field rule: the first value ever read by any thread is allowed to be the only value ever used by all threads. Although with some practical accommodations: > > C. Demand that frameworks which make off-label objects be upgraded > to include a memory fence after they are done setting their finals, so > that they conform to the JMM behaviors required of regular objects. I think that some (all the reliable ones!) already do so. Dependency injection frameworks (which can be worst offenders) were the main reason for adding explicit fences to Unsafe in jdk7, prior to the current full VarHandle (plus fence) scheme in jdk9. If frameworks don't use them, programs are still buggy. Where "buggy" means that readers are only guaranteed to see a typesafe value. Maybe with a context-specific stronger guarantee; maybe not.
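A minimal sketch of the "fence after setting finals" discipline mentioned under option C, assuming a hypothetical framework that overwrites a final instance field reflectively before handing the object to other threads. The class, field, and method names are illustrative only; `VarHandle.releaseFence()` is the jdk9+ fence referred to above. Newer JDK releases may warn about or restrict reflective writes to final fields, so treat this as an illustration rather than a recommended pattern.

```java
import java.lang.invoke.VarHandle;
import java.lang.reflect.Field;

public class FenceBeforePublish {
    // Hypothetical injected object; the final field is assigned in the
    // constructor, so javac cannot inline it as a compile-time constant.
    static final class Config {
        private final int limit;
        Config(int limit) { this.limit = limit; }
        int limit() { return limit; }
    }

    // Plain (non-volatile) slot through which other threads would find the object.
    static Config published;

    // Hypothetical framework step: construct, overwrite the final field
    // reflectively, fence, and only then make the object reachable.
    static Config injectAndPublish(int injected) throws ReflectiveOperationException {
        Config c = new Config(0);
        Field f = Config.class.getDeclaredField("limit");
        f.setAccessible(true);
        f.setInt(c, injected);
        // Keep the store above from being reordered after the publishing store below.
        VarHandle.releaseFence();
        published = c;
        return c;
    }

    public static void main(String[] args) throws Exception {
        Config c = injectAndPublish(42);
        System.out.println(c.limit());
    }
}
```

The fence only orders the reflective store before publication; it does nothing for a reader that had already observed (or constant-folded) the old value, which is the case the "first value ever read" interpretation is about.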
My sense is that even the most egregious cases set fields before publishing to make accessible to other threads. Otherwise even now, bad things would routinely happen when multiple threads disagree about values. > E. Some balanced combination of the above. The remaining balance may be a matter of leaving a couple of loopholes: System.setIn and a few others could be treated magically, which they already are (but perhaps with an apologetic-sounding new annotation @NotReallyFinal). And dealing with the dubious legality of setting a final field more than once, with intervening reads, in a constructor. Maybe this could be deprecated, since this case doesn't seem to have a semantics, and would remove the weirdness that "could be final" analysis does not always apply to finals. -Doug From vladimir.kozlov at oracle.com Fri Nov 15 01:18:28 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Nov 2019 17:18:28 -0800 Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check In-Reply-To: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com> References: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com> Message-ID: <1e6d980f-5676-31f7-510c-7c7dc746540a@oracle.com> I second this. Vladimir K. On 11/14/19 12:31 AM, Vladimir Ivanov wrote: > > >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/ > > I just briefly looked through the patch and have a quick high-level question: > >
> +  // Rewrite:
> +  //              cmp                   cmp
> +  //              / \                    |
> +  //     (r1)  bool  \                  bool (r1)
> +  //            /    bool (r2)            \
> +  //    (dom) if       \        ==>       if
> +  //            \       )                  \
> +  //    (pre)  if[TF]  /                  if[TF]X
> +  //               \  /
> +  //                if (this)
> +  //               /  \
> +  //             ifT  ifF [X]
> > Why do you do complex graph surgery instead of simply adjusting condition at redundant If (to 0/1) and let existing > logic to eliminate it? > > Best regards, > Vladimir Ivanov > >> >> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check >> >> Adding a simple subsumption test to IfNode::Ideal to enable a local >> short-circuit for (obviously) redundant if-nodes. >> >> Testing: hs-tier1-4, hs-precheckin-comp >> >> >> Best regards, >> Patric >> From Yang.Zhang at arm.com Fri Nov 15 03:35:54 2019 From: Yang.Zhang at arm.com (Yang Zhang (Arm Technology China)) Date: Fri, 15 Nov 2019 03:35:54 +0000 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <20191114112213.D20B8D9B6E@aojmv0009> References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> <2w8hcs8xd1-1@aserp2030.oracle.com> <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com> <20191114112213.D20B8D9B6E@aojmv0009> Message-ID: Hi Christoph, Igor, Vladimir, Thanks very much for your fix. After discussion, we have got a better solution for this issue. Do we need to change the following files in which MaxVectorSize option is used? [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java [3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/c2/cr6340864 Ps. For [3], it locates in c2 directory. So I'm not sure whether they will be excluded in jtreg test with client mode.
Regards Yang -----Original Message----- From: hotspot-compiler-dev On Behalf Of christoph.goettschkes at microdoc.com Sent: Thursday, November 14, 2019 7:21 PM To: vladimir.kozlov at oracle.com; igor.ignatyev at oracle.com Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. Thanks for your feedback, this resolves my concerns and I am happy with the solution. I integrated the suggestions from Vladimir, here is the latest webrev: https://cr.openjdk.java.net/~cgo/8231954/webrev.02/ I re-tested and it works as expected. Please give your consent if this is fine for you as well. -- Christoph Vladimir Kozlov wrote on 2019-11-13 20:32:18: > From: Vladimir Kozlov > To: Igor Ignatyev , christoph.goettschkes at microdoc.com > Cc: hotspot compiler > Date: 2019-11-13 20:32 > Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ > TestCharVect2.java only works with server VMs. > > On 11/13/19 11:11 AM, Igor Ignatyev wrote: > > @Christoph, > > > > webrev.01 looks good to me. > > I always thought that jvmci feature can be built only when compiler2 > feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp > suggests that jvmci can be used w/o compiler2; I don't think we have > ever build/test, let alone support, this configuration. > > > > @Vladimir, > > did/do we plan to support compiler1 + jvmci w/o compiler2 configuration? > > Yes. It could be configuration when we start looking on replacing C1 > with Graal. I think several people were interested in "Client VM" like > configuration. > Also Server configuration without C2 (with Graal or other jvmci > compiler) which would be out configuration in a future. 
> > But I would prefer to be more explicit in these changes: > > @requires vm.compiler2.enabled | vm.graal.enabled > > Thanks, > Vladimir > > > > > Thanks, > > -- Igor > > > >> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com wrote: > >> > >> Hi Igor, > >> > >> thanks for your explanation. > >> > >> Igor Ignatyev wrote on 2019-11-12 20:40:46: > >> > >>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our tests, as > >>> in most cases, it causes wasted compute time (as in this test) and can > >>> also lead to wrong/deprecated/deleted flags sneaking into the testbase > >> > >> Agreed. I also wanted to discuss this, since I think that your solution > >> is better than mine, but at the same time, I saw possible problems with > >> it, see below. > >> > >>> as '@requires vm.flavor == "server"' filters configurations based > >>> vm build type, it will still allow execution on JVM w/ JVMCI and > >>> when JVMCI > >>> compiler is selected, as it will still be Server VM build. so, in > >>> a sense, the test will be w/ JVMCI in the same way as w/ your approach. > >> > >> My concern is not about server VMs with JVMCI, but client VMs with JVMCI > >> enabled. Is this a valid configuration? The MaxVectorSize option is > >> defined in [1] as well as in [2], so for me it looks like MaxVectorSize > >> can be used for any VM variant as long as JVMCI is enabled. 
The > >> configure script also states that both compilers are possible (if > >> you configure with --with-jvm-features='jvmci'): > >> > >> configure: error: Specified JVM feature 'jvmci' requires feature > >> 'compiler2' or 'compiler1' > >> > >> Should maybe the requires tag "vm.jvmci" be used as well, like: > >> > >> @requires vm.flavor == "server" | vm.jvmci > >> > >>> this is the known limitation of jtreg/@requires, and our current > >>> way to > >>> workaround it is to split a test description based on @requires values > >> > >> Yes, if the @requires tag is used, splitting up the test looks like > >> a good > >> idea. I didn't know that it is possible to have multiple test descriptions > >> in one test file. > >> > >> I created a new webrev with the new ideas: > >> > >> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ > >> > >> I tested with an amd64 client and server VM and it looks good. I am > >> currently unable to build a client VM with JVMCI enabled, hence no test > >> for that yet. I get compile errors and as soon as I resolve those, > >> runtime errors occur. Before I look into that, I would like to know if > >> client VMs with JVMCI enabled are supported or not. 
> >> > >> Thanks, > >> Christoph > >> > >> [1] > >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ > share/opto/c2_globals.hpp > >> > >> [2] > >> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ > share/jvmci/jvmci_globals.hpp > >> > > > From patrick at os.amperecomputing.com Fri Nov 15 07:51:17 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Fri, 15 Nov 2019 07:51:17 +0000 Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement In-Reply-To: <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> Message-ID: Hi Dmitrij, The inline document inside your this patch is nice in helping me understand the string_compare stub code generation in depth, although I don't know why it was not pushed. http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html There is one point I don't quite understand, could you please clarify it a little more? Thanks in advance! The large loop with prefetching logic, in different_encoding function, it uses "cnt2 - prefetchLoopExitCondition" to tell whether the 1st iteration should be executed or not, while the same_encoding does not do this, why? I was thinking that "prefetching out of the string boundary" could be an invalid operation, but it seems a misunderstanding, isn't it? The AArch64 ISA document says that prfm instruction signals the memory system and expects "preloading the cache line containing the specified address into one or more caches" would be done, in order to speed up the memory accesses when they do occur. If this is just a hint for coming ldrs, and safe enough, could we not restrict the rest iterations (2 ~ n) with largeLoopExitCondition? instead use 64 LL/UU, and 128 for LU/UL. 
As such more iteration can say in the large loop (SoftwarePrefetchHintDistance=192 for example), for better performance. Any comments? Thanks 4327 address generate_compare_long_string_different_encoding(bool isLU) { 4377 if (SoftwarePrefetchHintDistance >= 0) { 4378 __ subs(rscratch2, cnt2, prefetchLoopExitCondition); 4379 __ br(__ LT, NO_PREFETCH); 4380 __ bind(LARGE_LOOP_PREFETCH); // 64-characters loop ... ... 4395 __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead? 4396 __ br(__ GE, LARGE_LOOP_PREFETCH); 4397 } // end of 64-characters loop 4616 address generate_compare_long_string_same_encoding(bool isLL) { 4637 if (SoftwarePrefetchHintDistance >= 0) { 4638 __ bind(LARGE_LOOP_PREFETCH); 4639 __ prfm(Address(str1, SoftwarePrefetchHintDistance)); 4640 __ prfm(Address(str2, SoftwarePrefetchHintDistance)); 4641 compare_string_16_bytes_same(DIFF, DIFF2); 4642 compare_string_16_bytes_same(DIFF, DIFF2); 4643 __ sub(cnt2, cnt2, 8 * characters_in_word); 4644 compare_string_16_bytes_same(DIFF, DIFF2); 4645 __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead? 4646 compare_string_16_bytes_same(DIFF, DIFF2); 4647 __ br(__ GT, LARGE_LOOP_PREFETCH); 4648 __ cbz(cnt2, LAST_CHECK); // no more loads left 4649 } Regards Patrick -----Original Message----- From: hotspot-compiler-dev On Behalf Of Dmitry Samersoff Sent: Sunday, May 19, 2019 11:42 PM To: Dmitrij Pochepko ; Andrew Haley ; Pengfei Li (Arm Technology China) Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement Dmitrij, The changes looks good to me. 
-Dmitry On 25.02.2019 19:52, Dmitrij Pochepko wrote: > Hi Andrew, Pengfei, > > I created webrev.02 with all your suggestions implemented: > > webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/ > > - comments are now both in separate section and inlined into code. > - documentation mismatch mentioned by Pengfei is fixed: > -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST > -- SHORT_LOOP_TAIL block now merged with last instruction. > Documentation is updated respectively > - minor other changes to layout and wording > > Newly developed tests were run as sanity and they passed. > > Thanks, > Dmitrij > > On 22/02/2019 6:42 PM, Andrew Haley wrote: >> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote: >> >>> So personally, I still prefer to inline the comments with the >>> original code block to avoid this kind of inconsistencies. And it >>> makes us easier to review or maintain the code together with the >>> doc, as we don't need to scroll back and force. I don't know the >>> benefit of making the code documentation as a separate part. What's >>> your opinion, Andrew Haley? >> I agree with you. There's no harm having both inline and separate. >> From Pengfei.Li at arm.com Fri Nov 15 09:15:37 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 15 Nov 2019 09:15:37 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> Message-ID: Hi Andrew, > > This patch aligns with the implementation in [1] which makes the > > x86_64 > > r12 register allocatable. Please let me know if I have missed anything > > for AArch64. > > We don't generally use r27 for compressed class pointers. Do you mean that r27 is only used for encoding/decoding oops but not for any klass pointers? 
I looked at the AArch64 code and find it also used in MacroAssembler::encode_klass_not_null() if the compressed mode is not zero-based. -- Thanks, Pengfei From patric.hedlin at oracle.com Fri Nov 15 09:28:23 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Fri, 15 Nov 2019 10:28:23 +0100 Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check In-Reply-To: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com> References: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com> Message-ID: <82f94320-c260-add2-2627-4116c5667e89@oracle.com> Thanks Vladimir for pointing this out. The immediate reason is that this is a reduced patch derived from a more general one (addressing a few more cases and requiring some "surgery"). I thought I should put this out first (to address the very limited case in the trouble report) as I found the more general approach to grow a bit more than I was happy with (the intent being to follow-up with the more general version later). But you are absolutely right that this single case only requires a constant condition. I'll re-work the patch and save the "surgery" for later. Best regards, Patric Hedlin On 14/11/2019 09:31, Vladimir Ivanov wrote: > > >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/ > > I just briefly looked through the patch and have a quick high-level > question: > >
> +  // Rewrite:
> +  //              cmp                   cmp
> +  //              / \                    |
> +  //     (r1)  bool  \                  bool (r1)
> +  //            /    bool (r2)            \
> +  //    (dom) if       \        ==>       if
> +  //            \       )                  \
> +  //    (pre)  if[TF]  /                  if[TF]X
> +  //               \  /
> +  //                if (this)
> +  //               /  \
> +  //             ifT  ifF [X]
> > Why do you do complex graph surgery instead of simply adjusting > condition at redundant If (to 0/1) and let existing logic to eliminate > it?
> > Best regards, > Vladimir Ivanov > >> >> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check >> >> Adding a simple subsumption test to IfNode::Ideal to enable a local >> short-circuit for (obviously) redundant if-nodes. >> >> Testing: hs-tier1-4, hs-precheckin-comp >> >> >> Best regards, >> Patric >> From patric.hedlin at oracle.com Fri Nov 15 09:46:57 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Fri, 15 Nov 2019 10:46:57 +0100 Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check In-Reply-To: References: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com> Message-ID: Hi Martin, On 13/11/2019 20:26, Doerr, Martin wrote: > Hi Patric, > > thanks for addressing this issue. > > There seems to be a small issue with the webrev: > - if (bol->is_Bool()) { > + if (!bol->is_Bool()) { Good catch. Looks like I spoiled the patch when adding the line above (keeping 'bol' around). Best regards, Patric Hedlin > I guess you have tested it already this way and just something with the webrev went wrong. > > You recognize and transform a specific pattern you have described in the comment. It is appropriate for fixing this example and it looks good to me. But I'm curious how often this matches. Maybe we can see a performance improvement. > > Thanks and best regards, > Martin > > >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Nils Eliasson >> Sent: Mittwoch, 13. November 2019 17:13 >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 >> check >> >> Hi Patric, >> >> Looks good! >> >> (I have pre-reviewed this patch offline) >> >> Regards, >> >> Nils >> >> On 2019-11-12 15:16, Patric Hedlin wrote: >>> Dear all, >>> >>> I would like to ask for help to review the following change/update: >>> >>> Issue:
https://bugs.openjdk.java.net/browse/JDK-8220376 >>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/ >>> >>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check >>> >>> Adding a simple subsumption test to IfNode::Ideal to enable a local >>> short-circuit for (obviously) redundant if-nodes. >>> >>> Testing: hs-tier1-4, hs-precheckin-comp >>> >>> >>> Best regards, >>> Patric From christoph.goettschkes at microdoc.com Fri Nov 15 10:16:50 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Fri, 15 Nov 2019 11:16:50 +0100 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <79dba9cf-46c4-1a3d-aba5-28622f2bd505@oracle.com> References: <2w947d9dgm-1@aserp2050.oracle.com> <9725CF6F-64B1-4101-BB3A-801C841A8441@oracle.com> <79dba9cf-46c4-1a3d-aba5-28622f2bd505@oracle.com> Message-ID: Thanks for the reviews. I created a new webrev with the Reviewed-by line added to the changeset: https://cr.openjdk.java.net/~cgo/8231954/webrev.03/ Could you sponsor the change for me and commit it into the repository? Thanks, Christoph Vladimir Kozlov wrote on 2019-11-14 18:53:20: > From: Vladimir Kozlov > To: christoph.goettschkes at microdoc.com > Cc: hotspot-compiler-dev at openjdk.java.net > Date: 2019-11-14 18:53 > Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ > TestCharVect2.java only works with server VMs. > > +1 > > Vladimir > > On 11/14/19 8:44 AM, Igor Ignatev wrote: > > LGTM > > > > -- Igor > > > >> On Nov 14, 2019, at 3:22 AM, christoph.goettschkes at microdoc.com wrote: > >> > >> Thanks for your feedback, this resolves my concerns and I am happy with > >> the solution. I integrated the suggestions from Vladimir, here is the > >> latest webrev: > >> > >> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/ > >> > >> I re-tested and it works as expected. > >> Please give your consent if this is fine for you as well.
> >> > >> -- Christoph > >> > >> Vladimir Kozlov wrote on 2019-11-13 20:32:18: > >> > >>> From: Vladimir Kozlov > >>> To: Igor Ignatyev , > >> christoph.goettschkes at microdoc.com > >>> Cc: hotspot compiler > >>> Date: 2019-11-13 20:32 > >>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ > >>> TestCharVect2.java only works with server VMs. > >>> > >>>> On 11/13/19 11:11 AM, Igor Ignatyev wrote: > >>>> @Christoph, > >>>> > >>>> webrev.01 looks good to me. > >>>> I always thought that jvmci feature can be built only when compiler2 > >>> feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp > >>> suggests that jvmci can be used w/o compiler2; I don't think we have > >>> ever build/test, let alone support, this configuration. > >>>> > >>>> @Vladimir, > >>>> did/do we plan to support compiler1 + jvmci w/o compiler2 > >> configuration? > >>> > >>> Yes. It could be configuration when we start looking on replacing C1 > >>> with Graal. I think several people were interested > >>> in "Client VM" like configuration. > >>> Also Server configuration without C2 (with Graal or other jvmci > >>> compiler) which would be out configuration in a future. > >>> > >>> But I would prefer to be more explicit in these changes: > >>> > >>> @requires vm.compiler2.enabled | vm.graal.enabled > >>> > >>> Thanks, > >>> Vladimir > >>> > >>>> > >>>> Thanks, > >>>> -- Igor > >>>> > >>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com > >> wrote: > >>>>> > >>>>> Hi Igor, > >>>>> > >>>>> thanks for your explanation. > >>>>> > >>>>> Igor Ignatyev wrote on 2019-11-12 > >> 20:40:46: > >>>>> > >>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our > >> tests, as > >>>>>> in most cases, it causes wasted compute time (as in this test) and > >> can > >>>>>> also lead to wrong/deprecated/deleted flags sneaking into the > >> testbase > >>>>> > >>>>> Agreed. 
I also wanted to discuss this, since I think that your > >> solution > >>>>> is better than mine, but at the same time, I saw possible problems > >> with > >>>>> it, see below. > >>>>> > >>>>>> as '@requires vm.flavor == "server"' filters configurations based vm > >>>>>> build type, it will still allow execution on JVM w/ JVMCI and when > >> JVMCI > >>>>>> compiler is selected, as it will still be Server VM build. so, in a > >>>>>> sense, the test will be w/ JVMCI in the same way as w/ your > >> approach. > >>>>> > >>>>> My concern is not about server VMs with JVMCI, but client VMs with > >> JVMCI > >>>>> enabled. Is this a valid configuration? The MaxVectorSize option is > >>>>> defined in [1] as well as in [2], so for me it looks like > >> MaxVectorSize > >>>>> can be used for any VM variant as long as JVMCI is enabled. The > >>>>> configure script also states that both compilers are possible (if you > >>>>> configure with --with-jvm-features='jvmci'): > >>>>> > >>>>> configure: error: Specified JVM feature 'jvmci' requires feature > >>>>> 'compiler2' or 'compiler1' > >>>>> > >>>>> Should maybe the requires tag "vm.jvmci" be used as well, like: > >>>>> > >>>>> @requires vm.flavor == "server" | vm.jvmci > >>>>> > >>>>>> this is the known limitation of jtreg/@requires, and our current way > >> to > >>>>>> workaround it is to split a test description based on @requires > >> values > >>>>> > >>>>> Yes, if the @requires tag is used, splitting up the test looks like a > >> good > >>>>> idea. I didn't know that it is possible to have multiple test > >> descriptions > >>>>> in one test file. > >>>>> > >>>>> I created a new webrev with the new ideas: > >>>>> > >>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ > >>>>> > >>>>> I tested with an amd64 client and server VM and it looks good. I am > >>>>> currently unable to build a client VM with JVMCI enabled, hence no > >> test > >>>>> for that yet. 
I get compile errors and as soon as I resolve those, > >>>>> runtime errors occur. Before I look into that, I would like to know > >> if > >>>>> client VMs with JVMCI enabled are supported or not. > >>>>> > >>>>> Thanks, > >>>>> Christoph > >>>>> > >>>>> [1] > >>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ > >>> share/opto/c2_globals.hpp > >>>>> > >>>>> [2] > >>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/ > >>> share/jvmci/jvmci_globals.hpp > >>>>> > >>>> > >>> > >> > > >
From christoph.goettschkes at microdoc.com Fri Nov 15 10:40:21 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Fri, 15 Nov 2019 11:40:21 +0100 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> <2w8hcs8xd1-1@aserp2030.oracle.com> <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com> <20191114112213.D20B8D9B6E@aojmv0009> Message-ID: Hi Yang, I don't see a problem with applying the same idea to other tests to get rid of the IgnoreUnrecognizedVMOptions option. > Ps. For [3], it locates in c2 directory. So I'm not sure whether they > will be excluded in jtreg test with client mode. No, I don't think so. Those tests are not excluded automatically by jtreg. Running

    $ make test TEST=hotspot_all JTREG='OPTIONS=-l'

lists all tests in the c2 directory:

    ...
    compiler/c2/cr6340864/TestByteVect.java
    compiler/c2/cr6340864/TestDoubleVect.java
    ...

The directory "cr6340864" is however excluded from the hotspot tier1 test group. One thing I noticed: some of the tests use other flags, which are only supported by the C2 compiler, like TestNaNVector.java [1]. The flag "OptimizeFill" is only present in c2_globals.hpp [4] and not in jvmci_globals.hpp [5], so the @requires tag should look different.
Maybe for [1], we should add another @run tag with no -XX arguments whatsoever? -- Christoph [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java [4] https://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/opto/c2_globals.hpp [5] https://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/jvmci/jvmci_globals.hpp "Yang Zhang (Arm Technology China)" wrote on 2019-11-15 04:35:54: > From: "Yang Zhang (Arm Technology China)" > To: "christoph.goettschkes at microdoc.com", "vladimir.kozlov at oracle.com", "igor.ignatyev at oracle.com" > Cc: "hotspot-compiler-dev at openjdk.java.net" > Date: 2019-11-15 04:36 > Subject: RE: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. > > Hi Christoph, Igor, Vladimir, > > Thanks very much for your fix. After discussion, we have got a better > solution for this issue. Do we need to change the following files in > which MaxVectorSize option is used? > > [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java > [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java > [3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/c2/cr6340864 > > Ps. For [3], it locates in c2 directory. So I'm not sure whether they > will be excluded in jtreg test with client mode. > > Regards > Yang
From patrick at os.amperecomputing.com Fri Nov 15 10:54:16 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Fri, 15 Nov 2019 10:54:16 +0000 Subject: RFR (trivial): 8234228: AArch64: Clean up redundant temp vars in generate_compare_long_string_different_encoding Message-ID: Hi Reviewers, This is a simple patch which cleans up some redundant temp vars and related instructions in generate_compare_long_string_different_encoding. JBS: https://bugs.openjdk.java.net/browse/JDK-8234228 Webrev: http://cr.openjdk.java.net/~qpzhang/8234228/webrev.01 In generate_compare_long_string_different_encoding, the two Register vars strU and strL were used to record the pointers of the last 4 characters for the final comparisons.
strU has been unused since the latest code updates, as the chars are pre-loaded (r12) by compare_string_16_x_LU early, and strL is redundant too since the pointer is available in r11. Cleaning these up saves two add instructions and two temp vars, and replaces two sub instructions with mov. In addition, r10 in compare_string_16_x_LU is not used, so that temp var is removed as well. Tested jtreg tier1, and hotspot runtime/compiler, no new failures found. Double checked with string intrinsics cases under [1], no regression found. Ran [2] CompareToBench LU/UL as performance check, no regression found, and slight gains with some input sizes. [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar Regards Patrick
From martin.doerr at sap.com Fri Nov 15 11:13:05 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 15 Nov 2019 11:13:05 +0000 Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check In-Reply-To: <82f94320-c260-add2-2627-4116c5667e89@oracle.com> References: <82e3e8df-e6dc-5638-73a4-c5738b33fdad@oracle.com> <82f94320-c260-add2-2627-4116c5667e89@oracle.com> Message-ID: Hi Patric, excellent. I have found issues with the surgery while running some tests. The VM was running into assert(cnt == _outcnt) failed: no insertions allowed due to usage of DUIterator_Last. So a version without the surgery sounds promising. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Patric Hedlin > Sent: Friday, 15 November 2019 10:28 > To: Vladimir Ivanov; hotspot compiler > Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 > check > > Thanks Vladimir for pointing this out. > > The immediate reason is that this is a reduced patch derived from a more > general one (addressing a few more cases and requiring some "surgery").
> I thought I should put this out first (to address the very limited case > in the trouble report) as I found the more general approach to grow a > bit more than I was happy with (the intent being to follow-up with the > more general version later). But you are absolutely right that this > single case only requires a constant condition. I'll re-work the patch > and save the "surgery" for later. > > Best regards, > Patric Hedlin > > On 14/11/2019 09:31, Vladimir Ivanov wrote: > > > > > >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/ > > > > I just briefly looked through the patch and have a quick high-level > > question: > >
> > +  // Rewrite:
> > +  //              cmp                   cmp
> > +  //              / \                    |
> > +  //     (r1)  bool  \                  bool (r1)
> > +  //            /    bool (r2)            \
> > +  //    (dom) if       \        ==>       if
> > +  //            \       )                  \
> > +  //    (pre)  if[TF]  /                  if[TF]X
> > +  //               \  /
> > +  //                if (this)
> > +  //               /  \
> > +  //             ifT  ifF [X]
> > > > Why do you do complex graph surgery instead of simply adjusting > > condition at redundant If (to 0/1) and let existing logic to eliminate > > it? > > > > Best regards, > > Vladimir Ivanov > > > >> > >> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check > >> > >>      Adding a simple subsumption test to IfNode::Ideal to enable a local > >>      short-circuit for (obviously) redundant if-nodes.
> >> > >> Testing: hs-tier1-4, hs-precheckin-comp > >> > >> > >> Best regards, > >> Patric > >>
From aph at redhat.com Fri Nov 15 14:49:14 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 15 Nov 2019 14:49:14 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> Message-ID: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> On 11/15/19 9:15 AM, Pengfei Li (Arm Technology China) wrote: >>> This patch aligns with the implementation in [1] which makes the >>> x86_64 >>> r12 register allocatable. Please let me know if I have missed anything >>> for AArch64. >> >> We don't generally use r27 for compressed class pointers. > > Do you mean that r27 is only used for encoding/decoding oops but not for > any klass pointers? Almost always, yes. > I looked at the AArch64 code and find it also used in > MacroAssembler::encode_klass_not_null() if the compressed mode is > not zero-based. I see

  if (use_XOR_for_compressed_class_base) {
    if (CompressedKlassPointers::shift() != 0) {
      eor(dst, src, (uint64_t)CompressedKlassPointers::base());
      lsr(dst, dst, LogKlassAlignmentInBytes);
    } else {
      eor(dst, src, (uint64_t)CompressedKlassPointers::base());
    }
    return;
  }

  if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
      && CompressedKlassPointers::shift() == 0) {
    movw(dst, src);
    return;
  }

  ... followed by code which does use r27.

Do you ever see r27 being used? If so, I'd be interested to know how this gets triggered and what command-line arguments you use. It's rather inefficient. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
From dmitrij.pochepko at bell-sw.com Fri Nov 15 15:51:42 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 15 Nov 2019 18:51:42 +0300 Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement In-Reply-To: References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> Message-ID: <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com> Hi Patrick, My experiments back then showed that a few platforms (some of the Cortex A* series) behave unexpectedly slowly when dealing with over-prefetching (probably CPU implementation specifics). So this code is a compromise that runs relatively well on all platforms I was able to test on (ThunderX, ThunderX2, Cortex A53, Cortex A73). That is the main reason for this code structure. It's good that you're willing to experiment and improve it, but I'm afraid that changing largeLoopExitCondition to 64 for LL/UU and 128 for LU/UL will likely make some systems slower, as I experimented a lot with this. Let us look at the performance results for the several systems you have, to avoid a situation where one platform benefits by slowing down others. We could offer some help if you don't have some HW available. Thanks, Dmitrij On 15/11/2019 10:51 AM, Patrick Zhang OS wrote: > Hi Dmitrij, > > The inline document inside your this patch is nice in helping me understand the string_compare stub code generation in depth, although I don't know why it was not pushed. > http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html > > There is one point I don't quite understand, could you please clarify it a little more? Thanks in advance!
> The large loop with prefetching logic, in different_encoding function, it uses "cnt2 - prefetchLoopExitCondition" to tell whether the 1st iteration should be executed or not, while the same_encoding does not do this, why? > > I was thinking that "prefetching out of the string boundary" could be an invalid operation, but it seems a misunderstanding, isn't it? The AArch64 ISA document says that prfm instruction signals the memory system and expects "preloading the cache line containing the specified address into one or more caches" would be done, in order to speed up the memory accesses when they do occur. If this is just a hint for coming ldrs, and safe enough, could we not restrict the rest iterations (2 ~ n) with largeLoopExitCondition? instead use 64 LL/UU, and 128 for LU/UL. As such more iteration can say in the large loop (SoftwarePrefetchHintDistance=192 for example), for better performance. Any comments? > > Thanks >
> 4327 address generate_compare_long_string_different_encoding(bool isLU) {
> 4377   if (SoftwarePrefetchHintDistance >= 0) {
> 4378     __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
> 4379     __ br(__ LT, NO_PREFETCH);
> 4380     __ bind(LARGE_LOOP_PREFETCH); // 64-characters loop
>          ... ...
> 4395     __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
> 4396     __ br(__ GE, LARGE_LOOP_PREFETCH);
> 4397   } // end of 64-characters loop
>
> 4616 address generate_compare_long_string_same_encoding(bool isLL) {
> 4637   if (SoftwarePrefetchHintDistance >= 0) {
> 4638     __ bind(LARGE_LOOP_PREFETCH);
> 4639     __ prfm(Address(str1, SoftwarePrefetchHintDistance));
> 4640     __ prfm(Address(str2, SoftwarePrefetchHintDistance));
> 4641     compare_string_16_bytes_same(DIFF, DIFF2);
> 4642     compare_string_16_bytes_same(DIFF, DIFF2);
> 4643     __ sub(cnt2, cnt2, 8 * characters_in_word);
> 4644     compare_string_16_bytes_same(DIFF, DIFF2);
> 4645     __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
> 4646     compare_string_16_bytes_same(DIFF, DIFF2);
> 4647     __ br(__ GT, LARGE_LOOP_PREFETCH);
> 4648     __ cbz(cnt2, LAST_CHECK); // no more loads left
> 4649   }
> > Regards > Patrick > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Dmitry Samersoff > Sent: Sunday, May 19, 2019 11:42 PM > To: Dmitrij Pochepko ; Andrew Haley ; Pengfei Li (Arm Technology China) > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement > > Dmitrij, > > The changes looks good to me. > > -Dmitry > > On 25.02.2019 19:52, Dmitrij Pochepko wrote: >> Hi Andrew, Pengfei, >> >> I created webrev.02 with all your suggestions implemented: >> >> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/ >> >> - comments are now both in separate section and inlined into code. >> - documentation mismatch mentioned by Pengfei is fixed: >> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST >> -- SHORT_LOOP_TAIL block now merged with last instruction. >> Documentation is updated respectively >> - minor other changes to layout and wording >> >> Newly developed tests were run as sanity and they passed. >> >> Thanks, >> Dmitrij >> >> On 22/02/2019 6:42 PM, Andrew Haley wrote: >>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote: >>> >>>> So personally, I still prefer to inline the comments with the >>>> original code block to avoid this kind of inconsistencies. And it >>>> makes us easier to review or maintain the code together with the >>>> doc, as we don't need to scroll back and force. I don't know the >>>> benefit of making the code documentation as a separate part. What's >>>> your opinion, Andrew Haley? >>> I agree with you. There's no harm having both inline and separate.
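
For readers following the largeLoopExitCondition question in this message, here is a rough plain-Java model of the loop structure under discussion. It is only a sketch under stated assumptions: the class name, the BLOCK constant and the exitThreshold parameter are hypothetical, there is no real prefetching, and it is not the stub generator's logic. It illustrates one point only: the exit threshold decides how many characters the large loop handles versus the tail loop, and never the comparison result, so picking a threshold is purely a performance trade-off of the kind described above.

```java
// Hypothetical model of a "large loop + tail" comparison of a Latin-1
// string (byte[]) against a UTF-16 string (char[]). Not HotSpot code.
final class BlockedCompareSketch {
    // Characters handled per large-loop iteration (assumed value).
    static final int BLOCK = 64;

    // exitThreshold plays the role of the loop exit condition: the large
    // loop only runs while at least that many characters remain.
    static int compare(byte[] latin1, char[] utf16, int exitThreshold) {
        int len = Math.min(latin1.length, utf16.length);
        int i = 0;
        int remaining = len;

        // Large loop: a real stub would issue prefetch hints here.
        while (remaining >= exitThreshold && remaining >= BLOCK) {
            for (int j = i; j < i + BLOCK; j++) {
                int diff = (latin1[j] & 0xff) - utf16[j];
                if (diff != 0) {
                    return diff;
                }
            }
            i += BLOCK;
            remaining -= BLOCK;
        }

        // Tail loop: handles whatever the large loop left over.
        for (; i < len; i++) {
            int diff = (latin1[i] & 0xff) - utf16[i];
            if (diff != 0) {
                return diff;
            }
        }
        return latin1.length - utf16.length;
    }
}
```

Calling compare with thresholds of, say, 64 and 256 on the same inputs returns the same value; only the split between the two loops changes, which is why the choice has to be settled by measurements on each platform.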
>>>
From vladimir.kozlov at oracle.com Fri Nov 15 19:11:58 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Nov 2019 11:11:58 -0800 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> <2w8hcs8xd1-1@aserp2030.oracle.com> <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com> <20191114112213.D20B8D9B6E@aojmv0009> Message-ID: <8f0630cf-7fe6-8021-f195-9595d6f460c9@oracle.com> Note, compiler/c2 and compiler/c1 are misleading names for test directories; they have nothing to do with the C1 and C2 JIT compilers. They are simply 2 groups of tests we split so they can be executed in parallel in reasonable time. Vladimir On 11/14/19 7:35 PM, Yang Zhang (Arm Technology China) wrote: > Hi Christoph, Igor, Vladimir, > > Thanks very much for your fix. After discussion, we have got a better solution for this issue. Do we need to change the following files in which MaxVectorSize option is used? > > [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestNaNVector.java > [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/vectorization/TestPopCountVector.java > [3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/hotspot/jtreg/compiler/c2/cr6340864 > > Ps. For [3], it locates in c2 directory. So I'm not sure whether they will be excluded in jtreg test with client mode. > > Regards > Yang
From igor.ignatyev at oracle.com Fri Nov 15 19:21:51 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 15 Nov 2019 11:21:51 -0800 Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs. In-Reply-To: <8f0630cf-7fe6-8021-f195-9595d6f460c9@oracle.com> References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> <2w8hcs8xd1-1@aserp2030.oracle.com> <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com> <20191114112213.D20B8D9B6E@aojmv0009> <8f0630cf-7fe6-8021-f195-9595d6f460c9@oracle.com> Message-ID: <27E94E0A-7A99-4B0D-B96D-7DADF6201542@oracle.com> I'd also add that unfortunately it's not always right to add '@requires vm.compiler2.enabled' to a test just b/c it uses some c2-only flags. there are cases when such tests aren't just still valid w/o these flags, but are capable of spotting bugs; on the other hand, tests which have multiple runs w/ different values of c2-only (or c1-only) flags can be split to reduce wasted time. that's to say the decision on whether @requires is the right thing should be made on a test-by-test basis.
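
A minimal sketch of what such a split could look like, assuming a hypothetical test class SplitRequiresExample (it is not part of any webrev in this thread): the first description is limited to VMs with C2 or Graal and passes -XX:MaxVectorSize, the second has no -XX arguments and runs on any VM. The header layout is an assumption rather than the form used in the actual patch, and newer jtreg versions may expect each description in a file to carry a distinct `@test id=...`.

```java
/*
 * @test
 * @summary Hypothetical example: run that needs a C2/Graal-only flag
 * @requires vm.compiler2.enabled | vm.graal.enabled
 * @run main/othervm -XX:MaxVectorSize=8 SplitRequiresExample
 */

/*
 * @test
 * @summary Hypothetical example: same test body, no -XX flags, any VM
 * @run main SplitRequiresExample
 */
class SplitRequiresExample {
    // Simple element-wise loop that a JIT may vectorize; the result must
    // be identical no matter which compiler (if any) compiles it.
    static int[] addArrays(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        int[] b = new int[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // Repeat so the JIT has a chance to compile addArrays.
        for (int iter = 0; iter < 10_000; iter++) {
            int[] r = addArrays(a, b);
            for (int i = 0; i < r.length; i++) {
                if (r[i] != 3 * i) {
                    throw new RuntimeException("wrong result at index " + i);
                }
            }
        }
        System.out.println("PASSED");
    }
}
```

With this layout no run needs IgnoreUnrecognizedVMOptions: a VM that does not know the flag is filtered out by @requires for the first description and still executes the second one.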
as 8231954 was only about TestCharVect2 test, I suggest we push Christoph's webrev.03 and file an RFE to deal w/ other tests, or retrofit 8228493 to talk not only about non-product flags but also about c2/c1-only flags and use it as an umbrella for discussion/work-tracking. -- Igor > On Nov 15, 2019, at 11:11 AM, Vladimir Kozlov wrote: > > Note, compiler/c2 and compiler/c1 was misleading naming for tests directories which is nothing to do with C1 and C2 JIT compilers. They are simply 2 groups of tests we split so they can be executed in parallel in reasonable time. > > Vladimir
From igor.ignatyev at oracle.com Fri Nov 15 19:33:07 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 15 Nov 2019 11:33:07 -0800 Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few ThreadDeath hits; expected at least 6 but saw only 5" Message-ID: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 > 52 lines changed: 13 ins; 24 del; 15 mod; Hi all, could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test? the test used to run 12 times, expecting a ThreadDeath exception to happen during array allocation no less than (actually, more than) 6 times; the patch changes the test to run until ThreadDeath has been caught 6 times. the test has also been updated to use exceptions instead of System.exit to signal test failure and to use whitebox to check that the 'test' method got compiled. testing: 100 times on windows-x64-debug (where the test failed) + once on all platforms webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 JBS: https://bugs.openjdk.java.net/browse/JDK-8214904 Thanks, -- Igor
From vladimir.kozlov at oracle.com Fri Nov 15 19:35:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Nov 2019 11:35:42 -0800 Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few ThreadDeath hits; expected at least 6 but saw only 5" In-Reply-To: References: Message-ID: <88850dc4-30b3-6157-5f99-d62094b9c9b9@oracle.com> Good.
Thanks, Vladimir On 11/15/19 11:33 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 >> 52 lines changed: 13 ins; 24 del; 15 mod; > > Hi all, > > could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test? > the test used to run 12 times and expecting that no less (actually it was more than) 6 times ThreadDeath exception happenes during array allocation; the patch changes the test to run until ThreadDeath got caught 6 times. > > the test has been also updated to use exceptions instead of System.exit to signal test failure and to use whitebox to check that 'test' method got compiled. > > testing: 100 times on windows-x64-debug (where the test failed) + once on all platform > webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8214904 > > Thanks, > -- Igor > From ekaterina.pavlova at oracle.com Fri Nov 15 20:25:20 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Fri, 15 Nov 2019 12:25:20 -0800 Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few ThreadDeath hits; expected at least 6 but saw only 5" In-Reply-To: References: Message-ID: <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com> Hi Igor, you also mentioned in the bug report that "there is also a race on 'passed' field, j.u.c.AtomicInteger should be used here". Is it still good thing to do? thanks, -katya On 11/15/19 11:33 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 >> 52 lines changed: 13 ins; 24 del; 15 mod; > > Hi all, > > could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test? > the test used to run 12 times and expecting that no less (actually it was more than) 6 times ThreadDeath exception happenes during array allocation; the patch changes the test to run until ThreadDeath got caught 6 times. 
> > the test has been also updated to use exceptions instead of System.exit to signal test failure and to use whitebox to check that 'test' method got compiled. > > testing: 100 times on windows-x64-debug (where the test failed) + once on all platform > webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8214904 > > Thanks, > -- Igor > From igor.ignatyev at oracle.com Fri Nov 15 21:28:25 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 15 Nov 2019 13:28:25 -0800 Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few ThreadDeath hits; expected at least 6 but saw only 5" In-Reply-To: <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com> References: <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com> Message-ID: <8B066C86-0393-4FB5-8AEE-229EA9EED159@oracle.com> Hi Katya, actually, there is no race b/c there is happen-before edge b/w all actions in Test8004741::run (including updates of passed) and Thread.join (L142) in threadTest. so there is no need to use AtomicInteger. I'll add a comment to the bug report. -- Igor > On Nov 15, 2019, at 12:25 PM, Ekaterina Pavlova wrote: > > Hi Igor, > > you also mentioned in the bug report that "there is also a race on 'passed' field, j.u.c.AtomicInteger should be used here". > Is it still good thing to do? > > thanks, > -katya > > On 11/15/19 11:33 AM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 >>> 52 lines changed: 13 ins; 24 del; 15 mod; >> Hi all, >> could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test? >> the test used to run 12 times and expecting that no less (actually it was more than) 6 times ThreadDeath exception happenes during array allocation; the patch changes the test to run until ThreadDeath got caught 6 times. 
>> the test has been also updated to use exceptions instead of System.exit to signal test failure and to use whitebox to check that 'test' method got compiled. >> testing: 100 times on windows-x64-debug (where the test failed) + once on all platform >> webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 >> JBS: https://bugs.openjdk.java.net/browse/JDK-8214904 >> Thanks, >> -- Igor > From ekaterina.pavlova at oracle.com Fri Nov 15 21:54:01 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Fri, 15 Nov 2019 13:54:01 -0800 Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few ThreadDeath hits; expected at least 6 but saw only 5" In-Reply-To: <8B066C86-0393-4FB5-8AEE-229EA9EED159@oracle.com> References: <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com> <8B066C86-0393-4FB5-8AEE-229EA9EED159@oracle.com> Message-ID: Ok, thanks Igor, looks good then. -katya On 11/15/19 1:28 PM, Igor Ignatyev wrote: > Hi Katya, > > actually, there is no race b/c there is happen-before edge b/w all actions in Test8004741::run (including updates of passed) and Thread.join (L142) in threadTest. so there is no need to use AtomicInteger. I'll add a comment to the bug report. > > -- Igor > >> On Nov 15, 2019, at 12:25 PM, Ekaterina Pavlova wrote: >> >> Hi Igor, >> >> you also mentioned in the bug report that "there is also a race on 'passed' field, j.u.c.AtomicInteger should be used here". >> Is it still good thing to do? >> >> thanks, >> -katya >> >> On 11/15/19 11:33 AM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 >>>> 52 lines changed: 13 ins; 24 del; 15 mod; >>> Hi all, >>> could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test? 
>>> the test used to run 12 times and expecting that no less (actually it was more than) 6 times ThreadDeath exception happenes during array allocation; the patch changes the test to run until ThreadDeath got caught 6 times. >>> the test has been also updated to use exceptions instead of System.exit to signal test failure and to use whitebox to check that 'test' method got compiled. >>> testing: 100 times on windows-x64-debug (where the test failed) + once on all platform >>> webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8214904 >>> Thanks, >>> -- Igor >> > From igor.ignatyev at oracle.com Fri Nov 15 22:23:38 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 15 Nov 2019 14:23:38 -0800 Subject: RFR(S) : 8214904 : Test8004741.java failed due to "Too few ThreadDeath hits; expected at least 6 but saw only 5" In-Reply-To: References: <78486f8c-84e2-ab0b-d92b-72b538126572@oracle.com> <8B066C86-0393-4FB5-8AEE-229EA9EED159@oracle.com> Message-ID: Katya, Vladimir, thanks for your review, pushed. -- Igor > On Nov 15, 2019, at 11:35 AM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir > On Nov 15, 2019, at 1:54 PM, Ekaterina Pavlova wrote: > > Ok, thanks Igor, looks good then. > > -katya > > On 11/15/19 1:28 PM, Igor Ignatyev wrote: >> Hi Katya, >> actually, there is no race b/c there is happen-before edge b/w all actions in Test8004741::run (including updates of passed) and Thread.join (L142) in threadTest. so there is no need to use AtomicInteger. I'll add a comment to the bug report. >> -- Igor >>> On Nov 15, 2019, at 12:25 PM, Ekaterina Pavlova wrote: >>> >>> Hi Igor, >>> >>> you also mentioned in the bug report that "there is also a race on 'passed' field, j.u.c.AtomicInteger should be used here". >>> Is it still good thing to do? 
>>> >>> thanks, >>> -katya >>> >>> On 11/15/19 11:33 AM, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 >>>>> 52 lines changed: 13 ins; 24 del; 15 mod; >>>> Hi all, >>>> could you please review this small patch which (hopefully) solves intermittent failures of compiler/c2/Test8004741 test? >>>> the test used to run 12 times and expecting that no less (actually it was more than) 6 times ThreadDeath exception happenes during array allocation; the patch changes the test to run until ThreadDeath got caught 6 times. >>>> the test has been also updated to use exceptions instead of System.exit to signal test failure and to use whitebox to check that 'test' method got compiled. >>>> testing: 100 times on windows-x64-debug (where the test failed) + once on all platform >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8214904/webrev.00 >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8214904 >>>> Thanks, >>>> -- Igor >>> > From igor.ignatyev at oracle.com Sat Nov 16 07:47:20 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 15 Nov 2019 23:47:20 -0800 Subject: RFR(S) : 8233462 : serviceability/tmtools/jstat tests times out with -Xcomp Message-ID: <2B7FD259-FBC2-4850-B1B3-81D1669F150B@oracle.com> http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00 > 33 lines changed: 1 ins; 14 del; 18 mod; Hi all, could you please review this small fix for tmtools testlibrary? tmtools tests are believed to fail due to a deadlock-like situation b/w main test process and tmtools process: (from JBS) > it seems these tests attach jstat to the main test process, the same process which reads the tool's stdout/stderr, so there is a possibility that this will deadlock: jstat-process produces more output than the buffer can hold, so it blocks till someone (the main process reads it), while the main process waits till jstat completes. 
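For readers following the deadlock described in the JBS quote above: a minimal sketch, not the actual tmtools library code, of how a parent process can avoid blocking on a child's full pipe buffer by sending the child's stdout and stderr to files. The class name RedirectToFiles and the helper runRedirected are hypothetical, and `java -version` is only a stand-in for a tool such as jstat.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical illustration; not part of serviceability/tmtools/share/common.
final class RedirectToFiles {

    // Runs the command with stdout/stderr redirected to files. Because nothing
    // is written to a pipe, the child cannot stall on a full pipe buffer while
    // the parent is blocked in waitFor().
    static int runRedirected(List<String> command, Path stdout, Path stderr)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectOutput(stdout.toFile());
        pb.redirectError(stderr.toFile());
        Process child = pb.start();
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("tool", ".stdout");
        Path err = Files.createTempFile("tool", ".stderr");
        // Stand-in for the real tool: the java launcher of the running JDK.
        String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        int exit = runRedirected(List.of(javaBin, "-version"), out, err);
        System.out.println("exit code: " + exit);
        System.out.println("stdout bytes: " + Files.size(out));
        System.out.println("stderr bytes: " + Files.size(err));
    }
}
```

The files can be read after waitFor() returns, which also leaves the full tool output on disk for later failure analysis.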
the patch changes serviceability/tmtools/share/common library (used by all serviceability/tmtools) to redirect tmtool's stdout and stderr into files instead of using jdk.test.lib.process.OutputAnalyzer; I've also added a bit of diagnostic output, so it will be easier to analyze future failures. webrev: http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00 JBS: https://bugs.openjdk.java.net/browse/JDK-8233462 testing: - serviceability/tmtools on windows-x64,linux-x64,macosx-x64,solaris-sparcv9 - serviceability/tmtools 100 times on linux-x64-debug w/ '-Xcomp -ea -esa -XX:+TieredCompilation -XX:+DeoptimizeALot' (most of failures have been seen on this configuration) Thanks, -- Igor From igor.ignatyev at oracle.com Sun Nov 17 06:07:19 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Sat, 16 Nov 2019 22:07:19 -0800 Subject: RFR(S) : 8234290 : compiler/c2/Test6857159.java times out and fail to clean up files Message-ID: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com> http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html > 67 lines changed: 16 ins; 24 del; 27 mod; Hi all, could you please review this small fix for Test6857159 test? from JBS: > the test has -XX:CompileOnly=compiler.c2.Test6857159$Test$ct::run, but there is no 'ct' class, there are ct[0-2], and ct0 the only which has 'run' method. shouldNotContain("COMPILE SKIPPED") and shouldContain("$ct0::run (16 bytes)"), which, I guess, were a defense against such situation, didn't help b/c PrintCompilation output doesn't have 'COMPILE SKIPPED' lines and have 'made not compilable on levels 0 1 2 3 ... $ct0::run (16 bytes) excluded by CompileCommand' line. the patch fixes CompileOnly value (actually replaces it w/ the correct CompileCommand), removes extra layer, and makes the test to use WhiteBox to check if ct0::run got compiled. 
webrev: http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html JBS: https://bugs.openjdk.java.net/browse/JDK-8234290 testing: - compiler/c2/Test6857159.java once on linux-x64,windows-x64,macosx-x64 - compiler/c2/Test6857159.java 100 time on windows-x64-debug (where all failures were seen so far) Thanks, -- Igor From igor.ignatyev at oracle.com Sun Nov 17 19:00:33 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Sun, 17 Nov 2019 11:00:33 -0800 Subject: RFR(S) : 8147017 : Platform.isGraal should be removed Message-ID: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com> http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html > 16 lines changed: 2 ins; 8 del; 6 mod; Hi all, jdk.test.lib.Platform.isGraal method assumes that JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and caused other to incorrectly assume that '-graal' flag exist and must be used to select Graal compiler. the patch removes this method and updates its only meaningful usage in TestGCLogMessages test. TestGCLogMessages test should use LogMessageWithLevelC2OrJVMCIOnly only when c2 or graal is available, so it's been updated to use corresponding methods of sun.hotspot.code.Compiler class, which requires WhiteBoxAPI being enabled. JBS: https://bugs.openjdk.java.net/browse/JDK-8147017 webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html testing: tier1 + TestGCLogMessages w/ different JIT configurations Thanks, -- Igor From david.holmes at oracle.com Mon Nov 18 03:45:35 2019 From: david.holmes at oracle.com (David Holmes) Date: Mon, 18 Nov 2019 13:45:35 +1000 Subject: RFR(S): 8233193: Incorrect bailout from possibly_add_compiler_threads In-Reply-To: References: Message-ID: <12461130-cafe-d239-46df-08527f67e779@oracle.com> Hi Martin, On 15/11/2019 2:18 am, Doerr, Martin wrote: > Hi, > > I'd like to cleanup exception handling in CompileBroker a little bit. 
>
> Here's my proposal:
>
> - Use THREAD instead of CHECK where no exceptions get thrown.

That's fine but you also replaced TRAPS with "Thread* THREAD" which is pointless given:

#define TRAPS Thread* THREAD

> - Remove unused preload_classes.

Ok

> - make_thread: Rename thread to new_thread to avoid confusion with
> THREAD (the current compiler thread).

Ok

> - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + CHECK
> because this function is not supposed to kill the VM on exceptions. Add
> assertion to caller.

Ok

> Webrev:
>
> http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
>
> @David:
>
> You didn't like usage of the CHECK macro in the initialization
> functions, but I think they are ok.
>
> Not very nice to read, but the behavior looks ok to me.
>
> At least, I didn't find a better replacement for them. Maybe you have a
> proposal?

May I suggest a comment then:

844 void CompileBroker::init_compiler_sweeper_threads() {
      // Ensure any exceptions lead to vm_exit_during_initialization
845   EXCEPTION_MARK;

Thanks,
David
-----

> Best regards,
>
> Martin
>

From patrick at os.amperecomputing.com Mon Nov 18 03:52:26 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Mon, 18 Nov 2019 03:52:26 +0000
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
In-Reply-To: <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
Message-ID:

Thanks for the information. I am interested in the inconsistency between the same_encoding and different_encoding functions: if "overprefetch" is safe enough, why do we prevent it at the end of the large loop inside same_encoding, and why do we guard against it more strictly in different_encoding, at both the beginning and the end of the large loop?
I did not mean globally updating largeLoopExitCondition to 64/128, merely the condition at the end of large-loop inside same_encoding. Suppose large-loop could be faster than small-loop (in theory), removing all "overprefetch" conditions would allow more strings go to the large-loop for better performance. Any other potential side-effects? Regards Patrick -----Original Message----- From: Dmitrij Pochepko Sent: Friday, November 15, 2019 11:52 PM To: Patrick Zhang OS Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; Dmitry Samersoff ; Andrew Haley ; Pengfei Li (Arm Technology China) Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement Hi Patrick, My experiments back then showed that few platforms (some of Cortex A* series) behaves unexpectedly slow when dealing with overprefetch (probably CPU implementation specifics). So this code is some kind of compromise to run relatively well on all platforms I was able to test on (ThunderX, ThunderX2, Cortex A53, Cortex A73). That is the main reason for such code structure. It's good that you're willing to experiment and improve it, but I'm afraid changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this. Let us see the performance results for several systems you've got to avoid a situation when one platform benefits by slowing down others. We could offer some help if you don't have some HW available. Thanks, Dmitrij On 15/11/2019 10:51 AM, Patrick Zhang OS wrote: > Hi Dmitrij, > > The inline document inside your this patch is nice in helping me understand the string_compare stub code generation in depth, although I don't know why it was not pushed. 
> http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp.sdiff.html
>
> There is one point I don't quite understand; could you please clarify it a little more? Thanks in advance!
> The large loop with prefetching logic in the different_encoding function uses "cnt2 - prefetchLoopExitCondition" to decide whether the 1st iteration should be executed or not, while same_encoding does not do this. Why?
>
> I was thinking that "prefetching out of the string boundary" could be an invalid operation, but that seems to be a misunderstanding, doesn't it? The AArch64 ISA document says that the prfm instruction signals the memory system and expects that "preloading the cache line containing the specified address into one or more caches" would be done, in order to speed up the memory accesses when they do occur. If this is just a hint for the coming ldrs, and safe enough, could we stop restricting the remaining iterations (2 ~ n) with largeLoopExitCondition, and instead use 64 for LL/UU and 128 for LU/UL? That way more iterations can stay in the large loop (with SoftwarePrefetchHintDistance=192, for example), for better performance. Any comments?
>
> Thanks
>
> 4327 address generate_compare_long_string_different_encoding(bool isLU) {
> 4377   if (SoftwarePrefetchHintDistance >= 0) {
> 4378     __ subs(rscratch2, cnt2, prefetchLoopExitCondition);
> 4379     __ br(__ LT, NO_PREFETCH);
> 4380     __ bind(LARGE_LOOP_PREFETCH); // 64-characters loop
>          ... ...
> 4395     __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead?
> 4396     __ br(__ GE, LARGE_LOOP_PREFETCH);
> 4397   } // end of 64-characters loop
>
> 4616 address generate_compare_long_string_same_encoding(bool isLL) {
> 4637   if (SoftwarePrefetchHintDistance >= 0) {
> 4638     __ bind(LARGE_LOOP_PREFETCH);
> 4639     __ prfm(Address(str1, SoftwarePrefetchHintDistance));
> 4640     __ prfm(Address(str2, SoftwarePrefetchHintDistance));
> 4641     compare_string_16_bytes_same(DIFF, DIFF2);
> 4642     compare_string_16_bytes_same(DIFF, DIFF2);
> 4643     __ sub(cnt2, cnt2, 8 * characters_in_word);
> 4644     compare_string_16_bytes_same(DIFF, DIFF2);
> 4645     __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead?
> 4646     compare_string_16_bytes_same(DIFF, DIFF2);
> 4647     __ br(__ GT, LARGE_LOOP_PREFETCH);
> 4648     __ cbz(cnt2, LAST_CHECK); // no more loads left
> 4649   }
>
> Regards
> Patrick
>
> -----Original Message-----
> From: hotspot-compiler-dev On Behalf Of Dmitry Samersoff
> Sent: Sunday, May 19, 2019 11:42 PM
> To: Dmitrij Pochepko ; Andrew Haley ; Pengfei Li (Arm Technology China)
> Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
>
> Dmitrij,
>
> The changes look good to me.
>
> -Dmitry
>
> On 25.02.2019 19:52, Dmitrij Pochepko wrote:
>> Hi Andrew, Pengfei,
>>
>> I created webrev.02 with all your suggestions implemented:
>>
>> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/
>>
>> - comments are now both in separate section and inlined into code.
>> - documentation mismatch mentioned by Pengfei is fixed:
>> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST
>> -- SHORT_LOOP_TAIL block now merged with last instruction.
>> Documentation is updated accordingly
>> - minor other changes to layout and wording
>>
>> Newly developed tests were run as sanity and they passed.
>>
>> Thanks,
>> Dmitrij
>>
>> On 22/02/2019 6:42 PM, Andrew Haley wrote:
>>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote:
>>>
>>>> So personally, I still prefer to inline the comments with the
>>>> original code block to avoid this kind of inconsistency. And it
>>>> makes it easier for us to review or maintain the code together with the
>>>> doc, as we don't need to scroll back and forth. I don't know the
>>>> benefit of making the code documentation a separate part. What's
>>>> your opinion, Andrew Haley?
>>> I agree with you. There's no harm having both inline and separate.
>>>

From patrick at os.amperecomputing.com Mon Nov 18 04:03:55 2019
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Mon, 18 Nov 2019 04:03:55 +0000
Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement
In-Reply-To:
References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com>
Message-ID:

>> changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this.

Sorry, my second paragraph was inaccurate. It seems your experiments found cases that ran well with the first iteration of the large loop but were better off quitting the loop and going to the small loop immediately for better performance (?). Please correct me if I misunderstood this. Thanks.
Regards
Patrick

From Pengfei.Li at arm.com Mon Nov 18 09:58:11 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 18 Nov 2019 09:58:11 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
Message-ID:

Hi Andrew,

> I see
>
>   if (use_XOR_for_compressed_class_base) {
>     if (CompressedKlassPointers::shift() != 0) {
>       eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>       lsr(dst, dst, LogKlassAlignmentInBytes);
>     } else {
>       eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>     }
>     return;
>   }
>
>   if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
>       && CompressedKlassPointers::shift() == 0) {
>     movw(dst, src);
>     return;
>   }
>
> ... followed by code which does use r27.
>
> Do you ever see r27 being used? If so, I'd be interested to know how this gets
> triggered and what command-line arguments you use. It's rather inefficient.

I think you're right. I tried hard with various VM options but still failed to get the code after this part triggered. The worst case I've ever found is that the encoding/decoding returns at if block

  if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
      && CompressedKlassPointers::shift() == 0) { ... }

By browsing the code, I found this is caused by a metaspace reservation trick that always tries to make AArch64 metaspace 4G-aligned. [1]

If we do have the confidence that r27 won't be used for class pointers, I will remove UseCompressedClassPointers in my if condition. Another question, shall we clean up the (almost) dead code which uses r27 for encoding/decoding class pointers?
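To make the arithmetic behind the two fast paths quoted above easier to follow, here is a minimal model in plain Java. It is not HotSpot code: the class KlassEncodingSketch, its method names, and the base and address values are all hypothetical, and the "general" form is only assumed to correspond to the fallback path that, per the thread, uses r27.

```java
// Hypothetical model of the encoding arithmetic; not the HotSpot implementation.
final class KlassEncodingSketch {

    // General form: subtract the base, then shift.
    static long encodeGeneral(long addr, long base, int shift) {
        return (addr - base) >>> shift;
    }

    // XOR form (models eor + lsr): equal to the general form whenever the base
    // and the offset (addr - base) have no set bits in common.
    static long encodeXor(long addr, long base, int shift) {
        return (addr ^ base) >>> shift;
    }

    // movw form: keep the low 32 bits. Equal to the general form only when the
    // low 32 bits of the base are zero, the shift is 0, and the offset is below 4 GiB.
    static long encodeMovw(long addr) {
        return addr & 0xFFFF_FFFFL;
    }

    public static void main(String[] args) {
        long base = 8L << 32;            // assumed 4G-aligned base
        long addr = base + 0x1234_5678L; // assumed address less than 4 GiB above the base

        System.out.println(Long.toHexString(encodeGeneral(addr, base, 0)));
        System.out.println(Long.toHexString(encodeXor(addr, base, 0)));
        System.out.println(Long.toHexString(encodeMovw(addr)));
        // With a non-zero shift, the XOR form still matches the general form.
        System.out.println(encodeXor(addr, base, 3) == encodeGeneral(addr, base, 3));
    }
}
```

With a 4G-aligned base, as the metaspace reservation described above tries to arrange, all three forms yield the same narrow value, which is consistent with the observation that the fallback path is not reached.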
[1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/memory/metaspace.cpp#l1048 -- Thanks, Pengfei From aph at redhat.com Mon Nov 18 10:06:46 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Nov 2019 10:06:46 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> Message-ID: <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com> On 11/18/19 9:58 AM, Pengfei Li (Arm Technology China) wrote: > I think you're right. I tried hard with various VM options but still failed to > get the code after this part triggered. The worst case I've ever found is that > the encoding/decoding returns at if block > if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0 > && CompressedKlassPointers::shift() == 0) { ... } > > By browsing the code, I found this is caused by a metaspace reservation trick > that always tries to make AArch64 metaspace 4G-aligned. [1] > > If we do have the confidence that r27 won't be used for class pointers, I will > remove UseCompressedClassPointers in my if condition. Another question, shall > we clean up the (almost) dead code which uses r27 for encoding/decoding class > pointers? > > [1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/memory/metaspace.cpp#l1048 We should have a flag which is set if the search for nicely-aligned memory is successful, and then you can use that flag to determine if r27 is needed. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From patric.hedlin at oracle.com Mon Nov 18 10:06:14 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Mon, 18 Nov 2019 11:06:14 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To:
References:
Message-ID: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>

Dear all,

Please review the new patch, now reduced to a "minimum" (besides the table encoding).

Updated in-place.
Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/

Testing: hs-tier1-3

Best regards,
Patric

On 12/11/2019 15:16, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>
>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>     short-circuit for (obviously) redundant if-nodes.
>
> Testing: hs-tier1-4, hs-precheckin-comp
>
>
> Best regards,
> Patric
>

From Pengfei.Li at arm.com Mon Nov 18 10:35:18 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Mon, 18 Nov 2019 10:35:18 +0000
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To: <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com>
Message-ID:

Hi Andrew,

> We should have a flag which is set if the search for nicely-aligned memory is
> successful, and then you can use that flag to determine if r27 is needed.

I just found in current HotSpot code, UseCompressedOops must be on for UseCompressedClassPointers to be on. See arguments.cpp [1]. If this is true, UseCompressedClassPointers cannot be used without UseCompressedOops.
So wouldn't a single condition of UseCompressedOops be enough? But the x86_64 code which I referenced has both two conditions. Is it because the relationship of the arguments are subject to change in the future? [1] http://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/share/runtime/arguments.cpp#l1715 -- Thanks, Pengfei From aph at redhat.com Mon Nov 18 10:39:03 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Nov 2019 10:39:03 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com> Message-ID: <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com> On 11/18/19 10:35 AM, Pengfei Li (Arm Technology China) wrote: > If this is true, UseCompressedClassPointers cannot be used without > UseCompressedOops. So wouldn't a single condition of UseCompressedOops be > enough? Why do you think so? UseCompressedOops doesn't usually need r27. > But the x86_64 code which I referenced has both two conditions. > Is it because the relationship of the arguments are subject to change in the > future? I have no idea why these flags depend on each other. I'd use compressed class pointers all the time, regardless of compressed oops. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Mon Nov 18 10:42:50 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 18 Nov 2019 10:42:50 +0000 Subject: RFR(S): 8233193: Incorrect bailout from possibly_add_compiler_threads In-Reply-To: <12461130-cafe-d239-46df-08527f67e779@oracle.com> References: <12461130-cafe-d239-46df-08527f67e779@oracle.com> Message-ID: Hi David, thanks for the review. 
> That's fine but you also replaced TRAPS with "Thread* THREAD" which is > pointless given: > > #define TRAPS Thread* THREAD I know that it's technically pointless. But I think it's confusing to use TRAPS if we don't expect any exceptions. We just want to pass a Thread pointer. > May I suggest a comment then: > > 844 void CompileBroker::init_compiler_sweeper_threads() { > // Ensure any exceptions lead to vm_exit_during_initialization > 845 EXCEPTION_MARK; Added. I also added: @@ -647,6 +647,7 @@ // totalTime performance counter is always created as it is required // by the implementation of java.lang.management.CompilationMBean. { + // Ensure OOM leads to vm_exit_during_initialization. EXCEPTION_MARK; _perf_total_compilation = PerfDataManager::create_counter(JAVA_CI, "totalTime", Best regards, Martin > -----Original Message----- > From: David Holmes > Sent: Montag, 18. November 2019 04:46 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > Cc: David Holmes > Subject: Re: RFR(S): 8233193: Incorrect bailout from > possibly_add_compiler_threads > > Hi Martin, > > On 15/11/2019 2:18 am, Doerr, Martin wrote: > > Hi, > > > > I'd like to cleanup exception handling in CompileBroker a little bit. > > > > Here's my proposal: > > > > - Use THREAD instead of CHECK where no exceptions get thrown. > > That's fine but you also replaced TRAPS with "Thread* THREAD" which is > pointless given: > > #define TRAPS Thread* THREAD > > > - Remove unused preload_classes. > > Ok > > > - make_thread: Rename thread to new_thread to avoid confusion with > > THREAD (the current compiler thread). > > Ok > > > - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + > CHECK > > because this functions is not supposed to kill the VM on exceptions. Add > > assertion to caller. 
>
> Ok
>
> > Webrev:
> >
> > http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
> >
> > @David:
> >
> > You didn't like usage of the CHECK macro in the initialization
> > functions, but I think they are ok.
> >
> > Not very nice to read, but the behavior looks ok to me.
> >
> > At least, I didn't find a better replacement for them. Maybe you have a
> > proposal?
>
> May I suggest a comment then:
>
> 844 void CompileBroker::init_compiler_sweeper_threads() {
>       // Ensure any exceptions lead to vm_exit_during_initialization
> 845   EXCEPTION_MARK;
>
> Thanks,
> David
> -----
>
> > Best regards,
> >
> > Martin
> >

From martin.doerr at sap.com Mon Nov 18 11:23:12 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 18 Nov 2019 11:23:12 +0000
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
References: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
Message-ID:

Hi Patric,

I'd consider moving subsuming_bool_test_encode up to avoid the prototype. But I can also live with it.

Looks good to me.

Best regards,
Martin

> -----Original Message-----
> From: Patric Hedlin
> Sent: Monday, 18 November 2019 11:06
> To: hotspot-compiler-dev at openjdk.java.net; Nils Eliasson; Vladimir Ivanov; Doerr, Martin
> Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
> check
>
> Dear all,
>
> Please review the new patch, now reduced to a "minimum" (besides the
> table encoding).
>
> Updated in-place.
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> Testing: hs-tier1-3
>
>
> Best regards,
> Patric
>
> On 12/11/2019 15:16, Patric Hedlin wrote:
> > Dear all,
> >
> > I would like to ask for help to review the following change/update:
> >
> > Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
> > Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
> >
> > 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
> >
> >     Adding a simple subsumption test to IfNode::Ideal to enable a local
> >     short-circuit for (obviously) redundant if-nodes.
> >
> > Testing: hs-tier1-4, hs-precheckin-comp
> >
> >
> > Best regards,
> > Patric
> >

From david.holmes at oracle.com Mon Nov 18 11:31:53 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 21:31:53 +1000
Subject: RFR(S): 8233193: Incorrect bailout from possibly_add_compiler_threads
In-Reply-To:
References: <12461130-cafe-d239-46df-08527f67e779@oracle.com>
Message-ID: <1d4cfee4-6063-f2c4-e7d0-060ff93c6b02@oracle.com>

On 18/11/2019 8:42 pm, Doerr, Martin wrote:
> Hi David,
>
> thanks for the review.
>
>
>> That's fine but you also replaced TRAPS with "Thread* THREAD" which is
>> pointless given:
>>
>> #define TRAPS Thread* THREAD
>
> I know that it's technically pointless. But I think it's confusing to use TRAPS if we don't expect any exceptions.
> We just want to pass a Thread pointer.

Yes you are right. I just re-read the comments in exceptions.hpp :)

Thanks,
David

>
>> May I suggest a comment then:
>>
>> 844 void CompileBroker::init_compiler_sweeper_threads() {
>>       // Ensure any exceptions lead to vm_exit_during_initialization
>> 845   EXCEPTION_MARK;
>
> Added. I also added:
>
> @@ -647,6 +647,7 @@
>     // totalTime performance counter is always created as it is required
>     // by the implementation of java.lang.management.CompilationMBean.
>     {
> +     // Ensure OOM leads to vm_exit_during_initialization.
>       EXCEPTION_MARK;
>       _perf_total_compilation =
>                    PerfDataManager::create_counter(JAVA_CI, "totalTime",
>
>
>
> Best regards,
> Martin
>
>
>
>> -----Original Message-----
>> From: David Holmes
>> Sent: Monday, 18.
November 2019 04:46 >> To: Doerr, Martin ; 'hotspot-compiler- >> dev at openjdk.java.net' >> Cc: David Holmes >> Subject: Re: RFR(S): 8233193: Incorrect bailout from >> possibly_add_compiler_threads >> >> Hi Martin, >> >> On 15/11/2019 2:18 am, Doerr, Martin wrote: >>> Hi, >>> >>> I'd like to cleanup exception handling in CompileBroker a little bit. >>> >>> Here's my proposal: >>> >>> - Use THREAD instead of CHECK where no exceptions get thrown. >> >> That's fine but you also replaced TRAPS with "Thread* THREAD" which is >> pointless given: >> >> #define TRAPS Thread* THREAD >> >>> - Remove unused preload_classes. >> >> Ok >> >>> - make_thread: Rename thread to new_thread to avoid confusion with >>> THREAD (the current compiler thread). >> >> Ok >> >>> - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + >> CHECK >>> because this functions is not supposed to kill the VM on exceptions. Add >>> assertion to caller. >> >> Ok >> >>> Webrev: >>> >>> http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/ >>> >>> @David: >>> >>> You didn't like usage of the CHECK macro in the initialization >>> functions, but I think they are ok. >>> >>> Not very nice to read, but the behavior looks ok to me. >>> >>> At least, I didn't find a better replacement for them. Maybe you have a >>> proposal? >> >> May I suggest a comment then: >> >> 844 void CompileBroker::init_compiler_sweeper_threads() { >> // Ensure any exceptions lead to vm_exit_during_initialization >> 845 EXCEPTION_MARK; >> >> Thanks, >> David >> ----- >> >>> Best regards, >>> >>> Martin >>> From claes.redestad at oracle.com Mon Nov 18 11:37:42 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 18 Nov 2019 12:37:42 +0100 Subject: RFR: 8234248: More VectorSet cleanups Message-ID: Hi, JDK-8233708 removed unused code from VectorSet, but missed a few spots. 
Additionally, the current code does a few other things like shifting up signed longs and casting back to uint, which could be improved. Bug: https://bugs.openjdk.java.net/browse/JDK-8234248 Webrev: http://cr.openjdk.java.net/~redestad/8234248/open.00/ Testing: tier1-3 Thanks! /Claes From dmitrij.pochepko at bell-sw.com Mon Nov 18 12:02:13 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 18 Nov 2019 15:02:13 +0300 Subject: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement In-Reply-To: References: <86f91401-b7f6-b634-fef1-b0615b8fcde0@redhat.com> <399186f2-5cb7-b713-33c5-b29b6eb73eb8@samersoff.net> <6843892a-4267-06cd-0af4-4618fce86396@bell-sw.com> Message-ID: <3427facc-eb05-d690-eebe-acca39b87d4a@bell-sw.com> On 18/11/2019 7:03 AM, Patrick Zhang OS wrote: >>> changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this. > Sorry my second paragraph was inaccurate, it seems you experimented that there were some cases ran well with the first iteration of the large-loop but would rather quit the loop and go to the small-loop immediately for better performance (?). Please correct me if I misunderstood this. Thanks. > > Regards > Patrick Yes. That's correct. Thanks, Dmitrij > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Patrick Zhang OS > Sent: Monday, November 18, 2019 11:52 AM > To: Dmitrij Pochepko > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: RE: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement > > Thanks for the information. 
> I am interested in the inconsistence between same_encoding and different_encoding functions, if "overprefetch" can be safe enough, why do we prevent it at the end of large-loop inside same_encoding, why do we protect it more strictly in different_encoding at both the beginning and ending of the large-loop? > I did not mean globally updating largeLoopExitCondition to 64/128, merely the condition at the end of large-loop inside same_encoding. Suppose large-loop could be faster than small-loop (in theory), removing all "overprefetch" conditions would allow more strings go to the large-loop for better performance. Any other potential side-effects? > > Regards > Patrick > > -----Original Message----- > From: Dmitrij Pochepko > Sent: Friday, November 15, 2019 11:52 PM > To: Patrick Zhang OS > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; Dmitry Samersoff ; Andrew Haley ; Pengfei Li (Arm Technology China) > Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: String::compareTo intrinsic documentation and maintenance improvement > > Hi Patrick, > > My experiments back then showed that few platforms (some of Cortex A* > series) behaves unexpectedly slow when dealing with overprefetch (probably CPU implementation specifics). So this code is some kind of compromise to run relatively well on all platforms I was able to test on (ThunderX, ThunderX2, Cortex A53, Cortex A73). That is the main reason for such code structure. > It's good that you're willing to experiment and improve it, but I'm afraid changing largeLoopExitCondition to 64 LL/UU and 128 for LU/UL will likely make some systems slower as I experimented a lot with this. > Let us see the performance results for several systems you've got to avoid a situation when one platform benefits by slowing down others. We could offer some help if you don't have some HW available. 
> > Thanks, > Dmitrij > > On 15/11/2019 10:51 AM, Patrick Zhang OS wrote: >> Hi Dmitrij, >> >> The inline document inside your this patch is nice in helping me understand the string_compare stub code generation in depth, although I don't know why it was not pushed. >> http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/src/hotspot/cpu >> /aarch64/stubGenerator_aarch64.cpp.sdiff.html >> >> There is one point I don't quite understand, could you please clarify it a little more? Thanks in advance! >> The large loop with prefetching logic, in different_encoding function, it uses "cnt2 - prefetchLoopExitCondition" to tell whether the 1st iteration should be executed or not, while the same_encoding does not do this, why? >> >> I was thinking that "prefetching out of the string boundary" could be an invalid operation, but it seems a misunderstanding, isn't it? The AArch64 ISA document says that prfm instruction signals the memory system and expects "preloading the cache line containing the specified address into one or more caches" would be done, in order to speed up the memory accesses when they do occur. If this is just a hint for coming ldrs, and safe enough, could we not restrict the rest iterations (2 ~ n) with largeLoopExitCondition? instead use 64 LL/UU, and 128 for LU/UL. As such more iteration can say in the large loop (SoftwarePrefetchHintDistance=192 for example), for better performance. Any comments? >> >> Thanks >> >> 4327 address generate_compare_long_string_different_encoding(bool isLU) { >> 4377 if (SoftwarePrefetchHintDistance >= 0) { >> 4378 __ subs(rscratch2, cnt2, prefetchLoopExitCondition); >> 4379 __ br(__ LT, NO_PREFETCH); >> 4380 __ bind(LARGE_LOOP_PREFETCH); // 64-characters loop >> ... ... >> 4395 __ subs(rscratch2, cnt2, prefetchLoopExitCondition); // <-- could we use subs(rscratch2, cnt2, 128) instead? 
>> 4396 __ br(__ GE, LARGE_LOOP_PREFETCH); >> 4397 } // end of 64-characters loop >> >> 4616 address generate_compare_long_string_same_encoding(bool isLL) { >> 4637 if (SoftwarePrefetchHintDistance >= 0) { >> 4638 __ bind(LARGE_LOOP_PREFETCH); >> 4639 __ prfm(Address(str1, SoftwarePrefetchHintDistance)); >> 4640 __ prfm(Address(str2, SoftwarePrefetchHintDistance)); >> 4641 compare_string_16_bytes_same(DIFF, DIFF2); >> 4642 compare_string_16_bytes_same(DIFF, DIFF2); >> 4643 __ sub(cnt2, cnt2, 8 * characters_in_word); >> 4644 compare_string_16_bytes_same(DIFF, DIFF2); >> 4645 __ subs(rscratch2, cnt2, largeLoopExitCondition); // rscratch2 is not used. Use subs instead of cmp in case of potentially large constants // <-- could we use subs(rscratch2, cnt2, 64) instead? >> 4646 compare_string_16_bytes_same(DIFF, DIFF2); >> 4647 __ br(__ GT, LARGE_LOOP_PREFETCH); >> 4648 __ cbz(cnt2, LAST_CHECK); // no more loads left >> 4649 } >> >> Regards >> Patrick >> >> -----Original Message----- >> From: hotspot-compiler-dev >> On Behalf Of Dmitry >> Samersoff >> Sent: Sunday, May 19, 2019 11:42 PM >> To: Dmitrij Pochepko ; Andrew Haley >> ; Pengfei Li (Arm Technology China) >> >> Cc: hotspot-compiler-dev at openjdk.java.net; >> aarch64-port-dev at openjdk.java.net >> Subject: Re: [aarch64-port-dev ] RFR: 8218748: AARCH64: >> String::compareTo intrinsic documentation and maintenance improvement >> >> Dmitrij, >> >> The changes looks good to me. >> >> -Dmitry >> >> On 25.02.2019 19:52, Dmitrij Pochepko wrote: >>> Hi Andrew, Pengfei, >>> >>> I created webrev.02 with all your suggestions implemented: >>> >>> webrev: http://cr.openjdk.java.net/~dpochepk/8218748/webrev.02/ >>> >>> - comments are now both in separate section and inlined into code. >>> - documentation mismatch mentioned by Pengfei is fixed: >>> -- SHORT_LAST_INIT label name misprint changed to correct SHORT_LAST >>> -- SHORT_LOOP_TAIL block now merged with last instruction. 
>>> Documentation is updated respectively >>> - minor other changes to layout and wording >>> >>> Newly developed tests were run as sanity and they passed. >>> >>> Thanks, >>> Dmitrij >>> >>> On 22/02/2019 6:42 PM, Andrew Haley wrote: >>>> On 2/22/19 10:31 AM, Pengfei Li (Arm Technology China) wrote: >>>> >>>>> So personally, I still prefer to inline the comments with the >>>>> original code block to avoid this kind of inconsistencies. And it >>>>> makes us easier to review or maintain the code together with the >>>>> doc, as we don't need to scroll back and force. I don't know the >>>>> benefit of making the code documentation as a separate part. What's >>>>> your opinion, Andrew Haley? >>>> I agree with you. There's no harm having both inline and separate. >>>> From vladimir.x.ivanov at oracle.com Mon Nov 18 12:34:55 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 18 Nov 2019 15:34:55 +0300 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> Message-ID: (CCing hotspot-compiler-dev at ...) Thanks for the reference, August. Indeed the proposed approach looks very promising. I don't think range-check elimination in C2 can optimize such code shape (haven't verified it with test cases yet though). Filed an RFE for it: https://bugs.openjdk.java.net/browse/JDK-8234333 It looks like it should be pretty straightforward to cover this particular case. Also, it's worth looking at how Graal handles it and file a separate RFE if it doesn't optimize it well. Best regards, Vladimir Ivanov On 18.11.2019 07:02, August Nagro wrote: > Hi! > > The fast-range[1] algorithm is used to map well-distributed hash functions to a range of size N. It is ~4x faster than using integer modulo, and does not require the table to be a power of two. It is used by libraries like Tensorflow and the StockFish chess engine. 
>
> The idea is that, given (int) hash h and (int) size N, then (((long) h) * N) >>> 32 is a good mapping.
>
> However, will the compiler be able to eliminate array range-checking? HashMap's approach using power-of-two xor/mask was patched here: https://bugs.openjdk.java.net/browse/JDK-8003585.
>
> Sincerely,
>
> - August
>
> [1]: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
>

From fweimer at redhat.com Mon Nov 18 13:10:27 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 18 Nov 2019 14:10:27 +0100
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> (August Nagro's message of "Sun, 17 Nov 2019 22:02:22 -0600")
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com>
Message-ID: <87r22549fg.fsf@oldenburg2.str.redhat.com>

* August Nagro:

> The fast-range[1] algorithm is used to map well-distributed hash
> functions to a range of size N. It is ~4x faster than using integer
> modulo, and does not require the table to be a power of two. It is
> used by libraries like Tensorflow and the StockFish chess engine.
>
> The idea is that, given (int) hash h and (int) size N, then (((long) h)
> * N) >>> 32 is a good mapping.

I looked at this in the past weeks in a different context, and I don't
think this would work because we have:

jshell> Integer.hashCode(0)
$1 ==> 0

jshell> Integer.hashCode(1)
$2 ==> 1

jshell> Integer.hashCode(2)
$3 ==> 2

jshell> "a".hashCode()
$4 ==> 97

jshell> "b".hashCode()
$5 ==> 98

Under the allegedly good mapping, those all map to bucket zero even for
really large arrays, which is not acceptable. The multiplication
shortcut only works for hash functions which behave in certain ways.
Something FNV-style for strings is probably okay, but most Java
hashCode() implementations likely are not.
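
The bucket-zero behaviour described above can be reproduced with a minimal sketch. The class name FastRangeDemo and the method fastRange are hypothetical, chosen only for illustration, and treating h as an unsigned 32-bit value is an assumption that the formula quoted in the thread does not state; it is used here so that the result is never negative.

```java
// Illustrative sketch only; FastRangeDemo and fastRange are hypothetical names.
class FastRangeDemo {
    // Fast-range reduction: maps a 32-bit hash into [0, n) without a modulo.
    // Assumption: h is interpreted as unsigned so the result stays non-negative.
    static int fastRange(int h, int n) {
        return (int) ((Integer.toUnsignedLong(h) * n) >>> 32);
    }

    public static void main(String[] args) {
        int n = 1 << 20; // roughly one million buckets
        int[] hashes = {
            Integer.hashCode(0), Integer.hashCode(1), Integer.hashCode(2),
            "a".hashCode(), "b".hashCode()
        };
        // Every hash below 2^32 / n (here 4096) lands in bucket 0.
        for (int h : hashes) {
            System.out.println(h + " -> bucket " + fastRange(h, n));
        }
    }
}
```

With roughly a million buckets, any hash value below 4096 is reduced to bucket 0, so the five sample values all collide. Only the high bits of h influence the result, which is why the mapping depends on a hash that spreads its entropy into those bits.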
For non-power-of-two bucket counts, one could try to pre-compute the
reciprocal as explained in Hacker's Delight and in these posts:

(I need to write to the author and have some of the math fixed, but I
think the general direction is solid.)

For an internal hash table, it is possible to use primes which are
convenient for the saturating increment algorithm because the choice of
bucket count is an implementation detail to some extent. (It is not in
my case, so it would need data-dependent branches, which is kind of
counter-productive.)

Not discussed on the quoted pages is a generalization which uses

  hashCode - bucketCount * (int) Long.multiplyHigh(hashCode + 1L, magic)

as the bucket number. That works for any table size that is not a power
of two, but requires a fast multiplier to get the upper half of a 64x64
product.

Thanks,
Florian

From nils.eliasson at oracle.com Mon Nov 18 14:25:58 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 18 Nov 2019 15:25:58 +0100
Subject: RFR: 8234248: More VectorSet cleanups
In-Reply-To:
References:
Message-ID:

Hi Claes,

Looks good,

Regards,
Nils

On 2019-11-18 12:37, Claes Redestad wrote:
> Hi,
>
> JDK-8233708 removed unused code from VectorSet, but missed a few spots.
>
> Additionally, the current code does a few other things like shifting up
> signed longs and casting back to uint, which could be improved.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234248
> Webrev: http://cr.openjdk.java.net/~redestad/8234248/open.00/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes

From claes.redestad at oracle.com Mon Nov 18 14:33:18 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 18 Nov 2019 15:33:18 +0100
Subject: RFR: 8234248: More VectorSet cleanups
In-Reply-To:
References:
Message-ID: <7fba2025-5386-a83b-6ef9-6172d8327ff9@oracle.com>

On 2019-11-18 15:25, Nils Eliasson wrote:
> Hi Claes,
>
> Looks good,

Thank you, Nils!
/Claes

From tobias.hartmann at oracle.com Mon Nov 18 14:53:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 18 Nov 2019 15:53:41 +0100
Subject: RFR: 8234248: More VectorSet cleanups
In-Reply-To:
References:
Message-ID: <0b353d7f-59d6-c796-5286-5d5d4ecf4726@oracle.com>

Hi Claes,

looks good to me.

vectset.cpp:41 "assert (" -> "assert("

Best regards,
Tobias

On 18.11.19 12:37, Claes Redestad wrote:
> Hi,
>
> JDK-8233708 removed unused code from VectorSet, but missed a few spots.
>
> Additionally, the current code does a few other things like shifting up
> signed longs and casting back to uint, which could be improved.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234248
> Webrev: http://cr.openjdk.java.net/~redestad/8234248/open.00/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes

From tobias.hartmann at oracle.com Mon Nov 18 14:58:13 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 18 Nov 2019 15:58:13 +0100
Subject: RFR(S): 8233193: Incorrect bailout from possibly_add_compiler_threads
In-Reply-To:
References:
Message-ID: <41ee3d36-66b9-438a-e791-2a8dcb63d061@oracle.com>

Hi Martin,

this looks good to me.

Best regards,
Tobias

On 14.11.19 17:18, Doerr, Martin wrote:
> Hi,
>
> I'd like to clean up exception handling in CompileBroker a little bit.
>
> Here's my proposal:
> - Use THREAD instead of CHECK where no exceptions get thrown.
> - Remove unused preload_classes.
> - make_thread: Rename thread to new_thread to avoid confusion with THREAD (the current compiler thread).
> - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + CHECK because this function is not supposed to kill the VM on exceptions. Add assertion to caller.
>
> Webrev:
> http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
>
> @David:
> You didn't like usage of the CHECK macro in the initialization functions, but I think they are ok.
> Not very nice to read, but the behavior looks ok to me.
> At least, I didn't find a better replacement for them. Maybe you have a proposal? > > Best regards, > Martin > From claes.redestad at oracle.com Mon Nov 18 15:04:26 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 18 Nov 2019 16:04:26 +0100 Subject: RFR: 8234248: More VectorSet cleanups In-Reply-To: <0b353d7f-59d6-c796-5286-5d5d4ecf4726@oracle.com> References: <0b353d7f-59d6-c796-5286-5d5d4ecf4726@oracle.com> Message-ID: On 2019-11-18 15:53, Tobias Hartmann wrote: > Hi Claes, > > looks good to me. Thanks! > > vectset.cpp:41 "assert (" -> "assert(" Fixed! /Claes From martin.doerr at sap.com Mon Nov 18 17:23:42 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 18 Nov 2019 17:23:42 +0000 Subject: RFR(S): 8233193: Incorrect bailout from possibly_add_compiler_threads In-Reply-To: <41ee3d36-66b9-438a-e791-2a8dcb63d061@oracle.com> References: <41ee3d36-66b9-438a-e791-2a8dcb63d061@oracle.com> Message-ID: Hi Tobias, thanks for the review. Pushed. Best regards, Martin > -----Original Message----- > From: Tobias Hartmann > Sent: Montag, 18. November 2019 15:58 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' ; David > Holmes (david.holmes at oracle.com) > Subject: Re: RFR(S): 8233193: Incorrect bailout from > possibly_add_compiler_threads > > Hi Martin, > > this looks good to me. > > Best regards, > Tobias > > On 14.11.19 17:18, Doerr, Martin wrote: > > Hi, > > > > I'd like to cleanup exception handling in CompileBroker a little bit. > > > > Here's my proposal: > > - Use THREAD instead of CHECK where no exceptions get thrown. > > - Remove unused preload_classes. > > - make_thread: Rename thread to new_thread to avoid confusion with > THREAD (the current compiler thread). > > - possibly_add_compiler_threads: Remove usage of EXCEPTION_MARK + > CHECK because this functions is not supposed to kill the VM on exceptions. > Add assertion to caller. 
> >
> > Webrev:
> > http://cr.openjdk.java.net/~mdoerr/8233193_CompileBroker/webrev.00/
> >
> > @David:
> > You didn't like usage of the CHECK macro in the initialization functions, but I
> think they are ok.
> > Not very nice to read, but the behavior looks ok to me.
> > At least, I didn't find a better replacement for them. Maybe you have a
> proposal?
> >
> > Best regards,
> > Martin
> >

From john.r.rose at oracle.com Mon Nov 18 17:45:38 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 18 Nov 2019 09:45:38 -0800
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <87r22549fg.fsf@oldenburg2.str.redhat.com>
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com>
Message-ID: <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com>

On Nov 18, 2019, at 5:10 AM, Florian Weimer wrote:
>
>>
>> The idea is that, given (int) hash h and (int) size N, then (((long) h)
>> * N) >>> 32 is a good mapping.
>
> I looked at this in the past weeks in a different context, and I don't
> think this would work because we have:

That technique appears to require either a well-conditioned hash code
(which is not the case with Integer.hashCode) or else a value of N that
performs extra mixing on h. (So a very *non-*power-of-two value of N
would be better here, i.e., N with larger popcount.)

A little more mixing should help the problem Florian reports with a
badly conditioned h.
Given this:

  int fr(int h) { return (int)(((long)h * N) >>> 32); }
  int h = x.hashCode();
  //int bucket = fr(h);  // weak if h is badly conditioned

then, assuming multiplication is cheap:

  int bucket = fr(h * M);  // M = 0x2357BD or something

or maybe something fast and sloppy like:

  int bucket = fr(h + (h << 8));

or even:

  int bucket = fr(h) ^ (h & (N-1));

From augustnagro at gmail.com Mon Nov 18 19:26:30 2019
From: augustnagro at gmail.com (August Nagro)
Date: Mon, 18 Nov 2019 13:26:30 -0600
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com>
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com>
Message-ID:

Yes, exactly. One can also use Fibonacci hashing to ensure that an arbitrary Key.hashCode() is well distributed. See for instance my implementation of Universal Hashing:
https://gist.github.com/AugustNagro/4f2d70d261347e515efe0f87de9e8dc2

On Mon, Nov 18, 2019 at 11:45 AM John Rose wrote:

> On Nov 18, 2019, at 5:10 AM, Florian Weimer wrote:
>
>
>
> The idea is that, given (int) hash h and (int) size N, then (((long) h)
> * N) >>> 32 is a good mapping.
>
>
> I looked at this in the past weeks in a different context, and I don't
> think this would work because we have:
>
>
> That technique appears to require either a well-conditioned hash code
> (which is not the case with Integer.hashCode) or else a value of N that
> performs extra mixing on h. (So a very *non-*power-of-two value of N
> would be better here, i.e., N with larger popcount.)
>
> A little more mixing should help the problem Florian reports with a
> badly conditioned h.
Given this: > > int fr(int h) { return (int)(((long)h * N) >>> 32); } > int h = x.hashCode(); > //int bucket = fr(h); // weak if h is badly conditioned > > then, assuming multiplication is cheap: > > int bucket = fr(h * M); // M = 0x2357BD or something > > or maybe something fast and sloppy like: > > int bucket = fr(h + (h << 8)); > > or even: > > int bucket = fr(h) ^ (h & (N-1)); > > From fweimer at redhat.com Mon Nov 18 20:17:05 2019 From: fweimer at redhat.com (Florian Weimer) Date: Mon, 18 Nov 2019 21:17:05 +0100 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> (John Rose's message of "Mon, 18 Nov 2019 09:45:38 -0800") References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> Message-ID: <877e3x0wji.fsf@oldenburg2.str.redhat.com> * John Rose: > On Nov 18, 2019, at 5:10 AM, Florian Weimer wrote: >> >>> >>> The idea is that, given (int) hash h and (int) size N, then ((long) h) >>> * N) >>> 32 is a good mapping. >> >> I looked at this in the past weeks in a different context, and I don't >> think this would work because we have: > > That technique appears to require either a well-conditioned hash code > (which is not the case with Integer.hashCode) or else a value of N that > performs extra mixing on h. (So a very *non-*power-of-two value of N > would be better here, i.e., N with larger popcount.) > > A little more mixing should help the problem Florian reports with a > badly conditioned h. Given this: > > int fr(int h) { return (int)(((long)h * N) >>> 32); } > int h = x.hashCode(); > //int bucket = fr(h); // weak if h is badly conditioned > > then, assuming multiplication is cheap: (Back-to-back multiplications probably are not.) 
>   int bucket = fr(h * M);  // M = 0x2357BD or something
>
> or maybe something fast and sloppy like:
>
>   int bucket = fr(h + (h << 8));
>
> or even:
>
>   int bucket = fr(h) ^ (h & (N-1));

Does this really work? I don't think so.

I think this kind of perturbation is quite expensive. Arm's BITR should
be helpful here. But even though this operation is commonly needed and
easily implemented in hardware, it's rarely found in CPUs.

Any scheme with another multiplication is probably not an improvement
over the multiply-shift-multiply-subtract sequence to implement modulo
for certain convenient bucket counts, and for that, we can look up
extensive analysis. 8-)

Thanks,
Florian

From sergei.tsypanov at yandex.ru Mon Nov 18 22:44:00 2019
From: sergei.tsypanov at yandex.ru (Сергей Цыпанов)
Date: Tue, 19 Nov 2019 00:44:00 +0200
Subject: Allocation of array copy can be eliminated in particular cases
Message-ID: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net>

Hello,

this proposal was born as a result of discussion in core-libs-dev [1] and IDEA-226474 [2].
Originally I suggested replacing

  int count = method.getParameterTypes().length;

with

  int count = method.getParameterCount();

Then it turned out that the cloned array could be a subject of allocation elimination, as any property dereferenced from the copy can be dereferenced from the original array.

Consider this test:

  @Test
  void arrayClone() {
    final Object[] objects = new Object[3];
    objects[0] = "azaza";
    objects[1] = 365;
    objects[2] = 9876L;

    final Object[] clone = objects.clone();

    assertEquals(objects.length, clone.length);
    assertSame(objects[0], clone[0]);
    assertSame(objects[1], clone[1]);
    assertSame(objects[2], clone[2]);
  }

An optimizing compiler could drop the allocation of the 'clone' variable and substitute its usage with the original array, which is not done currently:

  @State(Scope.Thread)
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  public class MethodParamCountBenchmark {
    private Method method;

    @Setup
    public void setup() throws Exception {
      method = getClass().getMethod("toString");
    }

    @Benchmark
    public int getParameterCount() {
      return method.getParameterCount();
    }

    @Benchmark
    public int getParameterTypes() {
      return method.getParameterTypes().length;
    }
  }

On my i7-7700 with JDK 11 this benchmark yields these results:

  Benchmark                                                               Mode  Cnt     Score     Error   Units
  MethodToStringBenchmark.getParameterCount                               avgt   25     2,528 ±   0,085   ns/op
  MethodToStringBenchmark.getParameterCount:·gc.alloc.rate                avgt   25    ≈ 10⁻?            MB/sec
  MethodToStringBenchmark.getParameterCount:·gc.alloc.rate.norm           avgt   25    ≈ 10⁻?              B/op
  MethodToStringBenchmark.getParameterCount:·gc.count                     avgt   25       ≈ 0            counts
  MethodToStringBenchmark.getParameterTypes                               avgt   25     7,299 ±   0,410   ns/op
  MethodToStringBenchmark.getParameterTypes:·gc.alloc.rate                avgt   25  1999,454 ±  89,929  MB/sec
  MethodToStringBenchmark.getParameterTypes:·gc.alloc.rate.norm           avgt   25    16,000 ±   0,001    B/op
  MethodToStringBenchmark.getParameterTypes:·gc.churn.G1_Eden_Space       avgt   25  2003,360 ±  91,537  MB/sec
  MethodToStringBenchmark.getParameterTypes:·gc.churn.G1_Eden_Space.norm  avgt   25    16,030 ±   0,045    B/op
  MethodToStringBenchmark.getParameterTypes:·gc.churn.G1_Old_Gen          avgt   25     0,004 ±   0,001  MB/sec
  MethodToStringBenchmark.getParameterTypes:·gc.churn.G1_Old_Gen.norm     avgt   25    ≈ 10⁻?              B/op
  MethodToStringBenchmark.getParameterTypes:·gc.count                     avgt   25  2380,000            counts
  MethodToStringBenchmark.getParameterTypes:·gc.time                      avgt   25  1325,000                ms

I.e. the intermediate array is allocated even if it doesn't escape the method it is created in.

Is my speculation correct, and does it make sense to implement an optimization that turns the sequence

  array -> array.clone() -> clone.length

into

  array -> array.length

for the cases where the clone's visibility scope is predictable?

1) http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063344.html
2) https://youtrack.jetbrains.com/issue/IDEA-226474

Regards,
Sergey Tsypanov

From vladimir.x.ivanov at oracle.com Mon Nov 18 22:56:47 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 01:56:47 +0300
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
References: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
Message-ID: <4314ca44-9094-a4d5-407e-9d9eaf5d4b37@oracle.com>

> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/

Looks good, Patric.

Best regards,
Vladimir Ivanov

PS: some cleanup suggestions (feel free to ignore them if you don't agree):

src/hotspot/share/opto/ifnode.cpp:

+//    \r1
+//  r2\  eqT  eqF  neT  neF  ltT  ltF  leT  leF  gtT  gtF  geT  geF
+//  eq    t    f    f    t    f    -    -    f    f    -    -    f
+//  ne    f    t    t    f    t    -    -    t    t    -    -    t
+//  lt    f    -    -    f    t    f    -    f    f    -    f    t
+//  le    t    -    -    t    t    -    t    f    f    t    -    t
+//  gt    f    -    -    f    f    -    f    t    t    f    -    f
+//  ge    t    -    -    t    f    t    -    t    t    -    t    f
+//
+Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
+  // Table encoding: N/A (na), True-branch (tb), False-branch (fb).
+  static enum { na, tb, fb } s_subsume_map[6][12] = {
+  /*rel: eq+T eq+F ne+T ne+F lt+T lt+F le+T le+F gt+T gt+F ge+T ge+F*/
+  /*eq*/{ tb,  fb,  fb,  tb,  fb,  na,  na,  fb,  fb,  na,  na,  fb },
+  /*ne*/{ fb,  tb,  tb,  fb,  tb,  na,  na,  tb,  tb,  na,  na,  tb },
+  /*lt*/{ fb,  na,  na,  fb,  tb,  fb,  na,  fb,  fb,  na,  fb,  tb },
+  /*le*/{ tb,  na,  na,  tb,  tb,  na,  tb,  fb,  fb,  tb,  na,  tb },
+  /*gt*/{ fb,  na,  na,  fb,  fb,  na,  fb,  tb,  tb,  fb,  na,  fb },
+  /*ge*/{ tb,  na,  na,  tb,  fb,  tb,  na,  tb,  tb,  na,  tb,  fb }};

IMO you can dump the table from the comment: it mostly duplicates the code.

(Probably, you can use a different name for "N/A" or just refer to it in numeric form (0?) to preserve clean structure of the table from the comment.)

======================================

+  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
+    if (cmp->in(2) != NULL && // make sure cmp is not already dead
+        cmp->in(2)->bottom_type() == TypePtr::NULL_PTR) {

Merge nested ifs?

======================================

Looks like extracting the following code into a helper function (along with the enum and the table) can improve readability.

+  int drel = subsuming_bool_test_encode(dom->in(1));
+  int trel = subsuming_bool_test_encode(bol);
+  int bout = pre->is_IfFalse() ? 1 : 0;
+
+  if (drel < 0 || trel < 0) {
+    return NULL;
+  }
+  int br = s_subsume_map[trel][2*drel+bout];
+  if (br == na) {
+    return NULL;
+  }

New function can return intcon(0/1) or bol(or NULL?) and the caller decides whether the update is needed.

> On 12/11/2019 15:16, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>
>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>
>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>     short-circuit for (obviously) redundant if-nodes.
>> >> Testing: hs-tier1-4, hs-precheckin-comp >> >> >> Best regards, >> Patric >> > From serguei.spitsyn at oracle.com Mon Nov 18 23:56:53 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 18 Nov 2019 15:56:53 -0800 Subject: RFR(S) : 8233462 : serviceability/tmtools/jstat tests times out with -Xcomp In-Reply-To: <2B7FD259-FBC2-4850-B1B3-81D1669F150B@oracle.com> References: <2B7FD259-FBC2-4850-B1B3-81D1669F150B@oracle.com> Message-ID: <728f5be0-2988-02e4-43b1-64a65c7b322e@oracle.com> Hi Igor, Looks good. Thank you for taking care about this! Thanks, Serguei On 11/15/19 23:47, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00 >> 33 lines changed: 1 ins; 14 del; 18 mod; > Hi all, > > could you please review this small fix for tmtools testlibrary? > tmtools tests are believed to fail due to a deadlock-like situation b/w main test process and tmtools process: > (from JBS) >> it seems these tests attach jstat to the main test process, the same process which reads the tool's stdout/stderr, so there is a possibility that this will deadlock: jstat-process produces more output than the buffer can hold, so it blocks till someone (the main process reads it), while the main process waits till jstat completes. > the patch changes serviceability/tmtools/share/common library (used by all serviceability/tmtools) to redirect tmtool's stdout and stderr into files instead of using jdk.test.lib.process.OutputAnalyzer; I've also added a bit of diagnostic output, so it will be easier to analyze future failures. 
> > webrev: http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8233462 > testing: > - serviceability/tmtools on windows-x64,linux-x64,macosx-x64,solaris-sparcv9 > - serviceability/tmtools 100 times on linux-x64-debug w/ '-Xcomp -ea -esa -XX:+TieredCompilation -XX:+DeoptimizeALot' (most of failures have been seen on this configuration) > > Thanks, > -- Igor From igor.ignatyev at oracle.com Tue Nov 19 00:01:41 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 18 Nov 2019 16:01:41 -0800 Subject: RFR(S) : 8233462 : serviceability/tmtools/jstat tests times out with -Xcomp In-Reply-To: <728f5be0-2988-02e4-43b1-64a65c7b322e@oracle.com> References: <2B7FD259-FBC2-4850-B1B3-81D1669F150B@oracle.com> <728f5be0-2988-02e4-43b1-64a65c7b322e@oracle.com> Message-ID: Hi Serguei, Thank you for your review and discussion around this issue. -- Igor > On Nov 18, 2019, at 3:56 PM, serguei.spitsyn at oracle.com wrote: > > Hi Igor, > > Looks good. > Thank you for taking care about this! > > Thanks, > Serguei > > > On 11/15/19 23:47, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00 >>> 33 lines changed: 1 ins; 14 del; 18 mod; >> Hi all, >> >> could you please review this small fix for tmtools testlibrary? >> tmtools tests are believed to fail due to a deadlock-like situation b/w main test process and tmtools process: >> (from JBS) >>> it seems these tests attach jstat to the main test process, the same process which reads the tool's stdout/stderr, so there is a possibility that this will deadlock: jstat-process produces more output than the buffer can hold, so it blocks till someone (the main process reads it), while the main process waits till jstat completes. 
>> the patch changes serviceability/tmtools/share/common library (used by all serviceability/tmtools) to redirect tmtool's stdout and stderr into files instead of using jdk.test.lib.process.OutputAnalyzer; I've also added a bit of diagnostic output, so it will be easier to analyze future failures. >> >> webrev: http://cr.openjdk.java.net/~iignatyev//8233462/webrev.00 >> JBS: https://bugs.openjdk.java.net/browse/JDK-8233462 >> testing: >> - serviceability/tmtools on windows-x64,linux-x64,macosx-x64,solaris-sparcv9 >> - serviceability/tmtools 100 times on linux-x64-debug w/ '-Xcomp -ea -esa -XX:+TieredCompilation -XX:+DeoptimizeALot' (most of failures have been seen on this configuration) >> >> Thanks, >> -- Igor > From mikhailo.seledtsov at oracle.com Mon Nov 18 22:06:39 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Mon, 18 Nov 2019 14:06:39 -0800 Subject: RFR(S) : 8147017 : Platform.isGraal should be removed In-Reply-To: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com> References: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com> Message-ID: <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com> Looks good to me, Misha On 11/17/19 11:00 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html >> 16 lines changed: 2 ins; 8 del; 6 mod; > Hi all, > > jdk.test.lib.Platform.isGraal method assumes that JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and caused other to incorrectly assume that '-graal' flag exist and must be used to select Graal compiler. the patch removes this method and updates its only meaningful usage in TestGCLogMessages test. TestGCLogMessages test should use LogMessageWithLevelC2OrJVMCIOnly only when c2 or graal is available, so it's been updated to use corresponding methods of sun.hotspot.code.Compiler class, which requires WhiteBoxAPI being enabled. 
> > > JBS: https://bugs.openjdk.java.net/browse/JDK-8147017 > webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html > testing: tier1 + TestGCLogMessages w/ different JIT configurations > > Thanks, > -- Igor From Pengfei.Li at arm.com Tue Nov 19 10:03:50 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 19 Nov 2019 10:03:50 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com> <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com> Message-ID: Hi Andrew, > Why do you think so? UseCompressedOops doesn't usually need r27. If I understand correctly, your point is to allocate r27 as well for some scenarios when UseCompressedOops or UseCompressedClassPointers is on. This optimization is much more aggressive and I will try to do it carefully. > We should have a flag which is set if the search for nicely-aligned > memory is successful, and then you can use that flag to determine if r27 is needed. In which file do you think we should add the flag? Can we just check the value of CompressedKlassPointers::base() in reg_mask_init() ? -- Thanks, Pengfei From claes.redestad at oracle.com Tue Nov 19 10:18:36 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 19 Nov 2019 11:18:36 +0100 Subject: RFR: 8234328: VectorSet::clear can cause fragmentation Message-ID: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> Hi, today, VectorSet::clear "reclaims" storage when the size is large. However, since the backing array is allocated in a resource arena, this is dubious since the currently retained memory is only actually freed and made reusable if it's currently the last chunk of memory allocated in the arena. 
This means a clear() is likely to just waste the allocated memory until
we exit the current resource scope.

I propose a strategy where, instead of "freeing", we keep track of the
currently allocated size of the VectorSet separately from the in-use
size. We can then defer the memset to reset/clear the memory to the next
time we need to grow, thus avoiding unnecessary reallocations and
memsets. This limits the memory waste.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8234328
Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/

Testing: tier1-3

Either of reset() or clear() could now be removed, which seems like a
straightforward follow-up RFE. With some convincing I could roll it into
this patch.

Thanks!

/Claes

From christoph.goettschkes at microdoc.com Tue Nov 19 10:38:42 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Tue, 19 Nov 2019 11:38:42 +0100
Subject: RFR: 8231954: [TESTBUG] Test compiler/codegen/TestCharVect2.java only works with server VMs.
In-Reply-To: <27E94E0A-7A99-4B0D-B96D-7DADF6201542@oracle.com>
References: <20191112120936.1D826D285F@aojmv0009> <75637C64-1F76-4B67-A1B0-97E0144F3DD6@oracle.com> <2w8hcs8xd1-1@aserp2030.oracle.com> <7BEE375F-312E-4EE5-B0FA-2223752689A3@oracle.com> <71204849-979a-bd2e-b302-69ac81a44bca@oracle.com> <20191114112213.D20B8D9B6E@aojmv0009> <8f0630cf-7fe6-8021-f195-9595d6f460c9@oracle.com> <27E94E0A-7A99-4B0D-B96D-7DADF6201542@oracle.com>
Message-ID:

Hi Igor,

could you please sponsor this changeset and commit it for me into the
repository?

https://cr.openjdk.java.net/~cgo/8231954/webrev.03/jdk-jdk.changeset

Thanks,
Christoph

Igor Ignatyev wrote on 2019-11-15 20:21:51:

>
> I'd also add that unfortunately it's not always right to add
> '@requires vm.compiler2.enabled' to a test just b/c it uses some
> c2-only flags.
there are cases when such tests aren't just still valid w/o > these flags, but are capable to spot bugs, on the other hand, tests > which have multiple runs w/ different values of c2-only (or c1-only) > flags can be split to reduce wasted time. that's to say decision on > whenever @requires is a right thing should be done on test-per-test basis. > > as 8231954 was only about TestCharVect2 test, I suggest we push > Christoph's webrev.03 and file an RFE to deal w/ other tests, or > retrofit 8228493 to talk not only about non-product flags but also > about c2/c1-only flags and use it as an umbrella for discussion/work-tracking. > > -- Igor > > > On Nov 15, 2019, at 11:11 AM, Vladimir Kozlov > wrote: > > > > Note, compiler/c2 and compiler/c1 was misleading naming for tests > directories which is nothing to do with C1 and C2 JIT compilers. They > are simply 2 groups of tests we split so they can be executed in > parallel in reasonable time. > > > > Vladimir > > > > On 11/14/19 7:35 PM, Yang Zhang (Arm Technology China) wrote: > >> Hi Christoph, Igor, Vladimir, > >> Thanks very much for your fix. After discussion, we have got a > better solution for this issue. Do we need to change the following > files in which MaxVectorSize option is used? > >> [1] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/ > hotspot/jtreg/compiler/vectorization/TestNaNVector.java > >> [2] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/ > hotspot/jtreg/compiler/vectorization/TestPopCountVector.java > >> [3] https://hg.openjdk.java.net/jdk/jdk/file/4dbdb7a8fa75/test/ > hotspot/jtreg/compiler/c2/cr6340864 > >> Ps. For [3], it locates in c2 directory. So I'm not sure whether > they will be excluded in jtreg test with client mode. 
> >> Regards > >> Yang > >> -----Original Message----- > >> From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of christoph.goettschkes at microdoc.com > >> Sent: Thursday, November 14, 2019 7:21 PM > >> To: vladimir.kozlov at oracle.com; igor.ignatyev at oracle.com > >> Cc: hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ > TestCharVect2.java only works with server VMs. > >> Thanks for your feedback, this resolves my concerns and I am happy > with the solution. I integrated the suggestions from Vladimir, here is > the latest webrev: > >> https://cr.openjdk.java.net/~cgo/8231954/webrev.02/ > >> I re-tested and it works as expected. > >> Please give your consent if this is fine for you as well. > >> -- Christoph > >> Vladimir Kozlov wrote on 2019-11-13 20:32:18: > >>> From: Vladimir Kozlov > >>> To: Igor Ignatyev , > >> christoph.goettschkes at microdoc.com > >>> Cc: hotspot compiler > >>> Date: 2019-11-13 20:32 > >>> Subject: Re: RFR: 8231954: [TESTBUG] Test compiler/codegen/ > >>> TestCharVect2.java only works with server VMs. > >>> > >>> On 11/13/19 11:11 AM, Igor Ignatyev wrote: > >>>> @Christoph, > >>>> > >>>> webrev.01 looks good to me. > >>>> I always thought that jvmci feature can be built only when compiler2 > >>> feature is enabled, however src/hotspot/share/jvmci/jvmci_globals.hpp > >>> suggests that jvmci can be used w/o compiler2; I don't think we have > >>> ever build/test, let alone support, this configuration. > >>>> > >>>> @Vladimir, > >>>> did/do we plan to support compiler1 + jvmci w/o compiler2 > >> configuration? > >>> > >>> Yes. It could be configuration when we start looking on replacing C1 > >>> with Graal. I think several people were interested in "Client VM" like > >>> configuration. > >>> Also Server configuration without C2 (with Graal or other jvmci > >>> compiler) which would be out configuration in a future. 
> >>> > >>> But I would prefer to be more explicit in these changes: > >>> > >>> @requires vm.compiler2.enabled | vm.graal.enabled > >>> > >>> Thanks, > >>> Vladimir > >>> > >>>> > >>>> Thanks, > >>>> -- Igor > >>>> > >>>>> On Nov 13, 2019, at 4:42 AM, christoph.goettschkes at microdoc.com > >> wrote: > >>>>> > >>>>> Hi Igor, > >>>>> > >>>>> thanks for your explanation. > >>>>> > >>>>> Igor Ignatyev wrote on 2019-11-12 > >> 20:40:46: > >>>>> > >>>>>> we are trying to get rid of IgnoreUnrecognizedVMOptions in our > >> tests, as > >>>>>> in most cases, it causes wasted compute time (as in this test) and > >> can > >>>>>> also lead to wrong/deprecated/deleted flags sneaking into the > >> testbase > >>>>> > >>>>> Agreed. I also wanted to discuss this, since I think that your > >> solution > >>>>> is better than mine, but at the same time, I saw possible problems > >> with > >>>>> it, see below. > >>>>> > >>>>>> as '@requires vm.flavor == "server"' filters configurations based > >>>>>> vm build type, it will still allow execution on JVM w/ JVMCI and > >>>>>> when > >> JVMCI > >>>>>> compiler is selected, as it will still be Server VM build. so, in > >>>>>> a sense, the test will be w/ JVMCI in the same way as w/ your > >> approach. > >>>>> > >>>>> My concern is not about server VMs with JVMCI, but client VMs with > >> JVMCI > >>>>> enabled. Is this a valid configuration? The MaxVectorSize option is > >>>>> defined in [1] as well as in [2], so for me it looks like > >> MaxVectorSize > >>>>> can be used for any VM variant as long as JVMCI is enabled. 
The > >>>>> configure script also states that both compilers are possible (if > >>>>> you configure with --with-jvm-features='jvmci'): > >>>>> > >>>>> configure: error: Specified JVM feature 'jvmci' requires feature > >>>>> 'compiler2' or 'compiler1' > >>>>> > >>>>> Should maybe the requires tag "vm.jvmci" be used as well, like: > >>>>> > >>>>> @requires vm.flavor == "server" | vm.jvmci > >>>>> > >>>>>> this is the known limitation of jtreg/@requires, and our current > >>>>>> way > >> to > >>>>>> workaround it is to split a test description based on @requires > >> values > >>>>> > >>>>> Yes, if the @requires tag is used, splitting up the test looks like > >>>>> a > >> good > >>>>> idea. I didn't know that it is possible to have multiple test > >> descriptions > >>>>> in one test file. > >>>>> > >>>>> I created a new webrev with the new ideas: > >>>>> > >>>>> https://cr.openjdk.java.net/~cgo/8231954/webrev.01/ > >>>>> > >>>>> I tested with an amd64 client and server VM and it looks good. I am > >>>>> currently unable to build a client VM with JVMCI enabled, hence no > >> test > >>>>> for that yet. I get compile errors and as soon as I resolve those, > >>>>> runtime errors occur. Before I look into that, I would like to know > >> if > >>>>> client VMs with JVMCI enabled are supported or not. 
> >>>>>
> >>>>> Thanks,
> >>>>> Christoph
> >>>>>
> >>>>> [1]
> >>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/opto/c2_globals.hpp
> >>>>>
> >>>>> [2]
> >>>>> https://hg.openjdk.java.net/jdk/jdk/file/dc1899bb84c0/src/hotspot/share/jvmci/jvmci_globals.hpp
> >>>>>
> >>>>
> >>>
>

From augustnagro at gmail.com Tue Nov 19 04:43:23 2019
From: augustnagro at gmail.com (August Nagro)
Date: Mon, 18 Nov 2019 22:43:23 -0600
Subject: Bounds Check Elimination with Fast-Range
In-Reply-To: <877e3x0wji.fsf@oldenburg2.str.redhat.com>
References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com>
Message-ID: <2BAEB1D3-46B7-4BA9-81A3-4F5E7B47B82A@gmail.com>

Apologies; there were actually a few errors in my Universal Hashing javadoc (not the code); they've been corrected:
https://gist.github.com/AugustNagro/4f2d70d261347e515efe0f87de9e8dc2

One thing that might be relevant to the bounds check elimination is that in Java fast-range will output in the range of [-tableSize / 2, tableSize / 2 - 1]. So then we need table[fr(hash) + tableSize/2]. However, tableSize / 2 will be a constant, so that division need only be done once.

Regarding Florian's concerns: yes, it is right that fast-range isn't optimal in every case (and I never tried to claim that). If your tableSize is a power of 2, then just use xor/mask ala HashMap. But the benefit is when mapping to tables of arbitrary size, where those modulo intrinsics may not apply.

And here's a tangent to think about: is growing HashMap's backing array by powers of 2 actually a good thing, when the HashMap gets large? What if you instead wanted to grow by powers of 1.5, or even grow probabilistically, based on the collision rate, allocation pressure, or other data? With fast-range you can do this if you want. And without the performance hit of %!
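
For readers following the thread, here is a minimal Java sketch of the fast-range mapping under discussion. It is only an illustration, not the code from the gist: the class name FastRange, the helper names, and the multiplier 0x9E3779B9 (a commonly used Fibonacci-hashing constant) are assumptions. The sketch reads the hash as an unsigned 32-bit value, so the bucket lands in [0, tableSize) directly and no tableSize/2 offset is needed.

```java
final class FastRange {
    private FastRange() {}

    // Assumed pre-mixing constant: 2^32 divided by the golden ratio, as a 32-bit pattern.
    private static final int FIB = 0x9E3779B9;

    /** Maps a hash, read as an unsigned 32-bit value, to a bucket in [0, tableSize). */
    static int fastRange(int hash, int tableSize) {
        long unsignedHash = hash & 0xFFFFFFFFL;           // 0 .. 2^32 - 1
        return (int) ((unsignedHash * tableSize) >>> 32); // keep the high 32 bits of the product
    }

    /** Mixes a possibly badly conditioned hash with one multiplication, then reduces it. */
    static int bucket(int hash, int tableSize) {
        return fastRange(hash * FIB, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 1000; // arbitrary size, deliberately not a power of two
        int[] counts = new int[tableSize];
        for (int key = 0; key < 100_000; key++) {
            counts[bucket(Integer.hashCode(key), tableSize)]++;
        }
        int used = 0;
        for (int c : counts) {
            if (c > 0) used++;
        }
        System.out.println("buckets used: " + used + " of " + tableSize);
    }
}
```

Without the pre-mixing step, small keys such as the ones generated in main would all land in bucket 0, which is the weakness with badly conditioned hash codes raised earlier in the thread.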
> On Nov 18, 2019, at 2:17 PM, Florian Weimer wrote: > > * John Rose: > >> On Nov 18, 2019, at 5:10 AM, Florian Weimer wrote: >>> >>>> >>>> The idea is that, given (int) hash h and (int) size N, then ((long) h) >>>> * N) >>> 32 is a good mapping. >>> >>> I looked at this in the past weeks in a different context, and I don't >>> think this would work because we have: >> >> That technique appears to require either a well-conditioned hash code >> (which is not the case with Integer.hashCode) or else a value of N that >> performs extra mixing on h. (So a very *non-*power-of-two value of N >> would be better here, i.e., N with larger popcount.) >> >> A little more mixing should help the problem Florian reports with a >> badly conditioned h. Given this: >> >> int fr(int h) { return (int)(((long)h * N) >>> 32); } >> int h = x.hashCode(); >> //int bucket = fr(h); // weak if h is badly conditioned >> >> then, assuming multiplication is cheap: > > (Back-to-back multiplications probably are not.) > >> int bucket = fr(h * M); // M = 0x2357BD or something >> >> or maybe something fast and sloppy like: >> >> int bucket = fr(h + (h << 8)); >> >> or even: >> >> int bucket = fr(h) ^ (h & (N-1)); > > Does this really work? I don't think so. > > I think this kind of perturbation is quite expensive. Arm's BITR should > be helpful here. But even though this operation is commonly needed and > easily implemented in hardware, it's rarely found in CPUs. > > Any scheme with another multiplication is probably not an improvement > over the multiply-shift-multiply-subtract sequence to implement modulo > for certain convenient bucket counts, and for that, we can look up > extensive analysis. 8-) > > Thanks, > Florian From vladimir.x.ivanov at oracle.com Tue Nov 19 12:40:14 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Nov 2019 15:40:14 +0300 Subject: 8230015: [instruction selector] generic vector operands support. 
In-Reply-To: <0868bad5-c573-362f-0ff6-4a4332d95ecb@oracle.com>
References: <0868bad5-c573-362f-0ff6-4a4332d95ecb@oracle.com>
Message-ID: <6231a756-bb90-3536-9bfd-a6fdc0702f5a@oracle.com>

Short update on the progress: after extensive offline discussions between Jatin, Sandhya, and me, we decided to split the original patch into multiple independent pieces and post them for review separately.

On behalf of Jatin, I'll initiate next round of reviews shortly. Generic vector support will be posted first (along with a couple of enhancements).

AD instruction merges are in good shape, but still need more work. They'll be posted for review later.

Best regards,
Vladimir Ivanov

On 12.10.2019 03:41, Vladimir Ivanov wrote:
> Hi Jatin,
>
> FTR I'm looking at:
>   http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/webrev.03/
>
> Good work, Jatin. The reduction in number of instructions looks
> impressive. Also, it's nice to see excessive vec? <-> legVec? moves
> being eliminated.
>
> High-level comments on the patch:
>
>   - Please, separate the patch in 2 changes: generic vector type
> support and AD instruction unification. It would significantly simplify
> review.
>
>   - Try to avoid x86-specific changes (#ifdefs) in shared files.
>
>
> I'm still looking through the implementation, but here are the bugs in
> AD unification spotted so far:
>
> ====================================================================
> src/hotspot/cpu/x86/x86_64.ad:
>
> +operand vecG() %{
> +  constraint(ALLOC_IN_RC(vectorg_reg));
>
> +operand legVecG() %{
> +  constraint(ALLOC_IN_RC(vectorg_reg));
>
> legVecG definition shouldn't be equivalent to vecG: on AVX512-capable
> host legVecG should never include zmm16-zmm31 while vecG can.
>
>
> ====================================================================
> src/hotspot/cpu/x86/x86.ad:
>
> +instruct ReplF_zero_avx(vecG dst, immF0 zero) %{
> +  predicate(UseAVX < 3 &&
>
> UseAVX > 0 check is missing (same with ReplD_zero_avx).
>
>
> ====================================================================
> +instruct vaddB_mem(vecG dst, vecG src, memory mem) %{
> +  predicate(UseAVX && (UseAVX <= 2 || VM_Version::supports_avx512bw()) &&
> +            n->as_Vector()->length() >= 4 && n->as_Vector()->length() <= 64);
>
> It doesn't match the following case anymore:
>   UseAVX == 3 &&
>   VM_Version::supports_avx512bw() == false &&
>   n->as_Vector()->length() < 64 // smaller than 512bit
>
> There are other AD instructions (with VMVersion::supports_avx512bw()
> check) which are affected the same way.
>
> Best regards,
> Vladimir Ivanov
>
> On 22/08/2019 09:49, Bhateja, Jatin wrote:
>> Hi All,
>>
>> Please find below a patch for generic vector operands[1] support
>> during instruction selection.
>>
>> Motivation behind the patch is to reduce the number of vector
>> selection patterns whose operands meagerly differ in vector lengths.
>>
>> This will not only result in lesser code being generated by ADLC which
>> effectively translates to size reduction in libjvm.so but also
>>
>> help in better maintenance of AD files.
>>
>> Using generic operands we were able to collapse multiple vector
>> patterns over mainline
>>
>>     Initial number of vector instruction patterns (vec[XYZSD] + legVec[ZXYSD]) : *510*
>>
>>     Reduced vector instruction patterns (vecG + legVecG) : *222*
>>
>> With this we could see around 1MB size reduction in libjvm.so.
>>
>> In order to have minimal impact over downstream compiler passes, a
>> post-selection pass has been introduced (currently enabled only for
>> X86 target)
>>
>> which replaces these generic operands with their corresponding
>> concreter vector length variants.
>>
>> JBS   : https://bugs.openjdk.java.net/browse/JDK-8230015
>>
>> Patch :
>> http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/webrev.00/
>>
>> Kindly review and share your feedback.
>> >> Best Regards, >> >> Jatin Bhateja >> >> [1] >> http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf >> >> From vladimir.x.ivanov at oracle.com Tue Nov 19 13:00:36 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Nov 2019 16:00:36 +0300 Subject: [14] RFR (S): 8234387: C2: Better support of operands with multiple match rules in AD files Message-ID: <30655159-7431-1a33-cb10-373e32c68002@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8234387 Though ADLC accepts operands with multiple match rules, it doesn't generate correct code to handle them except the first one. It doesn't cause any noticeable problems for existing code, but is a major limitation for generic vector operands (JDK-8234391 [1]). Proposed fix enumerates all match rules. Fixed some missing declarations along the way. Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Testing: tier1-4 (both with and without generic vectors) Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8234391 From vladimir.x.ivanov at oracle.com Tue Nov 19 13:40:47 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Nov 2019 16:40:47 +0300 Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC Message-ID: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8234394 Introduce new "placeholder" register class which denotes that instructions which use operands of such class should dynamically query register masks from the operand instance and not hard-code them in the code. It is required for generic vectors in order to support generic vector operand (vec/legVec) replacement with fixed-sized vector operands (vec[SDXYZ]/legVec[SDXYZ]) after matching is over. 
As an example of usage, generic vector operand is declared as:

  operand vec() %{
    constraint(ALLOC_IN_RC(dynamic));
    match(VecX);
    match(VecY);
    match(VecZ);
    match(VecS);
    match(VecD);
    ...

Then for an instruction which uses vec as DEF

  x86.ad:
    instruct loadV4(vec dst, memory mem) %{
  =ADLC=>
  ad_x86_misc.cpp:
    const RegMask &loadV4Node::out_RegMask() const {
      return (*_opnds[0]->in_RegMask(0));
    }

vs

  x86.ad:
    instruct loadV4(vecS dst, memory mem) %{
  =ADLC=>
  ad_x86_misc.cpp:
    const RegMask &loadV4Node::out_RegMask() const {
      return (VECTORS_REG_VLBWDQ_mask());
    }

An operand with dynamic register class can't be used during code emission and should be replaced with something different before register allocation:

  const RegMask *vecOper::in_RegMask(int index) const {
    return &RegMask::Empty;
  }

Contributed-by: Jatin Bhateja
Reviewed-by: vlivanov, sviswanathan, ?

Testing: tier1-4 (both with and without generic vector operands)

Best regards,
Vladimir Ivanov

From erik.osterlund at oracle.com Tue Nov 19 14:20:13 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Tue, 19 Nov 2019 15:20:13 +0100
Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier
Message-ID:

Hi,

Intel released an erratum (SKX102) which causes "unexpected system behaviour" when branches (including fused conditional branches) cross or end at 64 byte boundaries. They are mitigating this by rolling out microcode updates that disable micro op caching for conditional branches that cross or end at 32 byte boundaries. The mitigation can cause performance regressions, unless affected branches are aligned properly.
The erratum and its mitigation are described in more detail in this document published by Intel: https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf My intention for this patch is to introduce the infrastructure to determine that we may have an affected CPU, and mitigate this by aligning the most important branch in the whole JVM: the ZGC load barrier fast path check. Perhaps similar methodology can be reused later to solve this for other performance critical code, but that is outside the scope of this CR. The sprinkling of nops do not seem to cause regressions in workloads I have tried, given a machine without the JCC mitigations. Bug: https://bugs.openjdk.java.net/browse/JDK-8234160 Webrev: http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/ Thanks, /Erik From vladimir.x.ivanov at oracle.com Tue Nov 19 14:30:37 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Nov 2019 17:30:37 +0300 Subject: [14] RFR (L): 8234391: C2: Generic vector operands Message-ID: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8234391 Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) to generic ones. (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD instruction merges will be handled separately.) 
At a high level it is organized as follows:

  (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;

  (2) at runtime, right after matching is over, a special pass is performed which:

      * replaces vecOper with vec[SDXYZ] depending on mach node type
        - vector mach nodes capture bottom_type() of their ideal prototype;

      * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec)
        - matcher needs them, but they are useless for register allocator (moreover, may cause additional spills);

  (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands.

Some details:

  (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works

  (2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) which is enabled only on x86

  (3) post-selection analysis is implemented as a single pass over the graph and processing individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() (which is an instance of TypeVect)

  (4) most of the analysis is cross-platform and interfaces with platform-specific code through 3 methods:

      static bool is_generic_reg2reg_move(MachNode* m);
      // distinguishes MoveVec2Leg/MoveLeg2Vec nodes

      static bool is_generic_vector(MachOper* opnd);
      // distinguishes vec/legVec operands

      static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg);
      // constructs fixed-sized vector operand based on ideal reg
      //   vec    + Op_Vec[SDXYZ] => vec[SDXYZ]
      //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]

  (5) TEMP operands are handled specially:
      - TEMP uses max_vector_size() to determine what fixed-sized operand to use
        * it is needed to cover reductions which don't produce vectors but scalars
      - TEMP_DEF inherits fixed-sized operand type from DEF;

  (6) there is limited number of special cases for mach nodes in Matcher::get_vector_operand_helper:

      - LShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its
ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not ideal_reg(). - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type. (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see Matcher::get_vector_regmask) (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD - it aligns the code between x86_64.ad and x86_32.ad - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Testing: tier1-tier4, jtreg compiler tests on KNL and SKL, performance testing (SPEC* + Octane + micros / G1 + ParGC). Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf From lutz.schmidt at sap.com Tue Nov 19 14:35:45 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 19 Nov 2019 14:35:45 +0000 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> Message-ID: Hi Andrew, finally(!) I was able to create some measurements which show kind of an effect on a real-world problem. 
I added my timers when running the renaissance benchmark (https://renaissance.dev). I am well aware of the limitations. One could argue this benchmark does not solve a real-world problem. Furthermore, the optimizations do not have a visible effect on the overall runtime (> 1 hour) of the test. But at least, deep down, the inner mechanics of CodeHeap management show some timing difference. I have attached a file with some measurement data to this mail for convenience. The same file was also uploaded to the bug. The measurements are from runs on linuxppc64. Other platforms show similar results. Here is what you can see (and my interpretation of the visible): CodeHeap::mark_segmap_as_used() =============================== The number of segment map entries to be processed per call is reduced by a factor of 2.5 to 5. As a consequence, the time spent in the method decreases as well, but not by the same factor. This is due to the added check for fragmentation and the defragmentation itself which occurs twice and eliminates roughly 3.500 excessive fragments. CodeHeap::add_to_freelist() =========================== Here, the free list length controls the effort spent. Depending on the platform, the length increases by a factor of 2 (with optimizations turned on) or decreases by the same factor. Even with increased free list length, the total time spent in the method decreases. That's obviously an effect of not having to search the free list from the beginning every time. I have created a new webrev, mainly to reflect the changes I applied, based on Thomas' comments: http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/ jdk/submit tests pending... Please let me know if we have reached a state now where this change can be considered reviewed. Thanks a lot, Lutz On 07.11.19, 22:33, "Schmidt, Lutz" wrote: Hi Andrew, thanks for spending more thoughts on this matter - and for updating your opinion. The instrumentation and measurement of other tests will take longer than expected. 
It got delayed by JDK-8233787. The fix for this bug will enable my timing code to run smoother. Side note: this timing code I have mentioned now several times is nothing secret. It's just not suitable to contribute, among other reasons because it's only available for ppc and s390. I can give you more information in case you are interested - no problem if you say "ahhh, never mind...". Thanks, Lutz On 07.11.19, 17:34, "Andrew Dinn" wrote: On 04/11/2019 15:35, Schmidt, Lutz wrote: > thank you for your thoughts. I do not agree to your conclusion, > though. > > There are two bottlenecks in the CodeHeap management code. One is in > CodeHeap::mark_segmap_as_used(), uncovered by > OverflowCodeCacheTest.java. The other is in > CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java. > > Both bottlenecks are tackled by the recommended changeset. . . . > CodeHeap::add_to_freelist() is still O(n*n), with n being the free > list length. But the kick-in point of the non-linearity could be > significantly shifted towards larger n. The time reduction from > approx. 8 seconds to 160 milliseconds supports this statement. Ah sorry, I was not clear from your original post that the proposed change had significantly improved the time spent in free list management in the second test by significantly cutting down the free list size. As you say, a reduction factor of 1/K in list size will give a 1/K*K reduction in execution time. Since this test is a lot nearer to reality than the overflow test I think the current result is perhaps enough to justify its value. > I agree it would be helpful to have a "real-world" example showing > some improvement. Providing such evidence is hard, though. I could > instrument the code and print some values form time to time. It's > certain this additional output will mess up success/failure decisions > in our test environment. Not sure everybody likes that. But I will > give it a try and take the hits. This will be a multi-day effort. 
Well, that would be nice to have but not if it stops other work. The one thing about the Stress test that I fear may be 'unreal' is the potentially over-high probability of generating long(ish) runs of adjacent free segments. That might be giving an artificial win that we will not in fact see. However, given the current numbers I'd be happy to risk that and let this patch go in as is. > On a general note, I am always uncomfortable knowing of a O(n*n) > effort, in particular when it could be removed or at least tamed > considerably. Experience tells (at least to me) that, at some point > in time, n will be large enough to hurt. Well, yes, although salesman do travel /and/ make money ... ;-) > I'll be back. Sure, thanks for following up. This is all very interesting. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Renaissance_CodeHeap_timing.txt URL: From adinn at redhat.com Tue Nov 19 15:33:36 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 19 Nov 2019 15:33:36 +0000 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> Message-ID: <873965f0-61cc-4aac-fac1-a6cba442971f@redhat.com> Hi Lutz, Thanks for persevering and obtaining these measurements. 
The benchmark may not necessarily be a great indicator of real-world problems but it is a better indicator of code cache performance than any other tests we have seen so far. I'm happy to accept this as evidence that the patch will improve performance. So, yes, reviewed! regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill On 19/11/2019 14:35, Schmidt, Lutz wrote: > finally(!) I was able to create some measurements which show kind of > an effect on a real-world problem. > > I added my timers when running the renaissance benchmark > (https://renaissance.dev). I am well aware of the limitations. One > could argue this benchmark does not solve a real-world problem. > Furthermore, the optimizations do not have a visible effect on the > overall runtime (> 1 hour) of the test. But at least, deep down, the > inner mechanics of CodeHeap management show some timing difference. I > have attached a file with some measurement data to this mail for > convenience. The same file was also uploaded to the bug. The > measurements are from runs on linuxppc64. Other platforms show > similar results. > > Here is what you can see (and my interpretation of the visible): > > CodeHeap::mark_segmap_as_used() > =============================== > The number of segment map entries to be processed per call is reduced > by a factor of 2.5 to 5. As a consequence, the time spent in the > method decreases as well, but not by the same factor. This is due to > the added check for fragmentation and the defragmentation itself > which occurs twice and eliminates roughly 3.500 excessive fragments. > > CodeHeap::add_to_freelist() > =========================== > Here, the free list length controls the effort spent. Depending on > the platform, the length increases by a factor of 2 (with > optimizations turned on) or decreases by the same factor. 
Even with > increased free list length, the total time spent in the method > decreases. That's obviously an effect of not having to search the > free list from the beginning every time. > > > I have created a new webrev, mainly to reflect the changes I applied, > based on Thomas' comments: > http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/ > > jdk/submit tests pending... > > Please let me know if we have reached a state now where this change > can be considered reviewed. From vladimir.x.ivanov at oracle.com Tue Nov 19 16:53:09 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Nov 2019 19:53:09 +0300 Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen state Message-ID: http://cr.openjdk.java.net/~vlivanov/8234401/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8234401 ConstantCallSite has a ctor which deliberately leaks partially initialized instance into user code. isFrozen is declared final and if user code is obstinate enough, it can end up with non-frozen state embedded into the generated code. It manifests as a ConstantCallSite instance which is stuck in non-frozen state. I switched isFrozen from final to @Stable, so non-frozen state is never constant folded. Put some store-store barriers along the way to restore final field handling. I deliberately stopped there (just restoring isFrozen final field behavior). Without proper synchronization, there's still a theoretical possibility of transiently observing a call site in non-frozen state right after ctor is over. But at least there's no way anymore to accidentally break an instance in such a way it becomes permanently unusable. PS: converted CallSite.target to final along the way and made some other minor refactorings. 
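
[Editorial illustration, not part of the webrev.] The leak described above can be observed from plain user code with the public java.lang.invoke API. The sketch below is only an assumption-laden demo: the class and method names (LeakedCallSiteDemo, HookedSite, hook, run) are made up for illustration. It shows that createTargetHook receives the ConstantCallSite while it is still non-frozen, so getTarget() throws IllegalStateException inside the hook and works normally once the constructor has completed. Being single-threaded, it does not reproduce the constant-folding problem itself.

```java
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class LeakedCallSiteDemo {
    // Set by the hook if it saw the call site before construction finished.
    static boolean sawNonFrozen = false;

    // The two-argument constructor is protected, so a subclass is needed (hypothetical name).
    static final class HookedSite extends ConstantCallSite {
        HookedSite(MethodType type, MethodHandle createTargetHook) throws Throwable {
            super(type, createTargetHook);
        }
    }

    // The hook runs inside the ConstantCallSite constructor and receives the leaked instance.
    static MethodHandle hook(ConstantCallSite site) {
        try {
            site.getTarget();                 // instance is not frozen yet
        } catch (IllegalStateException e) {
            sawNonFrozen = true;
        }
        return MethodHandles.constant(int.class, 42);   // target of type ()int
    }

    static int run() {
        sawNonFrozen = false;
        try {
            MethodHandle hook = MethodHandles.lookup().findStatic(
                    LeakedCallSiteDemo.class, "hook",
                    MethodType.methodType(MethodHandle.class, ConstantCallSite.class));
            ConstantCallSite site = new HookedSite(MethodType.methodType(int.class), hook);
            // After the constructor returns, the site is frozen and usable.
            return (int) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println("result=" + run() + ", hook saw non-frozen site: " + sawNonFrozen);
    }
}
```

A hook that stores the leaked instance somewhere and lets compiled code read it before the constructor finishes is the "obstinate" case the fix is aimed at.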
Testing: regression test, tier1-2 Best regards, Vladimir Ivanov From thomas.stuefe at gmail.com Tue Nov 19 16:57:52 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 19 Nov 2019 17:57:52 +0100 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> Message-ID: Looks good, Lutz. ..Thomas On Tue, Nov 19, 2019 at 3:36 PM Schmidt, Lutz wrote: > Hi Andrew, > > finally(!) I was able to create some measurements which show kind of an > effect on a real-world problem. > > I added my timers when running the renaissance benchmark ( > https://renaissance.dev). I am well aware of the limitations. One could > argue this benchmark does not solve a real-world problem. Furthermore, the > optimizations do not have a visible effect on the overall runtime (> 1 > hour) of the test. But at least, deep down, the inner mechanics of CodeHeap > management show some timing difference. I have attached a file with some > measurement data to this mail for convenience. The same file was also > uploaded to the bug. The measurements are from runs on linuxppc64. Other > platforms show similar results. > > Here is what you can see (and my interpretation of the visible): > > CodeHeap::mark_segmap_as_used() > =============================== > The number of segment map entries to be processed per call is reduced by a > factor of 2.5 to 5. As a consequence, the time spent in the method > decreases as well, but not by the same factor. 
This is due to the added > check for fragmentation and the defragmentation itself which occurs twice > and eliminates roughly 3.500 excessive fragments. > > CodeHeap::add_to_freelist() > =========================== > Here, the free list length controls the effort spent. Depending on the > platform, the length increases by a factor of 2 (with optimizations turned > on) or decreases by the same factor. Even with increased free list length, > the total time spent in the method decreases. That's obviously an effect of > not having to search the free list from the beginning every time. > > > I have created a new webrev, mainly to reflect the changes I applied, > based on Thomas' comments: > http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/ > > jdk/submit tests pending... > > Please let me know if we have reached a state now where this change can be > considered reviewed. > > Thanks a lot, > Lutz > > > > On 07.11.19, 22:33, "Schmidt, Lutz" wrote: > > Hi Andrew, > > thanks for spending more thoughts on this matter - and for updating > your opinion. > > The instrumentation and measurement of other tests will take longer > than expected. It got delayed by JDK-8233787. The fix for this bug will > enable my timing code to run smoother. > > Side note: this timing code I have mentioned now several times is > nothing secret. It's just not suitable to contribute, among other reasons > because it's only available for ppc and s390. I can give you more > information in case you are interested - no problem if you say "ahhh, never > mind...". > > Thanks, > Lutz > > On 07.11.19, 17:34, "Andrew Dinn" wrote: > > On 04/11/2019 15:35, Schmidt, Lutz wrote: > > thank you for your thoughts. I do not agree to your conclusion, > > though. > > > > There are two bottlenecks in the CodeHeap management code. One > is in > > CodeHeap::mark_segmap_as_used(), uncovered by > > OverflowCodeCacheTest.java. The other is in > > CodeHeap::add_to_freelist(), uncovered by > StressCodeCacheTest.java. 
> > > > Both bottlenecks are tackled by the recommended changeset. > . . . > > CodeHeap::add_to_freelist() is still O(n*n), with n being the > free > > list length. But the kick-in point of the non-linearity could be > > significantly shifted towards larger n. The time reduction from > > approx. 8 seconds to 160 milliseconds supports this statement. > > Ah sorry, I was not clear from your original post that the proposed > change had significantly improved the time spent in free list > management > in the second test by significantly cutting down the free list > size. As > you say, a reduction factor of 1/K in list size will give a 1/K*K > reduction in execution time. Since this test is a lot nearer to > reality > than the overflow test I think the current result is perhaps > enough to > justify its value. > > > I agree it would be helpful to have a "real-world" example > showing > > some improvement. Providing such evidence is hard, though. I > could > > instrument the code and print some values form time to time. > It's > > certain this additional output will mess up success/failure > decisions > > in our test environment. Not sure everybody likes that. But I > will > > give it a try and take the hits. This will be a multi-day effort. > > Well, that would be nice to have but not if it stops other work. > The one > thing about the Stress test that I fear may be 'unreal' is the > potentially over-high probability of generating long(ish) runs of > adjacent free segments. That might be giving an artificial win > that we > will not in fact see. However, given the current numbers I'd be > happy to > risk that and let this patch go in as is. > > > On a general note, I am always uncomfortable knowing of a O(n*n) > > effort, in particular when it could be removed or at least tamed > > considerably. Experience tells (at least to me) that, at some > point > > in time, n will be large enough to hurt. > > Well, yes, although salesman do travel /and/ make money ... 
;-) > > > I'll be back. > > Sure, thanks for following up. This is all very interesting. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. > 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > > > > > > > From vladimir.x.ivanov at oracle.com Tue Nov 19 17:18:32 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Nov 2019 20:18:32 +0300 Subject: [14] RFR (XS): 8234403: C2: Enable CallSite.target updates in constructors Message-ID: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com> http://cr.openjdk.java.net/~vlivanov/8234403/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8234403 Direct CallSite.target updates are not supported in C2, because in general the JVM has to invalidate all nmethod dependency before performing the update (handled by MethodHandleNatives.setCallSiteTargetNormal/Volatile). But it's not the case for initializing stores during CallSite instance construction. Proposed fix assumes all raw updates happen on not-yet-published instances (so no nmethod dependencies) and treats CallSite.target updates inside ctors as an ordinary field. Considering the changes proposed for 8234401 [1], all direct updates in CallSite ctors are safe to be treated as ordinary field updates. Testing: tier1-4 Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html From paul.sandoz at oracle.com Tue Nov 19 17:35:44 2019 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 19 Nov 2019 09:35:44 -0800 Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen state In-Reply-To: References: Message-ID: Ah the perils of partial construction :-) Subtle, so I could be misunderstanding something, did you intend to remove the assignment of isFrozen in the ConstantCallSite constructor? 
ConstantCallSite:

     protected ConstantCallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable {
-        super(targetType, createTargetHook);
-        isFrozen = true;
+        super(targetType, createTargetHook); // "this" instance leaks into createTargetHook
+        UNSAFE.storeStoreFence(); // properly publish isFrozen
     }

Paul.

> On Nov 19, 2019, at 8:53 AM, Vladimir Ivanov wrote:
>
> http://cr.openjdk.java.net/~vlivanov/8234401/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8234401
>
> ConstantCallSite has a ctor which deliberately leaks partially initialized instance into user code. isFrozen is declared final and if user code is obstinate enough, it can end up with non-frozen state embedded into the generated code. It manifests as a ConstantCallSite instance which is stuck in non-frozen state.
>
> I switched isFrozen from final to @Stable, so non-frozen state is never constant folded. Put some store-store barriers along the way to restore final field handling.
>
> I deliberately stopped there (just restoring isFrozen final field behavior). Without proper synchronization, there's still a theoretical possibility of transiently observing a call site in non-frozen state right after ctor is over. But at least there's no way anymore to accidentally break an instance in such a way it becomes permanently unusable.
>
> PS: converted CallSite.target to final along the way and made some other minor refactorings.
>
> Testing: regression test, tier1-2
>
> Best regards,
> Vladimir Ivanov

From vladimir.x.ivanov at oracle.com Tue Nov 19 17:49:35 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 19 Nov 2019 20:49:35 +0300
Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen state
In-Reply-To:
References:
Message-ID:

> Subtle, so I could be misunderstanding something, did you intend to remove the assignment of isFrozen in the ConstantCallSite constructor?

Oh, good catch. It is my fault: the update should be there.
(The barriers are added just to preserve final field semantics for isFrozen.) Published the wrong version (with some leftovers from last-minute failed experiment). Updated in place. Best regards, Vladimir Ivanov >> On Nov 19, 2019, at 8:53 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8234401/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234401 >> >> ConstantCallSite has a ctor which deliberately leaks partially initialized instance into user code. isFrozen is declared final and if user code is obstinate enough, it can end up with non-frozen state embedded into the generated code. It manifests as a ConstantCallSite instance which is stuck in non-frozen state. >> >> I switched isFrozen from final to @Stable, so non-frozen state is never constant folded. Put some store-store barriers along the way to restore final field handling. >> >> I deliberately stopped there (just restoring isFrozen final field behavior). Without proper synchronization, there's still a theoretical possibility of transiently observing a call site in non-frozen state right after ctor is over. But at least there's no way anymore to accidentally break an instance in such a way it becomes permanently unusable. >> >> PS: converted CallSite.target to final along the way and made some other minor refactorings. >> >> Testing: regression test, tier1-2 >> >> Best regards, >> Vladimir Ivanov > From paul.sandoz at oracle.com Tue Nov 19 18:03:09 2019 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 19 Nov 2019 10:03:09 -0800 Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen state In-Reply-To: References: Message-ID: Much better :-) I accumulated some more questions while I was looking further. CallSite: public class CallSite { - // The actual payload of this call site: + // The actual payload of this call site. 
+ // Can be modified using {@link MethodHandleNatives#setCallSiteTargetNormal} or {@link MethodHandleNatives#setCallSiteTargetVolatile}. /*package-private*/ - MethodHandle target; // Note: This field is known to the JVM. Do not change. + final MethodHandle target; // Note: This field is known to the JVM. Is there any benefit to making target final, even though it's not really for mutable call sites? (With the recent discussion of "final means final" it would be nice to not introduce more special case stomping on final fields if we can avoid it). CallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable { - this(targetType); + this(targetType); // need to initialize target to make CallSite.type() work in createTargetHook ConstantCallSite selfCCS = (ConstantCallSite) this; MethodHandle boundTarget = (MethodHandle) createTargetHook.invokeWithArguments(selfCCS); - checkTargetChange(this.target, boundTarget); - this.target = boundTarget; + setTargetNormal(boundTarget); // ConstantCallSite doesn't publish CallSite.target + UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates } I wonder if instead of introducing the store store fence here we could move it into ConstantCallSite? protected ConstantCallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable { - super(targetType, createTargetHook); + super(targetType, createTargetHook); // "this" instance leaks into createTargetHook + UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates On Nov 19, 2019, at 9:49 AM, Vladimir Ivanov wrote: > > >> Subtle, so I could be misunderstanding something, did you intend to remove the assignment of isFrozen in the ConstantCallSite constructor? > > Oh, good catch. It is my fault: the update should be there. (The barriers are added just to preserve final field semantics for isFrozen.) > > Published the wrong version (with some leftovers from last-minute failed experiment). Updated in place. 
> > Best regards, > Vladimir Ivanov > >>> On Nov 19, 2019, at 8:53 AM, Vladimir Ivanov wrote: >>> >>> http://cr.openjdk.java.net/~vlivanov/8234401/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8234401 >>> >>> ConstantCallSite has a ctor which deliberately leaks partially initialized instance into user code. isFrozen is declared final and if user code is obstinate enough, it can end up with non-frozen state embedded into the generated code. It manifests as a ConstantCallSite instance which is stuck in non-frozen state. >>> >>> I switched isFrozen from final to @Stable, so non-frozen state is never constant folded. Put some store-store barriers along the way to restore final field handling. >>> >>> I deliberately stopped there (just restoring isFrozen final field behavior). Without proper synchronization, there's still a theoretical possibility of transiently observing a call site in non-frozen state right after ctor is over. But at least there's no way anymore to accidentally break an instance in such a way it becomes permanently unusable. >>> >>> PS: converted CallSite.target to final along the way and made some other minor refactorings. >>> >>> Testing: regression test, tier1-2 >>> >>> Best regards, >>> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Tue Nov 19 18:12:37 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Nov 2019 21:12:37 +0300 Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen state In-Reply-To: References: Message-ID: Thanks, Paul. > CallSite: > > public class CallSite { > > - // The actual payload of this call site: > + // The actual payload of this call site. > + // Can be modified using {@link > MethodHandleNatives#setCallSiteTargetNormal} or {@link > MethodHandleNatives#setCallSiteTargetVolatile}. > /*package-private*/ > - MethodHandle target; // Note: This field is known to the JVM. Do not > change. > + final MethodHandle target; // Note: This field is known to the JVM. 
> > > Is there any benefit to making target final, even though it's not really > for mutable call sites? (With the recent discussion of "final means > final" it would be nice to not introduce more special case stomping on > final fields if we can avoid it). CallSite.target is already treated specially: all updates go through MethodHandleNatives and JIT-compiler treat it as "final" irrespective of the flags. My main interest in marking it final is to enforce proper initialization on JDK side to make it easier to reason about: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036021.html > CallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable { > - this(targetType); > + this(targetType); // need to initialize target to make CallSite.type() > work in createTargetHook > ConstantCallSite selfCCS = (ConstantCallSite) this; > MethodHandle boundTarget = (MethodHandle) createTargetHook.invokeWithArguments(selfCCS); > - checkTargetChange(this.target, boundTarget); > - this.target = boundTarget; > + setTargetNormal(boundTarget); // ConstantCallSite doesn't publish > CallSite.target > + UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates > } > > > I wonder if instead of introducing the store store fence here we could > move it into ConstantCallSite? Sure, if you prefer to see it on ConstantCallSite side, we can move it there. By putting it in CallSite near the call site update, I wanted to stress there's a CallSite.target update happening on partially published instance. 
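For readers following the ordering argument, here is a minimal sketch of the pattern under discussion: a constructor that leaks "this" into a hook, updates the target, and only then sets a non-final frozen flag, with store-store fences around the flag update. The class and member names are hypothetical, and the public VarHandle.storeStoreFence() is used as a stand-in for the JDK-internal UNSAFE.storeStoreFence() from the webrev. It only illustrates the order of the stores and does not reproduce the JIT constant-folding issue itself.

```java
import java.lang.invoke.VarHandle;
import java.util.function.Function;

// Hypothetical stand-in for the CallSite/ConstantCallSite pair; this is not JDK code.
final class LeakyFrozenSketch {
    private Object target;     // plays the role of CallSite.target
    private boolean isFrozen;  // deliberately not final, mirroring the patch

    LeakyFrozenSketch(Object initialTarget, Function<LeakyFrozenSketch, Object> hook) {
        this.target = initialTarget;      // target must be readable inside the hook
        Object bound = hook.apply(this);  // "this" leaks while isFrozen is still false
        this.target = bound;
        VarHandle.storeStoreFence();      // order the target store before the isFrozen store
        this.isFrozen = true;
        VarHandle.storeStoreFence();      // order the isFrozen store before later publishing stores
    }

    boolean isFrozen() { return isFrozen; }

    Object target() { return target; }

    public static void main(String[] args) {
        LeakyFrozenSketch s = new LeakyFrozenSketch("initial", self -> {
            System.out.println("inside hook, frozen = " + self.isFrozen());
            return "bound";
        });
        System.out.println("after ctor, frozen = " + s.isFrozen() + ", target = " + s.target());
    }
}
```

As noted in the thread, fences like these restore the final-field-like publication order but do not by themselves rule out a racing reader transiently seeing the non-frozen state.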
Best regards, Vladimir Ivanov > > protected ConstantCallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable { > - super(targetType, createTargetHook); > + super(targetType, createTargetHook); // "this" instance leaks into > createTargetHook > > + UNSAFE.storeStoreFence(); // barrier between target and isFrozen > updates > isFrozen = true; > + UNSAFE.storeStoreFence(); // properly publish isFrozen > } >> On Nov 19, 2019, at 9:49 AM, Vladimir Ivanov >> > >> wrote: >> >> >>> Subtle, so I could be misunderstanding something, did you intend to >>> remove the assignment of isFrozen in the ConstantCallSite constructor? >> >> Oh, good catch. It is my fault: the update should be there. (The >> barriers are added just to preserve final field semantics for isFrozen.) >> >> Published the wrong version (with some leftovers from last-minute >> failed experiment). Updated in place. >> >> Best regards, >> Vladimir Ivanov >> >>>> On Nov 19, 2019, at 8:53 AM, Vladimir Ivanov >>>> > >>>> wrote: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/8234401/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8234401 >>>> >>>> ConstantCallSite has a ctor which deliberately leaks partially >>>> initialized instance into user code. isFrozen is declared final and >>>> if user code is obstinate enough, it can end up with non-frozen >>>> state embedded into the generated code. It manifests as a >>>> ConstantCallSite instance which is stuck in non-frozen state. >>>> >>>> I switched isFrozen from final to @Stable, so non-frozen state is >>>> never constant folded. Put some store-store barriers along the way >>>> to restore final field handling. >>>> >>>> I deliberately stopped there (just restoring isFrozen final field >>>> behavior). Without proper synchronization, there's still a >>>> theoretical possibility of transiently observing a call site in >>>> non-frozen state right after ctor is over. 
But at least there's no >>>> way anymore to accidentally break an instance in such a way it >>>> becomes permanently unusable. >>>> >>>> PS: converted CallSite.target to final along the way and made some >>>> other minor refactorings. >>>> >>>> Testing: regression test, tier1-2 >>>> >>>> Best regards, >>>> Vladimir Ivanov > From lutz.schmidt at sap.com Tue Nov 19 18:14:17 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 19 Nov 2019 18:14:17 +0000 Subject: RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks In-Reply-To: <873965f0-61cc-4aac-fac1-a6cba442971f@redhat.com> References: <3DE63BEB-C9F8-429D-8C38-F1531A2E9F0A@sap.com> <37969751-34df-da93-90cc-5746ca4b5a4d@redhat.com> <6a9e3306-ff2b-1ded-c0fc-22af5831e776@redhat.com> <5A263499-D410-4D2A-B257-2F3849BE6D9C@sap.com> <3fce238d-33c8-3ea1-8ba2-b843faca13c2@redhat.com> <604ED3F3-7A81-46A6-AD44-AB838DE018D3@sap.com> <254f2bf0-94f8-8e78-b043-f428f2d2a0f1@redhat.com> <84AF3336-7370-4642-AB04-7015B3AFC4F1@sap.com> <54011A01-CB2E-4630-80C8-D01BF63B5F3A@sap.com> <873965f0-61cc-4aac-fac1-a6cba442971f@redhat.com> Message-ID: <6E415446-6DCC-4B98-8341-C335457C4753@sap.com> Thank you, Andrew and Thomas, for the reviews. Meanwhile, jdk/submit tests returned OK. So I will proceed and push the change, probably on Wednesday. Best Regards, Lutz On 19.11.19, 16:33, "Andrew Dinn" wrote: Hi Lutz, Thanks for persevering and obtaining these measurements. The benchmark may not necessarily be a great indicator of real-world problems but it is a better indicator of code cache performance than any other tests we have seen so far. I'm happy to accept this as evidence that the patch will improve performance. So, yes, reviewed! regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill On 19/11/2019 14:35, Schmidt, Lutz wrote: > finally(!)
I was able to create some measurements which show kind of > an effect on a real-world problem. > > I added my timers when running the renaissance benchmark > (https://renaissance.dev). I am well aware of the limitations. One > could argue this benchmark does not solve a real-world problem. > Furthermore, the optimizations do not have a visible effect on the > overall runtime (> 1 hour) of the test. But at least, deep down, the > inner mechanics of CodeHeap management show some timing difference. I > have attached a file with some measurement data to this mail for > convenience. The same file was also uploaded to the bug. The > measurements are from runs on linuxppc64. Other platforms show > similar results. > > Here is what you can see (and my interpretation of the visible): > > CodeHeap::mark_segmap_as_used() > =============================== > The number of segment map entries to be processed per call is reduced > by a factor of 2.5 to 5. As a consequence, the time spent in the > method decreases as well, but not by the same factor. This is due to > the added check for fragmentation and the defragmentation itself > which occurs twice and eliminates roughly 3.500 excessive fragments. > > CodeHeap::add_to_freelist() > =========================== > Here, the free list length controls the effort spent. Depending on > the platform, the length increases by a factor of 2 (with > optimizations turned on) or decreases by the same factor. Even with > increased free list length, the total time spent in the method > decreases. That's obviously an effect of not having to search the > free list from the beginning every time. > > > I have created a new webrev, mainly to reflect the changes I applied, > based on Thomas' comments: > http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/ > > jdk/submit tests pending... > > Please let me know if we have reached a state now where this change > can be considered reviewed. 
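The remark about not searching the free list from the beginning every time can be illustrated with a small sketch. It is an assumption about the general technique, not the actual CodeHeap code: an address-ordered singly linked list that remembers the last inserted block and starts the next search there when the new address lies beyond it. All names are hypothetical, and the step counter exists only to make the effect visible.

```java
// Hypothetical illustration of a search hint in an address-ordered free list.
final class FreeListSketch {
    static final class Block {
        final int addr;
        Block next;
        Block(int addr) { this.addr = addr; }
    }

    private Block head;
    private Block hint;            // last inserted block, used as search start when possible
    private final boolean useHint;
    long steps;                    // number of list links followed while searching

    FreeListSketch(boolean useHint) { this.useHint = useHint; }

    void add(int addr) {
        Block b = new Block(addr);
        if (head == null || addr < head.addr) {   // new block becomes the list head
            b.next = head;
            head = b;
            hint = b;
            return;
        }
        // Start at the hint only if it is known to lie before the new block.
        Block prev = (useHint && hint != null && hint.addr < addr) ? hint : head;
        while (prev.next != null && prev.next.addr < addr) {
            prev = prev.next;
            steps++;
        }
        b.next = prev.next;
        prev.next = b;
        hint = b;
    }

    boolean isSorted() {
        for (Block b = head; b != null && b.next != null; b = b.next) {
            if (b.addr > b.next.addr) return false;
        }
        return true;
    }

    int size() {
        int n = 0;
        for (Block b = head; b != null; b = b.next) n++;
        return n;
    }

    public static void main(String[] args) {
        FreeListSketch plain = new FreeListSketch(false);
        FreeListSketch hinted = new FreeListSketch(true);
        for (int a = 0; a < 1000; a++) {
            plain.add(a);
            hinted.add(a);
        }
        System.out.println("steps without hint: " + plain.steps + ", with hint: " + hinted.steps);
    }
}
```

The hint pays off when blocks are freed in roughly ascending address order; for other patterns the search falls back to starting at the head.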
From paul.sandoz at oracle.com Tue Nov 19 18:25:11 2019 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 19 Nov 2019 10:25:11 -0800 Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen state In-Reply-To: References: Message-ID: +1 > On Nov 19, 2019, at 10:12 AM, Vladimir Ivanov wrote: > > Thanks, Paul. > >> CallSite: >> public class CallSite { >> - // The actual payload of this call site: >> + // The actual payload of this call site. >> + // Can be modified using {@link MethodHandleNatives#setCallSiteTargetNormal} or {@link MethodHandleNatives#setCallSiteTargetVolatile}. >> /*package-private*/ >> - MethodHandle target; // Note: This field is known to the JVM. Do not change. >> + final MethodHandle target; // Note: This field is known to the JVM. >> Is there any benefit to making target final, even though it's not really for mutable call sites? (With the recent discussion of "final means final" it would be nice to not introduce more special case stomping on final fields if we can avoid it). > > CallSite.target is already treated specially: all updates go through MethodHandleNatives and JIT-compiler treat it as "final" irrespective of the flags. > > My main interest in marking it final is to enforce proper initialization on JDK side to make it easier to reason about: > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036021.html > Ok, I see now in light of that context. 
>> CallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable { >> - this(targetType); >> + this(targetType); // need to initialize target to make CallSite.type() work in createTargetHook >> ConstantCallSite selfCCS = (ConstantCallSite) this; >> MethodHandle boundTarget = (MethodHandle) createTargetHook.invokeWithArguments(selfCCS); >> - checkTargetChange(this.target, boundTarget); >> - this.target = boundTarget; >> + setTargetNormal(boundTarget); // ConstantCallSite doesn't publish CallSite.target >> + UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates >> } >> I wonder if instead of introducing the store store fence here we could move it into ConstantCallSite? > > Sure, if you prefer to see it on ConstantCallSite side, we can move it there. > > By putting it in CallSite near the call site update, I wanted to stress there's a CallSite.target update happening on partially published instance. > Up to you. Paul. From paul.sandoz at oracle.com Tue Nov 19 18:26:36 2019 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 19 Nov 2019 10:26:36 -0800 Subject: [14] RFR (XS): 8234403: C2: Enable CallSite.target updates in constructors In-Reply-To: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com> References: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com> Message-ID: +1 Paul. > On Nov 19, 2019, at 9:18 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8234403/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234403 > > Direct CallSite.target updates are not supported in C2, because in general the JVM has to invalidate all nmethod dependency before performing the update (handled by MethodHandleNatives.setCallSiteTargetNormal/Volatile). > > But it's not the case for initializing stores during CallSite instance construction. > > Proposed fix assumes all raw updates happen on not-yet-published instances (so no nmethod dependencies) and treats CallSite.target updates inside ctors as an ordinary field. 
> > Considering the changes proposed for 8234401 [1], all direct updates in CallSite ctors are safe to be treated as ordinary field updates. > > Testing: tier1-4 > > Best regards, > Vladimir Ivanov > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html From vladimir.kozlov at oracle.com Tue Nov 19 19:03:39 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Nov 2019 11:03:39 -0800 Subject: [14] RFR (XS): 8234403: C2: Enable CallSite.target updates in constructors In-Reply-To: References: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com> Message-ID: <91fd323a-1b51-bca5-fbdd-d9da321b85c4@oracle.com> +1 Vladimir K On 11/19/19 10:26 AM, Paul Sandoz wrote: > +1 > > Paul. > >> On Nov 19, 2019, at 9:18 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8234403/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234403 >> >> Direct CallSite.target updates are not supported in C2, because in general the JVM has to invalidate all nmethod dependency before performing the update (handled by MethodHandleNatives.setCallSiteTargetNormal/Volatile). >> >> But it's not the case for initializing stores during CallSite instance construction. >> >> Proposed fix assumes all raw updates happen on not-yet-published instances (so no nmethod dependencies) and treats CallSite.target updates inside ctors as an ordinary field. >> >> Considering the changes proposed for 8234401 [1], all direct updates in CallSite ctors are safe to be treated as ordinary field updates. 
>> >> Testing: tier1-4 >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html > From thomas.stuefe at gmail.com Tue Nov 19 19:31:00 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 19 Nov 2019 20:31:00 +0100 Subject: RFR: 8234328: VectorSet::clear can cause fragmentation In-Reply-To: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> References: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> Message-ID: Hi Claes, Not that this is wrong, but do we have to live in resource area? I fell over such problems several times already, e.g. with resource-area-backed StringStreams. Maybe it would be better to just forbid resizing of RA-allocated arrays altogether. Then there is also the problem with passing RA-allocated arrays down the stack and accidentally resizing them under a different ResourceMark. I am not sure if this could happen with VectorSet though. Thanks, Thomas On Tue, Nov 19, 2019 at 11:16 AM Claes Redestad wrote: > Hi, > > today, VectorSet::clear "reclaims" storage when the size is large. > > However, since the backing array is allocated in a resource arena, this > is dubious since the currently retained memory is only actually freed > and made reusable if it's currently the last chunk of memory allocated > in the arena. This means a clear() is likely to just waste the allocated > memory until we exit the current resource scope > > Instead, I propose a strategy where instead of "freeing" we keep track > of the currently allocated size of the VectorSet separately from the in- > use size. We can then defer the memset to reset/clear the memory to the > next time we need to grow, thus avoiding unnecessary reallocations and > memsets. This limits the memory waste. 
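The strategy described above can be sketched as follows. This is a hedged illustration in Java rather than the HotSpot C++ code, and every name in it is hypothetical: the set keeps its allocated capacity across clear(), tracks the in-use word count separately, and zeroes stale words only when the in-use range grows over them again.

```java
import java.util.Arrays;

// Hypothetical bit set that defers zeroing to the next growth instead of freeing on clear().
final class LazyClearBitSet {
    private static final int MIN_WORDS = 2;
    private int[] words = new int[MIN_WORDS]; // allocated storage, kept across clear()
    private int used = MIN_WORDS;             // words currently in use (always valid data)

    void set(int bit) {
        int w = bit >>> 5;
        if (w >= used) grow(w + 1);
        words[w] |= 1 << (bit & 31);
    }

    boolean test(int bit) {
        int w = bit >>> 5;
        return w < used && (words[w] & (1 << (bit & 31))) != 0;
    }

    void clear() {
        // Reset only the minimal prefix; words beyond it keep stale bits but are no longer in use.
        Arrays.fill(words, 0, MIN_WORDS, 0);
        used = MIN_WORDS;
    }

    private void grow(int needed) {
        if (needed > words.length) {
            words = Arrays.copyOf(words, Math.max(needed, words.length * 2));
        }
        Arrays.fill(words, used, needed, 0);  // deferred zeroing of the reused range
        used = needed;
    }

    int capacity() { return words.length; }

    public static void main(String[] args) {
        LazyClearBitSet s = new LazyClearBitSet();
        s.set(100);
        int before = s.capacity();
        s.clear();
        s.set(101);
        System.out.println("capacity kept: " + (before == s.capacity())
                + ", stale bit visible: " + s.test(100));
    }
}
```

The trade-off in this sketch is that memory is never returned before the set itself goes away, which matches the observation that "freeing" inside a resource arena rarely makes the memory reusable anyway.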
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234328 > Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/ > > Testing: tier1-3 > > Either of reset() or clear() could now be removed, which seems like a > straightforward follow-up RFE. With some convincing I could roll it into > this patch. > > Thanks! > > /Claes > From dean.long at oracle.com Tue Nov 19 22:16:39 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 19 Nov 2019 14:16:39 -0800 Subject: [14] RFR (S): 8234387: C2: Better support of operands with multiple match rules in AD files In-Reply-To: <30655159-7431-1a33-cb10-373e32c68002@oracle.com> References: <30655159-7431-1a33-cb10-373e32c68002@oracle.com> Message-ID: <0f57998a-7e25-807d-a562-1f74cffbd2e1@oracle.com> Hi Vladimir. The change seems to be doing what you describe, however for my curiosity, could you give an example of an operand with multiple match rules? dl On 11/19/19 5:00 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00 > https://bugs.openjdk.java.net/browse/JDK-8234387 > > Though ADLC accepts operands with multiple match rules, it doesn't > generate correct code to handle them except the first one. > > It doesn't cause any noticeable problems for existing code, but is a > major limitation for generic vector operands (JDK-8234391 [1]). > > Proposed fix enumerates all match rules. > > Fixed some missing declarations along the way. > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Testing: tier1-4 (both with and without generic vectors) > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8234391 From Xiaohong.Gong at arm.com Wed Nov 20 06:36:09 2019 From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China)) Date: Wed, 20 Nov 2019 06:36:09 +0000 Subject: RFR: 8234321: Call cache flush after generating trampoline.
Message-ID: Hi, Please help to review this small patch: Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8234321 This patch fixes issue: https://github.com/oracle/graal/issues/1801. The failure shows up as a "SIGILL (0x4)" when the compiler calls a shared method in the CDS archive. The signal suggests that the trampoline instruction is invalid, while actually it's valid. The trampoline is just an unconditional branch to the real entry of the method, which is generated at runtime when the method is linked. So one possible reason is that the instruction cache and data cache are not synced. This patch fixes it by calling the cache flush after generating the trampoline for shared methods. It invalidates the icache to make sure the instructions fetched are up to date. This is important for platforms such as AArch64, whose CPUs do not have a coherent icache. Thanks, Xiaohong Gong From rwestrel at redhat.com Wed Nov 20 08:59:10 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 20 Nov 2019 09:59:10 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node Message-ID: <871ru2diu9.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8234350/webrev.00/ This is the same issue and fix as 8230061 (dead nodes in the outer strip mined loop should be ignored in verification code when cloning the loop body). The only difference is that the assert is relaxed so it applies to all forms of cloning where the outer strip mined loop is involved. Roland. From adinn at redhat.com Wed Nov 20 09:11:43 2019 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 20 Nov 2019 09:11:43 +0000 Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To: References: Message-ID: On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote: > Please help to review this small patch: > Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8234321 Yes, the patch looks good. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From tobias.hartmann at oracle.com Wed Nov 20 10:13:47 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Nov 2019 11:13:47 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <871ru2diu9.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> Message-ID: <02b9176e-8cae-fbd1-3d7a-0dfcdfa4b0a7@oracle.com> Hi Roland, this looks good to me but due to "-Xcomp -XX:-TieredCompilation", the test should either not be executed with Graal as JIT or compilation should be restricted to test methods. Best regards, Tobias On 20.11.19 09:59, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8234350/webrev.00/ > > This is the same issue and fix as 8230061 (dead nodes in the outer strip > mined loop should be ignored in verification code when cloning the loop > body). The only difference is that the assert is relaxed so it applies > to all forms of cloning where the outer strip mined loop is involved. > > Roland. 
> From rwestrel at redhat.com Wed Nov 20 10:17:44 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 20 Nov 2019 11:17:44 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <02b9176e-8cae-fbd1-3d7a-0dfcdfa4b0a7@oracle.com> References: <871ru2diu9.fsf@redhat.com> <02b9176e-8cae-fbd1-3d7a-0dfcdfa4b0a7@oracle.com> Message-ID: <87y2wac0mv.fsf@redhat.com> Hi Tobias, Thanks for reviewing this. > this looks good to me but due to "-Xcomp -XX:-TieredCompilation", the test should either not be > executed with Graal as JIT or compilation should be restricted to test methods. Wouldn't -XX:CompileOnly=DeadNodesInOuterLoopAtLoopCloning2 (included in the test case) do that? Roland. From tobias.hartmann at oracle.com Wed Nov 20 10:26:22 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Nov 2019 11:26:22 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <87y2wac0mv.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> <02b9176e-8cae-fbd1-3d7a-0dfcdfa4b0a7@oracle.com> <87y2wac0mv.fsf@redhat.com> Message-ID: On 20.11.19 11:17, Roland Westrelin wrote: > Wouldn't -XX:CompileOnly=DeadNodesInOuterLoopAtLoopCloning2 (included in > the test case) do that? Oops, somehow missed that.. Of course it does :) Thanks, Tobias From vladimir.x.ivanov at oracle.com Wed Nov 20 11:09:30 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 20 Nov 2019 14:09:30 +0300 Subject: [14] RFR (S): 8234387: C2: Better support of operands with multiple match rules in AD files In-Reply-To: <0f57998a-7e25-807d-a562-1f74cffbd2e1@oracle.com> References: <30655159-7431-1a33-cb10-373e32c68002@oracle.com> <0f57998a-7e25-807d-a562-1f74cffbd2e1@oracle.com> Message-ID: Hi Dean, Actually there are many operands declared with multiple match rules. 
The one which is hit the hardest (matching failures) is being added by 8234391 [1]:

+operand vec() %{
+  constraint(ALLOC_IN_RC(dynamic));
+  match(VecX);
+  match(VecY);
+  match(VecZ);
+  match(VecS);
+  match(VecD);
+
+  format %{ %}
+  interface(REG_INTER);
+%}

But there are existing operand declarations which are affected:

operand rRegP() %{
  constraint(ALLOC_IN_RC(ptr_reg));
  match(RegP);
  match(rax_RegP);
  match(rbx_RegP);
  match(rdi_RegP);
  match(rsi_RegP);
  match(rbp_RegP);  // See Q&A below about
  match(r15_RegP);  // r15_RegP and rbp_RegP.

  format %{ %}
  interface(REG_INTER);
%}

There was no rbp_RegP operand declared and ADLC didn't notice it since it didn't enumerate all the match rules.

Best regards,
Vladimir Ivanov

[1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html

On 20.11.2019 01:16, dean.long at oracle.com wrote:
> Hi Vladimir. The change seems to be doing what you describe, however
> for my curiosity, could you give an example of an operand with multiple
> match rules?
>
> dl
>
> On 11/19/19 5:00 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00
>> https://bugs.openjdk.java.net/browse/JDK-8234387
>>
>> Though ADLC accepts operands with multiple match rules, it doesn't
>> generate correct code to handle them except the first one.
>>
>> It doesn't cause any noticeable problems for existing code, but is a
>> major limitation for generic vector operands (JDK-8234391 [1]).
>>
>> Proposed fix enumerates all match rules.
>>
>> Fixed some missing declarations along the way.
>>
>> Contributed-by: Jatin Bhateja
>> Reviewed-by: vlivanov, sviswanathan, ?
>> >> Testing: tier1-4 (both with and without generic vectors) >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8234391 > From vladimir.x.ivanov at oracle.com Wed Nov 20 11:10:25 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 20 Nov 2019 14:10:25 +0300 Subject: [14] RFR (XS): 8234403: C2: Enable CallSite.target updates in constructors In-Reply-To: <91fd323a-1b51-bca5-fbdd-d9da321b85c4@oracle.com> References: <23f57c45-b8cf-9d98-c8ed-8f5147afaa03@oracle.com> <91fd323a-1b51-bca5-fbdd-d9da321b85c4@oracle.com> Message-ID: Thanks for reviews, Vladimir K. and Paul. Best regards, Vladimir Ivanov On 19.11.2019 22:03, Vladimir Kozlov wrote: > +1 > > Vladimir K > > On 11/19/19 10:26 AM, Paul Sandoz wrote: >> +1 >> >> Paul. >> >>> On Nov 19, 2019, at 9:18 AM, Vladimir Ivanov >>> wrote: >>> >>> http://cr.openjdk.java.net/~vlivanov/8234403/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8234403 >>> >>> Direct CallSite.target updates are not supported in C2, because in >>> general the JVM has to invalidate all nmethod dependency before >>> performing the update (handled by >>> MethodHandleNatives.setCallSiteTargetNormal/Volatile). >>> >>> But it's not the case for initializing stores during CallSite >>> instance construction. >>> >>> Proposed fix assumes all raw updates happen on not-yet-published >>> instances (so no nmethod dependencies) and treats CallSite.target >>> updates inside ctors as an ordinary field. >>> >>> Considering the changes proposed for 8234401 [1], all direct updates >>> in CallSite ctors are safe to be treated as ordinary field updates. 
>>> >>> Testing: tier1-4 >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] >>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html >>> >> From vladimir.x.ivanov at oracle.com Wed Nov 20 11:11:53 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 20 Nov 2019 14:11:53 +0300 Subject: [14] RFR (S): 8234401: ConstantCallSite may stuck in non-frozen state In-Reply-To: References: Message-ID: <3949ef3f-ba3c-2ff5-282c-86b6662e502f@oracle.com> Thanks for review, Paul. Best regards, Vladimir Ivanov On 19.11.2019 21:25, Paul Sandoz wrote: > +1 > >> On Nov 19, 2019, at 10:12 AM, Vladimir Ivanov wrote: >> >> Thanks, Paul. >> >>> CallSite: >>> public class CallSite { >>> - // The actual payload of this call site: >>> + // The actual payload of this call site. >>> + // Can be modified using {@link MethodHandleNatives#setCallSiteTargetNormal} or {@link MethodHandleNatives#setCallSiteTargetVolatile}. >>> /*package-private*/ >>> - MethodHandle target; // Note: This field is known to the JVM. Do not change. >>> + final MethodHandle target; // Note: This field is known to the JVM. >>> Is there any benefit to making target final, even though it's not really for mutable call sites? (With the recent discussion of "final means final" it would be nice to not introduce more special case stomping on final fields if we can avoid it). >> >> CallSite.target is already treated specially: all updates go through MethodHandleNatives and JIT-compiler treat it as "final" irrespective of the flags. >> >> My main interest in marking it final is to enforce proper initialization on JDK side to make it easier to reason about: >> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036021.html >> > > Ok, I see now in light of that context. 
> > >>> CallSite(MethodType targetType, MethodHandle createTargetHook) throws Throwable { >>> - this(targetType); >>> + this(targetType); // need to initialize target to make CallSite.type() work in createTargetHook >>> ConstantCallSite selfCCS = (ConstantCallSite) this; >>> MethodHandle boundTarget = (MethodHandle) createTargetHook.invokeWithArguments(selfCCS); >>> - checkTargetChange(this.target, boundTarget); >>> - this.target = boundTarget; >>> + setTargetNormal(boundTarget); // ConstantCallSite doesn't publish CallSite.target >>> + UNSAFE.storeStoreFence(); // barrier between target and isFrozen updates >>> } >>> I wonder if instead of introducing the store store fence here we could move it into ConstantCallSite? >> >> Sure, if you prefer to see it on ConstantCallSite side, we can move it there. >> >> By putting it in CallSite near the call site update, I wanted to stress there's a CallSite.target update happening on partially published instance. >> > > Up to you. > > Paul. > From per.liden at oracle.com Wed Nov 20 12:08:12 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 20 Nov 2019 13:08:12 +0100 Subject: RFC 8233915: JVMTI FollowReferences: Java Heap Leak not found because of C2 Scalar Replacement In-Reply-To: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com> References: <729138cc-7a21-cf79-947c-c6a68f34237a@oracle.com> Message-ID: <2087d0a1-8561-7972-a96b-8caf5ed8c2e6@oracle.com> FYI, I've just pushed a patch[1] that makes ZGC work better with smaller heaps (down to 8M). [1] https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2019-November/027844.html /Per On 11/12/19 8:33 PM, Leonid Mesnik wrote: > Hi > > I don't make complete review just sanity verified your test headers. I > see a couple of potential issues with them. > > 1) The using Xmx32M could cause OOME failures if test is executed with > ZGC. I think that at least 256M should be set. Could you please verify > that your tests pass with ZGC enabled. 
> > > 2) I think it makes sense to add requires > > vm.opt.TieredCompilation != true > > to just skip tests if anyone runs them with tiered compilation disabled > explicitly. > > Leonid > > On 11/11/19 7:29 AM, Reingruber, Richard wrote: >> Hi, >> >> I have created https://bugs.openjdk.java.net/browse/JDK-8233915 >> >> In short, a set of live objects L is not found using JVMTI >> FollowReferences() if L is only reachable >> from a scalar replaced object in a frame of a C2 compiled method. If L >> happens to be a growing leak, >> then a dynamically loaded JVMTI agent (note: can_tag_objects is an >> always capability) for heap >> diagnostics won't discover L as live and it won't be able to find root >> references that lead to L. >> >> I'd like to suggest the implementation for the proposed enhancement >> JDK-8227745 as bug-fix. >> >> RFE: https://bugs.openjdk.java.net/browse/JDK-8227745 >> Webrev(*): >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.1/ >> >> Please comment on the suggestion. Do you see other solutions that >> allow an agent to discover the >> chain of references to L? >> >> I'd like to work on the complexity as well. One significant >> simplification could be, if it was >> possible to reallocate scalar replaced objects at safepoints (i.e. >> allow the VM thread to call >> Deoptimization::realloc_objects()). The GC interface does not seem to >> allow this. >> >> Thanks, Richard. >> >> (*) Not yet accepted, because deemed too complex for the performance >> gain. Note that I was able to >> reduce webrev.1 in size compared to webrev.0 From fujie at loongson.cn Wed Nov 20 12:39:09 2019 From: fujie at loongson.cn (Jie Fu) Date: Wed, 20 Nov 2019 20:39:09 +0800 Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout Message-ID: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> Hi all, May I get reviews for this small fix? JBS:
https://bugs.openjdk.java.net/browse/JDK-8234499 Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/ And I need a sponsor. Thanks a lot. Best regards, Jie From vladimir.x.ivanov at oracle.com Wed Nov 20 13:49:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 20 Nov 2019 16:49:01 +0300 Subject: Allocation of array copy can be eliminated in particular cases In-Reply-To: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net> References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net> Message-ID: <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com> Hi Sergey, > Is my speculation correct and does it make sence to implement optimization that turns sequence > > array -> array.clone() - > clone.length > > into > > array -> array.length > > for the cases clone's visibility scope is predictable? Considering there's no way to grow/shrink Java arrays, "cloned_array.length => original_array.length" transformation is correct irrespective of whether cloned variant escapes or not. Moreover, the transformation is already there: http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388 I haven't looked into the benchmarks you mentioned, but it looks like cloned_array.length access is not the reason why cloned array is still there. Regarding your other ideas, redirecting accesses from cloned instance to original is problematic (in general case) since compiler has to prove there were no changes in both versions and indexed accesses make it even harder. And safepoints cause problems as well (for rematerialization). But I agree that it would be nice to cover (at least) simple cases of defensive copying. 
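
For readers following along, the equivalence behind the "cloned_array.length => original_array.length" transformation can be sketched at the Java level. This is only an illustration of the source-level property, not of what C2 does internally; the class and method names are hypothetical, and the getter merely imitates the defensive-copy style of Method.getParameterTypes().

```java
final class CloneLengthSketch {
    private final Class<?>[] parameterTypes;

    CloneLengthSketch(Class<?>... parameterTypes) {
        // Copy on the way in so that callers cannot mutate our state.
        this.parameterTypes = parameterTypes.clone();
    }

    // Defensive copy on the way out (hypothetical getter).
    Class<?>[] getParameterTypes() {
        return parameterTypes.clone();
    }

    // Allocates a copy only to read its length.
    int countViaClone() {
        return getParameterTypes().length;
    }

    // Same result without a copy: Java arrays cannot grow or shrink,
    // so a clone always has the length of its original.
    int countDirect() {
        return parameterTypes.length;
    }

    public static void main(String[] args) {
        CloneLengthSketch s = new CloneLengthSketch(int.class, String.class, long[].class);
        System.out.println(s.countViaClone() == s.countDirect());
    }
}
```

Whether the copy in countViaClone() is actually removed by the JIT is exactly the open question of this thread; the sketch only shows why removing it would be legal.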
Best regards, Vladimir Ivanov From christian.hagedorn at oracle.com Wed Nov 20 14:14:31 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 20 Nov 2019 15:14:31 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 Message-ID: Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8231501 http://cr.openjdk.java.net/~chagedorn/8231501/webrev.00/ The bug could be traced back to the concurrent cleaning of method data with its extra data in MethodData::clean_method_data() and the loading/copying of extra data for the ci method data in ciMethodData::load_extra_data(). I reproduced the bug by using the test [1] which extensively cleans method data by using the whitebox API [2]. Before loading and copying the extra data from the MDO to the ciMDO in ciMethodData::load_extra_data(), the metadata is prepared in a fixed-point iteration by cleaning all SpeculativeTrapData entries of methods whose klasses are unloaded [3]. If it encounters such a dead entry it releases the extra data lock (due to ranking issues) and tries again later [4]. This release of the lock triggers the bug: There can be cases where one thread A is waiting in the whitebox API method to get the extra data lock [2] to clean the extra data for the very same MDO for which another thread B just released the lock at [4]. If that MDO actually contained SpeculativeTrapData entries, then thread A cleaned those but the ciMDO, which thread B is preparing, still contains the uncleaned old MDO extra data (because thread B only made a snapshot of the MDO earlier at [5]). Things then go wrong when thread B can reacquire the lock after thread A. It tries to load the now cleaned extra data and immediately finishes at [6] since there are no SpeculativeTrapData entries anymore. 
It copied a single entry with tag DataLayout::no_tag [7] to the ciMDO which actually contained a SpeculativeTrapData entry. This results in a half way cleared entry (since a SpeculativeTrapData entry has an additional cell for the method) and possible other remaining SpeculativeTrapData entries:

Let's assume a little-endian ordering and that both 0x00007fff... addresses are real pointers to methods. Tag 13 (0x0d) is used for SpeculativeTrapData and dp points to the first extra data entry:

ciMDO extra data before thread B releases the lock at [4] (same extra data for MDO and ciMDO):
0x800000040011000d 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000
dp: tag = 13 -> next entry = dp+16; dp+8: method 0x00007fffd4993c63
dp+16: tag = 13 -> next entry = dp+32; dp+24: method 0x00007fffd49b1a68
dp+32: tag = 0 -> end of extra data

MDO extra data after thread B reacquires the lock and thread A cleaned the MDO (ciMDO extra data is unchanged):
0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
dp: tag = 0 -> end of extra data

Returning at [6] when the extra data loading from MDO to ciMDO is finished:

MDO extra data:
0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
dp: tag = 0 -> end of extra data

ciMDO extra data, only copied the first no_tag entry from MDO at [7] (8 bytes):
0x0000000000000000 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000
dp: tag = 0 -> next entry = dp+8
dp+8: tag = 0x63 = 99 -> there is no tag 99 -> fatal...

The next time the ciMDO extra data is iterated, for example by using MethodData::next_extra(), it reads tag 99 after processing the first no_tag entry and jumping to the value at offset 8 which causes a crash since there is no tag 99 available.
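
The walk over the two dumps above can be replayed with a small stand-alone model. This is a simplified sketch, not HotSpot code: it assumes 8-byte cells, reads the low byte of a cell as the tag (little-endian), treats a no_tag entry as one cell and a SpeculativeTrapData entry (tag 13) as two cells, and stops at the first tag it does not know. The class name and the walk() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtraDataWalkSketch {
    static final int NO_TAG = 0;                      // one 8-byte cell in this model
    static final int SPECULATIVE_TRAP_DATA_TAG = 13;  // header cell plus one method cell

    // Returns the tags seen while walking the cells; the walk stops after
    // recording the first tag that the model does not know.
    static List<Integer> walk(long[] cells) {
        List<Integer> tags = new ArrayList<>();
        int i = 0;
        while (i < cells.length) {
            int tag = (int) (cells[i] & 0xFF); // low byte of the cell
            tags.add(tag);
            if (tag == NO_TAG) {
                i += 1;
            } else if (tag == SPECULATIVE_TRAP_DATA_TAG) {
                i += 2; // skip the method pointer cell as well
            } else {
                break;  // corresponds to the "unexpected tag" fatal error
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        long[] consistent = {
            0x800000040011000dL, 0x00007fffd4993c63L,
            0x800000040011000dL, 0x00007fffd49b1a68L,
            0x0000000000000000L
        };
        // Only the first cell was overwritten with no_tag.
        long[] halfCleared = consistent.clone();
        halfCleared[0] = 0L;

        System.out.println(walk(consistent));  // [13, 13, 0]
        System.out.println(walk(halfCleared)); // [0, 99]
    }
}
```

In the half-cleared case the walker lands on the orphaned method pointer 0x00007fffd4993c63, whose low byte 0x63 is read as tag 99.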
The fix is to completely zero out the current and all following SpeculativeTrapData entries if we encounter a no_tag in the MDO but a speculative_trap_data_tag tag in the ciMDO. There are also other cases where the method data is cleaned. Thus the bug is not only related to the whitebox API usage but occurs very rarely. Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java [2] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/prims/whitebox.cpp#l1137 [3] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l137 [4] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l115 [5] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l219 [6] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l191 [7] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l176 From nils.eliasson at oracle.com Wed Nov 20 14:25:54 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Nov 2019 15:25:54 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles Message-ID: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> Hi, I found a few bugs after the enabling of the clone intrinsic in ZGC. 1) The arraycopy clone_basic has the parameters adjusted to work as a memcopy. For an oop the src is pointing inside the oop to where we want to start copying. But when we want to do a runtime call to clone - the parameters are supposed to be the actual src oop and dst oop, and the size should be the instance size. For now I have made a workaround. What should be done later is using the offset in the arraycopy node to encode where the payload is, so that the base pointers are always correct. But that would require changes to the BarrierSet classes of all GCs. 
So I will leave that for the next release.

2) The size parameter of the TypeFunc for the runtime call has the wrong type. It was originally Long but was missing the upper Half; it was then changed to INT (JDK-8233834), but that is wrong and causes the compiles to be skipped. We didn't notice that since they failed silently. That is also why we didn't notice problem #1 either.

https://bugs.openjdk.java.net/browse/JDK-8234520
http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/

Please review!

Nils

From nils.eliasson at oracle.com Wed Nov 20 14:36:16 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 20 Nov 2019 15:36:16 +0100
Subject: Allocation of array copy can be eliminated in particular cases
In-Reply-To: <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net> <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
Message-ID:

On 2019-11-20 14:49, Vladimir Ivanov wrote:
> Hi Sergey,
>
>> Is my speculation correct and does it make sense to implement
>> optimization that turns sequence
>>
>> array -> array.clone() -> clone.length
>>
>> into
>>
>> array -> array.length
>>
>> for the cases clone's visibility scope is predictable?
>
> Considering there's no way to grow/shrink Java arrays,
> "cloned_array.length => original_array.length" transformation is
> correct irrespective of whether cloned variant escapes or not.
>
> Moreover, the transformation is already there:
>
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388
>
>
> I haven't looked into the benchmarks you mentioned, but it looks like
> cloned_array.length access is not the reason why cloned array is still
> there.

We don't eliminate array allocations that don't have a known length because they might cause a NegativeArraySizeException. In this case we should be able to prove that the length is positive though.

Anyway - I have an almost finished patch that replaces unused array allocations with a proper guard.
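
To illustrate why a guard is needed at all, here is a source-level sketch of the observable behaviour that has to survive when an unused array allocation is removed. It is an assumption-laden illustration, not the patch mentioned above: the class and method names are hypothetical, and the real change operates on C2's IR rather than on Java source.

```java
import java.util.function.IntUnaryOperator;

final class NegativeLengthGuardSketch {
    // Allocates an array only to read its length; throws for n < 0.
    static int lengthViaAllocation(int n) {
        return new int[n].length;
    }

    // Hypothetical allocation-free equivalent: the allocation is gone,
    // but the exception for a negative length is preserved by a guard.
    static int lengthViaGuard(int n) {
        if (n < 0) {
            throw new NegativeArraySizeException(String.valueOf(n));
        }
        return n;
    }

    // Helper: does f throw NegativeArraySizeException for the given n?
    static boolean throwsNegativeArraySize(IntUnaryOperator f, int n) {
        try {
            f.applyAsInt(n);
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaAllocation(10) == lengthViaGuard(10));
        System.out.println(throwsNegativeArraySize(NegativeLengthGuardSketch::lengthViaAllocation, -1));
        System.out.println(throwsNegativeArraySize(NegativeLengthGuardSketch::lengthViaGuard, -1));
    }
}
```

When the length is known to be non-negative, as for the length of an existing array, the guard itself is dead code and can be folded away.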
// Nils > > Regarding your other ideas, redirecting accesses from cloned instance > to original is problematic (in general case) since compiler has to > prove there were no changes in both versions and indexed accesses make > it even harder. And safepoints cause problems as well (for > rematerialization). > > But I agree that it would be nice to cover (at least) simple cases of > defensive copying. > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Nov 20 15:34:05 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 20 Nov 2019 18:34:05 +0300 Subject: Allocation of array copy can be eliminated in particular cases In-Reply-To: References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net> <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com> Message-ID: >> I haven't looked into the benchmarks you mentioned, but it looks like >> cloned_array.length access is not the reason why cloned array is still >> there. > > We don't eliminate array allocations that doesn't have a known length > because they might cause a NegativeArraySize Exception. In this case we > should be able to prove that the length is positive though. Good point, Nils! Yes, in this particular case the length is always non-negative. So, the guard you are working on will go away right away. Best regards, Vladimir Ivanov From sergei.tsypanov at yandex.ru Wed Nov 20 21:30:48 2019 From: sergei.tsypanov at yandex.ru (=?utf-8?B?0KHQtdGA0LPQtdC5INCm0YvQv9Cw0L3QvtCy?=) Date: Wed, 20 Nov 2019 23:30:48 +0200 Subject: Allocation of array copy can be eliminated in particular cases In-Reply-To: <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com> References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net> <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com> Message-ID: <30768651574285448@vla1-a6eaa355d163.qloud-c.yandex.net> Hello Vladimir, thank you for your response! 
> Moreover, the transformation is already there:
>
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388

The comment in line 2389 seems confusing to me:

    // This works even if the length is not constant (clone or newArray).

When we clone an array, isn't the length constant and equal to the length of the original array? I guess it cannot be different.

> I haven't looked into the benchmarks you mentioned, but it looks like
> cloned_array.length access is not the reason why cloned array is still
> there.

At first I thought that the cloned array is retained at run time because it's returned from the method in the original benchmark:

    @Benchmark
    public int getParameterTypes() {
        return method.getParameterTypes().length;
    }

To check whether this speculation is correct I've tried to change my benchmark in order to strip any additional logic from it [1]:

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class ArrayAllocationEliminationBenchmark {
        private int length = 10;

        //...

        @Benchmark
        public int baseline() {
            return new int[length].length;
        }

        @Benchmark
        public int baselineClone() {
            return new int[length].clone().length;
        }

        //...
    }

Here I don't see any reason for the runtime to hold the cloned array:

1) int is returned from the method
2) cloned array doesn't escape the place where it's created

So the cloned array should be dropped, but according to the benchmarking results it's not:

JDK 11                                                         Mode  Cnt      Score      Error  Units
baseline                                                       avgt   25     10,860 ±    0,604  ns/op
baseline:·gc.alloc.rate                                        avgt   25   4703,477 ±  215,986  MB/sec
baseline:·gc.alloc.rate.norm                                   avgt   25     56,000 ±    0,001  B/op
baseline:·gc.churn.CodeHeap_'non-profiled_nmethods'            avgt   25      0,002 ±    0,001  MB/sec
baseline:·gc.churn.CodeHeap_'non-profiled_nmethods'.norm       avgt   25     ≈ 10⁻?             B/op
baseline:·gc.churn.G1_Old_Gen                                  avgt   25   4711,586 ±  218,439  MB/sec
baseline:·gc.churn.G1_Old_Gen.norm                             avgt   25     56,094 ±    0,084  B/op
baseline:·gc.count                                             avgt   25   5400,000             counts
baseline:·gc.time                                              avgt   25   3926,000             ms
baselineClone                                                  avgt   25     21,906 ±    1,234  ns/op
baselineClone:·gc.alloc.rate                                   avgt   25   4667,440 ±  248,731  MB/sec
baselineClone:·gc.alloc.rate.norm                              avgt   25    112,000 ±    0,001  B/op
baselineClone:·gc.churn.CodeHeap_'non-profiled_nmethods'       avgt   25      0,008 ±    0,002  MB/sec
baselineClone:·gc.churn.CodeHeap_'non-profiled_nmethods'.norm  avgt   25     ≈ 10⁻?             B/op
baselineClone:·gc.churn.G1_Old_Gen                             avgt   25   4675,250 ±  247,341  MB/sec
baselineClone:·gc.churn.G1_Old_Gen.norm                        avgt   25    112,192 ±    0,162  B/op
baselineClone:·gc.count                                        avgt   25   5489,000             counts
baselineClone:·gc.time                                         avgt   25   4042,000             ms

JDK 13                                                         Mode  Cnt      Score      Error  Units
baseline                                                       avgt   25     10,014 ±    0,227  ns/op
baseline:·gc.alloc.rate                                        avgt   25   5082,913 ±  110,593  MB/sec
baseline:·gc.alloc.rate.norm                                   avgt   25     56,000 ±    0,001  B/op
baseline:·gc.churn.G1_Eden_Space                               avgt   25   5092,013 ±  110,500  MB/sec
baseline:·gc.churn.G1_Eden_Space.norm                          avgt   25     56,100 ±    0,076  B/op
baseline:·gc.churn.G1_Survivor_Space                           avgt   25      0,005 ±    0,001  MB/sec
baseline:·gc.churn.G1_Survivor_Space.norm                      avgt   25     ≈ 10⁻?             B/op
baseline:·gc.count                                             avgt   25   5753,000             counts
baseline:·gc.time                                              avgt   25   3733,000             ms
baselineClone                                                  avgt   25     26,619 ±    1,405  ns/op
baselineClone:·gc.alloc.rate                                   avgt   25   3837,924 ±  185,292  MB/sec
baselineClone:·gc.alloc.rate.norm                              avgt   25    112,000 ±    0,001  B/op
baselineClone:·gc.churn.G1_Eden_Space                          avgt   25   3844,010 ±  185,460  MB/sec
baselineClone:·gc.churn.G1_Eden_Space.norm                     avgt   25    112,178 ±    0,168  B/op
baselineClone:·gc.churn.G1_Survivor_Space                      avgt   25      0,008 ±    0,001  MB/sec
baselineClone:·gc.churn.G1_Survivor_Space.norm                 avgt   25     ≈ 10⁻?             B/op
baselineClone:·gc.count                                        avgt   25   4668,000             counts
baselineClone:·gc.time                                         avgt   25   2923,000             ms

From this output I conclude that either I am missing something in my understanding of how the compiler and runtime work, or this is a bug.
I will be happy to understand which of the two is correct :)

There is also good news though: the latest Graal can drop the allocation for the baseline method [2].

With best regards,
Sergey Tsypanov

[1] https://github.com/stsypanov/logeek-night-benchmark/blob/master/benchmark-runners/src/main/java/com/luxoft/logeek/benchmark/array/ArrayAllocationEliminationBenchmark.java
[2] https://github.com/oracle/graal/issues/1847

From sergei.tsypanov at yandex.ru Wed Nov 20 21:32:52 2019
From: sergei.tsypanov at yandex.ru (=?utf-8?B?0KHQtdGA0LPQtdC5INCm0YvQv9Cw0L3QvtCy?=)
Date: Wed, 20 Nov 2019 23:32:52 +0200
Subject: Allocation of array copy can be eliminated in particular cases
In-Reply-To:
References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net> <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com>
Message-ID: <67884931574285572@sas2-c8fd3ed78d67.qloud-c.yandex.net>

Hello Nils,

> Anyway - I have an almost finished patch that replace unused array
> allocations with a proper guard.

Is there any particular JDK-xxxxxxx issue where this is tracked?

Regards,
Sergey Tsypanov

From hohensee at amazon.com Wed Nov 20 22:22:30 2019
From: hohensee at amazon.com (Hohensee, Paul)
Date: Wed, 20 Nov 2019 22:22:30 +0000
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier
In-Reply-To:
References:
Message-ID:

It's not just the zgc load barrier that's affected, it's every jcc and fused jcc (e.g., cmp/jcc and sub/jcc, because the pairs are issued on the same clock).

There's a code pattern attribute called ins_alignment() in the ad file, viz.

    ins_attrib ins_alignment(1);    // Required alignment attribute (must
                                    // be a power of 2) specifies the
                                    // alignment that some part of the
                                    // instruction (not necessarily the
                                    // start) requires.  If > 1, a
                                    // compute_padding() function must be
                                    // provided for the instruction

Would it be possible to use/enhance ins_alignment() rather than do something zgc-specific?
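
As a rough illustration of the arithmetic any such padding computation has to perform, here is a sketch of the boundary test. It is not the ad-file compute_padding() function and not the code in the webrev: the class and method names are hypothetical, offsets are plain integers relative to an aligned code start, and the 32-byte figure is taken from the description of the microcode mitigation in the message quoted below. A (possibly fused) branch is assumed to be shorter than 32 bytes.

```java
final class JccBoundarySketch {
    static final int BOUNDARY = 32; // mitigation granularity, per the quoted description

    // True if an instruction occupying [start, start + length) crosses a
    // 32-byte boundary or ends exactly at one. Assumes 0 < length < 32.
    static boolean crossesOrEndsAtBoundary(int start, int length) {
        return start / BOUNDARY != (start + length) / BOUNDARY;
    }

    // Number of nop bytes to emit before the instruction so that it starts
    // at the next boundary; 0 if it is not affected where it stands.
    static int padding(int start, int length) {
        return crossesOrEndsAtBoundary(start, length) ? BOUNDARY - (start % BOUNDARY) : 0;
    }

    public static void main(String[] args) {
        System.out.println(padding(28, 2)); // 0: bytes 28..29, untouched by the boundary
        System.out.println(padding(30, 2)); // 2: bytes 30..31 end at the boundary
        System.out.println(padding(31, 6)); // 1: bytes 31..36 cross the boundary
    }
}
```

For a fused pair such as cmp/jcc the start and length would have to cover both instructions, which is the point made above about fused branches.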
Thanks,
Paul

On 11/19/19, 6:23 AM, "hotspot-compiler-dev on behalf of erik.osterlund at oracle.com" wrote:

    Hi,

    Intel released an erratum (SKX102) which causes "unexpected system behaviour" when branches (including fused conditional branches) cross or end at 64 byte boundaries. They are mitigating this by rolling out microcode updates that disable micro op caching for conditional branches that cross or end at 32 byte boundaries. The mitigation can cause performance regressions, unless affected branches are aligned properly. The erratum and its mitigation are described in more detail in this document published by Intel:
    https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

    My intention for this patch is to introduce the infrastructure to determine that we may have an affected CPU, and mitigate this by aligning the most important branch in the whole JVM: the ZGC load barrier fast path check. Perhaps similar methodology can be reused later to solve this for other performance critical code, but that is outside the scope of this CR.

    The sprinkling of nops does not seem to cause regressions in workloads I have tried, given a machine without the JCC mitigations.

    Bug: https://bugs.openjdk.java.net/browse/JDK-8234160
    Webrev: http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/

    Thanks,
    /Erik

From igor.ignatyev at oracle.com Thu Nov 21 00:27:42 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Nov 2019 16:27:42 -0800
Subject: RFR(S) : 8147017 : Platform.isGraal should be removed
In-Reply-To: <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com>
References: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com> <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com>
Message-ID:

@Misha,
thanks for your review.

@list,
can I get a 2nd review from a Reviewer?
-- Igor > On Nov 18, 2019, at 2:06 PM, mikhailo.seledtsov at oracle.com wrote: > > Looks good to me, > > Misha > > On 11/17/19 11:00 AM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html >>> 16 lines changed: 2 ins; 8 del; 6 mod; >> Hi all, >> >> jdk.test.lib.Platform.isGraal method assumes that JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and caused other to incorrectly assume that '-graal' flag exist and must be used to select Graal compiler. the patch removes this method and updates its only meaningful usage in TestGCLogMessages test. TestGCLogMessages test should use LogMessageWithLevelC2OrJVMCIOnly only when c2 or graal is available, so it's been updated to use corresponding methods of sun.hotspot.code.Compiler class, which requires WhiteBoxAPI being enabled. >> >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8147017 >> webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html >> testing: tier1 + TestGCLogMessages w/ different JIT configurations >> >> Thanks, >> -- Igor From igor.ignatyev at oracle.com Thu Nov 21 01:05:41 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 20 Nov 2019 17:05:41 -0800 Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout In-Reply-To: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> Message-ID: <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com> Hi Jie, wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileControl=compileonly? this should solve timeout w/ Graal w/o removing the only test for CompilationMode=high-only? Thanks, -- Igor > On Nov 20, 2019, at 4:39 AM, Jie Fu wrote: > > Hi all, > > May I get reviews for this small fix? 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8234499 > Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/ > > And I need a sponsor. > > Thanks a lot. > Best regards, > Jie > From vladimir.kozlov at oracle.com Thu Nov 21 01:54:34 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Nov 2019 17:54:34 -0800 Subject: RFR(S) : 8147017 : Platform.isGraal should be removed In-Reply-To: References: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com> <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com> Message-ID: <3d0ddee5-ab54-3742-c053-d9cd74a93cb8@oracle.com> Reviewed. Good. Thanks, Vladimir K On 11/20/19 4:27 PM, Igor Ignatyev wrote: > @Misha, > > thanks for your review. > > @list, > can I get a 2nd review from a Reviewer? > > -- Igor > >> On Nov 18, 2019, at 2:06 PM, mikhailo.seledtsov at oracle.com wrote: >> >> Looks good to me, >> >> Misha >> >> On 11/17/19 11:00 AM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html >>>> 16 lines changed: 2 ins; 8 del; 6 mod; >>> Hi all, >>> >>> jdk.test.lib.Platform.isGraal method assumes that JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and caused other to incorrectly assume that '-graal' flag exist and must be used to select Graal compiler. the patch removes this method and updates its only meaningful usage in TestGCLogMessages test. TestGCLogMessages test should use LogMessageWithLevelC2OrJVMCIOnly only when c2 or graal is available, so it's been updated to use corresponding methods of sun.hotspot.code.Compiler class, which requires WhiteBoxAPI being enabled. 
>>>
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8147017
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html
>>> testing: tier1 + TestGCLogMessages w/ different JIT configurations
>>>
>>> Thanks,
>>> -- Igor
>

From fujie at loongson.cn Thu Nov 21 01:58:44 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 21 Nov 2019 09:58:44 +0800
Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout
In-Reply-To: <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com>
Message-ID: <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn>

Hi Igor,

Good idea! Thank you very much.

Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.01/

Hope you can sponsor it if you're OK with it.

Thanks a lot.
Best regards,
Jie

On 2019/11/21 9:05 AM, Igor Ignatyev wrote:
> Hi Jie,
>
> wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileControl=compileonly? this should solve timeout w/ Graal w/o removing the only test for CompilationMode=high-only?
>
> Thanks,
> -- Igor
>
>> On Nov 20, 2019, at 4:39 AM, Jie Fu wrote:
>>
>> Hi all,
>>
>> May I get reviews for this small fix?
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234499
>> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/
>>
>> And I need a sponsor.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>

From Xiaohong.Gong at arm.com Thu Nov 21 02:16:49 2019
From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China))
Date: Thu, 21 Nov 2019 02:16:49 +0000
Subject: RFR: 8234321: Call cache flush after generating trampoline.
In-Reply-To:
References:
Message-ID:

Hi,

Thanks for your review, @Andrew Dinn!

Could someone else help to review this patch as well? It would be great if someone who is familiar with other platforms (ppc, etc.) could take a look at it.
Thanks, Xiaohong Gong -----Original Message----- From: Andrew Dinn Sent: Wednesday, November 20, 2019 5:12 PM To: Xiaohong Gong (Arm Technology China) ; hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; calvin.cheung at oracle.com Cc: nd Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote: > Please help to review this small patch: > Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8234321 Yes, the patch looks good. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From igor.ignatyev at oracle.com Thu Nov 21 02:17:51 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 20 Nov 2019 18:17:51 -0800 Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout In-Reply-To: <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn> References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com> <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn> Message-ID: Hi Jie, we are trying to replace all usages of -XX:CompileOnly w/ XX:CompileCommand=compileonly, could you please update your patch accordingly? -- Igor > On Nov 20, 2019, at 5:58 PM, Jie Fu wrote: > > Hi Igor, > > Good idea! Thank you very much. > > Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.01/ > > Hope you can sponsor it if you're OK with it. > > Thanks a lot. > Best regards, > Jie > > On 2019/11/21 ??9:05, Igor Ignatyev wrote: >> Hi Jie, >> >> wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileControl=compileonly? this should solve timeout w/ Graal w/o removing the only test for CompilationMode=high-only? 
>> Thanks, >> -- Igor >> >>> On Nov 20, 2019, at 4:39 AM, Jie Fu wrote: >>> >>> Hi all, >>> >>> May I get reviews for this small fix? >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234499 >>> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/ >>> >>> And I need a sponsor. >>> >>> Thanks a lot. >>> Best regards, >>> Jie >>> > From igor.ignatyev at oracle.com Thu Nov 21 02:27:30 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 20 Nov 2019 18:27:30 -0800 Subject: RFR(S) : 8147017 : Platform.isGraal should be removed In-Reply-To: <3d0ddee5-ab54-3742-c053-d9cd74a93cb8@oracle.com> References: <981118AF-1DAD-4231-9FA6-7A89A46E5EDB@oracle.com> <5e1d17af-798f-123f-ef5e-3957b98a8340@oracle.com> <3d0ddee5-ab54-3742-c053-d9cd74a93cb8@oracle.com> Message-ID: Hi Vladimir, thanks for your review, pushed. -- Igor > On Nov 20, 2019, at 5:54 PM, Vladimir Kozlov wrote: > > Reviewed. Good. > > Thanks, > Vladimir K > > On 11/20/19 4:27 PM, Igor Ignatyev wrote: >> @Misha, >> thanks for your review. >> @list, >> can I get a 2nd review from a Reviewer? >> -- Igor >>> On Nov 18, 2019, at 2:06 PM, mikhailo.seledtsov at oracle.com wrote: >>> >>> Looks good to me, >>> >>> Misha >>> >>> On 11/17/19 11:00 AM, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html >>>>> 16 lines changed: 2 ins; 8 del; 6 mod; >>>> Hi all, >>>> >>>> jdk.test.lib.Platform.isGraal method assumes that JVM w/ Graal as JIT has 'Graal VM' in its name, which is wrong, and caused other to incorrectly assume that '-graal' flag exist and must be used to select Graal compiler. the patch removes this method and updates its only meaningful usage in TestGCLogMessages test. TestGCLogMessages test should use LogMessageWithLevelC2OrJVMCIOnly only when c2 or graal is available, so it's been updated to use corresponding methods of sun.hotspot.code.Compiler class, which requires WhiteBoxAPI being enabled. 
>>>> >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8147017 >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8147017/webrev.00/index.html >>>> testing: tier1 + TestGCLogMessages w/ different JIT configurations >>>> >>>> Thanks, >>>> -- Igor From fujie at loongson.cn Thu Nov 21 02:48:53 2019 From: fujie at loongson.cn (Jie Fu) Date: Thu, 21 Nov 2019 10:48:53 +0800 Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout In-Reply-To: References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com> <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn> Message-ID: Hi Igor, OK. Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.02/ Thanks a lot. Best regards, Jie On 2019/11/21 ??10:17, Igor Ignatyev wrote: > Hi Jie, > > we are trying to replace all usages of -XX:CompileOnly w/ XX:CompileCommand=compileonly, could you please update your patch accordingly? > > -- Igor > >> On Nov 20, 2019, at 5:58 PM, Jie Fu wrote: >> >> Hi Igor, >> >> Good idea! Thank you very much. >> >> Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.01/ >> >> Hope you can sponsor it if you're OK with it. >> >> Thanks a lot. >> Best regards, >> Jie >> >> On 2019/11/21 ??9:05, Igor Ignatyev wrote: >>> Hi Jie, >>> >>> wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileControl=compileonly? this should solve timeout w/ Graal w/o removing the only test for CompilationMode=high-only? >>> Thanks, >>> -- Igor >>> >>>> On Nov 20, 2019, at 4:39 AM, Jie Fu wrote: >>>> >>>> Hi all, >>>> >>>> May I get reviews for this small fix? >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234499 >>>> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/ >>>> >>>> And I need a sponsor. >>>> >>>> Thanks a lot. 
>>>> Best regards, >>>> Jie >>>> From igor.ignatyev at oracle.com Thu Nov 21 03:25:01 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 20 Nov 2019 19:25:01 -0800 Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout In-Reply-To: References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com> <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn> Message-ID: Hi Jie, just to double check, have you verified that the updated test can still provoke 8233885 ? -- Igor > On Nov 20, 2019, at 6:48 PM, Jie Fu wrote: > > Hi Igor, > > OK. > > Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.02/ > > Thanks a lot. > Best regards, > Jie > > On 2019/11/21 ??10:17, Igor Ignatyev wrote: >> Hi Jie, >> >> we are trying to replace all usages of -XX:CompileOnly w/ XX:CompileCommand=compileonly, could you please update your patch accordingly? >> >> -- Igor >> >>> On Nov 20, 2019, at 5:58 PM, Jie Fu wrote: >>> >>> Hi Igor, >>> >>> Good idea! Thank you very much. >>> >>> Updated: http://cr.openjdk.java.net/~jiefu/8234499/webrev.01/ >>> >>> Hope you can sponsor it if you're OK with it. >>> >>> Thanks a lot. >>> Best regards, >>> Jie >>> >>> On 2019/11/21 ??9:05, Igor Ignatyev wrote: >>>> Hi Jie, >>>> >>>> wouldn't it be a better solution to limit compilation to CompilationModeHighOnlyTest via -XX:CompileControl=compileonly? this should solve timeout w/ Graal w/o removing the only test for CompilationMode=high-only? >>>> Thanks, >>>> -- Igor >>>> >>>>> On Nov 20, 2019, at 4:39 AM, Jie Fu wrote: >>>>> >>>>> Hi all, >>>>> >>>>> May I get reviews for this small fix? >>>>> >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234499 >>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8234499/webrev.00/ >>>>> >>>>> And I need a sponsor. >>>>> >>>>> Thanks a lot. 
>>>>> Best regards, >>>>> Jie >>>>> > From fujie at loongson.cn Thu Nov 21 03:34:27 2019 From: fujie at loongson.cn (Jie Fu) Date: Thu, 21 Nov 2019 11:34:27 +0800 Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout In-Reply-To: References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com> <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn> Message-ID: Yes, it can still provoke 8233885. Thanks. On 2019/11/21 ??11:25, Igor Ignatyev wrote: > have you verified that the updated test can still provoke 8233885 ? From igor.ignatyev at oracle.com Thu Nov 21 03:59:02 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 20 Nov 2019 19:59:02 -0800 Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout In-Reply-To: References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com> <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn> Message-ID: <928BDF2E-9B89-49F8-BCA6-CFDB14A03298@oracle.com> great, reviewed and integrated. -- Igor > On Nov 20, 2019, at 7:34 PM, Jie Fu wrote: > > Yes, it can still provoke 8233885. > > Thanks. > > On 2019/11/21 ??11:25, Igor Ignatyev wrote: >> have you verified that the updated test can still provoke 8233885 ? > From fujie at loongson.cn Thu Nov 21 04:01:03 2019 From: fujie at loongson.cn (Jie Fu) Date: Thu, 21 Nov 2019 12:01:03 +0800 Subject: RFR: 8234499: [Graal] compiler/compilercontrol/CompilationModeHighOnlyTest.java test fails with timeout In-Reply-To: <928BDF2E-9B89-49F8-BCA6-CFDB14A03298@oracle.com> References: <7e8c5a2a-ea87-a684-4013-b5a60ec4010c@loongson.cn> <4F729DBC-1527-4899-8F0F-7CA1930BBC7A@oracle.com> <3f17a07b-29bc-4386-a539-48866cd79e04@loongson.cn> <928BDF2E-9B89-49F8-BCA6-CFDB14A03298@oracle.com> Message-ID: <3a35f216-dbb0-7b27-6399-7b804b5788f9@loongson.cn> Thank you so much, Igor. 
On 2019/11/21 11:59, Igor Ignatyev wrote: > great, reviewed and integrated. > > -- Igor > >> On Nov 20, 2019, at 7:34 PM, Jie Fu wrote: >> >> Yes, it can still provoke 8233885. >> >> Thanks. >> >> On 2019/11/21 11:25, Igor Ignatyev wrote: >>> have you verified that the updated test can still provoke 8233885? From igor.ignatyev at oracle.com Thu Nov 21 04:25:44 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 20 Nov 2019 20:25:44 -0800 Subject: RFR(S) : 8225554 : add JFR event for uncommon trap In-Reply-To: <8B759742-CAD0-4811-93C6-18466203A070@oracle.com> References: <688e6abd-fe01-43fd-99c0-a4b8066ddbb2@default> <8B759742-CAD0-4811-93C6-18466203A070@oracle.com> Message-ID: <32AFB259-4D1D-4C0E-B417-3EED0DD07302@oracle.com> ping? > On Jun 19, 2019, at 5:20 PM, Igor Ignatyev wrote: > > Hi Markus, > > I definitely support the idea of making the event as helpful for end-users as possible; and having information about the uncommon trap "location" (method, line-number, bci) seems to be very useful. I don't think that the 'instruction' field is helpful though b/c w/o seeing the rest of the method code it can't really be used to understand what/why happened. what do you think? > > regarding the name, although I don't think this event can be used by people w/o understanding some level of hotspot internals, they at least need to understand what the somewhat cryptic reasons and actions mean, "Deoptimization" sounds good to me. > > please let me know how you can proceed further here, I can update my patch to rename the event and include location info, or I can just withdraw my patch in favor of yours (that's if you plan to finish work on it in near future and it won't be left for another few years :) ) > > Thanks, > -- Igor > >> On Jun 18, 2019, at 2:44 AM, Markus Gronlund wrote: >> >> Hi Igor, >> >> Thank you for looking into providing this support. 
>> >> This work partly overlaps with something I have been working on under the following enhancement: >> >> Enh: https://bugs.openjdk.java.net/browse/JDK-8216041 >> >> I have had a patch somewhat semi-ready for some years now, please see: >> http://cr.openjdk.java.net/~mgronlun/8216041/ >> >> Here is what the information set could look visually by default (no structured rendering) in JDK Mission Control: >> http://cr.openjdk.java.net/~mgronlun/8216041/DeoptimizationEvent.jpg >> >> Maybe we should merge our work for this effort (I am interested in your test case)? >> >> I think we need to take a larger view on this, especially to see if this information could also be made understandable and maybe even useful to the end-user / developer. >> >> This is the reason I choose to use the "deoptimization" concept instead of the more internal UncomonTrap. >> >> Let's see if we together can craft a useful event here. >> >> Thanks >> Markus >> >> -----Original Message----- >> From: Igor Ignatyev >> Sent: den 11 juni 2019 20:49 >> To: hotspot-jfr-dev at openjdk.java.net; hotspot compiler >> Subject: RFR(S) : 8225554 : add JFR event for uncommon trap >> >> http://cr.openjdk.java.net/~iignatyev//8225554/webrev.00/index.html >>> 187 lines changed: 184 ins; 0 del; 3 mod; >> >> Hi all, >> >> could you please review this small patch which adds jfr event for uncommon trap? >> >> webrev: http://cr.openjdk.java.net/~iignatyev//8225554/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8225554 >> testing: >> - tier1 (which includes a newly added test) >> - modified version of compiler/intrinsics/klass/CastNullCheckDroppingsTest.java (see JDK-8129092[1]) >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8129092 >> >> Thanks, >> -- Igor > From ioi.lam at oracle.com Thu Nov 21 04:50:10 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Wed, 20 Nov 2019 20:50:10 -0800 Subject: RFR: 8234321: Call cache flush after generating trampoline. 
In-Reply-To: References: Message-ID: <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com> Hi Xiaohong, The changes look good to me. I am running tests on our test infrastructure now. Do you need a sponsor for pushing the changeset? Thanks - Ioi On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote: > Hi, > > Thanks for your reviewing! @Andrew Dinn > So could someone else help to review this patch? Thanks a lot if someone who are familiar with other platforms (ppc, etc) could take a look at it. > > Thanks, > Xiaohong Gong > > -----Original Message----- > From: Andrew Dinn > Sent: Wednesday, November 20, 2019 5:12 PM > To: Xiaohong Gong (Arm Technology China) ; hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; calvin.cheung at oracle.com > Cc: nd > Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. > > On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote: >> Please help to review this small patch: >> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321 > Yes, the patch looks good. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From ioi.lam at oracle.com Thu Nov 21 05:03:14 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Wed, 20 Nov 2019 21:03:14 -0800 Subject: RFR: 8234321: Call cache flush after generating trampoline. In-Reply-To: References: Message-ID: Hi Xiaohong, The changes look good to me. I am running tests on our test infrastructure now. Do you need a sponsor for pushing the changeset? Thanks - Ioi On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote: > Hi, > > Thanks for your reviewing! @Andrew Dinn > So could someone else help to review this patch? 
Thanks a lot if > someone who are familiar with other platforms (ppc, etc) could take a > look at it. > > Thanks, > Xiaohong Gong > > -----Original Message----- > From: Andrew Dinn > Sent: Wednesday, November 20, 2019 5:12 PM > To: Xiaohong Gong (Arm Technology China) ; > hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; > calvin.cheung at oracle.com > Cc: nd > Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. > > On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote: >> Please help to review this small patch: >> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321 > Yes, the patch looks good. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From tom.rodriguez at oracle.com Thu Nov 21 05:21:59 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 20 Nov 2019 21:21:59 -0800 Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a phantom reference Message-ID: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com> http://cr.openjdk.java.net/~never/8234359/webrev https://bugs.openjdk.java.net/browse/JDK-8234359 While testing the latest JVMCI in JDK11, crashes were occurring during draining of the SATB buffers. The problem was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead in the current GC. Reading this oop using NativeAccess lead to that oop being enqueued in the SATB buffer. In JDK 14 it appears some other change in G1 disables those barriers at the point this code is executed but in JDK11 no such logic exists. This code never resurrects that oop so using the normal AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object. 
tom From tobias.hartmann at oracle.com Thu Nov 21 06:13:19 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Nov 2019 07:13:19 +0100 Subject: [14] RFR (S): 8234387: C2: Better support of operands with multiple match rules in AD files In-Reply-To: <30655159-7431-1a33-cb10-373e32c68002@oracle.com> References: <30655159-7431-1a33-cb10-373e32c68002@oracle.com> Message-ID: Hi Vladimir, looks good to me. Best regards, Tobias On 19.11.19 14:00, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00 > https://bugs.openjdk.java.net/browse/JDK-8234387 > > Though ADLC accepts operands with multiple match rules, it doesn't generate correct code to handle > them except the first one. > > It doesn't cause any noticeable problems for existing code, but is a major limitation for generic > vector operands (JDK-8234391 [1]). > > Proposed fix enumerates all match rules. > > Fixed some missing declarations along the way. > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Testing: tier1-4 (both with and without generic vectors) > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8234391 From Xiaohong.Gong at arm.com Thu Nov 21 06:24:02 2019 From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China)) Date: Thu, 21 Nov 2019 06:24:02 +0000 Subject: RFR: 8234321: Call cache flush after generating trampoline. In-Reply-To: <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com> References: <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com> Message-ID: Hi Ioi, Thanks for your reviewing! And I'm glad that if you can help to push it if all the tests pass. Thanks, Xiaohong -----Original Message----- From: Ioi Lam Sent: Thursday, November 21, 2019 12:50 PM To: Xiaohong Gong (Arm Technology China) ; Andrew Dinn ; hotspot-compiler-dev at openjdk.java.net; calvin.cheung at oracle.com Cc: nd Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. 
Hi Xiaohong, The changes look good to me. I am running tests on our test infrastructure now. Do you need a sponsor for pushing the changeset? Thanks - Ioi On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote: > Hi, > > Thanks for your reviewing! @Andrew Dinn So could someone else help to > review this patch? Thanks a lot if someone who are familiar with other platforms (ppc, etc) could take a look at it. > > Thanks, > Xiaohong Gong > > -----Original Message----- > From: Andrew Dinn > Sent: Wednesday, November 20, 2019 5:12 PM > To: Xiaohong Gong (Arm Technology China) ; > hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; > calvin.cheung at oracle.com > Cc: nd > Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. > > On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote: >> Please help to review this small patch: >> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321 > Yes, the patch looks good. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. > 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From ioi.lam at oracle.com Thu Nov 21 06:35:32 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Wed, 20 Nov 2019 22:35:32 -0800 Subject: RFR: 8234321: Call cache flush after generating trampoline. In-Reply-To: References: <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com> Message-ID: <4aedfe03-cdee-0b9b-c80d-26dff55829d6@oracle.com> Hi Xiaohong, The patch passed our hs-tier1 and hs-tier2 tests, so I pushed it. http://hg.openjdk.java.net/jdk/jdk/rev/2c55c2fc08f5 Thanks - Ioi On 11/20/19 10:24 PM, Xiaohong Gong (Arm Technology China) wrote: > Hi Ioi, > > Thanks for your reviewing! > And I'm glad that if you can help to push it if all the tests pass. 
> > Thanks, > Xiaohong > > -----Original Message----- > From: Ioi Lam > Sent: Thursday, November 21, 2019 12:50 PM > To: Xiaohong Gong (Arm Technology China) ; Andrew Dinn ; hotspot-compiler-dev at openjdk.java.net; calvin.cheung at oracle.com > Cc: nd > Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. > > Hi Xiaohong, > > The changes look good to me. I am running tests on our test infrastructure now. > > Do you need a sponsor for pushing the changeset? > > Thanks > - Ioi > > On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote: >> Hi, >> >> Thanks for your reviewing! @Andrew Dinn So could someone else help to >> review this patch? Thanks a lot if someone who are familiar with other platforms (ppc, etc) could take a look at it. >> >> Thanks, >> Xiaohong Gong >> >> -----Original Message----- >> From: Andrew Dinn >> Sent: Wednesday, November 20, 2019 5:12 PM >> To: Xiaohong Gong (Arm Technology China) ; >> hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; >> calvin.cheung at oracle.com >> Cc: nd >> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. >> >> On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote: >>> Please help to review this small patch: >>> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321 >> Yes, the patch looks good. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Senior Principal Software Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. >> 03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill >> From Xiaohong.Gong at arm.com Thu Nov 21 06:37:25 2019 From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China)) Date: Thu, 21 Nov 2019 06:37:25 +0000 Subject: RFR: 8234321: Call cache flush after generating trampoline. 
In-Reply-To: <4aedfe03-cdee-0b9b-c80d-26dff55829d6@oracle.com> References: <4f8dcb9c-f150-7f50-b920-94c2d9cfec13@oracle.com> <4aedfe03-cdee-0b9b-c80d-26dff55829d6@oracle.com> Message-ID: Hi Ioi, Thanks so much for it! Thanks, Xiaohong -----Original Message----- From: Ioi Lam Sent: Thursday, November 21, 2019 2:36 PM To: Xiaohong Gong (Arm Technology China) ; Andrew Dinn ; hotspot-compiler-dev at openjdk.java.net; calvin.cheung at oracle.com Cc: nd Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. Hi Xiaohong, The patch passed our hs-tier1 and hs-tier2 tests, so I pushed it. http://hg.openjdk.java.net/jdk/jdk/rev/2c55c2fc08f5 Thanks - Ioi On 11/20/19 10:24 PM, Xiaohong Gong (Arm Technology China) wrote: > Hi Ioi, > > Thanks for your reviewing! > And I'm glad that if you can help to push it if all the tests pass. > > Thanks, > Xiaohong > > -----Original Message----- > From: Ioi Lam > Sent: Thursday, November 21, 2019 12:50 PM > To: Xiaohong Gong (Arm Technology China) ; > Andrew Dinn ; hotspot-compiler-dev at openjdk.java.net; > calvin.cheung at oracle.com > Cc: nd > Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. > > Hi Xiaohong, > > The changes look good to me. I am running tests on our test infrastructure now. > > Do you need a sponsor for pushing the changeset? > > Thanks > - Ioi > > On 11/20/19 6:16 PM, Xiaohong Gong (Arm Technology China) wrote: >> Hi, >> >> Thanks for your reviewing! @Andrew Dinn So could someone else help to >> review this patch? Thanks a lot if someone who are familiar with other platforms (ppc, etc) could take a look at it. >> >> Thanks, >> Xiaohong Gong >> >> -----Original Message----- >> From: Andrew Dinn >> Sent: Wednesday, November 20, 2019 5:12 PM >> To: Xiaohong Gong (Arm Technology China) ; >> hotspot-compiler-dev at openjdk.java.net; ioi.lam at oracle.com; >> calvin.cheung at oracle.com >> Cc: nd >> Subject: Re: RFR: 8234321: Call cache flush after generating trampoline. 
>> >> On 20/11/2019 06:36, Xiaohong Gong (Arm Technology China) wrote: >>> Please help to review this small patch: >>> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8234321/webrev.00/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234321 >> Yes, the patch looks good. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Senior Principal Software Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. >> 03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill >> From erik.osterlund at oracle.com Thu Nov 21 06:50:55 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Thu, 21 Nov 2019 07:50:55 +0100 Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a phantom reference In-Reply-To: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com> References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com> Message-ID: Hi Tom, Looks good. Thanks, /Erik > On 21 Nov 2019, at 06:22, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8234359/webrev > https://bugs.openjdk.java.net/browse/JDK-8234359 > > While testing the latest JVMCI in JDK11, crashes were occurring during draining of the SATB buffers. The problem was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead in the current GC. Reading this oop using NativeAccess led to that oop being enqueued in the SATB buffer. In JDK 14 it appears some other change in G1 disables those barriers at the point this code is executed but in JDK11 no such logic exists. This code never resurrects that oop so using the normal AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object. 
> > tom From igor.ignatyev at oracle.com Thu Nov 21 07:33:19 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 20 Nov 2019 23:33:19 -0800 Subject: RFR(S) : 8234290 : compiler/c2/Test6857159.java times out and fail to clean up files In-Reply-To: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com> References: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com> Message-ID: ping? -- Igor > On Nov 16, 2019, at 10:07 PM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html >> 67 lines changed: 16 ins; 24 del; 27 mod; > > Hi all, > > could you please review this small fix for Test6857159 test? > from JBS: >> the test has -XX:CompileOnly=compiler.c2.Test6857159$Test$ct::run, but there is no 'ct' class, there are ct[0-2], and ct0 the only which has 'run' method. shouldNotContain("COMPILE SKIPPED") and shouldContain("$ct0::run (16 bytes)"), which, I guess, were a defense against such situation, didn't help b/c PrintCompilation output doesn't have 'COMPILE SKIPPED' lines and have 'made not compilable on levels 0 1 2 3 ... $ct0::run (16 bytes) excluded by CompileCommand' line. > the patch fixes CompileOnly value (actually replaces it w/ the correct CompileCommand), removes extra layer, and makes the test to use WhiteBox to check if ct0::run got compiled. 
> > webrev: http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8234290 > testing: > - compiler/c2/Test6857159.java once on linux-x64,windows-x64,macosx-x64 > - compiler/c2/Test6857159.java 100 times on windows-x64-debug (where all failures were seen so far) > > Thanks, > -- Igor > From erik.osterlund at oracle.com Thu Nov 21 08:40:12 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Thu, 21 Nov 2019 09:40:12 +0100 Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: References: Message-ID: Hi Paul, On 2019-11-20 23:22, Hohensee, Paul wrote: > It's not just the zgc load barrier that's affected, it's every jcc and fused jcc (e.g., cmp/jcc and sub/jcc, because the pairs are issued on the same clock). Yeah. We also have to mitigate against unconditional branches, like ret and jmp. So I suppose the name "jcc erratum" is slightly misleading in this context. > There's a code pattern attribute called ins_alignment() in the ad file, viz. > > ins_attrib ins_alignment(1); // Required alignment attribute (must > // be a power of 2) specifies the > // alignment that some part of the > // instruction (not necessarily the > // start) requires. If > 1, a > // compute_padding() function must be > // provided for the instruction > > Would it be possible to use/enhance ins_alignment() rather than do something zgc-specific? > Thanks, That is a good question. Unfortunately, there are a few problems applying such a strategy: 1) We do not want to constrain the alignment such that the instruction (+ specific offset) sits at e.g. the beginning of a 32 byte boundary. We want to be more loose and say that any alignment is fine... except the bad ones (crossing and ending at a 32 byte boundary). Otherwise I fear we will find ourselves bloating the code cache with unnecessary nops to align instructions that would never have been a problem. 
So in terms of alignment constraints, I think such a hammer is too big. 2) Another issue is that the alignment constraints apply not just to the one Mach node. It's sometimes for a fused op + jcc. Since we currently match the conditions and their branches separately (and the conditions not necessarily knowing they are indeed conditions to a branch, like for example an and instruction). So aligning the jcc for example is not necessarily going to help, unless its alignment knows what its preceding instruction is, and whether it will be fused or not. And depending on that, we want different alignment properties. So here the hammer is seemingly too loose. I'm not 100% sure what to suggest for the generic case, but perhaps: After things stopped moving around, add a pass to the Mach nodes, similar to branch shortening that: 1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be used on Mach nodes to mark affected nodes. 2) Walk the Mach nodes and tag branches and conditions used by fused branches (by walking edges), checking that the two are adjacent (by looking at the node index in the block), and possibly also checking that it is one of the affected condition instructions that will get fused. 3) Now that we know what Mach nodes (and sequences of macro fused nodes) are problematic, we can put some code where the mach nodes are emitted that checks for consecutively tagged nodes and inject nops in the code buffer if they cross or end at 32 byte boundaries. I suppose an alternative strategy is making sure that any problematic instruction sequence that would be fused, is also fused into one Mach node by sprinkling more rules in the AD file for the various forms of conditional branches that we think cover all the cases, and then applying the alignment constraint on individual nodes only. But it feels like that could be more intrusive and less efficient). 
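The "cross or end at 32 byte boundaries" check from step 3 above can be sketched as follows. This is only an illustration under stated assumptions, not the code in the webrev: the helper names are hypothetical, offsets are byte offsets in the code buffer, "ends at" is taken to mean that the last byte of the sequence is the last byte of a 32-byte block, and the sequence (a branch, or a macro-fused op + branch pair) is assumed to be shorter than 32 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; names are not taken from HotSpot.
const std::uint64_t kJccBoundary = 32;  // boundary size used by the mitigation

// True if the bytes [start, start + size) cross a 32-byte boundary, or if the
// last byte of the range is the last byte of a 32-byte block ("ends at").
// Both cases reduce to: the first byte and the byte just past the end lie in
// different 32-byte blocks.
inline bool crosses_or_ends_at_boundary(std::uint64_t start, std::uint64_t size) {
  assert(size > 0);
  return (start / kJccBoundary) != ((start + size) / kJccBoundary);
}

// Number of nop bytes to emit before the sequence so that it starts at the
// next 32-byte boundary. Returns 0 when the sequence is not affected.
// Assumes size < 32, so a sequence that starts on a boundary is never affected.
inline std::uint64_t nop_padding(std::uint64_t start, std::uint64_t size) {
  assert(size > 0 && size < kJccBoundary);
  if (!crosses_or_ends_at_boundary(start, size)) {
    return 0;
  }
  return kJccBoundary - (start % kJccBoundary);
}
```

With these definitions a 5-byte sequence at offset 26 is left alone, while the same sequence at offset 27 (ends at the boundary) or 28 (crosses it) is pushed to offset 32.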
Since the generic problem is more involved compared to the simpler ZGC load barrier fix (which will need special treatment anyway), I would like to focus this RFE only on the ZGC load barrier branch, because it makes me sad when it has to suffer. Having said that, we will certainly look into fixing the generic problem too after this. Thanks, /Erik From per.liden at oracle.com Thu Nov 21 10:35:44 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 21 Nov 2019 11:35:44 +0100 Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: References: Message-ID: <1dd569ef-6651-f728-bcdb-684dcb7c61f2@oracle.com> On 11/19/19 3:20 PM, erik.osterlund at oracle.com wrote: > Hi, > > Intel released an erratum (SKX102) which causes "unexpected system > behaviour" when branches > (including fused conditional branches) cross or end at 64 byte boundaries. > They are mitigating this by rolling out microcode updates that disable > micro op caching for > conditional branches that cross or end at 32 byte boundaries. The > mitigation can cause > performance regressions, unless affected branches are aligned properly. > > The erratum and its mitigation are described in more detail in this > document published by Intel: > https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf > > > My intention for this patch is to introduce the infrastructure to > determine that we may > have an affected CPU, and mitigate this by aligning the most important > branch in the whole > JVM: the ZGC load barrier fast path check. Perhaps similar methodology > can be reused later > to solve this for other performance critical code, but that is outside > the scope of this CR. > > The sprinkling of nops do not seem to cause regressions in workloads I > have tried, given a > machine without the JCC mitigations. 
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8234160 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/ Looks good! I think this is a good solution for ZGC for now. Solving this for all non-ZGC branches is a lot more involved. But if/when a more generic solution arrives, we can easily just remove this again. /Per > > Thanks, > /Erik From vladimir.x.ivanov at oracle.com Thu Nov 21 11:02:08 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 21 Nov 2019 14:02:08 +0300 Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: References: Message-ID: <1e9031de-a5a9-fdf6-8284-bf3cc904320b@oracle.com> Thanks for taking care of it, Erik. Overall, the approach you chose looks promising. I'll let Intel folks comment on the details of CPU model dispatching in VM_Version::compute_has_intel_jcc_erratum(). As an alternative solution, you could just align instructions on an 8/16-byte boundary (for 5 and 10 byte instruction sequences respectively). It'll definitely need more padding, but it looks easier to implement as well. Do you consider additional padding as risky from a performance perspective? Regarding the implementation itself, it looks like MacroAssembler is the best place for it. There are 3 parts (mostly independent) of the fix you put in a single place: what to do (how much padding is needed), where to do it (what code is affected), and when to apply it (based on whether hardware is affected or not). Even if you want to start with the ZGC load barrier, it would be nice to factor the machinery in such a way that it's easy to apply it in new code. For example, AbstractAssembler already holds some state which is managed in RAII-style (e.g., InstructionMark and ShortBranchVerifier). You could introduce a new capability in MacroAssembler which conditionally pads jumps and conditional jumps. 
I'm fine with doing the full refactoring later, but it would be nice to do first steps in that direction right away. Best regards, Vladimir Ivanov On 19.11.2019 17:20, erik.osterlund at oracle.com wrote: > Hi, > > Intel released an erratum (SKX102) which causes "unexpected system > behaviour" when branches > (including fused conditional branches) cross or end at 64 byte boundaries. > They are mitigating this by rolling out microcode updates that disable > micro op caching for > conditional branches that cross or end at 32 byte boundaries. The > mitigation can cause > performance regressions, unless affected branches are aligned properly. > > The erratum and its mitigation are described in more detail in this > document published by Intel: > https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf > > > My intention for this patch is to introduce the infrastructure to > determine that we may > have an affected CPU, and mitigate this by aligning the most important > branch in the whole > JVM: the ZGC load barrier fast path check. Perhaps similar methodology > can be reused later > to solve this for other performance critical code, but that is outside > the scope of this CR. > > The sprinkling of nops do not seem to cause regressions in workloads I > have tried, given a > machine without the JCC mitigations. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8234160 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/ > > Thanks, > /Erik From vladimir.x.ivanov at oracle.com Thu Nov 21 11:12:53 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 21 Nov 2019 14:12:53 +0300 Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: References: Message-ID: <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com> (Missed Paul's and your response when sending previous email.) > That is a good question. 
Unfortunately, there are a few problems > applying such a strategy: > > 1) We do not want to constrain the alignment such that the instruction > (+ specific offset) sits at e.g. the beginning of a 32 byte boundary. We > want to be more loose and say that any alignment is fine... except the > bad ones (crossing and ending at a 32 byte boundary). Otherwise I fear > we will find ourselves bloating the code cache with unnecessary nops to > align instructions that would never have been a problem. So in terms of > alignment constraints, I think such a hammer is too big. It would be interesting to have some data on that one. Aligning a 5-byte instruction on an 8-byte boundary wastes 3 bytes at most. For a 10-byte sequence it wastes 6 bytes at most which doesn't sound good. > 2) Another issue is that the alignment constraints apply not just to the > one Mach node. It's sometimes for a fused op + jcc. Since we currently > match the conditions and their branches separately (and the conditions > not necessarily knowing they are indeed conditions to a branch, like for > example an and instruction). So aligning the jcc for example is not > necessarily going to help, unless its alignment knows what its preceding > instruction is, and whether it will be fused or not. And depending on > that, we want different alignment properties. So here the hammer is > seemingly too loose. I mentioned MacroAssembler in the previous email, because I don't consider it a C2-specific problem. Stubs, interpreter, and C1 are also affected and we need to fix them too (considering being on the edge of a cache line may cause unpredictable behavior). Detecting instruction sequences is harder than aligning a single one, but still possible. And MacroAssembler can introduce a new "macro" instruction for conditional jumps which solves the detection problem once the code base migrates to it. 
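As a rough way to put numbers on the two padding policies discussed above, the sketch below sums the nop bytes each policy would insert over all 32 possible start offsets within a 32-byte block. It is a toy model, not measured data: the function names are hypothetical, start offsets are assumed to be uniformly distributed, and "waste" is counted as nop bytes emitted in front of the instruction, which may differ from how the figures in the thread were counted.

```cpp
#include <cassert>
#include <cstdint>

// Policy A: always start the instruction on a multiple of `alignment`
// (alignment must be a power of two).
inline std::uint32_t pad_to_alignment(std::uint32_t offset, std::uint32_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// Policy B: pad only if [offset, offset + size) would cross or end at a
// 32-byte boundary; in that case move the start to the next boundary.
inline std::uint32_t pad_if_affected(std::uint32_t offset, std::uint32_t size) {
  assert(size > 0 && size < 32);
  const bool affected = (offset / 32) != ((offset + size) / 32);
  return affected ? 32 - (offset % 32) : 0;
}

// Total padding over the 32 possible start offsets for policy A.
inline std::uint32_t total_pad_aligned(std::uint32_t alignment) {
  std::uint32_t total = 0;
  for (std::uint32_t off = 0; off < 32; ++off) {
    total += pad_to_alignment(off, alignment);
  }
  return total;
}

// Total padding over the 32 possible start offsets for policy B.
inline std::uint32_t total_pad_if_affected(std::uint32_t size) {
  std::uint32_t total = 0;
  for (std::uint32_t off = 0; off < 32; ++off) {
    total += pad_if_affected(off, size);
  }
  return total;
}
```

Under these assumptions a 5-byte instruction costs 112 padding bytes in total with 8-byte alignment (3.5 per instruction on average) versus 15 with the boundary-only policy (about 0.47), and a 10-byte sequence costs 240 with 16-byte alignment (7.5 on average) versus 55 (about 1.72). Real code is unlikely to have uniformly distributed offsets, so these numbers only indicate the direction of the trade-off.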
Best regards, Vladimir Ivanov > I'm not 100% sure what to suggest for the generic case, but perhaps: > > After things stopped moving around, add a pass to the Mach nodes, > similar to branch shortening that: > > 1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be > used on Mach nodes to mark affected nodes. > 2) Walk the Mach nodes and tag branches and conditions used by fused > branches (by walking edges), checking that the two are adjacent (by > looking at the node index in the block), and possibly also checking that > it is one of the affected condition instructions that will get fused. > 3) Now that we know what Mach nodes (and sequences of macro fused nodes) > are problematic, we can put some code where the mach nodes are emitted > that checks for consecutively tagged nodes and inject nops in the code > buffer if they cross or end at 32 byte boundaries. > > I suppose an alternative strategy is making sure that any problematic > instruction sequence that would be fused, is also fused into one Mach > node by sprinkling more rules in the AD file for the various forms of > conditional branches that we think cover all the cases, and then > applying the alignment constraint on individual nodes only. But it feels > like that could be more intrusive and less efficient). > > Since the generic problem is more involved compared to the simpler ZGC > load barrier fix (which will need special treatment anyway), I would > like to focus this RFE only on the ZGC load barrier branch, because it > makes me sad when it has to suffer. Having said that, we will certainly > look into fixing the generic problem too after this. 
From vladimir.x.ivanov at oracle.com Thu Nov 21 11:21:32 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 21 Nov 2019 14:21:32 +0300 Subject: Allocation of array copy can be eliminated in particular cases In-Reply-To: <30768651574285448@vla1-a6eaa355d163.qloud-c.yandex.net> References: <48528911574117040@sas8-55d8cbf44a35.qloud-c.yandex.net> <9d229f16-e3e8-c848-eea8-f4a24082aa3f@oracle.com> <30768651574285448@vla1-a6eaa355d163.qloud-c.yandex.net> Message-ID: <7e0bbca0-56ea-e64f-1720-1c4fbed8f1ff@oracle.com> >> Moreover, the transformation is already there: >> >> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388 > > Comment in line 2389 seems confusing to me: > > // This works even if the length is not constant (clone or newArray). > > When we clone array isn't the length constant and equal to the length of original array? I guess it cannot be different. In the context of JIT-compilers, a "constant" means "compile-time constant" - a value known to JIT-compiler. The constant case is when the array has always the same length (e.g., "original_array.length == 0"). What you are describing is an invariant on 2 values: the length of original array and the length of cloned array are equal. But everything JIT-compiler knows about those values is (1) "original_array.length == cloned_array.length"; and (2) "original_array.length >= 0". >> I haven't looked into the benchmarks you mentioned, but it looks like >> cloned_array.length access is not the reason why cloned array is still >> there. > > Once I thought that cloned array is retained at run time because it's returned from method in original benchmark: ... > From this output I conclude that either I miss something from understanding of how compiler and runtime work, or this is a bug. > > I will be happy to understand which of the two is correct :) I assume Nils answered your question why cloning isn't eliminated right now. 
Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Nov 21 11:22:13 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 21 Nov 2019 14:22:13 +0300 Subject: [14] RFR (S): 8234387: C2: Better support of operands with multiple match rules in AD files In-Reply-To: References: <30655159-7431-1a33-cb10-373e32c68002@oracle.com> Message-ID: <12d293f8-3ecd-2027-f731-b0ee1361250c@oracle.com> Thanks, Tobias. Best regards, Vladimir Ivanov On 21.11.2019 09:13, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Best regards, > Tobias > > On 19.11.19 14:00, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234387/webrev.00 >> https://bugs.openjdk.java.net/browse/JDK-8234387 >> >> Though ADLC accepts operands with multiple match rules, it doesn't generate correct code to handle >> them except the first one. >> >> It doesn't cause any noticeable problems for existing code, but is a major limitation for generic >> vector operands (JDK-8234391 [1]). >> >> Proposed fix enumerates all match rules. >> >> Fixed some missing declarations along the way. >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Testing: tier1-4 (both with and without generic vectors) >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8234391 From nils.eliasson at oracle.com Thu Nov 21 11:53:43 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 21 Nov 2019 12:53:43 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> Message-ID: <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> I updated this to version 2. 
http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ I found a problem running compiler/arguments/TestStressReflectiveCode.java. Even though the clone was created as an oop clone, the type node type returns isa_aryptr. This is caused by the src ptr not being the base pointer. Until I fix that I wanted a more robust test. In this webrev I split up the is_clonebasic into is_clone_oop and is_clone_array. (is_clone_oop_array is already there). Having a complete set with the three clone types allows for a robust test and easy verification. (The three variants end up in different paths with different GCs). Regards, Nils On 2019-11-20 15:25, Nils Eliasson wrote: > Hi, > > I found a few bugs after the enabling of the clone intrinsic in ZGC. > > 1) The arraycopy clone_basic has the parameters adjusted to work as a > memcopy. For an oop the src is pointing inside the oop to where we > want to start copying. But when we want to do a runtime call to clone > - the parameters are supposed to be the actual src oop and dst oop, > and the size should be the instance size. > > For now I have made a workaround. What should be done later is using > the offset in the arraycopy node to encode where the payload is, so > that the base pointers are always correct. But that would require > changes to the BarrierSet classes of all GCs. So I leave that for next > release. > > 2) The size parameter of the TypeFunc for the runtime call has the > wrong type. It was originally Long but missed the upper Half, it was > fixed to INT (JDK-8233834), but that is wrong and causes the compiles > to be skipped. We didn't notice that since they failed silently. That > is also why we didn't notice problem #1 either. > > https://bugs.openjdk.java.net/browse/JDK-8234520 > > http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ > > Please review!
> > Nils > From claes.redestad at oracle.com Thu Nov 21 13:21:31 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 21 Nov 2019 14:21:31 +0100 Subject: RFR: 8234328: VectorSet::clear can cause fragmentation In-Reply-To: References: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> Message-ID: Hi Thomas, On 2019-11-19 20:31, Thomas Stüfe wrote: > Hi Claes, > > Not that this is wrong, but do we have to live in resource area? I fell > over such problems several times already, e.g. with resource-area-backed > StringStreams. Maybe it would be better to just forbid resizing of > RA-allocated arrays altogether. This all gets a bit out of scope, but what alternatives do you see to living in the resource area in general for something like this? > > Then there is also the problem with passing RA-allocated arrays down the > stack and accidentally resizing them under a different ResourceMark. I > am not sure if this could happen with VectorSet though. AFAICT there's a ResourceMark at the entry point of compilation, then all others are restricted to a local scope around logging and similar, so it doesn't _look_ like there are any potential issues. I guess it'd be nice in general to have some sort of debug-only ProhibitResourceMark(Arena*) you could wrap around calls into utilities out of your control which would assert if any code tries to allocate/reallocate/free in a specific resource arena.
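A minimal, self-contained sketch of what such a guard might look like, assuming a toy bump allocator in place of HotSpot's Arena. Nothing here is existing HotSpot code: ToyArena, its members, and the guard's interface are hypothetical and only illustrate the RAII idea; in HotSpot the check would presumably sit under #ifdef ASSERT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an arena (hypothetical, not HotSpot's Arena class).
class ToyArena {
    std::vector<char> buffer_;
    std::size_t top_ = 0;
    int prohibit_depth_ = 0;  // > 0 while at least one guard is active
    friend class ProhibitResourceMark;
public:
    explicit ToyArena(std::size_t capacity) : buffer_(capacity) {}

    bool allocation_prohibited() const { return prohibit_depth_ > 0; }
    std::size_t used() const { return top_; }

    // Bump-allocates `bytes`; asserts (in debug builds) if a guard is active.
    void* allocate(std::size_t bytes) {
        assert(!allocation_prohibited() && "allocation in a prohibited arena");
        assert(top_ + bytes <= buffer_.size() && "toy arena exhausted");
        void* p = buffer_.data() + top_;
        top_ += bytes;
        return p;
    }
};

// Hypothetical RAII guard: while it is alive, any allocation in the given
// arena trips the assert above. Guards may nest.
class ProhibitResourceMark {
    ToyArena* arena_;
public:
    explicit ProhibitResourceMark(ToyArena* arena) : arena_(arena) {
        ++arena_->prohibit_depth_;
    }
    ~ProhibitResourceMark() { --arena_->prohibit_depth_; }
    ProhibitResourceMark(const ProhibitResourceMark&) = delete;
    ProhibitResourceMark& operator=(const ProhibitResourceMark&) = delete;
};
```

The guard would be placed on the stack around a call into code that is not supposed to touch the arena; once it goes out of scope, allocation is allowed again.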
Thanks /Claes > > Thanks, Thomas > > On Tue, Nov 19, 2019 at 11:16 AM Claes Redestad > > wrote: > Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/ From aph at redhat.com Thu Nov 21 14:04:07 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Nov 2019 14:04:07 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com> <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com> Message-ID: <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com> On 11/19/19 10:03 AM, Pengfei Li (Arm Technology China) wrote: >> We should have a flag which is set if the search for nicely-aligned >> memory is successful, and then you can use that flag to determine if r27 is needed. > In which file do you think we should add the flag? Can we just check the value of CompressedKlassPointers::base() in reg_mask_init() ? I would call from the #ifdef AARCH64 code that allocates the memory into a static method Assembler::setCompressedBaseAndScale(). -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Thu Nov 21 14:10:03 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Nov 2019 15:10:03 +0100 Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC In-Reply-To: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com> References: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com> Message-ID: <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com> Hi Vladimir, looks good to me. 
Best regards, Tobias On 19.11.19 14:40, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234394 > > Introduce new "placeholder" register class which denotes that instructions which use operands of > such class should dynamically query register masks from the operand instance and not hard-code them > in the code. > > It is required for generic vectors in order to support generic vector operand (vec/legVec) > replacement with fixed-sized vector operands (vec[SDXYZ]/legVec[SDXYZ]) after matching is over. > > As an example of usage, generic vector operand is declared as: > > operand vec() %{ >   constraint(ALLOC_IN_RC(dynamic)); >   match(VecX); >   match(VecY); >   match(VecZ); >   match(VecS); >   match(VecD); > ... > > Then for an instruction which uses vec as DEF > > x86.ad: > instruct loadV4(vec dst, memory mem) %{ > > =ADLC=> > > ad_x86_misc.cpp: > const RegMask &loadV4Node::out_RegMask() const { >   return (*_opnds[0]->in_RegMask(0)); > } > > vs > > x86.ad: > instruct loadV4(vecS dst, memory mem) %{ > > =ADLC=> > > ad_x86_misc.cpp: > const RegMask &loadV4Node::out_RegMask() const { >   return (VECTORS_REG_VLBWDQ_mask()); > } > > > An operand with dynamic register class can't be used during code emission and should be replaced > with something different before register allocation: > > const RegMask *vecOper::in_RegMask(int index) const { >   return &RegMask::Empty; > } > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ?
> > Testing: tier1-4 (both with and without generic vector operands) > > Best regards, > Vladimir Ivanov From tobias.hartmann at oracle.com Thu Nov 21 14:14:57 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Nov 2019 15:14:57 +0100 Subject: RFR(S) : 8234290 : compiler/c2/Test6857159.java times out and fail to clean up files In-Reply-To: References: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com> Message-ID: <84f88569-8467-7a01-2941-5f69c1ee862b@oracle.com> Hi Igor, nice cleanup. Looks good to me. Best regards, Tobias On 21.11.19 08:33, Igor Ignatyev wrote: > ping? > > -- Igor > >> On Nov 16, 2019, at 10:07 PM, Igor Ignatyev wrote: >> >> http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html >>> 67 lines changed: 16 ins; 24 del; 27 mod; >> >> Hi all, >> >> could you please review this small fix for Test6857159 test? >> from JBS: >>> the test has -XX:CompileOnly=compiler.c2.Test6857159$Test$ct::run, but there is no 'ct' class, there are ct[0-2], and ct0 the only which has 'run' method. shouldNotContain("COMPILE SKIPPED") and shouldContain("$ct0::run (16 bytes)"), which, I guess, were a defense against such situation, didn't help b/c PrintCompilation output doesn't have 'COMPILE SKIPPED' lines and have 'made not compilable on levels 0 1 2 3 ... $ct0::run (16 bytes) excluded by CompileCommand' line. >> the patch fixes CompileOnly value (actually replaces it w/ the correct CompileCommand), removes extra layer, and makes the test to use WhiteBox to check if ct0::run got compiled. 
>> >> webrev: http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234290 >> testing: >> - compiler/c2/Test6857159.java once on linux-x64,windows-x64,macosx-x64 >> - compiler/c2/Test6857159.java 100 time on windows-x64-debug (where all failures were seen so far) >> >> Thanks, >> -- Igor >> > From vladimir.x.ivanov at oracle.com Thu Nov 21 14:30:10 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 21 Nov 2019 17:30:10 +0300 Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC In-Reply-To: <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com> References: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com> <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com> Message-ID: Thanks, Tobias. Best regards, Vladimir Ivanov On 21.11.2019 17:10, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Best regards, > Tobias > > On 19.11.19 14:40, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234394 >> >> Introduce new "placeholder" register class which denotes that instructions which use operands of >> such class should dynamically query register masks from the operand instance and not hard-code them >> in the code. >> >> It is required for generic vectors in order to support generic vector operand (vec/legVec) >> replacement with fixed-sized vector operands (vec[SDXYZ]/legVec[SDXYZ]) after matching is over. >> >> As an example of usage, generic vector operand is declared as: >> >> operand vec() %{ >> ? constraint(ALLOC_IN_RC(dynamic)); >> ? match(VecX); >> ? match(VecY); >> ? match(VecZ); >> ? match(VecS); >> ? match(VecD); >> ... >> >> Then for an instruction which uses vec as DEF >> >> x86.ad: >> instruct loadV4(vec dst, memory mem) %{ >> >> =ADLC=> >> >> ad_x86_misc.cpp: >> const RegMask &loadV4Node::out_RegMask() const { >> ? 
return (*_opnds[0]->in_RegMask(0)); >> } >> >> vs >> >> x86.ad: >> instruct loadV4(vecS dst, memory mem) %{ >> >> =ADLC=> >> >> ad_x86_misc.cpp: >> const RegMask &loadV4Node::out_RegMask() const { >> ? return (VECTORS_REG_VLBWDQ_mask()); >> } >> >> >> An operand with dynamic register class can't be used during code emission and should be replaced >> with something different before register allocation: >> >> const RegMask *vecOper::in_RegMask(int index) const { >> ? return &RegMask::Empty; >> } >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Testing: tier1-4 (both with and without generic vector operands) >> >> Best regards, >> Vladimir Ivanov From erik.osterlund at oracle.com Thu Nov 21 15:51:05 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Thu, 21 Nov 2019 16:51:05 +0100 Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com> References: <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com> Message-ID: <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com> Hi Vladimir, Thank you for reviewing this patch. On 11/21/19 12:12 PM, Vladimir Ivanov wrote: > (Missed Paul's and your response when sending previous email.) > >> That is a good question. Unfortunately, there are a few problems >> applying such a strategy: >> >> 1) We do not want to constrain the alignment such that the >> instruction (+ specific offset) sits at e.g. the beginning of a 32 >> byte boundary. We want to be more loose and say that any alignment is >> fine... except the bad ones (crossing and ending at a 32 byte >> boundary). Otherwise I fear we will find ourselves bloating the code >> cache with unnecessary nops to align instructions that would never >> have been a problem. So in terms of alignment constraints, I think >> such a hammer is too big. > > It would be interesting to have some data on that one. 
Aligning 5-byte > instruction on 8-byte boundary wastes 3 bytes at most. For 10-byte > sequence it wastes 6 bytes at most which doesn't sound good. I think you missed one of my points (#2), which is that aligning single instructions is not enough to remedy the problem. For example, consider you have an add; jne; sequence. Let's say we decide on a magical alignment that we apply globally for jcc instructions. Assume that the jne was indeed aligned correctly and we insert no nops. Then it is not necessarily the case that the fused add + jne sequence has the desired alignment property as well (i.e. the add might cross 32 byte boundaries, tainting the macro fused micro code). Therefore, this will not work. I suppose that if you would always put in at least one nop before the jcc to break all macro fusions of branches globally, then you will be able to do that. But that seems like a larger hammer than what we need. > >> 2) Another issue is that the alignment constraints apply not just to >> the one Mach node. It's sometimes for a fused op + jcc. Since we >> currently match the conditions and their branches separately (and the >> conditions not necessarily knowing they are indeed conditions to a >> branch, like for example an and instruction). So aligning the jcc for >> example is not necessarily going to help, unless its alignment knows >> what its preceding instruction is, and whether it will be fused or >> not. And depending on that, we want different alignment properties. >> So here the hammer is seemingly too loose. > > I mentioned MacroAssembler in previous email, because I don't consider > it as C2-specific problem. Stubs, interpreter, and C1 are also > affected and we need to fix them too (considering being on the edge of > cache line may cause unpredictable behavior). I disagree that this is a correctness fix. 
The correctness fix is for branches on 64 byte boundaries, and is being dealt with using micro code updates (that disables micro op caching of the problematic branch and fused branch micro ops). What we are dealing with here is mitigating the performance hit of Intel's correctness mitigation for the erratum, which involves branches and fused branches crossing or ending at 32 byte boundaries. In other words, the correctness is dealt with elsewhere, and we are optimizing the code to avoid regressions for performance sensitive branches, due to that correctness fix. Therefore, I think it is wise to focus the optimization efforts where it matters the most: C2. > Detecting instruction sequencies is harder than aligning a single one, > but still possible. And MacroAssembler can introduce a new "macro" > instruction for conditional jumps which solves the detection problem > once the code base migrate to it. Maybe. As I said, alignment is not enough, because it does not catch problematic macro fusing. We would have to do something more drastic like intentionally breaking all macro fusion by putting leading nops regardless of alignment in branches. I'm not sure what I think about that, given that we do this to optimize the code (and not as a correctness fix). I don't think that doing some analysis on the Mach nodes and injecting the padding only where we actually need it is too complicated in C2 (which I believe, at least for now, is where we should focus). I have made a prototype, what this might look like and it looks like this: http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/ Incremental: http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00_01/ The idea is pretty much what I said to Paul. 
There are 3 hooks needed: 1) Apply pessimistic size measurements during branch shortening on affected nodes 2) Analyze which nodes will fuse, and tag all affected mach nodes with a flag 3) When emitting the code, add required padding on the flagged nodes that end at or cross 32 byte boundaries. I haven't run exhaustive tests or measurements yet on this. I thought we should sync ideas so we agree about direction before I do too much. What do you think? Thanks, /Erik > Best regards, > Vladimir Ivanov > >> I'm not 100% sure what to suggest for the generic case, but perhaps: >> >> After things stopped moving around, add a pass to the Mach nodes, >> similar to branch shortening that: >> >> 1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be >> used on Mach nodes to mark affected nodes. >> 2) Walk the Mach nodes and tag branches and conditions used by fused >> branches (by walking edges), checking that the two are adjacent (by >> looking at the node index in the block), and possibly also checking >> that it is one of the affected condition instructions that will get >> fused. >> 3) Now that we know what Mach nodes (and sequences of macro fused >> nodes) are problematic, we can put some code where the mach nodes are >> emitted that checks for consecutively tagged nodes and inject nops in >> the code buffer if they cross or end at 32 byte boundaries. >> >> I suppose an alternative strategy is making sure that any problematic >> instruction sequence that would be fused, is also fused into one Mach >> node by sprinkling more rules in the AD file for the various forms of >> conditional branches that we think cover all the cases, and then >> applying the alignment constraint on individual nodes only. But it >> feels like that could be more intrusive and less efficient). 
>> >> Since the generic problem is more involved compared to the simpler >> ZGC load barrier fix (which will need special treatment anyway), I >> would like to focus this RFE only on the ZGC load barrier branch, >> because it makes me sad when it has to suffer. Having said that, we >> will certainly look into fixing the generic problem too after this. > From fweimer at redhat.com Thu Nov 21 16:05:44 2019 From: fweimer at redhat.com (Florian Weimer) Date: Thu, 21 Nov 2019 17:05:44 +0100 Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: (Paul Hohensee's message of "Wed, 20 Nov 2019 22:22:30 +0000") References: Message-ID: <87d0dlkyef.fsf@oldenburg2.str.redhat.com> * Paul Hohensee: > It's not just the zgc load barrier that's affected, it's every jcc and > fuzed jcc (e.g., cmp/jcc and sub/jcc, because the pairs are issued on > the same clock). The microcode update reportedly affects all of JMP/Jcc/CALL/RET subject to the alignment constraints, not just Jcc. Therefore, mitigating the performance impact and papering over the original issue require different machine code. Thanks, Florian From vladimir.kozlov at oracle.com Thu Nov 21 17:44:00 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Nov 2019 09:44:00 -0800 Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC In-Reply-To: <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com> References: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com> <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com> Message-ID: <4a0741bf-77b6-eb67-87f6-ae4da7e4c3e0@oracle.com> +1 Vladimir K On 11/21/19 6:10 AM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. 
> > Best regards, > Tobias > > On 19.11.19 14:40, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234394 >> >> Introduce new "placeholder" register class which denotes that instructions which use operands of >> such class should dynamically query register masks from the operand instance and not hard-code them >> in the code. >> >> It is required for generic vectors in order to support generic vector operand (vec/legVec) >> replacement with fixed-sized vector operands (vec[SDXYZ]/legVec[SDXYZ]) after matching is over. >> >> As an example of usage, generic vector operand is declared as: >> >> operand vec() %{ >>   constraint(ALLOC_IN_RC(dynamic)); >>   match(VecX); >>   match(VecY); >>   match(VecZ); >>   match(VecS); >>   match(VecD); >> ... >> >> Then for an instruction which uses vec as DEF >> >> x86.ad: >> instruct loadV4(vec dst, memory mem) %{ >> >> =ADLC=> >> >> ad_x86_misc.cpp: >> const RegMask &loadV4Node::out_RegMask() const { >>   return (*_opnds[0]->in_RegMask(0)); >> } >> >> vs >> >> x86.ad: >> instruct loadV4(vecS dst, memory mem) %{ >> >> =ADLC=> >> >> ad_x86_misc.cpp: >> const RegMask &loadV4Node::out_RegMask() const { >>   return (VECTORS_REG_VLBWDQ_mask()); >> } >> >> >> An operand with dynamic register class can't be used during code emission and should be replaced >> with something different before register allocation: >> >> const RegMask *vecOper::in_RegMask(int index) const { >>   return &RegMask::Empty; >> } >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ?
>> >> Testing: tier1-4 (both with and without generic vector operands) >> >> Best regards, >> Vladimir Ivanov From vladimir.kozlov at oracle.com Thu Nov 21 18:15:45 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Nov 2019 10:15:45 -0800 Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a phantom reference In-Reply-To: References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com> Message-ID: +1 Vladimir K On 11/20/19 10:50 PM, Erik Österlund wrote: > Hi Tom, > > Looks good. > > Thanks, > /Erik > >> On 21 Nov 2019, at 06:22, Tom Rodriguez wrote: >> >> http://cr.openjdk.java.net/~never/8234359/webrev >> https://bugs.openjdk.java.net/browse/JDK-8234359 >> >> While testing the latest JVMCI in JDK11, crashes were occurring during draining of the SATB buffers. The problem was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead in the current GC. Reading this oop using NativeAccess led to that oop being enqueued in the SATB buffer. In JDK 14 it appears some other change in G1 disables those barriers at the point this code is executed but in JDK11 no such logic exists. This code never resurrects that oop so using the normal AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object. >> >> tom > From vladimir.kozlov at oracle.com Thu Nov 21 18:21:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Nov 2019 10:21:42 -0800 Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a phantom reference In-Reply-To: References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com> Message-ID: <62e3317b-c1f2-fedb-d2b1-a1d0fdc30bed@oracle.com> On the other hand, there is a testing failure which seems to be 8234429. Maybe we should hold this fix until 8234429 is resolved.
And retest again after it is fixed. Thanks, Vladimir On 11/21/19 10:15 AM, Vladimir Kozlov wrote: > +1 > > Vladimir K > > On 11/20/19 10:50 PM, Erik Österlund wrote: >> Hi Tom, >> >> Looks good. >> >> Thanks, >> /Erik >> >>> On 21 Nov 2019, at 06:22, Tom Rodriguez >>> wrote: >>> >>> http://cr.openjdk.java.net/~never/8234359/webrev >>> https://bugs.openjdk.java.net/browse/JDK-8234359 >>> >>> While testing the latest JVMCI in JDK11, crashes were occurring >>> during draining of the SATB buffers. The problem was tracked down to >>> invalidate_nmethod_mirror being called on an nmethod whose >>> InstalledCode instance was also dead in the current GC. Reading this >>> oop using NativeAccess led to that oop being >>> enqueued in the SATB buffer. In JDK 14 it appears some other change >>> in G1 disables those barriers at the point this code is executed but >>> in JDK11 no such logic exists. This code never resurrects that oop >>> so using the normal AS_NO_KEEPALIVE semantics is correct and avoids >>> attempting to enqueue the potentially dead object. >>> >>> tom >> From igor.ignatyev at oracle.com Thu Nov 21 22:16:03 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 21 Nov 2019 14:16:03 -0800 Subject: RFR(S) : 8234290 : compiler/c2/Test6857159.java times out and fail to clean up files In-Reply-To: <84f88569-8467-7a01-2941-5f69c1ee862b@oracle.com> References: <37F8BFE5-CF19-42CC-8C26-ECCB2008D4A1@oracle.com> <84f88569-8467-7a01-2941-5f69c1ee862b@oracle.com> Message-ID: <6FEC2418-9848-4F25-86A2-FD2C88632184@oracle.com> Tobias, thanks for your review, pushed. -- Igor > On Nov 21, 2019, at 6:14 AM, Tobias Hartmann wrote: > > Hi Igor, > > nice cleanup. Looks good to me. > > Best regards, > Tobias > > On 21.11.19 08:33, Igor Ignatyev wrote: >> ping?
>> >> -- Igor >> >>> On Nov 16, 2019, at 10:07 PM, Igor Ignatyev wrote: >>> >>> http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html >>>> 67 lines changed: 16 ins; 24 del; 27 mod; >>> >>> Hi all, >>> >>> could you please review this small fix for Test6857159 test? >>> from JBS: >>>> the test has -XX:CompileOnly=compiler.c2.Test6857159$Test$ct::run, but there is no 'ct' class, there are ct[0-2], and ct0 the only which has 'run' method. shouldNotContain("COMPILE SKIPPED") and shouldContain("$ct0::run (16 bytes)"), which, I guess, were a defense against such situation, didn't help b/c PrintCompilation output doesn't have 'COMPILE SKIPPED' lines and have 'made not compilable on levels 0 1 2 3 ... $ct0::run (16 bytes) excluded by CompileCommand' line. >>> the patch fixes CompileOnly value (actually replaces it w/ the correct CompileCommand), removes extra layer, and makes the test to use WhiteBox to check if ct0::run got compiled. >>> >>> webrev: http://cr.openjdk.java.net/~iignatyev//8234290/webrev.00/index.html >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234290 >>> testing: >>> - compiler/c2/Test6857159.java once on linux-x64,windows-x64,macosx-x64 >>> - compiler/c2/Test6857159.java 100 time on windows-x64-debug (where all failures were seen so far) >>> >>> Thanks, >>> -- Igor >>> >> From sandhya.viswanathan at intel.com Thu Nov 21 23:58:47 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Thu, 21 Nov 2019 23:58:47 +0000 Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: <1e9031de-a5a9-fdf6-8284-bf3cc904320b@oracle.com> References: <1e9031de-a5a9-fdf6-8284-bf3cc904320b@oracle.com> Message-ID: Hi Vladimir/Eric, The CPU model list in VM_Version::compute_has_intel_jcc_erratum() looks correct and is per section 4.0 of the document: https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf Best Regards, Sandhya 
-----Original Message----- From: hotspot-compiler-dev On Behalf Of Vladimir Ivanov Sent: Thursday, November 21, 2019 3:02 AM To: erik.osterlund at oracle.com; hotspot compiler Subject: Re: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier Thanks for taking care of it, Erik. Overall, the approach you chose looks promising. I'll let Intel folks to comment on the details of CPU model dispatching in VM_Version::compute_has_intel_jcc_erratum(). As an alternative solution, you could just align instructions on 8/16-byte boundary (for 5 and 10 byte instruction sequencies respectively). It'll definitely need more padding, but it looks easier to implement as well. Do you consider additional padding as risky from performance perspective? Regarding the implementation itself, it looks like MacroAssembler is the best place for it. There are 3 parts (mostly independent) of the fix you put in the single place: what to do (how much padding needed), where to do (what code is affected), and when to apply it (based on whether hardware is affected or not). Even if you want to start with ZGC load barrier, it would be nice to factor the machinery in such a way that it's easy to apply it in the new code. For example, AbstractAssembler already holds some state which is managed in RAII-style (e.g., InstructionMark and ShortBranchVerifier). You could introduce a new capability in MacroAssembler which conditionally pads jumps and conditional jumps. I'm fine with doing the full refactoring later, but it would be nice to do first steps in that direction right away. Best regards, Vladimir Ivanov On 19.11.2019 17:20, erik.osterlund at oracle.com wrote: > Hi, > > Intel released an erratum (SKX102) which causes "unexpected system > behaviour" when branches (including fused conditional branches) cross > or end at 64 byte boundaries. 
> They are mitigating this by rolling out microcode updates that disable > micro op caching for conditional branches that cross or end at 32 byte > boundaries. The mitigation can cause performance regressions, unless > affected branches are aligned properly. > > The erratum and its mitigation are described in more detail in this > document published by Intel: > https://www.intel.com/content/dam/support/us/en/documents/processors/m > itigations-jump-conditional-code-erratum.pdf > > > My intention for this patch is to introduce the infrastructure to > determine that we may have an affected CPU, and mitigate this by > aligning the most important branch in the whole > JVM: the ZGC load barrier fast path check. Perhaps similar methodology > can be reused later to solve this for other performance critical code, > but that is outside the scope of this CR. > > The sprinkling of nops do not seem to cause regressions in workloads I > have tried, given a machine without the JCC mitigations. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8234160 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/ > > Thanks, > /Erik From sandhya.viswanathan at intel.com Fri Nov 22 01:27:34 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 22 Nov 2019 01:27:34 +0000 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 Message-ID: On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. This should automatically result in MaxVectorSize being set to 64 bytes. However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. I have a patch which fixes the issue. 
JBS: https://bugs.openjdk.java.net/browse/JDK-8234610 Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/ Please review and approve. Best Regards, Sandhya From vladimir.kozlov at oracle.com Fri Nov 22 02:02:43 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Nov 2019 18:02:43 -0800 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: Message-ID: Hi Sandhya, I think you should put the cpuid code added by 8221092 under the if (use_evex) checks, because if the user specifies UseAVX=2 the code under (use_evex) will not be executed anyway. Or am I missing something? I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should be 32 because it will set UseAVX=1 in the current code. Thanks, Vladimir On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote: > On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. > > When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. > This should automatically result in MaxVectorSize being set to 64 bytes. > > However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. > I have a patch which fixes the issue. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8234610 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/ > > Please review and approve.
> > Best Regards, > Sandhya > > From Pengfei.Li at arm.com Fri Nov 22 08:45:47 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 22 Nov 2019 08:45:47 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <355b7629-2ef0-316e-8d75-0593a8a50dc8@redhat.com> <638df51f-f9e0-5ceb-6509-7d396f85d8ec@redhat.com> <399b921f-2110-3ceb-24b9-e346e7733ab5@redhat.com> Message-ID: Hi Andrew, > > In which file do you think we should add the flag? Can we just check the > value of CompressedKlassPointers::base() in reg_mask_init() ? > > I would call from the #ifdef AARCH64 code that allocates the memory into a > static method Assembler::setCompressedBaseAndScale(). Thanks for your suggestion. I have ever tried to set a flag from the metaspace reservation code but now I'm switching back to my another approach. Below is my justification. The #ifdef code block which allocates metaspace is actually used by both AARCH64 and AIX. Of course, we can add AArch64-specific logic inside with AARCH64_ONLY(), but it doesn't cover all scenarios that r27 isn't used. In klass pointers encoding and decoding, we have a special path called use_XOR_for_compressed_class_base where the metaspace may be not nicely fit but r27 isn't used. [1] Regarding your suggestion of setting compressed base and shift values into AArch64 assembler, it can solve the problem of covering the use_XOR_for_compressed_class_base path. But we have to do it in Metaspace::set_narrow_klass_base_and_shift() where the base and shift are finally determined and introduce new code block of "#ifdef AARCH64 #endif" in HotSpot shared code. 
In my approach, I added a method in aarch64.ad to check the base and shift in reg_mask_init(), and moved the logic of use_XOR_for_compressed_class_base here from the MacroAssembler constructor. I know my implementation has a drawback that the logic of my new method may be mis-aligned with the encoding/decoding logic if someone changes the MacroAssembler code without noticing my code. So I also added a few lines of comments to avoid this happening. See my updated webrev below. http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.01/ Please let me know if you have any further suggestions or disagreements. [1] http://hg.openjdk.java.net/jdk/jdk/file/fcd74557a9cc/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#l3918 -- Thanks, Pengfei From vladimir.x.ivanov at oracle.com Fri Nov 22 12:39:07 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 22 Nov 2019 15:39:07 +0300 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: Message-ID: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com> Hi Sandhya, > On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. > > When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. > This should automatically result in MaxVectorSize being set to 64 bytes. > > However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. Please, elaborate how it happens and how legacy_setup affects it? Why 8221092 needs the code you conditionally exclude. Why the following isn't enough? 
if (FLAG_IS_DEFAULT(UseAVX)) { FLAG_SET_DEFAULT(UseAVX, use_avx_limit); + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) { + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake + } } else if (UseAVX > use_avx_limit) { Best regards, Vladimir Ivanov From rwestrel at redhat.com Fri Nov 22 13:19:36 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 22 Nov 2019 14:19:36 +0100 Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out In-Reply-To: References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com> <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com> Message-ID: <87mucoaw0n.fsf@redhat.com> Hi Vitaly, > Thanks for fixing this! :) Perhaps a bit too premature to ask but: any > chance this will get backported to 11? I just backported it to openjdk 11u. Roland. From vladimir.x.ivanov at oracle.com Fri Nov 22 13:22:13 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 22 Nov 2019 16:22:13 +0300 Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com> References: <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com> <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com> Message-ID: Hi Erik, >>> That is a good question. Unfortunately, there are a few problems >>> applying such a strategy: >>> >>> 1) We do not want to constrain the alignment such that the >>> instruction (+ specific offset) sits at e.g. the beginning of a 32 >>> byte boundary. We want to be more loose and say that any alignment is >>> fine... except the bad ones (crossing and ending at a 32 byte >>> boundary). Otherwise I fear we will find ourselves bloating the code >>> cache with unnecessary nops to align instructions that would never >>> have been a problem. So in terms of alignment constraints, I think >>> such a hammer is too big. >> >> It would be interesting to have some data on that one. 
Aligning 5-byte >> instruction on 8-byte boundary wastes 3 bytes at most. For 10-byte >> sequence it wastes 6 bytes at most which doesn't sound good. > > I think you missed one of my points (#2), which is that aligning single > instructions is not enough to remedy the problem. > For example, consider you have an add; jne; sequence. Let's say we > decide on a magical alignment that we apply globally for > jcc instructions. > Assume that the jne was indeed aligned correctly and we insert no nops. > Then it is not necessarily the case that the fused > add + jne sequence has the desired alignment property as well (i.e. the > add might cross 32 byte boundaries, tainting the > macro fused micro code). Therefore, this will not work. > I suppose that if you would always put in at least one nop before the > jcc to break all macro fusions of branches globally, > then you will be able to do that. But that seems like a larger hammer > than what we need. I was thinking about aligning macro-fused instruction sequences and not individual jcc instructions. There are both automatic and cooperative ways to detect them. >>> 2) Another issue is that the alignment constraints apply not just to >>> the one Mach node. It's sometimes for a fused op + jcc. Since we >>> currently match the conditions and their branches separately (and the >>> conditions not necessarily knowing they are indeed conditions to a >>> branch, like for example an and instruction). So aligning the jcc for >>> example is not necessarily going to help, unless its alignment knows >>> what its preceding instruction is, and whether it will be fused or >>> not. And depending on that, we want different alignment properties. >>> So here the hammer is seemingly too loose. >> >> I mentioned MacroAssembler in previous email, because I don't consider >> it as C2-specific problem. 
Stubs, interpreter, and C1 are also >> affected and we need to fix them too (considering being on the edge of >> cache line may cause unpredictable behavior). > > I disagree that this is a correctness fix. The correctness fix is for > branches on 64 byte boundaries, and is being dealt with > using micro code updates (that disables micro op caching of the > problematic branch and fused branch micro ops). What > we are dealing with here is mitigating the performance hit of Intel's > correctness mitigation for the erratum, which involves > branches and fused branches crossing or ending at 32 byte boundaries. In > other words, the correctness is dealt with elsewhere, > and we are optimizing the code to avoid regressions for performance > sensitive branches, due to that correctness fix. > Therefore, I think it is wise to focus the optimization efforts where it > matters the most: C2. Point taken. I agree that the focus should be on performance. But I'd include stubs as well. Many of them are extensively used from C2-generated code. > I don't think that doing some analysis on the Mach nodes and injecting > the padding only where we actually need it is too > complicated in C2 (which I believe, at least for now, is where we should > focus). I'm curious is there anything special about Mach IR which helps/simplifies the analysis? One problem I see with it is that some mach nodes issue local branches for their own purposes (inside MachNode::emit()) and you can't mitigate such cases on the level of Mach IR. > I have made a prototype, what this might look like and it looks like this: > http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/ Just one more comment: it's weird to see intel_jcc_erratum referenced in shared code. You could #ifdef it for x86-only, but it's much better to move the code to x86-specific location. Best regards, Vladimir Ivanov > Incremental: > http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00_01/ > > The idea is pretty much what I said to Paul. 
There are 3 hooks needed: > 1) Apply pessimistic size measurements during branch shortening on > affected nodes > 2) Analyze which nodes will fuse, and tag all affected mach nodes with a > flag > 3) When emitting the code, add required padding on the flagged nodes > that end at or cross 32 byte boundaries. > > I haven't run exhaustive tests or measurements yet on this. I thought we > should sync ideas so we agree about direction before I do too much. > > What do you think? > > Thanks, > /Erik > >> Best regards, >> Vladimir Ivanov >> >>> I'm not 100% sure what to suggest for the generic case, but perhaps: >>> >>> After things stopped moving around, add a pass to the Mach nodes, >>> similar to branch shortening that: >>> >>> 1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be >>> used on Mach nodes to mark affected nodes. >>> 2) Walk the Mach nodes and tag branches and conditions used by fused >>> branches (by walking edges), checking that the two are adjacent (by >>> looking at the node index in the block), and possibly also checking >>> that it is one of the affected condition instructions that will get >>> fused. >>> 3) Now that we know what Mach nodes (and sequences of macro fused >>> nodes) are problematic, we can put some code where the mach nodes are >>> emitted that checks for consecutively tagged nodes and inject nops in >>> the code buffer if they cross or end at 32 byte boundaries. >>> >>> I suppose an alternative strategy is making sure that any problematic >>> instruction sequence that would be fused, is also fused into one Mach >>> node by sprinkling more rules in the AD file for the various forms of >>> conditional branches that we think cover all the cases, and then >>> applying the alignment constraint on individual nodes only. But it >>> feels like that could be more intrusive and less efficient). 
>>>
>>> Since the generic problem is more involved compared to the simpler
>>> ZGC load barrier fix (which will need special treatment anyway), I
>>> would like to focus this RFE only on the ZGC load barrier branch,
>>> because it makes me sad when it has to suffer. Having said that, we
>>> will certainly look into fixing the generic problem too after this.
>>
>

From vladimir.x.ivanov at oracle.com Fri Nov 22 13:50:09 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 22 Nov 2019 16:50:09 +0300
Subject: [14] RFR (S): 8234394: C2: Dynamic register class support in ADLC
In-Reply-To: <4a0741bf-77b6-eb67-87f6-ae4da7e4c3e0@oracle.com>
References: <82d28c5a-1b18-240d-8356-5e4266c63bd1@oracle.com> <9091c3e0-e784-cba4-94ae-88daf42fed12@oracle.com> <4a0741bf-77b6-eb67-87f6-ae4da7e4c3e0@oracle.com>
Message-ID: 

Thanks, Vladimir.

Best regards,
Vladimir Ivanov

On 21.11.2019 20:44, Vladimir Kozlov wrote:
> +1
>
> Vladimir K
>
> On 11/21/19 6:10 AM, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> looks good to me.
>>
>> Best regards,
>> Tobias
>>
>> On 19.11.19 14:40, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234394/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8234394
>>>
>>> Introduce new "placeholder" register class which denotes that
>>> instructions which use operands of
>>> such class should dynamically query register masks from the operand
>>> instance and not hard-code them
>>> in the code.
>>>
>>> It is required for generic vectors in order to support generic vector
>>> operand (vec/legVec)
>>> replacement with fixed-sized vector operands
>>> (vec[SDXYZ]/legVec[SDXYZ]) after matching is over.
>>>
>>> As an example of usage, generic vector operand is declared as:
>>>
>>> operand vec() %{
>>>    constraint(ALLOC_IN_RC(dynamic));
>>>    match(VecX);
>>>    match(VecY);
>>>    match(VecZ);
>>>    match(VecS);
>>>    match(VecD);
>>> ...
>>>
>>> Then for an instruction which uses vec as DEF
>>>
>>> x86.ad:
>>> instruct loadV4(vec dst, memory mem) %{
>>>
>>> =ADLC=>
>>>
>>> ad_x86_misc.cpp:
>>> const RegMask &loadV4Node::out_RegMask() const {
>>>    return (*_opnds[0]->in_RegMask(0));
>>> }
>>>
>>> vs
>>>
>>> x86.ad:
>>> instruct loadV4(vecS dst, memory mem) %{
>>>
>>> =ADLC=>
>>>
>>> ad_x86_misc.cpp:
>>> const RegMask &loadV4Node::out_RegMask() const {
>>>    return (VECTORS_REG_VLBWDQ_mask());
>>> }
>>>
>>>
>>> An operand with dynamic register class can't be used during code
>>> emission and should be replaced
>>> with something different before register allocation:
>>>
>>> const RegMask *vecOper::in_RegMask(int index) const {
>>>    return &RegMask::Empty;
>>> }
>>>
>>> Contributed-by: Jatin Bhateja
>>> Reviewed-by: vlivanov, sviswanathan, ?
>>>
>>> Testing: tier1-4 (both with and without generic vector operands)
>>>
>>> Best regards,
>>> Vladimir Ivanov

From vitalyd at gmail.com Fri Nov 22 13:53:22 2019
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 22 Nov 2019 08:53:22 -0500
Subject: RFR(S): 8232539: SIGSEGV in C2 Node::unique_ctrl_out
In-Reply-To: <87mucoaw0n.fsf@redhat.com>
References: <878spbc0c8.fsf@redhat.com> <2fff1294-c756-dc91-cf02-fa2d963cb29a@oracle.com> <5dc53a65-d526-2bc8-f10c-2cf330c29387@oracle.com> <87y2wu7kpn.fsf@redhat.com> <87mucoaw0n.fsf@redhat.com>
Message-ID: 

Hey Roland,

On Fri, Nov 22, 2019 at 8:19 AM Roland Westrelin wrote:
>
> Hi Vitaly,
>
> > Thanks for fixing this! :) Perhaps a bit too premature to ask but: any
> > chance this will get backported to 11?
>
> I just backported it to openjdk 11u.

Fantastic - thanks very much!

>
>
> Roland.
> > -- Sent from my phone From erik.osterlund at oracle.com Fri Nov 22 13:55:52 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Fri, 22 Nov 2019 14:55:52 +0100 Subject: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: References: <1e9031de-a5a9-fdf6-8284-bf3cc904320b@oracle.com> Message-ID: Hi Sandhya, Thanks for verifying the magic table. /Erik On 11/22/19 12:58 AM, Viswanathan, Sandhya wrote: > Hi Vladimir/Eric, > > The CPU model list in VM_Version::compute_has_intel_jcc_erratum() looks correct and is per section 4.0 of the document: > https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf > > Best Regards, > Sandhya > > > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Vladimir Ivanov > Sent: Thursday, November 21, 2019 3:02 AM > To: erik.osterlund at oracle.com; hotspot compiler > Subject: Re: RFR: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier > > Thanks for taking care of it, Erik. > > Overall, the approach you chose looks promising. > > I'll let Intel folks to comment on the details of CPU model dispatching in VM_Version::compute_has_intel_jcc_erratum(). > > As an alternative solution, you could just align instructions on 8/16-byte boundary (for 5 and 10 byte instruction sequencies respectively). It'll definitely need more padding, but it looks easier to implement as well. Do you consider additional padding as risky from performance perspective? > > Regarding the implementation itself, it looks like MacroAssembler is the best place for it. > > There are 3 parts (mostly independent) of the fix you put in the single > place: what to do (how much padding needed), where to do (what code is affected), and when to apply it (based on whether hardware is affected or not). 
> > Even if you want to start with ZGC load barrier, it would be nice to factor the machinery in such a way that it's easy to apply it in the new code. > > For example, AbstractAssembler already holds some state which is managed in RAII-style (e.g., InstructionMark and ShortBranchVerifier). > > You could introduce a new capability in MacroAssembler which conditionally pads jumps and conditional jumps. > > I'm fine with doing the full refactoring later, but it would be nice to do first steps in that direction right away. > > Best regards, > Vladimir Ivanov > > On 19.11.2019 17:20, erik.osterlund at oracle.com wrote: >> Hi, >> >> Intel released an erratum (SKX102) which causes "unexpected system >> behaviour" when branches (including fused conditional branches) cross >> or end at 64 byte boundaries. >> They are mitigating this by rolling out microcode updates that disable >> micro op caching for conditional branches that cross or end at 32 byte >> boundaries. The mitigation can cause performance regressions, unless >> affected branches are aligned properly. >> >> The erratum and its mitigation are described in more detail in this >> document published by Intel: >> https://www.intel.com/content/dam/support/us/en/documents/processors/m >> itigations-jump-conditional-code-erratum.pdf >> >> >> My intention for this patch is to introduce the infrastructure to >> determine that we may have an affected CPU, and mitigate this by >> aligning the most important branch in the whole >> JVM: the ZGC load barrier fast path check. Perhaps similar methodology >> can be reused later to solve this for other performance critical code, >> but that is outside the scope of this CR. >> >> The sprinkling of nops do not seem to cause regressions in workloads I >> have tried, given a machine without the JCC mitigations. 
>> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8234160 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.00/ >> >> Thanks, >> /Erik From erik.osterlund at oracle.com Fri Nov 22 14:23:29 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Fri, 22 Nov 2019 15:23:29 +0100 Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier In-Reply-To: References: <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com> <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com> Message-ID: <5c442ef3-be3b-e540-0097-5a5b00f5b94e@oracle.com> Hi Vladimir, On 11/22/19 2:22 PM, Vladimir Ivanov wrote: > Hi Erik, >>>> That is a good question. Unfortunately, there are a few problems >>>> applying such a strategy: >>>> >>>> 1) We do not want to constrain the alignment such that the >>>> instruction (+ specific offset) sits at e.g. the beginning of a 32 >>>> byte boundary. We want to be more loose and say that any alignment >>>> is fine... except the bad ones (crossing and ending at a 32 byte >>>> boundary). Otherwise I fear we will find ourselves bloating the >>>> code cache with unnecessary nops to align instructions that would >>>> never have been a problem. So in terms of alignment constraints, I >>>> think such a hammer is too big. >>> >>> It would be interesting to have some data on that one. Aligning >>> 5-byte instruction on 8-byte boundary wastes 3 bytes at most. For >>> 10-byte sequence it wastes 6 bytes at most which doesn't sound good. >> >> I think you missed one of my points (#2), which is that aligning >> single instructions is not enough to remedy the problem. >> For example, consider you have an add; jne; sequence. Let's say we >> decide on a magical alignment that we apply globally for >> jcc instructions. >> Assume that the jne was indeed aligned correctly and we insert no >> nops. 
Then it is not necessarily the case that the fused >> add + jne sequence has the desired alignment property as well (i.e. >> the add might cross 32 byte boundaries, tainting the >> macro fused micro code). Therefore, this will not work. >> I suppose that if you would always put in at least one nop before the >> jcc to break all macro fusions of branches globally, >> then you will be able to do that. But that seems like a larger hammer >> than what we need. > > I was thinking about aligning macro-fused instruction sequences and > not individual jcc instructions. There are both automatic and > cooperative ways to detect them. Okay. > >>>> 2) Another issue is that the alignment constraints apply not just >>>> to the one Mach node. It's sometimes for a fused op + jcc. Since we >>>> currently match the conditions and their branches separately (and >>>> the conditions not necessarily knowing they are indeed conditions >>>> to a branch, like for example an and instruction). So aligning the >>>> jcc for example is not necessarily going to help, unless its >>>> alignment knows what its preceding instruction is, and whether it >>>> will be fused or not. And depending on that, we want different >>>> alignment properties. So here the hammer is seemingly too loose. >>> >>> I mentioned MacroAssembler in previous email, because I don't >>> consider it as C2-specific problem. Stubs, interpreter, and C1 are >>> also affected and we need to fix them too (considering being on the >>> edge of cache line may cause unpredictable behavior). >> >> I disagree that this is a correctness fix. The correctness fix is for >> branches on 64 byte boundaries, and is being dealt with >> using micro code updates (that disables micro op caching of the >> problematic branch and fused branch micro ops). 
What >> we are dealing with here is mitigating the performance hit of Intel's >> correctness mitigation for the erratum, which involves >> branches and fused branches crossing or ending at 32 byte boundaries. >> In other words, the correctness is dealt with elsewhere, >> and we are optimizing the code to avoid regressions for performance >> sensitive branches, due to that correctness fix. >> Therefore, I think it is wise to focus the optimization efforts where >> it matters the most: C2. > > Point taken. I agree that the focus should be on performance. > > But I'd include stubs as well. Many of them are extensively used from > C2-generated code. Okay. Any specific stubs you have in mind?If there are some critical ones, we can sprinkle some scope objects like I did in the ZGC code. > >> I don't think that doing some analysis on the Mach nodes and >> injecting the padding only where we actually need it is too >> complicated in C2 (which I believe, at least for now, is where we >> should focus). > > I'm curious is there anything special about Mach IR which > helps/simplifies the analysis? Yeah, the fact that you can walk the graph and tag the problematic combinations of mach nodes in one pass, and then fuse the alignment with that knowledge when code is emitted. > One problem I see with it is that some mach nodes issue local branches > for their own purposes (inside MachNode::emit()) and you can't > mitigate such cases on the level of Mach IR. Sure. Things like the MacroAssembler::fast_lock which is directly injected into the emission of a Mach node will have some internal branches that my analysis will not find. I do have concerns though about injecting magic into the MacroAssembler that tries to solve this automagically on the assembly level, by having the assembler spit out different instructions than you requested. 
The following comment from assembler.hpp captures my thought exactly:

// The Abstract Assembler: Pure assembler doing NO optimizations on the
// instruction level; i.e., what you write is what you get.
// The Assembler is generating code into a CodeBuffer.

I think it is desirable to keep the property that when we tell the *Assembler to generate a __ cmp(); __ jcc(); it will do exactly that. When such assumptions break, any code that has calculated the size of instructions, making assumptions about their size, will fail. For example, any MachNode with hardcoded size() might underestimate how much memory is really needed, and code such as nmethod entry barriers that have calculated the offset to the cmp immediate might suddenly stop working. There is similar code for oop maps where we calculate offsets into mach nodes with oop maps to describe the PC after a call, which will stop working:

// !!!!! Special hack to get all types of calls to specify the byte offset
//       from the start of the call to the point where the return address
//       will point.
int MachCallStaticJavaNode::ret_addr_offset()
{
  int offset = 5; // 5 bytes from start of call to where return address points
  offset += clear_avx_size();
  return offset;
}

Basically, I think you might be able to mitigate more branches on the MacroAssembler layer, but I think it would also be more risky, as code that was not built to cope with instructions changing size will start failing, in places we didn't think of. I can think of a few, and feel like there are probably other places I have not thought about. So from that point of view, I think I would rather do this on Mach nodes where it is safe, and I think we can catch the most important ones there, and miss a few branches that the macro assembler would have found with magic, than apply it to all branches and hope we find all the bugs due to unexpected magic. Do you agree? Or perhaps I misunderstood what you are suggesting.
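
As a rough illustration of the rule under discussion (a branch, or a macro-fused op + branch sequence, must not cross or end at a 32 byte boundary), here is a minimal stand-alone sketch of how the needed nop padding could be computed. It is not the code from the webrev: the names kBoundary, crosses_or_ends_at_boundary and jcc_erratum_padding are made up for this example, and it assumes the sequence is shorter than 32 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant for this sketch: the boundary the microcode
// mitigation cares about, in bytes.
static const uint32_t kBoundary = 32;

// True if a sequence of `size` bytes starting at code offset `offset`
// either crosses a 32-byte boundary or ends exactly at one.
static bool crosses_or_ends_at_boundary(uint32_t offset, uint32_t size) {
  assert(size > 0 && size < kBoundary);  // assumption of this sketch
  uint32_t end = offset + size;          // one past the last byte
  // If the first byte and the one-past-the-end byte fall into different
  // 32-byte chunks, the sequence crosses a boundary or ends right at one.
  return (offset / kBoundary) != (end / kBoundary);
}

// Number of nop bytes to emit before the sequence so that it starts at
// the next 32-byte boundary; 0 if the current placement is harmless.
static uint32_t jcc_erratum_padding(uint32_t offset, uint32_t size) {
  if (!crosses_or_ends_at_boundary(offset, size)) {
    return 0;
  }
  return kBoundary - (offset % kBoundary);
}
```

For instance, a 5 byte sequence at offset 30 would get 2 bytes of padding, the same sequence at offset 27 (ending exactly at 32) would get 5, and at offset 26 it would get none. This also shows why a node with a hardcoded size() becomes a problem: the padding depends on the final offset, so the emitted size is no longer a constant.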
>> I have made a prototype, what this might look like and it looks like >> this: >> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/ > > Just one more comment: it's weird to see intel_jcc_erratum referenced > in shared code. You could #ifdef it for x86-only, but it's much better > to move the code to x86-specific location. Sure, I can move that to an x86 file and make it build only on x86_64. Thanks, /Erik From thomas.stuefe at gmail.com Fri Nov 22 17:06:52 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 22 Nov 2019 18:06:52 +0100 Subject: RFR: 8234328: VectorSet::clear can cause fragmentation In-Reply-To: References: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> Message-ID: Hi Claes, I was curious how much space we actually waste this way, so I did a little test. Very simple, counted waste due to not-in-place reallocation, and on Arena destruction I printed it out. For a simple java -version -Xcomp, I had ~6000 arenas reporting waste on destruction, with a median waste of about 2K, outliers of about 250K. This is just the final waste, I did not count how many separate reallocation steps were involved. So, not sure how indicative this result is. I have seen scenarios in the past where this kind of "reallocation" gets excessive, albeit not in the compiler. On Thu, Nov 21, 2019 at 2:19 PM Claes Redestad wrote: > Hi Thomas, > > On 2019-11-19 20:31, Thomas St?fe wrote: > > Hi Claes, > > > > Not that this is wrong, but do we have to live in resource area? I fell > > over such problems several times already, e.g. with resource-area-backed > > StringStreams. Maybe it would be better to just forbid resizing of > > RA-allocated arrays altogether. > > this all gets a bit out of scope, but what alternatives do you see to > living in the resource area in general for something like this? > In general, I'd say if we rely on fine granular realloc, Arenas are not a good option. 
For VectorSet in particular, how about a two-layered sparse array instead? This only works if you know the max number of bits you'll ever need, e.g. the NodeLimit for the compiler.

Let the first layer be a pointer array of size N to blocks of M uint32 words, which are to be allocated on demand. If M and N are powers of 2, access can be simple. Something like this:

class VectorSet {
  // M = 32-bit words per block, N = number of blocks; both powers of 2.
  struct bitblock_t { uint32_t bits[M]; };
  bitblock_t* _blocks[N];

  void set_bit(int i) {
    int block_idx = i / (32 * M);
    if (_blocks[block_idx] == NULL) {
      // allocate_block() is a placeholder: zero-filled, allocated on demand
      _blocks[block_idx] = allocate_block();
    }
    int bit = i % (32 * M);
    _blocks[block_idx]->bits[bit / 32] |= (1u << (bit % 32));
  }

  bool is_set(int i) {
    int block_idx = i / (32 * M);
    int bit = i % (32 * M);
    return _blocks[block_idx] != NULL &&
           (_blocks[block_idx]->bits[bit / 32] & (1u << (bit % 32))) != 0;
  }
};

Advantages would be that the blocks would not need reallocating; no copying data around; you use less memory if the memory is sparse and not all bitblocks are populated, which also makes VectorSet::size() faster since you can omit counting NULL blocks; you might have better locality too since once a block is allocated it never changes position in memory.

Disadvantages would be a bit more memory overhead for the top pointer array and, on access, one indirection more to resolve.

One could make this more involved for initially-smaller sets, e.g. first allocate just a bitblock_t and use this as a small fixed-size VectorSet; then, should we outgrow it, change to a two-layered VectorSet with the initial bitblock_t as first block.

I hope this makes sense :)

> >
> >
> > Then there is also the problem with passing RA-allocated arrays down the
> > stack and accidentally resizing them under a different ResourceMark. I
> > am not sure if this could happen with VectorSet though.
>
> AFAICT there's a ResourceMark at the entry point of compilation,
> then all others are restricted to a local scope around logging and
> similar, so it doesn't _look_ like there's any potential issues around.
> > I guess it'd be nice in general with some sort of debug-only > ProhibitResourceMark(Arena*) you could wrap around calls into utilities > out of your control which would assert if any code tries to > allocate/reallocate/free in a specific resource arena. > > That would be good to have, yes. > Thanks /Claes > Cheers, Thomas > > > > > Thanks, Thomas > > > > On Tue, Nov 19, 2019 at 11:16 AM Claes Redestad > > > wrote: > > Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/ > From navy.xliu at gmail.com Fri Nov 22 18:09:15 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Fri, 22 Nov 2019 10:09:15 -0800 Subject: [XXS] C1 misses to dump a reason when it inlines successfully Message-ID: hi, Reviewers, Could you review this extremely small change? Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541 Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/ When I analyzed PrintInlining, I was confused by the inline message without any detail. It's not easy for developer to tell if this method is inlined or not. This patch add a comment "inline by the rules of C1". I would like to add an explicit reason, but there's no decisive reason in GraphBuilder::try_inline_full. It just passes all restrict rules. Any other suggestion would be appreciated. Thanks, --lx From tom.rodriguez at oracle.com Fri Nov 22 18:27:20 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 22 Nov 2019 10:27:20 -0800 Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a phantom reference In-Reply-To: <62e3317b-c1f2-fedb-d2b1-a1d0fdc30bed@oracle.com> References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com> <62e3317b-c1f2-fedb-d2b1-a1d0fdc30bed@oracle.com> Message-ID: <7d7df088-87bc-34ef-0123-a7de32199fbd@oracle.com> Vladimir Kozlov wrote on 11/21/19 10:21 AM: > On other hand there is testing failure which seems 8234429. > > May be we should hold this fix until 8234429 is resolved. And retest > again after it fixed Whatever you like is fine with me. 
Just let me know.

tom

>
> Thanks,
> Vladimir
>
> On 11/21/19 10:15 AM, Vladimir Kozlov wrote:
>> +1
>>
>> Vladimir K
>>
>> On 11/20/19 10:50 PM, Erik Österlund wrote:
>>> Hi Tom,
>>>
>>> Looks good.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> On 21 Nov 2019, at 06:22, Tom Rodriguez
>>>> wrote:
>>>>
>>>> http://cr.openjdk.java.net/~never/8234359/webrev
>>>> https://bugs.openjdk.java.net/browse/JDK-8234359
>>>>
>>>> While testing the latest JVMCI in JDK11, crashes were occurring
>>>> during draining of the SATB buffers. The problem was tracked down
>>>> to invalidate_nmethod_mirror being called on an nmethod whose
>>>> InstalledCode instance was also dead in the current GC. Reading this
>>>> oop using NativeAccess led to that oop being
>>>> enqueued in the SATB buffer. In JDK 14 it appears some other change
>>>> in G1 disables those barriers at the point this code is executed but
>>>> in JDK11 no such logic exists. This code never resurrects that oop
>>>> so using the normal AS_NO_KEEPALIVE semantics is correct and avoids
>>>> attempting to enqueue the potentially dead object.
>>>>
>>>> tom
>>>

From tom.rodriguez at oracle.com Fri Nov 22 18:27:37 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 22 Nov 2019 10:27:37 -0800
Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a phantom reference
In-Reply-To: 
References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com>
Message-ID: 

Thanks!

tom

Erik Österlund wrote on 11/20/19 10:50 PM:
> Hi Tom,
>
> Looks good.
>
> Thanks,
> /Erik
>
>> On 21 Nov 2019, at 06:22, Tom Rodriguez wrote:
>>
>> http://cr.openjdk.java.net/~never/8234359/webrev
>> https://bugs.openjdk.java.net/browse/JDK-8234359
>>
>> While testing the latest JVMCI in JDK11, crashes were occurring during draining of the SATB buffers. The problem was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead in the current GC.
Reading this oop using NativeAccess lead to that oop being enqueued in the SATB buffer. In JDK 14 it appears some other change in G1 disables those barriers at the point this code is executed but in JDK11 no such logic exists. This code never resurrects that oop so using the normal AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object. >> >> tom > From vladimir.kozlov at oracle.com Fri Nov 22 22:44:51 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Nov 2019 14:44:51 -0800 Subject: [14] RFR(S) 8234681: Remove UseJVMCIClassLoader logic from JVMCI code Message-ID: https://cr.openjdk.java.net/~kvn/8234681/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8234681 UseJVMCIClassLoader logic is only applicable to graal-jvmci-8. It was ported from graal-jvmci-8 as part of JDK-8220623 "JVMCI support of libgraal". It is not needed in JDK. Tested tier1-3 Thanks, Vladimir From vladimir.kozlov at oracle.com Fri Nov 22 22:48:04 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Nov 2019 14:48:04 -0800 Subject: [14] RFR(S) 8234681: Remove UseJVMCIClassLoader logic from JVMCI code In-Reply-To: References: Message-ID: <70fac7e7-5f2c-5fc6-2de2-3567a176a03d@oracle.com> Forgot to say that Author of patch is Doug Simon. On 11/22/19 2:44 PM, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8234681/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234681 > > UseJVMCIClassLoader logic is only applicable to graal-jvmci-8. It was ported from graal-jvmci-8 as part of JDK-8220623 > "JVMCI support of libgraal". It is not needed in JDK. 
>
> Tested tier1-3
>
> Thanks,
> Vladimir

From sandhya.viswanathan at intel.com Fri Nov 22 23:51:38 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 22 Nov 2019 23:51:38 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
In-Reply-To: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
References: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
Message-ID:

Hi Vladimir,

The bug happens as follows:

* The user specifies -XX:UseAVX=3 on the command line.
* On the Skylake platform, because of the following lines:
    __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
    __ movl(rax, Address(rsi, 0));
    __ cmpl(rax, 0x50654); // If it is Skylake
    __ jcc(Assembler::equal, legacy_setup);
  the zmm registers are not saved/restored.
* This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
    int max_vector_size = 0;
    if (UseSSE < 2) {
      // Vectors (in XMM) are only supported with SSE2+
      // SSE is always 2 on x64.
      max_vector_size = 0;
    } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
      // 16 byte vectors (in XMM) are supported with SSE2+
      max_vector_size = 16;  ====> This is the point where max_vector_size is set to 16
    } else if (UseAVX == 1 || UseAVX == 2) {
    ...
    }
* And so we get UseAVX=3 and max_vector_size = 16.

So the fix is to save/restore the zmm registers when the flag is not at its default and the user specifies UseAVX > 2.

On your question regarding why 8221092 needs the code you conditionally exclude: it was introduced to avoid executing any AVX512 instructions unless required. The ZMM register save/restore uses AVX512 instructions.
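As a rough illustration of the selection logic quoted above, here is a minimal standalone sketch. It is not the HotSpot code: select_max_vector_size and its three parameters are hypothetical stand-ins for UseSSE, UseAVX and the result of os_supports_avx_vectors(), and the 32- and 64-byte results are taken from the sizes stated in this thread for UseAVX=2 and UseAVX=3.

```cpp
// Simplified model of the vector-size selection discussed in this thread.
// All names here are illustrative; they are not the real VM flags or functions.
int select_max_vector_size(int use_sse, int use_avx, bool os_saves_avx_vectors) {
  if (use_sse < 2) {
    return 0;   // vectors (in XMM) need SSE2+
  }
  if (use_avx == 0 || !os_saves_avx_vectors) {
    return 16;  // XMM only; this is the branch the bug falls into
  }
  if (use_avx == 1 || use_avx == 2) {
    return 32;  // YMM
  }
  return 64;    // use_avx > 2: ZMM
}
```

With these assumptions, select_max_vector_size(2, 3, false) yields 16, which mirrors the reported symptom: the requested AVX level is 3, but because the register save/restore check reports failure, the size falls back to the XMM width.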
Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Ivanov
Sent: Friday, November 22, 2019 4:39 AM
To: Viswanathan, Sandhya ; hotspot compiler
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092

Hi Sandhya,

> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092.
>
> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
> This should automatically result in MaxVectorSize being set to 64 bytes.
>
> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.

Please elaborate on how it happens and how legacy_setup affects it. Why does 8221092 need the code you conditionally exclude?

Why isn't the following enough?

  if (FLAG_IS_DEFAULT(UseAVX)) {
    FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
+   if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && _stepping < 5) {
+     FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake
+   }
  } else if (UseAVX > use_avx_limit) {

Best regards,
Vladimir Ivanov

From sandhya.viswanathan at intel.com Fri Nov 22 23:58:42 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 22 Nov 2019 23:58:42 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
In-Reply-To:
References:
Message-ID:

Hi Vladimir,

I agree the following code could be moved under use_evex:

+ if (FLAG_IS_DEFAULT(UseAVX)) {
+   __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
+   __ movl(rax, Address(rsi, 0));
+   __ cmpl(rax, 0x50654); // If it is Skylake
+   __ jcc(Assembler::equal, legacy_setup);
+ }

I will send the updated patch.

I explained in the other email how we end up with MaxVectorSize as 16 when the user specifies UseAVX=3 on Skylake.
Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov Sent: Thursday, November 21, 2019 6:03 PM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 Hi Sandhya, I think you should put cpuid code added by 8221092 under if (use_evex) checks because if user specified UseAVX=2 the code under (use_evex) will not be executed anyway. Or I am missing something. I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should 32 because it will set UseAVX=1 in current code. Thanks, Vladimir On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote: > On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. > > When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. > This should automatically result in MaxVectorSize being set to 64 bytes. > > However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. > I have a patch which fixes the issue. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8234610 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/ > > Please review and approve. 
> > Best Regards, > Sandhya > > From sandhya.viswanathan at intel.com Sat Nov 23 00:35:23 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Sat, 23 Nov 2019 00:35:23 +0000 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: Message-ID: Please find the updated webrev at: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.01/ JBS: https://bugs.openjdk.java.net/browse/JDK-8234610 Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Viswanathan, Sandhya Sent: Friday, November 22, 2019 3:59 PM To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 Hi Vladimir, I agree the following code could be moved under use_evex: + if (FLAG_IS_DEFAULT(UseAVX)) { + __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); + __ movl(rax, Address(rsi, 0)); + __ cmpl(rax, 0x50654); // If it is Skylake + __ jcc(Assembler::equal, legacy_setup); + } I will send the updated patch. I explained in the other email how we are getting MaxVectorSize as 16 when user specifies UseAVX=3 on Skylake. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov Sent: Thursday, November 21, 2019 6:03 PM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 Hi Sandhya, I think you should put cpuid code added by 8221092 under if (use_evex) checks because if user specified UseAVX=2 the code under (use_evex) will not be executed anyway. Or I am missing something. I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should 32 because it will set UseAVX=1 in current code. 
Thanks, Vladimir On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote: > On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. > > When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. > This should automatically result in MaxVectorSize being set to 64 bytes. > > However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. > I have a patch which fixes the issue. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8234610 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > > From vladimir.kozlov at oracle.com Sat Nov 23 00:45:49 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Nov 2019 16:45:49 -0800 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: Message-ID: Looks good. Thanks, Vladimir On 11/22/19 4:35 PM, Viswanathan, Sandhya wrote: > Please find the updated webrev at: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.01/ > > JBS: https://bugs.openjdk.java.net/browse/JDK-8234610 > > Best Regards, > Sandhya > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Viswanathan, Sandhya > Sent: Friday, November 22, 2019 3:59 PM > To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > > Hi Vladimir, > > I agree the following code could be moved under use_evex: > + if (FLAG_IS_DEFAULT(UseAVX)) { > + __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); > + __ movl(rax, Address(rsi, 0)); > + __ cmpl(rax, 0x50654); // If it is Skylake > + __ jcc(Assembler::equal, legacy_setup); > + } > > I will send the updated patch. 
> > I explained in the other email how we are getting MaxVectorSize as 16 when user specifies UseAVX=3 on Skylake. > > Best Regards, > Sandhya > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov > Sent: Thursday, November 21, 2019 6:03 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > > Hi Sandhya, > > I think you should put cpuid code added by 8221092 under if (use_evex) checks because if user specified UseAVX=2 the code under (use_evex) will not be executed anyway. Or I am missing something. > > I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should 32 because it will set UseAVX=1 in current code. > > Thanks, > Vladimir > > On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote: >> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. >> >> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. >> This should automatically result in MaxVectorSize being set to 64 bytes. >> >> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. >> I have a patch which fixes the issue. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610 >> Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/ >> >> Please review and approve. 
>> >> Best Regards, >> Sandhya >> >> From igor.ignatyev at oracle.com Sat Nov 23 01:13:12 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 22 Nov 2019 17:13:12 -0800 Subject: [14] RFR(S) 8234681: Remove UseJVMCIClassLoader logic from JVMCI code In-Reply-To: <70fac7e7-5f2c-5fc6-2de2-3567a176a03d@oracle.com> References: <70fac7e7-5f2c-5fc6-2de2-3567a176a03d@oracle.com> Message-ID: <67B72D86-54F5-4964-972D-B03B220D11D6@oracle.com> Looks good to me. -- Igor > On Nov 22, 2019, at 2:48 PM, Vladimir Kozlov wrote: > > Forgot to say that Author of patch is Doug Simon. > > On 11/22/19 2:44 PM, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8234681/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234681 >> UseJVMCIClassLoader logic is only applicable to graal-jvmci-8. It was ported from graal-jvmci-8 as part of JDK-8220623 "JVMCI support of libgraal". It is not needed in JDK. >> Tested tier1-3 >> Thanks, >> Vladimir From vladimir.kozlov at oracle.com Sat Nov 23 01:14:08 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Nov 2019 17:14:08 -0800 Subject: [14] RFR(S) 8234681: Remove UseJVMCIClassLoader logic from JVMCI code In-Reply-To: <67B72D86-54F5-4964-972D-B03B220D11D6@oracle.com> References: <70fac7e7-5f2c-5fc6-2de2-3567a176a03d@oracle.com> <67B72D86-54F5-4964-972D-B03B220D11D6@oracle.com> Message-ID: <3cfca173-8a0a-ecce-dd0b-83055e53873d@oracle.com> Thank you, Igor Vladimir On 11/22/19 5:13 PM, Igor Ignatyev wrote: > Looks good to me. > > -- Igor > >> On Nov 22, 2019, at 2:48 PM, Vladimir Kozlov wrote: >> >> Forgot to say that Author of patch is Doug Simon. >> >> On 11/22/19 2:44 PM, Vladimir Kozlov wrote: >>> https://cr.openjdk.java.net/~kvn/8234681/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8234681 >>> UseJVMCIClassLoader logic is only applicable to graal-jvmci-8. It was ported from graal-jvmci-8 as part of JDK-8220623 "JVMCI support of libgraal". It is not needed in JDK. 
>>> Tested tier1-3 >>> Thanks, >>> Vladimir > From sandhya.viswanathan at intel.com Sat Nov 23 01:18:00 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Sat, 23 Nov 2019 01:18:00 +0000 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: Message-ID: Thank a lot, Vladimir. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov Sent: Friday, November 22, 2019 4:46 PM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 Looks good. Thanks, Vladimir On 11/22/19 4:35 PM, Viswanathan, Sandhya wrote: > Please find the updated webrev at: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.01/ > > JBS: https://bugs.openjdk.java.net/browse/JDK-8234610 > > Best Regards, > Sandhya > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Viswanathan, Sandhya > Sent: Friday, November 22, 2019 3:59 PM > To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > > Hi Vladimir, > > I agree the following code could be moved under use_evex: > + if (FLAG_IS_DEFAULT(UseAVX)) { > + __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); > + __ movl(rax, Address(rsi, 0)); > + __ cmpl(rax, 0x50654); // If it is Skylake > + __ jcc(Assembler::equal, legacy_setup); > + } > > I will send the updated patch. > > I explained in the other email how we are getting MaxVectorSize as 16 when user specifies UseAVX=3 on Skylake. 
>
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov
> Sent: Thursday, November 21, 2019 6:03 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
>
> Hi Sandhya,
>
> I think you should put cpuid code added by 8221092 under if (use_evex) checks because if user specified UseAVX=2 the code under (use_evex) will not be executed anyway. Or I am missing something.
>
> I did not get why you said MaxVectorSize is being wrongly set to 16 bytes. It should 32 because it will set UseAVX=1 in current code.
>
> Thanks,
> Vladimir
>
> On 11/21/19 5:27 PM, Viswanathan, Sandhya wrote:
>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092.
>>
>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>
>> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
>> I have a patch which fixes the issue.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234610
>> Webrev: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.00/
>>
>> Please review and approve.
>>
>> Best Regards,
>> Sandhya
>>
>>

From dean.long at oracle.com Sat Nov 23 01:55:36 2019
From: dean.long at oracle.com (Dean Long)
Date: Fri, 22 Nov 2019 17:55:36 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is different from current 'g1 gc'' after CMS removal
Message-ID:

https://bugs.openjdk.java.net/browse/JDK-8234432
http://cr.openjdk.java.net/~dlong/8234432/webrev/

The change fixes AOT after CMS was removed. Previously we relied on a Graal enum matching a JDK enum, but now we map from one to the other.
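To make the difference between relying on matching enums and mapping between them concrete, here is a small sketch. It is not the Graal or HotSpot code: both enums, their orderings, and the two function names are assumptions, chosen only so that the ordinal shortcut reproduces the kind of mismatch named in the subject line once one collector is dropped from the runtime-side list.

```cpp
// Hypothetical compiler-side list that still carries the removed collector.
enum class CompilerGC { Serial, Parallel, CMS, G1 };

// Hypothetical runtime-side list after the collector was removed.
enum class RuntimeGC { None, Serial, Parallel, G1, Epsilon };

// Fragile: assumes the two enums stay in lock-step (ordinal + 1).
int runtime_id_by_ordinal(CompilerGC gc) {
  return static_cast<int>(gc) + 1;
}

// Robust: every compiler-side value is mapped explicitly.
RuntimeGC runtime_id_by_mapping(CompilerGC gc) {
  switch (gc) {
    case CompilerGC::Serial:   return RuntimeGC::Serial;
    case CompilerGC::Parallel: return RuntimeGC::Parallel;
    case CompilerGC::G1:       return RuntimeGC::G1;
    case CompilerGC::CMS:      return RuntimeGC::None;  // no longer exists at runtime
  }
  return RuntimeGC::None;
}
```

Under these assumed orderings, the ordinal shortcut turns the compiler-side G1 into the runtime-side Epsilon, while the explicit mapping keeps G1 as G1 and is unaffected by reordering on either side.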
dl From vladimir.kozlov at oracle.com Sat Nov 23 02:03:54 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Nov 2019 18:03:54 -0800 Subject: RFR 8234359: [JVMCI] invalidate_nmethod_mirror shouldn't use a phantom reference In-Reply-To: <7d7df088-87bc-34ef-0123-a7de32199fbd@oracle.com> References: <9e58f59e-b879-c9d5-be40-01a6b61cc87c@oracle.com> <62e3317b-c1f2-fedb-d2b1-a1d0fdc30bed@oracle.com> <7d7df088-87bc-34ef-0123-a7de32199fbd@oracle.com> Message-ID: Hi Tom 8234429 was just fixed. Please, rebase your changes and test it again. Thanks, Vladimir On 11/22/19 10:27 AM, Tom Rodriguez wrote: > > > Vladimir Kozlov wrote on 11/21/19 10:21 AM: >> On other hand there is testing failure which seems 8234429. >> >> May be we should hold this fix until 8234429 is resolved. And retest again after it fixed > > Whatever you like is fine with me.? Just let me know. > > tom > >> >> Thanks, >> Vladimir >> >> On 11/21/19 10:15 AM, Vladimir Kozlov wrote: >>> +1 >>> >>> Vladimir K >>> >>> On 11/20/19 10:50 PM, Erik ?sterlund wrote: >>>> Hi Tom, >>>> >>>> Looks good. >>>> >>>> Thanks, >>>> /Erik >>>> >>>>> On 21 Nov 2019, at 06:22, Tom Rodriguez wrote: >>>>> >>>>> ?http://cr.openjdk.java.net/~never/8234359/webrev >>>>> https://bugs.openjdk.java.net/browse/JDK-8234359 >>>>> >>>>> While testing the latest JVMCI in JDK11, crashes were occurring during draining of the SATB buffers.? The problem >>>>> was tracked down to invalidate_nmethod_mirror being called on an nmethod whose InstalledCode instance was also dead >>>>> in the current GC. Reading this oop using NativeAccess lead to that oop being enqueued in the >>>>> SATB buffer.? In JDK 14 it appears some other change in G1 disables those barriers at the point this code is >>>>> executed but in JDK11 no such logic exists.? This code never resurrects that oop so using the normal >>>>> AS_NO_KEEPALIVE semantics is correct and avoids attempting to enqueue the potentially dead object. 
>>>>>
>>>>> tom
>>>>

From vladimir.kozlov at oracle.com Sat Nov 23 02:37:21 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 18:37:21 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is different from current 'g1 gc'' after CMS removal
In-Reply-To:
References:
Message-ID: <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com>

Hmm. I assumed that Graal's list of GCs would be a subset of the GCs in HotSpot. But that may not be true, since GraalVM has to run with JDK 8.

Maybe we should bail out of AOT compilation if the GC is unknown to HotSpot, instead of recording in the library an enum 'def' from Graal which does not match the enum in HotSpot. And check for the GC early, before we start collecting classes to compile.

Thanks,
Vladimir

On 11/22/19 5:55 PM, Dean Long wrote:
> https://bugs.openjdk.java.net/browse/JDK-8234432
> http://cr.openjdk.java.net/~dlong/8234432/webrev/
>
> The change fixes AOT after CMS was removed. Previously we relied to a Graal enum matching a JDK enum, but now we map
> from one to the other.
>
> dl

From dean.long at oracle.com Sat Nov 23 02:47:58 2019
From: dean.long at oracle.com (Dean Long)
Date: Fri, 22 Nov 2019 18:47:58 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is different from current 'g1 gc'' after CMS removal
In-Reply-To: <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com>
References: <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com>
Message-ID: <6ec6734c-261b-dc65-095e-ade7dffe4e71@oracle.com>

On 11/22/19 6:37 PM, Vladimir Kozlov wrote:
> Hmm. I assumed that Graal should have GCs list which is subset of GCs
> in Hotspot. But it could be not true since GraalVM have to run with
> JDK 8.
>
> May be we should bailout AOT compilation if GC is unknown in Hotspot
> instead of recording in library enum 'def' from Graal which does not
> match enum in HotSpot. And check for GC early before we start
> collecting classes to compile.
>

Graal uses the HotSpot flags to determine which GC is being used, so there is no way for AOT to store a GC that the underlying HotSpot doesn't know about. The default fall-back of ordinal() + 1 is only for pre-JDK14, which doesn't have the CollectedHeap GC constants exported to JVMCI. We could get rid of that if we backport the vmStructs_jvmci.cpp change to all the JDK versions that Graal supports.

There is a separate issue if you try to use a GC that JVMCI/Graal doesn't support:

  % jaotc -J-XX:+UseZGC java.lang.String

  JVMCI Compiler does not support selected GC: z gc

dl

> Thanks,
> Vladimir
>
> On 11/22/19 5:55 PM, Dean Long wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8234432
>> http://cr.openjdk.java.net/~dlong/8234432/webrev/
>>
>> The change fixes AOT after CMS was removed. Previously we relied to
>> a Graal enum matching a JDK enum, but now we map from one to the other.
>>
>> dl

From vladimir.kozlov at oracle.com Sat Nov 23 02:54:05 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 22 Nov 2019 18:54:05 -0800
Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is different from current 'g1 gc'' after CMS removal
In-Reply-To: <6ec6734c-261b-dc65-095e-ade7dffe4e71@oracle.com>
References: <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com> <6ec6734c-261b-dc65-095e-ade7dffe4e71@oracle.com>
Message-ID:

Got it. My thinking was in reverse ;)

Changes are good.

Vladimir

On 11/22/19 6:47 PM, Dean Long wrote:
> On 11/22/19 6:37 PM, Vladimir Kozlov wrote:
>> Hmm. I assumed that Graal should have GCs list which is subset of GCs in Hotspot. But it could be not true since
>> GraalVM have to run with JDK 8.
>>
>> May be we should bailout AOT compilation if GC is unknown in Hotspot instead of recording in library enum 'def' from
>> Graal which does not match enum in HotSpot. And check for GC early before we start collecting classes to compile.
>> > > Graal uses the HotSpot flags to determine which GC is being used, so there is no way for AOT to store a GC that the > underlying HotSpot doesn't know about.? The default fall-back of ordinal() + 1 is only for pre-JDK14 which doesn't have > the CollectedHeap GC constants exported to JVMCI.? We could get rid of that if we backport the vmStructs_jvmci.cpp > change to all the JDK versions that Graal supports. > > There is a separate issue, if you try to use a GC that JVMCI/Graal doesn't support: > > ?% jaotc -J-XX:+UseZGC java.lang.String > > JVMCI Compiler does not support selected GC: z gc > > dl >> Thanks, >> Vladimir >> >> On 11/22/19 5:55 PM, Dean Long wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8234432 >>> http://cr.openjdk.java.net/~dlong/8234432/webrev/ >>> >>> The change fixes AOT after CMS was removed.? Previously we relied to a Graal enum matching a JDK enum, but now we map >>> from one to the other. >>> >>> dl > From dean.long at oracle.com Sat Nov 23 03:22:24 2019 From: dean.long at oracle.com (Dean Long) Date: Fri, 22 Nov 2019 19:22:24 -0800 Subject: RFR(S) 8234432: AOT tests failing with 'used 'epsilon gc' is different from current 'g1 gc'' after CMS removal In-Reply-To: References: <5a054eb4-3685-5887-7ea8-6bfb52a56c21@oracle.com> <6ec6734c-261b-dc65-095e-ade7dffe4e71@oracle.com> Message-ID: <9e790dd3-aba1-ecad-5d72-583613094f2e@oracle.com> Thanks Vladimir! dl On 11/22/19 6:54 PM, Vladimir Kozlov wrote: > Got it. My thinking was in reverse ;) > > Changes are good. > > Vladimir > > On 11/22/19 6:47 PM, Dean Long wrote: >> On 11/22/19 6:37 PM, Vladimir Kozlov wrote: >>> Hmm. I assumed that Graal should have GCs list which is subset of >>> GCs in Hotspot. But it could be not true since GraalVM have to run >>> with JDK 8. >>> >>> May be we should bailout AOT compilation if GC is unknown in Hotspot >>> instead of recording in library enum 'def' from Graal which does not >>> match enum in HotSpot. 
And check for GC early before we start >>> collecting classes to compile. >>> >> >> Graal uses the HotSpot flags to determine which GC is being used, so >> there is no way for AOT to store a GC that the underlying HotSpot >> doesn't know about.? The default fall-back of ordinal() + 1 is only >> for pre-JDK14 which doesn't have the CollectedHeap GC constants >> exported to JVMCI.? We could get rid of that if we backport the >> vmStructs_jvmci.cpp change to all the JDK versions that Graal supports. >> >> There is a separate issue, if you try to use a GC that JVMCI/Graal >> doesn't support: >> >> ??% jaotc -J-XX:+UseZGC java.lang.String >> >> JVMCI Compiler does not support selected GC: z gc >> >> dl >>> Thanks, >>> Vladimir >>> >>> On 11/22/19 5:55 PM, Dean Long wrote: >>>> https://bugs.openjdk.java.net/browse/JDK-8234432 >>>> http://cr.openjdk.java.net/~dlong/8234432/webrev/ >>>> >>>> The change fixes AOT after CMS was removed.? Previously we relied >>>> to a Graal enum matching a JDK enum, but now we map from one to the >>>> other. >>>> >>>> dl >> From lutz.schmidt at sap.com Mon Nov 25 14:06:12 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 25 Nov 2019 14:06:12 +0000 Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library Message-ID: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> Dear all, may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis- library were not passed on. The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending... 
Bug: https://bugs.openjdk.java.net/browse/JDK-8234583
Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/

Thank you,
Lutz

From vladimir.x.ivanov at oracle.com Mon Nov 25 15:31:03 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Nov 2019 18:31:03 +0300
Subject: 8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier
In-Reply-To: <5c442ef3-be3b-e540-0097-5a5b00f5b94e@oracle.com>
References: <8fb87bca-a013-e2e0-211f-ebcd5931b8c1@oracle.com> <0bf38389-2b1a-3198-acf5-3b526a9944d9@oracle.com> <5c442ef3-be3b-e540-0097-5a5b00f5b94e@oracle.com>
Message-ID: <742a43b3-0d0f-e7e3-8b16-410514cfc855@oracle.com>

Hi Erik,

>> But I'd include stubs as well. Many of them are extensively used from
>> C2-generated code.
>
> Okay. Any specific stubs you have in mind? If there are some critical
> ones, we can sprinkle some scope objects like I did in the ZGC code.

There are intrinsics for compressed strings [1], numerous copy stubs [2], and trigonometric functions [3].

It would be unfortunate if we had to go over all that code and manually instrument all the places where problematic instructions are issued. Moreover, the process would have to be repeated for new code added over time.

> I do have concerns though about injecting magic into the MacroAssembler
> that tries to solve this automagically on the assembly level, by having
> the assembler spit out different
> instructions than you requested.
> The following comment from assembler.hpp captures my thought exactly:
>
> 207: // The Abstract Assembler: Pure assembler doing NO optimizations on the
> 208: // instruction level; i.e., what you write is what you get.
> 209: // The Assembler is generating code into a CodeBuffer.

While I see that Assembler follows that (instruction per method), MacroAssembler does not: there are cases where the generated code differs depending on runtime flags (e.g., verification code) or input values (e.g., whether an AddressLiteral is reachable or not).

> I think it is desirable to keep the property that when we tell the
> *Assembler to generate a __ cmp(); __ jcc(); it will do exactly that.
> When such assumptions break, any code that has calculated the size of
> instructions, making assumptions about their size, will fail.
> For example, any MachNode with hardcoded size() might underestimate how
> much memory is really needed, and code such as nmethod entry barriers
> that have calculated the offset to the cmp immediate might suddenly stop
> working because. There is similar code for oop maps where we
> calculate offsets into mach nodes with oop maps to describe the PC after
> a call, which will stop working:
>
> // !!!!! Special hack to get all types of calls to specify the byte offset
> //       from the start of the call to the point where the return address
> //       will point.
> int MachCallStaticJavaNode::ret_addr_offset()
> {
>   int offset = 5; // 5 bytes from start of call to where return address points
>   offset += clear_avx_size();
>   return offset;
> }
>
> Basically, I think you might be able to mitigate more branches on the
> MacroAssembler layer, but I think it would also be more risky, as code
> that was not built for having random size will start failing, in places we didn't
> think of. I can think of a few, and feel like there are probably other
> places I have not thought about.
>
> So from that point of view, I think I would rather to this on Mach nodes
> where it is safe, and I think we can catch the most important ones there,
> and miss a few branches that the macro assembler would have found with
> magic, than apply it to all branches and hope we find all the bugs due
> to unexpected magic.
>
> Do you agree? Or perhaps I misunderstood what you are suggesting.

You raise a valid point: there are places in the VM which rely on hard-coded instruction sequences. If such an instruction sequence changes, all relevant places have to be adjusted. And the JVM is already very cautious about such cases.

I agree with you that the MacroAssembler-based approach is more risky, but IMO the risk is modest (few places are affected) and manageable (a dedicated stress mode should greatly improve test effectiveness).

My opinion is that if we are satisfied with the coverage C2 CFG instrumentation provides and don't expect any more work on mitigations, then there's no motivation for investing in a MacroAssembler-based approach.

Otherwise, there are basically 2 options:

  * "opt-in": explicitly mark all the places where mitigations are applied; by default nothing is mitigated

  * "opt-out": mitigate everything unless mitigations are explicitly disabled

Both approaches provide fine-grained control over what's being mitigated, but with "opt-out" there's more code to care about: it's easy to miss important cases and too tempting to enable more than we are 100% certain about.

Both can be applied to individual CFG nodes and make CFG instrumentation redundant.

But if there's a need to instrument large portions of (macro)assembly code, then IMO opt-in adds too much in terms of work required, noise (at the code level), maintenance, and burden for future code changes. So, I don't consider it a feasible option in such a situation.

It looks like a mixture of opt-in (explicitly enable in some context: in C2 during code emission, particular stub generation, etc.) and opt-out (at the level of individual instructions) gives the best of both approaches.

But, again, if C2 CFG instrumentation is good enough, then it'll be a wasted effort.
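
The mixture of opt-in contexts and per-instruction opt-out can be pictured with scope objects, as in the following toy sketch. Nothing here is HotSpot code: ToyAssembler, MitigationScope and emit_stub_sketch are hypothetical names, and "padded jcc" merely stands for whatever form a mitigated branch would take.

```cpp
#include <string>
#include <vector>

// Toy emitter: records each branch and whether it was emitted in mitigated form.
class ToyAssembler {
 public:
  void jcc(const std::string& target) {
    _log.push_back((_mitigate ? "padded jcc " : "jcc ") + target);
  }
  const std::vector<std::string>& log() const { return _log; }

 private:
  friend class MitigationScope;
  bool _mitigate = false;  // by default nothing is mitigated
  std::vector<std::string> _log;
};

// RAII scope: switches mitigation on or off and restores the old state on exit.
class MitigationScope {
 public:
  MitigationScope(ToyAssembler& masm, bool enable)
      : _masm(masm), _saved(masm._mitigate) {
    masm._mitigate = enable;
  }
  ~MitigationScope() { _masm._mitigate = _saved; }
  MitigationScope(const MitigationScope&) = delete;
  MitigationScope& operator=(const MitigationScope&) = delete;

 private:
  ToyAssembler& _masm;
  bool _saved;
};

std::vector<std::string> emit_stub_sketch() {
  ToyAssembler masm;
  masm.jcc("L0");                               // outside any scope: untouched
  {
    MitigationScope whole_stub(masm, true);     // opt-in for a whole context
    masm.jcc("L1");
    {
      MitigationScope fixed_size(masm, false);  // opt-out for size-sensitive code
      masm.jcc("L2");
    }
    masm.jcc("L3");                             // opt-in state restored
  }
  return masm.log();
}
```

The nested scope is the part that matters for code with hard-coded sizes: a sequence whose length is assumed elsewhere can be excluded locally without giving up mitigation for the rest of the stub.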
So, I envision 3 possible scenarios:

  (1) just instrument Mach IR and be done with it;

  (2) (a) start with Mach IR;
      (b) later it turns out that extensive portions of (macro)assembly code have to be instrumented (or, for example, C1/Interpreter)
      (c) implement MacroAssembler mitigations

  (3) start with MacroAssembler mitigations and be done with it
     * doesn't preclude gradual roll out across different subsystems

Mach IR instrumentation (#1/#2) is the safest variant, but it may require more work.

#3 is broadly applicable, but also riskier.

What I don't consider a viable option is C2 CFG instrumentation accompanied by numerous per-instruction mitigations scattered across the code base.

>>> I have made a prototype, what this might look like and it looks like
>>> this:
>>> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/
>>
>> Just one more comment: it's weird to see intel_jcc_erratum referenced
>> in shared code. You could #ifdef it for x86-only, but it's much better
>> to move the code to x86-specific location.
>
> Sure, I can move that to an x86 file and make it build only on x86_64.

Yes, sounds good. But let's agree on the general direction first.

Best regards,
Vladimir Ivanov

[1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l1666

[2] http://hg.openjdk.java.net/jdk/jdk/file/623722a6aeb9/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp

[3] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/macroAssembler_x86_(sin|cos|...).cpp

From vladimir.x.ivanov at oracle.com Mon Nov 25 16:05:26 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Nov 2019 19:05:26 +0300
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library
In-Reply-To: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com>
Message-ID: <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com>

Lutz,

Can you elaborate, please, how the patch fixes the problem?
Why did you decide to add the following guards? + if ((options() == NULL) || (strlen(options()) == 0)) { Best regards, Vladimir Ivanov On 25.11.2019 17:06, Schmidt, Lutz wrote: > Dear all, > > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis- library were not passed on. > > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending... > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ > > Thank you, > Lutz > From lutz.schmidt at sap.com Mon Nov 25 16:59:21 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 25 Nov 2019 16:59:21 +0000 Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library In-Reply-To: <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> Message-ID: Hi Vladimir, I'm happy to elaborate in more detail about the issue and the fix. For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]". Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed. But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-.so as well. That requires _option_buf to be filled every time a decode_env is constructed. Moving if (_optionsParsed) return; after the collect_options() calls heals this deficiency. I added the guards you question as additional "safety net". After looking at the code again I must admit the guards are not necessary. 
_option_buf can never be NULL and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like. Please let me know if there are any more questions to be answered. Thanks, Lutz ?On 25.11.19, 17:05, "Vladimir Ivanov" wrote: Lutz, Can you elaborate, please, how the patch fixes the problem? Why did you decide to add the following guards? + if ((options() == NULL) || (strlen(options()) == 0)) { Best regards, Vladimir Ivanov On 25.11.2019 17:06, Schmidt, Lutz wrote: > Dear all, > > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis- library were not passed on. > > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending... > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ > > Thank you, > Lutz > From vladimir.x.ivanov at oracle.com Mon Nov 25 18:14:43 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 25 Nov 2019 21:14:43 +0300 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com> Message-ID: Thanks for the clarifications, Sandhya! Some more questions inlined. > The bug happens like below: > > * User specifies -XX:UseAVX=3 on command line. > * On Skylake platform due to the following lines: > __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); > __ movl(rax, Address(rsi, 0)); > __ cmpl(rax, 0x50654); // If it is Skylake > __ jcc(Assembler::equal, legacy_setup); > The zmm registers are not saved/restored. 
> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features(): > int max_vector_size = 0; > if (UseSSE < 2) { > // Vectors (in XMM) are only supported with SSE2+ > // SSE is always 2 on x64. > max_vector_size = 0; > } else if (UseAVX == 0 || !os_supports_avx_vectors()) { > // 16 byte vectors (in XMM) are supported with SSE2+ > max_vector_size = 16; ====> This is the point where max_vector_size is set to 16 > } else else if (UseAVX == 1 || UseAVX == 2) { > ... > } > * And so we get UseAVX=3 and max_vector_size = 16. > > So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2. Unfortunately, I'm still confused :-( If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and -XX:UseAVX=2 aren't affected on Skylake as well? > On your question regarding why 8221092 needs the code you conditionally exclude: > This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instruction. Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3? Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid? Best regards, Vladimir Ivanov > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Friday, November 22, 2019 4:39 AM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > > Hi Sandhya, > >> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. >> >> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. >> This should automatically result in MaxVectorSize being set to 64 bytes. 
>> >> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. > > Please, elaborate how it happens and how legacy_setup affects it? > > Why 8221092 needs the code you conditionally exclude. > Why the following isn't enough? > > if (FLAG_IS_DEFAULT(UseAVX)) { > FLAG_SET_DEFAULT(UseAVX, use_avx_limit); > + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && > _stepping < 5) { > + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake > + } > } else if (UseAVX > use_avx_limit) { > > Best regards, > Vladimir Ivanov > From vladimir.x.ivanov at oracle.com Mon Nov 25 18:30:54 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 25 Nov 2019 21:30:54 +0300 Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library In-Reply-To: References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> Message-ID: <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com> Thanks for the clarifications, Lutz. So, I assume you have a typo in the patch then: + if ((options() == NULL) || (strlen(options()) == 0)) { + // We need to fill the options buffer for each newly created + // decode_env instance. The hsdis_* library looks for options + // in that buffer. + collect_options(Disassembler::pd_cpu_opts()); + collect_options(PrintAssemblyOptions); + } It performs collect_options() calls only if _option_buf is either NULL or "\0". Also, what about the following updates of instance members? if (strstr(options(), "print-raw")) { _print_raw = (strstr(options(), "xml") ? 2 : 1); } if (strstr(options(), "help")) { _print_help = true; } BTW should _print_help (along with _helpPrinted) be better turned into static member? Can we make _option_buf static as well? Or do we want to keep a defensive copy to pass into hsdis.so? 
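
For reference, the ordering issue Lutz described earlier in the thread can be modelled with a small toy class. This is a sketch under assumptions, not the actual disassembler code: `ToyDecodeEnv`, the `collect_before_guard` parameter and `reset()` are hypothetical names introduced only to compare the two orderings; the 512-byte buffer, the memset before processing and the static "already parsed" flag follow his description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Toy model of a per-instance option buffer combined with a one-time
// static parse flag; hypothetical, not HotSpot's decode_env.
class ToyDecodeEnv {
 public:
  ToyDecodeEnv(const char* user_opts, bool collect_before_guard) {
    std::memset(_option_buf, 0, sizeof(_option_buf));
    process_options(user_opts, collect_before_guard);
  }

  const char* options() const { return _option_buf; }

  // Test helper (hypothetical): forget that options were parsed before.
  static void reset() { _options_parsed = false; }

 private:
  // Appends p to the comma-separated option buffer.
  void collect_options(const char* p) {
    if (p == nullptr || p[0] == '\0') return;
    size_t used = std::strlen(_option_buf);
    std::snprintf(_option_buf + used, sizeof(_option_buf) - used,
                  "%s%s", used ? "," : "", p);
  }

  void process_options(const char* user_opts, bool collect_before_guard) {
    // Old ordering: the guard fires first, so every instance after the
    // first one keeps an empty buffer.
    if (!collect_before_guard && _options_parsed) return;

    collect_options(user_opts);

    // New ordering: the buffer is filled per instance, and only the
    // one-time derivation of static flags is skipped.
    if (_options_parsed) return;
    _options_parsed = true;
  }

  char _option_buf[512];
  static bool _options_parsed;
};

bool ToyDecodeEnv::_options_parsed = false;

// Returns true if a second instance still sees the user's options.
bool second_instance_sees_options(bool collect_before_guard) {
  ToyDecodeEnv::reset();
  ToyDecodeEnv first("intel", collect_before_guard);
  ToyDecodeEnv second("intel", collect_before_guard);
  return std::strcmp(second.options(), "intel") == 0;
}
```

In this model the second instance ends up with an empty buffer when the guard comes first, and with the full option string once the guard is moved after the collect_options() call.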
As an alternative approach to fix the bug, we could create a golden copy during parsing instead and then just copy it to _option_buf as part of decode_env initialization. Best regards, Vladimir Ivanov On 25.11.2019 19:59, Schmidt, Lutz wrote: > Hi Vladimir, > > I'm happy to elaborate in more detail about the issue and the fix. > > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]". > > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed. > > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-.so as well. That requires _option_buf to be filled every time a decode_env is constructed. > > Moving > if (_optionsParsed) return; > after the collect_options() calls heals this deficiency. > > I added the guards you question as additional "safety net". After looking at the code again I must admit the guards are not necessary. _option_buf can never be NULL and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like. > > Please let me know if there are any more questions to be answered. > > Thanks, > Lutz > > > ?On 25.11.19, 17:05, "Vladimir Ivanov" wrote: > > Lutz, > > Can you elaborate, please, how the patch fixes the problem? > > Why did you decide to add the following guards? > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > Best regards, > Vladimir Ivanov > > On 25.11.2019 17:06, Schmidt, Lutz wrote: > > Dear all, > > > > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis- library were not passed on. 
> >
> > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending...
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583
> > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/
> >
> > Thank you,
> > Lutz
> >

From sandhya.viswanathan at intel.com Mon Nov 25 18:44:48 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Mon, 25 Nov 2019 18:44:48 +0000
Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
In-Reply-To:
References: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com>
Message-ID:

Hi Vladimir,

From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level.
If AVX=3, it checks if save/restore of zmm worked properly.
If AVX=1 or 2, it checks if save/restore of ymm worked properly.
Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3. And that is what this patch is trying to fix.

For the rest, we are entering the 8221092 discussion.
I think Vivek's intent there was to not execute any AVX 512 instructions at all if AVX < 3, to overcome performance regressions observed by Scott Oaks.

Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Ivanov
Sent: Monday, November 25, 2019 10:15 AM
To: Viswanathan, Sandhya ; Vladimir Kozlov
Cc: hotspot compiler
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092

Thanks for the clarifications, Sandhya!

Some more questions inlined.

> The bug happens like below:
>
> * User specifies -XX:UseAVX=3 on command line.
> * On Skylake platform due to the following lines:
>     __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>     __ movl(rax, Address(rsi, 0));
>     __ cmpl(rax, 0x50654); // If it is Skylake
>     __ jcc(Assembler::equal, legacy_setup);
>   The zmm registers are not saved/restored.
> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features(): > int max_vector_size = 0; > if (UseSSE < 2) { > // Vectors (in XMM) are only supported with SSE2+ > // SSE is always 2 on x64. > max_vector_size = 0; > } else if (UseAVX == 0 || !os_supports_avx_vectors()) { > // 16 byte vectors (in XMM) are supported with SSE2+ > max_vector_size = 16; ====> This is the point where max_vector_size is set to 16 > } else else if (UseAVX == 1 || UseAVX == 2) { > ... > } > * And so we get UseAVX=3 and max_vector_size = 16. > > So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2. Unfortunately, I'm still confused :-( If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and -XX:UseAVX=2 aren't affected on Skylake as well? > On your question regarding why 8221092 needs the code you conditionally exclude: > This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instruction. Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3? Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid? Best regards, Vladimir Ivanov > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Friday, November 22, 2019 4:39 AM > To: Viswanathan, Sandhya ; hotspot > compiler > Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when > UseAVX=3 is specified after JDK-8221092 > > Hi Sandhya, > >> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. >> >> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. >> This should automatically result in MaxVectorSize being set to 64 bytes. 
>> >> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. > > Please, elaborate how it happens and how legacy_setup affects it? > > Why 8221092 needs the code you conditionally exclude. > Why the following isn't enough? > > if (FLAG_IS_DEFAULT(UseAVX)) { > FLAG_SET_DEFAULT(UseAVX, use_avx_limit); > + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && > _stepping < 5) { > + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake > + } > } else if (UseAVX > use_avx_limit) { > > Best regards, > Vladimir Ivanov > From lutz.schmidt at sap.com Mon Nov 25 20:03:34 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 25 Nov 2019 20:03:34 +0000 Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library In-Reply-To: <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com> References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com> Message-ID: <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com> Hi Vladimir, you are welcome. And you are right: the condition should read if ((options() != NULL) && (strlen(options()) == 0)) { to be correct. I suggest we end this discussion and I just remove these checks. All your other comments are valid. There is an open bug to address and improve the very basic options parsing: https://bugs.openjdk.java.net/browse/JDK-8223765 This task was split off from JDK-8213084. I would like to cover the improvements you suggest when working on that bug. To make _print_raw work correctly, I suggest to just move if (_optionsParsed) return; a bit further down. The help text should be printed only once anyway. Here is a new webrev iteration. It reflects what I suggest: https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/ Thanks, Lutz ?On 25.11.19, 19:30, "Vladimir Ivanov" wrote: Thanks for the clarifications, Lutz. 
So, I assume you have a typo in the patch then: + if ((options() == NULL) || (strlen(options()) == 0)) { + // We need to fill the options buffer for each newly created + // decode_env instance. The hsdis_* library looks for options + // in that buffer. + collect_options(Disassembler::pd_cpu_opts()); + collect_options(PrintAssemblyOptions); + } It performs collect_options() calls only if _option_buf is either NULL or "\0". Also, what about the following updates of instance members? if (strstr(options(), "print-raw")) { _print_raw = (strstr(options(), "xml") ? 2 : 1); } if (strstr(options(), "help")) { _print_help = true; } BTW should _print_help (along with _helpPrinted) be better turned into static member? Can we make _option_buf static as well? Or do we want to keep a defensive copy to pass into hsdis.so? As an alternative approach to fix the bug, we could create a golden copy during parsing instead and then just copy it to _option_buf as part of decode_env initialization. Best regards, Vladimir Ivanov On 25.11.2019 19:59, Schmidt, Lutz wrote: > Hi Vladimir, > > I'm happy to elaborate in more detail about the issue and the fix. > > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]". > > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed. > > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-.so as well. That requires _option_buf to be filled every time a decode_env is constructed. > > Moving > if (_optionsParsed) return; > after the collect_options() calls heals this deficiency. > > I added the guards you question as additional "safety net". 
After looking at the code again I must admit the guards are not necessary. _option_buf can never be NULL and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like. > > Please let me know if there are any more questions to be answered. > > Thanks, > Lutz > > > On 25.11.19, 17:05, "Vladimir Ivanov" wrote: > > Lutz, > > Can you elaborate, please, how the patch fixes the problem? > > Why did you decide to add the following guards? > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > Best regards, > Vladimir Ivanov > > On 25.11.2019 17:06, Schmidt, Lutz wrote: > > Dear all, > > > > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis- library were not passed on. > > > > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending... > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583 > > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ > > > > Thank you, > > Lutz > > > > From vladimir.x.ivanov at oracle.com Mon Nov 25 20:23:04 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 25 Nov 2019 23:23:04 +0300 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com> Message-ID: > From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level. > If AVX=3, it checks if save/restore of zmm worked properly. > If AVX=1 or 2, it checks is save/restore of ymm worked properly. > Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3. And that is what this patch is trying to fix. 
Ok, now I see how it works: os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features: static bool supports_avx() { return (_features & CPU_AVX) != 0; } static bool supports_evex() { return (_features & CPU_AVX512F) != 0; } But VM_Version::get_processor_features() clears detected features depending on UseAVX level. So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2. > The rest we are entering 8221092 discussion. > I think Vivek's intent there was not do any AVX 512 instructions at all if AVX < 3 to overcome performance regressions observed by Scott Oaks. My best guess is it helped analysis by eliminating all problematic instructions. Anyway, I'm fine with leaving that code. (Though it would be nice to simply get rid of it.) It looks like all the code after start_simd_check (and checks to legacy_save_restore) can go under use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and FLAG_IS_DEFAULT(UseAVX) you added). Otherwise, looks good. Best regards, Vladimir Ivanov > -----Original Message----- > From: Vladimir Ivanov > Sent: Monday, November 25, 2019 10:15 AM > To: Viswanathan, Sandhya ; Vladimir Kozlov > Cc: hotspot compiler > Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > > Thanks for the clarifications, Sandhya! > > Some more questions inlined. >> The bug happens like below: >> >> * User specifies -XX:UseAVX=3 on command line. >> * On Skylake platform due to the following lines: >> __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); >> __ movl(rax, Address(rsi, 0)); >> __ cmpl(rax, 0x50654); // If it is Skylake >> __ jcc(Assembler::equal, legacy_setup); >> The zmm registers are not saved/restored. 
>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features(): >> int max_vector_size = 0; >> if (UseSSE < 2) { >> // Vectors (in XMM) are only supported with SSE2+ >> // SSE is always 2 on x64. >> max_vector_size = 0; >> } else if (UseAVX == 0 || !os_supports_avx_vectors()) { >> // 16 byte vectors (in XMM) are supported with SSE2+ >> max_vector_size = 16; ====> This is the point where max_vector_size is set to 16 >> } else else if (UseAVX == 1 || UseAVX == 2) { >> ... >> } >> * And so we get UseAVX=3 and max_vector_size = 16. >> >> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2. > > Unfortunately, I'm still confused :-( > > If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and > -XX:UseAVX=2 aren't affected on Skylake as well? > >> On your question regarding why 8221092 needs the code you conditionally exclude: >> This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instruction. > > Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3? > > Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid? > > Best regards, > Vladimir Ivanov > >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Friday, November 22, 2019 4:39 AM >> To: Viswanathan, Sandhya ; hotspot >> compiler >> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >> UseAVX=3 is specified after JDK-8221092 >> >> Hi Sandhya, >> >>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. >>> >>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. 
>>> This should automatically result in MaxVectorSize being set to 64 bytes. >>> >>> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. >> >> Please, elaborate how it happens and how legacy_setup affects it? >> >> Why 8221092 needs the code you conditionally exclude. >> Why the following isn't enough? >> >> if (FLAG_IS_DEFAULT(UseAVX)) { >> FLAG_SET_DEFAULT(UseAVX, use_avx_limit); >> + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && >> _stepping < 5) { >> + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake >> + } >> } else if (UseAVX > use_avx_limit) { >> >> Best regards, >> Vladimir Ivanov >> From vladimir.x.ivanov at oracle.com Mon Nov 25 20:26:16 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 25 Nov 2019 23:26:16 +0300 Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library In-Reply-To: <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com> References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com> <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com> Message-ID: <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com> > All your other comments are valid. There is an open bug to address and improve the very basic options parsing: https://bugs.openjdk.java.net/browse/JDK-8223765 This task was split off from JDK-8213084. > > I would like to cover the improvements you suggest when working on that bug. To make _print_raw work correctly, I suggest to just move > if (_optionsParsed) return; > a bit further down. The help text should be printed only once anyway. Ok, I'm fine with addressing it later. And thanks for taking care of it. > Here is a new webrev iteration. It reflects what I suggest: https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/ Looks good. 
Best regards, Vladimir Ivanov > ?On 25.11.19, 19:30, "Vladimir Ivanov" wrote: > > Thanks for the clarifications, Lutz. > > So, I assume you have a typo in the patch then: > > + if ((options() == NULL) || (strlen(options()) == 0)) { > + // We need to fill the options buffer for each newly created > + // decode_env instance. The hsdis_* library looks for options > + // in that buffer. > + collect_options(Disassembler::pd_cpu_opts()); > + collect_options(PrintAssemblyOptions); > + } > > It performs collect_options() calls only if _option_buf is either NULL > or "\0". > > Also, what about the following updates of instance members? > > if (strstr(options(), "print-raw")) { > _print_raw = (strstr(options(), "xml") ? 2 : 1); > } > > if (strstr(options(), "help")) { > _print_help = true; > } > > BTW should _print_help (along with _helpPrinted) be better turned into > static member? > > Can we make _option_buf static as well? Or do we want to keep a > defensive copy to pass into hsdis.so? > > As an alternative approach to fix the bug, we could create a golden copy > during parsing instead and then just copy it to _option_buf as part of > decode_env initialization. > > Best regards, > Vladimir Ivanov > > On 25.11.2019 19:59, Schmidt, Lutz wrote: > > Hi Vladimir, > > > > I'm happy to elaborate in more detail about the issue and the fix. > > > > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]". > > > > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed. > > > > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-.so as well. 
That requires _option_buf to be filled every time a decode_env is constructed. > > > > Moving > > if (_optionsParsed) return; > > after the collect_options() calls heals this deficiency. > > > > I added the guards you question as additional "safety net". After looking at the code again I must admit the guards are not necessary. _option_buf can never be NULL and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like. > > > > Please let me know if there are any more questions to be answered. > > > > Thanks, > > Lutz > > > > > > On 25.11.19, 17:05, "Vladimir Ivanov" wrote: > > > > Lutz, > > > > Can you elaborate, please, how the patch fixes the problem? > > > > Why did you decide to add the following guards? > > > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > > > Best regards, > > Vladimir Ivanov > > > > On 25.11.2019 17:06, Schmidt, Lutz wrote: > > > Dear all, > > > > > > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis- library were not passed on. > > > > > > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending... 
> > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583 > > > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ > > > > > > Thank you, > > > Lutz > > > > > > > > > From lutz.schmidt at sap.com Mon Nov 25 20:47:57 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 25 Nov 2019 20:47:57 +0000 Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library In-Reply-To: <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com> References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com> <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com> <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com> Message-ID: <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com> Thanks for the review, Vladimir! Still one to go. Regards, Lutz ?On 25.11.19, 21:26, "Vladimir Ivanov" wrote: > All your other comments are valid. There is an open bug to address and improve the very basic options parsing: https://bugs.openjdk.java.net/browse/JDK-8223765 This task was split off from JDK-8213084. > > I would like to cover the improvements you suggest when working on that bug. To make _print_raw work correctly, I suggest to just move > if (_optionsParsed) return; > a bit further down. The help text should be printed only once anyway. Ok, I'm fine with addressing it later. And thanks for taking care of it. > Here is a new webrev iteration. It reflects what I suggest: https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/ Looks good. Best regards, Vladimir Ivanov > On 25.11.19, 19:30, "Vladimir Ivanov" wrote: > > Thanks for the clarifications, Lutz. > > So, I assume you have a typo in the patch then: > > + if ((options() == NULL) || (strlen(options()) == 0)) { > + // We need to fill the options buffer for each newly created > + // decode_env instance. The hsdis_* library looks for options > + // in that buffer. 
> + collect_options(Disassembler::pd_cpu_opts()); > + collect_options(PrintAssemblyOptions); > + } > > It performs collect_options() calls only if _option_buf is either NULL > or "\0". > > Also, what about the following updates of instance members? > > if (strstr(options(), "print-raw")) { > _print_raw = (strstr(options(), "xml") ? 2 : 1); > } > > if (strstr(options(), "help")) { > _print_help = true; > } > > BTW should _print_help (along with _helpPrinted) be better turned into > static member? > > Can we make _option_buf static as well? Or do we want to keep a > defensive copy to pass into hsdis.so? > > As an alternative approach to fix the bug, we could create a golden copy > during parsing instead and then just copy it to _option_buf as part of > decode_env initialization. > > Best regards, > Vladimir Ivanov > > On 25.11.2019 19:59, Schmidt, Lutz wrote: > > Hi Vladimir, > > > > I'm happy to elaborate in more detail about the issue and the fix. > > > > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]". > > > > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed. > > > > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-.so as well. That requires _option_buf to be filled every time a decode_env is constructed. > > > > Moving > > if (_optionsParsed) return; > > after the collect_options() calls heals this deficiency. > > > > I added the guards you question as additional "safety net". After looking at the code again I must admit the guards are not necessary. 
_option_buf can never be NULL and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like. > > > > Please let me know if there are any more questions to be answered. > > > > Thanks, > > Lutz > > > > > > On 25.11.19, 17:05, "Vladimir Ivanov" wrote: > > > > Lutz, > > > > Can you elaborate, please, how the patch fixes the problem? > > > > Why did you decide to add the following guards? > > > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > > > Best regards, > > Vladimir Ivanov > > > > On 25.11.2019 17:06, Schmidt, Lutz wrote: > > > Dear all, > > > > > > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis- library were not passed on. > > > > > > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending... > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583 > > > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ > > > > > > Thank you, > > > Lutz > > > > > > > > > From sandhya.viswanathan at intel.com Mon Nov 25 23:31:42 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 25 Nov 2019 23:31:42 +0000 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com> Message-ID: Hi Vladimir, Please find below updated webrev with your comments implemented: http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/ Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Monday, November 25, 2019 12:23 PM To: Viswanathan, Sandhya ; Vladimir Kozlov Cc: hotspot compiler Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX 
level. > If AVX=3, it checks if save/restore of zmm worked properly. > If AVX=1 or 2, it checks is save/restore of ymm worked properly. > Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3. And that is what this patch is trying to fix. Ok, now I see how it works: os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features: static bool supports_avx() { return (_features & CPU_AVX) != 0; } static bool supports_evex() { return (_features & CPU_AVX512F) != 0; } But VM_Version::get_processor_features() clears detected features depending on UseAVX level. So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2. > The rest we are entering 8221092 discussion. > I think Vivek's intent there was not do any AVX 512 instructions at all if AVX < 3 to overcome performance regressions observed by Scott Oaks. My best guess is it helped analysis by eliminating all problematic instructions. Anyway, I'm fine with leaving that code. (Though it would be nice to simply get rid of it.) It looks like all the code after start_simd_check (and checks to legacy_save_restore) can go under use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and FLAG_IS_DEFAULT(UseAVX) you added). Otherwise, looks good. Best regards, Vladimir Ivanov > -----Original Message----- > From: Vladimir Ivanov > Sent: Monday, November 25, 2019 10:15 AM > To: Viswanathan, Sandhya ; Vladimir > Kozlov > Cc: hotspot compiler > Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when > UseAVX=3 is specified after JDK-8221092 > > Thanks for the clarifications, Sandhya! > > Some more questions inlined. >> The bug happens like below: >> >> * User specifies -XX:UseAVX=3 on command line. 
>> * On Skylake platform due to the following lines: >> __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); >> __ movl(rax, Address(rsi, 0)); >> __ cmpl(rax, 0x50654); // If it is Skylake >> __ jcc(Assembler::equal, legacy_setup); >> The zmm registers are not saved/restored. >> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features(): >> int max_vector_size = 0; >> if (UseSSE < 2) { >> // Vectors (in XMM) are only supported with SSE2+ >> // SSE is always 2 on x64. >> max_vector_size = 0; >> } else if (UseAVX == 0 || !os_supports_avx_vectors()) { >> // 16 byte vectors (in XMM) are supported with SSE2+ >> max_vector_size = 16; ====> This is the point where max_vector_size is set to 16 >> } else else if (UseAVX == 1 || UseAVX == 2) { >> ... >> } >> * And so we get UseAVX=3 and max_vector_size = 16. >> >> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2. > > Unfortunately, I'm still confused :-( > > If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and > -XX:UseAVX=2 aren't affected on Skylake as well? > >> On your question regarding why 8221092 needs the code you conditionally exclude: >> This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instruction. > > Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3? > > Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid? 
> > Best regards, > Vladimir Ivanov > >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Friday, November 22, 2019 4:39 AM >> To: Viswanathan, Sandhya ; hotspot >> compiler >> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >> UseAVX=3 is specified after JDK-8221092 >> >> Hi Sandhya, >> >>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. >>> >>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. >>> This should automatically result in MaxVectorSize being set to 64 bytes. >>> >>> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. >> >> Please, elaborate how it happens and how legacy_setup affects it? >> >> Why 8221092 needs the code you conditionally exclude. >> Why the following isn't enough? >> >> if (FLAG_IS_DEFAULT(UseAVX)) { >> FLAG_SET_DEFAULT(UseAVX, use_avx_limit); >> + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && >> _stepping < 5) { >> + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake >> + } >> } else if (UseAVX > use_avx_limit) { >> >> Best regards, >> Vladimir Ivanov >> From sandhya.viswanathan at intel.com Tue Nov 26 00:08:55 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 26 Nov 2019 00:08:55 +0000 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> Message-ID: Could we please get one more review of this patch? 
Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Ivanov
Sent: Tuesday, November 19, 2019 6:31 AM
To: hotspot compiler
Cc: Bhateja, Jatin; Viswanathan, Sandhya
Subject: [14] RFR (L): 8234391: C2: Generic vector operands

http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8234391

Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) to generic ones.

(It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD instruction merges will be handled separately.)

On a high-level it is organized as follows:

(1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec;

(2) at runtime, right after matching is over, a special pass is performed which does:

  * replaces vecOper with vec[SDXYZ] depending on mach node type
    - vector mach nodes capture bottom_type() of their ideal prototype;

  * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec)
    - matcher needs them, but they are useless for register allocator (moreover, may cause additional spills);

(3) after post-selection pass is over, all mach nodes should have fixed-size vector operands.
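
To make the operand replacement in step (2) concrete, here is a minimal standalone sketch. It is not the HotSpot code from the webrev: GenericOperand, size_suffix and specialize are hypothetical names, and the operand is modelled as a plain string. The only fact assumed is that the S, D, X, Y and Z suffixes stand for 4-, 8-, 16-, 32- and 64-byte vectors.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the two generic operand classes, "vec" and "legVec".
enum class GenericOperand { Vec, LegVec };

// Maps a vector size in bytes to the suffix of the fixed-size operand:
// 4 -> S, 8 -> D, 16 -> X, 32 -> Y, 64 -> Z.
inline char size_suffix(int length_in_bytes) {
  switch (length_in_bytes) {
    case 4:  return 'S';
    case 8:  return 'D';
    case 16: return 'X';
    case 32: return 'Y';
    case 64: return 'Z';
    default:
      assert(false && "unsupported vector size");
      return '?';
  }
}

// Post-selection rewrite of a single operand:
//   vec    + size => vec[SDXYZ]
//   legVec + size => legVec[SDXYZ]
inline std::string specialize(GenericOperand opnd, int length_in_bytes) {
  std::string name = (opnd == GenericOperand::Vec) ? "vec" : "legVec";
  name += size_suffix(length_in_bytes);
  return name;
}
```

In this sketch, specialize(GenericOperand::LegVec, 32) yields "legVecY". The real pass works on MachOper objects and takes the size from the node's bottom_type() rather than from an integer argument.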

Some details:

(1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works

(2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) which is enabled only on x86

(3) post-selection analysis is implemented as a single pass over the graph and processing individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() (which is an instance of TypeVect)

(4) most of the analysis is cross-platform and interfaces with platform-specific code through 3 methods:

  static bool is_generic_reg2reg_move(MachNode* m);
  // distinguishes MoveVec2Leg/MoveLeg2Vec nodes

  static bool is_generic_vector(MachOper* opnd);
  // distinguishes vec/legVec operands

  static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg);
  // constructs fixed-sized vector operand based on ideal reg
  //   vec    + Op_Vec[SDXYZ] => vec[SDXYZ]
  //   legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]

(5) TEMP operands are handled specially:
  - TEMP uses max_vector_size() to determine what fixed-sized operand to use
    * it is needed to cover reductions which don't produce vectors but scalars
  - TEMP_DEF inherits fixed-sized operand type from DEF;

(6) there is a limited number of special cases for mach nodes in Matcher::get_vector_operand_helper:
  - LShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not ideal_reg().
  - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type.

(7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see Matcher::get_vector_regmask)

(8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands

(9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD
  - it aligns the code between x86_64.ad and x86_32.ad
  - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0)

Contributed-by: Jatin Bhateja
Reviewed-by: vlivanov, sviswanathan, ?

Testing: tier1-tier4, jtreg compiler tests on KNL and SKL, performance testing (SPEC* + Octane + micros / G1 + ParGC).

Best regards,
Vladimir Ivanov

[1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html
[2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf

From nick.gasson at arm.com Tue Nov 26 09:25:03 2019
From: nick.gasson at arm.com (Nick Gasson)
Date: Tue, 26 Nov 2019 17:25:03 +0800
Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable
In-Reply-To: <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com>
Message-ID:

Hi Andrew,

>
> I see
>
>   if (use_XOR_for_compressed_class_base) {
>     if (CompressedKlassPointers::shift() != 0) {
>       eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>       lsr(dst, dst, LogKlassAlignmentInBytes);
>     } else {
>       eor(dst, src, (uint64_t)CompressedKlassPointers::base());
>     }
>     return;
>   }
>
>   if (((uint64_t)CompressedKlassPointers::base() & 0xffffffff) == 0
>       && CompressedKlassPointers::shift() == 0) {
>     movw(dst, src);
>     return;
>   }
>
> ... followed by code which does use r27.
>
> Do you ever see r27 being used? If so, I'd be interested to know how
> this gets triggered and what command-line arguments you use.
> It's rather inefficient.
>

Oddly enough the test case runtime/memory/ReadFromNoaccessArea.java now hits this. I see:

  CompressedKlassPointers::base() => 0xffff0b4b5000
  CompressedKlassPointers::shift() => 3

The itable stub calls MacroAssembler::load_klass() twice which then calls the above decode_klass_not_null() with dst==src if UseCompressedClassPointers is true. So we do the saving/restoring rheapbase dance twice which blows up the size of the itable stub beyond the estimated 152B max size.

The key is that this test passes -XX:HeapBaseMinAddress=33G. That in conjunction with the recent changes to where the CDS archive is loaded hits this code path (I don't see this with -Xshare:off).

Thanks,
Nick

From lutz.schmidt at sap.com Tue Nov 26 09:25:33 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 26 Nov 2019 09:25:33 +0000
Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library
In-Reply-To:
References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com> <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com> <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com> <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com>
Message-ID: <8F130FCB-4AA3-431F-A1D8-718EF80D010F@sap.com>

Thank you, Jean-Philippe, for checking once again.

Regards,
Lutz

From: Jean-Philippe BEMPEL
Date: Tuesday, 26. November 2019 at 10:19
To: Lutz Schmidt
Cc: Vladimir Ivanov, "hotspot-compiler-dev at openjdk.java.net"
Subject: Re: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library

Hello,

With last review, still works for me.

Thanks

On Mon, Nov 25, 2019 at 9:47 PM Schmidt, Lutz wrote:

Thanks for the review, Vladimir! Still one to go.

Regards,
Lutz

On 25.11.19, 21:26, "Vladimir Ivanov" wrote:

> All your other comments are valid.
There is an open bug to address and improve the very basic options parsing: https://bugs.openjdk.java.net/browse/JDK-8223765 This task was split off from JDK-8213084. > > I would like to cover the improvements you suggest when working on that bug. To make _print_raw work correctly, I suggest to just move > if (_optionsParsed) return; > a bit further down. The help text should be printed only once anyway. Ok, I'm fine with addressing it later. And thanks for taking care of it. > Here is a new webrev iteration. It reflects what I suggest: https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/ Looks good. Best regards, Vladimir Ivanov > On 25.11.19, 19:30, "Vladimir Ivanov" > wrote: > > Thanks for the clarifications, Lutz. > > So, I assume you have a typo in the patch then: > > + if ((options() == NULL) || (strlen(options()) == 0)) { > + // We need to fill the options buffer for each newly created > + // decode_env instance. The hsdis_* library looks for options > + // in that buffer. > + collect_options(Disassembler::pd_cpu_opts()); > + collect_options(PrintAssemblyOptions); > + } > > It performs collect_options() calls only if _option_buf is either NULL > or "\0". > > Also, what about the following updates of instance members? > > if (strstr(options(), "print-raw")) { > _print_raw = (strstr(options(), "xml") ? 2 : 1); > } > > if (strstr(options(), "help")) { > _print_help = true; > } > > BTW should _print_help (along with _helpPrinted) be better turned into > static member? > > Can we make _option_buf static as well? Or do we want to keep a > defensive copy to pass into hsdis.so? > > As an alternative approach to fix the bug, we could create a golden copy > during parsing instead and then just copy it to _option_buf as part of > decode_env initialization. > > Best regards, > Vladimir Ivanov > > On 25.11.2019 19:59, Schmidt, Lutz wrote: > > Hi Vladimir, > > > > I'm happy to elaborate in more detail about the issue and the fix. 
> > > > For each decode_env instance which is constructed, process_options() is called. It collects the disassembly options from various sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing them in the private member "char _option_buf[512]". > > > > Further processing derives static flag settings from these options. Being static, these flags need to be set only once, not every time a decode_env is constructed. > > > > But that's just one part of the story. It was not taken into account that _option_buf is passed to and analyzed by hsdis-.so as well. That requires _option_buf to be filled every time a decode_env is constructed. > > > > Moving > > if (_optionsParsed) return; > > after the collect_options() calls heals this deficiency. > > > > I added the guards you question as additional "safety net". After looking at the code again I must admit the guards are not necessary. _option_buf can never be NULL and every invocation of process_options() is directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can remove the guards if you like. > > > > Please let me know if there are any more questions to be answered. > > > > Thanks, > > Lutz > > > > > > On 25.11.19, 17:05, "Vladimir Ivanov" > wrote: > > > > Lutz, > > > > Can you elaborate, please, how the patch fixes the problem? > > > > Why did you decide to add the following guards? > > > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > > > Best regards, > > Vladimir Ivanov > > > > On 25.11.2019 17:06, Schmidt, Lutz wrote: > > > Dear all, > > > > > > may I please request reviews for this small change, fixing a regression in the disassembler. Parameters to the hsdis- library were not passed on. > > > > > > The change was verified to fix the issue by the reporter (Jean-Philippe Bempel, on CC:). jdk/submit tests pending... 
> > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583 > > > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ > > > > > > Thank you, > > > Lutz > > > > > > > > > From vladimir.x.ivanov at oracle.com Tue Nov 26 09:26:20 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 26 Nov 2019 12:26:20 +0300 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com> Message-ID: > http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/ Looks good. Testing results (hs-precheckin-comp,hs-tier1,hs-tier2) are clean. Best regards, Vladimir Ivanov > -----Original Message----- > From: Vladimir Ivanov > Sent: Monday, November 25, 2019 12:23 PM > To: Viswanathan, Sandhya ; Vladimir Kozlov > Cc: hotspot compiler > Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > > >> From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level. >> If AVX=3, it checks if save/restore of zmm worked properly. >> If AVX=1 or 2, it checks is save/restore of ymm worked properly. >> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3. And that is what this patch is trying to fix. > > Ok, now I see how it works: > > os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features: > > static bool supports_avx() { return (_features & CPU_AVX) != 0; } > > static bool supports_evex() { return (_features & CPU_AVX512F) != > 0; } > > But VM_Version::get_processor_features() clears detected features depending on UseAVX level. > > So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2. > >> The rest we are entering 8221092 discussion. 
>> I think Vivek's intent there was not do any AVX 512 instructions at all if AVX < 3 to overcome performance regressions observed by Scott Oaks. > > My best guess is it helped analysis by eliminating all problematic instructions. > > Anyway, I'm fine with leaving that code. (Though it would be nice to simply get rid of it.) > > It looks like all the code after start_simd_check (and checks to > legacy_save_restore) can go under use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and FLAG_IS_DEFAULT(UseAVX) you added). > > Otherwise, looks good. > > Best regards, > Vladimir Ivanov > >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Monday, November 25, 2019 10:15 AM >> To: Viswanathan, Sandhya ; Vladimir >> Kozlov >> Cc: hotspot compiler >> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >> UseAVX=3 is specified after JDK-8221092 >> >> Thanks for the clarifications, Sandhya! >> >> Some more questions inlined. >>> The bug happens like below: >>> >>> * User specifies -XX:UseAVX=3 on command line. >>> * On Skylake platform due to the following lines: >>> __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); >>> __ movl(rax, Address(rsi, 0)); >>> __ cmpl(rax, 0x50654); // If it is Skylake >>> __ jcc(Assembler::equal, legacy_setup); >>> The zmm registers are not saved/restored. >>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features(): >>> int max_vector_size = 0; >>> if (UseSSE < 2) { >>> // Vectors (in XMM) are only supported with SSE2+ >>> // SSE is always 2 on x64. >>> max_vector_size = 0; >>> } else if (UseAVX == 0 || !os_supports_avx_vectors()) { >>> // 16 byte vectors (in XMM) are supported with SSE2+ >>> max_vector_size = 16; ====> This is the point where max_vector_size is set to 16 >>> } else else if (UseAVX == 1 || UseAVX == 2) { >>> ... 
>>> } >>> * And so we get UseAVX=3 and max_vector_size = 16. >>> >>> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2. >> >> Unfortunately, I'm still confused :-( >> >> If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and >> -XX:UseAVX=2 aren't affected on Skylake as well? >> >>> On your question regarding why 8221092 needs the code you conditionally exclude: >>> This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instruction. >> >> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3? >> >> Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid? >> >> Best regards, >> Vladimir Ivanov >> >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Friday, November 22, 2019 4:39 AM >>> To: Viswanathan, Sandhya ; hotspot >>> compiler >>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >>> UseAVX=3 is specified after JDK-8221092 >>> >>> Hi Sandhya, >>> >>>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. >>>> >>>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. >>>> This should automatically result in MaxVectorSize being set to 64 bytes. >>>> >>>> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. >>> >>> Please, elaborate how it happens and how legacy_setup affects it? >>> >>> Why 8221092 needs the code you conditionally exclude. >>> Why the following isn't enough? 
>>> >>> if (FLAG_IS_DEFAULT(UseAVX)) { >>> FLAG_SET_DEFAULT(UseAVX, use_avx_limit); >>> + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && >>> _stepping < 5) { >>> + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake >>> + } >>> } else if (UseAVX > use_avx_limit) { >>> >>> Best regards, >>> Vladimir Ivanov >>> From claes.redestad at oracle.com Tue Nov 26 09:44:36 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 26 Nov 2019 10:44:36 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ Message-ID: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Hi, in various places in the hotspot we have custom code to calculate the next power of two, some of which have potential to go into an infinite loop in case of an overflow. This patch proposes adding next_power_of_two utility methods which avoid infinite loops on overflow, while providing slightly more efficient code in most cases. Bug: https://bugs.openjdk.java.net/browse/JDK-8234331 Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/ Testing: tier1-3 Thanks! /Claes From david.holmes at oracle.com Tue Nov 26 09:50:22 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 26 Nov 2019 19:50:22 +1000 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: Hi Claes, Just some high-level comments - should next_power_of_two be defined in globalDefinitions.hpp along side the related functionality ie is_power_of_two ? - can next_power_of_two build on the existing log2_* functions (or vice versa)? - do the existing ZUtils not cover the same general area? 

./share/gc/z/zUtils.inline.hpp

inline size_t ZUtils::round_up_power_of_2(size_t value) {
  assert(value != 0, "Invalid value");

  if (is_power_of_2(value)) {
    return value;
  }

  return (size_t)1 << (log2_intptr(value) + 1);
}

inline size_t ZUtils::round_down_power_of_2(size_t value) {
  assert(value != 0, "Invalid value");
  return (size_t)1 << log2_intptr(value);
}

Cheers,
David

On 26/11/2019 7:44 pm, Claes Redestad wrote:
> Hi,
>
> in various places in the hotspot we have custom code to calculate the
> next power of two, some of which have potential to go into an infinite
> loop in case of an overflow.
>
> This patch proposes adding next_power_of_two utility methods which
> avoid infinite loops on overflow, while providing slightly more
> efficient code in most cases.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes

From claes.redestad at oracle.com Tue Nov 26 10:06:29 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 26 Nov 2019 11:06:29 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+
In-Reply-To:
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
Message-ID: <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>

On 2019-11-26 10:50, David Holmes wrote:
> Hi Claes,
>
> Just some high-level comments
>
> - should next_power_of_two be defined in globalDefinitions.hpp along
> side the related functionality ie is_power_of_two ?

I thought we are trying to move things _out_ of globalDefinitions. I
agree align.hpp might not be the best place, either, though..

>
> - can next_power_of_two build on the existing log2_* functions (or vice
> versa)?

Yes, log2_intptr et al could probably be tamed to do a single step
operation, although we'd need to add 64-bit implementations in
count_leading_zeros. At least these log2_* functions already deal with
overflows without looping forever.
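
As a rough illustration of the idea discussed above (a fixed-step log2 built on a leading-zero count, with overflow rejected instead of looped on), here is a minimal standalone sketch. It is not the code from the webrev: clz64_sketch, log2_floor_sketch and round_up_pow2_sketch are hypothetical names, the leading-zero count is written portably rather than with a compiler intrinsic, and the rounding contract simply mirrors the ZUtils::round_up_power_of_2 quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Portable count of leading zero bits in a non-zero 64-bit value
// (a stand-in for a compiler intrinsic; the name is hypothetical).
inline unsigned clz64_sketch(uint64_t x) {
  assert(x != 0 && "undefined for zero");
  unsigned n = 0;
  if ((x >> 32) == 0) { n += 32; x <<= 32; }
  if ((x >> 48) == 0) { n += 16; x <<= 16; }
  if ((x >> 56) == 0) { n += 8;  x <<= 8;  }
  if ((x >> 60) == 0) { n += 4;  x <<= 4;  }
  if ((x >> 62) == 0) { n += 2;  x <<= 2;  }
  if ((x >> 63) == 0) { n += 1; }
  return n;
}

// Floor of log2(x) for non-zero x, computed in a fixed number of steps.
inline unsigned log2_floor_sketch(uint64_t x) {
  return 63u - clz64_sketch(x);
}

// Smallest power of two >= value. A value whose result would not fit in
// 64 bits trips an assertion up front; there is no loop that could spin
// forever on overflow.
inline uint64_t round_up_pow2_sketch(uint64_t value) {
  assert(value != 0 && "Invalid value");
  if ((value & (value - 1)) == 0) {
    return value;  // already a power of two
  }
  const unsigned log2 = log2_floor_sketch(value);
  assert(log2 < 63 && "result would not fit in 64 bits");
  return uint64_t(1) << (log2 + 1);
}
```

Because every step is a shift or a compare, the cost does not depend on the magnitude of the input, which is the property the custom loops mentioned in the RFR lack.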

>
> - do the existing ZUtils not cover the same general area?
>
> ./share/gc/z/zUtils.inline.hpp
>
> inline size_t ZUtils::round_up_power_of_2(size_t value) {
>   assert(value != 0, "Invalid value");
>
>   if (is_power_of_2(value)) {
>     return value;
>   }
>
>   return (size_t)1 << (log2_intptr(value) + 1);
> }
>
> inline size_t ZUtils::round_down_power_of_2(size_t value) {
>   assert(value != 0, "Invalid value");
>   return (size_t)1 << log2_intptr(value);
> }

round_up_power_of_2 is similar, but not identical (next_power_of_two
doesn't care if the value is already a power of 2, nor should it).

/Claes

From david.holmes at oracle.com Tue Nov 26 10:23:12 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Nov 2019 20:23:12 +1000
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+
In-Reply-To: <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>
Message-ID: <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com>

On 26/11/2019 8:06 pm, Claes Redestad wrote:
>
>
> On 2019-11-26 10:50, David Holmes wrote:
>> Hi Claes,
>>
>> Just some high-level comments
>>
>> - should next_power_of_two be defined in globalDefinitions.hpp along
>> side the related functionality ie is_power_of_two ?
>
> I thought we are trying to move things _out_ of globalDefinitions. I

We are? I don't recall hearing that. But wherever these go seems they
all belong together.

> agree align.hpp might not be the best place, either, though..

I thought align.hpp as strange place too. :)

>>
>> - can next_power_of_two build on the existing log2_* functions (or
>> vice versa)?
>
> Yes, log2_intptr et al could probably be tamed to do a single step
> operation, although we'd need to add 64-bit implementations in
> count_leading_zeros. At least these log2_* functions already deal with
> overflows without looping forever.
>
>>
>> - do the existing ZUtils not cover the same general area?
>>
>> ./share/gc/z/zUtils.inline.hpp
>>
>> inline size_t ZUtils::round_up_power_of_2(size_t value) {
>>   assert(value != 0, "Invalid value");
>>
>>   if (is_power_of_2(value)) {
>>     return value;
>>   }
>>
>>   return (size_t)1 << (log2_intptr(value) + 1);
>> }
>>
>> inline size_t ZUtils::round_down_power_of_2(size_t value) {
>>   assert(value != 0, "Invalid value");
>>   return (size_t)1 << log2_intptr(value);
>> }
>
> round_up_power_of_2 is similar, but not identical (next_power_of_two
> doesn't care if the value is already a power of 2, nor should it).

Okay but seems perhaps these should also be moved out of ZUtils and
co-located with the other "power of two" functions.

Cheers,
David
-----

> /Claes

From nils.eliasson at oracle.com Tue Nov 26 10:49:48 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 26 Nov 2019 11:49:48 +0100
Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles
In-Reply-To: <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com>
Message-ID: <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com>

Patch 2 was not the full patch. Use this instead:

http://cr.openjdk.java.net/~neliasso/8234520/webrev.03/

Regards,
Nils

On 2019-11-21 12:53, Nils Eliasson wrote:
> I updated this to version 2.
>
> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/
>
> I found a problem running
> compiler/arguments/TestStressReflectiveCode.java
>
> Even though the clone was created as an oop clone, the type node type
> returns isa_aryptr. This is caused by the src ptr not being the base
> pointer. Until I fix that I wanted a more robust test.
>
> In this webrev I split up the is_clonebasic into is_clone_oop and
> is_clone_array. (is_clone_oop_array is already there).
Having a > complete set with the three clone types allows for a robust test and > easy verification. (The three variants end up in different paths with > different GCs). > > Regards, > > Nils > > > On 2019-11-20 15:25, Nils Eliasson wrote: >> Hi, >> >> I found a few bugs after the enabling of the clone intrinsic in ZGC. >> >> 1) The arraycopy clone_basic has the parameters adjusted to work as a >> memcopy. For an oop the src is pointing inside the oop to where we >> want to start copying. But when we want to do a runtime call to clone >> - the parameters are supposed to be the actual src oop and dst oop, >> and the size should be the instance size. >> >> For now I have made a workaround. What should be done later is using >> the offset in the arraycopy node to encode where the payload is, so >> that the base pointers are always correct. But that would require >> changes to the BarrierSet classes of all GCs. So I leave that for >> next release. >> >> 2) The size parameter of the TypeFunc for the runtime call has the >> wrong type. It was originally Long but missed the upper Half, it was >> fixed to INT (JDK-8233834), but that is wrong and causes the compiles >> to be skipped. We didn't notice that since they failed silently. That >> is also why we didn't notice problem #1 too. >> >> https://bugs.openjdk.java.net/browse/JDK-8234520 >> >> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >> >> Please review! 
>> >> Nils >> From vladimir.x.ivanov at oracle.com Tue Nov 26 12:02:06 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 26 Nov 2019 15:02:06 +0300 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com> References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com> Message-ID: <58fd5565-4342-ea70-511d-bace68308391@oracle.com> > http://cr.openjdk.java.net/~neliasso/8234520/webrev.03/ ==================================================================== src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp: + // The currently modeled arraycopy-clone_basic doesn't have the base pointers for src and dst, + // rather point at the start of the payload. + Node* src_base = get_base_for_arracycopy_clone(phase, src); + Node* dst_base = get_base_for_arracycopy_clone(phase, dst); + + // The size must also be increased to match the instance size. + int base_off = BarrierSetC2::extract_base_offset(false); + Node* full_size = phase->transform_later(new AddLNode(size, phase->longcon(base_off >> LogBytesPerLong))); Node* const call = phase->make_leaf_call(ctrl, mem, clone_type(), ZBarrierSetRuntime::clone_addr(), "ZBarrierSetRuntime::clone", TypeRawPtr::BOTTOM, - src, - dst, - size); + src_base, + dst_base, + full_size, + phase->top()); Do you see any problems with copying object header? ==================================================================== The rest are minor comments: src/hotspot/share/gc/shared/c2/barrierSetC2.cpp: -void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* size, bool is_array) const { - // Exclude the header but include array length to copy by 8 bytes words. - // Can't use base_offset_in_bytes(bt) since basic type is unknown. +int BarrierSetC2::extract_base_offset(bool is_array) { ... 
+void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* size, bool is_array) const { + // Exclude the header but include array length to copy by 8 bytes words. + // Can't use base_offset_in_bytes(bt) since basic type is unknown. + int base_off = extract_base_offset(is_array); I'd leave the comment in BarrierSetC2::extract_base_offset(). After the refactoring it looks confusing in BarrierSetC2::clone(). Also, considering it's used from zBarrierSetC2.cpp, it would be nice to have a more descriptive name. BarrierSetC2::arraycopy_payload_base_offset(bool is_array) maybe? ==================================================================== src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp: +Node* get_base_for_arracycopy_clone(PhaseMacroExpand* phase, Node* n) { Should get_base_for_arracycopy_clone be static? Also, would be nice if the name reflects that it works on instances and not arrays. ==================================================================== + // The currently modeled arraycopy-clone_basic doesn't have the base pointers for src and dst, + // rather point at the start of the payload. + Node* src_base = get_base_for_arracycopy_clone(phase, src); + Node* dst_base = get_base_for_arracycopy_clone(phase, dst); Another thing: "base" in zBarrierSetC2.cpp and in BarrierSetC2 has opposite meaning which is confusing. Renaming src/dst<->src_base/dst_base in ZBarrierSetC2::clone_at_expansion() would improve things. ==================================================================== - if (src->bottom_type()->isa_aryptr()) { + if (ac->is_clone_array()) { // Clone primitive array Is the comment valid? Doesn't it cover object array case as well? Best regards, Vladimir Ivanov > > Regards, > > Nils > > > On 2019-11-21 12:53, Nils Eliasson wrote: >> I updated this to version 2. 
>> >> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >> >> I found a problen running >> compiler/arguments/TestStressReflectiveCode.java >> >> Even though the clone was created as a oop clone, the type node type >> returns isa_aryprt. This is caused by the src ptr not being the base >> pointer. Until I fix that I wanted a more robust test. >> >> In this webrev I split up the is_clonebasic into is_clone_oop and >> is_clone_array. (is_clone_oop_array is already there). Having a >> complete set with the three clone types allows for a robust test and >> easy verification. (The three variants end up in different paths with >> different GCs). >> >> Regards, >> >> Nils >> >> >> On 2019-11-20 15:25, Nils Eliasson wrote: >>> Hi, >>> >>> I found a few bugs after the enabling of the clone intrinsic in ZGC. >>> >>> 1) The arraycopy clone_basic has the parameters adjusted to work as a >>> memcopy. For an oop the src is pointing inside the oop to where we >>> want to start copying. But when we want to do a runtime call to clone >>> - the parameters are supposed to be the actual src oop and dst oop, >>> and the size should be the instance size. >>> >>> For now I have made a workaround. What should be done later is using >>> the offset in the arraycopy node to encode where the payload is, so >>> that the base pointers are always correct. But that would require >>> changes to the BarrierSet classes of all GCs. So I leave that for >>> next release. >>> >>> 2) The size parameter of the TypeFunc for the runtime call has the >>> wrong type. It was originally Long but missed the upper Half, it was >>> fixed to INT (JDK-8233834), but that is wrong and causes the compiles >>> to be skipped. We didn't notice that since they failed silently. That >>> is also why we didn't notice problem #1 too. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>> >>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>> >>> Please review! 
>>> >>> Nils >>> From per.liden at oracle.com Tue Nov 26 13:29:40 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 26 Nov 2019 14:29:40 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> Message-ID: <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com> Hi Nils, On 11/21/19 12:53 PM, Nils Eliasson wrote: > I updated this to version 2. > > http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ > > I found a problen running compiler/arguments/TestStressReflectiveCode.java > > Even though the clone was created as a oop clone, the type node type > returns isa_aryprt. This is caused by the src ptr not being the base > pointer. Until I fix that I wanted a more robust test. > > In this webrev I split up the is_clonebasic into is_clone_oop and > is_clone_array. (is_clone_oop_array is already there). Having a complete > set with the three clone types allows for a robust test and easy > verification. (The three variants end up in different paths with > different GCs). A couple of suggestions: 1) Instead of CloneOop CloneArray CloneOopArray I think we should call the three types: CloneInstance CloneTypeArray CloneOopArray Since CloneOop is not actually cloning an oop, but an object/instance. And being explicit about TypeArray seems like a good thing to avoid any confusion about the difference compared to CloneOopArray. I guess PrimArray would be an alternative to TypeArray. And of course, if we change this then the is_clone/set_clone functions should follow the same naming convention. Btw, what about CopyOf and CopyOfRange? Don't they also come in Oop and Type versions, or are we handling those differently in some way? Looking at the code it looks like they are only used for the oop array case? 2) In zBarrierSerC2.cpp, do you mind if we do like this instead? 
I find that quite a bit easier to read. [...] const Type** domain_fields = TypeTuple::fields(4); domain_fields[TypeFunc::Parms + 0] = TypeInstPtr::NOTNULL; // src domain_fields[TypeFunc::Parms + 1] = TypeInstPtr::NOTNULL; // dst domain_fields[TypeFunc::Parms + 2] = TypeLong::LONG; // size lower domain_fields[TypeFunc::Parms + 3] = Type::HALF; // size upper const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms + 4, domain_fields); [...] 3) I'd also like to add some const, adjust indentation, etc, in a few places. Instead of listing them here I made a patch, which goes on top of yours. This patch also adjusts 2) above. Just shout if you have any objections. http://cr.openjdk.java.net/~pliden/8234520/webrev.03-review /Per > > Regards, > > Nils > > > On 2019-11-20 15:25, Nils Eliasson wrote: >> Hi, >> >> I found a few bugs after the enabling of the clone intrinsic in ZGC. >> >> 1) The arraycopy clone_basic has the parameters adjusted to work as a >> memcopy. For an oop the src is pointing inside the oop to where we >> want to start copying. But when we want to do a runtime call to clone >> - the parameters are supposed to be the actual src oop and dst oop, >> and the size should be the instance size. >> >> For now I have made a workaround. What should be done later is using >> the offset in the arraycopy node to encode where the payload is, so >> that the base pointers are always correct. But that would require >> changes to the BarrierSet classes of all GCs. So I leave that for next >> release. >> >> 2) The size parameter of the TypeFunc for the runtime call has the >> wrong type. It was originally Long but missed the upper Half, it was >> fixed to INT (JDK-8233834), but that is wrong and causes the compiles >> to be skipped. We didn't notice that since they failed silently. That >> is also why we didn't notice problem #1 too. >> >> https://bugs.openjdk.java.net/browse/JDK-8234520 >> >> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >> >> Please review! 
>>
>> Nils
>>

From christoph.goettschkes at microdoc.com Tue Nov 26 13:28:23 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Tue, 26 Nov 2019 14:28:23 +0100
Subject: RFR: 8234807: [TESTBUG] LoopRotateBadNodeBudget fails for client VMs due to Unrecognized VM option PartialPeelNewPhiDelta
Message-ID:

Hi,

please review the following small changeset, which fixes the test
test/hotspot/jtreg/compiler/loopopts/LoopRotateBadNodeBudget.java
for client VMs. I simply added vm.compiler2.enabled to the requires tag,
since the original bug only appeared with the server JIT:

Bug: https://bugs.openjdk.java.net/browse/JDK-8234807
Webrev: http://cr.openjdk.java.net/~cgo/8234807/webrev.00/

Bug which introduced the issue:
https://bugs.openjdk.java.net/browse/JDK-8231565

Thanks,
Christoph

From patric.hedlin at oracle.com Tue Nov 26 14:29:38 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 26 Nov 2019 15:29:38 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com>
References: <8d451a2e-fb94-8d59-e02e-e3182e115a0b@oracle.com>
Message-ID:

Thanks for reviewing Nils.

/Patric

On 13/11/2019 17:12, Nils Eliasson wrote:
> Hi Patric,
>
> Looks good!
>
> (I have pre-reviewed this patch offline)
>
> Regards,
>
> Nils
>
> On 2019-11-12 15:16, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>
>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>
>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>     short-circuit for (obviously) redundant if-nodes.
>>
>> Testing: hs-tier1-4, hs-precheckin-comp
>>
>>
>> Best regards,
>> Patric

From patric.hedlin at oracle.com Tue Nov 26 14:29:51 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 26 Nov 2019 15:29:51 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To: <6f1dbc0a-b1a4-43e8-65e1-1e0df2115c33@oracle.com>
References: <6f1dbc0a-b1a4-43e8-65e1-1e0df2115c33@oracle.com>
Message-ID: <416ea976-e907-ace1-f166-f16318fbe6bb@oracle.com>

Thanks Dean,

I should take a look for the full patch.

Best regards,
Patric

On 14/11/2019 06:12, dean.long at oracle.com wrote:
> Hi Patric.  I was expecting the fix to allow the following existing
> logic in DivINode::Ideal to work:
>
>   // Check for excluding div-zero case
>   if (in(0) && (ti->_hi < 0 || ti->_lo > 0)) {
>     set_req(0, NULL);           // Yank control input
>     return this;
>   }
>
> by making sure the range of "ti" has been sharpened by the previous
> if-node.  I was just wondering if you looked at that solution and
> thought it was feasible.  I see Parse::sharpen_type_after_if() is
> almost doing the right thing, but only handles BoolTest::eq.
>
> dl
>
> On 11/12/19 6:16 AM, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>
>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>
>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>     short-circuit for (obviously) redundant if-nodes.
>>
>> Testing: hs-tier1-4, hs-precheckin-comp
>>
>>
>> Best regards,
>> Patric
>>
>

From patric.hedlin at oracle.com Tue Nov 26 14:30:02 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 26 Nov 2019 15:30:02 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To:
References: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com>
Message-ID:

Thanks for reviewing Martin.

Best regards,
Patric

On 18/11/2019 12:23, Doerr, Martin wrote:
> Hi Patric,
>
> I'd consider moving subsuming_bool_test_encode up to avoid the prototype.
> But I can also live with it.
>
> Looks good to me.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: Patric Hedlin
>> Sent: Monday, 18 November 2019 11:06
>> To: hotspot-compiler-dev at openjdk.java.net; Nils Eliasson
>> ; Vladimir Ivanov
>> ; Doerr, Martin
>> Subject: Re: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0
>> check
>>
>> Dear all,
>>
>> Please review the new patch, now reduced to a "minimum" (besides the
>> table encoding).
>>
>> Updated in-place.
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>
>> Testing: hs-tier1-3
>>
>>
>> Best regards,
>> Patric
>>
>> On 12/11/2019 15:16, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>>
>>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>>
>>>     Adding a simple subsumption test to IfNode::Ideal to enable a local
>>>     short-circuit for (obviously) redundant if-nodes.
>>>
>>> Testing: hs-tier1-4, hs-precheckin-comp
>>>
>>>
>>> Best regards,
>>> Patric
>>>

From patric.hedlin at oracle.com Tue Nov 26 14:30:20 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 26 Nov 2019 15:30:20 +0100
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To: <4314ca44-9094-a4d5-407e-9d9eaf5d4b37@oracle.com>
References: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com> <4314ca44-9094-a4d5-407e-9d9eaf5d4b37@oracle.com>
Message-ID: <44fbba54-fdd2-af82-13e7-7bd36f16b752@oracle.com>

Thanks for reviewing Vladimir.

On 18/11/2019 23:56, Vladimir Ivanov wrote:
>
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>
> Looks good, Patric.
>
> Best regards,
> Vladimir Ivanov
>
> PS: some cleanup suggestions (feel free to ignore them if you don't
> agree):
>
> src/hotspot/share/opto/ifnode.cpp:
>
> +//   \r1
> +//  r2\  eqT  eqF  neT  neF  ltT  ltF  leT  leF  gtT  gtF  geT  geF
> +//  eq    t    f    f    t    f    -    -    f    f    -    -    f
> +//  ne    f    t    t    f    t    -    -    t    t    -    -    t
> +//  lt    f    -    -    f    t    f    -    f    f    -    f    t
> +//  le    t    -    -    t    t    -    t    f    f    t    -    t
> +//  gt    f    -    -    f    f    -    f    t    t    f    -    f
> +//  ge    t    -    -    t    f    t    -    t    t    -    t    f
> +//
> +Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
> +  // Table encoding: N/A (na), True-branch (tb), False-branch (fb).
> +  static enum { na, tb, fb } s_subsume_map[6][12] = {
> +  /*rel: eq+T eq+F ne+T ne+F lt+T lt+F le+T le+F gt+T gt+F ge+T ge+F*/
> +  /*eq*/{ tb,  fb,  fb,  tb,  fb,  na,  na,  fb,  fb,  na,  na,  fb },
> +  /*ne*/{ fb,  tb,  tb,  fb,  tb,  na,  na,  tb,  tb,  na,  na,  tb },
> +  /*lt*/{ fb,  na,  na,  fb,  tb,  fb,  na,  fb,  fb,  na,  fb,  tb },
> +  /*le*/{ tb,  na,  na,  tb,  tb,  na,  tb,  fb,  fb,  tb,  na,  tb },
> +  /*gt*/{ fb,  na,  na,  fb,  fb,  na,  fb,  tb,  tb,  fb,  na,  fb },
> +  /*ge*/{ tb,  na,  na,  tb,  fb,  tb,  na,  tb,  tb,  na,  tb,  fb }};
>
> IMO you can dump the table from the comment: it mostly duplicates the
> code. (Probably, you can use a different name for "N/A" or just refer
> to it in numeric form (0?) to preserve clean structure of the table
> from the comment.)
>
> ======================================
>
> +  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
> +    if (cmp->in(2) != NULL && // make sure cmp is not already dead
> +        cmp->in(2)->bottom_type() == TypePtr::NULL_PTR) {
>
> Merge nested ifs?
>
I too have problems with this part but for other reasons. What about: (?)

-  Node* cmp;
   int dist = 4;               // Cutoff limit for search
-  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
-    if (cmp->in(2) != NULL && // make sure cmp is not already dead
+  if (is_If() && in(1)->is_Bool()) {
+    Node* cmp = in(1)->in(1);
+    if (cmp->Opcode() == Op_CmpP &&
+        cmp->in(2) != NULL && // make sure cmp is not already dead

Best regards,
Patric

> ======================================
>
> Looks like extracting the following code into a helper function (along
> with the enum and the table) can improve readability.
>
> +  int drel = subsuming_bool_test_encode(dom->in(1));
> +  int trel = subsuming_bool_test_encode(bol);
> +  int bout = pre->is_IfFalse() ? 1 : 0;
> +
> +  if (drel < 0 || trel < 0) {
> +    return NULL;
> +  }
> +  int br = s_subsume_map[trel][2*drel+bout];
> +  if (br == na) {
> +    return NULL;
> +  }
>
> New function can return intcon(0/1) or bol(or NULL?) and the caller
> decides whether the update is needed.
>
>> On 12/11/2019 15:16, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8220376
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/
>>>
>>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
>>>
>>>
Adding a simple subsumption test to IfNode::Ideal to enable a local
>>>     short-circuit for (obviously) redundant if-nodes.
>>>
>>> Testing: hs-tier1-4, hs-precheckin-comp
>>>
>>>
>>> Best regards,
>>> Patric

From ivan.gerasimov at oracle.com Tue Nov 26 10:14:11 2019
From: ivan.gerasimov at oracle.com (Ivan Gerasimov)
Date: Tue, 26 Nov 2019 02:14:11 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+
In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
Message-ID: <603eb8ec-ea48-42ae-1c6e-92c2165133b7@oracle.com>

Hi Claes!

In the code in align.hpp it is assumed that (1U << 32) == 0, which is
not guaranteed.

In fact, if the right argument of the shift operator is >= 32 (for
32-bit left argument) then the behavior is undefined, and thus is
compiler specific.

With kind regards,
Ivan

On 11/26/19 1:44 AM, Claes Redestad wrote:
> Hi,
>
> in various places in the hotspot we have custom code to calculate the
> next power of two, some of which have potential to go into an infinite
> loop in case of an overflow.
>
> This patch proposes adding next_power_of_two utility methods which
> avoid infinite loops on overflow, while providing slightly more
> efficient code in most cases.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes

--
With kind regards,
Ivan Gerasimov

From vladimir.x.ivanov at oracle.com Tue Nov 26 15:04:17 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 26 Nov 2019 18:04:17 +0300
Subject: RFR(M): 8220376: C2: Int >0 not recognized as !=0 for div by 0 check
In-Reply-To: <44fbba54-fdd2-af82-13e7-7bd36f16b752@oracle.com>
References: <35dd0640-7f48-11ae-bd12-1fcaf893b2fc@oracle.com> <4314ca44-9094-a4d5-407e-9d9eaf5d4b37@oracle.com> <44fbba54-fdd2-af82-13e7-7bd36f16b752@oracle.com>
Message-ID: <3ddbec9f-63c8-1c72-6399-e0cbe1758a0b@oracle.com>

>>
>> +  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
>> +    if (cmp->in(2) != NULL && // make sure cmp is not already dead
>> +        cmp->in(2)->bottom_type() == TypePtr::NULL_PTR) {
>>
>> Merge nested ifs?
>>
> I too have problems with this part but for other reasons. What about: (?)
>
> -  Node* cmp;
>    int dist = 4;               // Cutoff limit for search
> -  if (is_If() && (cmp = in(1)->in(1))->Opcode() == Op_CmpP) {
> -    if (cmp->in(2) != NULL && // make sure cmp is not already dead
> +  if (is_If() && in(1)->is_Bool()) {
> +    Node* cmp = in(1)->in(1);
> +    if (cmp->Opcode() == Op_CmpP &&
> +        cmp->in(2) != NULL && // make sure cmp is not already dead

Looks even better.

Best regards,
Vladimir Ivanov

>> Looks like extracting the following code into a helper function (along
>> with the enum and the table) can improve readability.
>>
>> +  int drel = subsuming_bool_test_encode(dom->in(1));
>> +  int trel = subsuming_bool_test_encode(bol);
>> +  int bout = pre->is_IfFalse() ? 1 : 0;
>> +
>> +  if (drel < 0 || trel < 0) {
>> +    return NULL;
>> +  }
>> +  int br = s_subsume_map[trel][2*drel+bout];
>> +  if (br == na) {
>> +    return NULL;
>> +  }
>>
>> New function can return intcon(0/1) or bol(or NULL?) and the caller
>> decides whether the update is needed.
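
[Editorial sketch] The helper-function suggestion quoted above can be pictured with a stand-alone version of the table lookup. The table contents and the index expression [trel][2*drel+bout] are copied from the patch quoted in this thread; everything else is an assumption made for illustration. The names Rel, Outcome and subsumed_outcome are hypothetical, and the relation codes 0..5 (eq, ne, lt, le, gt, ge) are assumed to match what subsuming_bool_test_encode() returns, which the thread does not show.

```cpp
#include <cassert>

// Assumed relation codes, in the row order of the quoted table.
enum Rel { rel_eq, rel_ne, rel_lt, rel_le, rel_gt, rel_ge };

// Table encoding from the quoted patch: N/A, True-branch, False-branch.
enum Outcome { na, tb, fb };

static const Outcome s_subsume_map[6][12] = {
  /*rel: eq+T eq+F ne+T ne+F lt+T lt+F le+T le+F gt+T gt+F ge+T ge+F*/
  /*eq*/{ tb,  fb,  fb,  tb,  fb,  na,  na,  fb,  fb,  na,  na,  fb },
  /*ne*/{ fb,  tb,  tb,  fb,  tb,  na,  na,  tb,  tb,  na,  na,  tb },
  /*lt*/{ fb,  na,  na,  fb,  tb,  fb,  na,  fb,  fb,  na,  fb,  tb },
  /*le*/{ tb,  na,  na,  tb,  tb,  na,  tb,  fb,  fb,  tb,  na,  tb },
  /*gt*/{ fb,  na,  na,  fb,  fb,  na,  fb,  tb,  tb,  fb,  na,  fb },
  /*ge*/{ tb,  na,  na,  tb,  fb,  tb,  na,  tb,  tb,  na,  tb,  fb }};

// Hypothetical helper: given the relation tested here (trel), the relation
// tested by the dominating if (drel), and whether we arrive through the
// dominating if's false projection, report which branch is always taken.
// Negative codes stand for "relation not encodable" and yield na.
inline Outcome subsumed_outcome(int trel, int drel, bool via_false_projection) {
  if (drel < 0 || trel < 0) {
    return na;
  }
  const int bout = via_false_projection ? 1 : 0;
  return s_subsume_map[trel][2 * drel + bout];
}
```

With this shape the caller only has to map tb/fb onto a constant condition and leave the node alone on na. The case from the bug title falls out of the table directly: arriving through the true projection of "x > 0", a following "x != 0" test resolves to tb.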
>> >>> On 12/11/2019 15:16, Patric Hedlin wrote: >>>> Dear all, >>>> >>>> I would like to ask for help to review the following change/update: >>>> >>>> Issue:? https://bugs.openjdk.java.net/browse/JDK-8220376 >>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8220376/ >>>> >>>> 8220376: C2: Int >0 not recognized as !=0 for div by 0 check >>>> >>>> ??? Adding a simple subsumption test to IfNode::Ideal to enable a local >>>> ??? short-circuit for (obviously) redundant if-nodes. >>>> >>>> Testing: hs-tier1-4, hs-precheckin-comp >>>> >>>> >>>> Best regards, >>>> Patric > From sandhya.viswanathan at intel.com Tue Nov 26 16:08:59 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 26 Nov 2019 16:08:59 +0000 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com> Message-ID: Hi Vladimir, Please do sponsor the patch. I am not a committer. Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Tuesday, November 26, 2019 1:26 AM To: Viswanathan, Sandhya ; Vladimir Kozlov Cc: hotspot compiler Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/ Looks good. Testing results (hs-precheckin-comp,hs-tier1,hs-tier2) are clean. Best regards, Vladimir Ivanov > -----Original Message----- > From: Vladimir Ivanov > Sent: Monday, November 25, 2019 12:23 PM > To: Viswanathan, Sandhya ; Vladimir > Kozlov > Cc: hotspot compiler > Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when > UseAVX=3 is specified after JDK-8221092 > > >> From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level. >> If AVX=3, it checks if save/restore of zmm worked properly. >> If AVX=1 or 2, it checks is save/restore of ymm worked properly. 
>> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3. And that is what this patch is trying to fix. > > Ok, now I see how it works: > > os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features: > > static bool supports_avx() { return (_features & CPU_AVX) != 0; } > > static bool supports_evex() { return (_features & CPU_AVX512F) != > 0; } > > But VM_Version::get_processor_features() clears detected features depending on UseAVX level. > > So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2. > >> The rest we are entering 8221092 discussion. >> I think Vivek's intent there was not do any AVX 512 instructions at all if AVX < 3 to overcome performance regressions observed by Scott Oaks. > > My best guess is it helped analysis by eliminating all problematic instructions. > > Anyway, I'm fine with leaving that code. (Though it would be nice to > simply get rid of it.) > > It looks like all the code after start_simd_check (and checks to > legacy_save_restore) can go under use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and FLAG_IS_DEFAULT(UseAVX) you added). > > Otherwise, looks good. > > Best regards, > Vladimir Ivanov > >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Monday, November 25, 2019 10:15 AM >> To: Viswanathan, Sandhya ; Vladimir >> Kozlov >> Cc: hotspot compiler >> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >> UseAVX=3 is specified after JDK-8221092 >> >> Thanks for the clarifications, Sandhya! >> >> Some more questions inlined. >>> The bug happens like below: >>> >>> * User specifies -XX:UseAVX=3 on command line. 
>>> * On Skylake platform due to the following lines: >>> __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); >>> __ movl(rax, Address(rsi, 0)); >>> __ cmpl(rax, 0x50654); // If it is Skylake >>> __ jcc(Assembler::equal, legacy_setup); >>> The zmm registers are not saved/restored. >>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features(): >>> int max_vector_size = 0; >>> if (UseSSE < 2) { >>> // Vectors (in XMM) are only supported with SSE2+ >>> // SSE is always 2 on x64. >>> max_vector_size = 0; >>> } else if (UseAVX == 0 || !os_supports_avx_vectors()) { >>> // 16 byte vectors (in XMM) are supported with SSE2+ >>> max_vector_size = 16; ====> This is the point where max_vector_size is set to 16 >>> } else else if (UseAVX == 1 || UseAVX == 2) { >>> ... >>> } >>> * And so we get UseAVX=3 and max_vector_size = 16. >>> >>> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2. >> >> Unfortunately, I'm still confused :-( >> >> If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and >> -XX:UseAVX=2 aren't affected on Skylake as well? >> >>> On your question regarding why 8221092 needs the code you conditionally exclude: >>> This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instruction. >> >> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3? >> >> Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid? 
>> >> Best regards, >> Vladimir Ivanov >> >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Friday, November 22, 2019 4:39 AM >>> To: Viswanathan, Sandhya ; hotspot >>> compiler >>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >>> UseAVX=3 is specified after JDK-8221092 >>> >>> Hi Sandhya, >>> >>>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. >>>> >>>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. >>>> This should automatically result in MaxVectorSize being set to 64 bytes. >>>> >>>> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. >>> >>> Please, elaborate how it happens and how legacy_setup affects it? >>> >>> Why 8221092 needs the code you conditionally exclude. >>> Why the following isn't enough? >>> >>> if (FLAG_IS_DEFAULT(UseAVX)) { >>> FLAG_SET_DEFAULT(UseAVX, use_avx_limit); >>> + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && >>> _stepping < 5) { >>> + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake >>> + } >>> } else if (UseAVX > use_avx_limit) { >>> >>> Best regards, >>> Vladimir Ivanov >>> From vladimir.x.ivanov at oracle.com Tue Nov 26 16:20:48 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 26 Nov 2019 19:20:48 +0300 Subject: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 In-Reply-To: References: <8b771153-d4d8-fe55-f47d-fb3c4f23ef26@oracle.com> Message-ID: <8085a2bd-5204-e0d4-473f-39b19fcf20f4@oracle.com> Pushed [1] Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/jdk/jdk/rev/dff8053bdb74 On 26.11.2019 19:08, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Please do sponsor the patch. I am not a committer. 
> > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Tuesday, November 26, 2019 1:26 AM > To: Viswanathan, Sandhya ; Vladimir Kozlov > Cc: hotspot compiler > Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092 > > >> http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/ > > Looks good. > > Testing results (hs-precheckin-comp,hs-tier1,hs-tier2) are clean. > > Best regards, > Vladimir Ivanov > >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Monday, November 25, 2019 12:23 PM >> To: Viswanathan, Sandhya ; Vladimir >> Kozlov >> Cc: hotspot compiler >> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >> UseAVX=3 is specified after JDK-8221092 >> >> >>> From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level. >>> If AVX=3, it checks if save/restore of zmm worked properly. >>> If AVX=1 or 2, it checks is save/restore of ymm worked properly. >>> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3. And that is what this patch is trying to fix. >> >> Ok, now I see how it works: >> >> os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features: >> >> static bool supports_avx() { return (_features & CPU_AVX) != 0; } >> >> static bool supports_evex() { return (_features & CPU_AVX512F) != >> 0; } >> >> But VM_Version::get_processor_features() clears detected features depending on UseAVX level. >> >> So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2. >> >>> The rest we are entering 8221092 discussion. >>> I think Vivek's intent there was not do any AVX 512 instructions at all if AVX < 3 to overcome performance regressions observed by Scott Oaks. >> >> My best guess is it helped analysis by eliminating all problematic instructions. 
>> >> Anyway, I'm fine with leaving that code. (Though it would be nice to >> simply get rid of it.) >> >> It looks like all the code after start_simd_check (and checks to >> legacy_save_restore) can go under use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and FLAG_IS_DEFAULT(UseAVX) you added). >> >> Otherwise, looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Monday, November 25, 2019 10:15 AM >>> To: Viswanathan, Sandhya ; Vladimir >>> Kozlov >>> Cc: hotspot compiler >>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >>> UseAVX=3 is specified after JDK-8221092 >>> >>> Thanks for the clarifications, Sandhya! >>> >>> Some more questions inlined. >>>> The bug happens like below: >>>> >>>> * User specifies -XX:UseAVX=3 on command line. >>>> * On Skylake platform due to the following lines: >>>> __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset()))); >>>> __ movl(rax, Address(rsi, 0)); >>>> __ cmpl(rax, 0x50654); // If it is Skylake >>>> __ jcc(Assembler::equal, legacy_setup); >>>> The zmm registers are not saved/restored. >>>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features(): >>>> int max_vector_size = 0; >>>> if (UseSSE < 2) { >>>> // Vectors (in XMM) are only supported with SSE2+ >>>> // SSE is always 2 on x64. >>>> max_vector_size = 0; >>>> } else if (UseAVX == 0 || !os_supports_avx_vectors()) { >>>> // 16 byte vectors (in XMM) are supported with SSE2+ >>>> max_vector_size = 16; ====> This is the point where max_vector_size is set to 16 >>>> } else if (UseAVX == 1 || UseAVX == 2) { >>>> ... >>>> } >>>> * And so we get UseAVX=3 and max_vector_size = 16. >>>> >>>> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2.
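The sequence described in the bug report above can be modelled in a few lines. This is a standalone sketch, not HotSpot source: the function name and parameters are made up for illustration, and the 32-byte result for UseAVX=1 is an assumption (the thread only states 32 bytes for UseAVX=2 and 64 bytes for UseAVX=3; the branch body is elided in the quoted code).

```cpp
#include <cassert>

// Hypothetical model of the max_vector_size selection quoted in the thread.
// use_sse / use_avx stand in for the UseSSE / UseAVX flags;
// os_supports_avx_vectors stands in for the result of the probe that
// checks whether the wide (ymm/zmm) registers were saved and restored.
int select_max_vector_size(int use_sse, int use_avx, bool os_supports_avx_vectors) {
  if (use_sse < 2) {
    return 0;   // no vectors without SSE2
  }
  if (use_avx == 0 || !os_supports_avx_vectors) {
    return 16;  // XMM only: the branch the bug falls into
  }
  if (use_avx == 1 || use_avx == 2) {
    return 32;  // assumption for UseAVX=1; stated for UseAVX=2 in the thread
  }
  return 64;    // UseAVX=3
}
```

With the zmm save/restore skipped on Skylake, the probe reports false and `select_max_vector_size(2, 3, false)` yields 16; once the registers are saved and restored, the same flags yield 64.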
>>> >>> Unfortunately, I'm still confused :-( >>> >>> If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and >>> -XX:UseAVX=2 aren't affected on Skylake as well? >>> >>>> On your question regarding why 8221092 needs the code you conditionally exclude: >>>> This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instruction. >>> >>> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3? >>> >>> Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid? >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> Best Regards, >>>> Sandhya >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Friday, November 22, 2019 4:39 AM >>>> To: Viswanathan, Sandhya ; hotspot >>>> compiler >>>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when >>>> UseAVX=3 is specified after JDK-8221092 >>>> >>>> Hi Sandhya, >>>> >>>>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092. >>>>> >>>>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument. >>>>> This should automatically result in MaxVectorSize being set to 64 bytes. >>>>> >>>>> However post JDK-8221092, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes. >>>> >>>> Please, elaborate how it happens and how legacy_setup affects it? >>>> >>>> Why 8221092 needs the code you conditionally exclude. >>>> Why the following isn't enough? 
>>>> >>>> if (FLAG_IS_DEFAULT(UseAVX)) { >>>> FLAG_SET_DEFAULT(UseAVX, use_avx_limit); >>>> + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE && >>>> _stepping < 5) { >>>> + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake >>>> + } >>>> } else if (UseAVX > use_avx_limit) { >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> From martin.doerr at sap.com Tue Nov 26 16:21:58 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 26 Nov 2019 16:21:58 +0000 Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not enough bytes Message-ID: Hi Christoph, thanks for reporting the bug https://bugs.openjdk.java.net/browse/JDK-8234645 Seems like the large offset fix in mem2reg and reg2mem missed the first patching stub in the long/double cases. We should have nop padding there, too (for same reason as for the 2nd patching stub). NativeMovRegMem should always consist of 2 instructions on arm32 in order to support larger offsets. Webrev: http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/ May I ask you to test this fix? We don't have arm32 in our testing landscape. Best regards, Martin From tobias.hartmann at oracle.com Tue Nov 26 16:30:40 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 26 Nov 2019 17:30:40 +0100 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> Message-ID: <0e506a31-9107-0354-ffce-308d332cbfbd@oracle.com> Hi Vladimir, hard to review the .ad file changes but this looks good to me. 
Just noticed some code style issues: - x86_64.ad:11284, 11346, 11410, 11426: indentation is wrong (already before your fix) - whitespace in matcher.cpp:2598/2601 can be removed Best regards, Tobias On 19.11.19 15:30, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234391 > > Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) > to generic ones. > > (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD > instruction merges will be handled separately.) > > On a high-level it is organized as follows: > > (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec; > > (2) at runtime, right after matching is over, a special pass is performed which does: > > * replaces vecOper with vec[SDXYZ] depending on mach node type > - vector mach nodes capture bottom_type() of their ideal prototype; > > * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec) > - matcher needs them, but they are useless for register allocator (moreover, may cause > additional spills); > > > (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands. > > > Some details: > > (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works > > > (2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) which is > enabled only on x86 > > > (3) post-selection analysis is implemented as a single pass over the graph and processing > individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() > (which is an instance of TypeVect) > > > (4) most of the analysis is cross-platform and interfaces with platform-specific code through 3 > methods: > > static bool is_generic_reg2reg_move(MachNode* m); >
// distinguishes MoveVec2Leg/MoveLeg2Vec nodes > > static bool is_generic_vector(MachOper* opnd); > // distinguishes vec/legVec operands > > static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg); > // constructs fixed-sized vector operand based on ideal reg > // vec + Op_Vec[SDXYZ] => vec[SDXYZ] > // legVec + Op_Vec[SDXYZ] => legVec[SDXYZ] > > > (5) TEMP operands are handled specially: > - TEMP uses max_vector_size() to determine what fixed-sized operand to use > * it is needed to cover reductions which don't produce vectors but scalars > - TEMP_DEF inherits fixed-sized operand type from DEF; > > > (6) there is limited number of special cases for mach nodes in Matcher::get_vector_operand_helper: > > - RShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its > ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not > ideal_reg(). > > - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type. > > > (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see > Matcher::get_vector_regmask) > > > (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands > > > (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD > - it aligns the code between x86_64.ad and x86_32.ad > - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., > string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0) > > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Testing: tier1-tier4, jtreg compiler tests on KNL and SKL, > performance testing (SPEC* + Octane + micros / G1 + ParGC).
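The rewriting rule in item (4), "vec + Op_Vec[SDXYZ] => vec[SDXYZ]", can be pictured with a small standalone model. This is only a sketch under assumptions: the enum and both helper functions below are hypothetical stand-ins that mirror the names in the mail, not the Matcher API, and operands are represented as plain strings. The byte widths (VecS=4, VecD=8, VecX=16, VecY=32, VecZ=64) are the usual C2 vector register sizes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical local enum mirroring the ideal register names in the mail.
enum IdealVecReg { Op_VecS = 0, Op_VecD, Op_VecX, Op_VecY, Op_VecZ };

// Maps a vector width in bytes to the matching ideal register.
// A TEMP operand would pass max_vector_size() here, as described in item (5).
IdealVecReg ideal_reg_for_bytes(int bytes) {
  switch (bytes) {
    case 4:  return Op_VecS;
    case 8:  return Op_VecD;
    case 16: return Op_VecX;
    case 32: return Op_VecY;
    case 64: return Op_VecZ;
    default: throw std::invalid_argument("unsupported vector size");
  }
}

// Models the post-selection rewrite: a generic operand (vec or legVec)
// plus an ideal register gives the name of the fixed-size operand.
std::string fixed_operand_name(bool is_leg_vec, IdealVecReg ideal_reg) {
  static const char suffix[] = {'S', 'D', 'X', 'Y', 'Z'};
  return std::string(is_leg_vec ? "legVec" : "vec") + suffix[ideal_reg];
}
```

For a DEF or USE operand the ideal register would come from the node's bottom_type(); for example, a 16-byte vector type on a `vec` operand maps to `vecX`.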
> > Best regards, > Vladimir Ivanov > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf From vladimir.x.ivanov at oracle.com Tue Nov 26 16:52:37 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 26 Nov 2019 19:52:37 +0300 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: <0e506a31-9107-0354-ffce-308d332cbfbd@oracle.com> References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> <0e506a31-9107-0354-ffce-308d332cbfbd@oracle.com> Message-ID: <77c66795-ecd4-1461-4151-66c82b4e554c@oracle.com> Thanks, Tobias. > hard to review the .ad file changes but this looks good to me. Yes, the changes are massive, but mostly straightforward. In addition to the code needed for generic vector operand, the changes are: (1) (x86.ad) switching vec[SDXYZ] => vec and legVec[SDXYZ] => legVec; (2) (x86.ad) after the switch reduction instructions need additional checks on input vector size (example [1]); (3) (x86_64.ad/x86_32.ad) rename operands with "vec" name to avoid name conflicts with vec operand; (4) (x86_64.ad) migrate compressed string instructions from legVecS to legRegD to keep it working when vector support is explicitly disabled (e.g., -XX:MaxVectorSize=0) (1) is needed to avoid explicit moves between concrete (legVec/vec[SDXYZ]) and generic vectors (vec/legVec). (2)-(4) could have been reviewed/integrated separately, but they did look trivial enough to avoid the effort. > Just noticed some code style issues: > - x86_64.ad:11284, 11346, 11410, 11426: indentation is wrong (already before your fix) > - whitespace in matcher.cpp:2598/2601 can be removed Good catch. 
Best regards, Vladimir Ivanov [1] -instruct rvadd2F_reduction_reg(regF dst, vecD src2, vecD tmp) %{ - predicate(UseAVX > 0); +instruct rvadd2F_reduction_reg(regF dst, vec src2, vec tmp) %{ + predicate(UseAVX > 0 && n->in(2)->bottom_type()->is_vect()->length() == 2); > On 19.11.19 15:30, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234391 >> >> Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) >> to generic ones. >> >> (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD >> instruction merges will be handled separately.) >> >> On a high-level it is organized as follows: >> >> (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec; >> >> (2) at runtime, right after matching is over, a special pass is performed which does: >> >> * replaces vecOper with vec[SDXYZ] depending on mach node type >> - vector mach nodes capture bottom_type() of their ideal prototype; >> >> * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec) >> - matcher needs them, but they are useless for register allocator (moreover, may cause >> additional spills); >> >> >> (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands. >> >> >> Some details: >> >> (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works >> >> >> (2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) which is >> enabled only on x86 >> >> >> (3) post-selection analysis is implemented as a single pass over the graph and processing >> individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() >> (which is an instance of TypeVect) >> >> >>
(4) most of the analysis is cross-platform and interfaces with platform-specific code through 3 >> methods: >> >> static bool is_generic_reg2reg_move(MachNode* m); >> // distinguishes MoveVec2Leg/MoveLeg2Vec nodes >> >> static bool is_generic_vector(MachOper* opnd); >> // distinguishes vec/legVec operands >> >> static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg); >> // constructs fixed-sized vector operand based on ideal reg >> // vec + Op_Vec[SDXYZ] => vec[SDXYZ] >> // legVec + Op_Vec[SDXYZ] => legVec[SDXYZ] >> >> >> (5) TEMP operands are handled specially: >> - TEMP uses max_vector_size() to determine what fixed-sized operand to use >> * it is needed to cover reductions which don't produce vectors but scalars >> - TEMP_DEF inherits fixed-sized operand type from DEF; >> >> >> (6) there is limited number of special cases for mach nodes in Matcher::get_vector_operand_helper: >> >> - RShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its >> ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not >> ideal_reg(). >> >> - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type. >> >> >> (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see >> Matcher::get_vector_regmask) >> >> >> (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands >> >> >> (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD >> - it aligns the code between x86_64.ad and x86_32.ad >> - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., >> string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0) >> >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ?
>> >> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL, >> ???????? performance testing (SPEC* + Octane + micros / G1 + ParGC). >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> >> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf From vladimir.kozlov at oracle.com Tue Nov 26 17:05:22 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 26 Nov 2019 09:05:22 -0800 Subject: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for client VMs due to Unrecognized VM option PartialPeelNewPhiDelta In-Reply-To: <20191126133017.4DEC711CEFA@aojmv0009> References: <20191126133017.4DEC711CEFA@aojmv0009> Message-ID: <46e4c188-2d15-bd33-010b-cb203741cbd8@oracle.com> Good. Thanks, Vladimir On 11/26/19 5:28 AM, christoph.goettschkes at microdoc.com wrote: > Hi, > > please review the following small changeset which fixes the test > test/hotspot/jtreg/compiler/loopopts/LoopRotateBadNodeBudget.java for > client VMs. 
> I simply added vm.compiler2.enabled to the requires tag, since the > original bug only appeared with the server JIT: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234807 > Webrev: http://cr.openjdk.java.net/~cgo/8234807/webrev.00/ > > Bug which introduced the issue: > https://bugs.openjdk.java.net/browse/JDK-8231565 > > Thanks, > Christoph > From nils.eliasson at oracle.com Tue Nov 26 17:59:44 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 26 Nov 2019 18:59:44 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: <58fd5565-4342-ea70-511d-bace68308391@oracle.com> References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com> <58fd5565-4342-ea70-511d-bace68308391@oracle.com> Message-ID: <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com> Hi, On 2019-11-26 13:02, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~neliasso/8234520/webrev.03/ > > ==================================================================== > > src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp: > > + // The currently modeled arraycopy-clone_basic doesn't have the > base pointers for src and dst, > + // rather point at the start of the payload. > + Node* src_base = get_base_for_arracycopy_clone(phase, src); > + Node* dst_base = get_base_for_arracycopy_clone(phase, dst); > + > + // The size must also be increased to match the instance size. > + int base_off = BarrierSetC2::extract_base_offset(false); > + Node* full_size = phase->transform_later(new AddLNode(size, > phase->longcon(base_off >> LogBytesPerLong))); > Node* const call = phase->make_leaf_call(ctrl, > mem, > clone_type(), > > ZBarrierSetRuntime::clone_addr(), > "ZBarrierSetRuntime::clone", >
TypeRawPtr::BOTTOM, > - src, > - dst, > - size); > + src_base, > + dst_base, > + full_size, > + phase->top()); > > > Do you see any problems with copying object header? It won't be copied. It's just that the runtime call expects the arguments to be pointers to the objects, and the size of the object. It's the same function that is used by a call to the native clone impl. (jvm.cpp:720) > > ==================================================================== > > The rest are minor comments: > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp: > > -void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* > size, bool is_array) const { > - // Exclude the header but include array length to copy by 8 bytes > words. > - // Can't use base_offset_in_bytes(bt) since basic type is unknown. > +int BarrierSetC2::extract_base_offset(bool is_array) { > > ... > > +void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* > size, bool is_array) const { > + // Exclude the header but include array length to copy by 8 bytes > words. > + // Can't use base_offset_in_bytes(bt) since basic type is unknown. > + int base_off = extract_base_offset(is_array); > > > I'd leave the comment in BarrierSetC2::extract_base_offset(). After > the refactoring it looks confusing in BarrierSetC2::clone(). > > Also, considering it's used from zBarrierSetC2.cpp, it would be nice > to have a more descriptive name. > BarrierSetC2::arraycopy_payload_base_offset(bool is_array) maybe? Sounds reasonable.
> > > ==================================================================== > > src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp: > > +Node* get_base_for_arracycopy_clone(PhaseMacroExpand* phase, Node* n) { > > Should get_base_for_arracycopy_clone be static? Also, would be nice if > the name reflects that it works on instances and not arrays. Fixed > > ==================================================================== > > + // The currently modeled arraycopy-clone_basic doesn't have the > base pointers for src and dst, > + // rather point at the start of the payload. > + Node* src_base = get_base_for_arracycopy_clone(phase, src); > + Node* dst_base = get_base_for_arracycopy_clone(phase, dst); > > Another thing: "base" in zBarrierSetC2.cpp and in BarrierSetC2 has > opposite meaning which is confusing. > > Renaming src/dst<->src_base/dst_base in > ZBarrierSetC2::clone_at_expansion() would improve things. Done. > > ==================================================================== > > - if (src->bottom_type()->isa_aryptr()) { > + if (ac->is_clone_array()) { > // Clone primitive array > > Is the comment valid? Doesn't it cover object array case as well? Nope - object arrays will be handled as clone_oop_array which uses the normal object copy which already applies the appropriate load barriers. The special case for ZGC is the cloning of instances because we don't know where to apply load barriers without looking up the type. (Except for clone on small objects and short arrays that are transformed to a series of load-stores.) http://cr.openjdk.java.net/~neliasso/8234520/webrev.04 Thank you for the feedback! // Nils > > > Best regards, > Vladimir Ivanov > >> >> Regards, >> >> Nils >> >> >> On 2019-11-21 12:53, Nils Eliasson wrote: >>> I updated this to version 2.
>>> >>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >>> >>> I found a problem running >>> compiler/arguments/TestStressReflectiveCode.java >>> >>> Even though the clone was created as an oop clone, the type node type >>> returns isa_aryptr. This is caused by the src ptr not being the base >>> pointer. Until I fix that I wanted a more robust test. >>> >>> In this webrev I split up the is_clonebasic into is_clone_oop and >>> is_clone_array. (is_clone_oop_array is already there). Having a >>> complete set with the three clone types allows for a robust test and >>> easy verification. (The three variants end up in different paths >>> with different GCs). >>> >>> Regards, >>> >>> Nils >>> >>> >>> On 2019-11-20 15:25, Nils Eliasson wrote: >>>> Hi, >>>> >>>> I found a few bugs after the enabling of the clone intrinsic in ZGC. >>>> >>>> 1) The arraycopy clone_basic has the parameters adjusted to work as >>>> a memcopy. For an oop the src is pointing inside the oop to where >>>> we want to start copying. But when we want to do a runtime call to >>>> clone - the parameters are supposed to be the actual src oop and >>>> dst oop, and the size should be the instance size. >>>> >>>> For now I have made a workaround. What should be done later is >>>> using the offset in the arraycopy node to encode where the payload >>>> is, so that the base pointers are always correct. But that would >>>> require changes to the BarrierSet classes of all GCs. So I leave >>>> that for next release. >>>> >>>> 2) The size parameter of the TypeFunc for the runtime call has the >>>> wrong type. It was originally Long but missed the upper Half, it >>>> was fixed to INT (JDK-8233834), but that is wrong and causes the >>>> compiles to be skipped. We didn't notice that since they failed >>>> silently. That is also why we didn't notice problem #1 too.
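The adjustment described in problem 1) can be sketched outside the compiler as plain arithmetic. This is a standalone illustration, not the C2 node code: the struct and function names are hypothetical, addresses are modelled as integers, and the size is assumed to be counted in 8-byte words, following the "copy by 8 bytes words" comment and the `base_off >> LogBytesPerLong` term in the patch under review.

```cpp
#include <cassert>
#include <cstdint>

const int kLogBytesPerLong = 3;  // 8-byte words, as with LogBytesPerLong

// Hypothetical container for the three clone arguments.
struct CloneArgs {
  std::uintptr_t src;       // source address
  std::uintptr_t dst;       // destination address
  std::int64_t   size;      // size in 8-byte words
};

// Converts "memcopy style" arguments (pointers into the payload, payload
// size) into "runtime clone style" arguments (object base pointers,
// full instance size). base_off is the byte offset from the object start
// to the first copied field; its value here is supplied by the caller.
CloneArgs to_runtime_clone_args(const CloneArgs& payload, int base_off) {
  CloneArgs full;
  full.src  = payload.src - base_off;
  full.dst  = payload.dst - base_off;
  full.size = payload.size + (base_off >> kLogBytesPerLong);
  return full;
}
```

With an illustrative offset of 16 bytes, a payload copy of 4 words starting at 0x1010 becomes a clone of 6 words starting at 0x1000.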
>>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>>> >>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>>> >>>> Please review! >>>> >>>> Nils >>>> From christoph.goettschkes at microdoc.com Wed Nov 27 09:09:20 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Wed, 27 Nov 2019 10:09:20 +0100 Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not enough bytes In-Reply-To: References: Message-ID: Hi Martin, thanks for looking into the issue. I executed the tier1 hotspot tests with your patch applied and it looks good. Tests which were failing before pass now. I have some, but unrelated failures, which I will start looking into now. One small remark: When I encountered the problem, I was very much confused by the constant instruction size of 8 [1]. Maybe it would be more clear to define the instruction size to be 4, and instead, in "num_bytes_to_end_of_patch()", return "instruction_size * 2"? Best regards, Christoph [1] https://hg.openjdk.java.net/jdk/jdk/file/a2441ac23eeb/src/hotspot/cpu/arm/nativeInst_arm_32.hpp "Doerr, Martin" wrote on 2019-11-26 17:21:58: > From: "Doerr, Martin" > To: "christoph.goettschkes at microdoc.com" > , "'hotspot-compiler- > dev at openjdk.java.net'" > Date: 2019-11-26 17:22 > Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: > not enough bytes > > Hi Christoph, > > thanks for reporting the bug > https://bugs.openjdk.java.net/browse/JDK-8234645 > > Seems like the large offset fix in mem2reg and reg2mem missed the first > patching stub in the long/double cases. > We should have nop padding there, too (for same reason as for the 2nd > patching stub). > NativeMovRegMem should always consist of 2 instructions on arm32 in > order to support larger offsets. > > Webrev: > http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/ > > May I ask you to test this fix? We don't have arm32 in our testing landscape.
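The nop padding mentioned in the quoted mail can be illustrated with a toy emitter. This is a sketch under assumptions, not the C1 assembler: the code buffer is a plain vector of 32-bit words, the function name is made up, and the nop value is a placeholder rather than a real ARM encoding. The only facts taken from the thread are that an arm32 instruction is 4 bytes and that a NativeMovRegMem is expected to span 8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t   kArmInstructionBytes  = 4;  // one arm32 instruction
const std::size_t   kNativeMovRegMemBytes = 8;  // "instruction_size" in the thread
const std::uint32_t kNopPlaceholder       = 0;  // placeholder, not a real encoding

// Pads the words emitted since start_index so that the memory access
// occupies at least kNativeMovRegMemBytes, leaving room for a later
// patch to rewrite it into a two-instruction sequence.
void pad_to_native_mov_reg_mem(std::vector<std::uint32_t>& code, std::size_t start_index) {
  while ((code.size() - start_index) * kArmInstructionBytes < kNativeMovRegMemBytes) {
    code.push_back(kNopPlaceholder);
  }
}
```

A single-instruction load gets one padding word appended; a sequence that already holds two instructions is left unchanged.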
> > Best regards, > Martin > From christoph.goettschkes at microdoc.com Wed Nov 27 09:22:33 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Wed, 27 Nov 2019 10:22:33 +0100 Subject: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for client VMs due to Unrecognized VM option PartialPeelNewPhiDelta In-Reply-To: <46e4c188-2d15-bd33-010b-cb203741cbd8@oracle.com> References: <20191126133017.4DEC711CEFA@aojmv0009> <46e4c188-2d15-bd33-010b-cb203741cbd8@oracle.com> Message-ID: Hi Vladimir, thanks for the review. I updated the webrev; please find the changeset here: https://cr.openjdk.java.net/~cgo/8234807/webrev.02/jdk-jdk.changeset Could you please sponsor this change for me and commit it into the repository? Thanks, Christoph "hotspot-compiler-dev" wrote on 2019-11-26 18:05:22: > From: Vladimir Kozlov > To: hotspot-compiler-dev at openjdk.java.net > Date: 2019-11-26 18:06 > Subject: Re: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for > client VMs due to Unrecognized VM option PartialPeelNewPhiDelta > Sent by: "hotspot-compiler-dev" > > Good. > > Thanks, > Vladimir > > On 11/26/19 5:28 AM, christoph.goettschkes at microdoc.com wrote: > > Hi, > > > > please review the following small changeset which fixes the test > > test/hotspot/jtreg/compiler/loopopts/LoopRotateBadNodeBudget.java for > > client VMs.
> > I simply added vm.compiler2.enabled to the requires tag, since the > > original bug only appeared with the server JIT: > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234807 > > Webrev: http://cr.openjdk.java.net/~cgo/8234807/webrev.00/ > > > > Bug which introduced the issue: > > https://bugs.openjdk.java.net/browse/JDK-8231565 > > > > Thanks, > > Christoph > > > From christoph.goettschkes at microdoc.com Wed Nov 27 09:57:27 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Wed, 27 Nov 2019 10:57:27 +0100 Subject: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails for client VMs due to Unrecognized VM option EliminateLocks Message-ID: Hi, please review the following small changeset which fixes the test test/hotspot/jtreg/compiler/escapeAnalysis/TestEliminateLocksOffCrash.java for client VMs. I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since the original bug is for C2 only. Also, the flag EliminateLocks is only defined in c2_globals.hpp, and neither for C1, nor for JVMCI. Bug: https://bugs.openjdk.java.net/browse/JDK-8234894 Webrev: https://cr.openjdk.java.net/~cgo/8234894/webrev.00 Bug which introduced the issue: https://bugs.openjdk.java.net/browse/JDK-8227384 Thanks, Christoph From nils.eliasson at oracle.com Wed Nov 27 10:50:29 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 27 Nov 2019 11:50:29 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com> References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com> Message-ID: <79bc1866-2293-37a6-9780-af24a01eb699@oracle.com> Hi Per, Here is an update webrev with your fixes included: http://cr.openjdk.java.net/~neliasso/8234520/webrev.05/ I chose to go with CloneInst, and ClonePrimArray. 
CopyOf, CopyOfRange - like ArrayCopy, don't have specialized versions, rather check the type when expanding, and of course - there are no versions for instances. But there are differences to what guards they need, and when the guards are expanded. I would need to dig down into the details to determine if it could be simplified. Regards, Nils On 2019-11-26 14:29, Per Liden wrote: > Hi Nils, > > On 11/21/19 12:53 PM, Nils Eliasson wrote: >> I updated this to version 2. >> >> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >> >> I found a problem running >> compiler/arguments/TestStressReflectiveCode.java >> >> Even though the clone was created as an oop clone, the type node type >> returns isa_aryptr. This is caused by the src ptr not being the base >> pointer. Until I fix that I wanted a more robust test. >> >> In this webrev I split up the is_clonebasic into is_clone_oop and >> is_clone_array. (is_clone_oop_array is already there). Having a >> complete set with the three clone types allows for a robust test and >> easy verification. (The three variants end up in different paths with >> different GCs). > > A couple of suggestions: > > 1) Instead of > > CloneOop > CloneArray > CloneOopArray > > I think we should call the three types: > > CloneInstance > CloneTypeArray > CloneOopArray > > Since CloneOop is not actually cloning an oop, but an object/instance. > And being explicit about TypeArray seems like a good thing to avoid > any confusion about the difference compared to CloneOopArray. I guess > PrimArray would be an alternative to TypeArray. > > And of course, if we change this then the is_clone/set_clone functions > should follow the same naming convention. > > Btw, what about CopyOf and CopyOfRange? Don't they also come in Oop > and Type versions, or are we handling those differently in some way? > Looking at the code it looks like they are only used for the oop array > case? > > > 2) In zBarrierSetC2.cpp, do you mind if we do like this instead?
I > find that quite a bit easier to read. > > [...] > const Type** domain_fields = TypeTuple::fields(4); > domain_fields[TypeFunc::Parms + 0] = TypeInstPtr::NOTNULL; // src > domain_fields[TypeFunc::Parms + 1] = TypeInstPtr::NOTNULL; // dst > domain_fields[TypeFunc::Parms + 2] = TypeLong::LONG; // size > lower > domain_fields[TypeFunc::Parms + 3] = Type::HALF; // size > upper > const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms + 4, > domain_fields); > [...] > > > 3) I'd also like to add some const, adjust indentation, etc, in a few > places. Instead of listing them here I made a patch, which goes on top > of yours. This patch also adjusts 2) above. Just shout if you have any > objections. > > http://cr.openjdk.java.net/~pliden/8234520/webrev.03-review > > /Per > >> >> Regards, >> >> Nils >> >> >> On 2019-11-20 15:25, Nils Eliasson wrote: >>> Hi, >>> >>> I found a few bugs after the enabling of the clone intrinsic in ZGC. >>> >>> 1) The arraycopy clone_basic has the parameters adjusted to work as >>> a memcopy. For an oop the src is pointing inside the oop to where we >>> want to start copying. But when we want to do a runtime call to >>> clone - the parameters are supposed to be the actual src oop and dst >>> oop, and the size should be the instance size. >>> >>> For now I have made a workaround. What should be done later is using >>> the offset in the arraycopy node to encode where the payload is, so >>> that the base pointers are always correct. But that would require >>> changes to the BarrierSet classes of all GCs. So I leave that for >>> next release. >>> >>> 2) The size parameter of the TypeFunc for the runtime call has the >>> wrong type. It was originally Long but missed the upper Half, it was >>> fixed to INT (JDK-8233834), but that is wrong and causes the >>> compiles to be skipped. We didn't notice that since they failed >>> silently. That is also why we didn't notice problem #1 too.
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>> >>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>> >>> Please review! >>> >>> Nils >>> From aph at redhat.com Wed Nov 27 10:54:36 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 27 Nov 2019 10:54:36 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> Message-ID: <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> On 11/26/19 9:25 AM, Nick Gasson wrote: > Oddly enough the test case runtime/memory/ReadFromNoaccessArea.java now > hits this. I see: > > CompressedKlassPointers::base() => 0xffff0b4b5000 > CompressedKlassPointers::shift() => 3 This is bad. Can you have a look at the allocation code to see why the search for an appropriate address range fails? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Wed Nov 27 11:07:10 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 27 Nov 2019 11:07:10 +0000 Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not enough bytes In-Reply-To: <14cb9da700b74a74b52d409aa1a018b7@DEROTE13EDGE02.wdf.sap.corp> References: <14cb9da700b74a74b52d409aa1a018b7@DEROTE13EDGE02.wdf.sap.corp> Message-ID: Hi Christoph, thank you for testing it. > One small remark: > When I encountered the problem, I was very much confused by the constant > instruction size of 8 [1]. Maybe it would be more clear to define the > instruction size to be 4, and instead, in "num_bytes_to_end_of_patch()", > return "instruction_size * 2"? "instruction_size" refers to the size of a "NativeMovRegMem". It's used this way on other platforms, too. 
I think the confusion comes from the fact that the "NativeMovRegMem" processor instructions get emitted individually on arm32, not by a "NativeMovRegMem" emitter which doesn't exist. Emitting less than 8 bytes is always a bug (see set_offset which needs that space and you don't know in advance if the second 4 bytes will be written or not). I need a 2nd review. I'll request an 11.0.6 backport afterwards. Best regards, Martin > -----Original Message----- > From: christoph.goettschkes at microdoc.com > > Sent: Mittwoch, 27. November 2019 10:09 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not > enough bytes > > Hi Martin, > > thanks for looking into the issue. > I executed the tier1 hotspot tests with your patch applied and it looks > good. Tests which were failing before pass now. I have some, but unrelated > failures, which I will start looking into now. > > One small remark: > When I encountered the problem, I was very much confused by the constant > instruction size of 8 [1]. Maybe it would be more clear to define the > instruction size to be 4, and instead, in "num_bytes_to_end_of_patch()", > return "instruction_size * 2"? > > Best regards, > Christoph > > [1] > https://hg.openjdk.java.net/jdk/jdk/file/a2441ac23eeb/src/hotspot/cpu/ar > m/nativeInst_arm_32.hpp > > "Doerr, Martin" wrote on 2019-11-26 17:21:58: > > > From: "Doerr, Martin" > > To: "christoph.goettschkes at microdoc.com" > > , "'hotspot-compiler- > > dev at openjdk.java.net'" > > Date: 2019-11-26 17:22 > > Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: > > not enough bytes > > > > Hi Christoph, > > > > thanks for reporting the bug > > https://bugs.openjdk.java.net/browse/JDK-8234645 > > > > Seems like the large offset fix in mem2reg and reg2mem missed the first > > patching stub in the long/double cases. 
> > We should have nop padding there, too (for same reason as for the 2nd > > patching stub). > > NativeMovRegMem should always consist of 2 instructions on arm32 in > > order to support larger offsets. > > > > Webrev: > > http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/ > > > > May I ask you to test this fix? We don't have arm32 in our testing > landscape. > > > > Best regards, > > Martin > > From boris.ulasevich at bell-sw.com Wed Nov 27 12:55:18 2019 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Wed, 27 Nov 2019 15:55:18 +0300 Subject: RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 Message-ID: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Hi, Please review the fix in aarch64.ad to address the build issue "Ideal node missing: CmpOp" raised after recent change in C2. The intuitive operand name case correction CmpOp->cmpOp fixes the build, but leads to unworkable jvm. Removing the match rule works good and jdk/hotspot tests are Ok. http://bugs.openjdk.java.net/browse/JDK-8234891 http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00 ARM32 build fails too. I will fix the problem in arm32.ad file separately. thanks, Boris From christoph.goettschkes at microdoc.com Wed Nov 27 12:55:47 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Wed, 27 Nov 2019 13:55:47 +0100 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit Message-ID: Hi, please review the following small changeset which fixes the test test/hotspot/jtreg/compiler/loopopts/TestDivZeroCheckControl.java for client VMs. I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since the original bug is for C2 only. If I understand the bug description well, the provided test case works solely because of the provided "LoopUnrollLimit" flag, which is only valid for C2.
Bug: https://bugs.openjdk.java.net/browse/JDK-8234906 Webrev: https://cr.openjdk.java.net/~cgo/8234906/webrev.00/ Bug which introduced the issue: https://bugs.openjdk.java.net/browse/JDK-8229496 Thanks, Christoph From vladimir.x.ivanov at oracle.com Wed Nov 27 13:23:57 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 27 Nov 2019 16:23:57 +0300 Subject: RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 In-Reply-To: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Message-ID: The fix looks good and trivial. Best regards, Vladimir Ivanov On 27.11.2019 15:55, Boris Ulasevich wrote: > Hi, > > Please review the fix in aarch64.ad to address the build issue "Ideal > node missing: CmpOp" raised after recent change in C2. The intuitive > operand name case correction CmpOp->cmpOp fixes the build, but leads to > unworkable jvm. Removing the match rule works good and jdk/hotspot tests > are Ok. > > http://bugs.openjdk.java.net/browse/JDK-8234891 > http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00 > > ARM32 build fails too. I will fix the problem in arm32.ad file separately. > > thanks, > Boris From vladimir.x.ivanov at oracle.com Wed Nov 27 13:54:38 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 27 Nov 2019 16:54:38 +0300 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type Message-ID: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> http://cr.openjdk.java.net/~vlivanov/8231430/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8231430 There's a memory stomp happening in max_array_length() for T_ILLEGAL type. T_ILLEGAL type arises as an element basic type for a merge of 2 primitive arrays (bottom[]). max_array_length() does some input normalization (T_ILLEGAL => T_BYTE), but first it acquires a reference to the a cache slot which is out-of-bounds (T_ILLEGAL = 99 vs T_CONFLICT = 19). 
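The ordering problem described above (slot taken first, input normalized second) can be illustrated with a small standalone C++ sketch. This is not the HotSpot code: `cache`, `compute_max_length`, `slot_before_normalization`, `slot_after_normalization` and `max_length_cached` are hypothetical names made up for illustration, and only the two enum values quoted in the mail (T_CONFLICT = 19, T_ILLEGAL = 99) matter for the argument. The sketch assumes a cache with one slot per type up to T_CONFLICT.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative subset of basic types. The values for T_CONFLICT and
// T_ILLEGAL are the ones quoted in the mail; the rest is an assumption.
enum BasicType { T_BYTE = 8, T_INT = 10, T_CONFLICT = 19, T_ILLEGAL = 99 };

// Hypothetical cache: one slot per type, up to and including T_CONFLICT.
static int32_t cache[T_CONFLICT + 1] = {0};

// Hypothetical stand-in for the real (uncached) computation.
static int32_t compute_max_length(BasicType t) {
  return t == T_BYTE ? 1000 : 500;
}

// Input normalization as described in the mail: T_ILLEGAL => T_BYTE.
static BasicType normalize(BasicType t) {
  return t == T_ILLEGAL ? T_BYTE : t;
}

// Index the problematic ordering would use: derived from the raw input.
// For T_ILLEGAL this is 99, far outside the 20-slot cache.
static int slot_before_normalization(BasicType t) {
  return static_cast<int>(t);
}

// Index a safe ordering uses: derived from the normalized input.
static int slot_after_normalization(BasicType t) {
  return static_cast<int>(normalize(t));
}

// Cached lookup that only ever touches an in-bounds slot.
static int32_t max_length_cached(BasicType t) {
  int idx = slot_after_normalization(t);
  assert(idx >= 0 && idx <= T_CONFLICT);  // guard against other stray types
  int32_t& slot = cache[idx];
  if (slot == 0) {
    slot = compute_max_length(static_cast<BasicType>(idx));
  }
  return slot;
}
```

With these definitions, `slot_before_normalization(T_ILLEGAL)` lies outside the array while `max_length_cached(T_ILLEGAL)` answers from the T_BYTE slot. Dropping the cache entirely, as the mail proposes, removes the indexing question altogether.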
I was able to reproduce the problem as a corruption of one of the OOPs in Universe::_mirrors array which happened to be put close enough to max_array_length_cache in memory. I propose to completely remove the cache. arrayOopDesc::max_array_length() doesn't look too expensive and the method is not used on a hot path anywhere. Also, I put an assert for T_VOID, T_CONFLICT, T_NARROWKLASS cases, but left the logic there (=> T_BYTE) to get more testing before removing them. Testing: hs-precheckin-comp, tier1-5. Best regards, Vladimir Ivanov From stuart.monteith at linaro.org Wed Nov 27 16:06:44 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Wed, 27 Nov 2019 16:06:44 +0000 Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 In-Reply-To: References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Message-ID: Thanks Boris - looks good to me. Please ask me or my fellow Arm engineers if you should need any help testing in future. On Wed, 27 Nov 2019 at 13:26, Vladimir Ivanov wrote: > > The fix looks good and trivial. > > Best regards, > Vladimir Ivanov > > On 27.11.2019 15:55, Boris Ulasevich wrote: > > Hi, > > > > Please review the fix in aarch64.ad to address the build issue "Ideal > > node missing: CmpOp" raised after recent change in C2. The intuitive > > operand name case correction CmpOp->cmpOp fixes the build, but leads to > > unworkable jvm. Removing the match rule works good and jdk/hotspot tests > > are Ok. > > > > http://bugs.openjdk.java.net/browse/JDK-8234891 > > http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00 > > > > ARM32 build fails too. I will fix the problem in arm32.ad file separately. 
> > > > thanks, > > Boris From vladimir.kozlov at oracle.com Wed Nov 27 19:20:04 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Nov 2019 11:20:04 -0800 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type In-Reply-To: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> Message-ID: <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> There is assert in arrayOopDesc::max_array_length() which checks '< T_CONFLICT'. Next assert 'type2aelembytes(type) != 0' will be triggered for T_VOID. The assert in type2aelembytes() will be triggered for T_ADDRESS since allow_address argument is false by default. Which leaves T_METADATA and T_NARROWKLASS to check for since they can't be elements of array. Right? May be we should have permanent guarantee() in TypeAryPtr::max_array_length() for all types which we don't expect to see and not temporary assert(). Thanks, Vladimir On 11/27/19 5:54 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8231430/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8231430 > > There's a memory stomp happening in max_array_length() for T_ILLEGAL type. T_ILLEGAL type arises as an element basic > type for a merge of 2 primitive arrays (bottom[]). max_array_length() does some input normalization (T_ILLEGAL => > T_BYTE), but first it acquires a reference to the a cache slot which is out-of-bounds (T_ILLEGAL = 99 vs T_CONFLICT = 19). > > I was able to reproduce the problem as a corruption of one of the OOPs in Universe::_mirrors array which happened to be > put close enough to max_array_length_cache in memory. > > I propose to completely remove the cache. arrayOopDesc::max_array_length() doesn't look too expensive and the method is > not used on a hot path anywhere. > > Also, I put an assert for T_VOID, T_CONFLICT, T_NARROWKLASS cases, but left the logic there (=> T_BYTE) to get more > testing before removing them. 
> > Testing: hs-precheckin-comp, tier1-5. > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Nov 27 19:54:02 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Nov 2019 11:54:02 -0800 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit In-Reply-To: <20191127125735.B9BE111F377@aojmv0009> References: <20191127125735.B9BE111F377@aojmv0009> Message-ID: Hi Christoph I was about suggest IgnoreUnrecognizedVMOptions flag but remembered discussion about 8231954 fix. But I think the test should be run with Graal - it does have OSR compilation and we need to test it. We can do it by splitting test runs (duplicate @test block with different run flags) to have 2 tests with different flags and conditions. See [1]. For existing @run block we use `@requires vm.compiler2.enabled` and for new without LoopUnrollLimit - `vm.graal.enabled`. Thanks, Vladimir [1] test/hotspot/jtreg/runtime/exceptionMsgs/ArrayIndexOutOfBoundsException/ArrayIndexOutOfBoundsExceptionTest.java On 11/27/19 4:55 AM, christoph.goettschkes at microdoc.com wrote: > Hi, > > please review the following small changeset which fixes the test > test/hotspot/jtreg/compiler/loopopts/TestDivZeroCheckControl.java for > client VMs. > I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since > the original bug is for C2 only. If I understand the bug description well, > the provided test case works solely because of the provided > "LoopUnrollLimit" flag, which is only valid for C2. 
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234906
> Webrev: https://cr.openjdk.java.net/~cgo/8234906/webrev.00/
>
> Bug which introduced the issue:
> https://bugs.openjdk.java.net/browse/JDK-8229496
>
> Thanks,
> Christoph
>

From Charlie.Gracie at microsoft.com Wed Nov 27 19:59:11 2019
From: Charlie.Gracie at microsoft.com (Charlie Gracie)
Date: Wed, 27 Nov 2019 19:59:11 +0000
Subject: scalar replacement of arrays affected by minor changes to surrounding code
In-Reply-To: <87a7azid39.fsf@redhat.com>
References: <87d0fvidxs.fsf@redhat.com> <87a7azid39.fsf@redhat.com>
Message-ID: <21F5DF77-4DA9-437C-9DEB-CFABFC4C9C19@microsoft.com>

Hi,

Recently, I was analyzing the performance of a workload and noticed a significant amount of allocation from a single array allocation site. This allocation does not survive the simple method which allocates it. A simplified version of the code is very similar to the code being described by Govind.

From this conversation I noticed JDK-8231291 and the patch provided by Roland. I applied the patch to tip and after a few minor additions it indeed provides measurable improvements to the workload I was measuring and passes all of my testing.

Roland is there anything I could do to help get this type of change into tip and any appropriate back ports? I have tested some workloads with this patch and I think I covered the required checks but I could have easily missed something.

Here is a modified version of Roland's original patch:

diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
index 08552e0756..98bf37f6e0 100644
--- a/src/hotspot/share/opto/compile.cpp
+++ b/src/hotspot/share/opto/compile.cpp
@@ -2292,7 +2292,7 @@ void Compile::Optimize() {
     if (has_loops()) {
       // Cleanup graph (remove dead nodes).
       TracePhase tp("idealLoop", &timers[_t_idealLoop]);
-      PhaseIdealLoop::optimize(igvn, LoopOptsNone);
+      PhaseIdealLoop::optimize(igvn, LoopOptsMaxUnroll);
       if (major_progress()) print_method(PHASE_PHASEIDEAL_BEFORE_EA, 2);
       if (failing()) return;
     }
diff --git a/src/hotspot/share/opto/compile.hpp b/src/hotspot/share/opto/compile.hpp
index 50906800ba..710aec8e3a 100644
--- a/src/hotspot/share/opto/compile.hpp
+++ b/src/hotspot/share/opto/compile.hpp
@@ -93,6 +93,7 @@ struct Final_Reshape_Counts;
 enum LoopOptsMode {
   LoopOptsDefault,
   LoopOptsNone,
+  LoopOptsMaxUnroll,
   LoopOptsShenandoahExpand,
   LoopOptsShenandoahPostExpand,
   LoopOptsSkipSplitIf,
diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp
index 2038a4c8f9..3c13a9aa10 100644
--- a/src/hotspot/share/opto/loopnode.cpp
+++ b/src/hotspot/share/opto/loopnode.cpp
@@ -2796,6 +2796,7 @@ bool PhaseIdealLoop::process_expensive_nodes() {
 void PhaseIdealLoop::build_and_optimize(LoopOptsMode mode) {
   bool do_split_ifs = (mode == LoopOptsDefault);
   bool skip_loop_opts = (mode == LoopOptsNone);
+  bool do_max_unroll = (mode == LoopOptsMaxUnroll);
 
   int old_progress = C->major_progress();
   uint orig_worklist_size = _igvn._worklist.size();
@@ -2859,7 +2860,7 @@ void PhaseIdealLoop::build_and_optimize(LoopOptsMode mode) {
 
   BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
   // Nothing to do, so get out
-  bool stop_early = !C->has_loops() && !skip_loop_opts && !do_split_ifs && !_verify_me && !_verify_only &&
+  bool stop_early = !C->has_loops() && !skip_loop_opts && !do_max_unroll && !do_split_ifs && !_verify_me && !_verify_only &&
           !bs->is_gc_specific_loop_opts_pass(mode);
   bool do_expensive_nodes = C->should_optimize_expensive_nodes(_igvn);
   bool strip_mined_loops_expanded = bs->strip_mined_loops_expanded(mode);
@@ -3009,6 +3010,44 @@ void PhaseIdealLoop::build_and_optimize(LoopOptsMode mode) {
     return;
   }
 
+  if (do_max_unroll) {
+    for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) {
+      IdealLoopTree* lpt = iter.current();
+      if (lpt->is_innermost() && lpt->_allow_optimizations && !lpt->_has_call && lpt->is_counted()) {
+        lpt->compute_trip_count(this);
+
+        if (lpt->do_one_iteration_loop(this)) {
+          continue;
+        }
+
+        if (lpt->do_remove_empty_loop(this)) {
+          continue;
+        }
+        AutoNodeBudget node_budget(this);
+        CountedLoopNode *cl = lpt->_head->as_CountedLoop();
+        // Do not do anything for invalid, pre or post loops
+        if (cl->is_valid_counted_loop() && !cl->is_pre_loop() && !cl->is_post_loop()) {
+          // Compute loop trip count from profile data
+          lpt->compute_profile_trip_cnt(this);
+          if (cl->is_normal_loop()) {
+            if (lpt->policy_maximally_unroll(this)) {
+              memset(worklist.adr(), 0, worklist.Size()*sizeof(Node*));
+              do_maximally_unroll(lpt, worklist);
+            }
+          }
+        }
+      }
+    }
+
+    C->restore_major_progress(old_progress);
+    _igvn.optimize();
+
+    if (C->log() != NULL) {
+      log_loop_tree(_ltree_root, _ltree_root, C->log());
+    }
+    return;
+  }
+
   if (bs->optimize_loops(this, mode, visited, nstack, worklist)) {
     _igvn.optimize();
     if (C->log() != NULL) {

Thanks,
Charlie Gracie

On 2019-09-20, 4:38 AM, "hotspot-compiler-dev on behalf of Roland Westrelin" wrote:

> The problem with that one is that EA would need the loop to be fully
> unrolled to eliminate the allocation but that only happens after EA. So
> it's a pass ordering problem. We already run a pass of loop
> optimizations before EA so it seems we could have it take care of fully
> unrolling the loop. I created JDK-8231291 for that one.

Roland.
From john.r.rose at oracle.com Thu Nov 28 03:05:11 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 27 Nov 2019 19:05:11 -0800 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com> <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com> Message-ID: I too would expect these things to be placed in globalDefinitions.hpp rather than align.hpp, for the reasons already given: There are already similar functions in there. I'm not against cleaning up globalDefinitions.hpp, but until there's a better proposal on the table, let's stay with it for functions like this. - John On Nov 26, 2019, at 2:23 AM, David Holmes wrote: > > On 26/11/2019 8:06 pm, Claes Redestad wrote: >> On 2019-11-26 10:50, David Holmes wrote: >>> Hi Claes, >>> >>> Just some high-level comments >>> >>> - should next_power_of_two be defined in globalDefinitions.hpp along side the related functionality ie is_power_of_two ? >> I thought we are trying to move things _out_ of globalDefinitions. I > > We are? I don't recall hearing that. But wherever these go seems they all belong together. > >> agree align.hpp might not be the best place, either, though.. > > I thought align.hpp as strange place too. :) > >>> >>> - can next_power_of_two build on the existing log2_* functions (or vice versa)? >> Yes, log2_intptr et al could probably be tamed to do a single step >> operation, although we'd need to add 64-bit implementations in >> count_leading_zeros. At least these log2_* functions already deal with >> overflows without looping forever. >>> >>> - do the existing ZUtils not cover the same general area?
>>> >>> ./share/gc/z/zUtils.inline.hpp >>> >>> inline size_t ZUtils::round_up_power_of_2(size_t value) { >>> assert(value != 0, "Invalid value"); >>> >>> if (is_power_of_2(value)) { >>> return value; >>> } >>> >>> return (size_t)1 << (log2_intptr(value) + 1); >>> } >>> >>> inline size_t ZUtils::round_down_power_of_2(size_t value) { >>> assert(value != 0, "Invalid value"); >>> return (size_t)1 << log2_intptr(value); >>> } >> round_up_power_of_2 is similar, but not identical (next_power_of_two doesn't care if the value is already a power of 2, nor should it). > > Okay but seems perhaps these should also be moved out of ZUtils and co-located with the other "power of two" functions. > > Cheers, > David > ----- > >> /Claes From thomas.stuefe at gmail.com Thu Nov 28 07:34:49 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 28 Nov 2019 08:34:49 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: Hi Claes, I think this is useful. Why not a 64bit variant too? If you do not want to go through the hassle of providing a count_leading_zeros(uint64_t), you could call the 32bit variant twice and take care of endianness for the caller. -- In inline int32_t next_power_of_two(int32_t value) , should we weed out negative input values right away instead of asserting at the end of the function? -- The functions will always return the next power of two, even if the input is a power of two - e.g. "2" for "1". Is that intended? It would be nice to have an API comment in the header describing these corner cases (what happens for negative input, what happens if input is power 2). -- The patch can cause subtle differences in some caller code, I think, if input value is a power of 2 already. 
See e.g: http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html - i=16; - while( i < size ) i <<= 1; + i = MAX2(16, (int)next_power_of_two(size)); If i == size == 16, old code would keep i==16, new code would come to i==32, I think. http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html //------------------------------round_up--------------------------------------- // Round up to nearest power of 2 -uint NodeHash::round_up( uint x ) { - x += (x>>2); // Add 25% slop - if( x <16 ) return 16; // Small stuff - uint i=16; - while( i < x ) i <<= 1; // Double to fit - return i; // Return hash table size +uint NodeHash::round_up(uint x) { + x += (x >> 2); // Add 25% slop + return MAX2(16U, next_power_of_two(x)); } same here. If x == 16, before we'd return 16, now 32. --- http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html I admit I do not understand the current coding :) I do not believe it works for all input values, e.g. were get_java_thread_list()->length()==1025, we'd get 1861 - if I am not mistaken. Your code is definitely clearer but not equivalent to the old one. --- In the end, I wonder whether we should have two kind of APIs, or a parameter, distinguishing between "next power of 2" and "next power of 2 unless input value is already power of 2". Cheers, Thomas On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad wrote: > Hi, > > in various places in the hotspot we have custom code to calculate the > next power of two, some of which have potential to go into an infinite > loop in case of an overflow. > > This patch proposes adding next_power_of_two utility methods which > avoid infinite loops on overflow, while providing slightly more > efficient code in most cases. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234331 > Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/ > > Testing: tier1-3 > > Thanks! 
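The difference between "next power of 2" and "next power of 2 unless the input already is one", raised in the review above, can be sketched in standalone C++. This is only an illustration under stated assumptions: it is not the code from the webrev, the function names `old_loop_size`, `round_up_power_of_two` and `next_power_of_two_strict` are hypothetical, and the "strictly greater" behaviour is taken from the description in the review ("2" for "1"), not from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Shape of the old hand-written loop quoted from dict.cpp: start at 16 and
// double until the value fits. Precondition (assumed here): size <= 2^31,
// otherwise the shift overflows to 0 and the loop never terminates.
inline uint32_t old_loop_size(uint32_t size) {
  assert(size <= 0x80000000u);
  uint32_t i = 16;
  while (i < size) i <<= 1;
  return i;
}

// Smallest power of two >= v. A value that already is a power of two is
// returned unchanged. Precondition: 1 <= v <= 2^31.
inline uint32_t round_up_power_of_two(uint32_t v) {
  assert(v >= 1 && v <= 0x80000000u);
  v--;              // so that exact powers of two map to themselves
  v |= v >> 1;      // smear the highest set bit into all lower bits
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;     // all-ones below the top bit, plus one
}

// Smallest power of two strictly greater than v, i.e. 16 -> 32.
// Precondition: v < 2^31.
inline uint32_t next_power_of_two_strict(uint32_t v) {
  assert(v < 0x80000000u);
  return round_up_power_of_two(v + 1);
}
```

For an input of 16 the old loop and `round_up_power_of_two` both give 16, whereas `next_power_of_two_strict` gives 32. That is the caller-visible difference pointed out for dict.cpp and phaseX.cpp. Neither bit-smearing variant loops, so an out-of-range input cannot hang.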
> > /Claes > From thomas.stuefe at gmail.com Thu Nov 28 07:44:25 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 28 Nov 2019 08:44:25 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: p.s. I think it would be good to have some gtests for these functions, especially to test corner cases. Cheers, Thomas On Thu, Nov 28, 2019 at 8:34 AM Thomas Stüfe wrote: > Hi Claes, > > I think this is useful. Why not a 64bit variant too? If you do not want to > go through the hassle of providing a count_leading_zeros(uint64_t), you > could call the 32bit variant twice and take care of endianness for the > caller. > > -- > > In inline int32_t next_power_of_two(int32_t value) , should we weed out > negative input values right away instead of asserting at the end of the > function? > > -- > > The functions will always return the next power of two, even if the input > is a power of two - e.g. "2" for "1". Is that intended? It would be nice to > have an API comment in the header describing these corner cases (what > happens for negative input, what happens if input is power 2). > > -- > > The patch can cause subtle differences in some caller code, I think, if > input value is a power of 2 already. See e.g: > > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html > > - i=16; > - while( i < size ) i <<= 1; > + i = MAX2(16, (int)next_power_of_two(size)); > > If i == size == 16, old code would keep i==16, new code would come to > i==32, I think.
> > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html > > > //------------------------------round_up--------------------------------------- > // Round up to nearest power of 2 > -uint NodeHash::round_up( uint x ) { > - x += (x>>2); // Add 25% slop > - if( x <16 ) return 16; // Small stuff > - uint i=16; > - while( i < x ) i <<= 1; // Double to fit > - return i; // Return hash table size > +uint NodeHash::round_up(uint x) { > + x += (x >> 2); // Add 25% slop > + return MAX2(16U, next_power_of_two(x)); > } > > same here. If x == 16, before we'd return 16, now 32. > > --- > > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > I admit I do not understand the current coding :) I do not believe it > works for all input values, e.g. were > get_java_thread_list()->length()==1025, we'd get 1861 - if I am not > mistaken. Your code is definitely clearer but not equivalent to the old one. > > --- > > In the end, I wonder whether we should have two kind of APIs, or a > parameter, distinguishing between "next power of 2" and "next power of 2 > unless input value is already power of 2". > > Cheers, Thomas > > > > > > On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad > wrote: > >> Hi, >> >> in various places in the hotspot we have custom code to calculate the >> next power of two, some of which have potential to go into an infinite >> loop in case of an overflow. >> >> This patch proposes adding next_power_of_two utility methods which >> avoid infinite loops on overflow, while providing slightly more >> efficient code in most cases. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8234331 >> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/ >> >> Testing: tier1-3 >> >> Thanks! 
>> >> /Claes >> > From nick.gasson at arm.com Thu Nov 28 07:50:32 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 28 Nov 2019 15:50:32 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> Message-ID: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> Hi Andrew, >> >> CompressedKlassPointers::base() => 0xffff0b4b5000 >> CompressedKlassPointers::shift() => 3 > > This is bad. Can you have a look at the allocation code to see why the search > for an appropriate address range fails? > We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs that searches for a 4G aligned location for the compressed class space on AArch64, but this search is not done if CDS is in use and the archive was loaded successfully, because in that case the class space has already been mapped (i.e. `metaspace_rs.is_reserved()' is true). Previously it was only possible to map the CDS archive at 0x800000000. The compressed class base is set to the start of this region which happens to be 4G aligned so our MacroAssembler::load_klass optimisation applies and we emit the short code sequence. With the recent change in 8231610, if the CDS archive cannot be mapped at that address (e.g. because of ASLR or because the heap is mapped there) then the CDS archive will be relocated to an arbitrary address decided by mmap. That's where the oddly-aligned compressed klass base above comes from. This causes MacroAssembler::load_klass to emit the inefficient sequence which then overflows the buffer for the itable stub (the worst-case size estimate there is wrong, which needs to be fixed separately). A minimal way to reproduce this is: $ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version ... 
[0.050s][info ][cds] CDS archive was created with max heap size = 128M, and the following configuration: [0.050s][info ][cds] narrow_klass_base = 0x0000fffec7507000, narrow_klass_shift = 3 ... # guarantee(masm->pc() <= s->code_end()) failed: itable #2: overflowed buffer, estimated len: 180, actual len: 184, overrun: 4 I suggest we move the 4G-aligned search from allocate_metaspace_compressed_klass_ptrs into its own function that can then be called from MetaspaceShared::reserve_shared_space when requested_address==NULL (i.e. the fallback path when mmap at 0x800000000 fails). If you're happy with this I'll make a patch for review? Thanks, Nick From ioi.lam at oracle.com Thu Nov 28 08:19:36 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Thu, 28 Nov 2019 00:19:36 -0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> Message-ID: On 11/27/19 11:50 PM, Nick Gasson wrote: > Hi Andrew, > >>> >>> CompressedKlassPointers::base() => 0xffff0b4b5000 >>> CompressedKlassPointers::shift() => 3 >> >> This is bad. Can you have a look at the allocation code to see why >> the search >> for an appropriate address range fails? >> > > We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs > that searches for a 4G aligned location for the compressed class space > on AArch64, but this search is not done if CDS is in use and the > archive was loaded successfully, because in that case the class space > has already been mapped (i.e. `metaspace_rs.is_reserved()' is true). > > Previously it was only possible to map the CDS archive at 0x800000000. 
> The compressed class base is set to the start of this region which > happens to be 4G aligned so our MacroAssembler::load_klass > optimisation applies and we emit the short code sequence. > > With the recent change in 8231610, if the CDS archive cannot be mapped > at that address (e.g. because of ASLR or because the heap is mapped > there) then the CDS archive will be relocated to an arbitrary address > decided by mmap. That's where the oddly-aligned compressed klass base > above comes from. This causes MacroAssembler::load_klass to emit the > inefficient sequence which then overflows the buffer for the itable > stub (the worst-case size estimate there is wrong, which needs to be > fixed separately). > > A minimal way to reproduce this is: > > $ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version > ... > [0.050s][info ][cds] CDS archive was created with max heap size = > 128M, and the following configuration: > [0.050s][info ][cds] narrow_klass_base = 0x0000fffec7507000, > narrow_klass_shift = 3 > ... > # guarantee(masm->pc() <= s->code_end()) failed: itable #2: > overflowed buffer, estimated len: 180, actual len: 184, overrun: 4 > > > I suggest we move the 4G-aligned search from > allocate_metaspace_compressed_klass_ptrs into its own function that > can then be called from MetaspaceShared::reserve_shared_space when > requested_address==NULL (i.e. the fallback path when mmap at > 0x800000000 fails). If you're happy with this I'll make a patch for > review? > You can also force CDS archive relocation with -XX:+UnlockDiagnosticVMOptions -XX:ArchiveRelocationMode=1. That way you can test the behavior with the default heap settings.
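The 4G alignment property that this thread keeps coming back to can be checked with a few lines of standalone C++. This is a sketch only, not code from HotSpot: `is_4g_aligned` and `align_up_4g` are hypothetical helper names, and the exact condition that MacroAssembler::load_klass tests is not shown in the thread, so the sketch just assumes "4G aligned" means that the low 32 bits of the base are zero. The addresses in the usage note are the ones quoted in the mails.

```cpp
#include <cassert>
#include <cstdint>

// True if addr is a multiple of 4G, i.e. its low 32 bits are all zero.
inline bool is_4g_aligned(uint64_t addr) {
  return (addr & 0xFFFFFFFFull) == 0;
}

// Round addr up to the next 4G boundary (addr itself if already aligned).
// Assumes the result still fits in 64 bits.
inline uint64_t align_up_4g(uint64_t addr) {
  const uint64_t four_g = 1ull << 32;
  return (addr + four_g - 1) & ~(four_g - 1);
}
```

With these helpers, the default mapping address 0x800000000 is 4G aligned, while both relocated bases reported in the thread (0xffff0b4b5000 and 0x0000fffec7507000) are not, which matches the description of when the short code sequence can be emitted.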
Thanks - Ioi > > Thanks, > Nick From boris.ulasevich at bell-sw.com Thu Nov 28 08:42:37 2019 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Thu, 28 Nov 2019 11:42:37 +0300 Subject: RFR(S) 8234893: ARM32: build failure after JDK-8234387 Message-ID: Hi, Please review the fix in arm.ad to address the ARM32 build issue "Ideal node missing". The fix is just trivial adding missing declarations: R8RegP, R9RegP, R12RegP, SPRegP. jdk/hotspot jtreg tests are Ok. http://bugs.openjdk.java.net/browse/JDK-8234893 http://cr.openjdk.java.net/~bulasevich/8234893/webrev.00 thanks, Boris From boris.ulasevich at bell-sw.com Thu Nov 28 08:42:43 2019 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Thu, 28 Nov 2019 11:42:43 +0300 Subject: [aarch64-port-dev ] RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 In-Reply-To: References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Message-ID: <6e5d8aec-538e-20c7-a035-b04ff7e8691f@bell-sw.com> Thank you! On 27.11.2019 19:06, Stuart Monteith wrote: > Thanks Boris - looks good to me. > Please ask me or my fellow Arm engineers if you should need any help > testing in future. > > On Wed, 27 Nov 2019 at 13:26, Vladimir Ivanov > wrote: >> >> The fix looks good and trivial. >> >> Best regards, >> Vladimir Ivanov >> >> On 27.11.2019 15:55, Boris Ulasevich wrote: >>> Hi, >>> >>> Please review the fix in aarch64.ad to address the build issue "Ideal >>> node missing: CmpOp" raised after recent change in C2. The intuitive >>> operand name case correction CmpOp->cmpOp fixes the build, but leads to >>> unworkable jvm. Removing the match rule works good and jdk/hotspot tests >>> are Ok. >>> >>> http://bugs.openjdk.java.net/browse/JDK-8234891 >>> http://cr.openjdk.java.net/~bulasevich/8234891/webrev.00 >>> >>> ARM32 build fails too. I will fix the problem in arm32.ad file separately. 
>>> >>> thanks, >>> Boris From vladimir.x.ivanov at oracle.com Thu Nov 28 08:55:49 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 28 Nov 2019 11:55:49 +0300 Subject: RFR(S) 8234893: ARM32: build failure after JDK-8234387 In-Reply-To: References: Message-ID: Looks good. Best regards, Vladimir Ivanov On 28.11.2019 11:42, Boris Ulasevich wrote: > Hi, > > Please review the fix in arm.ad to address the ARM32 build issue "Ideal > node missing". The fix is just trivial adding missing declarations: > R8RegP, R9RegP, R12RegP, SPRegP. jdk/hotspot jtreg tests are Ok. > > http://bugs.openjdk.java.net/browse/JDK-8234893 > http://cr.openjdk.java.net/~bulasevich/8234893/webrev.00 > > thanks, > Boris From christian.hagedorn at oracle.com Thu Nov 28 09:24:06 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 28 Nov 2019 10:24:06 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint Message-ID: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8233033 http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ The C2 compiled code produces a wrong result for 'iFld' in the test case. It is -8 instead of -7. The loop in the test case is partially peeled and then unswitched. The wrong result is produced because a wrong state is transferred to the interpreter when an uncommon trap is hit in the C2 compiled code in the fast version of the unswitched loop. The problem is when unswitching the loop, we clone the original loop predicates for the slow and fast version of the loop [1] but we do not account for partially peeled statements that are control dependent on the loop predicates (i.e. need to be executed after the predicates). As a result, these are executed before the cloned loop predicates. The situation of the test case method PartialPeelingUnswitch::test() is visualized in [2]. 
IfTrue 118, the entry control of the original loop which follows right after the loop predicates, has an output edge to the StoreI 353 node. This node belongs to the "iFld += -7" statement which was partially peeled before. When creating the slow version of the loop and cloning the predicates in PhaseIdealLoop::create_slow_version_of_loop(), this control dependency is lost. StoreI 353 is still dependent on IfTrue 118 instead of IfTrue 472 (fast loop entry control) and IfTrue 476 (slow loop entry control). The original loop predicates are later removed and thus, when hitting the uncommon trap in the fast loop, we accidentally executed "iFld += -7" (StoreI 353) already even though the interpreter assumes C2 has not executed any statements in the loop. As a result, "iFld += -7" is executed twice in a row which produces a wrong result. The fix is to replace the control input of all statements that have a control input from the original loop entry control (and are not the "loop selection" IfNode) with the fast and slow entry control, respectively. Since the statements cannot have two control inputs they need to be cloned together with all following nodes on a path to the loop phi nodes. The output of the last node before a loop phi on such a path needs to be adjusted to only point to the phi node belonging to the fast loop. The last node on the cloned path is set to the phi node belonging to the slow loop. The fix is visualized in [3]. The control input of StoreI 353 is now the entry control of the fast loop (IfTrue 472) and its output only points to the corresponding Phi 442 of the fast loop. The same was done for the cloned node StoreI 476 of StoreI 353 for the slow loop. This bug can also be reproduced with JDK 11. Should we target this fix to 14 or defer it to 15 (since it's a more complex one)? Thank you! 
Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/opto/loopUnswitch.cpp#l272 [2] https://bugs.openjdk.java.net/secure/attachment/85593/wrong_dependencies.png [3] https://bugs.openjdk.java.net/secure/attachment/85592/fixed_dependencies.png From vladimir.x.ivanov at oracle.com Thu Nov 28 09:46:39 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 28 Nov 2019 12:46:39 +0300 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Message-ID: (while I'm looking into the proposed fix) > This bug can also be reproduced with JDK 11. Should we target this fix > to 14 or defer it to 15 (since it's a more complex one)? Silently producing erroneous code is a more serious issue than a JVM crash (either during compilation or while executing generated code). Instead of just shutting down the process, it silently continues running and can go unnoticed for a long time. Also, it makes it hard to estimate the impact. If we aren't confident in the stability of the fix, then it's worth considering implementing a stopgap solution first (detect the problematic code shape and either avoid the transformation or even bail out of the compilation) while continuing to work on a proper fix. But we should try our best to fix the problematic behavior in a prompt manner (in 14 and backported to 11). Best regards, Vladimir Ivanov PS: please, update the bug synopsis with a summary of the problem.
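The failure mode discussed in this thread (a store executed ahead of a loop predicate, followed by an uncommon trap that makes the interpreter execute the same store again) can be modelled with a small sketch. It is a toy model under stated assumptions, not C2 code: `State`, `peeled_statement`, `interpreter_resume` and `run_with_trap` are hypothetical names, and the values are not those of the actual test case.

```cpp
#include <cassert>

// Hypothetical stand-in for the field updated by the peeled statement.
struct State { int iFld; };

// The partially peeled statement; it must run exactly once per loop entry.
inline void peeled_statement(State& s) { s.iFld += -7; }

// After an uncommon trap the interpreter resumes at the loop entry and
// assumes that no loop statement has been executed yet, so it runs the
// peeled statement itself.
inline void interpreter_resume(State& s) { peeled_statement(s); }

// Models one loop entry whose predicate fails and triggers the trap.
// store_before_predicate == true mimics the lost control dependency:
// the store is scheduled ahead of the cloned predicate.
inline int run_with_trap(bool store_before_predicate) {
  State s{0};
  if (store_before_predicate) {
    peeled_statement(s);  // compiled code already performed the store
  }
  interpreter_resume(s);  // predicate failed: deoptimize and re-execute
  return s.iFld;
}
```

In this model run_with_trap(false) returns -7, while run_with_trap(true) returns -14 because the update is applied twice. Nothing crashes in the second case, which is why this class of bug can go unnoticed.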
From aph at redhat.com Thu Nov 28 10:03:18 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 28 Nov 2019 10:03:18 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> Message-ID: <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com> On 11/28/19 7:50 AM, Nick Gasson wrote: > Hi Andrew, > >>> >>> CompressedKlassPointers::base() => 0xffff0b4b5000 >>> CompressedKlassPointers::shift() => 3 >> >> This is bad. Can you have a look at the allocation code to see why the search >> for an appropriate address range fails? > > We have a loop in Metaspace::allocate_metaspace_compressed_klass_ptrs > that searches for a 4G aligned location for the compressed class space > on AArch64, but this search is not done if CDS is in use and the archive > was loaded successfully, because in that case the class space has > already been mapped (i.e. `metaspace_rs.is_reserved()' is true). Right. At the time I wrote that code, CDS was not much used by anything, so I thought of it as a marginal use case. > Previously it was only possible to map the CDS archive at 0x800000000. > The compressed class base is set to the start of this region which > happens to be 4G aligned so our MacroAssembler::load_klass optimisation > applies and we emit the short code sequence. > > With the recent change in 8231610, if the CDS archive cannot be mapped > at that address (e.g. because of ASLR or because the heap is mapped > there) then the CDS archive will be relocated to an arbitrary address > decided by mmap. That's where the oddly-aligned compressed klass base > above comes from.
This causes MacroAssembler::load_klass to emit the > inefficient sequence which then overflows the buffer for the itable stub > (the worst-case size estimate there is wrong, which needs to be fixed > separately). Correcting the stub size is a minor tidy-up which does not really need its own Bug ID. > A minimal way to reproduce this is: > > $ java -XX:HeapBaseMinAddress=33G -Xshare:on -Xlog:cds=debug -version > ... > [0.050s][info ][cds] CDS archive was created with max heap size = 128M, > and the following configuration: > [0.050s][info ][cds] narrow_klass_base = 0x0000fffec7507000, > narrow_klass_shift = 3 > ... > # guarantee(masm->pc() <= s->code_end()) failed: itable #2: overflowed > buffer, estimated len: 180, actual len: 184, overrun: 4 > > > I suggest we move the 4G-aligned search from > allocate_metaspace_compressed_klass_ptrs into its own function that can > then be called from MetaspaceShared::reserve_shared_space when > requested_address==NULL (i.e. the fallback path when mmap at 0x800000000 > fails). If you're happy with this I'll make a patch for review? Yes, that sounds excellent. We really need it to avoid compressed class pointers becoming an expensive option. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nick.gasson at arm.com Thu Nov 28 10:18:38 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 28 Nov 2019 18:18:38 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com> Message-ID: <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com> On 28/11/2019 18:03, Andrew Haley wrote: >> (the worst-case size estimate there is wrong, which needs to be fixed >> separately). > > Correcting the stub size is a minor tidy-up which does not really need > its own Bug ID. > OK, but I'd like to also try removing the second call to __ load_klass in VtableStubs::create_itable_stub as that will shave a few instructions even in the normal case. I'll recalculate the size estimate when I do that. Thanks, Nick From aph at redhat.com Thu Nov 28 10:28:49 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 28 Nov 2019 10:28:49 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <6a7e7eb1-f3a5-d361-45ab-a4b7004683ec@redhat.com> <60b30e79-78e6-cda9-a813-f4f1a58f7501@arm.com> <52a223ea-13d3-d7ef-72c7-bd0f590c12be@redhat.com> <044ce397-96c7-8b01-a6ed-d3ea2546749a@arm.com> Message-ID: On 11/28/19 10:18 AM, Nick Gasson wrote: > OK, but I'd like to also try removing the second call to __ load_klass > in VtableStubs::create_itable_stub as that will shave a few instructions > even in the normal case. 
I'll recalculate the size estimate when I do that. OK. But beware of spending time on things that don't really matter. There's a risk in making any change. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christian.hagedorn at oracle.com Thu Nov 28 10:32:28 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 28 Nov 2019 11:32:28 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Message-ID: <4bd7c970-f35f-008c-67a9-dac871fd6c3a@oracle.com> Hi Vladimir On 28.11.19 10:46, Vladimir Ivanov wrote: >> This bug can also be reproduced with JDK 11. Should we target this fix >> to 14 or defer it to 15 (since it's a more complex one)? > > Silently producing erroneous code is a more serious issue than a JVM > crash (either during compilation or while executing generated code). > Instead of just shutting down the process, it silently continues running > and can go unnoticed for a long time. Also, it makes it hard to estimate > the impact. > > If we aren't confident in the stability of the fix, then it's worth > considering implementing a stopgap solution first (detect the > problematic code shape and either avoid the transformation or even bail > out of the compilation) while continuing to work on a proper fix. > > But we should try our best to fix the problematic behavior in a prompt > manner (in 14 and backported to 11). Thanks for the explanation. That sounds reasonable. > PS: please, update the bug synopsis with a summary of the problem.
I added a more precise description and changed the bug summary from "Results of execution differ for C2 and C1/Xint" into "C2 produces wrong result while unswitching a loop due to lost control dependencies" Best regards, Christian From goetz.lindenmaier at sap.com Thu Nov 28 10:50:50 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 28 Nov 2019 10:50:50 +0000 Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not enough bytes In-Reply-To: References: Message-ID: Hi Martin, the change looks good. I had to take a second look to grok your comment. Mentioning "2nd" is a bit shaky. I would prefer if you either move up the real comment to the first occurrence in the function, and add comment "// see above" at the others. Or you say: // see comment before patching_epilog for 2nd ldr (str respectively) Don't need a webrev for this. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Doerr, Martin > Sent: Dienstag, 26. November 2019 17:22 > To: christoph.goettschkes at microdoc.com; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: [CAUTION] RFR(XS): 8234645: ARM32: C1: PatchingStub for field > access: not enough bytes > > Hi Christoph, > > thanks for reporting the bug > https://bugs.openjdk.java.net/browse/JDK-8234645 > > Seems like the large offset fix in mem2reg and reg2mem missed the first > patching stub in the long/double cases. > We should have nop padding there, too (for same reason as for the 2nd > patching stub). > NativeMovRegMem should always consist of 2 instructions on arm32 in order > to support larger offsets. > > Webrev: > http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/ > > May I ask you to test this fix? We don't have arm32 in our testing landscape. 
> > Best regards, > Martin From martin.doerr at sap.com Thu Nov 28 11:08:21 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 28 Nov 2019 11:08:21 +0000 Subject: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not enough bytes In-Reply-To: References: Message-ID: Hi Götz, thanks for the review. Pushed with updated comments. Best regards, Martin > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Donnerstag, 28. November 2019 11:51 > To: Doerr, Martin ; > christoph.goettschkes at microdoc.com; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: RE: RFR(XS): 8234645: ARM32: C1: PatchingStub for field access: not > enough bytes > > Hi Martin, > > the change looks good. > I had to take a second look to grok your comment. Mentioning "2nd" > is a bit shaky. > I would prefer if you either move up the real comment to the > first occurrence in the function, and add comment "// see above" at > the others. > > Or you say: > // see comment before patching_epilog for 2nd ldr (str respectively) > > Don't need a webrev for this. > > Best regards, > Goetz. > > > > -----Original Message----- > > From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Doerr, Martin > > Sent: Dienstag, 26. November 2019 17:22 > > To: christoph.goettschkes at microdoc.com; 'hotspot-compiler- > > dev at openjdk.java.net' > > Subject: [CAUTION] RFR(XS): 8234645: ARM32: C1: PatchingStub for field > > access: not enough bytes > > > > Hi Christoph, > > > > thanks for reporting the bug > > https://bugs.openjdk.java.net/browse/JDK-8234645 > > > > Seems like the large offset fix in mem2reg and reg2mem missed the first > > patching stub in the long/double cases. > > We should have nop padding there, too (for same reason as for the 2nd > > patching stub). > > NativeMovRegMem should always consist of 2 instructions on arm32 in > order > > to support larger offsets.
> > > > Webrev: > > http://cr.openjdk.java.net/~mdoerr/8234645_arm_padding/webrev.00/ > > > > May I ask you to test this fix? We don't have arm32 in our testing > landscape. > > > > Best regards, > > Martin From per.liden at oracle.com Thu Nov 28 11:10:29 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 28 Nov 2019 12:10:29 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: <79bc1866-2293-37a6-9780-af24a01eb699@oracle.com> References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com> <79bc1866-2293-37a6-9780-af24a01eb699@oracle.com> Message-ID: Thanks Nils! Looks good to me. /Per On 11/27/19 11:50 AM, Nils Eliasson wrote: > Hi Per, > > Here is an updated webrev with your fixes included: > > http://cr.openjdk.java.net/~neliasso/8234520/webrev.05/ > > I chose to go with CloneInst, and ClonePrimArray. > > CopyOf, CopyOfRange - like ArrayCopy, doesn't have specialized versions, > rather check the type when expanding, and of course - there are no > versions for instances. But there are differences to what guards they > need, and when the guards are expanded. I would need to dig down into > the details to determine if it could be simplified. > > Regards, > > Nils > > On 2019-11-26 14:29, Per Liden wrote: >> Hi Nils, >> >> On 11/21/19 12:53 PM, Nils Eliasson wrote: >>> I updated this to version 2. >>> >>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >>> >>> I found a problem running >>> compiler/arguments/TestStressReflectiveCode.java >>> >>> Even though the clone was created as an oop clone, the type node type >>> returns isa_aryptr. This is caused by the src ptr not being the base >>> pointer. Until I fix that I wanted a more robust test. >>> >>> In this webrev I split up the is_clonebasic into is_clone_oop and >>> is_clone_array. (is_clone_oop_array is already there).
Having a >>> complete set with the three clone types allows for a robust test and >>> easy verification. (The three variants end up in different paths with >>> different GCs). >> >> A couple of suggestions: >> >> 1) Instead of >> >> CloneOop >> CloneArray >> CloneOopArray >> >> I think we should call the three types: >> >> CloneInstance >> CloneTypeArray >> CloneOopArray >> >> Since CloneOop is not actually cloning an oop, but an object/instance. >> And being explicit about TypeArray seems like a good thing to avoid >> any confusion about the difference compared to CloneOopArray. I guess >> PrimArray would be an alternative to TypeArray. >> >> And of course, if we change this then the is_clone/set_clone functions >> should follow the same naming convention. >> >> Btw, what about CopyOf and CopyOfRange? Don't they also come in Oop >> and Type versions, or are we handling those differently in some way? >> Looking at the code it looks like they are only used for the oop array >> case? > >> >> >> 2) In zBarrierSetC2.cpp, do you mind if we do like this instead? I >> find that quite a bit easier to read. >> >> [...] >> const Type** domain_fields = TypeTuple::fields(4); >> domain_fields[TypeFunc::Parms + 0] = TypeInstPtr::NOTNULL; // src >> domain_fields[TypeFunc::Parms + 1] = TypeInstPtr::NOTNULL; // dst >> domain_fields[TypeFunc::Parms + 2] = TypeLong::LONG; // size >> lower >> domain_fields[TypeFunc::Parms + 3] = Type::HALF; // size >> upper >> const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms + 4, >> domain_fields); >> [...] >> >> >> 3) I'd also like to add some const, adjust indentation, etc, in a few >> places. Instead of listing them here I made a patch, which goes on top >> of yours. This patch also adjusts 2) above. Just shout if you have any >> objections.
>> >> http://cr.openjdk.java.net/~pliden/8234520/webrev.03-review >> >> /Per >> >>> >>> Regards, >>> >>> Nils >>> >>> >>> On 2019-11-20 15:25, Nils Eliasson wrote: >>>> Hi, >>>> >>>> I found a few bugs after the enabling of the clone intrinsic in ZGC. >>>> >>>> 1) The arraycopy clone_basic has the parameters adjusted to work as >>>> a memcopy. For an oop the src is pointing inside the oop to where we >>>> want to start copying. But when we want to do a runtime call to >>>> clone - the parameters are supposed to be the actual src oop and dst >>>> oop, and the size should be the instance size. >>>> >>>> For now I have made a workaround. What should be done later is using >>>> the offset in the arraycopy node to encode where the payload is, so >>>> that the base pointers are always correct. But that would require >>>> changes to the BarrierSet classes of all GCs. So I leave that for >>>> next release. >>>> >>>> 2) The size parameter of the TypeFunc for the runtime call has the >>>> wrong type. It was originally Long but missed the upper Half, it was >>>> fixed to INT (JDK-8233834), but that is wrong and causes the >>>> compiles to be skipped. We didn't notice that since they failed >>>> silently. That is also why we didn't notice problem #1 too. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>>> >>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>>> >>>> Please review! >>>> >>>> Nils >>>> From nils.eliasson at oracle.com Thu Nov 28 11:25:08 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 28 Nov 2019 12:25:08 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <2154a53d-4d36-d26f-9155-5c955796f566@oracle.com> <79bc1866-2293-37a6-9780-af24a01eb699@oracle.com> Message-ID: Thank you Per! // Nils On 2019-11-28 12:10, Per Liden wrote: > Thanks Nils! > > Looks good to me. 
> > /Per > > On 11/27/19 11:50 AM, Nils Eliasson wrote: >> Hi Per, >> >> Here is an updated webrev with your fixes included: >> >> http://cr.openjdk.java.net/~neliasso/8234520/webrev.05/ >> >> I chose to go with CloneInst, and ClonePrimArray. >> >> CopyOf, CopyOfRange - like ArrayCopy, doesn't have specialized >> versions, rather check the type when expanding, and of course - there >> are no versions for instances. But there are differences to what >> guards they need, and when the guards are expanded. I would need to >> dig down into the details to determine if it could be simplified. >> >> Regards, >> >> Nils >> >> On 2019-11-26 14:29, Per Liden wrote: >>> Hi Nils, >>> >>> On 11/21/19 12:53 PM, Nils Eliasson wrote: >>>> I updated this to version 2. >>>> >>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >>>> >>>> I found a problem running >>>> compiler/arguments/TestStressReflectiveCode.java >>>> >>>> Even though the clone was created as an oop clone, the type node >>>> type returns isa_aryptr. This is caused by the src ptr not being >>>> the base pointer. Until I fix that I wanted a more robust test. >>>> >>>> In this webrev I split up the is_clonebasic into is_clone_oop and >>>> is_clone_array. (is_clone_oop_array is already there). Having a >>>> complete set with the three clone types allows for a robust test >>>> and easy verification. (The three variants end up in different >>>> paths with different GCs). >>> >>> A couple of suggestions: >>> >>> 1) Instead of >>> >>> CloneOop >>> CloneArray >>> CloneOopArray >>> >>> I think we should call the three types: >>> >>> CloneInstance >>> CloneTypeArray >>> CloneOopArray >>> >>> Since CloneOop is not actually cloning an oop, but an >>> object/instance. And being explicit about TypeArray seems like a >>> good thing to avoid any confusion about the difference compared to >>> CloneOopArray. I guess PrimArray would be an alternative to TypeArray.
>>> >>> And of course, if we change this then the is_clone/set_clone >>> functions should follow the same naming convention. >>> >>> Btw, what about CopyOf and CopyOfRange? Don't they also come in Oop >>> and Type versions, or are we handling those differently in some way? >>> Looking at the code it looks like they are only used for the oop >>> array case? >> >>> >>> >>> 2) In zBarrierSetC2.cpp, do you mind if we do like this instead? I >>> find that quite a bit easier to read. >>> >>> [...] >>> const Type** domain_fields = TypeTuple::fields(4); >>> domain_fields[TypeFunc::Parms + 0] = TypeInstPtr::NOTNULL; // src >>> domain_fields[TypeFunc::Parms + 1] = TypeInstPtr::NOTNULL; // dst >>> domain_fields[TypeFunc::Parms + 2] = TypeLong::LONG; // size lower >>> domain_fields[TypeFunc::Parms + 3] = Type::HALF; // size upper >>> const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms + 4, >>> domain_fields); >>> [...] >>> >>> >>> 3) I'd also like to add some const, adjust indentation, etc, in a >>> few places. Instead of listing them here I made a patch, which goes >>> on top of yours. This patch also adjusts 2) above. Just shout if you >>> have any objections. >>> >>> http://cr.openjdk.java.net/~pliden/8234520/webrev.03-review >>> >>> /Per >>> >>>> >>>> Regards, >>>> >>>> Nils >>>> >>>> >>>> On 2019-11-20 15:25, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> I found a few bugs after the enabling of the clone intrinsic in ZGC. >>>>> >>>>> 1) The arraycopy clone_basic has the parameters adjusted to work >>>>> as a memcopy. For an oop the src is pointing inside the oop to >>>>> where we want to start copying. But when we want to do a runtime >>>>> call to clone - the parameters are supposed to be the actual src >>>>> oop and dst oop, and the size should be the instance size. >>>>> >>>>> For now I have made a workaround.
What should be done later is >>>>> using the offset in the arraycopy node to encode where the payload >>>>> is, so that the base pointers are always correct. But that would >>>>> require changes to the BarrierSet classes of all GCs. So I leave >>>>> that for next release. >>>>> >>>>> 2) The size parameter of the TypeFunc for the runtime call has the >>>>> wrong type. It was originally Long but missed the upper Half, it >>>>> was fixed to INT (JDK-8233834), but that is wrong and causes the >>>>> compiles to be skipped. We didn't notice that since they failed >>>>> silently. That is also why we didn't notice problem #1 too. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>>>> >>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>>>> >>>>> Please review! >>>>> >>>>> Nils >>>>> From vladimir.x.ivanov at oracle.com Thu Nov 28 12:28:43 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 28 Nov 2019 15:28:43 +0300 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com> References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com> <58fd5565-4342-ea70-511d-bace68308391@oracle.com> <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com> Message-ID: >> Do you see any problems with copying object header? > > It won't be copied. It's just that the runtime call expects the > arguments to be pointers to the objects, and the size of the object. > It's the same function that is used by a call to the native clone impl. > (jvm.cpp:720) Actually the header is copied, but then cleared. http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/oops/accessBackend.inline.hpp#l363 >> - if (src->bottom_type()->isa_aryptr()) { >> + if (ac->is_clone_array()) { >> // Clone primitive array >> >> Is the comment valid? Doesn't it cover object array case as well?
> > Nope - object arrays will be handled as clone_oop_array which uses the > normal object copy which already applies the appropriate load barriers. > > The special case for ZGC is the cloning of instances because we don't > know where to apply load barriers without looking up the type. (Except > for clone on small objects and short arrays that are transformed to a > series of load-stores.) > > http://cr.openjdk.java.net/~neliasso/8234520/webrev.04 I'm looking at webrev.05: src/hotspot/share/gc/shared/c2/barrierSetC2.cpp: - ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, src_base, NULL, dst_base, NULL, countx, true, false); - ac->set_clonebasic(); + ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, payload_src, NULL, payload_dst, NULL, payload_size, true, false); + if (is_array) { + ac->set_clone_prim_array(); + } else { + ac->set_clone_inst(); + } I'm looking at LibraryCallKit::inline_native_clone(): http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/library_call.cpp#l4323 It looks like object arrays are filtered out only if array_copy_requires_gc_barriers() == true: http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/library_call.cpp#l4333 But your change sets set_clone_prim_array() irrespective of whether object arrays are specially treated or not. It looks like a naming problem, but still. PS: and extract_base_offset is still there: -void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* size, bool is_array) const { +int BarrierSetC2::extract_base_offset(bool is_array) { Best regards, Vladimir Ivanov > > Thank you for the feedback! > > // Nils > >> >> >> Best regards, >> Vladimir Ivanov >> >>> >>> Regards, >>> >>> Nils >>> >>> >>> On 2019-11-21 12:53, Nils Eliasson wrote: >>>> I updated this to version 2. 
>>>> >>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >>>> >>>> I found a problem running >>>> compiler/arguments/TestStressReflectiveCode.java >>>> >>>> Even though the clone was created as an oop clone, the type node type >>>> returns isa_aryptr. This is caused by the src ptr not being the base >>>> pointer. Until I fix that I wanted a more robust test. >>>> >>>> In this webrev I split up the is_clonebasic into is_clone_oop and >>>> is_clone_array. (is_clone_oop_array is already there). Having a >>>> complete set with the three clone types allows for a robust test and >>>> easy verification. (The three variants end up in different paths >>>> with different GCs). >>>> >>>> Regards, >>>> >>>> Nils >>>> >>>> >>>> On 2019-11-20 15:25, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> I found a few bugs after the enabling of the clone intrinsic in ZGC. >>>>> >>>>> 1) The arraycopy clone_basic has the parameters adjusted to work as >>>>> a memcopy. For an oop the src is pointing inside the oop to where >>>>> we want to start copying. But when we want to do a runtime call to >>>>> clone - the parameters are supposed to be the actual src oop and >>>>> dst oop, and the size should be the instance size. >>>>> >>>>> For now I have made a workaround. What should be done later is >>>>> using the offset in the arraycopy node to encode where the payload >>>>> is, so that the base pointers are always correct. But that would >>>>> require changes to the BarrierSet classes of all GCs. So I leave >>>>> that for next release. >>>>> >>>>> 2) The size parameter of the TypeFunc for the runtime call has the >>>>> wrong type. It was originally Long but missed the upper Half, it >>>>> was fixed to INT (JDK-8233834), but that is wrong and causes the >>>>> compiles to be skipped. We didn't notice that since they failed >>>>> silently. That is also why we didn't notice problem #1 too.
>>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>>>> >>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>>>> >>>>> Please review! >>>>> >>>>> Nils >>>>> From christoph.goettschkes at microdoc.com Thu Nov 28 12:44:05 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Thu, 28 Nov 2019 13:44:05 +0100 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit In-Reply-To: References: <20191127125735.B9BE111F377@aojmv0009> Message-ID: Hi Vladimir, Vladimir Kozlov wrote on 2019-11-27 20:54:02: > From: Vladimir Kozlov > To: christoph.goettschkes at microdoc.com, hotspot-compiler-dev at openjdk.java.net > Date: 2019-11-27 20:54 > Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > client VMs due to Unrecognized VM option LoopUnrollLimit > > Hi Christoph > > I was about to suggest the IgnoreUnrecognizedVMOptions flag but remembered > the discussion about the 8231954 fix. Yes, I try to avoid "IgnoreUnrecognizedVMOptions" because of our previous discussion. I also think that it doesn't make sense to execute tests in VM configurations for which they were not written. Most of the compiler tests simply have "IgnoreUnrecognizedVMOptions" and probably waste a good amount of time in certain VM configurations. > But I think the test should be run with Graal - it does have OSR > compilation and we need to test it. Sure. I disabled it, because I thought that the flag "LoopUnrollLimit" is required to trigger the faulty behavior, but I don't know much about optimizations in the Graal JIT. > > We can do it by splitting test runs (duplicate @test block with > different run flags) to have 2 tests with different > flags and conditions. See [1]. > > For existing @run block we use `@requires vm.compiler2.enabled` and for > new without LoopUnrollLimit - `vm.graal.enabled`.
I did the following: https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ Could you elaborate on how the two flags are related? I thought that, if Graal is used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are set to true. Is that correct? I don't have a setup with Graal, so I cannot test this. Thanks, Christoph From martin.doerr at sap.com Thu Nov 28 13:10:40 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 28 Nov 2019 13:10:40 +0000 Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library In-Reply-To: <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com> References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com> <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com> <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com> <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com> Message-ID: Hi Lutz, looks good to me, too. Thanks for fixing. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Montag, 25. November 2019 21:48 > To: Vladimir Ivanov ; 'hotspot-compiler- > dev at openjdk.java.net' > Cc: Jean-Philippe BEMPEL > Subject: [CAUTION] Re: [14] RFR(S): 8234583: PrintAssemblyOptions isn't > passed to hsdis library > > Thanks for the review, Vladimir! > Still one to go. > Regards, > Lutz > > On 25.11.19, 21:26, "Vladimir Ivanov" wrote: > > > > All your other comments are valid. There is an open bug to address and > improve the very basic options parsing: > https://bugs.openjdk.java.net/browse/JDK-8223765 This task was split off > from JDK-8213084. > > > > I would like to cover the improvements you suggest when working on > that bug. To make _print_raw work correctly, I suggest to just move > > if (_optionsParsed) return; > > a bit further down. The help text should be printed only once anyway. > > Ok, I'm fine with addressing it later.
And thanks for taking care of it. > > > Here is a new webrev iteration. It reflects what I suggest: > https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/ > > Looks good. > > Best regards, > Vladimir Ivanov > > > On 25.11.19, 19:30, "Vladimir Ivanov" > wrote: > > > > Thanks for the clarifications, Lutz. > > > > So, I assume you have a typo in the patch then: > > > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > + // We need to fill the options buffer for each newly created > > + // decode_env instance. The hsdis_* library looks for options > > + // in that buffer. > > + collect_options(Disassembler::pd_cpu_opts()); > > + collect_options(PrintAssemblyOptions); > > + } > > > > It performs collect_options() calls only if _option_buf is either NULL > > or "\0". > > > > Also, what about the following updates of instance members? > > > > if (strstr(options(), "print-raw")) { > > _print_raw = (strstr(options(), "xml") ? 2 : 1); > > } > > > > if (strstr(options(), "help")) { > > _print_help = true; > > } > > > > BTW should _print_help (along with _helpPrinted) be better turned > into > > static member? > > > > Can we make _option_buf static as well? Or do we want to keep a > > defensive copy to pass into hsdis.so? > > > > As an alternative approach to fix the bug, we could create a golden > copy > > during parsing instead and then just copy it to _option_buf as part of > > decode_env initialization. > > > > Best regards, > > Vladimir Ivanov > > > > On 25.11.2019 19:59, Schmidt, Lutz wrote: > > > Hi Vladimir, > > > > > > I'm happy to elaborate in more detail about the issue and the fix. > > > > > > For each decode_env instance which is constructed, > process_options() is called. It collects the disassembly options from various > sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing > them in the private member "char _option_buf[512]". > > > > > > Further processing derives static flag settings from these options. 
> Being static, these flags need to be set only once, not every time a > decode_env is constructed. > > > > > > But that's just one part of the story. It was not taken into account > that _option_buf is passed to and analyzed by hsdis-.so as well. > That requires _option_buf to be filled every time a decode_env is > constructed. > > > > > > Moving > > > if (_optionsParsed) return; > > > after the collect_options() calls heals this deficiency. > > > > > > I added the guards you question as additional "safety net". After > looking at the code again I must admit the guards are not necessary. > _option_buf can never be NULL and every invocation of process_options() is > directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can > remove the guards if you like. > > > > > > Please let me know if there are any more questions to be answered. > > > > > > Thanks, > > > Lutz > > > > > > > > > On 25.11.19, 17:05, "Vladimir Ivanov" > wrote: > > > > > > Lutz, > > > > > > Can you elaborate, please, how the patch fixes the problem? > > > > > > Why did you decide to add the following guards? > > > > > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > > > > > Best regards, > > > Vladimir Ivanov > > > > > > On 25.11.2019 17:06, Schmidt, Lutz wrote: > > > > Dear all, > > > > > > > > may I please request reviews for this small change, fixing a > regression in the disassembler. Parameters to the hsdis- library > were not passed on. > > > > > > > > The change was verified to fix the issue by the reporter (Jean- > Philippe Bempel, on CC:). jdk/submit tests pending... 
> > > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583 > > > > Webrev: > https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ > > > > > > > > Thank you, > > > > Lutz > > > > > > > > > > > > > > > From adinn at redhat.com Thu Nov 28 13:32:03 2019 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 28 Nov 2019 13:32:03 +0000 Subject: RFR(S) 8234891: AArch64: Fix build failure after JDK-8234387 In-Reply-To: References: <847197ac-5969-900c-840a-61c475b19d84@bell-sw.com> Message-ID: <0960f240-6801-48c2-9664-c7509e90f4a5@redhat.com> On 27/11/2019 13:23, Vladimir Ivanov wrote: > The fix looks good and trivial. Yes, the patch is good. The CmpOp matches are not needed and perhaps never were. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From rwestrel at redhat.com Thu Nov 28 13:45:26 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 28 Nov 2019 14:45:26 +0100 Subject: scalar replacement of arrays affected by minor changes to surrounding code In-Reply-To: <21F5DF77-4DA9-437C-9DEB-CFABFC4C9C19@microsoft.com> References: <87d0fvidxs.fsf@redhat.com> <87a7azid39.fsf@redhat.com> <21F5DF77-4DA9-437C-9DEB-CFABFC4C9C19@microsoft.com> Message-ID: <878so0azd5.fsf@redhat.com> Hi Charlie, > Roland is there anything I could do to help get this type of change into tip and any > appropriate back ports? I have tested some workloads with this patch and I think > I covered the required checks but I could have easily missed something. Let me go over the patch again and prepare it for review. Roland. 
From tobias.hartmann at oracle.com Thu Nov 28 14:04:01 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Nov 2019 15:04:01 +0100 Subject: [XXS] C1 misses to dump a reason when it inlines successfully In-Reply-To: References: Message-ID: <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com> Hi, I'm still not convinced that this message adds any useful information but let's see what other reviewers think. Anyway, with your change, all callers now pass a msg argument and therefore the null handling logic should be removed (replaced by an assert). Also, the default value for the argument can be removed. Best regards, Tobias On 22.11.19 19:09, Liu Xin wrote: > hi, Reviewers, > > Could you review this extremely small change? > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541 > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/ > > When I analyzed PrintInlining, I was confused by the inline message without > any detail. It's not easy for developer to tell if this method is inlined > or not. This patch add a comment "inline by the rules of C1". > > I would like to add an explicit reason, but there's no decisive reason in > GraphBuilder::try_inline_full. It just passes all restrict rules. Any other > suggestion would be appreciated. > > Thanks, > --lx > From tobias.hartmann at oracle.com Thu Nov 28 14:08:32 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Nov 2019 15:08:32 +0100 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: <77c66795-ecd4-1461-4151-66c82b4e554c@oracle.com> References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> <0e506a31-9107-0354-ffce-308d332cbfbd@oracle.com> <77c66795-ecd4-1461-4151-66c82b4e554c@oracle.com> Message-ID: <346c4324-7069-0aca-288e-26c90363b170@oracle.com> Hi Vladimir, thanks for the additional details. Looks all good to me. Best regards, Tobias On 26.11.19 17:52, Vladimir Ivanov wrote: > Thanks, Tobias. 
> >> hard to review the .ad file changes but this looks good to me. > > Yes, the changes are massive, but mostly straightforward. In addition to the code needed for generic > vector operand, the changes are: > > (1) (x86.ad) switching vec[SDXYZ] => vec and legVec[SDXYZ] => legVec; > > (2) (x86.ad) after the switch reduction instructions need additional checks on input vector size > (example [1]); > > (3) (x86_64.ad/x86_32.ad) rename operands with "vec" name to avoid name conflicts with vec operand; > > (4) (x86_64.ad) migrate compressed string instructions from legVecS to legRegD to keep it working > when vector support is explicitly disabled (e.g., -XX:MaxVectorSize=0) > > (1) is needed to avoid explicit moves between concrete (legVec/vec[SDXYZ]) and generic vectors > (vec/legVec). > > (2)-(4) could have been reviewed/integrated separately, but they did look trivial enough to avoid > the effort. > >> Just noticed some code style issues: >> - x86_64.ad:11284, 11346, 11410, 11426: indentation is wrong (already before your fix) >> - whitespace in matcher.cpp:2598/2601 can be removed > > Good catch. > > Best regards, > Vladimir Ivanov > > [1] > -instruct rvadd2F_reduction_reg(regF dst, vecD src2, vecD tmp) %{ > -  predicate(UseAVX > 0); > +instruct rvadd2F_reduction_reg(regF dst, vec src2, vec tmp) %{ > +  predicate(UseAVX > 0 && n->in(2)->bottom_type()->is_vect()->length() == 2); > >> On 19.11.19 15:30, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8234391 >>> >>> Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) >>> to generic ones. >>> >>> (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD >>> instruction merges will be handled separately.) >>> >>> On a high-level it is organized as follows: >>> >>>
(1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec; >>> >>> (2) at runtime, right after matching is over, a special pass is performed which does: >>> >>> * replaces vecOper with vec[SDXYZ] depending on mach node type >>> - vector mach nodes capture bottom_type() of their ideal prototype; >>> >>> * eliminates redundant reg-to-reg vector moves (MoveVec2Leg/MoveLeg2Vec) >>> - matcher needs them, but they are useless for register allocator (moreover, may cause >>> additional spills); >>> >>> >>> (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands. >>> >>> >>> Some details: >>> >>> (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works >>> >>> >>> (2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) >>> which is enabled only on x86 >>> >>> >>> (3) post-selection analysis is implemented as a single pass over the graph and processing >>> individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() >>> (which is an instance of TypeVect) >>> >>> >>> (4) most of the analysis is cross-platform and interfaces with platform-specific code through 3 >>> methods: >>> >>> static bool is_generic_reg2reg_move(MachNode* m); >>> // distinguishes MoveVec2Leg/MoveLeg2Vec nodes >>> >>> static bool is_generic_vector(MachOper* opnd); >>> // distinguishes vec/legVec operands >>> >>> static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg); >>> // constructs fixed-sized vector operand based on ideal reg >>> // vec + Op_Vec[SDXYZ] => vec[SDXYZ] >>> // legVec + Op_Vec[SDXYZ] => legVec[SDXYZ] >>> >>> >>> (5) TEMP operands are handled specially: >>> - TEMP uses max_vector_size() to determine what fixed-sized operand to use >>>
* it is needed to cover reductions which don't produce vectors but scalars >>> - TEMP_DEF inherits fixed-sized operand type from DEF; >>> >>> >>> (6) there is a limited number of special cases for mach nodes in >>> Matcher::get_vector_operand_helper: >>> >>> - RShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its >>> ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not >>> ideal_reg(). >>> >>> - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type. >>> >>> >>> (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see >>> Matcher::get_vector_regmask) >>> >>> >>> (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec >>> operands >>> >>> >>> (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD >>> - it aligns the code between x86_64.ad and x86_32.ad >>> - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., >>> string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0) >>> >>> >>> Contributed-by: Jatin Bhateja >>> Reviewed-by: vlivanov, sviswanathan, ? >>> >>> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL, >>> performance testing (SPEC* + Octane + micros / G1 + ParGC).
>>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >>> >>> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf From tobias.hartmann at oracle.com Thu Nov 28 14:14:13 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Nov 2019 15:14:13 +0100 Subject: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for client VMs due to Unrecognized VM option PartialPeelNewPhiDelta In-Reply-To: <20191127092416.6556811C437@aojmv0009> References: <20191126133017.4DEC711CEFA@aojmv0009> <46e4c188-2d15-bd33-010b-cb203741cbd8@oracle.com> <20191127092416.6556811C437@aojmv0009> Message-ID: <05973876-c74f-dace-4c3c-06cfdf779e48@oracle.com> Hi Christoph, I've sponsored it. Best regards, Tobias On 27.11.19 10:22, christoph.goettschkes at microdoc.com wrote: > Hi Vladimir, > > thanks for the review. I update the webrev, please find the changeset > here: > https://cr.openjdk.java.net/~cgo/8234807/webrev.02/jdk-jdk.changeset > > Could you please sponsor this change for me and commit it into the > repository? > > Thanks, > Christoph > > "hotspot-compiler-dev" > wrote on 2019-11-26 18:05:22: > >> From: Vladimir Kozlov >> To: hotspot-compiler-dev at openjdk.java.net >> Date: 2019-11-26 18:06 >> Subject: Re: RFR: 823480: [TESTBUG] LoopRotateBadNodeBudget fails for >> client VMs due to Unrecognized VM option PartialPeelNewPhiDelta >> Sent by: "hotspot-compiler-dev" > >> >> Good. >> >> Thanks, >> Vladimir >> >> On 11/26/19 5:28 AM, christoph.goettschkes at microdoc.com wrote: >>> Hi, >>> >>> please review the following small changeset which fixes the test >>> test/hotspot/jtreg/compiler/loopopts/LoopRotateBadNodeBudget.java for >>> client VMs. 
>>> I simply added vm.compiler2.enabled to the requires tag, since the >>> original bug only appeared with the server JIT: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234807 >>> Webrev: http://cr.openjdk.java.net/~cgo/8234807/webrev.00/ >>> >>> Bug which introduced the issue: >>> https://bugs.openjdk.java.net/browse/JDK-8231565 >>> >>> Thanks, >>> Christoph >>> >> > From tobias.hartmann at oracle.com Thu Nov 28 14:20:20 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Nov 2019 15:20:20 +0100 Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to missing narrowing conversion Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8234617 http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/ Writing an (integer) value to a boolean, byte, char or short field includes an implicit narrowing conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit field loads by caching and reusing the last written value. The problem is that this value is not necessarily converted to the field type and we end up using an incorrect value. For example, for the field store/load in testShort, C1 emits: [...] 0x00007f0fc582bd6c: mov %dx,0x12(%rsi) 0x00007f0fc582bd70: mov %rdx,%rax [...] The field load has been eliminated and the non-converted integer value (%rdx) is returned. The fix is to emit an explicit conversion to get the correct field value after the write: [...] 0x00007ff07982bd6c: mov %dx,0x12(%rsi) 0x00007ff07982bd70: movswl %dx,%edx 0x00007ff07982bd73: mov %rdx,%rax [...] 
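For illustration only (this is not the test from the webrev): a minimal Java sketch of the values a field load is expected to return after such a store. The class, field and method names are assumptions made up for this example. As far as I can tell, javac emits an explicit i2s/i2b/i2c for the casts below, so plain Java source does not produce the bytecode shape that exposes the problem; the sketch only shows the results a correct load must yield.

```java
// Sketch only: all names here are hypothetical, not taken from the webrev.
class NarrowingStoreSketch {
    short shortField;
    byte byteField;
    char charField;

    // Only the low 16 bits are stored; reading the field sign-extends them.
    int storeAndLoadShort(int value) {
        shortField = (short) value;
        return shortField;
    }

    // Only the low 8 bits are stored; reading the field sign-extends them.
    int storeAndLoadByte(int value) {
        byteField = (byte) value;
        return byteField;
    }

    // Only the low 16 bits are stored; reading the field zero-extends them.
    int storeAndLoadChar(int value) {
        charField = (char) value;
        return charField;
    }

    public static void main(String[] args) {
        NarrowingStoreSketch s = new NarrowingStoreSketch();
        System.out.println(s.storeAndLoadShort(0x12345)); // 9029 (0x2345)
        System.out.println(s.storeAndLoadShort(0x18000)); // -32768
        System.out.println(s.storeAndLoadByte(0x1FF));    // -1
        System.out.println(s.storeAndLoadChar(0x10041));  // 65 ('A')
    }
}
```

The movswl in the fixed code above plays the role of the sign extension in the short case: returning the untouched 32-bit register would give 0x12345 instead of 0x2345.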
Thanks, Tobias [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.putfield From tobias.hartmann at oracle.com Thu Nov 28 14:23:12 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Nov 2019 14:23:12 +0000 (UTC) Subject: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails for client VMs due to Unrecognized VM option EliminateLocks In-Reply-To: <20191127095912.A371B11C5C1@aojmv0009> References: <20191127095912.A371B11C5C1@aojmv0009> Message-ID: <719a17e8-2710-a890-54df-e80714385b4d@oracle.com> Hi Christoph, looks good to me. Best regards, Tobias On 27.11.19 10:57, christoph.goettschkes at microdoc.com wrote: > Hi, > > please review the following small changeset which fixes the test > test/hotspot/jtreg/compiler/escapeAnalysis/TestEliminateLocksOffCrash.java > for client VMs. > I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since > the original bug is for C2 only. Also, the flag EliminateLocks is only > defined in c2_globals.hpp, and neither for C1, nor for JVMCI. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234894 > Webrev: https://cr.openjdk.java.net/~cgo/8234894/webrev.00 > > Bug which introduced the issue: > https://bugs.openjdk.java.net/browse/JDK-8227384 > > Thanks, > Christoph > From lutz.schmidt at sap.com Thu Nov 28 14:31:49 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Thu, 28 Nov 2019 14:31:49 +0000 Subject: [14] RFR(S): 8234583: PrintAssemblyOptions isn't passed to hsdis library In-Reply-To: References: <2290BF7C-F6E5-4FF5-B158-4054991AA292@sap.com> <1fedb3c0-5262-5939-fc7c-163352ec18b5@oracle.com> <03722e38-d5fc-fe27-3426-49e1e92d9d17@oracle.com> <30AB85CF-8A51-494A-AA08-9A4C9C2F1EF1@sap.com> <50f96d01-aeca-d2ce-44df-093be8c77310@oracle.com> <7B40FDAF-7E30-4ABC-9E1B-B18B2C139150@sap.com> Message-ID: <59DDA9B7-D7CF-4BAD-9826-62186F4698C4@sap.com> Hi Martin, thanks for reviewing. I'll go ahead and push.
Regards, Lutz ?On 28.11.19, 14:10, "Doerr, Martin" wrote: Hi Lutz, looks good to me, too. Thanks for fixing. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Montag, 25. November 2019 21:48 > To: Vladimir Ivanov ; 'hotspot-compiler- > dev at openjdk.java.net' > Cc: Jean-Philippe BEMPEL > Subject: [CAUTION] Re: [14] RFR(S): 8234583: PrintAssemblyOptions isn't > passed to hsdis library > > Thanks for the review, Vladimir! > Still one to go. > Regards, > Lutz > > On 25.11.19, 21:26, "Vladimir Ivanov" wrote: > > > > All your other comments are valid. There is an open bug to address and > improve the very basic options parsing: > https://bugs.openjdk.java.net/browse/JDK-8223765 This task was split off > from JDK-8213084. > > > > I would like to cover the improvements you suggest when working on > that bug. To make _print_raw work correctly, I suggest to just move > > if (_optionsParsed) return; > > a bit further down. The help text should be printed only once anyway. > > Ok, I'm fine with addressing it later. And thanks for taking care of it. > > > Here is a new webrev iteration. It reflects what I suggest: > https://cr.openjdk.java.net/~lucy/webrevs/8234583.01/ > > Looks good. > > Best regards, > Vladimir Ivanov > > > On 25.11.19, 19:30, "Vladimir Ivanov" > wrote: > > > > Thanks for the clarifications, Lutz. > > > > So, I assume you have a typo in the patch then: > > > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > + // We need to fill the options buffer for each newly created > > + // decode_env instance. The hsdis_* library looks for options > > + // in that buffer. > > + collect_options(Disassembler::pd_cpu_opts()); > > + collect_options(PrintAssemblyOptions); > > + } > > > > It performs collect_options() calls only if _option_buf is either NULL > > or "\0". > > > > Also, what about the following updates of instance members? 
> > > > if (strstr(options(), "print-raw")) { > > _print_raw = (strstr(options(), "xml") ? 2 : 1); > > } > > > > if (strstr(options(), "help")) { > > _print_help = true; > > } > > > > BTW should _print_help (along with _helpPrinted) be better turned > into > > static member? > > > > Can we make _option_buf static as well? Or do we want to keep a > > defensive copy to pass into hsdis.so? > > > > As an alternative approach to fix the bug, we could create a golden > copy > > during parsing instead and then just copy it to _option_buf as part of > > decode_env initialization. > > > > Best regards, > > Vladimir Ivanov > > > > On 25.11.2019 19:59, Schmidt, Lutz wrote: > > > Hi Vladimir, > > > > > > I'm happy to elaborate in more detail about the issue and the fix. > > > > > > For each decode_env instance which is constructed, > process_options() is called. It collects the disassembly options from various > sources (Disassembler::pd_cpu_opts() and PrintAssemblyOptions), storing > them in the private member "char _option_buf[512]". > > > > > > Further processing derives static flag settings from these options. > Being static, these flags need to be set only once, not every time a > decode_env is constructed. > > > > > > But that's just one part of the story. It was not taken into account > that _option_buf is passed to and analyzed by hsdis-.so as well. > That requires _option_buf to be filled every time a decode_env is > constructed. > > > > > > Moving > > > if (_optionsParsed) return; > > > after the collect_options() calls heals this deficiency. > > > > > > I added the guards you question as additional "safety net". After > looking at the code again I must admit the guards are not necessary. > _option_buf can never be NULL and every invocation of process_options() is > directly preceded by a memset(_option_buf, 0, sizeof(_option_buf)). I can > remove the guards if you like. > > > > > > Please let me know if there are any more questions to be answered. 
> > > > > > Thanks, > > > Lutz > > > > > > > > > On 25.11.19, 17:05, "Vladimir Ivanov" > wrote: > > > > > > Lutz, > > > > > > Can you elaborate, please, how the patch fixes the problem? > > > > > > Why did you decide to add the following guards? > > > > > > + if ((options() == NULL) || (strlen(options()) == 0)) { > > > > > > Best regards, > > > Vladimir Ivanov > > > > > > On 25.11.2019 17:06, Schmidt, Lutz wrote: > > > > Dear all, > > > > > > > > may I please request reviews for this small change, fixing a > regression in the disassembler. Parameters to the hsdis- library > were not passed on. > > > > > > > > The change was verified to fix the issue by the reporter (Jean- > Philippe Bempel, on CC:). jdk/submit tests pending... > > > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234583 > > > > Webrev: > https://cr.openjdk.java.net/~lucy/webrevs/8234583.00/ > > > > > > > > Thank you, > > > > Lutz > > > > > > > > > > > > > > > From martin.doerr at sap.com Thu Nov 28 14:53:48 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 28 Nov 2019 14:53:48 +0000 Subject: [XXS] C1 misses to dump a reason when it inlines successfully In-Reply-To: <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com> References: <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com> Message-ID: Hi, I agree with Tobias. " by the rules of C1" doesn't add any useful information IMHO. But I'd be fine with just adding "inline" to avoid the confusion. > Anyway, with your change, all callers now pass a msg argument and > therefore the null handling logic > should be removed (replaced by an assert). Also, the default value for the > argument can be removed. +1 Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Tobias Hartmann > Sent: Donnerstag, 28. 
November 2019 15:04 > To: Liu Xin ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully > > Hi, > > I'm still not convinced that this message adds any useful information but let's > see what other > reviewers think. > > Anyway, with your change, all callers now pass a msg argument and > therefore the null handling logic > should be removed (replaced by an assert). Also, the default value for the > argument can be removed. > > Best regards, > Tobias > > On 22.11.19 19:09, Liu Xin wrote: > > hi, Reviewers, > > > > Could you review this extremely small change? > > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541 > > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/ > > > > When I analyzed PrintInlining, I was confused by the inline message > without > > any detail. It's not easy for developer to tell if this method is inlined > > or not. This patch add a comment "inline by the rules of C1". > > > > I would like to add an explicit reason, but there's no decisive reason in > > GraphBuilder::try_inline_full. It just passes all restrict rules. Any other > > suggestion would be appreciated. > > > > Thanks, > > --lx > > From nils.eliasson at oracle.com Thu Nov 28 15:15:40 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 28 Nov 2019 16:15:40 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com> <58fd5565-4342-ea70-511d-bace68308391@oracle.com> <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com> Message-ID: On 2019-11-28 13:28, Vladimir Ivanov wrote: > >>> Do you see any problems with copying object header? >> >> It won't be copied. It's just that the runtime call expects the >> arguments to be pointers to the objects, and the size of the object. 
> >> It's the same function that is used by a call to the native clone >> impl. (jvm.cpp:720) > > Actually the header is copied, but then cleared. > > > http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/oops/accessBackend.inline.hpp#l363 > Let me correct myself: For the intrinsic case - we are copying part of the header - the klass, but not the markword. For the runtime call that is used by native clone, and ZGC's clone_inst - the entire Object is cloned. This seems unnecessary to me - but that is how it is done when the intrinsic is disabled too. This could probably be changed, but then we would need to verify it everywhere. > > >>> -  if (src->bottom_type()->isa_aryptr()) { >>> +  if (ac->is_clone_array()) { >>>     // Clone primitive array >>> >>> Is the comment valid? Doesn't it cover object array case as well? >> >> Nope - object arrays will be handled as clone_oop_array which uses >> the normal object copy which already applies the appropriate load >> barriers. >> >> The special case for ZGC is the cloning of instances because we don't >> know where to apply load barriers without looking up the type. >> (Except for clone on small objects and short arrays that are >> transformed to a series of load-stores.) >> >> http://cr.openjdk.java.net/~neliasso/8234520/webrev.04 > > I'm looking at webrev.05: > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp: > > -  ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, src_base, NULL, > dst_base, NULL, countx, true, false); > -  ac->set_clonebasic(); > +  ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, payload_src, > NULL, payload_dst, NULL, payload_size, true, false); > +  if (is_array) { > +    ac->set_clone_prim_array(); > +  } else { > +    ac->set_clone_inst(); > +
} > > I'm looking at LibraryCallKit::inline_native_clone(): > > > http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/library_call.cpp#l4323 > > > It looks like object arrays are filtered out only if > array_copy_requires_gc_barriers() == true: > > > http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/library_call.cpp#l4333 > > > But your change sets set_clone_prim_array() irrespective of whether > object arrays are specially treated or not. > > It looks like a naming problem, but still. Yes, you are right, for GCs without barriers on oop arrays the naming will be off. The right thing to do would probably be to have one state for inst or array, and another state for what type of array. And it gets more complicated: at all the places where is_clonebasic()/is_clone_inst_or_prim_array() is used, different GCs would need different answers. ZGC can treat prim and oop arrays the same until it chooses the right type of acopy, while other GCs that expand barriers early can't do that. After talking to Per I suggest reverting is_clone_inst_or_prim_array() to is_clonebasic() and giving it a proper comment. I will need to revisit this and continue to simplify things and clean it up. http://cr.openjdk.java.net/~neliasso/8234520/webrev.06/ One small diff - src/hotspot/share/opto/arraycopynode.cpp only had a formatting change - so I have reverted it completely. // Nils > > PS: and extract_base_offset is still there: > > -void BarrierSetC2::clone(GraphKit* kit, Node* src, Node* dst, Node* > size, bool is_array) const { > +int BarrierSetC2::extract_base_offset(bool is_array) { Ah, good - I missed that one when merging with Per's stuff. Thanks for your feedback. / Nils > > Best regards, > Vladimir Ivanov > > >> >> Thank you for the feedback! >> >> // Nils >> >>> >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> Regards, >>>> >>>> Nils >>>> >>>> >>>> On 2019-11-21 12:53, Nils Eliasson wrote: >>>>> I updated this to version 2.
>>>>> >>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >>>>> >>>>> I found a problem running >>>>> compiler/arguments/TestStressReflectiveCode.java >>>>> >>>>> Even though the clone was created as an oop clone, the type node >>>>> type returns isa_aryptr. This is caused by the src ptr not being >>>>> the base pointer. Until I fix that I wanted a more robust test. >>>>> >>>>> In this webrev I split up the is_clonebasic into is_clone_oop and >>>>> is_clone_array. (is_clone_oop_array is already there). Having a >>>>> complete set with the three clone types allows for a robust test >>>>> and easy verification. (The three variants end up in different >>>>> paths with different GCs). >>>>> >>>>> Regards, >>>>> >>>>> Nils >>>>> >>>>> >>>>> On 2019-11-20 15:25, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> I found a few bugs after the enabling of the clone intrinsic in ZGC. >>>>>> >>>>>> 1) The arraycopy clone_basic has the parameters adjusted to work >>>>>> as a memcopy. For an oop the src is pointing inside the oop to >>>>>> where we want to start copying. But when we want to do a runtime >>>>>> call to clone - the parameters are supposed to be the actual src >>>>>> oop and dst oop, and the size should be the instance size. >>>>>> >>>>>> For now I have made a workaround. What should be done later is >>>>>> using the offset in the arraycopy node to encode where the >>>>>> payload is, so that the base pointers are always correct. But >>>>>> that would require changes to the BarrierSet classes of all GCs. >>>>>> So I leave that for next release. >>>>>> >>>>>> 2) The size parameter of the TypeFunc for the runtime call has >>>>>> the wrong type. It was originally Long but missed the upper Half, >>>>>> it was fixed to INT (JDK-8233834), but that is wrong and causes >>>>>> the compiles to be skipped. We didn't notice that since they >>>>>> failed silently. That is also why we didn't notice problem #1 either.
>>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>>>>> >>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>>>>> >>>>>> Please review! >>>>>> >>>>>> Nils >>>>>> From christoph.goettschkes at microdoc.com Thu Nov 28 15:17:14 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Thu, 28 Nov 2019 16:17:14 +0100 Subject: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails for client VMs due to Unrecognized VM option EliminateLocks In-Reply-To: <719a17e8-2710-a890-54df-e80714385b4d@oracle.com> References: <20191127095912.A371B11C5C1@aojmv0009> <719a17e8-2710-a890-54df-e80714385b4d@oracle.com> Message-ID: Hi Tobias, thanks for the review. I created the changeset: https://cr.openjdk.java.net/~cgo/8234894/webrev.01/jdk-jdk.changeset Could you please sponsor this change for me and commit it into the repository? Thanks, Christoph Tobias Hartmann wrote on 2019-11-28 15:23:12: > From: Tobias Hartmann > To: christoph.goettschkes at microdoc.com, hotspot-compiler-dev at openjdk.java.net > Date: 2019-11-28 15:23 > Subject: Re: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails > for client VMs due to Unrecognized VM option EliminateLocks > > Hi Christoph, > > looks good to me. > > Best regards, > Tobias > > On 27.11.19 10:57, christoph.goettschkes at microdoc.com wrote: > > Hi, > > > > please review the following small changeset which fixes the test > > test/hotspot/jtreg/compiler/escapeAnalysis/TestEliminateLocksOffCrash.java > > for client VMs. > > I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", since > > the original bug is for C2 only. Also, the flag EliminateLocks is only > > defined in c2_globals.hpp, and neither for C1, nor for JVMCI. 
> > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234894 > > Webrev: https://cr.openjdk.java.net/~cgo/8234894/webrev.00 > > > > Bug which introduced the issue: > > https://bugs.openjdk.java.net/browse/JDK-8227384 > > > > Thanks, > > Christoph > > > From tobias.hartmann at oracle.com Thu Nov 28 15:20:52 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Nov 2019 16:20:52 +0100 Subject: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails for client VMs due to Unrecognized VM option EliminateLocks In-Reply-To: <2wjg9h0fwy-1@userp2020.oracle.com> References: <20191127095912.A371B11C5C1@aojmv0009> <719a17e8-2710-a890-54df-e80714385b4d@oracle.com> <2wjg9h0fwy-1@userp2020.oracle.com> Message-ID: <981a697d-77c9-1c1d-bdd3-3d057ec196b7@oracle.com> Sure, pushed. Best regards, Tobias On 28.11.19 16:17, christoph.goettschkes at microdoc.com wrote: > Hi Tobias, > > thanks for the review. I created the changeset: > > https://cr.openjdk.java.net/~cgo/8234894/webrev.01/jdk-jdk.changeset > > Could you please sponsor this change for me and commit it into the > repository? > > Thanks, > Christoph > > Tobias Hartmann wrote on 2019-11-28 15:23:12: > >> From: Tobias Hartmann >> To: christoph.goettschkes at microdoc.com, > hotspot-compiler-dev at openjdk.java.net >> Date: 2019-11-28 15:23 >> Subject: Re: RFR: 8234894: [TESTBUG] TestEliminateLocksOffCrash fails >> for client VMs due to Unrecognized VM option EliminateLocks >> >> Hi Christoph, >> >> looks good to me. >> >> Best regards, >> Tobias >> >> On 27.11.19 10:57, christoph.goettschkes at microdoc.com wrote: >>> Hi, >>> >>> please review the following small changeset which fixes the test >>> > test/hotspot/jtreg/compiler/escapeAnalysis/TestEliminateLocksOffCrash.java > >>> for client VMs. >>> I added the requirement "vm.compiler2.enabled & !vm.graal.enabled", > since >>> the original bug is for C2 only. 
Also, the flag EliminateLocks is only > >>> defined in c2_globals.hpp, and neither for C1, nor for JVMCI. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234894 >>> Webrev: https://cr.openjdk.java.net/~cgo/8234894/webrev.00 >>> >>> Bug which introduced the issue: >>> https://bugs.openjdk.java.net/browse/JDK-8227384 >>> >>> Thanks, >>> Christoph >>> >> > From vladimir.x.ivanov at oracle.com Thu Nov 28 15:47:40 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 28 Nov 2019 18:47:40 +0300 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com> <58fd5565-4342-ea70-511d-bace68308391@oracle.com> <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com> Message-ID: <63464ec9-c7fb-24b7-bf50-65b605087451@oracle.com> > http://cr.openjdk.java.net/~neliasso/8234520/webrev.06/ Looks good. src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp +#include "opto/castnode.hpp" Unused? Best regards, Vladimir Ivanov >>>>> On 2019-11-21 12:53, Nils Eliasson wrote: >>>>>> I updated this to version 2. >>>>>> >>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >>>>>> >>>>>> I found a problem running >>>>>> compiler/arguments/TestStressReflectiveCode.java >>>>>> >>>>>> Even though the clone was created as an oop clone, the type node >>>>>> type returns isa_aryptr. This is caused by the src ptr not being >>>>>> the base pointer. Until I fix that I wanted a more robust test. >>>>>> >>>>>> In this webrev I split up the is_clonebasic into is_clone_oop and >>>>>> is_clone_array. (is_clone_oop_array is already there). Having a >>>>>> complete set with the three clone types allows for a robust test >>>>>> and easy verification. (The three variants end up in different >>>>>> paths with different GCs).
>>>>>> >>>>>> Regards, >>>>>> >>>>>> Nils >>>>>> >>>>>> >>>>>> On 2019-11-20 15:25, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I found a few bugs after the enabling of the clone intrinsic in ZGC. >>>>>>> >>>>>>> 1) The arraycopy clone_basic has the parameters adjusted to work >>>>>>> as a memcopy. For an oop the src is pointing inside the oop to >>>>>>> where we want to start copying. But when we want to do a runtime >>>>>>> call to clone - the parameters are supposed to be the actual src >>>>>>> oop and dst oop, and the size should be the instance size. >>>>>>> >>>>>>> For now I have made a workaround. What should be done later is >>>>>>> using the offset in the arraycopy node to encode where the >>>>>>> payload is, so that the base pointers are always correct. But >>>>>>> that would require changes to the BarrierSet classes of all GCs. >>>>>>> So I leave that for next release. >>>>>>> >>>>>>> 2) The size parameter of the TypeFunc for the runtime call has >>>>>>> the wrong type. It was originally Long but missed the upper Half, >>>>>>> it was fixed to INT (JDK-8233834), but that is wrong and causes >>>>>>> the compiles to be skipped. We didn't notice that since they >>>>>>> failed silently. That is also why we didn't notice problem #1 too. >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>>>>>> >>>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>>>>>> >>>>>>> Please review! 
>>>>>>> >>>>>>> Nils >>>>>>> From nils.eliasson at oracle.com Thu Nov 28 16:35:39 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 28 Nov 2019 17:35:39 +0100 Subject: RFR(S): ZGC: C2: Oop instance cloning causing skipped compiles In-Reply-To: <63464ec9-c7fb-24b7-bf50-65b605087451@oracle.com> References: <34a6dd1b-b4fd-8ea8-6eeb-4e55e9ae6c6b@oracle.com> <9713e565-0a7b-150f-f6a3-093a5568cbb7@oracle.com> <49878c82-d174-b2c9-219a-0fe561155c0d@oracle.com> <58fd5565-4342-ea70-511d-bace68308391@oracle.com> <3fcbaa19-1068-d30d-96b4-3f8a52089d28@oracle.com> <63464ec9-c7fb-24b7-bf50-65b605087451@oracle.com> Message-ID: <33a7c1bb-8a6b-5dc5-82a6-98c6508343fd@oracle.com> On 2019-11-28 16:47, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~neliasso/8234520/webrev.06/ > > Looks good. > > src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp > > +#include "opto/castnode.hpp" > > Unused? Yes, I'll remove that one. Thanks! // Nils > > Best regards, > Vladimir Ivanov > >>>>>> On 2019-11-21 12:53, Nils Eliasson wrote: >>>>>>> I updated this to version 2. >>>>>>> >>>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.02/ >>>>>>> >>>>>>> I found a problem running >>>>>>> compiler/arguments/TestStressReflectiveCode.java >>>>>>> >>>>>>> Even though the clone was created as an oop clone, the type node >>>>>>> type returns isa_aryptr. This is caused by the src ptr not being >>>>>>> the base pointer. Until I fix that I wanted a more robust test. >>>>>>> >>>>>>> In this webrev I split up the is_clonebasic into is_clone_oop >>>>>>> and is_clone_array. (is_clone_oop_array is already there). >>>>>>> Having a complete set with the three clone types allows for a >>>>>>> robust test and easy verification. (The three variants end up in >>>>>>> different paths with different GCs).
>>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Nils >>>>>>> >>>>>>> >>>>>>> On 2019-11-20 15:25, Nils Eliasson wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I found a few bugs after the enabling of the clone intrinsic in >>>>>>>> ZGC. >>>>>>>> >>>>>>>> 1) The arraycopy clone_basic has the parameters adjusted to >>>>>>>> work as a memcopy. For an oop the src is pointing inside the >>>>>>>> oop to where we want to start copying. But when we want to do a >>>>>>>> runtime call to clone - the parameters are supposed to be the >>>>>>>> actual src oop and dst oop, and the size should be the instance >>>>>>>> size. >>>>>>>> >>>>>>>> For now I have made a workaround. What should be done later is >>>>>>>> using the offset in the arraycopy node to encode where the >>>>>>>> payload is, so that the base pointers are always correct. But >>>>>>>> that would require changes to the BarrierSet classes of all >>>>>>>> GCs. So I leave that for next release. >>>>>>>> >>>>>>>> 2) The size parameter of the TypeFunc for the runtime call has >>>>>>>> the wrong type. It was originally Long but missed the upper >>>>>>>> Half, it was fixed to INT (JDK-8233834), but that is wrong and >>>>>>>> causes the compiles to be skipped. We didn't notice that since >>>>>>>> they failed silently. That is also why we didn't notice problem >>>>>>>> #1 too. >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234520 >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~neliasso/8234520/webrev.01/ >>>>>>>> >>>>>>>> Please review! 
>>>>>>>> >>>>>>>> Nils >>>>>>>> From john.r.rose at oracle.com Thu Nov 28 20:23:26 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 28 Nov 2019 12:23:26 -0800 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: On Nov 27, 2019, at 11:34 PM, Thomas Stüfe wrote: > > In the end, I wonder whether we should have two kinds of APIs, or a > parameter, distinguishing between "next power of 2" and "next power of 2 > unless input value is already power of 2". Naming is important for clarity. "Round up" means if it's already "rounded" (whatever that means) the input is returned unchanged. The other notion is a true "next up", because it always increases, barring overflow. The possibility of overflow makes the "next up" function more bug-prone than the "round up" function. The usual trick for deriving that second function is to add one to the argument to the first function, ensuring that the result will always increase. If (for clarity) we implement a "next power of two" function, rather than ask coders to use the +1 trick, the second function should be implemented in terms of the first function using the +1 trick, maybe with an assert added against overflow. My $0.02. - John From claes.redestad at oracle.com Thu Nov 28 21:02:56 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 28 Nov 2019 22:02:56 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com> I'm working on a new version, but I'm also out sick, so don't expect anything soon. I just want to point out that the "round up to power of 2" implementations I've seen seem prone to the same kind of overflows as a next up would, just not for exactly the same set of inputs.
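The two notions discussed in this thread ("round up" versus "next up", the +1 trick, and the overflow asserts) can be sketched roughly as follows. This is only an illustration under stated assumptions: 64-bit unsigned inputs, and hypothetical names (round_up_power_of_2, next_power_of_2, kMaxPowerOf2) that are not taken from the patch under review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only; not the API of the patch.

// Largest power of two representable in uint64_t.
constexpr uint64_t kMaxPowerOf2 = uint64_t(1) << 63;

// "Round up": smallest power of two >= v. A power of two is returned
// unchanged. Overflows only for v > 2^63, which the assert rejects.
inline uint64_t round_up_power_of_2(uint64_t v) {
  assert(v != 0 && "zero cannot be rounded up to a power of two");
  assert(v <= kMaxPowerOf2 && "result would overflow");
  uint64_t result = 1;
  while (result < v) {
    result <<= 1;  // cannot overflow because v <= 2^63
  }
  return result;
}

// "Next up": smallest power of two > v, derived with the +1 trick.
// Overflows for v >= 2^63, a slightly larger set of inputs than above.
inline uint64_t next_power_of_2(uint64_t v) {
  assert(v < kMaxPowerOf2 && "result would overflow");
  return round_up_power_of_2(v + 1);
}
```

The two asserts differ by exactly one input (2^63 itself), which is one way to read the remark that both variants can overflow, just not for the same set of inputs. A real implementation would likely use a count-leading-zeros intrinsic instead of the loop.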
/Claes On 2019-11-28 21:23, John Rose wrote: > On Nov 27, 2019, at 11:34 PM, Thomas Stüfe > wrote: >> >> In the end, I wonder whether we should have two kinds of APIs, or a >> parameter, distinguishing between "next power of 2" and "next power of 2 >> unless input value is already power of 2". > > Naming is important for clarity. "Round up" means if it's already "rounded" > (whatever that means) the input is returned unchanged. > > The other notion is a true "next up", because it always increases, > barring overflow. The possibility of overflow makes the "next up" function > more bug-prone than the "round up" function. > > The usual trick for deriving that second function is to add one to the > argument > to the first function, ensuring that the result will always increase. > > If (for clarity) we implement a "next power of two" function, rather > than ask > coders to use the +1 trick, the second function should be implemented in > terms of > the first function using the +1 trick, maybe with an assert added > against overflow. > > My $0.02. > > - John > From Pengfei.Li at arm.com Fri Nov 29 03:41:56 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 29 Nov 2019 03:41:56 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> Message-ID: Hi Andrew, I just caught up with your discussion with Nick. > I guess. It'd be nicer to fix CDS on AArch64 so that it doesn't cause > performance regressions. The 4G alignment search may still fail after the fix. Regarding my second webrev, do you agree with checking the base and shift values in a function in aarch64.ad? See my last email [1] for detailed explanations.
[1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2019-November/008278.html -- Thanks, Pengfei From nick.gasson at arm.com Fri Nov 29 06:40:23 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 29 Nov 2019 14:40:23 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> Message-ID: On 29/11/2019 11:41, Pengfei Li (Arm Technology China) wrote: > > The 4G alignment search may still fail after the fix. Regarding my second webrev, do you agree with checking the base and shift values in a function in aarch64.ad? See my last email [1] for detailed explanations. > How about we exit with a fatal error if we can't find a suitably aligned region? Then we can remove the code in decode_klass_non_null that uses R27 and this patch is much simpler. That code path is poorly tested at the moment so it seems risky to leave it in. With a hard error at least users will report it to us so we can fix it. Thanks, Nick From navy.xliu at gmail.com Fri Nov 29 07:16:57 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Thu, 28 Nov 2019 23:16:57 -0800 Subject: [XXS] C1 misses to dump a reason when it inlines successfully In-Reply-To: References: <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com> Message-ID: Hi, Tobias & Martin, The original message didn't even say whether the inline succeeded or not. When I saw those, I had to dig into the source code. I spent even more time because I didn't realize that those messages were from C1. That's my standpoint and motivation as a new HotSpot developer. What the C2 inliner emits, "inline (hot)", is not very informative either, but at least developers know that this callee is inlined.
I agree with you about the default parameter. Failed attempts should always have a reason. Developers can inspect them and make adjustment. Here is a new revision. Could you take a look? https://cr.openjdk.java.net/~xliu/8234541/01/webrev/ sample output is like this. @ 1 java.lang.String::isLatin1 (19 bytes) inline @ 12 java.lang.StringLatin1::charAt (28 bytes) inline @ 15 java/lang/StringIndexOutOfBoundsException:: (not loaded) not inlineable @ 21 java/lang/StringUTF16::charAt (not loaded) not inlineable @ 15 java/lang/StringIndexOutOfBoundsException:: (not loaded) not inlineable @ 3 java.util.stream.ReduceOps::makeInt (18 bytes) inline @ 1 java.util.Objects::requireNonNull (14 bytes) inline @ 8 java.lang.NullPointerException:: (5 bytes) don't inline Throwable constructors @ 14 java.util.stream.ReduceOps$6:: (16 bytes) inline @ 12 java.util.stream.ReduceOps$ReduceOp:: (10 bytes) inline @ 1 java.lang.Object:: (1 bytes) inline @ 6 java.util.stream.AbstractPipeline::evaluate (94 bytes) callee is too large thanks, --lx On Thu, Nov 28, 2019 at 6:53 AM Doerr, Martin wrote: > Hi, > > I agree with Tobias. " by the rules of C1" doesn't add any useful > information IMHO. > But I'd be fine with just adding "inline" to avoid the confusion. > > > Anyway, with your change, all callers now pass a msg argument and > > therefore the null handling logic > > should be removed (replaced by an assert). Also, the default value for > the > > argument can be removed. > +1 > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Tobias Hartmann > > Sent: Donnerstag, 28. November 2019 15:04 > > To: Liu Xin ; hotspot-compiler-dev at openjdk.java.net > > Subject: Re: [XXS] C1 misses to dump a reason when it inlines > successfully > > > > Hi, > > > > I'm still not convinced that this message adds any useful information > but let's > > see what other > > reviewers think. 
> > > > Anyway, with your change, all callers now pass a msg argument and > > therefore the null handling logic > > should be removed (replaced by an assert). Also, the default value for the > > argument can be removed. > > > > Best regards, > > Tobias > > > > On 22.11.19 19:09, Liu Xin wrote: > > > hi, Reviewers, > > > > > > Could you review this extremely small change? > > > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541 > > > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/ > > > > > > When I analyzed PrintInlining, I was confused by the inline message without > > > any detail. It's not easy for a developer to tell if this method is inlined > > > or not. This patch adds a comment "inline by the rules of C1". > > > > > > I would like to add an explicit reason, but there's no decisive reason in > > > GraphBuilder::try_inline_full. It just passes all restriction rules. Any other > > > suggestions would be appreciated. > > > > > > Thanks, > > > --lx > > > > From martin.doerr at sap.com Fri Nov 29 08:51:58 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 29 Nov 2019 08:51:58 +0000 Subject: [XXS] C1 misses to dump a reason when it inlines successfully In-Reply-To: References: <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com> Message-ID: Hi, looks good to me. Thanks for cleaning it up. Best regards, Martin From: Liu Xin Sent: Friday, 29 November 2019 08:17 To: Doerr, Martin Cc: Tobias Hartmann ; hotspot-compiler-dev at openjdk.java.net Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully Hi, Tobias & Martin, The original message didn't even say whether the inline succeeded or not. When I saw those, I had to dig into the source code. I spent even more time because I didn't realize that those messages were from C1. That's my standpoint and motivation as a new HotSpot developer. What the C2 inliner emits, "inline (hot)", is not very informative either, but at least developers know that this callee is inlined.
I gave up on "by the rules of C1". I think "inline" is fine. Developers can also tell the difference from C2's output. I agree with you about the default parameter. Failed attempts should always have a reason. Developers can inspect them and make adjustments. Here is a new revision. Could you take a look? https://cr.openjdk.java.net/~xliu/8234541/01/webrev/ Sample output looks like this:
@ 1   java.lang.String::isLatin1 (19 bytes)   inline
@ 12   java.lang.StringLatin1::charAt (28 bytes)   inline
  @ 15  java/lang/StringIndexOutOfBoundsException:: (not loaded)   not inlineable
@ 21  java/lang/StringUTF16::charAt (not loaded)   not inlineable
@ 15  java/lang/StringIndexOutOfBoundsException:: (not loaded)   not inlineable

@ 3   java.util.stream.ReduceOps::makeInt (18 bytes)   inline
  @ 1   java.util.Objects::requireNonNull (14 bytes)   inline
    @ 8   java.lang.NullPointerException:: (5 bytes)   don't inline Throwable constructors
  @ 14   java.util.stream.ReduceOps$6:: (16 bytes)   inline
    @ 12   java.util.stream.ReduceOps$ReduceOp:: (10 bytes)   inline
      @ 1   java.lang.Object:: (1 bytes)   inline
@ 6   java.util.stream.AbstractPipeline::evaluate (94 bytes)   callee is too large
thanks, --lx On Thu, Nov 28, 2019 at 6:53 AM Doerr, Martin wrote: Hi, I agree with Tobias. " by the rules of C1" doesn't add any useful information IMHO. But I'd be fine with just adding "inline" to avoid the confusion. > Anyway, with your change, all callers now pass a msg argument and > therefore the null handling logic > should be removed (replaced by an assert). Also, the default value for the > argument can be removed. +1 Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Tobias Hartmann > Sent: Thursday, 28
November 2019 15:04 > To: Liu Xin; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully > > Hi, > > I'm still not convinced that this message adds any useful information but let's > see what other > reviewers think. > > Anyway, with your change, all callers now pass a msg argument and > therefore the null handling logic > should be removed (replaced by an assert). Also, the default value for the > argument can be removed. > > Best regards, > Tobias > > On 22.11.19 19:09, Liu Xin wrote: > > hi, Reviewers, > > > > Could you review this extremely small change? > > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541 > > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/ > > > > When I analyzed PrintInlining, I was confused by the inline message without > > any detail. It's not easy for a developer to tell if this method is inlined > > or not. This patch adds a comment "inline by the rules of C1". > > > > I would like to add an explicit reason, but there's no decisive reason in > > GraphBuilder::try_inline_full. It just passes all restriction rules. Any other > > suggestions would be appreciated. > > > > Thanks, > > --lx > > From tobias.hartmann at oracle.com Fri Nov 29 08:56:14 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 29 Nov 2019 09:56:14 +0100 Subject: [XXS] C1 misses to dump a reason when it inlines successfully In-Reply-To: References: <3759c24e-cdd7-a39c-40cd-26a403a07477@oracle.com> Message-ID: Hi, looks good to me too. Best regards, Tobias On 29.11.19 09:51, Doerr, Martin wrote: > Hi, > > looks good to me. Thanks for cleaning it up. > > Best regards, > > Martin > > *From:* Liu Xin > *Sent:* Friday, 29 November 2019 08:17 > *To:* Doerr, Martin > *Cc:* Tobias Hartmann ; hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: [XXS] C1 misses to dump a reason when it inlines successfully > > Hi, Tobias & Martin, > > The original message didn't even say whether the inline succeeded or not. When I saw those, I had to dig into the > source code. I spent even more time because I didn't realize that those messages were from C1. That's > my standpoint and motivation as a new HotSpot developer. > > What the C2 inliner emits, "inline (hot)", is not very informative either, but at least developers know > that this callee is inlined. I gave up on "by the rules of C1". I think "inline" is fine. Developers > can also tell the difference from C2's output. > > I agree with you about the default parameter. Failed attempts should always have a reason. > Developers can inspect them and make adjustments. > > Here is a new revision. Could you take a look? > > https://cr.openjdk.java.net/~xliu/8234541/01/webrev/ > > Sample output looks like this:
> @ 1   java.lang.String::isLatin1 (19 bytes)   inline
> @ 12   java.lang.StringLatin1::charAt (28 bytes)   inline
>   @ 15  java/lang/StringIndexOutOfBoundsException:: (not loaded)   not inlineable
> @ 21  java/lang/StringUTF16::charAt (not loaded)   not inlineable
> @ 15  java/lang/StringIndexOutOfBoundsException:: (not loaded)   not inlineable
>
> @ 3   java.util.stream.ReduceOps::makeInt (18 bytes)   inline
>   @ 1   java.util.Objects::requireNonNull (14 bytes)   inline
>     @ 8   java.lang.NullPointerException:: (5 bytes)   don't inline Throwable constructors
>   @ 14   java.util.stream.ReduceOps$6:: (16 bytes)   inline
>     @ 12   java.util.stream.ReduceOps$ReduceOp:: (10 bytes)   inline
>       @ 1   java.lang.Object:: (1 bytes)   inline
> @ 6   java.util.stream.AbstractPipeline::evaluate (94 bytes)   callee is too large
> > thanks, > > --lx > > On Thu, Nov 28, 2019 at 6:53 AM Doerr, Martin wrote: > > Hi, > > I agree with Tobias. " by the rules of C1" doesn't add any useful information IMHO. > But I'd be fine with just adding "inline" to avoid the confusion. > > > Anyway, with your change, all callers now pass a msg argument and > > therefore the null handling logic > > should be removed (replaced by an assert). Also, the default value for the > > argument can be removed. > +1 > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-compiler-dev > bounces at openjdk.java.net > On Behalf Of Tobias Hartmann > > Sent: Thursday, 28 November 2019 15:04 > > To: Liu Xin; hotspot-compiler-dev at openjdk.java.net > > Subject: Re: [XXS] C1 misses to dump a reason when it inlines successfully > > > > Hi, > > > > I'm still not convinced that this message adds any useful information but let's > > see what other > > reviewers think. > > > > Anyway, with your change, all callers now pass a msg argument and > > therefore the null handling logic > > should be removed (replaced by an assert). Also, the default value for the > > argument can be removed. > > > > Best regards, > > Tobias > > > > On 22.11.19 19:09, Liu Xin wrote: > > > hi, Reviewers, > > > > > > Could you review this extremely small change? > > > Bugs: https://bugs.openjdk.java.net/browse/JDK-8234541 > > > Webrev: https://cr.openjdk.java.net/~xliu/8234541/00/webrev/ > > > > > > When I analyzed PrintInlining, I was confused by the inline message without > > > any detail. It's not easy for a developer to tell if this method is inlined > > > or not. This patch adds a comment "inline by the rules of C1". > > > > > > I would like to add an explicit reason, but there's no decisive reason in > > > GraphBuilder::try_inline_full. It just passes all restriction rules. Any other > > > suggestions would be appreciated.
> > > > > > Thanks, > > > --lx > > > > From aph at redhat.com Fri Nov 29 10:07:38 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 10:07:38 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> Message-ID: <8a0ae655-8544-a4fc-7551-d7634ebdaaa8@redhat.com> On 11/29/19 3:41 AM, Pengfei Li (Arm Technology China) wrote: > The 4G alignment search may still fail after the fix. It may, but that is very unlikely. > Regarding my second webrev, do you agree with checking the base and shift values in a function in aarch64.ad? See my last email [1] for detailed explanations. > > [1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2019-November/008278.html Not really, no. A method should be called exactly once from the code that does the memory allocation, and then set a flag to be read thereafter. It is not ideal to do it from the MacroAssembler constructor, because Assembler instances are created with very high frequency. I don't understand why you simply can't do what I suggested. You say > But we have to do it in Metaspace::set_narrow_klass_base_and_shift() > where the base and shift are finally determined and introduce new > code block of "#ifdef AARCH64 #endif" in HotSpot shared code. So do that, or perhaps introduce an overridable function in AbstractAssembler which does nothing on other ports. But don't keep executing the same logic again and again. Once base and shift are set they never change. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Fri Nov 29 10:10:07 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 10:10:07 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> Message-ID: <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> On 11/29/19 6:40 AM, Nick Gasson wrote: > On 29/11/2019 11:41, Pengfei Li (Arm Technology China) wrote: >> The 4G alignment search may still fail after the fix. Regarding to my second webrev, do you agree that checking the base and shift values in a function in aarch64.ad? See my last email [1] for detail explanations. >> > How about we exit with a fatal error if we can't find a suitably aligned > region? Then we can remove the code in decode_klass_non_null that uses > R27 and this patch is much simpler. That code path is poorly tested at > the moment so it seems risky to leave it in. With a hard error at least > users will report it to us so we can fix it. That is starting to sound very attractive. With a 64-bit address space I'm finding it very hard to imagine a scenario in which we don't find a suitable address. I think AOT-compiled code would still be OK, because it generates different code, but we'd have to do some testing. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From boris.ulasevich at bell-sw.com Fri Nov 29 10:15:48 2019 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Fri, 29 Nov 2019 13:15:48 +0300 Subject: RFR(S) 8234893: ARM32: build failure after JDK-8234387 In-Reply-To: References: Message-ID: <813c0785-01b7-0be4-a084-eb4306040fc3@bell-sw.com> Thank you! 
On 28.11.2019 11:55, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 28.11.2019 11:42, Boris Ulasevich wrote: >> Hi, >> >> Please review the fix in arm.ad to address the ARM32 build issue >> "Ideal node missing". The fix is just trivial adding missing >> declarations: R8RegP, R9RegP, R12RegP, SPRegP. jdk/hotspot jtreg tests >> are Ok. >> >> http://bugs.openjdk.java.net/browse/JDK-8234893 >> http://cr.openjdk.java.net/~bulasevich/8234893/webrev.00 >> >> thanks, >> Boris From vladimir.x.ivanov at oracle.com Fri Nov 29 14:19:52 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 29 Nov 2019 17:19:52 +0300 Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to missing narrowing conversion In-Reply-To: References: Message-ID: > http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/ Looks good. Best regards, Vladimir Ivanov > > Writing an (integer) value to a boolean, byte, char or short field includes an implicit narrowing > conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit field loads by caching > and reusing the last written value. The problem is that this value is not necessarily converted to > the field type and we end up using an incorrect value. > > For example, for the field store/load in testShort, C1 emits: > [...] > 0x00007f0fc582bd6c: mov %dx,0x12(%rsi) > 0x00007f0fc582bd70: mov %rdx,%rax > [...] > > The field load has been eliminated and the non-converted integer value (%rdx) is returned. > > The fix is to emit an explicit conversion to get the correct field value after the write: > [...] > 0x00007ff07982bd6c: mov %dx,0x12(%rsi) > 0x00007ff07982bd70: movswl %dx,%edx > 0x00007ff07982bd73: mov %rdx,%rax > [...] 
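A minimal Java sketch of the semantics the fix above has to preserve: a load that follows a store to a short field must observe the narrowed value, not the original int. The class and method names are hypothetical. Note that javac emits an explicit i2s for the cast below, so this source only illustrates the expected result; it is an assumption that reproducing the actual C1 bug needs bytecode that stores an int into the field without that conversion.

```java
// Hypothetical illustration class; not part of the webrev.
final class NarrowingStoreSketch {
    short field;

    // Stores an int into a short field and immediately reads it back.
    // The value read must be the truncated, sign-extended 16-bit value,
    // which is what the movswl in the fixed code sequence produces.
    int storeAndLoad(int value) {
        field = (short) value;
        return field;
    }

    public static void main(String[] args) {
        NarrowingStoreSketch s = new NarrowingStoreSketch();
        System.out.println(s.storeAndLoad(0x12345)); // low 16 bits: 0x2345
        System.out.println(s.storeAndLoad(0x18000)); // low 16 bits: 0x8000, sign-extended
    }
}
```

If field-load elimination reused the unconverted int, storeAndLoad(0x18000) would return 98304 instead of -32768.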
> > Thanks, > Tobias > > [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.putfield > From vladimir.x.ivanov at oracle.com Fri Nov 29 15:28:08 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 29 Nov 2019 18:28:08 +0300 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type In-Reply-To: <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> Message-ID: Thanks for the review, Vladimir. > May be we should have permanent guarantee() in > TypeAryPtr::max_array_length() for all types which we don't expect to > see and not temporary assert(). Makes sense. What do you think about the following? http://cr.openjdk.java.net/~vlivanov/8231430/webrev.01/ Best regards, Vladimir Ivanov > On 11/27/19 5:54 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8231430/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8231430 >> >> There's a memory stomp happening in max_array_length() for T_ILLEGAL >> type. T_ILLEGAL type arises as an element basic type for a merge of 2 >> primitive arrays (bottom[]). max_array_length() does some input >> normalization (T_ILLEGAL => T_BYTE), but first it acquires a reference >> to a cache slot which is out-of-bounds (T_ILLEGAL = 99 vs >> T_CONFLICT = 19). >> >> I was able to reproduce the problem as a corruption of one of the OOPs >> in Universe::_mirrors array which happened to be put close enough to >> max_array_length_cache in memory. >> >> I propose to completely remove the cache. >> arrayOopDesc::max_array_length() doesn't look too expensive and the >> method is not used on a hot path anywhere. >> >> Also, I put an assert for T_VOID, T_CONFLICT, T_NARROWKLASS cases, but >> left the logic there (=> T_BYTE) to get more testing before removing >> them. >> >> Testing: hs-precheckin-comp, tier1-5. 
>> >> Best regards, >> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Nov 29 15:42:14 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 29 Nov 2019 18:42:14 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses Message-ID: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8226411 There were a number of fixes in C2 support for unsafe accesses recently which led to additional memory barriers around them. It improved stability, but in some cases it was redundant. One of important use cases which regressed is off-heap accesses [1]. The barriers around them are redundant because they are serialized on raw memory and don't intersect with any on-heap accesses. Proposed fix skips memory barriers around unsafe accesses which are provably off-heap (base == NULL). It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. Testing: tier1-6. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8224182 From adinn at redhat.com Fri Nov 29 15:48:45 2019 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 29 Nov 2019 15:48:45 +0000 Subject: RFR: (T) AArch64: compiler/c2/aarch64/TestVolatilesG1.java fails with "Missing expected output membar_volatile..." Message-ID: Could I please have a review of the following fix to file TestVolatiles.java which is currently broken after the commit of the fix for JDK-8225776 (Optimize branch frequency of G1's write post-barrier in C2). JIRA: https://bugs.openjdk.java.net/browse/JDK-8225776 webrev: http://cr.openjdk.java.net/~adinn/8232828/webrev.00 The test parses compiler AArch64 PrintAssembly output for a variety of volatile read, write and CAS operations to check that membars are added or omitted appropriately when using, respectively, acquire/release accesses vs unordered accesses supplemented with barriers. 
n.b. the test only runs on a debug build JVM. The current parse check expects to see G1 barrier operations between the access and the trailing barrier + return. JDK-8225776 relocates the barrier operations out of line after the trailing barrier + return. The test has been updated so that the expected pattern of instructions reflects this new order. Testing: Before the fix the test fails. After the fix it succeeds. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.x.ivanov at oracle.com Fri Nov 29 15:55:43 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 29 Nov 2019 18:55:43 +0300 Subject: [14] RFR (S): 8234923: Missed call_site_target nmethod dependency for non-fully initialized ConstantCallSite instance Message-ID: <7d4c2ab1-f8ec-8ccc-a442-8401a048b353@oracle.com> http://cr.openjdk.java.net/~vlivanov/8234923/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8234923 The fix for 8234401 is incomplete: though it does notify the JVM about call site target update (by calling setTargetNormal [2]), on JVM side JITs skip nmethod dependencies for ConstantCallSites irrespective of whether they are fully initialized or not. So, affected nmethods aren't invalidated. The fix is to make JITs aware about initialization status of ConstantCallSite instance by inspecting its isFrozen field and register proper nmethod dependency if a CallSite is being initialized. 
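A small Java sketch of the situation described above, in which a ConstantCallSite is observable before it is fully initialized. It uses the public hook-taking constructor ConstantCallSite(MethodType, MethodHandle); the class and method names are hypothetical, and the sketch only shows the Java-level behaviour, not the JIT dependency logic that the patch changes.

```java
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; names are illustrative only.
final class HookedCallSiteSketch {
    // Set by the hook if the site refused to expose a target during construction.
    static boolean targetUnavailableDuringInit;

    // The hook-taking constructor is protected, so a subclass is required.
    static final class HookedSite extends ConstantCallSite {
        HookedSite(MethodType type, MethodHandle hook) throws Throwable {
            super(type, hook);
        }
    }

    // Hook invoked by the constructor: the call site instance leaks here
    // before its permanent target has been installed.
    static MethodHandle createTarget(ConstantCallSite site) {
        try {
            site.getTarget();
        } catch (IllegalStateException expected) {
            targetUnavailableDuringInit = true;
        }
        return MethodHandles.constant(int.class, 42);
    }

    // Builds the site through the hook and invokes its final target.
    static int run() {
        try {
            MethodType type = MethodType.methodType(int.class);
            MethodHandle hook = MethodHandles.lookup().findStatic(
                    HookedCallSiteSketch.class, "createTarget",
                    MethodType.methodType(MethodHandle.class, ConstantCallSite.class));
            ConstantCallSite site = new HookedSite(type, hook);
            return (int) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println(targetUnavailableDuringInit);
    }
}
```

Code compiled against such a site while the hook is still running cannot treat the target as a constant, which is why a dependency has to be recorded in that window.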
Testing: tier1-6 Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036019.html [2] https://hg.openjdk.java.net/jdk/jdk/rev/a6e25566cb56#l1.26 From aph at redhat.com Fri Nov 29 15:56:57 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 15:56:57 +0000 Subject: RFR: (T) AArch64: compiler/c2/aarch64/TestVolatilesG1.java fails with "Missing expected output membar_volatile..." In-Reply-To: References: Message-ID: On 11/29/19 3:48 PM, Andrew Dinn wrote: > JIRA: https://bugs.openjdk.java.net/browse/JDK-8225776 > webrev: http://cr.openjdk.java.net/~adinn/8232828/webrev.00 > > The test parses compiler AArch64 PrintAssembly output for a variety of > volatile read, write and CAS operations to check that membars are added > or omitted appropriately when using, respectively, acquire/release > accesses vs unordered accesses supplemented with barriers. n.b. the test > only runs on a debug build JVM. Cool, thanks. The endless game of whack-a-mole. :-) Trouble is, there are many possible correct sequences. Still, we can't check for them all without making this test AI-complete! -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Fri Nov 29 15:59:26 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 29 Nov 2019 16:59:26 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com> Hi Vladimir, This is cool. 
Does it affect this: https://bugs.openjdk.java.net/browse/JDK-8220714 Thanks, Roman > http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8226411 > > There were a number of fixes in C2 support for unsafe accesses recently > which led to additional memory barriers around them. It improved > stability, but in some cases it was redundant. One of important use > cases which regressed is off-heap accesses [1]. The barriers around them > are redundant because they are serialized on raw memory and don't > intersect with any on-heap accesses. > > Proposed fix skips memory barriers around unsafe accesses which are > provably off-heap (base == NULL). > > It (almost completely) recovers performance on the microbenchmark > provided in JDK-8224182 [1]. > > Testing: tier1-6. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8224182 > From adinn at redhat.com Fri Nov 29 16:12:14 2019 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 29 Nov 2019 16:12:14 +0000 Subject: RFR: (T) AArch64: compiler/c2/aarch64/TestVolatilesG1.java fails with "Missing expected output membar_volatile..." In-Reply-To: References: Message-ID: On 29/11/2019 15:56, Andrew Haley wrote: > On 11/29/19 3:48 PM, Andrew Dinn wrote: >> JIRA: https://bugs.openjdk.java.net/browse/JDK-8225776 >> webrev: http://cr.openjdk.java.net/~adinn/8232828/webrev.00 >> >> The test parses compiler AArch64 PrintAssembly output for a variety of >> volatile read, write and CAS operations to check that membars are added >> or omitted appropriately when using, respectively, acquire/release >> accesses vs unordered accesses supplemented with barriers. n.b. the test >> only runs on a debug build JVM. > > Cool, thanks. The endless game of whack-a-mole. :-) > > Trouble is, there are many possible correct sequences. Still, we can't > check for them all without making this test AI-complete! There are in general many correct sequences for all programs. 
In this case we have a very straightforward program so there is no room for variation in the generated code -- modulo code inline/out of line scheduling fixes like 8232828, that is. So, although this test in no way guarantees that every volatile/CAS will have the correct sequence of generated/elided barriers it will reliably check that the barrier elision/generation has not been messed up wholesale (the more logically minded readers will already have marshalled their quantifiers and negation operators accordingly). Anyway, I'll take your 'cool, thanks' as a confirmation that this is indeed trivial and also as a license to push. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Fri Nov 29 16:14:08 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Nov 2019 16:14:08 +0000 Subject: RFR: (T) AArch64: compiler/c2/aarch64/TestVolatilesG1.java fails with "Missing expected output membar_volatile..." In-Reply-To: References: Message-ID: On 11/29/19 4:12 PM, Andrew Dinn wrote: > Anyway, I'll take your 'cool, thanks' as a confirmation that this is > indeed trivial and also as a license to push. Yes, indeed. Sorry, I hadn't intended to be vague. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.x.ivanov at oracle.com Fri Nov 29 17:18:30 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 29 Nov 2019 20:18:30 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com> Message-ID: > Does it affect this: > https://bugs.openjdk.java.net/browse/JDK-8220714 Good point, Roman. Proposed patch breaks the fix for JDK-8220714. I'll investigate what happens there and come back with a revised fix. Best regards, Vladimir Ivanov >> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8226411 >> >> There were a number of fixes in C2 support for unsafe accesses recently >> which led to additional memory barriers around them. It improved >> stability, but in some cases it was redundant. One of important use >> cases which regressed is off-heap accesses [1]. The barriers around them >> are redundant because they are serialized on raw memory and don't >> intersect with any on-heap accesses. >> >> Proposed fix skips memory barriers around unsafe accesses which are >> provably off-heap (base == NULL). >> >> It (almost completely) recovers performance on the microbenchmark >> provided in JDK-8224182 [1]. >> >> Testing: tier1-6. 
>> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >> > From vladimir.x.ivanov at oracle.com Fri Nov 29 18:22:39 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 29 Nov 2019 21:22:39 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com> Message-ID: <0330b8f0-28c7-1372-2d18-2b5d0ced05c3@oracle.com> Roman, JDK-8220714 looks like a bug in Shenandoah barrier expansion. I slightly modified the test to simplify the analysis [1]. While running the test [2] I'm seeing the following: 127 LoadI === 44 7 125 160 StoreI === 44 7 91 127 94 LoadI === 44 7 91 193 StoreI === 44 160 125 94 After expansion is over, it looks as follows: 127 LoadI === 44 209 125 160 StoreI === 44 209 91 127 94 LoadI === 44 160 91 193 StoreI === 44 160 125 94 Note that 94 LoadI depends on 160 StoreI memory now. Before the expansion they were independent (7 Parm == initial memory state). And then 94 goes away, since it now reads updated value: < < 94 LoadI === _ _ _ [[]] [3200094] > int 127 LoadI === 44 209 125 [[ 160 193 ]] @rawptr:BotPTR, idx=Raw; unsafe #int (does not depend only on test) !orig=[94] !jvms: Unsafe::getInt @ bci:3 TestUnsafeOffheapSwap$Memory::getInt @ bci:14 TestUnsafeOffheapSwap$Memory::swap @ bci:10 TestUnsafeOffheapSwap::testUnsafeHelper @ bci:7 The final graph is evidently wrong: 127 LoadI === 44 209 125 160 StoreI === 44 209 91 127 193 StoreI === 44 160 125 127 Let me know how you want to proceed with it. 
Best regards, Vladimir Ivanov [1] diff --git a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java --- a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java +++ b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java @@ -57,6 +57,16 @@ } } + static void testUnsafeHelper(int i) { + mem.swap(i - 1, i); + } + + static void testArrayHelper(int[] arr, int i) { + int tmp = arr[i - 1]; + arr[i - 1] = arr[i]; + arr[i] = tmp; + } + static void test() { Random rnd = new Random(SEED); for (int i = 0; i < SIZE; i++) { @@ -72,10 +82,8 @@ } for (int i = 1; i < SIZE; i++) { - mem.swap(i - 1, i); - int tmp = arr[i - 1]; - arr[i - 1] = arr[i]; - arr[i] = tmp; + testUnsafeHelper(i); + testArrayHelper(arr, i); } for (int i = 0; i < SIZE; i++) { [2] $ java -cp JTwork/classes/gc/shenandoah/compiler/TestUnsafeOffheapSwap.d/ --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:+PrintCompilation -XX:CICompilerCount=1 -XX:CompileCommand=quiet -XX:CompileCommand=compileonly,*::testUnsafeHelper -XX:CompileCommand=print,*::testUnsafeHelper -XX:PrintIdealGraphLevel=0 -XX:-VerifyOops -XX:-UseCompressedOops TestUnsafeOffheapSwap Best regards, Vladimir Ivanov On 29.11.2019 20:18, Vladimir Ivanov wrote: > >> Does it affect this: >> https://bugs.openjdk.java.net/browse/JDK-8220714 > > Good point, Roman. Proposed patch breaks the fix for JDK-8220714. > > I'll investigate what happens there and come back with a revised fix. > > Best regards, > Vladimir Ivanov > >>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>> >>> There were a number of fixes in C2 support for unsafe accesses recently >>> which led to additional memory barriers around them. 
It improved >>> stability, but in some cases it was redundant. One of important use >>> cases which regressed is off-heap accesses [1]. The barriers around them >>> are redundant because they are serialized on raw memory and don't >>> intersect with any on-heap accesses. >>> >>> Proposed fix skips memory barriers around unsafe accesses which are >>> provably off-heap (base == NULL). >>> >>> It (almost completely) recovers performance on the microbenchmark >>> provided in JDK-8224182 [1]. >>> >>> Testing: tier1-6. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >>> >> From rkennke at redhat.com Fri Nov 29 19:01:54 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 29 Nov 2019 20:01:54 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <0330b8f0-28c7-1372-2d18-2b5d0ced05c3@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com> <0330b8f0-28c7-1372-2d18-2b5d0ced05c3@oracle.com> Message-ID: <7776c872-e5a1-2dc4-4cf7-b1c733b6a314@redhat.com> Hi Vladimir, Oh! Now that is surprising, and weird! I will have to discuss this with Roland and likely open a new bug for this. Thanks for figuring this out! Please carry on with your changes then. Thanks, Roman > Roman, > > JDK-8220714 looks like a bug in Shenandoah barrier expansion. > > I slightly modified the test to simplify the analysis [1]. > > While running the test [2] I'm seeing the following: > > 127 LoadI === 44 7 125 > 160 StoreI === 44 7 91 127 > 94 LoadI === 44 7 91 > 193 StoreI === 44 160 125 94 > > After expansion is over, it looks as follows: > > 127 LoadI === 44 209 125 > 160 StoreI === 44 209 91 127 > 94 LoadI === 44 160 91 > 193 StoreI === 44 160 125 94 > > Note that 94 LoadI depends on 160 StoreI memory now. Before the > expansion they were independent (7 Parm == initial memory state). 
> > And then 94 goes away, since it now reads updated value: > > > int 127 LoadI === 44 209 125 [[ 160 193 ]] > @rawptr:BotPTR, idx=Raw; unsafe #int (does not depend only on test) > !orig=[94] !jvms: Unsafe::getInt @ bci:3 > TestUnsafeOffheapSwap$Memory::getInt @ bci:14 > TestUnsafeOffheapSwap$Memory::swap @ bci:10 > TestUnsafeOffheapSwap::testUnsafeHelper @ bci:7 > > The final graph is evidently wrong: > > 127 LoadI === 44 209 125 > 160 StoreI === 44 209 91 127 > 193 StoreI === 44 160 125 127 > > Let me know how you want to proceed with it. > > Best regards, > Vladimir Ivanov > > [1] > > diff --git > a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java > b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java > --- a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java > +++ b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java > @@ -57,6 +57,16 @@ > } > } > > + static void testUnsafeHelper(int i) { > + mem.swap(i - 1, i); > + } > + > + static void testArrayHelper(int[] arr, int i) { > + int tmp = arr[i - 1]; > + arr[i - 1] = arr[i]; > + arr[i] = tmp; > + } > + > static void test() { > Random rnd = new Random(SEED); > for (int i = 0; i < SIZE; i++) { > @@ -72,10 +82,8 @@ > } > > for (int i = 1; i < SIZE; i++) { > - mem.swap(i - 1, i); > - int tmp = arr[i - 1]; > - arr[i - 1] = arr[i]; > - arr[i] = tmp; > + testUnsafeHelper(i); > + testArrayHelper(arr, i); > } > > for (int i = 0; i < SIZE; i++) { > > [2] $ java -cp > JTwork/classes/gc/shenandoah/compiler/TestUnsafeOffheapSwap.d/ > --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED > -XX:-UseOnStackReplacement -XX:-BackgroundCompilation > -XX:-TieredCompilation 
-XX:+UnlockExperimentalVMOptions > -XX:+UseShenandoahGC -XX:+PrintCompilation -XX:CICompilerCount=1 > -XX:CompileCommand=quiet > -XX:CompileCommand=compileonly,*::testUnsafeHelper > -XX:CompileCommand=print,*::testUnsafeHelper -XX:PrintIdealGraphLevel=0 > -XX:-VerifyOops -XX:-UseCompressedOops TestUnsafeOffheapSwap > > > Best regards, > Vladimir Ivanov > > On 29.11.2019 20:18, Vladimir Ivanov wrote: >> >>> Does it affect this: >>> https://bugs.openjdk.java.net/browse/JDK-8220714 >> >> Good point, Roman. Proposed patch breaks the fix for JDK-8220714. >> >> I'll investigate what happens there and come back with a revised fix. >> >> Best regards, >> Vladimir Ivanov >> >>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>>> >>>> There were a number of fixes in C2 support for unsafe accesses recently >>>> which led to additional memory barriers around them. It improved >>>> stability, but in some cases it was redundant. One of important use >>>> cases which regressed is off-heap accesses [1]. The barriers around >>>> them >>>> are redundant because they are serialized on raw memory and don't >>>> intersect with any on-heap accesses. >>>> >>>> Proposed fix skips memory barriers around unsafe accesses which are >>>> provably off-heap (base == NULL). >>>> >>>> It (almost completely) recovers performance on the microbenchmark >>>> provided in JDK-8224182 [1]. >>>> >>>> Testing: tier1-6. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >>>> >>> > From john.r.rose at oracle.com Fri Nov 29 23:30:42 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 29 Nov 2019 15:30:42 -0800 Subject: [14] RFR (S): 8234923: Missed call_site_target nmethod dependency for non-fully initialized ConstantCallSite instance In-Reply-To: <7d4c2ab1-f8ec-8ccc-a442-8401a048b353@oracle.com> References: <7d4c2ab1-f8ec-8ccc-a442-8401a048b353@oracle.com> Message-ID: Reviewed. 
> On Nov 29, 2019, at 7:55 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8234923/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234923 From ioi.lam at oracle.com Sat Nov 30 01:02:29 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 29 Nov 2019 17:02:29 -0800 Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: Message-ID: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> Hi Pengfei, I have cc-ed hotspot-compiler-dev at openjdk.java.net. Please do not push the patch until someone from hotspot-compiler-dev has looked at it. Many people are away due to Thanksgiving in the US. Thanks - Ioi On 11/28/19 7:56 PM, Pengfei Li (Arm Technology China) wrote: > Hi, > > Please help review this small fix for 64-bit client build. > > Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8234791 > > Current 64-bit client VM build fails because errors occurred in dumping > the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1] which > runs "java -Xshare:dump" after linking the JDK image. But for Client VM > build on 64-bit platforms, the ergonomic flag UseCompressedOops is not > set.[2] This leads to VM exits in checking the flags for dumping the > shared archive.[3] > > This change removes the "#if defined" macro to make shared archive dump > successful in 64-bit client build. By tracking the history of the macro, > I found it is initially added as "#ifndef COMPILER1"[4] 10 years ago > when C1 did not have a good support of compressed oops and modified to > current shape[5] in the implementation of tiered compilation. It should > be safe to be removed today. > > This patch also fixes another client build issue on AArch64. 
> > [1] http://openjdk.java.net/jeps/341 > [2] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694 > [3] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551 > [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7 > [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56 > > -- > Thanks, > Pengfei > From john.r.rose at oracle.com Sat Nov 30 01:28:28 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 29 Nov 2019 17:28:28 -0800 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> Message-ID: <375F08F3-BE51-41F0-A20C-9443CB9F0D46@oracle.com> Wow, very impressive work, Jatin and Vladimir. Here's an extra review in case it helps. In machnode.cpp I think the code would be easier to read if a common subexpression were assigned a name "num_edges" as in similar places: uint num_edges = _opnds[opcnt]->num_edges(); (Alternatively, you could just increment "skipped" on the fly, which would be OK too.) I'm being picky because it cost me a second to verify that the set of raw edges was processed exhaustively. This is a task every reader of the code will have to do. Another nit: There is some dissonance between the seemingly general name postselect_cleanup and supports_generic_vector_operands, which is a very specific name. But they refer to the same condition. Pick the shorter name, maybe, and convert the longer one into a nice comment. It's also not clear that helpers like process_mach_node and get_vector_operand are part of postselect_cleanup. Should they be cleanup_mach_node and cleanup_vector_operand? It's a nit-pick, but might help future readers. The name get_vector_operand is particularly bad, since it suggests that it accesses something previously computed, where in fact it transforms the graph. 
Also, IMO, "get_" is one of those noise words which offers little help to the reader and just takes up valuable screen space. The name clone_generic_vector_operand is confusing; I would expect it to be called something like [specialize,cleanup,...]_generic_vector_operand. Thank you for the get_vector_regmask refactor. Relatively few readers prefer walls of repetitive code. There's a funny condition reported in this comment: // RShiftCntV/RShiftCntV report wide vector type, but VecS as ideal register. It seems to come out of the blue sky. Maybe add a cross-referencing comment between that and the vshiftcnt instruction in the AD file? (Are there asserts that would catch similar oddities if they were to arise? Was this one caught via an assert? I certainly hope it was, rather than by debugging bad code!) On a similar note (about asserts), I'm very glad to see verify_mach_nodes. The name is a little non-specific. Maybe verify_after_postselect_cleanup. I skimmed the AD files and they look good. For a nicer experience I filtered out the trivial changes (to vec[SDXYZ]) and used this reduced patch file: http://cr.openjdk.java.net/~jrose/vectors/8234391-x86.ad.reduced.patch The diff noise from s/vec/vec1/ is unfortunate. I suppose that's the price of adding a new type name to the AD file. Glad to see the problem doesn't show up in the big file x86.ad. Why do vector[xy]_reg_legacy and vectorz_reg_legacy get different treatments in the change set? I'm mainly curious here. The vectorz_reg_vl thing is a dynamic set (?) which is fine, but why is it needed for z and not xy? A comment might help. Also, this gripe is not part of this review, but I'll make it anyway: The very very short acronym "vl" which appears here stands for "AVX512VL" referring to "variable length" but it bumps into the phrase "vector legacy" with an unfortunate occasion for confusion. Suggest "_vl" be renamed to "_512vl" or some other more specific thing. 
For the string intrinsics, there's a regular replacement of legRegS by legRegD. That strikes me as potentially a semantic change. I suppose the register allocator will do the same things in both cases, and there's no spill code generated by the JIT for such a temp. I wonder, if the change was necessary, how do we know that all the occurrences were correctly found and changed? (Also, the string intrinsic macros use AVX512 instructions when available, and in theory those would require heftier register masks.) Can someone comment on this, for the record, maybe even in the source code? I'm very glad to see the duplicate operand definitions hoisted up to x86.ad. (Cut-and-paste coding makes my skin crawl.) Ignoring those insertions into x86.ad, and ignoring the trivial changes of vec[X...] to vec, it turns out that there are 140 new lines added and 160 old lines removed. (Mostly the old lines are vector move instructions of particular sizes.) That's a win! I suggest that the warning comments "(leg)Vec should be used instead" could be a little less cryptic. An unconditional warning like this makes the reader wonder, "so why is it here at all?" Maybe use a cross-reference as in: // Replaces (leg)vec during post-selection cleanup. See above. So, reviewed. I'm relieved to see lots of combinatorial complexity disappear from the AD files. -- John On Nov 19, 2019, at 6:30 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234391 > > Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) to generic ones. > > (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD instruction merges will be handled separately.) 
> > On a high-level it is organized as follows: > > (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec; > > (2) at runtime, right after matching is over, a special pass is performed which does: > > * replaces vecOper with vec[SDXYZ] depending on mach node type > - vector mach nodes capture bottom_type() of their ideal prototype; > > * eliminates redundant reg-to-reg vector moves (MoveVec2Leg /MoveLeg2Vec) > - matcher needs them, but they are useless for register allocator (moreover, may cause additional spills); > > > (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands. > > > Some details: > > (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works > > > (2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) which is enabled only on x86 > > > (3) post-selection analysis is implemented as a single pass over the graph and processing individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() (which is an instance of TypeVect) > > > (4) most of the analysis is cross-platform and interface with platform-specific code through 3 methods: > > static bool is_generic_reg2reg_move(MachNode* m); > // distinguishes MoveVec2Leg/MoveLeg2Vec nodes > > static bool is_generic_vector(MachOper* opnd); > // distinguishes vec/legVec operands > > static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg); > // constructs fixed-sized vector operand based on ideal reg > // vec + Op_Vec[SDXYZ] => vec[SDXYZ] > // legVec + Op_Vec[SDXYZ] => legVec[SDXYZ] > > > (5) TEMP operands are handled specially: > - TEMP uses max_vector_size() to determine what fixed-sized operand to use > * it is needed to cover reductions which don't produce vectors but scalars > - TEMP_DEF inherits fixed-sized operand type from DEF; > > > (6) there is limited number of special cases for mach nodes in 
Matcher::get_vector_operand_helper:
>
>     - LShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not ideal_reg().
>
>     - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type.
>
>
> (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see Matcher::get_vector_regmask)
>
>
> (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands
>
>
> (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD
>     - it aligns the code between x86_64.ad and x86_32.ad
>     - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0)
>
>
> Contributed-by: Jatin Bhateja
> Reviewed-by: vlivanov, sviswanathan, ?
>
> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL,
> performance testing (SPEC* + Octane + micros / G1 + ParGC).
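
The operand mapping described in the quoted text (vec + Op_Vec[SDXYZ] => vec[SDXYZ], legVec + Op_Vec[SDXYZ] => legVec[SDXYZ]) can be pictured with a small toy model. This is only a sketch, not HotSpot code: `GenericOperand`, `size_suffix` and `specialize_operand` are hypothetical names invented for illustration, and the byte widths assume the usual sizes of the S/D/X/Y/Z vector classes (4, 8, 16, 32 and 64 bytes).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two generic operand kinds (vec / legVec).
// These are NOT HotSpot types.
enum class GenericOperand { Vec, LegVec };

// Map a vector length in bytes to the S/D/X/Y/Z size suffix.
// Returns '\0' for lengths that have no fixed-size operand.
inline char size_suffix(uint32_t length_in_bytes) {
  switch (length_in_bytes) {
    case 4:  return 'S';
    case 8:  return 'D';
    case 16: return 'X';
    case 32: return 'Y';
    case 64: return 'Z';
    default: return '\0';
  }
}

// Toy version of "generic operand + vector size => fixed-size operand name".
inline std::string specialize_operand(GenericOperand generic,
                                      uint32_t length_in_bytes) {
  const char suffix = size_suffix(length_in_bytes);
  assert(suffix != '\0' && "unsupported vector length");
  const std::string base =
      (generic == GenericOperand::Vec) ? "vec" : "legVec";
  return base + suffix;
}
```

In this model a 16-byte vector bound to `vec` becomes `vecX`, and a 64-byte vector bound to `legVec` becomes `legVecZ`. The real pass works on mach nodes and their TypeVect bottom types rather than on strings.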
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html
>
> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf

From john.r.rose at oracle.com Sat Nov 30 07:02:33 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 29 Nov 2019 23:02:33 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+
In-Reply-To: <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com>
Message-ID: <84C1F5DD-E939-49A0-A82A-258E6E864B77@oracle.com>

On Nov 28, 2019, at 1:02 PM, Claes Redestad wrote:
>
> I just want to point out that the "round up to power of 2"
> implementations I've seen seem prone to the same kind of overflows as a
> next up would, just not for exactly the same set of inputs.

Thanks; I stand corrected.
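
To make the overflow concrete, here is a minimal sketch for 32-bit unsigned inputs. Both function names are hypothetical and neither is the utility proposed in 8234331; the bit-smearing loop is just one common way to write the round-up. The naive version wraps to 0 for any input above 2^31, while the checked version reports failure instead.

```cpp
#include <cassert>
#include <cstdint>

// Naive round-up to a power of two (hypothetical helper).
// For value > 2^31 the final "+ 1" wraps around and yields 0.
inline uint32_t round_up_power_of_2_naive(uint32_t value) {
  uint32_t v = value - 1;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;  // unsigned wrap-around is well defined, but wrong here
}

// Checked variant (hypothetical helper): returns false when the result
// would not fit in 32 bits, or when value is 0.
inline bool round_up_power_of_2_checked(uint32_t value, uint32_t* result) {
  assert(result != nullptr);
  const uint32_t max_power_of_2 = uint32_t(1) << 31;
  if (value == 0 || value > max_power_of_2) {
    return false;
  }
  *result = round_up_power_of_2_naive(value);
  return true;
}
```

The set of failing inputs differs from that of a "next power of two strictly above" helper, which would already overflow at exactly 2^31, but the failure mode is the same.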