From fweimer at redhat.com Mon Jul 1 04:34:22 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 01 Jul 2019 06:34:22 +0200
Subject: RFR: 8225035: Thread stack size issue caused by large TLS size
In-Reply-To: (Jiangli Zhou's message of "Fri, 28 Jun 2019 16:42:47 -0700")
References: <87sgrwjtdk.fsf@oldenburg2.str.redhat.com> <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com>
Message-ID: <87a7dyid9t.fsf@oldenburg2.str.redhat.com>

* Jiangli Zhou:

> On both glibc 2.24 and 2.28, by default I see there is one page of memory
> for static TLS, without explicitly defining any __thread variables in
> user code.

And glibc uses static TLS for errno and other things, so this is not
going away.

Thanks,
Florian

From aoqi at loongson.cn Mon Jul 1 04:56:20 2019
From: aoqi at loongson.cn (Ao Qi)
Date: Mon, 1 Jul 2019 12:56:20 +0800
Subject: RFR(trivial): JDK-8226967: Minimal VM: FALSE was not declared in this scope
In-Reply-To: <8562156a-afc8-8fa3-4f53-788465d84de1@oracle.com>
References: <9d8edf2f-aba9-d2ad-9210-c1b59bd6fcbe@oracle.com> <8562156a-afc8-8fa3-4f53-788465d84de1@oracle.com>
Message-ID:

On Mon, Jul 1, 2019 at 6:33 AM David Holmes wrote:
>
> Hi,
>
> On 29/06/2019 4:28 pm, Ao Qi wrote:
> > On Sat, Jun 29, 2019 at 12:45 PM David Holmes wrote:
> >>
> >> Looks good!
> >
> > Thanks!
> >
> >>
> >> But I'm afraid I can't sponsor as I'm traveling.
> >
> > Could someone help to sponsor?
>
> Looks like it's come back to me. Can you prepare the final committed
> changeset please (correct format: bug synopsis, and reviewers) and I
> will import and push it.
Updated: http://cr.openjdk.java.net/~aoqi/8226967/webrev.01/

Thanks,
Ao Qi

>
> Thanks,
> David
>
> > Cheers,
> > Ao Qi
> >

From david.holmes at oracle.com Mon Jul 1 08:22:42 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 1 Jul 2019 18:22:42 +1000
Subject: RFR: 8225035: Thread stack size issue caused by large TLS size
In-Reply-To:
References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com>
Message-ID: <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com>

Hi Jiangli,

On 29/06/2019 9:42 am, Jiangli Zhou wrote:
> Hi David,
>
> Thanks for the detailed comments! Here is the latest webrev:
> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologies
> for not including an incremental webrev (I realized that only when I had
> almost finished the edits).

That all looks fine - thanks for making the changes.

However ... now that I see the logging output, it occurred to me that
checking for the TLS adjustment is something that should only happen
once, and we should be storing the adjustment amount in a static for
direct use. Sorry I didn't think about this earlier. Something like:

  static size_t tls_size = 0;
  static bool tls_size_inited = false;

  static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
+   if (!tls_size_inited) {
+     tls_size_inited = true;
      if (_get_minstack_func != NULL) {
        size_t minstack_size = _get_minstack_func(attr);
        ...
        if (minstack_size > (size_t)os::vm_page_size() + PTHREAD_STACK_MIN) {
          tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
        }
      }
+   }
    log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT,
                         tls_size);
    return tls_size;
  }

Or even fold it all into get_minstack_init()?

I'm assuming that the result of __pthread_get_minstack won't change over
time, of course.
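The compute-once idea can be sketched in plain C. A bare static "inited" flag is not safe if several threads can reach this code at the same time, which is the concern Florian raises later in this thread, so this sketch uses pthread_once instead.

This is a minimal illustration under stated assumptions, not the HotSpot code:
- The names tls_adjustment and cached_static_tls_size are hypothetical.
- The page size comes from sysconf(_SC_PAGESIZE) rather than os::vm_page_size().
- Following Florian's reply that the return value is fixed at startup, the sketch assumes a default-initialized attribute object yields the same result as the real one.

__pthread_get_minstack is a glibc-private symbol, so the lookup may fail. In that case the sketch reports 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed signature of glibc's private __pthread_get_minstack,
 * matching the GetMinStack typedef quoted in the review. */
typedef size_t (*get_minstack_fn)(const pthread_attr_t *);

static pthread_once_t tls_once = PTHREAD_ONCE_INIT;
static size_t cached_tls_size = 0;

/* Hypothetical helper: the TLS part of the minimum stack size, or 0 when
 * minstack is not larger than page_size + stack_min. Checking first avoids
 * an unsigned underflow, which assert(tls_size > 0) would not catch. */
static size_t tls_adjustment(size_t minstack, size_t page_size, size_t stack_min) {
  assert(page_size > 0);
  if (minstack > page_size + stack_min) {
    return minstack - page_size - stack_min;
  }
  return 0;
}

/* Called through pthread_once, so it runs exactly once even with concurrent
 * callers, and the stored value is visible to every caller afterwards. */
static void compute_tls_size(void) {
  get_minstack_fn get_minstack =
      (get_minstack_fn)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
  if (get_minstack == NULL) {
    return; /* symbol not found: leave the adjustment at 0 */
  }
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) {
    return;
  }
  size_t minstack = get_minstack(&attr);
  pthread_attr_destroy(&attr);
  long page = sysconf(_SC_PAGESIZE);
  if (page <= 0) {
    return;
  }
  cached_tls_size =
      tls_adjustment(minstack, (size_t)page, (size_t)PTHREAD_STACK_MIN);
}

/* Hypothetical thread-safe replacement for the static-flag version. */
size_t cached_static_tls_size(void) {
  pthread_once(&tls_once, compute_tls_size);
  return cached_tls_size;
}
```

A caller would add cached_static_tls_size() to the requested stack size when AdjustStackSizeForTLS is set. Returning 0 when minstack is not larger than the page size plus PTHREAD_STACK_MIN also sidesteps the signedness problem raised in the review: a size_t subtraction that wraps around still compares greater than 0.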
Thanks,
David
-----

> On Fri, Jun 28, 2019 at 8:57 AM David Holmes wrote:
>>
>> Hi Jiangli,
>>
>> This is very well written up - thanks.
>>
>> I apologize in advance that I'm about to travel for a few days, so I
>> won't be able to respond further until next week.
>
> Hope you have a relaxing and safe trip. This can wait.
>
>>
>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote:
>>> Updated webrev:
>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/
>>
>> Overall changes look good. I also have a concern around this code:
>>
>> +  tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
>> +  assert(tls_size > 0, "unexpected size");
>>
>> In addition to Thomas's comments regarding signedness, can't the result be 0 if
>> there is no static TLS in use? Or is there always some static TLS in use?
>
> On both glibc 2.24 and 2.28, by default I see there is one page of memory
> for static TLS, without explicitly defining any __thread variables in
> user code.
>
>>
>> A few other comments/requests:
>>
>> 822 static void get_minstack_init() {
>> 823   _get_minstack_func =
>> 824         (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
>> 825 }
>>
>> Can you add a logging statement please:
>>
>> log_info(os, thread)("Lookup of __pthread_get_minstack %s",
>>                      _get_minstack_func == NULL ? "failed" : "succeeded");
>
> Added log info.
>
>>
>> --
>>
>> 884 // In the Linux NPTL pthread implementation the guard size mechanism
>>
>> Now that we have additional information on this, could you update this
>> old comment to say
>>
>> // In glibc versions prior to 2.27 the guard size mechanism
>
> Done.
>
>>
>> --
>>
>> 897 // Adjust the stack_size by adding the on-stack TLS size if
>> 898 // AdjustStackSizeForTLS is true. The guard size is already
>> 899 // accounted in this case, please see comments in
>> 900 // get_static_tls_area_size().
>>
>> Given the extensive commentary in get_static_tls_area_size(), can we be
>> more brief here and just say:
>>
>> // Adjust the stack size for on-stack TLS - see get_static_tls_area_size().
>
> Done.
>
>>
>> --
>>
>> 5193 get_minstack_init();
>>
>> Can you make this conditional on AdjustStackSizeForTLS please, so there
>> is no effect when not using the flag - thanks.
>
> Done.
>
>>
>> --
>>
>> 855 }
>> 856 return tls_size;
>>
>> Can you insert a logging statement:
>>
>> log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT,
>>                      tls_size);
>
> Added.
>
>>
>> ---
>>
>> test/hotspot/jtreg/runtime/TLS/T.java
>>
>> 37 // Starting a ProcessBuilder causes the process reaper thread being
>>
>> s/being/to be/
>
> Fixed.
>
>>
>> 43 // failure mode the VM fails to create thread with error message
>>
>> s/create thread/create a thread/
>
> Fixed.
>
>>
>> 53 System.out.println("Unexpected Echo output: " + echoOutput +
>> 54                    ", expects: " + echoInput);
>>
>> Should this be an exception so that the test fails? I can't imagine how echo
>> would fail, but it is probably better to fail the test if something unexpected
>> happens.
>
> If no expected output is obtained from echo (due to a ProcessBuilder
> failure caused by the TLS issue), the test does fail and reports to the
> caller (returns false). If we throw an explicit exception, it will
> be caught by the outer try/catch, which seems to be unnecessary.
>
>>
>> 66 try {
>> 67   br = new BufferedReader(new InputStreamReader(inputStream));
>> 68   s = br.readLine();
>> 69 } finally {
>> 70   br.close();
>> 71 }
>>
>> This could use try-with-resources for both streams.
>
> Sounds good. Done.
>
>>
>> ---
>>
>> exestack-tls.c
>>
>> It's simpler if no argument means no-tls and an argument means tls.
>>
>> 42 char classpath[4096];
>> 43 snprintf(classpath, sizeof classpath,
>> 44          "-Djava.class.path=%s", getenv("CLASSPATH"));
>> 45 options[0].optionString = classpath;
>>
>> Do we need to explicitly set the classpath?
>> I'm concerned that our test
>> environment uses really, really long paths, and a number of them.
>> (Probably not 4096, but still ...)
>
> The classpath needs to be set so we know where to load the test class from.
> Given that we use 4096 for the same type of usage in other existing
> test(s) (for example StackGap), it is probably okay for our test
> environments? I can increase the array size if we want to be extra
> safe ...
>
>>
>>
>> test/hotspot/jtreg/runtime/TLS/testtls.sh
>>
>> 40 if [ "${VM_OS}" != "linux" ]
>> 41 then
>> 42   echo "Test is only valid for Linux"
>> 43   exit 0
>> 44 fi
>>
>> This should be done via @requires os.family == "linux"
>
> Done.
>
> Thanks!
>
> Best regards,
> Jiangli
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks for everyone's contribution on carving out the current workaround!
>>>
>>> Best regards,
>>> Jiangli
>>>
>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou wrote:
>>>>
>>>> Thank you Thomas and David! Glad to see that we are converging on an
>>>> acceptable approach here. I'll try to factor in all the latest inputs
>>>> from everyone and send out a new update.
>>>>
>>>> Thanks and best regards,
>>>> Jiangli
>>>>
>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes wrote:
>>>>>
>>>>> Trimming ....
>>>>>
>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote:
>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer wrote:
>>>>>>> I think you can handle the guard size in this way:
>>>>>>>
>>>>>>>   pthread_attr_setguardsize(&attr, guard_size);
>>>>>>>
>>>>>>>   size_t stack_adjust_size = 0;
>>>>>>>   if (AdjustStackSizeForTLS) {
>>>>>>>     size_t minstack_size = get_minstack(&attr);
>>>>>>>     size_t tls_size = minstack_size - vm_page_size() - PTHREAD_STACK_MIN;
>>>>>>>     // In glibc before 2.27, tls_size still includes guard_size.
>>>>>>>     // In glibc 2.27 and later, guard_size is automatically
>>>>>>>     // added to the stack size by pthread_create.
>>>>>>>     // In both cases, the guard size is taken into account.
>>>>>>>     stack_adjust_size += tls_size;
>>>>>>>   } else {
>>>>>>>     stack_adjust_size += guard_size;
>>>>>>>   }
>>>>>>
>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as others
>>>>>> are okay with the above suggested adjustment, it looks good to me.
>>>>>> Thomas, David and others, any objection?
>>>>>
>>>>> I find the above acceptable. I've been waiting for the dust to settle.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Thanks and best regards,
>>>>>> Jiangli
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Florian

From fweimer at redhat.com Mon Jul 1 08:27:57 2019
From: fweimer at redhat.com (Florian Weimer)
Date: Mon, 01 Jul 2019 10:27:57 +0200
Subject: RFR: 8225035: Thread stack size issue caused by large TLS size
In-Reply-To: <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> (David Holmes's message of "Mon, 1 Jul 2019 18:22:42 +1000")
References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com>
Message-ID: <87woh2f9bm.fsf@oldenburg2.str.redhat.com>

* David Holmes:

> I'm assuming that the result of __pthread_get_minstack won't change
> over time, of course.

Correct, the return value is fixed upon startup.

Thanks,
Florian

From markus.gronlund at oracle.com Mon Jul 1 09:36:17 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Mon, 1 Jul 2019 02:36:17 -0700 (PDT)
Subject: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory
Message-ID: <70e80480-4129-45fd-9663-2d6fc6151f01@default>

Greetings,

Please review the following change set:

Bug: https://bugs.openjdk.java.net/browse/JDK-8227011
Webrev: http://cr.openjdk.java.net/~mgronlun/8227011/webrev01/
Testing: test/jdk/:jdk_jfr

Summary:
See bug.
Comment:
No test was added under this bug because of the relative complexity of writing reliable tests that use the instrumentation APIs.
There should indeed be follow-up work to better test the interaction between the instrumentation APIs and JFR; some of it is tracked under https://bugs.openjdk.java.net/browse/JDK-8226779.

Thanks
Markus

From erik.gahlin at oracle.com Mon Jul 1 12:30:56 2019
From: erik.gahlin at oracle.com (Erik Gahlin)
Date: Mon, 1 Jul 2019 14:30:56 +0200
Subject: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory
In-Reply-To: <70e80480-4129-45fd-9663-2d6fc6151f01@default>
References: <70e80480-4129-45fd-9663-2d6fc6151f01@default>
Message-ID: <5D19FD00.8010508@oracle.com>

Looks good.

Erik

> Greetings,
>
> Please review the following change set:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8227011
> Webrev: http://cr.openjdk.java.net/~mgronlun/8227011/webrev01/
> Testing: test/jdk/:jdk_jfr
>
> Summary:
> See bug.
>
> Comment:
> No test was added under this bug because of the relative complexity of writing reliable tests that use the instrumentation APIs.
> There should indeed be follow-up work to better test the interaction between the instrumentation APIs and JFR; some of it is tracked under https://bugs.openjdk.java.net/browse/JDK-8226779.
>
> Thanks
> Markus

From erik.gahlin at oracle.com Mon Jul 1 14:49:21 2019
From: erik.gahlin at oracle.com (Erik Gahlin)
Date: Mon, 1 Jul 2019 16:49:21 +0200
Subject: [13] RFR(XS): 8225706: JFR RootResolver resets CLD claims with no restore
In-Reply-To: <1155dea6-6991-4de1-ba8f-b647fbcd6544@default>
References: <79acbea0-e3a4-4368-ac05-a1e0cfdc7bc4@default> <8f693e7f-2dec-bf01-4d26-5a4d0ca3fd7a@redhat.com> <1155dea6-6991-4de1-ba8f-b647fbcd6544@default>
Message-ID: <0725D7CE-DA8C-43F8-8E3A-93A515E5309D@oracle.com>

Looks good

Erik

> On 26 Jun 2019, at 23:55, Markus Gronlund wrote:
>
> Hi Zhengyu,
>
> Thanks for taking a look and for the valuable suggestion.
>
> Iterating over the CLDG while passing _claim_none in the closure avoids interfering with existing claims (and yes, the code is currently single-threaded here).
>
> This means we can actually avoid the whole save/restore operation altogether, at least as long as we don't attempt to traverse metadata (tbd), which would need save/restore and setting claims.
>
> I think we can simplify this quite a bit by following your suggestion:
>
> Webrev02: http://cr.openjdk.java.net/~mgronlun/8225706/webrev02/
>
> Thanks again Zhengyu
>
> Markus
>
> -----Original Message-----
> From: Zhengyu Gu
> Sent: 26 June 2019 14:09
> To: Markus Gronlund ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: [13] RFR(XS): 8225706: JFR RootResolver resets CLD claims with no restore
>
> Hi Markus,
>
> Looks like RootSetClosure::process_roots() is single-threaded. Could you just not clear the claimed masks for the CLDG and use _claim_none instead to walk the CLDG?
>
> Thanks,
>
> -Zhengyu
>
> On 6/25/19 4:58 PM, Markus Gronlund wrote:
>> Greetings,
>>
>> Kindly asking for reviews for the following changeset:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8225706
>> Webrev: http://cr.openjdk.java.net/~mgronlun/8225706/webrev01/
>> Testing: test/jdk/:jdk_jfr
>>
>> Summary:
>> We need to move SaveRestoreCLDClaimBits from its current location inside RootSetClosure::process_roots() back up to EmitEventOperation::doit() so that it has the proper scope.
>> RootResolver::resolve() clears already set claim bits via ClassLoaderDataGraph::clear_claimed_marks() under the (currently false) premise that the original claims will be restored later.
>> The problematic path is triggered only if the option "path-to-gc-roots=true" is set via the command line or via jcmd AND an existing old object candidate is found.
>>
>> Thanks to Stefan Karlsson for calling attention to this.
>>
>> Thank you
>> Markus
>>

From erik.gahlin at oracle.com Mon Jul 1 14:48:41 2019
From: erik.gahlin at oracle.com (Erik Gahlin)
Date: Mon, 1 Jul 2019 16:48:41 +0200
Subject: [13] RFR(L): 8214542: JFR: Old Object Sample event slow on a deep heap in debug builds
In-Reply-To: <0d9b2624-a0ef-48fa-aae2-b495bd326426@default>
References: <0d9b2624-a0ef-48fa-aae2-b495bd326426@default>
Message-ID: <395644CF-307B-4F15-A7C4-61F6F9532F5C@oracle.com>

Looks good.

Erik

> On 28 Jun 2019, at 17:56, Markus Gronlund wrote:
>
> Greetings,
>
> Kindly asking for reviews for the following changeset:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8214542
> Webrev: http://cr.openjdk.java.net/~mgronlun/8214542/webrev01/
> Testing: test/jdk/:jdk_jfr
>
> Summary:
> Removal of heavy current_frontier assertion.
> Trimmed down storage requirements by only storing necessary contextual edges.
> Normalized representation of edges for better reuse.
> Better preservation of already persisted tree structure by less intrusive reuse logic.
> Caching.
>
> Thank you in advance
> Markus

From thomas.stuefe at gmail.com Mon Jul 1 16:09:16 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 1 Jul 2019 18:09:16 +0200
Subject: RFR(xxs): 8227035: JVM::printFlags fails in native OOM situations
Message-ID:

Hi all,

JVM::printFlags crashed on me in a native OOM situation. That was
unnecessary.

small and simple fix:

issue: https://bugs.openjdk.java.net/browse/JDK-8227035
webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8227035-jvmflags-printflags-shall-not-allocate/webrev.00/webrev/

Thank you!

..Thomas

From thomas.stuefe at gmail.com Mon Jul 1 16:16:46 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 1 Jul 2019 18:16:46 +0200
Subject: RFR(xxs): 8227032: MetaspaceUtils::print_report crashes when called before initialization
Message-ID:

Hi all,

this is a tiny fix that prevents MetaspaceUtils::print_report from
crashing when it is called before metaspace is initialized.

Issue: https://bugs.openjdk.java.net/browse/JDK-8227032
webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8227032-metaspaceutils-print-report-pre-init-crash/webrev.00/webrev/index.html

Thanks, Thomas

From gerard.ziemski at oracle.com Mon Jul 1 16:44:43 2019
From: gerard.ziemski at oracle.com (gerard ziemski)
Date: Mon, 1 Jul 2019 11:44:43 -0500
Subject: RFR(xxs): 8227035: JVM::printFlags fails in native OOM situations
In-Reply-To:
References:
Message-ID: <357997bf-6c59-f802-663d-439dd6861811@oracle.com>

hi Thomas,

Looks good, thank you for fixing this.

Can you please share some details about the situation in which you ran into
a problem with this code? I'd like to know, so that in the future I could
maybe test such a scenario on my own.

cheers

On 7/1/19 11:09 AM, Thomas Stüfe wrote:
> Hi all,
>
> JVM::printFlags crashed on me in a native OOM situation. That was
> unnecessary.
>
> small and simple fix:
>
> issue: https://bugs.openjdk.java.net/browse/JDK-8227035
> webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8227035-jvmflags-printflags-shall-not-allocate/webrev.00/webrev/
>
> Thank you!
>
> ..Thomas
>

From thomas.stuefe at gmail.com Mon Jul 1 17:14:22 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 1 Jul 2019 19:14:22 +0200
Subject: RFR(xxs): 8227035: JVM::printFlags fails in native OOM situations
In-Reply-To: <357997bf-6c59-f802-663d-439dd6861811@oracle.com>
References: <357997bf-6c59-f802-663d-439dd6861811@oracle.com>
Message-ID:

Hi Gerard,

thanks for the quick review!

It is easy to reproduce by setting a ulimit (-d or -v) and letting the VM
run against it.

I was looking at a hotspot jtreg test
(runtime/memory/RunUnitTestsConcurrently.java) that sporadically allocates
a lot of memory and gets killed by the OOM killer. Since that is useless for
analysis, I set a ulimit -d. That causes malloc, mmap, etc. to fail with
ENOMEM, which invokes our error handler. Then I tried to get a detailed NMT
report at that point and ran into many problems, this being one of them.

Cheers, Thomas

On Mon, Jul 1, 2019 at 6:46 PM gerard ziemski wrote:

> hi Thomas,
>
> Looks good, thank you for fixing this.
>
> Can you please share some details about the situation in which you ran into
> a problem with this code? I'd like to know, so that in the future I could
> maybe test such a scenario on my own.
>
> cheers
>
> On 7/1/19 11:09 AM, Thomas Stüfe wrote:
> > Hi all,
> >
> > JVM::printFlags crashed on me in a native OOM situation. That was
> > unnecessary.
> >
> > small and simple fix:
> >
> > issue: https://bugs.openjdk.java.net/browse/JDK-8227035
> > webrev:
> > http://cr.openjdk.java.net/~stuefe/webrevs/8227035-jvmflags-printflags-shall-not-allocate/webrev.00/webrev/
> >
> > Thank you!
> >
> > ..Thomas
> >

From harold.seigel at oracle.com Mon Jul 1 20:46:00 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Mon, 1 Jul 2019 16:46:00 -0400
Subject: RFR 8226956: Add invocation tests for Graal and C1
Message-ID: <77cd561a-fe69-b112-b68d-dc37523ace39@oracle.com>

Hi,

Please review this JDK-14 change to add invocation testing of Graal and C1. The change also includes minor clean-ups of some invocation test code.

Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8226956/webrev/index.html

JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226956

The change was tested by running hs-tier3 tests on Linux-x64, Solaris, Windows, and Mac OS X.

Thanks, Harold

From demonszhuo at gmail.com Mon Jul 1 15:36:04 2019
From: demonszhuo at gmail.com (zhuo chen)
Date: Mon, 1 Jul 2019 23:36:04 +0800
Subject: About the issue of lock downgrade
Message-ID:

Dear All,

Please help me with a question about this article: http://openjdk.java.net/jeps/8183909. Why is the heavyweight lock here downgraded to a lightweight lock? Is there any benefit to doing this? Why not downgrade directly to a biased lock? Thank you for answering my question; I am looking forward to your reply.

Thank you guys

zhuo

From jianglizhou at google.com Mon Jul 1 22:33:22 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Mon, 1 Jul 2019 15:33:22 -0700
Subject: RFR: 8225035: Thread stack size issue caused by large TLS size
In-Reply-To: <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com>
References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com>
Message-ID:

Hi David,

On Mon, Jul 1, 2019 at 1:22 AM David Holmes wrote:
>
> Hi Jiangli,
>
> On 29/06/2019 9:42 am, Jiangli Zhou wrote:
> > Hi David,
> >
> > Thanks for the detailed comments!
> > Here is the latest webrev:
> > http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologies
> > for not including an incremental webrev (I realized that only when I had
> > almost finished the edits).
>
> That all looks fine - thanks for making the changes.
>
> However ... now that I see the logging output, it occurred to me that
> checking for the TLS adjustment is something that should only happen
> once, and we should be storing the adjustment amount in a static for
> direct use. Sorry I didn't think about this earlier. Something like:

No problem at all! I actually thought in the same direction when making
the change initially, but didn't go with it as I had some concerns.
Florian's latest reply is reassuring (thanks again!). So here is the
update: http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/.

Since we are settling on the approach and the final implementation
details, it would be a good idea to get the CSR ball rolling. Could you
please review, and I'll finalize the CSR.

Thanks!

Best regards,
Jiangli

>
>   static size_t tls_size = 0;
>   static bool tls_size_inited = false;
>
>   static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
> +   if (!tls_size_inited) {
> +     tls_size_inited = true;
>       if (_get_minstack_func != NULL) {
>         size_t minstack_size = _get_minstack_func(attr);
>         ...
>         if (minstack_size > (size_t)os::vm_page_size() + PTHREAD_STACK_MIN) {
>           tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
>         }
>       }
> +   }
>     log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT,
>                          tls_size);
>     return tls_size;
>   }
>
> Or even fold it all into get_minstack_init()?
>
> I'm assuming that the result of __pthread_get_minstack won't change over
> time, of course.
>
> Thanks,
> David
> -----
>
> > On Fri, Jun 28, 2019 at 8:57 AM David Holmes wrote:
> >>
> >> Hi Jiangli,
> >>
> >> This is very well written up - thanks.
> >>
> >> I apologize in advance that I'm about to travel for a few days, so I
> >> won't be able to respond further until next week.
> >
> > Hope you have a relaxing and safe trip. This can wait.
> >
> >>
> >> On 27/06/2019 11:58 pm, Jiangli Zhou wrote:
> >>> Updated webrev:
> >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/
> >>
> >> Overall changes look good. I also have a concern around this code:
> >>
> >> +  tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
> >> +  assert(tls_size > 0, "unexpected size");
> >>
> >> In addition to Thomas's comments regarding signedness, can't the result be 0 if
> >> there is no static TLS in use? Or is there always some static TLS in use?
> >
> > On both glibc 2.24 and 2.28, by default I see there is one page of memory
> > for static TLS, without explicitly defining any __thread variables in
> > user code.
> >
> >>
> >> A few other comments/requests:
> >>
> >> 822 static void get_minstack_init() {
> >> 823   _get_minstack_func =
> >> 824         (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
> >> 825 }
> >>
> >> Can you add a logging statement please:
> >>
> >> log_info(os, thread)("Lookup of __pthread_get_minstack %s",
> >>                      _get_minstack_func == NULL ? "failed" : "succeeded");
> >
> > Added log info.
> >
> >>
> >> --
> >>
> >> 884 // In the Linux NPTL pthread implementation the guard size mechanism
> >>
> >> Now that we have additional information on this, could you update this
> >> old comment to say
> >>
> >> // In glibc versions prior to 2.27 the guard size mechanism
> >
> > Done.
> >
> >>
> >> --
> >>
> >> 897 // Adjust the stack_size by adding the on-stack TLS size if
> >> 898 // AdjustStackSizeForTLS is true. The guard size is already
> >> 899 // accounted in this case, please see comments in
> >> 900 // get_static_tls_area_size().
> >>
> >> Given the extensive commentary in get_static_tls_area_size(), can we be
> >> more brief here and just say:
> >>
> >> // Adjust the stack size for on-stack TLS - see get_static_tls_area_size().
> >
> > Done.
> >
> >>
> >> --
> >>
> >> 5193 get_minstack_init();
> >>
> >> Can you make this conditional on AdjustStackSizeForTLS please, so there
> >> is no effect when not using the flag - thanks.
> >
> > Done.
> >
> >>
> >> --
> >>
> >> 855 }
> >> 856 return tls_size;
> >>
> >> Can you insert a logging statement:
> >>
> >> log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT,
> >>                      tls_size);
> >
> > Added.
> >
> >>
> >> ---
> >>
> >> test/hotspot/jtreg/runtime/TLS/T.java
> >>
> >> 37 // Starting a ProcessBuilder causes the process reaper thread being
> >>
> >> s/being/to be/
> >
> > Fixed.
> >
> >>
> >> 43 // failure mode the VM fails to create thread with error message
> >>
> >> s/create thread/create a thread/
> >
> > Fixed.
> >
> >>
> >> 53 System.out.println("Unexpected Echo output: " + echoOutput +
> >> 54                    ", expects: " + echoInput);
> >>
> >> Should this be an exception so that the test fails? I can't imagine how echo
> >> would fail, but it is probably better to fail the test if something unexpected
> >> happens.
> >
> > If no expected output is obtained from echo (due to a ProcessBuilder
> > failure caused by the TLS issue), the test does fail and reports to the
> > caller (returns false). If we throw an explicit exception, it will
> > be caught by the outer try/catch, which seems to be unnecessary.
> >
> >>
> >> 66 try {
> >> 67   br = new BufferedReader(new InputStreamReader(inputStream));
> >> 68   s = br.readLine();
> >> 69 } finally {
> >> 70   br.close();
> >> 71 }
> >>
> >> This could use try-with-resources for both streams.
> >
> > Sounds good. Done.
> >
> >>
> >> ---
> >>
> >> exestack-tls.c
> >>
> >> It's simpler if no argument means no-tls and an argument means tls.
> >>
> >> 42 char classpath[4096];
> >> 43 snprintf(classpath, sizeof classpath,
> >> 44          "-Djava.class.path=%s", getenv("CLASSPATH"));
> >> 45 options[0].optionString = classpath;
> >>
> >> Do we need to explicitly set the classpath? I'm concerned that our test
> >> environment uses really, really long paths, and a number of them.
> >> (Probably not 4096, but still ...)
> >
> > The classpath needs to be set so we know where to load the test class from.
> > Given that we use 4096 for the same type of usage in other existing
> > test(s) (for example StackGap), it is probably okay for our test
> > environments? I can increase the array size if we want to be extra
> > safe ...
> >
> >>
> >>
> >> test/hotspot/jtreg/runtime/TLS/testtls.sh
> >>
> >> 40 if [ "${VM_OS}" != "linux" ]
> >> 41 then
> >> 42   echo "Test is only valid for Linux"
> >> 43   exit 0
> >> 44 fi
> >>
> >> This should be done via @requires os.family == "linux"
> >
> > Done.
> >
> > Thanks!
> >
> > Best regards,
> > Jiangli
> >>
> >> Thanks,
> >> David
> >> -----
> >>
> >>> Thanks for everyone's contribution on carving out the current workaround!
> >>>
> >>> Best regards,
> >>> Jiangli
> >>>
> >>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou wrote:
> >>>>
> >>>> Thank you Thomas and David! Glad to see that we are converging on an
> >>>> acceptable approach here. I'll try to factor in all the latest inputs
> >>>> from everyone and send out a new update.
> >>>>
> >>>> Thanks and best regards,
> >>>> Jiangli
> >>>>
> >>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes wrote:
> >>>>>
> >>>>> Trimming ....
> >>>>>
> >>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote:
> >>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer wrote:
> >>>>>>> I think you can handle the guard size in this way:
> >>>>>>>
> >>>>>>>   pthread_attr_setguardsize(&attr, guard_size);
> >>>>>>>
> >>>>>>>   size_t stack_adjust_size = 0;
> >>>>>>>   if (AdjustStackSizeForTLS) {
> >>>>>>>     size_t minstack_size = get_minstack(&attr);
> >>>>>>>     size_t tls_size = minstack_size - vm_page_size() - PTHREAD_STACK_MIN;
> >>>>>>>     // In glibc before 2.27, tls_size still includes guard_size.
> >>>>>>>     // In glibc 2.27 and later, guard_size is automatically
> >>>>>>>     // added to the stack size by pthread_create.
> >>>>>>>     // In both cases, the guard size is taken into account.
> >>>>>>>     stack_adjust_size += tls_size;
> >>>>>>>   } else {
> >>>>>>>     stack_adjust_size += guard_size;
> >>>>>>>   }
> >>>>>>
> >>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as others
> >>>>>> are okay with the above suggested adjustment, it looks good to me.
> >>>>>> Thomas, David and others, any objection?
> >>>>>
> >>>>> I find the above acceptable. I've been waiting for the dust to settle.
> >>>>>
> >>>>> Thanks,
> >>>>> David
> >>>>>
> >>>>>> Thanks and best regards,
> >>>>>> Jiangli
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Florian

From david.holmes at oracle.com Mon Jul 1 23:15:41 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 2 Jul 2019 09:15:41 +1000
Subject: About the issue of lock downgrade
In-Reply-To:
References:
Message-ID:

Hi Zhuo,

On 2/07/2019 1:36 am, zhuo chen wrote:
> Dear All,
>
> Please help me with a question about this article:
> http://openjdk.java.net/jeps/8183909. Why is the heavyweight lock here
> downgraded to a lightweight lock? Is there any benefit to doing this? Why not
> downgrade directly to a biased lock? Thank you for answering my question; I am
> looking forward to your reply.

To get to monitor inflation, biased locking for the object must already
have been revoked at some point.
Once the bias has been revoked for an instance, we do not allow it to be
re-biased. Biased locking assumes that only a single thread will lock a
particular instance, so once we have shown that is not the case, we take
biased locking out of the picture.

David

> Thank you guys
>
> zhuo
>

From goetz.lindenmaier at sap.com Tue Jul 2 06:16:32 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 2 Jul 2019 06:16:32 +0000
Subject: RFR(xxs): 8227035: JVM::printFlags fails in native OOM situations
In-Reply-To:
References:
Message-ID:

Hi Thomas,

looks good, nice fix!

Best regards,
Goetz.

> -----Original Message-----
> From: hotspot-runtime-dev
> On Behalf Of Thomas Stüfe
> Sent: Monday, 1 July 2019 18:09
> To: Hotspot dev runtime
> Subject: RFR(xxs): 8227035: JVM::printFlags fails in native OOM situations
>
> Hi all,
>
> JVM::printFlags crashed on me in a native OOM situation. That was
> unnecessary.
>
> small and simple fix:
> > > > small and simple fix: > > > > issue: https://bugs.openjdk.java.net/browse/JDK-8227035 > > webrev: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227035-jvmflags-printflags-shall- > > not-allocate/webrev.00/webrev/ > > > > Thank you! > > > > ..Thomas > From thomas.schatzl at oracle.com Tue Jul 2 11:05:16 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 02 Jul 2019 13:05:16 +0200 Subject: RFR(xxs): 8227032: MetaspaceUtils::print_report crashes when called before initialization In-Reply-To: References: Message-ID: <7786446b1343e2eea5a81817eb8cb84ea8e0470b.camel@oracle.com> Hi, On Mon, 2019-07-01 at 18:16 +0200, Thomas Stüfe wrote: > Hi all, > > this is a tiny fix which prevents MetaspaceUtils::print_report from > crashing when called before metaspace is initialized. > > Issue: https://bugs.openjdk.java.net/browse/JDK-8227032 > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8227032-metaspaceutils-print-report-pre-init-crash/webrev.00/webrev/index.html > > Thanks, Thomas looks good. Thomas From fweimer at redhat.com Tue Jul 2 11:20:05 2019 From: fweimer at redhat.com (Florian Weimer) Date: Tue, 02 Jul 2019 13:20:05 +0200 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: (Jiangli Zhou's message of "Mon, 1 Jul 2019 15:33:22 -0700") References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> Message-ID: <87v9wk4ra2.fsf@oldenburg2.str.redhat.com> * Jiangli Zhou: > No problem at all! I actually thought in the same direction as well > when making the change initially but didn't go with it as I had some > concerns. Florian's latest reply is reassuring (thanks again!).
So > here is the update: > http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/. I'm sorry, but I'm not familiar with Hotspot. Is there external locking? Otherwise this code is not thread-safe. Thanks, Florian From thomas.stuefe at gmail.com Tue Jul 2 11:47:50 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 2 Jul 2019 13:47:50 +0200 Subject: RFR(xxs): 8227032: MetaspaceUtils::print_report crashes when called before initialization In-Reply-To: <7786446b1343e2eea5a81817eb8cb84ea8e0470b.camel@oracle.com> References: <7786446b1343e2eea5a81817eb8cb84ea8e0470b.camel@oracle.com> Message-ID: Thank you Thomas. On Tue, Jul 2, 2019 at 1:05 PM Thomas Schatzl wrote: > Hi, > > On Mon, 2019-07-01 at 18:16 +0200, Thomas Stüfe wrote: > > Hi all, > > > > this is a tiny fix which prevents MetaspaceUtils::print_report from > > crashing when called before metaspace is initialized. > > > > Issue: https://bugs.openjdk.java.net/browse/JDK-8227032 > > webrev: > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227032-metaspaceutils-print-report-pre-init-crash/webrev.00/webrev/index.html > > > > Thanks, Thomas > > looks good. > > Thomas > > From markus.gronlund at oracle.com Tue Jul 2 13:06:12 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Tue, 2 Jul 2019 06:06:12 -0700 (PDT) Subject: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory In-Reply-To: <5D19FD00.8010508@oracle.com> References: <70e80480-4129-45fd-9663-2d6fc6151f01@default> <5D19FD00.8010508@oracle.com> Message-ID: Thank you Erik for the review. Can I please ask for a second reviewer?
Thanks Markus -----Original Message----- From: Erik Gahlin Sent: den 1 juli 2019 14:31 To: Markus Gronlund ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory Looks good. Erik > Greetings, > > Please review the following change set: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8227011 > Webrev: http://cr.openjdk.java.net/~mgronlun/8227011/webrev01/ > Test: Testing: test/jdk/:jdk_jfr > > Summary: > See bug. > > Comment: > No test was added under this bug due to the relative complexities involved for reliable tests using instrumentation APIs. > There should indeed be follow-up work to better test the interaction of the instrumentation APIs and JFR, some of it is tracked under https://bugs.openjdk.java.net/browse/JDK-8226779 . > > Thanks > Markus From markus.gronlund at oracle.com Tue Jul 2 13:08:17 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Tue, 2 Jul 2019 06:08:17 -0700 (PDT) Subject: [13] RFR(L): 8214542: JFR: Old Object Sample event slow on a deep heap in debug builds In-Reply-To: <395644CF-307B-4F15-A7C4-61F6F9532F5C@oracle.com> References: <0d9b2624-a0ef-48fa-aae2-b495bd326426@default> <395644CF-307B-4F15-A7C4-61F6F9532F5C@oracle.com> Message-ID: <2f456637-ebd3-444b-bdb8-5e0f4dd67399@default> Thank you Erik for the review. Can I please ask for a second reviewer? Thanks Markus -----Original Message----- From: Erik Gahlin Sent: den 1 juli 2019 16:49 To: Markus Gronlund Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: [13] RFR(L): 8214542: JFR: Old Object Sample event slow on a deep heap in debug builds Looks good. 
Erik > On 28 Jun 2019, at 17:56, Markus Gronlund wrote: > > Greetings, > > Kindly asking for reviews for the following changeset: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8214542 > Webrev: http://cr.openjdk.java.net/~mgronlun/8214542/webrev01/ > Testing: test/jdk/:jdk_jfr > > Summary: > Removal of heavy current_frontier assertion. > Trimmed down storage requirements by only storing necessary contextual edges. > Normalized representation of edges for better reuse. > Better preservation of already persisted tree structure by less intrusive reuse logic. > Caching. > > Thank you in advance > Markus From lois.foltan at oracle.com Tue Jul 2 13:59:22 2019 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 2 Jul 2019 09:59:22 -0400 Subject: RFR 8226956: Add invocation tests for Graal and C1 In-Reply-To: <77cd561a-fe69-b112-b68d-dc37523ace39@oracle.com> References: <77cd561a-fe69-b112-b68d-dc37523ace39@oracle.com> Message-ID: <98784ce5-13cd-6d4d-e361-5b3b81178657@oracle.com> Looks good. Lois On 7/1/2019 4:46 PM, Harold Seigel wrote: > Hi, > > Please review this JDK-14 change to add invocation testing of Graal > and C1. The change also includes minor clean-ups of some invocation > test code. > > Open Webrev: > http://cr.openjdk.java.net/~hseigel/bug_8226956/webrev/index.html > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226956 > > The change was tested by running hs-tier3 tests on Linux-x64, Solaris, > Windows, and Mac OS X. > > Thanks, Harold > From harold.seigel at oracle.com Tue Jul 2 14:01:11 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Tue, 2 Jul 2019 10:01:11 -0400 Subject: RFR 8226956: Add invocation tests for Graal and C1 In-Reply-To: <98784ce5-13cd-6d4d-e361-5b3b81178657@oracle.com> References: <77cd561a-fe69-b112-b68d-dc37523ace39@oracle.com> <98784ce5-13cd-6d4d-e361-5b3b81178657@oracle.com> Message-ID: Thanks Lois! Harold On 7/2/2019 9:59 AM, Lois Foltan wrote: > Looks good.
> Lois > > On 7/1/2019 4:46 PM, Harold Seigel wrote: >> Hi, >> >> Please review this JDK-14 change to add invocation testing of Graal >> and C1. The change also includes minor clean-ups of some invocation >> test code. >> >> Open Webrev: >> http://cr.openjdk.java.net/~hseigel/bug_8226956/webrev/index.html >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226956 >> >> The change was tested by running hs-tier3 tests on Linux-x64, >> Solaris, Windows, and Mac OS X. >> >> Thanks, Harold >> > From jianglizhou at google.com Tue Jul 2 15:08:24 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Tue, 2 Jul 2019 08:08:24 -0700 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: <87v9wk4ra2.fsf@oldenburg2.str.redhat.com> References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <87v9wk4ra2.fsf@oldenburg2.str.redhat.com> Message-ID: Hi Florian, On Tue, Jul 2, 2019 at 4:20 AM Florian Weimer wrote: > > * Jiangli Zhou: > > > No problem at all! I actually thought in the same direction as well > > when making the change initially but didn't go with it as I had some > > concerns. Florian's latest reply is reassuring (thanks again!). So > > here is the update: > > http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/. > > I'm sorry, but I'm not familiar with Hotspot. Is there external > locking? Otherwise this code is not thread-safe. The first time when get_static_tls_area_size is called, it happens during early JVM initialization when the main thread is trying to spawn a GC thread (it is the first thread being created by the main thread). The 'tls_size' is fully initialized before the creation of the new thread.
I think it's safe without an extra lock protection since there's no race here. If we want to be extra cautious, we could initialize 'tls_size' as part of get_minstack_init during os::init_2. I'm not sure if there are cases where __pthread_get_minstack() is called too 'early' during an execution and not be able to get the proper result? I feel safer with the current change. Best regards, Jiangli > > Thanks, > Florian From robin.westberg at oracle.com Tue Jul 2 15:20:20 2019 From: robin.westberg at oracle.com (Robin Westberg) Date: Tue, 2 Jul 2019 17:20:20 +0200 Subject: [13] RFR(L): 8214542: JFR: Old Object Sample event slow on a deep heap in debug builds In-Reply-To: <2f456637-ebd3-444b-bdb8-5e0f4dd67399@default> References: <0d9b2624-a0ef-48fa-aae2-b495bd326426@default> <395644CF-307B-4F15-A7C4-61F6F9532F5C@oracle.com> <2f456637-ebd3-444b-bdb8-5e0f4dd67399@default> Message-ID: <4903FDD1-A734-47F4-AF6B-AA1E8FA2CDA8@oracle.com> Hi Markus, Looks good to me! Very minor nits: src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp 131 s/succesful/successful/ src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp 441 S/Exlusive/Exclusive/ Best regards, Robin > On 2 Jul 2019, at 15:08, Markus Gronlund wrote: > > Thank you Erik for the review. > > Can I please ask for a second reviewer? > > Thanks > Markus > > -----Original Message----- > From: Erik Gahlin > Sent: den 1 juli 2019 16:49 > To: Markus Gronlund > Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [13] RFR(L): 8214542: JFR: Old Object Sample event slow on a deep heap in debug builds > > Looks good. > > Erik > >> On 28 Jun 2019, at 17:56, Markus Gronlund wrote: >> >> Greetings, >> >> Kindly asking for reviews for the following changeset: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8214542 >> Webrev: http://cr.openjdk.java.net/~mgronlun/8214542/webrev01/ >> Testing: test/jdk/:jdk_jfr >> >> Summary: >> Removal of heavy current_frontier assertion. 
>> Trimmed down storage requirements by only storing necessary contextual edges. >> Normalized representation of edges for better reuse. >> Better preservation of already persisted tree structure by less intrusive reuse logic. >> Caching. >> >> Thank you in advance >> Markus > From robin.westberg at oracle.com Tue Jul 2 15:21:47 2019 From: robin.westberg at oracle.com (Robin Westberg) Date: Tue, 2 Jul 2019 17:21:47 +0200 Subject: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory In-Reply-To: References: <70e80480-4129-45fd-9663-2d6fc6151f01@default> <5D19FD00.8010508@oracle.com> Message-ID: <5277798D-286F-4623-AD93-AE183FC7817D@oracle.com> Hi Markus, Looks good to me! Best regards, Robin > On 2 Jul 2019, at 15:06, Markus Gronlund wrote: > > Thank you Erik for the review. > > Can I please ask for a second reviewer? > > Thanks > Markus > > > -----Original Message----- > From: Erik Gahlin > Sent: den 1 juli 2019 14:31 > To: Markus Gronlund ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory > > Looks good. > > Erik > >> Greetings, >> >> Please review the following change set: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8227011 >> Webrev: http://cr.openjdk.java.net/~mgronlun/8227011/webrev01/ >> Test: Testing: test/jdk/:jdk_jfr >> >> Summary: >> See bug. >> >> Comment: >> No test was added under this bug due to the relative complexities involved for reliable tests using instrumentation APIs. >> There should indeed be follow-up work to better test the interaction of the instrumentation APIs and JFR, some of it is tracked under https://bugs.openjdk.java.net/browse/JDK-8226779 . 
>> >> Thanks >> Markus > From coleen.phillimore at oracle.com Tue Jul 2 17:19:11 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 2 Jul 2019 13:19:11 -0400 Subject: RFR 8226956: Add invocation tests for Graal and C1 In-Reply-To: <77cd561a-fe69-b112-b68d-dc37523ace39@oracle.com> References: <77cd561a-fe69-b112-b68d-dc37523ace39@oracle.com> Message-ID: <36d63ea5-0c96-1745-74fb-3b4b7aaa4e9e@oracle.com> Looks good! Coleen On 7/1/19 4:46 PM, Harold Seigel wrote: > Hi, > > Please review this JDK-14 change to add invocation testing of Graal > and C1. The change also includes minor clean-ups of some invocation > test code. > > Open Webrev: > http://cr.openjdk.java.net/~hseigel/bug_8226956/webrev/index.html > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226956 > > The change was tested by running hs-tier3 tests on Linux-x64, Solaris, > Windows, and Mac OS X. > > Thanks, Harold > From harold.seigel at oracle.com Tue Jul 2 17:23:03 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Tue, 2 Jul 2019 13:23:03 -0400 Subject: RFR 8226956: Add invocation tests for Graal and C1 In-Reply-To: <36d63ea5-0c96-1745-74fb-3b4b7aaa4e9e@oracle.com> References: <77cd561a-fe69-b112-b68d-dc37523ace39@oracle.com> <36d63ea5-0c96-1745-74fb-3b4b7aaa4e9e@oracle.com> Message-ID: Thanks Coleen! Harold On 7/2/2019 1:19 PM, coleen.phillimore at oracle.com wrote: > Looks good! > Coleen > > On 7/1/19 4:46 PM, Harold Seigel wrote: >> Hi, >> >> Please review this JDK-14 change to add invocation testing of Graal >> and C1. The change also includes minor clean-ups of some invocation >> test code. >> >> Open Webrev: >> http://cr.openjdk.java.net/~hseigel/bug_8226956/webrev/index.html >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226956 >> >> The change was tested by running hs-tier3 tests on Linux-x64, >> Solaris, Windows, and Mac OS X.
>> >> Thanks, Harold >> > From mikhailo.seledtsov at oracle.com Tue Jul 2 22:24:01 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Tue, 2 Jul 2019 15:24:01 -0700 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases Message-ID: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> Please review this new test that uses a Docker sidecar pattern to manage/monitor JVM running in the main payload container. Sidecar is a common pattern used in the cloud environments for monitoring among other uses. In side car pattern the main application/service container that runs the payload is paired with a sidecar container. It is achieved by sharing certain namespace aspects between the two containers such as PID namespace, specific sub-directories, IPC and more. This test implements the following cases: - "jcmd -l" to list java processes running in "main" container from the "sidecar" container - "jhsdb jinfo" in the sidecar configuration - jcmd This change also builds a basis for more test cases in the future. Minor changes were done to DockerTestUtils: - changing access to DOCKER_COMMAND constant to public - minor spelling and terminology corrections JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ Testing: 1. ran Docker tests on Linux-x64 - PASS 2.
Running Docker tests in test cluster - in progress Thank you, Misha From david.holmes at oracle.com Wed Jul 3 06:59:51 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 3 Jul 2019 16:59:51 +1000 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> Message-ID: <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> Hi Jiangli, On 2/07/2019 8:33 am, Jiangli Zhou wrote: > Hi David, > > On Mon, Jul 1, 2019 at 1:22 AM David Holmes wrote: >> >> Hi Jiangli, >> >> On 29/06/2019 9:42 am, Jiangli Zhou wrote: >>> Hi David, >>> >>> Thanks for the detailed comments! Here is the latest webrev: >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing >>> for not including an incremental webrev (realized that when I almost >>> done edits). >> >> That all looks fine - thanks for making the changes. >> >> However ... now that I see the logging output it occurred to me that >> checking for the TLS adjustment is something that should only happen >> once and we should be storing the adjustment amount in a static for >> direct use. Sorry I didn't think about this earlier. Something like: > > No problem at all! I actually thought in the same direction as well > when making the change initially but didn't go with it as I had some > concerns. Florian's latest reply is reassuring (thanks again!). So > here is the update: > http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/. The incremental change looks fine to me. To address Florian's concern I suggest adding an additional comment: // Returns the size of the static TLS area glibc puts on thread stacks. 
+ // The value is cached on first use, which occurs when the first thread + // is created during VM initialization. static size_t get_static_tls_area_size(const pthread_attr_t *attr) { > Since we are settling down on the approach and final implementation > details, it would be a good idea to get the CSR ball rolling. Could > you please review and I'll finalize the CSR. Thanks! Done. Thanks, David > Best regards, > Jiangli > >> >> static size_t tls_size = 0; >> static bool tls_size_inited = false; >> >> static size_t get_static_tls_area_size(const pthread_attr_t *attr) { >> + if (!tls_size_inited) { >> + tls_size_inited = true; >> if (_get_minstack_func != NULL) { >> size_t minstack_size = _get_minstack_func(attr); >> ... >> if (minstack_size > (size_t)os::vm_page_size() + >> PTHREAD_STACK_MIN) { >> tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN; >> } >> } >> + } >> log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT, >> tls_size); >> return tls_size; >> } >> >> Or even fold it all into get_minstack_init() ? >> >> I'm assuming that the result of __pthread_get_minstack wont' change over >> time of course. >> >> Thanks, >> David >> ----- >> >>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes wrote: >>>> >>>> Hi Jiangli, >>>> >>>> This is very well written up - thanks. >>>> >>>> I apologize in advance that I'm about to traveling for a few days so >>>> won't be able to respond further until next week. >>> >>> Hope you have a relaxed and safe travel. This can wait. >>> >>>> >>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote: >>>>> Updated webrev: >>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/ >>>> >>>> Overall changes look good. I also have a concern around this code: >>>> >>>> + tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN; >>>> + assert(tls_size > 0, "unexpected size"); >>>> >>>> In addition to Thomas's comments re-signedness, can't the result == 0 if >>>> there is no sttaic TLS in use? 
Or is there always some static TLS in use? >>> >>> On both glibc 2.24 and 2.28, by default I see there is one page memory >>> for static TLS, without explicitly defining any __thread variables in >>> user code. >>> >>>> >>>> A few other comments/requests: >>>> >>>> 822 static void get_minstack_init() { >>>> 823 _get_minstack_func = >>>> 824 (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack"); >>>> 825 } >>>> >>>> Can you add a logging statement please: >>>> >>>> log(os, thread)("Lookup of __pthread_get_minstack %s", >>>> _get_minstack_func == NULL ? "failed" : "succeeded"); >>> >>> Added log info. >>> >>>> >>>> -- >>>> >>>> 884 // In the Linux NPTL pthread implementation the guard size mechanism >>>> >>>> Now that we have additional information on this could you update this >>>> old comment to say >>>> >>>> // In glibc versions prior to 2.7 the guard size mechanism >>> >>> Done. >>> >>>> >>>> -- >>>> >>>> 897 // Adjust the stack_size by adding the on-stack TLS size if >>>> 898 // AdjustStackSizeForTLS is true. The guard size is already >>>> 899 // accounted in this case, please see comments in >>>> 900 // get_static_tls_area_size(). >>>> >>>> Given the extensive commentary in get_static_tls_area_size() can we be >>>> more brief here and just say: >>>> >>>> // Adjust the stack size for on-stack TLS - see get_static_tls_area_size(). >>> >>> Done. >>> >>>> >>>> -- >>>> >>>> 5193 get_minstack_init(); >>>> >>>> Can you make this conditional on AdjustStackSizeForTLS please so there >>>> is no affect when not using the flag - thanks. >>> >>> Done. >>> >>>> >>>> -- >>>> >>>> 855 } >>>> 856 return tls_size; >>>> >>>> Can you insert a logging statement: >>>> >>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT, >>>> tls_size); >>> >>> Added. >>> >>>> >>>> --- >>>> >>>> test/hotspot/jtreg/runtime/TLS/T.java >>>> >>>> 37 // Starting a ProcessBuilder causes the process reaper >>>> thread being >>>> >>>> s/being/to be/ >>> >>> Fixed. 
>>> >>>> >>>> 43 // failure mode the VM fails to create thread with >>>> error message >>>> >>>> s/create thread/create a thread/ >>> >>> Fixed. >>> >>>> >>>> 53 System.out.println("Unexpected Echo output: " + >>>> echoOutput + >>>> 54 ", expects: " + echoInput); >>>> >>>> should this be an exception so that test fails? I can't imagine how echo >>>> would fail but probably better to fail the test if something unexpected >>>> happens. >>> >>> If no expected output is obtained from echo (due to ProcessBuilder >>> failure caused by TLS issue), the test does fail and reports to the >>> caller (returns false). If we throw an explicit expectation, it will >>> be caught by the outer try/catch, which seems to be unnecessary. >>> >>>> >>>> 66 try { >>>> 67 br = new BufferedReader(new >>>> InputStreamReader(inputStream)); >>>> 68 s = br.readLine(); >>>> 69 } finally { >>>> 70 br.close(); >>>> 71 } >>>> >>>> This could use try-with-resources for both streams. >>> >>> Sounds good. Done. >>> >>>> >>>> --- >>>> >>>> exestack-tls.c >>>> >>>> It's simpler if no argument means no-tls and an argument means tls. >>>> >>>> 42 char classpath[4096]; >>>> 43 snprintf(classpath, sizeof classpath, >>>> 44 "-Djava.class.path=%s", getenv("CLASSPATH")); >>>> 45 options[0].optionString = classpath; >>>> >>>> Do we need to explicitly set the classpath? I'm concerned that our test >>>> environment uses really, really long paths and a number of them. >>>> (Probably not 4096 but still ...) >>> >>> The classpath needs to be set so we know where to load the test class. >>> Given that we use 4096 for the same type of usage in other existing >>> test(s) (for example StackGap), it probably is okay for our test >>> environments? I can increase the array size if we want to be extra >>> safe ... 
>>> >>>> >>>> >>>> test/hotspot/jtreg/runtime/TLS/testtls.sh >>>> >>>> 40 if [ "${VM_OS}" != "linux" ] >>>> 41 then >>>> 42 echo "Test is only valid for Linux" >>>> 43 exit 0 >>>> 44 fi >>>> >>>> This should be done via "@requires os.family != Linux" >>> >>> Done. >>> >>> Thanks! >>> >>> Best regards, >>> Jiangli >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Thanks for everyone's contribution on carving out the current workaround! >>>>> >>>>> Best regards, >>>>> Jiangli >>>>> >>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou wrote: >>>>>> >>>>>> Thank you Thomas and David! Glad to see that we are converging on an >>>>>> acceptable approach here. I'll try to factor in all the latest inputs >>>>>> from everyone and send out a new update. >>>>>> >>>>>> Thanks and best regards, >>>>>> Jiangli >>>>>> >>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes wrote: >>>>>>> >>>>>>> Trimming .... >>>>>>> >>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote: >>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer wrote: >>>>>>>>> I think you can handle the guard size in this way: >>>>>>>>> >>>>>>>>> pthread_attr_setguardsize(&attr, guard_size); >>>>>>>>> >>>>>>>>> size_t stack_adjust_size = 0; >>>>>>>>> if (AdjustStackSizeForTLS) { >>>>>>>>> size_t minstack_size = get_minstack(&attr); >>>>>>>>> size_t tls_size = minstack_size - vm_page_size() - PTHREAD_STACK_MIN; >>>>>>>>> // In glibc before 2.27, tls_size still includes guard_size. >>>>>>>>> // In glibc 2.27 and later, guard_size is automatically >>>>>>>>> // added to the stack size by pthread_create. >>>>>>>>> // In both cases, the guard size is taken into account. >>>>>>>>> stack_adjust_size += tls_size; >>>>>>>>> } else { >>>>>>>>> stack_adjust_size += guard_size; >>>>>>>>> } >>>>>>>> >>>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as others >>>>>>>> are okay with the above suggested adjustment, it looks good to me. >>>>>>>> Thomas, David and others, any objection? 
>>>>>>> >>>>>>> I find the above acceptable. I've been waiting for the dust to settle. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> Thanks and best regards, >>>>>>>> Jiangli >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Florian From markus.gronlund at oracle.com Wed Jul 3 08:42:42 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Wed, 3 Jul 2019 01:42:42 -0700 (PDT) Subject: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory In-Reply-To: <5277798D-286F-4623-AD93-AE183FC7817D@oracle.com> References: <70e80480-4129-45fd-9663-2d6fc6151f01@default> <5D19FD00.8010508@oracle.com> <5277798D-286F-4623-AD93-AE183FC7817D@oracle.com> Message-ID: Thank you Robin for the review. Markus -----Original Message----- From: Robin Westberg Sent: den 2 juli 2019 17:22 To: Markus Gronlund Cc: Erik Gahlin ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory Hi Markus, Looks good to me! Best regards, Robin > On 2 Jul 2019, at 15:06, Markus Gronlund wrote: > > Thank you Erik for the review. > > Can I please ask for a second reviewer? > > Thanks > Markus > > > -----Original Message----- > From: Erik Gahlin > Sent: den 1 juli 2019 14:31 > To: Markus Gronlund ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [13] RFR(XS): 8227011: Starting a JFR recording in response to JVMTI VMInit and / or Java agent premain corrupts memory > > Looks good. > > Erik > >> Greetings, >> >> Please review the following change set: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8227011 >> Webrev: http://cr.openjdk.java.net/~mgronlun/8227011/webrev01/ >> Test: Testing: test/jdk/:jdk_jfr >> >> Summary: >> See bug. 
>> >> Comment: >> No test was added under this bug due to the relative complexities involved for reliable tests using instrumentation APIs. >> There should indeed be follow-up work to better test the interaction of the instrumentation APIs and JFR, some of it is tracked under https://bugs.openjdk.java.net/browse/JDK-8226779 . >> >> Thanks >> Markus > From markus.gronlund at oracle.com Wed Jul 3 08:43:38 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Wed, 3 Jul 2019 01:43:38 -0700 (PDT) Subject: [13] RFR(L): 8214542: JFR: Old Object Sample event slow on a deep heap in debug builds In-Reply-To: <4903FDD1-A734-47F4-AF6B-AA1E8FA2CDA8@oracle.com> References: <0d9b2624-a0ef-48fa-aae2-b495bd326426@default> <395644CF-307B-4F15-A7C4-61F6F9532F5C@oracle.com> <2f456637-ebd3-444b-bdb8-5e0f4dd67399@default> <4903FDD1-A734-47F4-AF6B-AA1E8FA2CDA8@oracle.com> Message-ID: <70d4c795-530b-473e-b707-9d30b270e27c@default> Thank you Robin for the review. I will update the spelling. Thanks Markus -----Original Message----- From: Robin Westberg Sent: den 2 juli 2019 17:20 To: Markus Gronlund Cc: Erik Gahlin ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: [13] RFR(L): 8214542: JFR: Old Object Sample event slow on a deep heap in debug builds Hi Markus, Looks good to me! Very minor nits: src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp 131 s/succesful/successful/ src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp 441 S/Exlusive/Exclusive/ Best regards, Robin > On 2 Jul 2019, at 15:08, Markus Gronlund wrote: > > Thank you Erik for the review. > > Can I please ask for a second reviewer? > > Thanks > Markus > > -----Original Message----- > From: Erik Gahlin > Sent: den 1 juli 2019 16:49 > To: Markus Gronlund > Cc: hotspot-jfr-dev at openjdk.java.net; > hotspot-runtime-dev at openjdk.java.net > Subject: Re: [13] RFR(L): 8214542: JFR: Old Object Sample event slow > on a deep heap in debug builds > > Looks good. 
> > Erik > >> On 28 Jun 2019, at 17:56, Markus Gronlund wrote: >> >> Greetings, >> >> Kindly asking for reviews for the following changeset: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8214542 >> Webrev: http://cr.openjdk.java.net/~mgronlun/8214542/webrev01/ >> Testing: test/jdk/:jdk_jfr >> >> Summary: >> Removal of heavy current_frontier assertion. >> Trimmed down storage requirements by only storing necessary contextual edges. >> Normalized representation of edges for better reuse. >> Better preservation of already persisted tree structure by less intrusive reuse logic. >> Caching. >> >> Thank you in advance >> Markus > From bob.vandette at oracle.com Wed Jul 3 14:49:30 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Wed, 3 Jul 2019 10:49:30 -0400 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> Message-ID: Very nice addition to ensuring support for popular docker use cases. A few comments on the TestJcmdWithSideCar.java 1. Shouldn't you use @requires to only run this test on Linux x64? CanTestDocker should protect us but your test wouldn't run on windows if we added docker support there. 2. Why is this repeated? 149 "--pid=container:" + MAIN_CONTAINER_NAME, 150 "--pid=container:" + MAIN_CONTAINER_NAME, 3. I'm a little concerned about the built in fixed delays especially the startMainContainer one. We don't want any intermittent test failures. Could you maybe add a DockerThread.checkIsAlive function and call that every second for 20 seconds and then give up? What tier are you adding this test to? Thanks, Bob. > On Jul 2, 2019, at 6:24 PM, mikhailo.seledtsov at oracle.com wrote: > > Please review this new test that uses a Docker sidecar pattern to manage/monitor JVM running in the main payload container. > > Sidecar is a common pattern used in the cloud environments for monitoring among other uses.
In side car pattern the main application/service container that runs the payload is paired with a sidecar container. It is achieved by sharing certain namespace aspects between the two containers such as PID namespace, specific sub-directories, IPC and more. > > This test implements the following cases: > - "jcmd -l" to list java processes running in "main" container from the "sidecar" container > - "jhsdb jinfo" in the sidecar configuration > - jcmd > > This change also builds a basis for more test cases in the future. > > Minor changes were done to DockerTestUtils: > - changing access to DOCKER_COMMAND constant to public > - minor spelling and terminology corrections > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 > Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ > Testing: > 1. ran Docker tests on Linux-x64 - PASS > 2. Running Docker tests in test cluster - in progress > > > Thank you, > Misha > From thomas.stuefe at gmail.com Wed Jul 3 18:03:04 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 3 Jul 2019 20:03:04 +0200 Subject: "Nestegg" buffer for error reporting in native oom scenarios Message-ID: Hi,, I carry this tiny patch around since quite a while which makes error handling more stable in native OOM situations. I usually apply it when dealing with memory leaks, to increase the chance of useful error reports. A number of error reporting things require memory. When we are out of memory, those steps may fail. A prominent example is NMT: when creating a detailed report, it allocates memory. In OOM scenarios NMT will not work because of this, which is a pity since this is exactly the time where having an NMT report would be super useful. A clean solution would be to harden everything running inside error handling to work with pre-allocated buffers instead, or to not alloc memory at all. But that is difficult or even impossible.
What I do instead is to allocate memory at VM startup and to release it back into the clib when a native OOM happens (of course, only when the switch is set). This is of course no guarantee that this works - code running concurrently may gobble the memory up the moment I release it, for instance - but it works surprisingly often, and in a number of cases helped me e.g. to get a detailed NMT report where otherwise I would have gotten nothing. Patch: http://cr.openjdk.java.net/~stuefe/webrevs/nestegg/webrev.00/webrev/index.html What do you think? Too stupid or weird? We can talk of course about the naming :) I am not especially proud of that hack, but as a technique, it is at least dead simple and reasonably successful. Thanks, Thomas From thomas.stuefe at gmail.com Wed Jul 3 18:10:02 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 3 Jul 2019 20:10:02 +0200 Subject: "Nestegg" buffer for error reporting in native oom scenarios In-Reply-To: References: Message-ID: p.s. this proposal is the simplest variant of the "pre-reserve a chunk of memory for bad times" theme. Alternative implementations could be to e.g. switch os::malloc() to a preallocated buffer on OOM, but that would require implementing some sort of free(), so one needs some sort of allocator atop of that buffer, however primitive. All doable, my proposal is just simpler. Cheers, Thomas On Wed, Jul 3, 2019 at 8:03 PM Thomas Stüfe wrote: > Hi, > > I have been carrying this tiny patch around for quite a while; it makes error > handling more stable in native OOM situations. I usually apply it when > dealing with memory leaks, to increase the chance of useful error reports. > > A number of error reporting things require memory. When we are out of > memory, those steps may fail. > > A prominent example is NMT: when creating a detailed report, it allocates > memory.
In OOM scenarios NMT will not work because of this, which is a pity > since this is exactly the time where having an NMT report would be super > useful. > > A clean solution would be to harden everything running inside error > handling to work with pre-allocated buffers instead, or to not alloc memory > at all. But that is difficult or even impossible. > > What I do instead is to allocate memory at VM startup and to release it > back into the clib when a native OOM happens (of course, only when the > switch is set). > > This is of course no guarantee that this works - code running concurrently > may gobble the memory up the moment I release it, for instance - but it > works surprisingly often, and in a number of cases helped me e.g. to get a > detailed NMT report where otherwise I would have gotten nothing. > > Patch: > > > http://cr.openjdk.java.net/~stuefe/webrevs/nestegg/webrev.00/webrev/index.html > > What do you think? Too stupid or weird? We can talk of course about the > naming :) > > I am not especially proud of that hack, but as a technique, it is at least > dead simple and reasonably successful.
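For illustration, here is a minimal sketch of the reserve-and-release idea, assuming only C++11 atomics and plain malloc/free. The names (nestegg::reserve, nestegg::release) are hypothetical, and this is not the code in the webrev. The atomic exchange ensures that only one thread frees the buffer if several threads hit OOM at once. As noted above, nothing stops another thread from grabbing the released memory first, so this only improves the odds of getting a complete report. Whether freeing the block actually helps also depends on how the allocator handles it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace nestegg {  // hypothetical namespace, not taken from the webrev

// Memory held back since startup; null when nothing is reserved.
static std::atomic<void*> g_reserve{nullptr};

// Call once at VM startup, and only when the (hypothetical) switch is set.
// Returns false if the reserve itself could not be allocated.
bool reserve(std::size_t size) {
  void* p = std::malloc(size);
  if (p == nullptr) {
    return false;
  }
  void* expected = nullptr;
  if (!g_reserve.compare_exchange_strong(expected, p)) {
    std::free(p);  // a reserve already exists; keep the original one
  }
  return true;
}

// Call first thing on the native-OOM error path, before e.g. NMT reporting.
// exchange() hands the pointer to exactly one caller, so concurrent OOMs
// cannot free the buffer twice. Returns true if memory went back to libc.
bool release() {
  void* p = g_reserve.exchange(nullptr);
  if (p == nullptr) {
    return false;
  }
  std::free(p);
  return true;
}

}  // namespace nestegg
```

A second call to release() is a harmless no-op, which matters because error reporting can be re-entered.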
> > Thanks, Thomas > > > > From jianglizhou at google.com Wed Jul 3 18:22:53 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 3 Jul 2019 11:22:53 -0700 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> Message-ID: Hi David, On Wed, Jul 3, 2019 at 12:00 AM David Holmes wrote: > > Hi Jiangli, > > On 2/07/2019 8:33 am, Jiangli Zhou wrote: > > Hi David, > > > > On Mon, Jul 1, 2019 at 1:22 AM David Holmes wrote: > >> > >> Hi Jiangli, > >> > >> On 29/06/2019 9:42 am, Jiangli Zhou wrote: > >>> Hi David, > >>> > >>> Thanks for the detailed comments! Here is the latest webrev: > >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing > >>> for not including an incremental webrev (realized that when I almost > >>> done edits). > >> > >> That all looks fine - thanks for making the changes. > >> > >> However ... now that I see the logging output it occurred to me that > >> checking for the TLS adjustment is something that should only happen > >> once and we should be storing the adjustment amount in a static for > >> direct use. Sorry I didn't think about this earlier. Something like: > > > > No problem at all! I actually thought in the same direction as well > > when making the change initially but didn't go with it as I had some > > concerns. Florian's latest reply is reassuring (thanks again!). So > > here is the update: > > http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/. > > The incremental change looks fine to me. 
To address Florian's concern I > suggest adding an additional comment: > > // Returns the size of the static TLS area glibc puts on thread stacks. > + // The value is cached on first use, which occurs when the first thread > + // is created during VM initialization. > static size_t get_static_tls_area_size(const pthread_attr_t *attr) { Done. > > > Since we are settling down on the approach and final implementation > > details, it would be a good idea to get the CSR ball rolling. Could > > you please review and I'll finalize the CSR. Thanks! > > Done. Thank you! As you, Florian, Thomas all made great contributions to this workaround, I should list all of you as both contributors and reviewers in the changeset. If there is any objection, please let me know. Best regards, Jiangli > > Thanks, > David > > > Best regards, > > Jiangli > > > >> > >> static size_t tls_size = 0; > >> static bool tls_size_inited = false; > >> > >> static size_t get_static_tls_area_size(const pthread_attr_t *attr) { > >> + if (!tls_size_inited) { > >> + tls_size_inited = true; > >> if (_get_minstack_func != NULL) { > >> size_t minstack_size = _get_minstack_func(attr); > >> ... > >> if (minstack_size > (size_t)os::vm_page_size() + > >> PTHREAD_STACK_MIN) { > >> tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN; > >> } > >> } > >> + } > >> log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT, > >> tls_size); > >> return tls_size; > >> } > >> > >> Or even fold it all into get_minstack_init() ? > >> > >> I'm assuming that the result of __pthread_get_minstack won't change over > >> time of course. > >> > >> Thanks, > >> David > >> ----- > >> > >>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes wrote: > >>>> > >>>> Hi Jiangli, > >>>> > >>>> This is very well written up - thanks. > >>>> > >>>> I apologize in advance that I'm about to be traveling for a few days so > >>>> won't be able to respond further until next week. > >>> > >>> Hope you have a relaxed and safe trip.
This can wait. > >>> > >>>> > >>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote: > >>>>> Updated webrev: > >>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/ > >>>> > >>>> Overall changes look good. I also have a concern around this code: > >>>> > >>>> + tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN; > >>>> + assert(tls_size > 0, "unexpected size"); > >>>> > >>>> In addition to Thomas's comments re: signedness, can't the result == 0 if > >>>> there is no static TLS in use? Or is there always some static TLS in use? > >>> > >>> On both glibc 2.24 and 2.28, by default I see there is one page of memory > >>> for static TLS, without explicitly defining any __thread variables in > >>> user code. > >>> > >>>> > >>>> A few other comments/requests: > >>>> > >>>> 822 static void get_minstack_init() { > >>>> 823 _get_minstack_func = > >>>> 824 (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack"); > >>>> 825 } > >>>> > >>>> Can you add a logging statement please: > >>>> > >>>> log(os, thread)("Lookup of __pthread_get_minstack %s", > >>>> _get_minstack_func == NULL ? "failed" : "succeeded"); > >>> > >>> Added log info.
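To make the arithmetic and David's compute-once suggestion concrete, here is a hedged sketch rather than the patch itself. minstack_size is passed in as a parameter. In the real code it comes from glibc's private __pthread_get_minstack, which is looked up with dlsym. The subtraction is guarded so that it returns 0 instead of wrapping around, since a size_t can never fail an assert(tls_size > 0) in the way a signed value would. The caching wrapper makes the same assumption as the proposed comment: the first call happens on a single thread during VM initialization.

```cpp
#include <cassert>
#include <cstddef>

// The part of glibc's minimum stack that is neither the guard page nor
// PTHREAD_STACK_MIN is treated as static TLS. Returns 0 rather than
// underflowing when minstack_size is not larger than the other two combined.
std::size_t tls_adjustment(std::size_t minstack_size,
                           std::size_t page_size,
                           std::size_t pthread_stack_min) {
  if (minstack_size > page_size + pthread_stack_min) {
    return minstack_size - page_size - pthread_stack_min;
  }
  return 0;
}

// Compute once and reuse, in the spirit of the suggested tls_size_inited flag.
// Not safe for concurrent first calls; later arguments are ignored.
std::size_t cached_tls_adjustment(std::size_t minstack_size,
                                  std::size_t page_size,
                                  std::size_t pthread_stack_min) {
  static std::size_t tls_size = 0;
  static bool tls_size_inited = false;
  if (!tls_size_inited) {
    tls_size_inited = true;
    tls_size = tls_adjustment(minstack_size, page_size, pthread_stack_min);
  }
  return tls_size;
}
```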
> >>> > >>>> > >>>> -- > >>>> > >>>> 5193 get_minstack_init(); > >>>> > >>>> Can you make this conditional on AdjustStackSizeForTLS please so there > >>>> is no affect when not using the flag - thanks. > >>> > >>> Done. > >>> > >>>> > >>>> -- > >>>> > >>>> 855 } > >>>> 856 return tls_size; > >>>> > >>>> Can you insert a logging statement: > >>>> > >>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT, > >>>> tls_size); > >>> > >>> Added. > >>> > >>>> > >>>> --- > >>>> > >>>> test/hotspot/jtreg/runtime/TLS/T.java > >>>> > >>>> 37 // Starting a ProcessBuilder causes the process reaper > >>>> thread being > >>>> > >>>> s/being/to be/ > >>> > >>> Fixed. > >>> > >>>> > >>>> 43 // failure mode the VM fails to create thread with > >>>> error message > >>>> > >>>> s/create thread/create a thread/ > >>> > >>> Fixed. > >>> > >>>> > >>>> 53 System.out.println("Unexpected Echo output: " + > >>>> echoOutput + > >>>> 54 ", expects: " + echoInput); > >>>> > >>>> should this be an exception so that test fails? I can't imagine how echo > >>>> would fail but probably better to fail the test if something unexpected > >>>> happens. > >>> > >>> If no expected output is obtained from echo (due to ProcessBuilder > >>> failure caused by TLS issue), the test does fail and reports to the > >>> caller (returns false). If we throw an explicit expectation, it will > >>> be caught by the outer try/catch, which seems to be unnecessary. > >>> > >>>> > >>>> 66 try { > >>>> 67 br = new BufferedReader(new > >>>> InputStreamReader(inputStream)); > >>>> 68 s = br.readLine(); > >>>> 69 } finally { > >>>> 70 br.close(); > >>>> 71 } > >>>> > >>>> This could use try-with-resources for both streams. > >>> > >>> Sounds good. Done. > >>> > >>>> > >>>> --- > >>>> > >>>> exestack-tls.c > >>>> > >>>> It's simpler if no argument means no-tls and an argument means tls. 
> >>>> > >>>> 42 char classpath[4096]; > >>>> 43 snprintf(classpath, sizeof classpath, > >>>> 44 "-Djava.class.path=%s", getenv("CLASSPATH")); > >>>> 45 options[0].optionString = classpath; > >>>> > >>>> Do we need to explicitly set the classpath? I'm concerned that our test > >>>> environment uses really, really long paths and a number of them. > >>>> (Probably not 4096 but still ...) > >>> > >>> The classpath needs to be set so we know where to load the test class. > >>> Given that we use 4096 for the same type of usage in other existing > >>> test(s) (for example StackGap), it probably is okay for our test > >>> environments? I can increase the array size if we want to be extra > >>> safe ... > >>> > >>>> > >>>> > >>>> test/hotspot/jtreg/runtime/TLS/testtls.sh > >>>> > >>>> 40 if [ "${VM_OS}" != "linux" ] > >>>> 41 then > >>>> 42 echo "Test is only valid for Linux" > >>>> 43 exit 0 > >>>> 44 fi > >>>> > >>>> This should be done via "@requires os.family != Linux" > >>> > >>> Done. > >>> > >>> Thanks! > >>> > >>> Best regards, > >>> Jiangli > >>>> > >>>> Thanks, > >>>> David > >>>> ----- > >>>> > >>>>> Thanks for everyone's contribution on carving out the current workaround! > >>>>> > >>>>> Best regards, > >>>>> Jiangli > >>>>> > >>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou wrote: > >>>>>> > >>>>>> Thank you Thomas and David! Glad to see that we are converging on an > >>>>>> acceptable approach here. I'll try to factor in all the latest inputs > >>>>>> from everyone and send out a new update. > >>>>>> > >>>>>> Thanks and best regards, > >>>>>> Jiangli > >>>>>> > >>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes wrote: > >>>>>>> > >>>>>>> Trimming .... 
> >>>>>>> > >>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote: > >>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer wrote: > >>>>>>>>> I think you can handle the guard size in this way: > >>>>>>>>> > >>>>>>>>> pthread_attr_setguardsize(&attr, guard_size); > >>>>>>>>> > >>>>>>>>> size_t stack_adjust_size = 0; > >>>>>>>>> if (AdjustStackSizeForTLS) { > >>>>>>>>> size_t minstack_size = get_minstack(&attr); > >>>>>>>>> size_t tls_size = minstack_size - vm_page_size() - PTHREAD_STACK_MIN; > >>>>>>>>> // In glibc before 2.27, tls_size still includes guard_size. > >>>>>>>>> // In glibc 2.27 and later, guard_size is automatically > >>>>>>>>> // added to the stack size by pthread_create. > >>>>>>>>> // In both cases, the guard size is taken into account. > >>>>>>>>> stack_adjust_size += tls_size; > >>>>>>>>> } else { > >>>>>>>>> stack_adjust_size += guard_size; > >>>>>>>>> } > >>>>>>>> > >>>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as others > >>>>>>>> are okay with the above suggested adjustment, it looks good to me. > >>>>>>>> Thomas, David and others, any objection? > >>>>>>> > >>>>>>> I find the above acceptable. I've been waiting for the dust to settle. 
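The branch in Florian's snippet can be reduced to a small pure function. The sketch below is hypothetical: the function names do not come from the patch. It takes tls_size and guard_size as inputs, so the logic can be exercised without glibc internals. The glibc-version comments restate Florian's description and have not been verified independently.

```cpp
#include <cassert>
#include <cstddef>

// Extra stack to request on top of the VM's own stack size, following the
// branch structure of the proposed snippet. tls_size is assumed to have been
// computed already (minstack - page size - PTHREAD_STACK_MIN).
std::size_t stack_adjust_size(bool adjust_for_tls, std::size_t tls_size,
                              std::size_t guard_size) {
  if (adjust_for_tls) {
    // Per the thread: before glibc 2.27 tls_size already includes the guard
    // size; from 2.27 on, pthread_create adds the guard size itself.
    // Either way, adding guard_size here would count it twice.
    return tls_size;
  }
  // Without the TLS adjustment, only the guard area is compensated for.
  return guard_size;
}

// Hypothetical helper: the value that would be handed to
// pthread_attr_setstacksize().
std::size_t adjusted_stack_size(std::size_t requested, bool adjust_for_tls,
                                std::size_t tls_size, std::size_t guard_size) {
  return requested + stack_adjust_size(adjust_for_tls, tls_size, guard_size);
}
```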
> >>>>>>> > >>>>>>> Thanks, > >>>>>>> David > >>>>>>> > >>>>>>>> Thanks and best regards, > >>>>>>>> Jiangli > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Florian From thomas.stuefe at gmail.com Wed Jul 3 20:09:16 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 3 Jul 2019 22:09:16 +0200 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> Message-ID: Hi Jiangli, On Wed 3. Jul 2019 at 20:23, Jiangli Zhou wrote: > Hi David, > > On Wed, Jul 3, 2019 at 12:00 AM David Holmes > wrote: > > > > Hi Jiangli, > > > > On 2/07/2019 8:33 am, Jiangli Zhou wrote: > > > Hi David, > > > > > > On Mon, Jul 1, 2019 at 1:22 AM David Holmes > wrote: > > >> > > >> Hi Jiangli, > > >> > > >> On 29/06/2019 9:42 am, Jiangli Zhou wrote: > > >>> Hi David, > > >>> > > >>> Thanks for the detailed comments! Here is the latest webrev: > > >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing > > >>> for not including an incremental webrev (realized that when I almost > > >>> done edits). > > >> > > >> That all looks fine - thanks for making the changes. > > >> > > >> However ... now that I see the logging output it occurred to me that > > >> checking for the TLS adjustment is something that should only happen > > >> once and we should be storing the adjustment amount in a static for > > >> direct use. Sorry I didn't think about this earlier. Something like: > > > > > > No problem at all! I actually thought in the same direction as well > > > when making the change initially but didn't go with it as I had some > > > concerns. 
Florian's latest reply is reassuring (thanks again!). So > > > here is the update: > > > http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/. > > > > The incremental change looks fine to me. To address Florian's concern I > > suggest adding an additional comment: > > > > // Returns the size of the static TLS area glibc puts on thread > stacks. > > + // The value is cached on first use, which occurs when the first thread > > + // is created during VM initialization. > > static size_t get_static_tls_area_size(const pthread_attr_t *attr) { > > > Done. > > > > > > Since we are settling down on the approach and final implementation > > > details, it would be a good idea to get the CSR ball rolling. Could > > > you please review and I'll finalize the CSR. Thanks! > > > > Done. > > Thank you! > > As you, Florian, Thomas all made great contributions to this > workaround, I should list all of you as both contributors and > reviewers in the changeset. If there is any objection, please let me > know. > Appreciated, but not for me. May make sense for Florian if he intends to get author status at some point. Thanks for your perseverance. Cheers, Thomas > Best regards, > Jiangli > > > > > Thanks, > > David > > > > > Best regards, > > > Jiangli > > > > > >> > > >> static size_t tls_size = 0; > > >> static bool tls_size_inited = false; > > >> > > >> static size_t get_static_tls_area_size(const pthread_attr_t *attr) { > > >> + if (!tls_size_inited) { > > >> + tls_size_inited = true; > > >> if (_get_minstack_func != NULL) { > > >> size_t minstack_size = _get_minstack_func(attr); > > >> ... > > >> if (minstack_size > (size_t)os::vm_page_size() + > > >> PTHREAD_STACK_MIN) { > > >> tls_size = minstack_size - os::vm_page_size() - > PTHREAD_STACK_MIN; > > >> } > > >> } > > >> + } > > >> log_info(os, thread)("Stack size adjustment for TLS is " > SIZE_FORMAT, > > >> tls_size); > > >> return tls_size; > > >> } > > >> > > >> Or even fold it all into get_minstack_init() ? 
> > >> > > >> I'm assuming that the result of __pthread_get_minstack won't change > over > > >> time of course. > > >> > > >> Thanks, > > >> David > > >> ----- > > >> > > >>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes < > david.holmes at oracle.com> wrote: > > >>>> > > >>>> Hi Jiangli, > > >>>> > > >>>> This is very well written up - thanks. > > >>>> > > >>>> I apologize in advance that I'm about to be traveling for a few days so > > >>>> won't be able to respond further until next week. > > >>> > > >>> Hope you have a relaxed and safe trip. This can wait. > > >>> > > >>>> > > >>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote: > > >>>>> Updated webrev: > > >>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/ > > >>>> > > >>>> Overall changes look good. I also have a concern around this code: > > >>>> > > >>>> + tls_size = minstack_size - os::vm_page_size() - > PTHREAD_STACK_MIN; > > >>>> + assert(tls_size > 0, "unexpected size"); > > >>>> > > >>>> In addition to Thomas's comments re: signedness, can't the result == > 0 if > > >>>> there is no static TLS in use? Or is there always some static TLS > in use? > > >>> > > >>> On both glibc 2.24 and 2.28, by default I see there is one page of > memory > > >>> for static TLS, without explicitly defining any __thread variables in > > >>> user code. > > >>> > > >>>> > > >>>> A few other comments/requests: > > >>>> > > >>>> 822 static void get_minstack_init() { > > >>>> 823 _get_minstack_func = > > >>>> 824 (GetMinStack)dlsym(RTLD_DEFAULT, > "__pthread_get_minstack"); > > >>>> 825 } > > >>>> > > >>>> Can you add a logging statement please: > > >>>> > > >>>> log(os, thread)("Lookup of __pthread_get_minstack %s", > > >>>> _get_minstack_func == NULL ? "failed" : > "succeeded"); > > >>> > > >>> Added log info.
> > >>> > > >>>> > > >>>> -- > > >>>> > > >>>> 884 // In the Linux NPTL pthread implementation the guard > size mechanism > > >>>> > > >>>> Now that we have additional information on this could you update > this > > >>>> old comment to say > > >>>> > > >>>> // In glibc versions prior to 2.7 the guard size mechanism > > >>> > > >>> Done. > > >>> > > >>>> > > >>>> -- > > >>>> > > >>>> 897 // Adjust the stack_size by adding the on-stack TLS > size if > > >>>> 898 // AdjustStackSizeForTLS is true. The guard size is > already > > >>>> 899 // accounted in this case, please see comments in > > >>>> 900 // get_static_tls_area_size(). > > >>>> > > >>>> Given the extensive commentary in get_static_tls_area_size() can we > be > > >>>> more brief here and just say: > > >>>> > > >>>> // Adjust the stack size for on-stack TLS - see > get_static_tls_area_size(). > > >>> > > >>> Done. > > >>> > > >>>> > > >>>> -- > > >>>> > > >>>> 5193 get_minstack_init(); > > >>>> > > >>>> Can you make this conditional on AdjustStackSizeForTLS please so > there > > >>>> is no effect when not using the flag - thanks. > > >>> > > >>> Done. > > >>> > > >>>> > > >>>> -- > > >>>> > > >>>> 855 } > > >>>> 856 return tls_size; > > >>>> > > >>>> Can you insert a logging statement: > > >>>> > > >>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT, > > >>>> tls_size); > > >>> > > >>> Added. > > >>> > > >>>> > > >>>> --- > > >>>> > > >>>> test/hotspot/jtreg/runtime/TLS/T.java > > >>>> > > >>>> 37 // Starting a ProcessBuilder causes the process > reaper > > >>>> thread being > > >>>> > > >>>> s/being/to be/ > > >>> > > >>> Fixed. > > >>> > > >>>> > > >>>> 43 // failure mode the VM fails to create thread > with > > >>>> error message > > >>>> > > >>>> s/create thread/create a thread/ > > >>> > > >>> Fixed.
> > >>> > > >>>> > > >>>> 53 System.out.println("Unexpected Echo output: > " + > > >>>> echoOutput + > > >>>> 54 ", expects: " + > echoInput); > > >>>> > > >>>> should this be an exception so that test fails? I can't imagine how > echo > > >>>> would fail but probably better to fail the test if something > unexpected > > >>>> happens. > > >>> > > >>> If no expected output is obtained from echo (due to ProcessBuilder > > >>> failure caused by TLS issue), the test does fail and reports to the > > >>> caller (returns false). If we throw an explicit exception, it will > > >>> be caught by the outer try/catch, which seems to be unnecessary. > > >>> > > >>>> > > >>>> 66 try { > > >>>> 67 br = new BufferedReader(new > > >>>> InputStreamReader(inputStream)); > > >>>> 68 s = br.readLine(); > > >>>> 69 } finally { > > >>>> 70 br.close(); > > >>>> 71 } > > >>>> > > >>>> This could use try-with-resources for both streams. > > >>> > > >>> Sounds good. Done. > > >>> > > >>>> > > >>>> --- > > >>>> > > >>>> exestack-tls.c > > >>>> > > >>>> It's simpler if no argument means no-tls and an argument means tls. > > >>>> > > >>>> 42 char classpath[4096]; > > >>>> 43 snprintf(classpath, sizeof classpath, > > >>>> 44 "-Djava.class.path=%s", getenv("CLASSPATH")); > > >>>> 45 options[0].optionString = classpath; > > >>>> > > >>>> Do we need to explicitly set the classpath? I'm concerned that our > test > > >>>> environment uses really, really long paths and a number of them. > > >>>> (Probably not 4096 but still ...) > > >>> > > >>> The classpath needs to be set so we know where to load the test > class. > > >>> Given that we use 4096 for the same type of usage in other existing > > >>> test(s) (for example StackGap), it probably is okay for our test > > >>> environments? I can increase the array size if we want to be extra > > >>> safe ...
> > >>> > > >>>> > > >>>> > > >>>> test/hotspot/jtreg/runtime/TLS/testtls.sh > > >>>> > > >>>> 40 if [ "${VM_OS}" != "linux" ] > > >>>> 41 then > > >>>> 42 echo "Test is only valid for Linux" > > >>>> 43 exit 0 > > >>>> 44 fi > > >>>> > > >>>> This should be done via "@requires os.family != Linux" > > >>> > > >>> Done. > > >>> > > >>> Thanks! > > >>> > > >>> Best regards, > > >>> Jiangli > > >>>> > > >>>> Thanks, > > >>>> David > > >>>> ----- > > >>>> > > >>>>> Thanks for everyone's contribution on carving out the current > workaround! > > >>>>> > > >>>>> Best regards, > > >>>>> Jiangli > > >>>>> > > >>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou < > jianglizhou at google.com> wrote: > > >>>>>> > > >>>>>> Thank you Thomas and David! Glad to see that we are converging on > an > > >>>>>> acceptable approach here. I'll try to factor in all the latest > inputs > > >>>>>> from everyone and send out a new update. > > >>>>>> > > >>>>>> Thanks and best regards, > > >>>>>> Jiangli > > >>>>>> > > >>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes < > david.holmes at oracle.com> wrote: > > >>>>>>> > > >>>>>>> Trimming .... > > >>>>>>> > > >>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote: > > >>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer < > fweimer at redhat.com> wrote: > > >>>>>>>>> I think you can handle the guard size in this way: > > >>>>>>>>> > > >>>>>>>>> pthread_attr_setguardsize(&attr, guard_size); > > >>>>>>>>> > > >>>>>>>>> size_t stack_adjust_size = 0; > > >>>>>>>>> if (AdjustStackSizeForTLS) { > > >>>>>>>>> size_t minstack_size = get_minstack(&attr); > > >>>>>>>>> size_t tls_size = minstack_size - vm_page_size() - > PTHREAD_STACK_MIN; > > >>>>>>>>> // In glibc before 2.27, tls_size still includes > guard_size. > > >>>>>>>>> // In glibc 2.27 and later, guard_size is automatically > > >>>>>>>>> // added to the stack size by pthread_create. > > >>>>>>>>> // In both cases, the guard size is taken into account. 
> > >>>>>>>>> stack_adjust_size += tls_size; > > >>>>>>>>> } else { > > >>>>>>>>> stack_adjust_size += guard_size; > > >>>>>>>>> } > > >>>>>>>> > > >>>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as > others > > >>>>>>>> are okay with the above suggested adjustment, it looks good to > me. > > >>>>>>>> Thomas, David and others, any objection? > > >>>>>>> > > >>>>>>> I find the above acceptable. I've been waiting for the dust to > settle. > > >>>>>>> > > >>>>>>> Thanks, > > >>>>>>> David > > >>>>>>> > > >>>>>>>> Thanks and best regards, > > >>>>>>>> Jiangli > > >>>>>>>>> > > >>>>>>>>> Thanks, > > >>>>>>>>> Florian > From david.holmes at oracle.com Wed Jul 3 23:58:22 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 09:58:22 +1000 Subject: "Nestegg" buffer for error reporting in native oom scenarios In-Reply-To: References: Message-ID: <076a7999-ea01-34b5-df2c-d25a7f6bac8e@oracle.com> Hi Thomas, On 4/07/2019 4:03 am, Thomas Stüfe wrote: > Hi, > > I have been carrying this tiny patch around for quite a while; it makes error > handling more stable in native OOM situations. I usually apply it when > dealing with memory leaks, to increase the chance of useful error reports. > > A number of error reporting things require memory. When we are out of > memory, those steps may fail. > > A prominent example is NMT: when creating a detailed report, it allocates > memory. In OOM scenarios NMT will not work because of this, which is a pity > since this is exactly the time where having an NMT report would be super > useful. > > A clean solution would be to harden everything running inside error > handling to work with pre-allocated buffers instead, or to not alloc memory > at all. But that is difficult or even impossible. > > What I do instead is to allocate memory at VM startup and to release it > back into the clib when a native OOM happens (of course, only when the > switch is set).
> > This is of course no guarantee that this works - code running concurrently > may gobble the memory up the moment I release it, for instance - but it > works surprisingly often, and in a number of cases helped me e.g. to get a > detailed NMT report where otherwise I would have gotten nothing. Basic approach seems okay, but I don't like some of the details, i.e. the need to expose this in the OS API. But I'm into my last 2 days before 2 weeks of vacation and don't have time to get into it at this stage - sorry. (BTW a lot of folk are on vacation, or going on vacation this month.) Also pet peeve: p_foo - we don't use Hungarian notation in the VM so the p_ should be dropped. ;-) Cheers, David ----- > Patch: > > http://cr.openjdk.java.net/~stuefe/webrevs/nestegg/webrev.00/webrev/index.html > > What do you think? Too stupid or weird? We can talk of course about the > naming :) > > I am not especially proud of that hack, but as a technique, it is at least > dead simple and reasonably successful. > > Thanks, Thomas > From calvin.cheung at oracle.com Thu Jul 4 00:59:00 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 3 Jul 2019 17:59:00 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive Message-ID: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ This bug was found during a bootcycle build when a shared archive built by a 64-bit JDK version is used by a 32-bit JDK version. It is due to some of the important header fields, such as _jvm_ident, not being checked prior to accessing other fields such as _paths_misc_info_size. This fix involves checking most of the fields in CDSFileMapHeaderBase before accessing other fields. Testing: tiers 1-3. thanks, Calvin From daniel.daugherty at oracle.com Thu Jul 4 02:04:59 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Wed, 3 Jul 2019 22:04:59 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH Message-ID: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> Greetings, Robbin recently discovered this issue with Thread Local Handshakes. Since he's not available at the moment, I'm handling the issue: JDK-8227117 normal interpreter table is not restored after single stepping with TLH https://bugs.openjdk.java.net/browse/JDK-8227117 When using Thread Local Handshakes, the normal interpreter table is not restored after single stepping. This issue is caused by the VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to restore the normal interpreter table for the "off" case. Prior to Thread Local Handshakes, this was a valid assumption to make. SafepointSynchronize::end() has been refactored into disarm_safepoint() and it only calls Interpreter::ignore_safepoints() on the global safepoint branch. That matches up with the call to Interpreter::notice_safepoints() that is also on the global safepoint branch. The solution is for the VM_ChangeSingleStep VM-op for the "off" case to call Interpreter::ignore_safepoints() directly. Here's the webrev URL: http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ The fix is just a small addition to VM_ChangeSingleStep::doit(): if (_on) { Interpreter::notice_safepoints(); + } else { + Interpreter::ignore_safepoints(); } Everything else is just new logging support for future debugging of interpreter table management and single stepping. Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. Mach5 Tier[4-6] on standard Oracle platforms is running now. Thanks, in advance, for questions, comments or suggestions.
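The toy model below shows the before and after behavior of the "off" case in concrete terms. It tracks nothing except which dispatch table is active. Table, g_active and the function names are hypothetical stand-ins for Interpreter::notice_safepoints(), Interpreter::ignore_safepoints() and VM_ChangeSingleStep::doit(). The model leaves out safepoint and handshake mechanics entirely.

```cpp
#include <cassert>

// Toy model: which interpreter dispatch table is currently active.
enum class Table { Normal, Safepoint };
static Table g_active = Table::Normal;

void notice_safepoints() { g_active = Table::Safepoint; }
void ignore_safepoints() { g_active = Table::Normal; }

// Before the fix: the "off" case did nothing here and relied on the end of
// the safepoint to restore the normal table. With thread-local handshakes
// that restore only happens on the global safepoint branch, so in this model
// nothing ever switches the table back.
void change_single_step_before_fix(bool on) {
  if (on) {
    notice_safepoints();
  }
}

// After the fix: the "off" case restores the normal table explicitly.
void change_single_step(bool on) {
  if (on) {
    notice_safepoints();
  } else {
    ignore_safepoints();
  }
}
```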
Dan From daniil.x.titov at oracle.com Thu Jul 4 03:04:26 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 03 Jul 2019 20:04:26 -0700 Subject: RFR: 8170299: Debugger does not stop inside the low memory notifications code Message-ID: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com> Please review the change that fixes the problem with the debugger not stopping in the low memory notification code. The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view, which prevents the debugger from stopping in it. The fix introduces a new NotificationThread that is visible to the external view and offloads the ServiceThread from sending low memory and other notifications that could result in Java calls (GC and diagnostic commands notifications) by moving these activities into this new NotificationThread. Testing: Mach5 tier1, tier2 and tier3 tests succeeded. Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 Thanks! --Daniil From serguei.spitsyn at oracle.com Thu Jul 4 03:34:49 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jul 2019 20:34:49 -0700 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> Message-ID: <550c9f2f-66d5-4269-db67-ee2818b898c8@oracle.com> Hi Dan, Thank you and Robbin for discovering and fixing this! The fix looks good to me. It is nice to have this new logging. Should we fix this bug in 13 first, or do you consider it risky? Thanks, Serguei On 7/3/19 7:04 PM, Daniel D. Daugherty wrote: > Greetings, > > Robbin recently discovered this issue with Thread Local Handshakes. Since > he's not available at the moment, I'm handling the issue: > >
JDK-8227117 normal interpreter table is not restored after single > stepping with TLH > https://bugs.openjdk.java.net/browse/JDK-8227117 > > When using Thread Local Handshakes, the normal interpreter table is > not restored after single stepping. This issue is caused by the > VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to > restore the normal interpreter table for the "off" case. > > Prior to Thread Local Handshakes, this was a valid assumption to make. > SafepointSynchronize::end() has been refactored into > disarm_safepoint() and it only calls Interpreter::ignore_safepoints() > on the global safepoint branch. That matches up with the call to > Interpreter::notice_safepoints() that is also on the global safepoint > branch. > > The solution is for the VM_ChangeSingleStep VM-op for the "off" case > to call Interpreter::ignore_safepoints() directly. > > Here's the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ > > The fix is just a small addition to VM_ChangeSingleStep::doit(): > > if (_on) { > Interpreter::notice_safepoints(); > + } else { > + Interpreter::ignore_safepoints(); > } > > Everything else is just new logging support for future debugging of > interpreter table management and single stepping. > > Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. > Mach5 Tier[4-6] on standard Oracle platforms is running now. > > Thanks, in advance, for questions, comments or suggestions.
> > Dan > From serguei.spitsyn at oracle.com Thu Jul 4 04:02:36 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jul 2019 21:02:36 -0700 Subject: RFR: 8170299: Debugger does not stop inside the low memory notifications code In-Reply-To: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com> References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com> Message-ID: <30fa7fad-92c7-c6a2-9c73-84265cc02208@oracle.com> Hi Daniil, I've not finished my review but it looks good in general. A couple of quick comments. https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.hpp.html I wonder if this function is also needed: static bool is_notification_thread(Thread* thread); https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.cpp.html I wonder why this include statement is missing: #include "runtime/mutexLocker.hpp" Also, these have to be correctly ordered: 29 #include "runtime/notificationThread.hpp" 30 #include "services/lowMemoryDetector.hpp" 31 #include "services/gcNotifier.hpp" 32 #include "services/diagnosticArgument.hpp" 33 #include "services/diagnosticFramework.hpp" Thanks, Serguei On 7/3/19 8:04 PM, Daniil Titov wrote: > Please review the change that fixes the problem with the debugger not stopping in the low memory notification code. > > The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view, which prevents the debugger from stopping in it. > > The fix introduces a new NotificationThread that is visible to the external view and offloads the ServiceThread from sending low memory and other notifications that could result in Java calls (GC and diagnostic command notifications) by moving these activities into this new NotificationThread. > > Testing: Mach5 tier1, tier2 and tier3 tests succeeded.
> > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > Thanks! > --Daniil > > From david.holmes at oracle.com Thu Jul 4 06:47:36 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 16:47:36 +1000 Subject: RFR: 8170299: Debugger does not stop inside the low memory notifications code In-Reply-To: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com> References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com> Message-ID: <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com> Hi Daniil, On 4/07/2019 1:04 pm, Daniil Titov wrote: > Please review the change that fixes the problem with the debugger not stopping in the low memory notification code. > > The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view, which prevents the debugger from stopping in it. > > The fix introduces a new NotificationThread that is visible to the external view and offloads the ServiceThread from sending low memory and other notifications that could result in Java calls (GC and diagnostic command notifications) by moving these activities into this new NotificationThread. There is a long and unfortunate history with this bug. The original incarnation of this fix introduced a new thread at the Java library level, and I had some concerns about that: http://mail.openjdk.java.net/pipermail/serviceability-dev/2017-December/022612.html That effort was resurrected at: http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-July/024466.html and http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-August/024849.html but was left somewhat in limbo. There was a lot of doubt about the right way to fix this bug and whether introducing a new thread was too disruptive. But introducing a new thread in the VM also has the same set of concerns! This needs consideration by the runtime team before going ahead.
Introducing a new thread like this needs to be examined in detail - particularly the synchronization interactions with other threads. It also introduces another monitor designated safepoint-never at a time when we are in the process of cleaning up monitors so that JavaThreads will only use safepoint-check-always monitors. Unfortunately I'm about to head out for two weeks' vacation, and a number of other key runtime folk are also on vacation, but I'd ask that you hold off on this until we can look at it in more detail. Thanks, David ----- > Testing: Mach5 tier1, tier2 and tier3 tests succeeded. > > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > Thanks! > --Daniil > > From david.holmes at oracle.com Thu Jul 4 06:52:56 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 16:52:56 +1000 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: References: <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> Message-ID: <74d4d8b0-6e5f-d9b0-8500-530b42419496@oracle.com> On 4/07/2019 4:22 am, Jiangli Zhou wrote: > Hi David, > > On Wed, Jul 3, 2019 at 12:00 AM David Holmes wrote: >> >> Hi Jiangli, >> >> On 2/07/2019 8:33 am, Jiangli Zhou wrote: >>> Hi David, >>> >>> On Mon, Jul 1, 2019 at 1:22 AM David Holmes wrote: >>>> >>>> Hi Jiangli, >>>> >>>> On 29/06/2019 9:42 am, Jiangli Zhou wrote: >>>>> Hi David, >>>>> >>>>> Thanks for the detailed comments! Here is the latest webrev: >>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologies >>>>> for not including an incremental webrev (I realized that when I was almost >>>>> done with the edits).
>>>> >>>> That all looks fine - thanks for making the changes. >>>> >>>> However ... now that I see the logging output it occurred to me that >>>> checking for the TLS adjustment is something that should only happen >>>> once and we should be storing the adjustment amount in a static for >>>> direct use. Sorry I didn't think about this earlier. Something like: >>> >>> No problem at all! I actually thought in the same direction as well >>> when making the change initially but didn't go with it as I had some >>> concerns. Florian's latest reply is reassuring (thanks again!). So >>> here is the update: >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/. >> >> The incremental change looks fine to me. To address Florian's concern I >> suggest adding an additional comment: >> >> // Returns the size of the static TLS area glibc puts on thread stacks. >> + // The value is cached on first use, which occurs when the first thread >> + // is created during VM initialization. >> static size_t get_static_tls_area_size(const pthread_attr_t *attr) { > > > Done. > >> >>> Since we are settling down on the approach and final implementation >>> details, it would be a good idea to get the CSR ball rolling. Could >>> you please review and I'll finalize the CSR. Thanks! >> >> Done. > > Thank you! > > As you, Florian and Thomas all made great contributions to this > workaround, I should list all of you as both contributors and > reviewers in the changeset. If there is any objection, please let me > know. It's not necessary for me, but I appreciate the consideration.
Cheers, David > > Best regards, > Jiangli > >> >> Thanks, >> David >> >>> Best regards, >>> Jiangli >>> >>>> >>>> static size_t tls_size = 0; >>>> static bool tls_size_inited = false; >>>> >>>> static size_t get_static_tls_area_size(const pthread_attr_t *attr) { >>>> + if (!tls_size_inited) { >>>> + tls_size_inited = true; >>>> if (_get_minstack_func != NULL) { >>>> size_t minstack_size = _get_minstack_func(attr); >>>> ... >>>> if (minstack_size > (size_t)os::vm_page_size() + >>>> PTHREAD_STACK_MIN) { >>>> tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN; >>>> } >>>> } >>>> + } >>>> log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT, >>>> tls_size); >>>> return tls_size; >>>> } >>>> >>>> Or even fold it all into get_minstack_init()? >>>> >>>> I'm assuming that the result of __pthread_get_minstack won't change over >>>> time of course. >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes wrote: >>>>>> >>>>>> Hi Jiangli, >>>>>> >>>>>> This is very well written up - thanks. >>>>>> >>>>>> I apologize in advance that I'm about to be traveling for a few days so >>>>>> won't be able to respond further until next week. >>>>> >>>>> Hope you have a relaxed and safe trip. This can wait. >>>>> >>>>>> >>>>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote: >>>>>>> Updated webrev: >>>>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/ >>>>>> >>>>>> Overall changes look good. I also have a concern around this code: >>>>>> >>>>>> + tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN; >>>>>> + assert(tls_size > 0, "unexpected size"); >>>>>> >>>>>> In addition to Thomas's comments re signedness, can't the result == 0 if >>>>>> there is no static TLS in use? Or is there always some static TLS in use? >>>>> >>>>> On both glibc 2.24 and 2.28, by default I see there is one page of memory >>>>> for static TLS, without explicitly defining any __thread variables in >>>>> user code.
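The caching David proposes can be sketched as standalone C++. In this sketch the minimum-stack query is passed in as a function pointer so it runs without glibc internals (in the patch the pointer comes from `dlsym(RTLD_DEFAULT, "__pthread_get_minstack")` and takes a `pthread_attr_t*`), and the page size and `PTHREAD_STACK_MIN` are plain parameters; the function names are illustrative only. The comparison before the subtraction is the point raised about signedness: `size_t` is unsigned, so an `assert(tls_size > 0)` after an unchecked subtraction cannot detect a wrapped-around result.

```cpp
#include <cstddef>

// Stand-in for the minimum-stack query (hypothetical signature; the real
// glibc-private function takes a const pthread_attr_t*).
using MinStackFn = std::size_t (*)();

// Compare before subtracting so an unexpectedly small minstack yields 0
// instead of wrapping around to a huge size_t value.
std::size_t tls_adjustment(std::size_t minstack, std::size_t page,
                           std::size_t stack_min) {
  return minstack > page + stack_min ? minstack - page - stack_min : 0;
}

// Computes the adjustment once and caches it. Like the proposal, this assumes
// the first call happens before any other thread can call it (while the first
// thread is being created during VM initialization).
std::size_t static_tls_area_size(MinStackFn get_minstack, std::size_t page,
                                 std::size_t stack_min) {
  static std::size_t tls_size = 0;
  static bool tls_size_inited = false;
  if (!tls_size_inited) {
    tls_size_inited = true;
    if (get_minstack != nullptr) {
      tls_size = tls_adjustment(get_minstack(), page, stack_min);
    }
  }
  return tls_size;
}
```

If the single-threaded first-call assumption did not hold, `std::call_once` or a function-local static initialized from a lambda would make the caching thread-safe.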
>>>>> >>>>>> >>>>>> A few other comments/requests: >>>>>> >>>>>> 822 static void get_minstack_init() { >>>>>> 823 _get_minstack_func = >>>>>> 824 (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack"); >>>>>> 825 } >>>>>> >>>>>> Can you add a logging statement please: >>>>>> >>>>>> log(os, thread)("Lookup of __pthread_get_minstack %s", >>>>>> _get_minstack_func == NULL ? "failed" : "succeeded"); >>>>> >>>>> Added log info. >>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> 884 // In the Linux NPTL pthread implementation the guard size mechanism >>>>>> >>>>>> Now that we have additional information on this could you update this >>>>>> old comment to say >>>>>> >>>>>> // In glibc versions prior to 2.7 the guard size mechanism >>>>> >>>>> Done. >>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> 897 // Adjust the stack_size by adding the on-stack TLS size if >>>>>> 898 // AdjustStackSizeForTLS is true. The guard size is already >>>>>> 899 // accounted in this case, please see comments in >>>>>> 900 // get_static_tls_area_size(). >>>>>> >>>>>> Given the extensive commentary in get_static_tls_area_size() can we be >>>>>> more brief here and just say: >>>>>> >>>>>> // Adjust the stack size for on-stack TLS - see get_static_tls_area_size(). >>>>> >>>>> Done. >>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> 5193 get_minstack_init(); >>>>>> >>>>>> Can you make this conditional on AdjustStackSizeForTLS please so there >>>>>> is no effect when not using the flag - thanks. >>>>> >>>>> Done. >>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> 855 } >>>>>> 856 return tls_size; >>>>>> >>>>>> Can you insert a logging statement: >>>>>> >>>>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT, >>>>>> tls_size); >>>>> >>>>> Added. >>>>> >>>>>> >>>>>> --- >>>>>> >>>>>> test/hotspot/jtreg/runtime/TLS/T.java >>>>>> >>>>>> 37 // Starting a ProcessBuilder causes the process reaper >>>>>> thread being >>>>>> >>>>>> s/being/to be/ >>>>> >>>>> Fixed.
>>>>> >>>>>> >>>>>> 43 // failure mode the VM fails to create thread with >>>>>> error message >>>>>> >>>>>> s/create thread/create a thread/ >>>>> >>>>> Fixed. >>>>> >>>>>> >>>>>> 53 System.out.println("Unexpected Echo output: " + >>>>>> echoOutput + >>>>>> 54 ", expects: " + echoInput); >>>>>> >>>>>> Should this be an exception so that the test fails? I can't imagine how echo >>>>>> would fail but probably better to fail the test if something unexpected >>>>>> happens. >>>>> >>>>> If no expected output is obtained from echo (due to ProcessBuilder >>>>> failure caused by the TLS issue), the test does fail and reports to the >>>>> caller (returns false). If we throw an explicit exception, it will >>>>> be caught by the outer try/catch, which seems to be unnecessary. >>>>> >>>>>> >>>>>> 66 try { >>>>>> 67 br = new BufferedReader(new >>>>>> InputStreamReader(inputStream)); >>>>>> 68 s = br.readLine(); >>>>>> 69 } finally { >>>>>> 70 br.close(); >>>>>> 71 } >>>>>> >>>>>> This could use try-with-resources for both streams. >>>>> >>>>> Sounds good. Done. >>>>> >>>>>> >>>>>> --- >>>>>> >>>>>> exestack-tls.c >>>>>> >>>>>> It's simpler if no argument means no-tls and an argument means tls. >>>>>> >>>>>> 42 char classpath[4096]; >>>>>> 43 snprintf(classpath, sizeof classpath, >>>>>> 44 "-Djava.class.path=%s", getenv("CLASSPATH")); >>>>>> 45 options[0].optionString = classpath; >>>>>> >>>>>> Do we need to explicitly set the classpath? I'm concerned that our test >>>>>> environment uses really, really long paths and a number of them. >>>>>> (Probably not 4096 but still ...) >>>>> >>>>> The classpath needs to be set so we know where to load the test class. >>>>> Given that we use 4096 for the same type of usage in other existing >>>>> test(s) (for example StackGap), it probably is okay for our test >>>>> environments? I can increase the array size if we want to be extra >>>>> safe ...
>>>>> >>>>>> >>>>>> >>>>>> test/hotspot/jtreg/runtime/TLS/testtls.sh >>>>>> >>>>>> 40 if [ "${VM_OS}" != "linux" ] >>>>>> 41 then >>>>>> 42 echo "Test is only valid for Linux" >>>>>> 43 exit 0 >>>>>> 44 fi >>>>>> >>>>>> This should be done via @requires os.family == "linux" >>>>> >>>>> Done. >>>>> >>>>> Thanks! >>>>> >>>>> Best regards, >>>>> Jiangli >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Thanks for everyone's contribution on carving out the current workaround! >>>>>>> >>>>>>> Best regards, >>>>>>> Jiangli >>>>>>> >>>>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou wrote: >>>>>>>> >>>>>>>> Thank you Thomas and David! Glad to see that we are converging on an >>>>>>>> acceptable approach here. I'll try to factor in all the latest inputs >>>>>>>> from everyone and send out a new update. >>>>>>>> >>>>>>>> Thanks and best regards, >>>>>>>> Jiangli >>>>>>>> >>>>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes wrote: >>>>>>>>> >>>>>>>>> Trimming .... >>>>>>>>> >>>>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote: >>>>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer wrote: >>>>>>>>>>> I think you can handle the guard size in this way: >>>>>>>>>>> >>>>>>>>>>> pthread_attr_setguardsize(&attr, guard_size); >>>>>>>>>>> >>>>>>>>>>> size_t stack_adjust_size = 0; >>>>>>>>>>> if (AdjustStackSizeForTLS) { >>>>>>>>>>> size_t minstack_size = get_minstack(&attr); >>>>>>>>>>> size_t tls_size = minstack_size - vm_page_size() - PTHREAD_STACK_MIN; >>>>>>>>>>> // In glibc before 2.27, tls_size still includes guard_size. >>>>>>>>>>> // In glibc 2.27 and later, guard_size is automatically >>>>>>>>>>> // added to the stack size by pthread_create. >>>>>>>>>>> // In both cases, the guard size is taken into account. >>>>>>>>>>> stack_adjust_size += tls_size; >>>>>>>>>>> } else { >>>>>>>>>>> stack_adjust_size += guard_size; >>>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Is the vm_page_size() counted for the dl_pagesize?
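Florian's rule for choosing what to add to the requested stack size can be isolated as a small function. This is a sketch with hypothetical names. It assumes `tls_size` was already derived, with an underflow check, from the minimum stack reported for an attribute object whose guard size had been set first. The wrap-around check in the second function is an added precaution and was not part of the proposal in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring the rule quoted above: how much to add to
// the requested stack size.
std::size_t stack_adjust_size(bool adjust_for_tls, std::size_t tls_size,
                              std::size_t guard_size) {
  if (adjust_for_tls) {
    // Before glibc 2.27 the derived tls_size still includes the guard size;
    // from 2.27 on pthread_create adds the guard itself. Either way the guard
    // is accounted for, so only tls_size is added.
    return tls_size;
  }
  return guard_size;  // no TLS adjustment: compensate for the guard only
}

// Added precaution, not from the thread: skip the adjustment rather than
// let the addition wrap around.
std::size_t adjusted_stack_size(std::size_t requested, std::size_t adjust) {
  return requested <= SIZE_MAX - adjust ? requested + adjust : requested;
}
```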
As long as others >>>>>>>>>> are okay with the above suggested adjustment, it looks good to me. >>>>>>>>>> Thomas, David and others, any objection? >>>>>>>>> >>>>>>>>> I find the above acceptable. I've been waiting for the dust to settle. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>>> Thanks and best regards, >>>>>>>>>> Jiangli >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Florian From erik.osterlund at oracle.com Thu Jul 4 07:10:47 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Thu, 4 Jul 2019 03:10:47 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> Message-ID: <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> Hi Dan, Thanks for picking this up. The change looks good. However, when reviewing this, I looked at the code for actually restoring the table (ignore/notice safepoints). It copies the dispatch table for the interpreter. There is a comment stating it is important the copying is atomic for MT-safety, and I can definitely see why. However, the copying on the line after that comment is in fact not atomic. Here is the copying code in templateInterpreter.cpp: static inline void copy_table(address* from, address* to, int size) { // Copy non-overlapping tables. The copy has to occur word wise for MT safety. while (size-- > 0) *to++ = *from++; } Copying using a loop of non-volatile loads and stores can and definitely will on some compilers turn into memcpy calls instead as the compiler (correctly) considers that an equivalent transformation. And memcpy does not guarantee atomicity. Indeed on some platforms it is not atomic. On some platforms it will even enjoy out-of-thin-air values. Perhaps Copy::disjoint_words_atomic() would be a better choice for atomic word copying. If not, at the very least we should use Atomic::load/store here.
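A word-by-word copy that the compiler may not fold into a `memcpy` can be written in standard C++ by making the destination entries `std::atomic`. This sketch illustrates the idea behind Erik's suggestion; it is not HotSpot code. HotSpot would use its own `Copy::disjoint_words_atomic()` or `Atomic::load/store` rather than `std::atomic`, and the `address` alias and function name here are stand-ins. The sketch assumes the source table is not modified concurrently, so only the destination needs atomic access. Relaxed ordering is enough to ensure that each entry is written as a whole word. It does not order the entries relative to each other.

```cpp
#include <atomic>
#include <cstddef>

// Stand-in for HotSpot's 'address' type (a pointer to code).
using address = unsigned char*;

// Copies 'size' entries one word at a time. Each store is a single atomic
// write, so a concurrent reader of to[i] sees either the old or the new
// entry, never a torn value, and the loop cannot legally become a plain memcpy.
void copy_table_atomic(const address* from, std::atomic<address>* to,
                       std::size_t size) {
  for (std::size_t i = 0; i < size; ++i) {
    to[i].store(from[i], std::memory_order_relaxed);
  }
}
```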
Having said that, the fix for that issue seems like a separate RFE, because it has been sitting there for a lot longer than TLH has been around. Thanks, /Erik On 2019-07-04 04:04, Daniel D. Daugherty wrote: > Greetings, > > Robbin recently discovered this issue with Thread Local Handshakes. Since > he's not available at the moment, I'm handling the issue: > > JDK-8227117 normal interpreter table is not restored after single > stepping with TLH > https://bugs.openjdk.java.net/browse/JDK-8227117 > > When using Thread Local Handshakes, the normal interpreter table is > not restored after single stepping. This issue is caused by the > VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to > restore the normal interpreter table for the "off" case. > > Prior to Thread Local Handshakes, this was a valid assumption to make. > SafepointSynchronize::end() has been refactored into > disarm_safepoint() and it only calls Interpreter::ignore_safepoints() > on the global safepoint branch. That matches up with the call to > Interpreter::notice_safepoints() that is also on the global safepoint > branch. > > The solution is for the VM_ChangeSingleStep VM-op for the "off" case > to call Interpreter::ignore_safepoints() directly. > > Here's the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ > > The fix is just a small addition to VM_ChangeSingleStep::doit(): > > if (_on) { > Interpreter::notice_safepoints(); > + } else { > + Interpreter::ignore_safepoints(); > } > > Everything else is just new logging support for future debugging of > interpreter table management and single stepping. > > Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. > Mach5 Tier[4-6] on standard Oracle platforms is running now. > > Thanks, in advance, for questions, comments or suggestions.
> > Dan > From david.holmes at oracle.com Thu Jul 4 07:13:06 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 17:13:06 +1000 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> Message-ID: <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> Hi Dan, On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: > Greetings, > > Robbin recently discovered this issue with Thread Local Handshakes. Since > he's not available at the moment, I'm handling the issue: > > JDK-8227117 normal interpreter table is not restored after single > stepping with TLH > https://bugs.openjdk.java.net/browse/JDK-8227117 > > When using Thread Local Handshakes, the normal interpreter table is > not restored after single stepping. This issue is caused by the > VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to > restore the normal interpreter table for the "off" case. So the result of this is that debugging tests may run more slowly overall? > Prior to Thread Local Handshakes, this was a valid assumption to make. > SafepointSynchronize::end() has been refactored into > disarm_safepoint() and it only calls Interpreter::ignore_safepoints() > on the global safepoint branch. That matches up with the call to > Interpreter::notice_safepoints() that is also on the global safepoint > branch. > > The solution is for the VM_ChangeSingleStep VM-op for the "off" case > to call Interpreter::ignore_safepoints() directly. > > Here's the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ > > The fix is just a small addition to VM_ChangeSingleStep::doit(): > > if (_on) { > Interpreter::notice_safepoints(); > + } else { > + Interpreter::ignore_safepoints(); > } Looks good - thanks for the detailed analysis in the bug report.
I have one additional request from looking at related code - can you fix this confused initializer: VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) : _on(on != 0) { } as _on and on are both bool, the assignment can be direct and we shouldn't be comparing a bool to 0 as a matter of style. Thanks. > Everything else is just new logging support for future debugging of > interpreter table management and single stepping. Logging looks good too. Thanks, David ----- > Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. > Mach5 Tier[4-6] on standard Oracle platforms is running now. > > Thanks, in advance, for questions, comments or suggestions. > > Dan > From david.holmes at oracle.com Thu Jul 4 07:17:24 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 17:17:24 +1000 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> Message-ID: Hi Erik, On 4/07/2019 5:10 pm, Erik Österlund wrote: > Hi Dan, > > Thanks for picking this up. The change looks good. > > However, when reviewing this, I looked at the code for actually > restoring the table (ignore/notice safepoints). It copies the dispatch > table for the interpreter. There is a comment stating it is important > the copying is atomic for MT-safety, and I can definitely see why. > However, the copying on the line after that comment is in fact not atomic. Is it assuming "atomicity" by virtue of executing at a safepoint? David ----- > Here is the copying code in templateInterpreter.cpp: > > static inline void copy_table(address* from, address* to, int size) { > // Copy non-overlapping tables. The copy has to occur word wise for > MT safety. >
while (size-- > 0) *to++ = *from++; > } > > Copying using a loop of non-volatile loads and stores can and definitely > will on some compilers turn into memcpy calls instead as the compiler > (correctly) considers that an equivalent transformation. And memcpy does > not guarantee atomicity. Indeed on some platforms it is not atomic. On > some platforms it will even enjoy out-of-thin-air values. Perhaps > Copy::disjoint_words_atomic() would be a better choice for atomic word > copying. If not, at the very least we should use Atomic::load/store here. > > Having said that, the fix for that issue seems like a separate RFE, > because it has been sitting there for a lot longer than TLH has been > around. > > Thanks, > /Erik > > On 2019-07-04 04:04, Daniel D. Daugherty wrote: >> Greetings, >> >> Robbin recently discovered this issue with Thread Local Handshakes. Since >> he's not available at the moment, I'm handling the issue: >> >> JDK-8227117 normal interpreter table is not restored after single >> stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> When using Thread Local Handshakes, the normal interpreter table is >> not restored after single stepping. This issue is caused by the >> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >> restore the normal interpreter table for the "off" case. >> >> Prior to Thread Local Handshakes, this was a valid assumption to make. >> SafepointSynchronize::end() has been refactored into >> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >> on the global safepoint branch. That matches up with the call to >> Interpreter::notice_safepoints() that is also on the global safepoint >> branch. >> >> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >> to call Interpreter::ignore_safepoints() directly.
>> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >> >> The fix is just a small addition to VM_ChangeSingleStep::doit(): >> >> if (_on) { >> Interpreter::notice_safepoints(); >> + } else { >> + Interpreter::ignore_safepoints(); >> } >> >> Everything else is just new logging support for future debugging of >> interpreter table management and single stepping. >> >> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >> Mach5 Tier[4-6] on standard Oracle platforms is running now. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> From david.holmes at oracle.com Thu Jul 4 07:18:45 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 17:18:45 +1000 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> Message-ID: <6cc70adc-bcfe-505e-9c50-db7b10933613@oracle.com> PS. I just noticed this comment: // This change must always be occur when at a safepoint. // Being at a safepoint causes the interpreter to use the // safepoint dispatch table which we overload to find single // step points. Just to be sure that it has been set, we // call notice_safepoints when turning on single stepping. // When we leave our current safepoint, should_post_single_step // will be checked by the interpreter, and the table kept // or changed accordingly. void VM_ChangeSingleStep::doit() { The "when we leave the safepoint" part is actually the bug that is being fixed - right? So the comment is not accurate. David ----- On 4/07/2019 5:13 pm, David Holmes wrote: > Hi Dan, > > On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: >> Greetings, >> >> Robbin recently discovered this issue with Thread Local Handshakes.
Since >> he's not available at the moment, I'm handling the issue: >> >> JDK-8227117 normal interpreter table is not restored after single >> stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> When using Thread Local Handshakes, the normal interpreter table is >> not restored after single stepping. This issue is caused by the >> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >> restore the normal interpreter table for the "off" case. > > So the result of this is that debugging tests may run more slowly overall? > >> Prior to Thread Local Handshakes, this was a valid assumption to make. >> SafepointSynchronize::end() has been refactored into >> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >> on the global safepoint branch. That matches up with the call to >> Interpreter::notice_safepoints() that is also on the global safepoint >> branch. >> >> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >> to call Interpreter::ignore_safepoints() directly. >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >> >> The fix is just a small addition to VM_ChangeSingleStep::doit(): >> >> if (_on) { >> Interpreter::notice_safepoints(); >> + } else { >> + Interpreter::ignore_safepoints(); >> } > > Looks good - thanks for the detailed analysis in the bug report. > > I have one additional request from looking at related code - can you fix > this confused initializer: > > VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) > : _on(on != 0) > { > } > > as _on and on are both bool, the assignment can be direct and we > shouldn't be comparing a bool to 0 as a matter of style. Thanks. > >> Everything else is just new logging support for future debugging of >> interpreter table management and single stepping. > > Logging looks good too.
> > Thanks, > David > ----- > >> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >> Mach5 Tier[4-6] on standard Oracle platforms is running now. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> From erik.osterlund at oracle.com Thu Jul 4 08:08:58 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Thu, 4 Jul 2019 10:08:58 +0200 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> Message-ID: <89c1417e-c606-3145-0d70-af062d4a8fbc@oracle.com> Hi David, When you run without TLH, this copying mechanism is used to synchronize the safepoint while JavaThreads are running. The interpreter doesn't emit any polls then. Instead it clobbers the dispatch table. JavaThreads will be reading from the dispatch table while it is being (non-atomically) modified. That could crash. For example with the Solaris + Studio + SPARC - TLH configuration, the compiler will almost certainly emit a memcpy (this transformation has been observed in practice), the memcpy will use BIS instructions (observed in practice) for performance, with out-of-thin-air values (observed in practice), and the JavaThreads will occasionally crash during safepoint synchronization due to said out-of-thin-air values. So I guess the problem might have been larger back when TLH was not the default. But this seems conceptually wrong. /Erik On 2019-07-04 09:17, David Holmes wrote: > Hi Erik, > > On 4/07/2019 5:10 pm, Erik Österlund wrote: >> Hi Dan, >> >> Thanks for picking this up. The change looks good. >> >> However, when reviewing this, I looked at the code for actually >> restoring the table (ignore/notice safepoints). It copies the >> dispatch table for the interpreter.
There is a comment stating it is >> important the copying is atomic for MT-safety, and I can definitely >> see why. However, the copying on the line after that comment is in fact >> not atomic. > > Is it assuming "atomicity" by virtue of executing at a safepoint? > > David > ----- > >> Here is the copying code in templateInterpreter.cpp: >> >> static inline void copy_table(address* from, address* to, int size) { >> // Copy non-overlapping tables. The copy has to occur word wise >> for MT safety. >> while (size-- > 0) *to++ = *from++; >> } >> >> Copying using a loop of non-volatile loads and stores can and >> definitely will on some compilers turn into memcpy calls instead as >> the compiler (correctly) considers that an equivalent transformation. >> And memcpy does not guarantee atomicity. Indeed on some platforms it >> is not atomic. On some platforms it will even enjoy out-of-thin-air >> values. Perhaps Copy::disjoint_words_atomic() would be a better >> choice for atomic word copying. If not, at the very least we should >> use Atomic::load/store here. >> >> Having said that, the fix for that issue seems like a separate RFE, >> because it has been sitting there for a lot longer than TLH has been >> around. >> >> Thanks, >> /Erik >> >> On 2019-07-04 04:04, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> Robbin recently discovered this issue with Thread Local Handshakes. >>> Since >>> he's not available at the moment, I'm handling the issue: >>> >>> JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> When using Thread Local Handshakes, the normal interpreter table is >>> not restored after single stepping. This issue is caused by the >>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>> restore the normal interpreter table for the "off" case. >>> >>> Prior to Thread Local Handshakes, this was a valid assumption to make.
>>> SafepointSynchronize::end() has been refactored into >>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>> on the global safepoint branch. That matches up with the call to >>> Interpreter::notice_safepoints() that is also on the global safepoint >>> branch. >>> >>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>> to call Interpreter::ignore_safepoints() directly. >>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>> >>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>> >>>     if (_on) { >>>       Interpreter::notice_safepoints(); >>> +  } else { >>> +    Interpreter::ignore_safepoints(); >>>     } >>> >>> Everything else is just new logging support for future debugging of >>> interpreter table management and single stepping. >>> >>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>> >>> Thanks, in advance, for questions, comments or suggestions. >>> >>> Dan >>> From david.holmes at oracle.com Thu Jul 4 09:38:27 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 19:38:27 +1000 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <89c1417e-c606-3145-0d70-af062d4a8fbc@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> <89c1417e-c606-3145-0d70-af062d4a8fbc@oracle.com> Message-ID: <6ca57c3a-9b94-5774-d4ed-20d91c0cdc02@oracle.com> Hi Erik, On 4/07/2019 6:08 pm, Erik Österlund wrote: > Hi David, > > When you run without TLH, this copying mechanism is used to synchronize > the safepoint while JavaThreads are running. The interpreter doesn't > emit any polls then. Instead it clobbers the dispatch table. JavaThreads > will be reading from the dispatch table while it is being > (non-atomically) modified.
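Erik's point is that the plain `*to++ = *from++` loop in copy_table() carries no atomicity guarantee, because a compiler may legally lower it to memcpy(). A minimal standalone sketch of the per-word alternative he suggests, written with standard C++ `std::atomic` rather than HotSpot's own Atomic::load/store or Copy::disjoint_words_atomic() (whose exact signatures are not quoted here). `copy_table_atomic`, `load_entry` and the `address` alias are hypothetical names used only for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for HotSpot's `address` (a pointer to code).
using address = unsigned char*;

// Copies `size` entries from a private source table into a live table that
// other threads read concurrently. Each entry is written with one atomic
// store, so the compiler may not merge the loop into a memcpy() call, and a
// concurrent reader of to[i] sees either the old or the new entry, never a
// torn value.
inline void copy_table_atomic(const address* from, std::atomic<address>* to,
                              std::size_t size) {
  assert(size == 0 || (from != nullptr && to != nullptr));
  for (std::size_t i = 0; i < size; ++i) {
    to[i].store(from[i], std::memory_order_relaxed);
  }
}

// Readers load each entry atomically as well.
inline address load_entry(const std::atomic<address>* table, std::size_t i) {
  return table[i].load(std::memory_order_relaxed);
}
```

Relaxed ordering only guarantees per-entry atomicity; whether readers also need release/acquire ordering depends on what else they must observe, which the thread does not settle.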
That could crash. For example with the > Solaris + studio + SPARC - TLH configuration, the compiler will almost > certainly emit a memcpy (this transformation has been observed in > practice), the memcpy will use BIS instructions (observed in practice) > for performance, with out-of-thin-air values (observed in practice), and > the JavaThreads will occasionally crash during safepoint synchronization > due to said out-of-thin-air values. > > So I guess the problem might be larger back when TLH was not default. > But this seems conceptually wrong. I always thought there were two dispatch tables and we simply switched between them - not copied anything! David > /Erik > > On 2019-07-04 09:17, David Holmes wrote: >> Hi Erik, >> >> On 4/07/2019 5:10 pm, Erik Österlund wrote: >>> Hi Dan, >>> >>> Thanks for picking this up. The change looks good. >>> >>> However, when reviewing this, I looked at the code for actually >>> restoring the table (ignore/notice safepoints). It copies the >>> dispatch table for the interpreter. There is a comment stating it is >>> important the copying is atomic for MT-safety, and I can definitely >>> see why. However, the copying the line after that comment is in fact >>> not atomic. >> >> Is it assuming "atomicity" by virtue of executing at a safepoint? >> >> David >> ----- >> >>> Here is the copying code in templateInterpreter.cpp: >>> >>> static inline void copy_table(address* from, address* to, int size) { >>>   // Copy non-overlapping tables. The copy has to occur word wise >>> for MT safety. >>>   while (size-- > 0) *to++ = *from++; >>> } >>> >>> Copying using a loop of non-volatile loads and stores can and >>> definitely will on some compilers turn into memcpy calls instead as >>> the compiler (correctly) considers that an equivalent transformation. >>> And memcpy does not guarantee atomicity. Indeed on some platforms it >>> is not atomic. On some platforms it will even enjoy out-of-thin-air >>> values.
Perhaps Copy::disjoint_words_atomic() would be a better >>> choice for atomic word copying. If not, at the very least we should >>> use Atomic::load/store here. >>> >>> Having said that, the fix for that issue seems like a separate RFE, >>> because it has been sitting there for a lot longer than TLH has been >>> around. >>> >>> Thanks, >>> /Erik >>> >>> On 2019-07-04 04:04, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> Robbin recently discovered this issue with Thread Local Handshakes. >>>> Since >>>> he's not available at the moment, I'm handling the issue: >>>> >>>>     JDK-8227117 normal interpreter table is not restored after >>>> single stepping with TLH >>>>     https://bugs.openjdk.java.net/browse/JDK-8227117 >>>> >>>> When using Thread Local Handshakes, the normal interpreter table is >>>> not restored after single stepping. This issue is caused by the >>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>>> restore the normal interpreter table for the "off" case. >>>> >>>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>>> SafepointSynchronize::end() has been refactored into >>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>>> on the global safepoint branch. That matches up with the call to >>>> Interpreter::notice_safepoints() that is also on the global safepoint >>>> branch. >>>> >>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>>> to call Interpreter::ignore_safepoints() directly. >>>> >>>> Here's the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>>> >>>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>>> >>>>     if (_on) { >>>>       Interpreter::notice_safepoints(); >>>> +  } else { >>>> +    Interpreter::ignore_safepoints(); >>>>     } >>>> >>>> Everything else is just new logging support for future debugging of >>>> interpreter table management and single stepping.
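Dan's fix makes the "off" case of VM_ChangeSingleStep::doit() restore the normal table itself, and David's reply wonders whether the interpreter simply switches between two tables. A hypothetical model of that switching design, not HotSpot's actual TemplateInterpreter code (every name below is invented for illustration), in which the active table is selected by publishing a single pointer instead of copying entries:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model: two complete dispatch tables built once up front.
using address = const char*;
constexpr int kTableSize = 4;

static const address normal_table[kTableSize] = {"n0", "n1", "n2", "n3"};
static const address safept_table[kTableSize] = {"s0", "s1", "s2", "s3"};

// The active table is chosen by storing one pointer, so a reader can never
// observe a half-copied table.
static std::atomic<const address*> active_table{normal_table};

void notice_safepoints() { active_table.store(safept_table, std::memory_order_release); }
void ignore_safepoints() { active_table.store(normal_table, std::memory_order_release); }

// Same shape as the fix: the "off" case restores the normal table directly
// instead of relying on the end of a global safepoint to do it.
void change_single_step(bool on) {
  if (on) {
    notice_safepoints();
  } else {
    ignore_safepoints();
  }
}

address dispatch(int bytecode) {
  assert(bytecode >= 0 && bytecode < kTableSize);
  const address* table = active_table.load(std::memory_order_acquire);
  return table[bytecode];
}
```

The trade-off is an extra indirection on every dispatch in exchange for never copying a table while readers are active; whether that cost would be acceptable for HotSpot's generated interpreter code is not settled in this thread.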
>>>> >>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>>> >>>> Thanks, in advance, for questions, comments or suggestions. >>>> >>>> Dan >>>> > From goetz.lindenmaier at sap.com Thu Jul 4 10:10:57 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 4 Jul 2019 10:10:57 +0000 Subject: RFR(xxs): 8227032: MetaspaceUtils::print_report crashes when called before initialization In-Reply-To: References: <7786446b1343e2eea5a81817eb8cb84ea8e0470b.camel@oracle.com> Message-ID: Hi, looks good, nice and simple fix. Best regards, Goetz. > -----Original Message----- > From: hotspot-runtime-dev > On Behalf Of Thomas Stüfe > Sent: Dienstag, 2. Juli 2019 13:48 > To: Thomas Schatzl > Cc: Hotspot dev runtime > Subject: Re: RFR(xxs): 8227032: MetaspaceUtils::print_report crashes when > called before initialization > > Thank you Thomas. > > On Tue, Jul 2, 2019 at 1:05 PM Thomas Schatzl > wrote: > > > Hi, > > > > On Mon, 2019-07-01 at 18:16 +0200, Thomas Stüfe wrote: > > > Hi all, > > > > > > this is a tiny fix which prevents MetaspaceUtils::print_report from > > > crashing when called before metaspace is initialized. > > > > > > Issue: https://bugs.openjdk.java.net/browse/JDK-8227032 > > > webrev: > > > > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227032-metaspaceutils-print-report-pre-init-crash/webrev.00/webrev/index.html > > > > > > Thanks, Thomas > > > > looks good.
> > > > Thomas > > > > From thomas.stuefe at gmail.com Thu Jul 4 10:12:14 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 4 Jul 2019 12:12:14 +0200 Subject: RFR(xxs): 8227032: MetaspaceUtils::print_report crashes when called before initialization In-Reply-To: References: <7786446b1343e2eea5a81817eb8cb84ea8e0470b.camel@oracle.com> Message-ID: Thank you Goetz On Thu, Jul 4, 2019 at 12:11 PM Lindenmaier, Goetz < goetz.lindenmaier at sap.com> wrote: > Hi, > > looks good, nice and simple fix. > > Best regards, > Goetz. > > > -----Original Message----- > > From: hotspot-runtime-dev > > On Behalf Of Thomas Stüfe > > Sent: Dienstag, 2. Juli 2019 13:48 > > To: Thomas Schatzl > > Cc: Hotspot dev runtime > > Subject: Re: RFR(xxs): 8227032: MetaspaceUtils::print_report crashes when > > called before initialization > > > > Thank you Thomas. > > > > On Tue, Jul 2, 2019 at 1:05 PM Thomas Schatzl > > > wrote: > > > > > Hi, > > > > > > On Mon, 2019-07-01 at 18:16 +0200, Thomas Stüfe wrote: > > > > Hi all, > > > > > > > > this is a tiny fix which prevents MetaspaceUtils::print_report from > > > > crashing when called before metaspace is initialized. > > > > > > > > Issue: https://bugs.openjdk.java.net/browse/JDK-8227032 > > > > webrev: > > > > > > > > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227032-metaspaceutils-print-report-pre-init-crash/webrev.00/webrev/index.html > > > > > > > > Thanks, Thomas > > > > > > looks good. > > > > > > Thomas > > > > > > > From goetz.lindenmaier at sap.com Thu Jul 4 10:35:48 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 4 Jul 2019 10:35:48 +0000 Subject: RFR(S): 8227255: Switchable helpful NullPointerExceptions Message-ID: Hi, please review this small change. http://cr.openjdk.java.net/~goetz/wr19/8227255-NPE-switchable/01/ It will be part of JEP 8220715. https://bugs.openjdk.java.net/browse/JDK-8220715 The exception messages proposed there will first be off per default.
After gathering experience, they will be turned on per default. I was asked to use a manageable flag so it can be switched by jcmd. The flag: SuppressCodeDetailsInExceptionMessages "Suppress" because the feature is meant to be on per default in the long run. Then you'll have to use -XX:_+_ if using the switch. "CodeDetails" tries to summarize the concerns with the message. The flag does not mention NPE so it can be used in other, similar cases. If there are not objections to the flag name, I'll file a CSR. Or should I wait with the CSR until the JEP is targeted? Best regards, Goetz. From david.holmes at oracle.com Thu Jul 4 10:52:22 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 20:52:22 +1000 Subject: RFR(S): 8227255: Switchable helpful NullPointerExceptions In-Reply-To: References: Message-ID: Sorry Goetz but I am confused by this. You have a JEP that is still in draft but here you have a RFR for a change related to that JEP but not the implementation of that JEP ??? I only expect to see one issue filed to implement that JEP including the creation of the flag to enable/disable it. The introduction of the flag should be part of the JEP as well. That said you may as well get the CSR going in parallel with the JEP. David On 4/07/2019 8:35 pm, Lindenmaier, Goetz wrote: > Hi, > > please review this small change. > > http://cr.openjdk.java.net/~goetz/wr19/8227255-NPE-switchable/01/ > > It will be part of JEP 8220715. > > https://bugs.openjdk.java.net/browse/JDK-8220715 > > The exception messages proposed there will first be > > off per default. After gathering experience, they > > will be turned on per default. > > I was asked to use a manageable flag so it can be switched > > by jcmd. > > The flag: SuppressCodeDetailsInExceptionMessages > > "Suppress" because the feature is meant to be on per > > default in the long run. Then you'll have to > > use -XX:_+_ if using the switch. > >
"CodeDetails" tries to summarize the concerns with > > the message. > > The flag does not mention NPE so it can be used in > > other, similar cases. > > If there are not objections to the flag name, I'll file a > > CSR. Or should I wait with the CSR until the JEP is > > targeted? > > Best regards, > > Goetz. > From ralf.schmelter at sap.com Thu Jul 4 10:56:28 2019 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Thu, 4 Jul 2019 10:56:28 +0000 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows Message-ID: Hi, can you please review this patch to fix various long path related problems in the hotspot os code on Windows. As described in the bug the current code cannot handle relative paths in these cases: 1. If the relative path is < 260 chars, but the absolute path is > 260 chars. In this case if the I/O method uses the *A variant of the system call as an optimization, it will fail. 2. If the relative path is > 260 chars or the I/O method always uses the *W variant. In this case the create_unc_path() method is called, which just prepends \\?\ to the relative path. But this is not a valid path to use and the system call will fail. Additionally there are problems with some other kinds of paths: 1. An absolute path which contains '.' or '..' parts and is > 260 chars or the I/O method always uses the *W variant. When given to the create_unc_path() method, it will just prepend \\?\. But this is not a valid path to use and the system call will fail. 2. An UNC path which is > 260 or the I/O method always uses the *W variant. The create_unc_path erroneously converts \\host\path to \\?\UNC\\host\path (notice the double backslash before the host name). This again is not a valid path. Additionally '.' or '..' parts would not be handled correctly too. To fix this I've introduced a new function, which converts a path to a wide character unc path, calling _wfullpath() to make the path absolute if needed and to remove the '.'
and '..' path parts. I've adjusted all methods which used create_unc_path() to use the new method. And I removed all fallback code using the ANSI variants, since benchmarking showed that on my machine the additional overhead of converting to a wchar and potentially calling _wfullpath() was less than 5% of the actual I/O routine called. And for this reason, why I haven't tried to optimize avoiding calls to _wfullpath() (e.g. checking for '.' and '..' and only calling it if we find this in the path). bugreport: https://bugs.openjdk.java.net/browse/JDK-8191521 webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8191521/webrev.0/ Best regards, Ralf From goetz.lindenmaier at sap.com Thu Jul 4 10:59:12 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 4 Jul 2019 10:59:12 +0000 Subject: RFR(S): 8227255: Switchable helpful NullPointerExceptions In-Reply-To: References: Message-ID: Hi David, the implementation of the JEP is to be found here: https://bugs.openjdk.java.net/browse/JDK-8218628 http://cr.openjdk.java.net/~goetz/wr19/8218628-exMsg-NPE/12/ I thought it's good to keep different aspects in changes of their own, also as I need two CSRs: one to mention there are new messages and one to mention there is a new flag. It also simplifies reviews a lot. Best regards, Goetz. > -----Original Message----- > From: David Holmes > Sent: Donnerstag, 4. Juli 2019 12:52 > To: Lindenmaier, Goetz ; hotspot-runtime- > dev at openjdk.java.net > Cc: Coleen Phillimore (coleen.phillimore at oracle.com) > > Subject: Re: RFR(S): 8227255: Switchable helpful NullPointerExceptions > > Sorry Goetz but I am confused by this. You have a JEP that is still in > draft but here you have a RFR for a change related to that JEP but not > the implementation of that JEP ??? I only expect to see one issue filed > to implement that JEP including the creation of the flag to > enable/disable it. The introduction of the flag should be part of the > JEP as well. 
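Ralf's description of 8191521 identifies two prefixing mistakes in create_unc_path(): prepending \\?\ to relative or dot-containing paths, and producing \\?\UNC\\host\path for UNC input. A minimal sketch of just the prefixing step, assuming the input has already been made absolute and normalized (as the patch does with _wfullpath()); `to_extended_length_path` is a hypothetical name, not the function in the webrev, and device paths such as \\.\ are out of scope:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts an absolute, already-normalized Windows
// path (no "." or ".." parts) into the extended-length form accepted by
// the wide-character (*W) APIs:
//   C:\dir\file        becomes  \\?\C:\dir\file
//   \\host\share\file  becomes  \\?\UNC\host\share\file
// Paths that already carry the extended-length prefix are returned as-is.
std::wstring to_extended_length_path(const std::wstring& abs_path) {
  assert(!abs_path.empty());
  const std::wstring prefix = L"\\\\?\\";
  if (abs_path.compare(0, prefix.size(), prefix) == 0) {
    return abs_path;
  }
  if (abs_path.size() >= 2 && abs_path[0] == L'\\' && abs_path[1] == L'\\') {
    // UNC path: drop exactly the two leading backslashes so the host name
    // follows "UNC" with a single separator.
    return prefix + L"UNC\\" + abs_path.substr(2);
  }
  return prefix + abs_path;
}
```

The already-prefixed check runs first because an extended-length path also begins with two backslashes and would otherwise be misread as UNC.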
> > That said you may as well get the CSR going in parallel with the JEP. > > David > > On 4/07/2019 8:35 pm, Lindenmaier, Goetz wrote: > > Hi, > > > > please review this small change. > > > > http://cr.openjdk.java.net/~goetz/wr19/8227255-NPE-switchable/01/ > > > > It will be part of JEP 8220715. > > > > https://bugs.openjdk.java.net/browse/JDK-8220715 > > > > The exception messages proposed there will first be > > > > off per default. After gathering experience, they > > > > will be turned on per default. > > > > I was asked to use a manageable flag so it can be switched > > > > by jcmd. > > > > The flag: SuppressCodeDetailsInExceptionMessages > > > > "Suppress" because the feature is meant to be on per > > > > default in the long run. Then you'll have to > > > > use -XX:_+_ if using the switch. > > > > "CodeDetails" tries to summarize the concerns with > > > > the message. > > > > The flag does not mention NPE so it can be used in > > > > other, similar cases. > > > > If there are not objections to the flag name, I'll file a > > > > CSR. Or should I wait with the CSR until the JEP is > > > > targeted? > > > > Best regards, > > > > Goetz. > > From thomas.stuefe at gmail.com Thu Jul 4 14:12:20 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 4 Jul 2019 16:12:20 +0200 Subject: RFR(xs): 8227031: Print NMT statistics on fatal errors Message-ID: Hi all, We have -XX:+-PrintNMTStatistics, a very useful switch which will cause the VM to print out the NMT statistics if the VM exits normally. Currently it does not work if the VM exits due to a fatal error. But especially in fatal exits due to native OOM a NMT report would be very helpful.
JBS: https://bugs.openjdk.java.net/browse/JDK-8227031 cr: http://cr.openjdk.java.net/~stuefe/webrevs/8227031-optionally-print-nmt-report-on-oom/webrev.00/webrev/index.html Changes in this patch: - handle PrintNMTStatistics on fatal error - make sure the final report is not called twice accidentally and it is not called recursively due to secondary error handling - change the Metaspace report portion of the NMT report to only include the brief metaspace report - that one can be called at any time, it does not lock nor require any resources. Please note: this will not work when we are in an OOM situation and request a detailed NMT report; that scenario needs more work since NMT detailed reports need memory as well. That is a separate issue. Thanks, Thomas From david.holmes at oracle.com Thu Jul 4 20:36:09 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 5 Jul 2019 06:36:09 +1000 Subject: RFR(S): 8227255: Switchable helpful NullPointerExceptions In-Reply-To: References: Message-ID: <8d4b125a-2c3b-bb84-6572-8cb903c68130@oracle.com> Hi Goetz, On 4/07/2019 8:59 pm, Lindenmaier, Goetz wrote: > Hi David, > > the implementation of the JEP is to be found here: > https://bugs.openjdk.java.net/browse/JDK-8218628 > http://cr.openjdk.java.net/~goetz/wr19/8218628-exMsg-NPE/12/ > > I thought it's good to keep different aspects in changes of their > own I expect to see one issue being used to push a complete implementation for a JEP. Independent parts unrelated to the actual JEP can be split off but anything dependent on the JEP should be done all together. > also as I need two CSRs: one to mention there are new messages > and one to mention there is a new flag. You can use one CSR, for 8218628, to cover all aspects. David ------ > It also simplifies reviews a lot. > > Best regards, > Goetz. > >> -----Original Message----- >> From: David Holmes >> Sent: Donnerstag, 4. 
Juli 2019 12:52 >> To: Lindenmaier, Goetz ; hotspot-runtime- >> dev at openjdk.java.net >> Cc: Coleen Phillimore (coleen.phillimore at oracle.com) >> >> Subject: Re: RFR(S): 8227255: Switchable helpful NullPointerExceptions >> >> Sorry Goetz but I am confused by this. You have a JEP that is still in >> draft but here you have a RFR for a change related to that JEP but not >> the implementation of that JEP ??? I only expect to see one issue filed >> to implement that JEP including the creation of the flag to >> enable/disable it. The introduction of the flag should be part of the >> JEP as well. >> >> That said you may as well get the CSR going in parallel with the JEP. >> >> David >> >> On 4/07/2019 8:35 pm, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> please review this small change. >>> >>> http://cr.openjdk.java.net/~goetz/wr19/8227255-NPE-switchable/01/ >>> >>> It will be part of JEP 8220715. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8220715 >>> >>> The exception messages proposed there will first be >>> >>> off per default. After gathering experience, they >>> >>> will be turned on per default. >>> >>> I was asked to use a manageable flag so it can be switched >>> >>> by jcmd. >>> >>> The flag: SuppressCodeDetailsInExceptionMessages >>> >>> "Suppress" because the feature is meant to be on per >>> >>> default in the long run. Then you'll have to >>> >>> use -XX:_+_ if using the switch. >>> >>> "CodeDetails" tries to summarize the concerns with >>> >>> the message. >>> >>> The flag does not mention NPE so it can be used in >>> >>> other, similar cases. >>> >>> If there are not objections to the flag name, I'll file a >>> >>> CSR. Or should I wait with the CSR until the JEP is >>> >>> targeted? >>> >>> Best regards, >>> >>> Goetz.
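For 8227031 (PrintNMTStatistics on fatal errors), Thomas's change list includes making sure the final report is not printed twice, nor recursively from secondary error handling. One common way to get that property is a once-only flag claimed with an atomic exchange. The sketch below is hypothetical, not the webrev's code: `print_nmt_statistics` stands in for the PrintNMTStatistics flag and a counter stands in for the report output:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard: true only for the first caller, so the final report
// runs at most once, even if error handling re-enters (for example, a
// secondary crash while the report itself is being printed).
static std::atomic<bool> g_final_report_started{false};

bool begin_final_report_once() {
  // exchange() returns the previous value, which is false only once.
  return !g_final_report_started.exchange(true);
}

int g_reports_printed = 0;  // stands in for the actual report output

void print_final_report_if_requested(bool print_nmt_statistics) {
  if (!print_nmt_statistics) return;
  if (!begin_final_report_once()) return;  // already printed or in progress
  ++g_reports_printed;
}
```

Claiming the flag before printing, rather than after, is what also suppresses re-entry from a secondary crash inside the report itself.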
>>> From david.holmes at oracle.com Fri Jul 5 01:54:49 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 5 Jul 2019 11:54:49 +1000 Subject: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) In-Reply-To: References: <4C4212D0-BFFF-4C85-ACC6-05200F220C3F@oracle.com> Message-ID: <63cb79fb-99d1-79e8-0f28-b67dfacd858f@oracle.com> Hi Daniil, Sorry I found it harder to get to this this week than I would have hoped, so I've asked a couple of other runtime folk to please take a look while I'm on vacation. I do have some comments though. First, you've based this off the ResolvedMethodTable code and it isn't clear to me that everything in that code necessarily maps to this code. For example: 68 static uintx get_hash(Value const& value, bool* is_dead) { 69 *is_dead = false; ... 167 bool equals(ThreadTableEntry **value, bool* is_dead) { 168 *is_dead = false; "is_dead" is not relevant for this code and should be deleted. Then I'm not at all clear about getting the serviceThread to resize/rehash the table. Is there a specific reason to do that or did you just copy what is done for the ResolvedMethodTable? The usage constraints of that table may be different to this one and require using the serviceThread where here we may not need to. The initialization in universe_init() may not be right for the ThreadsTable, which is logically encapsulated by ThreadsSMRSupport. Overall there is complexity in ResolvedMethodTable code that I don't grok and it isn't obvious to me that it is all needed here. Further, the ConcurrentHashTable was only added in Java 11 so this will still need an alternate implementation - as per the bug report - to backport to Java 8. The is_valid_java_thread you added to ThreadsList isn't really needed. I know you've copied the core of that logic from the linear search code, but it really doesn't apply when using the table given the way you keep the table up to date. 
If you find the JavaThread using a given tid then that is the JavaThread. There's a typo PMIMORDIAL_JAVA_TID. I'm unclear about the tid==1 handling, you say "ThreadsSMRSupport::add_thread() is not called for the primordial thread" but the main thread does have this called via Threads:add just like every other created or attached thread. And note this isn't generally the "primordial thread" (which is the initial thread of a process) but just the "main" thread used to load the JVM. Overall I'm concerned about the duplication/overlap that now exists between the ThreadsList and the ThreadsTable. Maybe it is unavoidable to get the hashed lookup, or perhaps there is some way to get the desired functionality without the overlap? I'm hoping Dan will be able to chime in on that (ideally Robbin too but he is away this month.). And of course we still need to check overall footprint and performance impact. (E.g. if we have large numbers of threads might the rehash cause observable pauses in the other cleanup activities that the service thread does?). Thanks, David ----- On 29/06/2019 2:39 pm, David Holmes wrote: > Hi Daniil, > > The definition and use of this hashtable (yet another hashtable > implementation!) will need careful examination. We have to be concerned > about the cost of maintaining it when it may never even be queried. You > would need to look at footprint cost and performance impact. > > Unfortunately I'm just about to board a plane and will be out for the > next few days. I will try to look at this asap next week, but we will > need a lot more data on it. > > Thanks, > David > > On 28/06/2019 6:31 pm, Daniil Titov wrote: >> Please review the change that improves performance of ThreadMXBean >> MXBean methods returning the >> information for specific threads. 
The change introduces the thread >> table that uses ConcurrentHashTable >> to store one-to-one the mapping between the thread ids and JavaThread >> objects and replaces the linear >> search over the thread list in >> ThreadsList::find_JavaThread_from_java_tid(jlong tid) method with the >> lookup >> in the thread table. >> >> Testing: Mach5 tier1,tier2 and tier3 tests successfully passed. >> >> Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 >> >> Thanks! >> >> Best regards, >> Daniil >> >> >> From jianglizhou at google.com Fri Jul 5 04:08:07 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Thu, 4 Jul 2019 21:08:07 -0700 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: <74d4d8b0-6e5f-d9b0-8500-530b42419496@oracle.com> References: <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> <74d4d8b0-6e5f-d9b0-8500-530b42419496@oracle.com> Message-ID: Hi Thomas and David, Acknowledged. Best regards, Jiangli On Wed, Jul 3, 2019, 11:53 PM David Holmes wrote: > > > On 4/07/2019 4:22 am, Jiangli Zhou wrote: > > Hi David, > > > > On Wed, Jul 3, 2019 at 12:00 AM David Holmes > wrote: > >> > >> Hi Jiangli, > >> > >> On 2/07/2019 8:33 am, Jiangli Zhou wrote: > >>> Hi David, > >>> > >>> On Mon, Jul 1, 2019 at 1:22 AM David Holmes > wrote: > >>>> > >>>> Hi Jiangli, > >>>> > >>>> On 29/06/2019 9:42 am, Jiangli Zhou wrote: > >>>>> Hi David, > >>>>> > >>>>> Thanks for the detailed comments! Here is the latest webrev: > >>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing > >>>>> for not including an incremental webrev (realized that when I almost > >>>>> done edits). 
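Daniil's 8185005 change replaces the linear search in ThreadsList::find_JavaThread_from_java_tid() with a lookup in a tid-to-JavaThread table. A simplified, hypothetical model of that idea using `std::unordered_map` behind a mutex (HotSpot's ConcurrentHashTable, which is lock-free for readers, is not reproduced here; `JavaThreadStub` and `ThreadIdTable` are invented names):

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

struct JavaThreadStub { std::int64_t tid; };  // hypothetical stand-in for JavaThread

class ThreadIdTable {
 public:
  // Must be called whenever a thread is added to the threads list.
  void add(JavaThreadStub* t) {
    assert(t != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    map_[t->tid] = t;
  }
  // Must be called whenever a thread is removed, or lookups go stale.
  void remove(std::int64_t tid) {
    std::lock_guard<std::mutex> guard(lock_);
    map_.erase(tid);
  }
  // Average O(1) lookup instead of a linear scan over all threads.
  JavaThreadStub* find(std::int64_t tid) const {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = map_.find(tid);
    return it == map_.end() ? nullptr : it->second;
  }

 private:
  mutable std::mutex lock_;
  std::unordered_map<std::int64_t, JavaThreadStub*> map_;
};
```

As David notes, a table like this must be updated on every thread add and remove whether or not anyone ever queries it, which is the footprint and maintenance cost the review asks to measure.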
> >>>> > >>>> That all looks fine - thanks for making the changes. > >>>> > >>>> However ... now that I see the logging output it occurred to me that > >>>> checking for the TLS adjustment is something that should only happen > >>>> once and we should be storing the adjustment amount in a static for > >>>> direct use. Sorry I didn't think about this earlier. Something like: > >>> > >>> No problem at all! I actually thought in the same direction as well > >>> when making the change initially but didn't go with it as I had some > >>> concerns. Florian's latest reply is reassuring (thanks again!). So > >>> here is the update: > >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/. > >> > >> The incremental change looks fine to me. To address Florian's concern I > >> suggest adding an additional comment: > >> > >> // Returns the size of the static TLS area glibc puts on thread > stacks. > >> + // The value is cached on first use, which occurs when the first > thread > >> + // is created during VM initialization. > >> static size_t get_static_tls_area_size(const pthread_attr_t *attr) { > > > > > > Done. > > > >> > >>> Since we are settling down on the approach and final implementation > >>> details, it would be a good idea to get the CSR ball rolling. Could > >>> you please review and I'll finalize the CSR. Thanks! > >> > >> Done. > > > > Thank you! > > > > As you, Florian, Thomas all made great contributions to this > > workaround, I should list all of you as both contributors and > > reviewers in the changeset. If there is any objection, please let me > > know. > > It's not necessary for me, but I appreciate the consideration. 
> > Cheers, > David > > > > > Best regards, > > Jiangli > > > >> > >> Thanks, > >> David > >> > >>> Best regards, > >>> Jiangli > >>> > >>>> > >>>> static size_t tls_size = 0; > >>>> static bool tls_size_inited = false; > >>>> > >>>> static size_t get_static_tls_area_size(const pthread_attr_t *attr) { > >>>> + if (!tls_size_inited) { > >>>> + tls_size_inited = true; > >>>> if (_get_minstack_func != NULL) { > >>>> size_t minstack_size = _get_minstack_func(attr); > >>>> ... > >>>> if (minstack_size > (size_t)os::vm_page_size() + > >>>> PTHREAD_STACK_MIN) { > >>>> tls_size = minstack_size - os::vm_page_size() - > PTHREAD_STACK_MIN; > >>>> } > >>>> } > >>>> + } > >>>> log_info(os, thread)("Stack size adjustment for TLS is " > SIZE_FORMAT, > >>>> tls_size); > >>>> return tls_size; > >>>> } > >>>> > >>>> Or even fold it all into get_minstack_init() ? > >>>> > >>>> I'm assuming that the result of __pthread_get_minstack wont' change > over > >>>> time of course. > >>>> > >>>> Thanks, > >>>> David > >>>> ----- > >>>> > >>>>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes < > david.holmes at oracle.com> wrote: > >>>>>> > >>>>>> Hi Jiangli, > >>>>>> > >>>>>> This is very well written up - thanks. > >>>>>> > >>>>>> I apologize in advance that I'm about to traveling for a few days so > >>>>>> won't be able to respond further until next week. > >>>>> > >>>>> Hope you have a relaxed and safe travel. This can wait. > >>>>> > >>>>>> > >>>>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote: > >>>>>>> Updated webrev: > >>>>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/ > >>>>>> > >>>>>> Overall changes look good. I also have a concern around this code: > >>>>>> > >>>>>> + tls_size = minstack_size - os::vm_page_size() - > PTHREAD_STACK_MIN; > >>>>>> + assert(tls_size > 0, "unexpected size"); > >>>>>> > >>>>>> In addition to Thomas's comments re-signedness, can't the result == > 0 if > >>>>>> there is no sttaic TLS in use? Or is there always some static TLS > in use? 
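The quoted code computes the TLS adjustment as minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN, and the review raises both signedness and whether the result can be 0. A hypothetical sketch of the same arithmetic, guarded so that the unsigned subtraction cannot wrap, plus a compute-once cache in the style David proposes. The parameters stand in for the values the patch obtains from __pthread_get_minstack(), os::vm_page_size() and PTHREAD_STACK_MIN; the function names are invented:

```cpp
#include <cassert>
#include <cstddef>

// Subtracting only when the minimum stack exceeds guard page plus
// PTHREAD_STACK_MIN avoids the wrap-around an unconditional size_t
// subtraction would produce.
std::size_t static_tls_adjustment(std::size_t minstack_size,
                                  std::size_t page_size,
                                  std::size_t pthread_stack_min) {
  assert(page_size > 0);
  if (minstack_size > page_size + pthread_stack_min) {
    return minstack_size - page_size - pthread_stack_min;
  }
  return 0;  // nothing beyond guard page and minimum stack to account for
}

// Caches the first result. Like the quoted proposal, this is not
// thread-safe by itself: it assumes the first call happens while the
// VM is still single-threaded.
std::size_t cached_tls_adjustment(std::size_t minstack_size,
                                  std::size_t page_size,
                                  std::size_t pthread_stack_min) {
  static bool inited = false;
  static std::size_t tls_size = 0;
  if (!inited) {
    inited = true;
    tls_size = static_tls_adjustment(minstack_size, page_size, pthread_stack_min);
  }
  return tls_size;
}
```

Later calls ignore their arguments, which is only correct under the thread's stated assumption that the glibc minimum stack size does not change over the life of the process.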
> >>>>> > >>>>> On both glibc 2.24 and 2.28, by default I see there is one page > memory > >>>>> for static TLS, without explicitly defining any __thread variables in > >>>>> user code. > >>>>> > >>>>>> > >>>>>> A few other comments/requests: > >>>>>> > >>>>>> 822 static void get_minstack_init() { > >>>>>> 823 _get_minstack_func = > >>>>>> 824 (GetMinStack)dlsym(RTLD_DEFAULT, > "__pthread_get_minstack"); > >>>>>> 825 } > >>>>>> > >>>>>> Can you add a logging statement please: > >>>>>> > >>>>>> log(os, thread)("Lookup of __pthread_get_minstack %s", > >>>>>> _get_minstack_func == NULL ? "failed" : > "succeeded"); > >>>>> > >>>>> Added log info. > >>>>> > >>>>>> > >>>>>> -- > >>>>>> > >>>>>> 884 // In the Linux NPTL pthread implementation the guard > size mechanism > >>>>>> > >>>>>> Now that we have additional information on this could you update > this > >>>>>> old comment to say > >>>>>> > >>>>>> // In glibc versions prior to 2.7 the guard size mechanism > >>>>> > >>>>> Done. > >>>>> > >>>>>> > >>>>>> -- > >>>>>> > >>>>>> 897 // Adjust the stack_size by adding the on-stack TLS > size if > >>>>>> 898 // AdjustStackSizeForTLS is true. The guard size is > already > >>>>>> 899 // accounted in this case, please see comments in > >>>>>> 900 // get_static_tls_area_size(). > >>>>>> > >>>>>> Given the extensive commentary in get_static_tls_area_size() can we > be > >>>>>> more brief here and just say: > >>>>>> > >>>>>> // Adjust the stack size for on-stack TLS - see > get_static_tls_area_size(). > >>>>> > >>>>> Done. > >>>>> > >>>>>> > >>>>>> -- > >>>>>> > >>>>>> 5193 get_minstack_init(); > >>>>>> > >>>>>> Can you make this conditional on AdjustStackSizeForTLS please so > there > >>>>>> is no affect when not using the flag - thanks. > >>>>> > >>>>> Done. 
> >>>>> > >>>>>> > >>>>>> -- > >>>>>> > >>>>>> 855 } > >>>>>> 856 return tls_size; > >>>>>> > >>>>>> Can you insert a logging statement: > >>>>>> > >>>>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT, > >>>>>> tls_size); > >>>>> > >>>>> Added. > >>>>> > >>>>>> > >>>>>> --- > >>>>>> > >>>>>> test/hotspot/jtreg/runtime/TLS/T.java > >>>>>> > >>>>>> 37 // Starting a ProcessBuilder causes the process > reaper > >>>>>> thread being > >>>>>> > >>>>>> s/being/to be/ > >>>>> > >>>>> Fixed. > >>>>> > >>>>>> > >>>>>> 43 // failure mode the VM fails to create thread > with > >>>>>> error message > >>>>>> > >>>>>> s/create thread/create a thread/ > >>>>> > >>>>> Fixed. > >>>>> > >>>>>> > >>>>>> 53 System.out.println("Unexpected Echo > output: " + > >>>>>> echoOutput + > >>>>>> 54 ", expects: " + > echoInput); > >>>>>> > >>>>>> should this be an exception so that the test fails? I can't imagine how > echo > >>>>>> would fail but probably better to fail the test if something > unexpected > >>>>>> happens. > >>>>> > >>>>> If no expected output is obtained from echo (due to a ProcessBuilder > >>>>> failure caused by the TLS issue), the test does fail and reports to the > >>>>> caller (returns false). If we throw an explicit exception, it will > >>>>> be caught by the outer try/catch, which seems to be unnecessary. > >>>>> > >>>>>> > >>>>>> 66 try { > >>>>>> 67 br = new BufferedReader(new > >>>>>> InputStreamReader(inputStream)); > >>>>>> 68 s = br.readLine(); > >>>>>> 69 } finally { > >>>>>> 70 br.close(); > >>>>>> 71 } > >>>>>> > >>>>>> This could use try-with-resources for both streams. > >>>>> > >>>>> Sounds good. Done. > >>>>> > >>>>>> > >>>>>> --- > >>>>>> > >>>>>> exestack-tls.c > >>>>>> > >>>>>> It's simpler if no argument means no-tls and an argument means tls.
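For the `exestack-tls.c` launcher, two review points can be sketched together: the argument convention and the fixed 4096-byte classpath buffer. This is written in C++ for illustration rather than in the C of the actual test, and `wants_tls` and `classpath_option` are hypothetical helper names.

```cpp
#include <string>

// Convention suggested in the review: no argument means "no TLS",
// any argument means "TLS".
bool wants_tls(int argc) { return argc > 1; }

// Builds the -Djava.class.path option without a fixed-size buffer, so a
// long CLASSPATH is never truncated. An unset CLASSPATH (nullptr) is
// treated as empty rather than being passed to a %s conversion.
std::string classpath_option(const char* classpath) {
  return std::string("-Djava.class.path=") +
         (classpath != nullptr ? classpath : "");
}
```

In a launcher one would call `classpath_option(getenv("CLASSPATH"))`, keep the resulting `std::string` alive until `JNI_CreateJavaVM` returns, and pass `&opt[0]` as the `char*` that `JavaVMOption::optionString` expects.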
> >>>>>> > >>>>>> 42 char classpath[4096]; > >>>>>> 43 snprintf(classpath, sizeof classpath, > >>>>>> 44 "-Djava.class.path=%s", getenv("CLASSPATH")); > >>>>>> 45 options[0].optionString = classpath; > >>>>>> > >>>>>> Do we need to explicitly set the classpath? I'm concerned that our > test > >>>>>> environment uses really, really long paths and a number of them. > >>>>>> (Probably not 4096 but still ...) > >>>>> > >>>>> The classpath needs to be set so we know where to load the test > class. > >>>>> Given that we use 4096 for the same type of usage in other existing > >>>>> test(s) (for example StackGap), it probably is okay for our test > >>>>> environments? I can increase the array size if we want to be extra > >>>>> safe ... > >>>>> > >>>>>> > >>>>>> > >>>>>> test/hotspot/jtreg/runtime/TLS/testtls.sh > >>>>>> > >>>>>> 40 if [ "${VM_OS}" != "linux" ] > >>>>>> 41 then > >>>>>> 42 echo "Test is only valid for Linux" > >>>>>> 43 exit 0 > >>>>>> 44 fi > >>>>>> > >>>>>> This should be done via "@requires os.family == "linux"" > >>>>> > >>>>> Done. > >>>>> > >>>>> Thanks! > >>>>> > >>>>> Best regards, > >>>>> Jiangli > >>>>>> > >>>>>> Thanks, > >>>>>> David > >>>>>> ----- > >>>>>> > >>>>>>> Thanks for everyone's contribution on carving out the current > workaround! > >>>>>>> > >>>>>>> Best regards, > >>>>>>> Jiangli > >>>>>>> > >>>>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou < > jianglizhou at google.com> wrote: > >>>>>>>> > >>>>>>>> Thank you Thomas and David! Glad to see that we are converging on > an > >>>>>>>> acceptable approach here. I'll try to factor in all the latest > inputs > >>>>>>>> from everyone and send out a new update. > >>>>>>>> > >>>>>>>> Thanks and best regards, > >>>>>>>> Jiangli > >>>>>>>> > >>>>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes < > david.holmes at oracle.com> wrote: > >>>>>>>>> > >>>>>>>>> Trimming ....
> >>>>>>>>> > >>>>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote: > >>>>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer < > fweimer at redhat.com> wrote: > >>>>>>>>>>> I think you can handle the guard size in this way: > >>>>>>>>>>> > >>>>>>>>>>> pthread_attr_setguardsize(&attr, guard_size); > >>>>>>>>>>> > >>>>>>>>>>> size_t stack_adjust_size = 0; > >>>>>>>>>>> if (AdjustStackSizeForTLS) { > >>>>>>>>>>> size_t minstack_size = get_minstack(&attr); > >>>>>>>>>>> size_t tls_size = minstack_size - vm_page_size() - > PTHREAD_STACK_MIN; > >>>>>>>>>>> // In glibc before 2.27, tls_size still includes > guard_size. > >>>>>>>>>>> // In glibc 2.27 and later, guard_size is > automatically > >>>>>>>>>>> // added to the stack size by pthread_create. > >>>>>>>>>>> // In both cases, the guard size is taken into > account. > >>>>>>>>>>> stack_adjust_size += tls_size; > >>>>>>>>>>> } else { > >>>>>>>>>>> stack_adjust_size += guard_size; > >>>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as > others > >>>>>>>>>> are okay with the above suggested adjustment, it looks good to > me. > >>>>>>>>>> Thomas, David and others, any objection? > >>>>>>>>> > >>>>>>>>> I find the above acceptable. I've been waiting for the dust to > settle. > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> David > >>>>>>>>> > >>>>>>>>>> Thanks and best regards, > >>>>>>>>>> Jiangli > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Florian > From daniel.daugherty at oracle.com Fri Jul 5 16:34:18 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 5 Jul 2019 12:34:18 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <550c9f2f-66d5-4269-db67-ee2818b898c8@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <550c9f2f-66d5-4269-db67-ee2818b898c8@oracle.com> Message-ID: Hi Serguei! Thanks for the quick review. I didn't expect any U.S. 
reviewers to chime in until after the 4th of July holiday... On 7/3/19 11:34 PM, serguei.spitsyn at oracle.com wrote: > Hi Dan, > > Thank you and Robbin for discovering and fixing this! > The fix looks good to me. Thanks! > It is nice to have this new logging. Yup. Will be very useful if we decide to write a test to verify this isolated bit of expected behavior... :-) > Should we fix this bug in 13 first, or do you consider it risky? No, I don't consider it risky, but a number of things have come up from the other reviewers so let's see where the dust settles... Dan > > Thanks, > Serguei > > > On 7/3/19 7:04 PM, Daniel D. Daugherty wrote: >> Greetings, >> >> Robbin recently discovered this issue with Thread Local Handshakes. >> Since >> he's not available at the moment, I'm handling the issue: >> >> JDK-8227117 normal interpreter table is not restored after single >> stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> When using Thread Local Handshakes, the normal interpreter table is >> not restored after single stepping. This issue is caused by the >> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >> restore the normal interpreter table for the "off" case. >> >> Prior to Thread Local Handshakes, this was a valid assumption to make. >> SafepointSynchronize::end() has been refactored into >> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >> on the global safepoint branch. That matches up with the call to >> Interpreter::notice_safepoints() that is also on the global safepoint >> branch. >> >> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >> to call Interpreter::ignore_safepoints() directly. >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >> >> The fix is just a small addition to VM_ChangeSingleStep::doit(): >> >>   if (_on) { >>     Interpreter::notice_safepoints(); >> +  } else { >> +
Interpreter::ignore_safepoints(); >>   } >> >> Everything else is just new logging support for future debugging of >> interpreter table management and single stepping. >> >> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >> Mach5 Tier[4-6] on standard Oracle platforms is running now. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> > From daniel.daugherty at oracle.com Fri Jul 5 17:07:54 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 5 Jul 2019 13:07:54 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> Message-ID: On 7/4/19 3:10 AM, Erik Österlund wrote: > Hi Dan, > > Thanks for picking this up. The change looks good. Thanks! Of course, just the size of the comment below makes me wonder what I got myself into... :-) And I was so happy that the non-logging part of the fix was an else-statement with _one_ line... > However, when reviewing this, I looked at the code for actually > restoring the table (ignore/notice safepoints). It copies the dispatch > table for the interpreter. There is a comment stating it is important > the copying is atomic for MT-safety, and I can definitely see why. > However, the copying on the line after that comment is in fact not atomic. Actually, the comment doesn't mention 'atomic', but that's probably because the code and the comment are very, very old. It mentions 'word wise for MT safety' and I agree that 'atomic' is what the person likely meant... The history: $ sgv src/share/vm/interpreter/templateInterpreter.cpp | grep 'The copy has to occur word wise for MT safety' 1.1 // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
$ sp -r1.1 src/share/vm/interpreter/templateInterpreter.cpp src/share/vm/interpreter/SCCS/s.templateInterpreter.cpp: D 1.1 07/08/29 13:42:26 sgoldman 1 0 00600/00000/00000 MRs: COMMENTS: 6571248 - continuation_for is specialized for template interpreter Hmmm... I expected that comment to be even older... ahhhh... a little more poking around and I found: $ sgv -r1.147 src/share/vm/interpreter/interpreter.cpp | grep 'The copy has to occur word wise for MT safety' 1.147 // Copy non-overlapping tables. The copy has to occur word wise for MT safety. $ sp -r1.147 src/share/vm/interpreter/interpreter.cpp src/share/vm/interpreter/SCCS/s.interpreter.cpp: D 1.147 99/02/17 10:14:36 steffen 235 233 00008/00002/00762 MRs: COMMENTS: This makes more sense (timeline wise) and dates back to when all of the interpreter was in vm/interpreter/interpreter.cpp. > Here is the copying code in templateInterpreter.cpp: > > static inline void copy_table(address* from, address* to, int size) { >   // Copy non-overlapping tables. The copy has to occur word wise for > MT safety. >   while (size-- > 0) *to++ = *from++; > } > > Copying using a loop of non-volatile loads and stores can and > definitely will on some compilers turn into memcpy calls instead as > the compiler (correctly) considers that an equivalent transformation. Yet another C++ compiler optimization land mine... sigh... > And memcpy does not guarantee atomicity. Indeed on some platforms it > is not atomic. On some platforms it will even enjoy out-of-thin-air > values. That last bit is scary... > Perhaps Copy::disjoint_words_atomic() would be a better choice for > atomic word copying. If not, at the very least we should use > Atomic::load/store here. Copy::disjoint_words_atomic() sounds appealing... For those folks that aren't familiar with this part of safepointing... SafepointSynchronize::arm_safepoint() calls Interpreter::notice_safepoints() which calls copy_table().
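Erik's suggestion to use word-sized atomic loads and stores instead of a plain loop can be illustrated with standard C++ atomics. This is a sketch under assumptions, not HotSpot code. HotSpot itself would use `Atomic::load`/`Atomic::store` or `Copy::disjoint_words_atomic()`. Here `std::atomic<address>` plays that role, and `address` mirrors HotSpot's `unsigned char*` typedef. Each slot is read and written as one atomic access, so a thread that dispatches through a slot while the copy is in progress sees either the old or the new entry, never a torn value.

```cpp
#include <atomic>
#include <cstddef>

using address = unsigned char*;  // mirrors HotSpot's address typedef

// Copies `size` dispatch-table slots one word at a time. Unlike the plain
// `*to++ = *from++` loop, each slot access here is a std::atomic
// operation. The compiler may not replace atomic accesses with a memcpy
// call, and a concurrent reader of to[i] observes either its old or its
// new value.
void copy_table_atomic(const std::atomic<address>* from,
                       std::atomic<address>* to, std::size_t size) {
  for (std::size_t i = 0; i < size; i++) {
    to[i].store(from[i].load(std::memory_order_relaxed),
                std::memory_order_relaxed);
  }
}
```

Relaxed ordering gives the per-slot guarantee but does not switch the whole table atomically. A concurrent reader can still see a mix of old and new slots, which is presumably acceptable here because each individual entry is a valid entry point. On mainstream 64-bit platforms `std::atomic<unsigned char*>` is lock-free; in C++17 this can be checked with `std::atomic<address>::is_always_lock_free`.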
So we're not at a safepoint yet, and, in fact, we're trying to bring those pesky JavaThreads to a safepoint... SafepointSynchronize::disarm_safepoint() calls Interpreter::ignore_safepoints() which also calls copy_table(). However, we do that before we wake the JavaThreads that are blocked for the safepoint, so that use of copy_table is safe:   // Release threads lock, so threads can be created/destroyed again.   Threads_lock->unlock();   // Wake threads after local state is correctly set.   _wait_barrier->disarm(); } The 'Threads_lock->unlock()' should synchronize memory so that the restored table is properly synced out to memory... > Having said that, the fix for that issue seems like a separate RFE, > because it has been sitting there for a lot longer than TLH has been > around. Yes, I would like to track the copy_table() issue as a separate bug (not an RFE). I'll file a follow-up bug after the dust settles for 8227117. Thanks again for the review! Dan > > Thanks, > /Erik > > On 2019-07-04 04:04, Daniel D. Daugherty wrote: >> Greetings, >> >> Robbin recently discovered this issue with Thread Local Handshakes. >> Since >> he's not available at the moment, I'm handling the issue: >> >> JDK-8227117 normal interpreter table is not restored after >> single stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> When using Thread Local Handshakes, the normal interpreter table is >> not restored after single stepping. This issue is caused by the >> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >> restore the normal interpreter table for the "off" case. >> >> Prior to Thread Local Handshakes, this was a valid assumption to make. >> SafepointSynchronize::end() has been refactored into >> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >> on the global safepoint branch. That matches up with the call to >> Interpreter::notice_safepoints() that is also on the global safepoint >> branch.
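Dan's ordering argument is to rewrite the table while the threads are still parked, then release a lock and wake them. That argument can be modeled with a mutex and a condition variable. This is a simplified model, not HotSpot code. The readers here re-acquire the same mutex the writer releases, which gives a happens-before edge by construction. In HotSpot the parked threads are released by `_wait_barrier->disarm()`, and whether that path gives the same guarantee is an assumption this sketch does not verify. `ParkedWorld` and `run_park_and_restore` are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Toy model: a "VM thread" rewrites a plain, non-atomic table while the
// "Java threads" are parked, then unlocks and wakes them.
struct ParkedWorld {
  std::mutex lock;
  std::condition_variable wake;
  bool disarmed = false;
  int table[4] = {1, 1, 1, 1};  // "safepoint" entries
};

// Returns the sum of what every reader saw after waking up.
int run_park_and_restore(int num_readers) {
  ParkedWorld w;
  std::vector<int> seen(num_readers, 0);
  std::vector<std::thread> readers;
  for (int r = 0; r < num_readers; r++) {
    readers.emplace_back([&w, &seen, r] {
      std::unique_lock<std::mutex> guard(w.lock);
      w.wake.wait(guard, [&w] { return w.disarmed; });  // parked
      int sum = 0;
      for (int v : w.table) sum += v;  // reads happen after the restore
      seen[r] = sum;
    });
  }
  {
    std::lock_guard<std::mutex> guard(w.lock);
    for (int& v : w.table) v = 2;  // restore "normal" entries while parked
    w.disarmed = true;
  }
  w.wake.notify_all();  // wake only after the table is fully rewritten
  for (auto& t : readers) t.join();
  int total = 0;
  for (int s : seen) total += s;
  return total;
}
```

The key design point is the order of the final steps. All table writes complete inside the critical section before `notify_all()`, so no reader can run against a partially restored table.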
>> >> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >> to call Interpreter::ignore_safepoints() directly. >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >> >> The fix is just a small addition to VM_ChangeSingleStep::doit(): >> >>   if (_on) { >>     Interpreter::notice_safepoints(); >> +  } else { >> +    Interpreter::ignore_safepoints(); >>   } >> >> Everything else is just new logging support for future debugging of >> interpreter table management and single stepping. >> >> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >> Mach5 Tier[4-6] on standard Oracle platforms is running now. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> From daniel.daugherty at oracle.com Fri Jul 5 17:12:00 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 5 Jul 2019 13:12:00 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> Message-ID: On 7/4/19 3:13 AM, David Holmes wrote: > Hi Dan, > > On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: >> Greetings, >> >> Robbin recently discovered this issue with Thread Local Handshakes. >> Since >> he's not available at the moment, I'm handling the issue: >> >> JDK-8227117 normal interpreter table is not restored after >> single stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> When using Thread Local Handshakes, the normal interpreter table is >> not restored after single stepping. This issue is caused by the >> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >> restore the normal interpreter table for the "off" case. > > So the result of this is that debugging tests may run more slowly > overall?
Not just tests. An interactive debugging session would also be affected. After single stepping once, we never switch back to the normal table, so we're stuck with the safepoint interpreter dispatch table. > >> Prior to Thread Local Handshakes, this was a valid assumption to make. >> SafepointSynchronize::end() has been refactored into >> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >> on the global safepoint branch. That matches up with the call to >> Interpreter::notice_safepoints() that is also on the global safepoint >> branch. >> >> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >> to call Interpreter::ignore_safepoints() directly. >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >> >> The fix is just a small addition to VM_ChangeSingleStep::doit(): >> >>   if (_on) { >>     Interpreter::notice_safepoints(); >> +  } else { >> +    Interpreter::ignore_safepoints(); >>   } > > Looks good - thanks for the detailed analysis in the bug report. > > I have one additional request from looking at related code - can you > fix this confused initializer: > > VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >   : _on(on != 0) > { > } Yes, I can fix that. > as _on and on are both bool the assignment can be direct and we > shouldn't be comparing a bool to 0 as a matter of style. Thanks. > >> Everything else is just new logging support for future debugging of >> interpreter table management and single stepping. > > Logging looks good too. Thanks. Thanks for the review. Dan > > Thanks, > David > ----- > >> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >> Mach5 Tier[4-6] on standard Oracle platforms is running now. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> From daniel.daugherty at oracle.com Fri Jul 5 17:13:51 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Fri, 5 Jul 2019 13:13:51 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> Message-ID: <6a7a3365-9ab4-6d9b-84f8-24feca86a730@oracle.com> On 7/4/19 3:17 AM, David Holmes wrote: > Hi Erik, > > On 4/07/2019 5:10 pm, Erik Österlund wrote: >> Hi Dan, >> >> Thanks for picking this up. The change looks good. >> >> However, when reviewing this, I looked at the code for actually >> restoring the table (ignore/notice safepoints). It copies the >> dispatch table for the interpreter. There is a comment stating it is >> important the copying is atomic for MT-safety, and I can definitely >> see why. However, the copying on the line after that comment is in fact >> not atomic. > > Is it assuming "atomicity" by virtue of executing at a safepoint? Copying part of my reply to Erik here: SafepointSynchronize::arm_safepoint() calls Interpreter::notice_safepoints() which calls copy_table(). So we're not at a safepoint yet, and, in fact, we're trying to bring those pesky JavaThreads to a safepoint... SafepointSynchronize::disarm_safepoint() calls Interpreter::ignore_safepoints() which also calls copy_table(). However, we do that before we wake the JavaThreads that are blocked for the safepoint, so that use of copy_table is safe:   // Release threads lock, so threads can be created/destroyed again.   Threads_lock->unlock();   // Wake threads after local state is correctly set.   _wait_barrier->disarm(); } The 'Threads_lock->unlock()' should synchronize memory so that the restored table is properly synced out to memory... Dan > > David > ----- > >> Here is the copying code in templateInterpreter.cpp: >> >> static inline void copy_table(address* from, address* to, int size) { >>   // Copy non-overlapping tables. The copy has to occur word wise >> for MT safety. >>
while (size-- > 0) *to++ = *from++; >> } >> >> Copying using a loop of non-volatile loads and stores can and >> definitely will on some compilers turn into memcpy calls instead as >> the compiler (correctly) considers that an equivalent transformation. >> And memcpy does not guarantee atomicity. Indeed on some platforms it >> is not atomic. On some platforms it will even enjoy out-of-thin-air >> values. Perhaps Copy::disjoint_words_atomic() would be a better >> choice for atomic word copying. If not, at the very least we should >> use Atomic::load/store here. >> >> Having said that, the fix for that issue seems like a separate RFE, >> because it has been sitting there for a lot longer than TLH has been >> around. >> >> Thanks, >> /Erik >> >> On 2019-07-04 04:04, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> Robbin recently discovered this issue with Thread Local Handshakes. >>> Since >>> he's not available at the moment, I'm handling the issue: >>> >>> JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> When using Thread Local Handshakes, the normal interpreter table is >>> not restored after single stepping. This issue is caused by the >>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>> restore the normal interpreter table for the "off" case. >>> >>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>> SafepointSynchronize::end() has been refactored into >>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>> on the global safepoint branch. That matches up with the call to >>> Interpreter::notice_safepoints() that is also on the global safepoint >>> branch. >>> >>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>> to call Interpreter::ignore_safepoints() directly.
>>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>> >>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>> >>>   if (_on) { >>>     Interpreter::notice_safepoints(); >>> +  } else { >>> +    Interpreter::ignore_safepoints(); >>>   } >>> >>> Everything else is just new logging support for future debugging of >>> interpreter table management and single stepping. >>> >>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>> >>> Thanks, in advance, for questions, comments or suggestions. >>> >>> Dan >>> From daniel.daugherty at oracle.com Fri Jul 5 17:16:54 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 5 Jul 2019 13:16:54 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <6cc70adc-bcfe-505e-9c50-db7b10933613@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> <6cc70adc-bcfe-505e-9c50-db7b10933613@oracle.com> Message-ID: <5afde71a-ea23-6a5c-b1c7-77f9ac1cd83f@oracle.com> On 7/4/19 3:18 AM, David Holmes wrote: > PS. I just noticed this comment: > > // This change must always be occur when at a safepoint. > // Being at a safepoint causes the interpreter to use the > // safepoint dispatch table which we overload to find single > // step points. Just to be sure that it has been set, we > // call notice_safepoints when turning on single stepping. > // When we leave our current safepoint, should_post_single_step > // will be checked by the interpreter, and the table kept > // or changed accordingly. > void VM_ChangeSingleStep::doit() { > > The "when we leave the safepoint" part is actually the bug that is > being fixed - right? So the comment is not accurate.
I'll take a closer look at this part of the comment: // When we leave our current safepoint, should_post_single_step // will be checked by the interpreter, and the table kept // or changed accordingly. and figure out how to clarify it as part of this change. Dan > > David > ----- > > On 4/07/2019 5:13 pm, David Holmes wrote: >> Hi Dan, >> >> On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> Robbin recently discovered this issue with Thread Local Handshakes. >>> Since >>> he's not available at the moment, I'm handling the issue: >>> >>> JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> When using Thread Local Handshakes, the normal interpreter table is >>> not restored after single stepping. This issue is caused by the >>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>> restore the normal interpreter table for the "off" case. >> >> So the result of this is that debugging tests may run more slowly >> overall? >> >>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>> SafepointSynchronize::end() has been refactored into >>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>> on the global safepoint branch. That matches up with the call to >>> Interpreter::notice_safepoints() that is also on the global safepoint >>> branch. >>> >>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>> to call Interpreter::ignore_safepoints() directly. >>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>> >>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>> >>>   if (_on) { >>>     Interpreter::notice_safepoints(); >>> +  } else { >>> +    Interpreter::ignore_safepoints(); >>>   } >> >> Looks good - thanks for the detailed analysis in the bug report.
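A toy model of the fix makes the missing branch easy to see. The names below (`ToyInterpreter`, `ToyChangeSingleStep`) are illustrative only; the real `VM_ChangeSingleStep` and `Interpreter` are considerably more involved. The model assumes a single active table that is overwritten with either the normal or the safepoint variant. Its constructor uses the direct `_on(on)` initialization that this review asks for on the `bool` member.

```cpp
#include <array>

// Toy dispatch-table model: the active table is overwritten with either
// the normal variant (all 0) or the safepoint/single-step variant (all 1).
struct ToyInterpreter {
  static constexpr int kEntries = 4;
  std::array<int, kEntries> normal{};
  std::array<int, kEntries> safepoint{};
  std::array<int, kEntries> active{};

  ToyInterpreter() {
    safepoint.fill(1);
    active = normal;
  }
  void notice_safepoints() { active = safepoint; }
  void ignore_safepoints() { active = normal; }
  bool uses_safepoint_table() const { return active == safepoint; }
};

class ToyChangeSingleStep {
 public:
  explicit ToyChangeSingleStep(bool on) : _on(on) {}  // bool to bool, no `!= 0`

  void doit(ToyInterpreter& interp) const {
    if (_on) {
      interp.notice_safepoints();
    } else {
      // The branch added by the fix. Without it, turning single stepping
      // off leaves the safepoint table active, because nothing else
      // restores the normal table in this model.
      interp.ignore_safepoints();
    }
  }

 private:
  const bool _on;
};
```

Dropping the `else` branch from `doit()` in this model leaves `uses_safepoint_table()` true after single stepping is turned off. That corresponds to the "stuck on the safepoint dispatch table" symptom described in this thread.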
>> >> I have one additional request from looking at related code - can you >> fix this confused initializer: >> >> VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >>   : _on(on != 0) >> { >> } >> >> as _on and on are both bool the assignment can be direct and we >> shouldn't be comparing a bool to 0 as a matter of style. Thanks. >> >>> Everything else is just new logging support for future debugging of >>> interpreter table management and single stepping. >> >> Logging looks good too. >> >> Thanks, >> David >> ----- >> >>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>> >>> Thanks, in advance, for questions, comments or suggestions. >>> >>> Dan >>> From daniel.daugherty at oracle.com Fri Jul 5 17:37:28 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 5 Jul 2019 13:37:28 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <89c1417e-c606-3145-0d70-af062d4a8fbc@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> <89c1417e-c606-3145-0d70-af062d4a8fbc@oracle.com> Message-ID: <0eefb994-2973-9a36-bea6-fdc30cb18eb2@oracle.com> On 7/4/19 4:08 AM, Erik Österlund wrote: > Hi David, > > When you run without TLH, this copying mechanism is used to > synchronize the safepoint while JavaThreads are running. The > interpreter doesn't emit any polls then. Instead it clobbers the > dispatch table. JavaThreads will be reading from the dispatch table > while it is being (non-atomically) modified. That could crash.
For > example with the Solaris + studio + SPARC - TLH configuration, the > compiler will almost certainly emit a memcpy (this transformation has > been observed in practice), the memcpy will use BIS instructions > (observed in practice) for performance, with out-of-thin-air values > (observed in practice), and the JavaThreads will occasionally crash > during safepoint synchronization due to said out-of-thin-air values. > > So I guess the problem might have been larger back when TLH was not the default. > But this seems conceptually wrong. It could also be that our older compilers on Solaris weren't making this transformation prior to TLH and we were less exposed to the possibility of the race. TLH landed in JDK10 (JDK-8189941). As far as I can tell, it looks like TLH was enabled by default on platforms that could support it when it landed. Dan > > /Erik > > On 2019-07-04 09:17, David Holmes wrote: >> Hi Erik, >> >> On 4/07/2019 5:10 pm, Erik Österlund wrote: >>> Hi Dan, >>> >>> Thanks for picking this up. The change looks good. >>> >>> However, when reviewing this, I looked at the code for actually >>> restoring the table (ignore/notice safepoints). It copies the >>> dispatch table for the interpreter. There is a comment stating it is >>> important the copying is atomic for MT-safety, and I can definitely >>> see why. However, the copying on the line after that comment is in fact >>> not atomic. >> >> Is it assuming "atomicity" by virtue of executing at a safepoint? >> >> David >> ----- >> >>> Here is the copying code in templateInterpreter.cpp: >>> >>> static inline void copy_table(address* from, address* to, int size) { >>>   // Copy non-overlapping tables. The copy has to occur word wise >>> for MT safety. >>>   while (size-- > 0) *to++ = *from++; >>> } >>> >>> Copying using a loop of non-volatile loads and stores can and >>> definitely will on some compilers turn into memcpy calls instead as >>> the compiler (correctly) considers that an equivalent >>> transformation.
And memcpy does not guarantee atomicity. Indeed on >>> some platforms it is not atomic. On some platforms it will even >>> enjoy out-of-thin-air values. Perhaps Copy::disjoint_words_atomic() >>> would be a better choice for atomic word copying. If not, at the >>> very least we should use Atomic::load/store here. >>> >>> Having said that, the fix for that issue seems like a separate RFE, >>> because it has been sitting there for a lot longer than TLH has been >>> around. >>> >>> Thanks, >>> /Erik >>> >>> On 2019-07-04 04:04, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> Robbin recently discovered this issue with Thread Local Handshakes. >>>> Since >>>> he's not available at the moment, I'm handling the issue: >>>> >>>> JDK-8227117 normal interpreter table is not restored after >>>> single stepping with TLH >>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>> >>>> When using Thread Local Handshakes, the normal interpreter table is >>>> not restored after single stepping. This issue is caused by the >>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>>> restore the normal interpreter table for the "off" case. >>>> >>>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>>> SafepointSynchronize::end() has been refactored into >>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>>> on the global safepoint branch. That matches up with the call to >>>> Interpreter::notice_safepoints() that is also on the global safepoint >>>> branch. >>>> >>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>>> to call Interpreter::ignore_safepoints() directly. >>>> >>>> Here's the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>>> >>>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>>> >>>>   if (_on) { >>>>     Interpreter::notice_safepoints(); >>>> +  } else { >>>> +    Interpreter::ignore_safepoints(); >>>>
} >>>> >>>> Everything else is just new logging support for future debugging of >>>> interpreter table management and single stepping. >>>> >>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>>> >>>> Thanks, in advance, for questions, comments or suggestions. >>>> >>>> Dan >>>> > From daniel.daugherty at oracle.com Fri Jul 5 17:38:37 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 5 Jul 2019 13:38:37 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <6ca57c3a-9b94-5774-d4ed-20d91c0cdc02@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> <89c1417e-c606-3145-0d70-af062d4a8fbc@oracle.com> <6ca57c3a-9b94-5774-d4ed-20d91c0cdc02@oracle.com> Message-ID: On 7/4/19 5:38 AM, David Holmes wrote: > Hi Erik, > > On 4/07/2019 6:08 pm, Erik Österlund wrote: >> Hi David, >> >> When you run without TLH, this copying mechanism is used to >> synchronize the safepoint while JavaThreads are running. The >> interpreter doesn't emit any polls then. Instead it clobbers the >> dispatch table. JavaThreads will be reading from the dispatch table >> while it is being (non-atomically) modified. That could crash. For >> example with the Solaris + studio + SPARC - TLH configuration, the >> compiler will almost certainly emit a memcpy (this transformation has >> been observed in practice), the memcpy will use BIS instructions >> (observed in practice) for performance, with out-of-thin-air values >> (observed in practice), and the JavaThreads will occasionally crash >> during safepoint synchronization due to said out-of-thin-air values. >> >> So I guess the problem might have been larger back when TLH was not the default. >> But this seems conceptually wrong.
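The in-place copying Erik describes can be contrasted with a hypothetical alternative that never modifies a table in place. In that design the normal and safepoint tables stay immutable and a pointer to the active one is published. This is not what HotSpot does, since the interpreter copies entries into the table. `SwitchedDispatch` and its members are illustrative names, and `address` mirrors HotSpot's `unsigned char*` typedef. A single atomic store switches all entries at once, so readers can never observe a half-updated table.

```cpp
#include <atomic>

using address = unsigned char*;  // mirrors HotSpot's address typedef

// Hypothetical design: two immutable tables plus an atomically published
// pointer to whichever one is active.
struct SwitchedDispatch {
  const address* normal;
  const address* safepoint;
  std::atomic<const address*> active;

  SwitchedDispatch(const address* n, const address* s)
      : normal(n), safepoint(s), active(n) {}

  // One pointer store switches every entry; release/acquire ordering makes
  // the table contents visible to any thread that sees the new pointer.
  void notice_safepoints() { active.store(safepoint, std::memory_order_release); }
  void ignore_safepoints() { active.store(normal, std::memory_order_release); }

  address lookup(int bytecode) const {
    return active.load(std::memory_order_acquire)[bytecode];
  }
};
```

The cost of this design is one extra load of the table pointer on every dispatch. Copying entries into a table at a fixed address avoids that load.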
> > I always thought there were two dispatch tables and we simply switched > between them - not copied anything! Based on my search back into the TeamWare repos, it looks like we have always copied the table... Dan > > David > >> /Erik >> >> On 2019-07-04 09:17, David Holmes wrote: >>> Hi Erik, >>> >>> On 4/07/2019 5:10 pm, Erik Österlund wrote: >>>> Hi Dan, >>>> >>>> Thanks for picking this up. The change looks good. >>>> >>>> However, when reviewing this, I looked at the code for actually >>>> restoring the table (ignore/notice safepoints). It copies the >>>> dispatch table for the interpreter. There is a comment stating it >>>> is important the copying is atomic for MT-safety, and I can >>>> definitely see why. However, the copying the line after that >>>> comment is in fact not atomic. >>> >>> Is it assuming "atomicity" by virtue of executing at a safepoint? >>> >>> David >>> ----- >>> >>>> Here is the copying code in templateInterpreter.cpp: >>>> >>>> static inline void copy_table(address* from, address* to, int size) { >>>> // Copy non-overlapping tables. The copy has to occur word wise >>>> for MT safety. >>>> while (size-- > 0) *to++ = *from++; >>>> } >>>> >>>> Copying using a loop of non-volatile loads and stores can and >>>> definitely will on some compilers turn into memcpy calls instead as >>>> the compiler (correctly) considers that an equivalent >>>> transformation. And memcpy does not guarantee atomicity. Indeed on >>>> some platforms it is not atomic. On some platforms it will even >>>> enjoy out-of-thin-air values. Perhaps Copy::disjoint_words_atomic() >>>> would be a better choice for atomic word copying. If not, at the >>>> very least we should use Atomic::load/store here. >>>> >>>> Having said that, the fix for that issue seems like a separate RFE, >>>> because it has been sitting there for a lot longer than TLH has >>>> been around. >>>> >>>> Thanks, >>>> /Erik >>>> >>>> On 2019-07-04 04:04, Daniel D.
Daugherty wrote: >>>>> Greetings, >>>>> >>>>> Robbin recently discovered this issue with Thread Local >>>>> Handshakes. Since >>>>> he's not available at the moment, I'm handling the issue: >>>>> >>>>> JDK-8227117 normal interpreter table is not restored after >>>>> single stepping with TLH >>>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>>> >>>>> When using Thread Local Handshakes, the normal interpreter table is >>>>> not restored after single stepping. This issue is caused by the >>>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>>>> restore the normal interpreter table for the "off" case. >>>>> >>>>> Prior to Thread Local Handshakes, this was a valid assumption to >>>>> make. >>>>> SafepointSynchronize::end() has been refactored into >>>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>>>> on the global safepoint branch. That matches up with the call to >>>>> Interpreter::notice_safepoints() that is also on the global safepoint >>>>> branch. >>>>> >>>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>>>> to call Interpreter::ignore_safepoints() directly. >>>>> >>>>> Here's the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>>>> >>>>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>>>> >>>>> if (_on) { >>>>> Interpreter::notice_safepoints(); >>>>> + } else { >>>>> + Interpreter::ignore_safepoints(); >>>>> } >>>>> >>>>> Everything else is just new logging support for future debugging of >>>>> interpreter table management and single stepping. >>>>> >>>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle >>>>> platforms. >>>>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>>>> >>>>> Thanks, in advance, for questions, comments or suggestions. >>>>> >>>>> Dan >>>>> >> From daniel.daugherty at oracle.com Fri Jul 5 19:47:16 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Fri, 5 Jul 2019 15:47:16 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <5afde71a-ea23-6a5c-b1c7-77f9ac1cd83f@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> <6cc70adc-bcfe-505e-9c50-db7b10933613@oracle.com> <5afde71a-ea23-6a5c-b1c7-77f9ac1cd83f@oracle.com> Message-ID: <6386c344-5fd3-e53b-a535-7e7fa396e51c@oracle.com> On 7/5/19 1:16 PM, Daniel D. Daugherty wrote: > On 7/4/19 3:18 AM, David Holmes wrote: >> PS. I just noticed this comment: >> >> // This change must always be occur when at a safepoint. >> // Being at a safepoint causes the interpreter to use the >> // safepoint dispatch table which we overload to find single >> // step points. Just to be sure that it has been set, we >> // call notice_safepoints when turning on single stepping. >> // When we leave our current safepoint, should_post_single_step >> // will be checked by the interpreter, and the table kept >> // or changed accordingly. >> void VM_ChangeSingleStep::doit() { >> >> The "when we leave the safepoint" part is actually the bug that is >> being fixed - right? So the comment is not accurate. > > I'll take a closer look at this part of the comment: > > // When we leave our current safepoint, should_post_single_step > // will be checked by the interpreter, and the table kept > // or changed accordingly. > > and figure out how to clarify it as part of this change. I ended up rewriting the entire block comment that David quoted above to be this: +// When _on == true, we use the safepoint interpreter dispatch table +// to allow us to find the single step points. Otherwise, we switch +// back to the regular interpreter dispatch table. +// Note: We call Interpreter::notice_safepoints() and ignore_safepoints() +// in a VM_Operation to safely make the dispatch table switch.
We +// no longer rely on the safepoint mechanism to do any of this work +// for us. Dan > > Dan > > >> >> David >> ----- >> >> On 4/07/2019 5:13 pm, David Holmes wrote: >>> Hi Dan, >>> >>> On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> Robbin recently discovered this issue with Thread Local Handshakes. >>>> Since >>>> he's not available at the moment, I'm handling the issue: >>>> >>>> JDK-8227117 normal interpreter table is not restored after >>>> single stepping with TLH >>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>> >>>> When using Thread Local Handshakes, the normal interpreter table is >>>> not restored after single stepping. This issue is caused by the >>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>>> restore the normal interpreter table for the "off" case. >>> >>> So the result of this is that debugging tests may run more slowly >>> overall? >>> >>>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>>> SafepointSynchronize::end() has been refactored into >>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>>> on the global safepoint branch. That matches up with the call to >>>> Interpreter::notice_safepoints() that is also on the global safepoint >>>> branch. >>>> >>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>>> to call Interpreter::ignore_safepoints() directly. >>>> >>>> Here's the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>>> >>>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>>> >>>> if (_on) { >>>> Interpreter::notice_safepoints(); >>>> + } else { >>>> + Interpreter::ignore_safepoints(); >>>> } >>> >>> Looks good - thanks for the detailed analysis in the bug report.
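To make the failure mode of JDK-8227117 concrete, here is a toy model of the behaviour described in this thread. Everything in it is hypothetical: InterpreterModel, the table contents and change_single_step_old/change_single_step_fixed are illustrative stand-ins, not HotSpot code. It only mimics the two facts stated above: switching modes copies a whole table into the active one, and, once SafepointSynchronize::end() no longer restores the normal table under Thread Local Handshakes, nothing on the old "off" path does it either.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical toy model of the interpreter's two dispatch tables.
// None of these names are HotSpot APIs; entries are just labels.
using Entry = const char*;
constexpr std::size_t kTableSize = 4;
using Table = std::array<Entry, kTableSize>;

struct InterpreterModel {
  Table normal_table{{"nop", "iload", "istore", "return"}};
  Table safept_table{{"sp_nop", "sp_iload", "sp_istore", "sp_return"}};
  // The table the "interpreter" dispatches through. As described in the
  // thread, switching modes copies a whole table into it.
  Table active_table = normal_table;

  void notice_safepoints() { active_table = safept_table; }
  void ignore_safepoints() { active_table = normal_table; }
};

// Before the fix: the "off" case does nothing and relies on something else
// (formerly the end of a global safepoint) to restore the normal table.
void change_single_step_old(InterpreterModel& interp, bool on) {
  if (on) {
    interp.notice_safepoints();
  }
}

// After the fix: the "off" case restores the normal table itself.
void change_single_step_fixed(InterpreterModel& interp, bool on) {
  if (on) {
    interp.notice_safepoints();
  } else {
    interp.ignore_safepoints();
  }
}
```

Turning single stepping on and then off with change_single_step_old leaves active_table equal to safept_table, which is the stale state reported in the bug; with change_single_step_fixed it is equal to normal_table again.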
>>> >>> I have one additional request from looking at related code - can you >>> fix this confused initializer: >>> >>> VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >>> : _on(on != 0) >>> { >>> } >>> >>> as _on and on are both bool the assignment can be direct and we >>> shouldn't be comparing a bool to 0 as a matter of style. Thanks. >>> >>>> Everything else is just new logging support for future debugging of >>>> interpreter table management and single stepping. >>> >>> Logging looks good too. >>> >>> Thanks, >>> David >>> ----- >>> >>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>>> >>>> Thanks, in advance, for questions, comments or suggestions. >>>> >>>> Dan >>>> > > From daniel.daugherty at oracle.com Fri Jul 5 19:53:37 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 5 Jul 2019 15:53:37 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> Message-ID: > I'll file a follow up bug after the dust settles for 8227117. I filed the following: JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer https://bugs.openjdk.java.net/browse/JDK-8227338 Dan On 7/5/19 1:07 PM, Daniel D. Daugherty wrote: > On 7/4/19 3:10 AM, Erik Österlund wrote: >> Hi Dan, >> >> Thanks for picking this up. The change looks good. > > Thanks! Of course, just the size of the comment below makes me wonder > what I got myself into... :-) And I was so happy that the non-logging > part of the fix was an else-statement with _one_ line... > > >> However, when reviewing this, I looked at the code for actually >> restoring the table (ignore/notice safepoints). It copies the >> dispatch table for the interpreter.
There is a comment stating it is >> important the copying is atomic for MT-safety, and I can definitely >> see why. However, the copying the line after that comment is in fact >> not atomic. > > Actually, the comment doesn't mention 'atomic', but that's probably > because the code and the comment are very, very old. It mentions > 'word wise for MT safety' and I agree that 'atomic' is what the > person likely meant... > > The history: > > $ sgv src/share/vm/interpreter/templateInterpreter.cpp | grep 'The > copy has to occur word wise for MT safety' > 1.1 // Copy non-overlapping tables. The copy has to occur word > wise for MT safety. > > $ sp -r1.1 src/share/vm/interpreter/templateInterpreter.cpp > src/share/vm/interpreter/SCCS/s.templateInterpreter.cpp: > > D 1.1 07/08/29 13:42:26 sgoldman 1 0 00600/00000/00000 > MRs: > COMMENTS: > 6571248 - continuation_for is specialized for template interpreter > > Hmmm... I expected that comment to be even older... ahhhh... a little > more poking around and I found: > > $ sgv -r1.147 src/share/vm/interpreter/interpreter.cpp | grep 'The > copy has to occur word wise for MT safety' > 1.147 // Copy non-overlapping tables. The copy has to occur word > wise for MT safety. > > $ sp -r1.147 src/share/vm/interpreter/interpreter.cpp > src/share/vm/interpreter/SCCS/s.interpreter.cpp: > > D 1.147 99/02/17 10:14:36 steffen 235 233 00008/00002/00762 > MRs: > COMMENTS: > > This makes more sense (timeline wise) and dates back to when all > of the interpreter was in vm/interpreter/interpreter.cpp. > > >> Here is the copying code in templateInterpreter.cpp: >> >> static inline void copy_table(address* from, address* to, int size) { >> // Copy non-overlapping tables. The copy has to occur word wise for >> MT safety. >>
while (size-- > 0) *to++ = *from++; >> } >> >> Copying using a loop of non-volatile loads and stores can and >> definitely will on some compilers turn into memcpy calls instead as >> the compiler (correctly) considers that an equivalent transformation. > > Yet another C++ compiler optimization land mine... sigh... > > >> And memcpy does not guarantee atomicity. Indeed on some platforms it >> is not atomic. On some platforms it will even enjoy out-of-thin-air >> values. > > That last bit is scary... > > >> Perhaps Copy::disjoint_words_atomic() would be a better choice for >> atomic word copying. If not, at the very least we should use >> Atomic::load/store here. > > Copy::disjoint_words_atomic() sounds appealing... > > For those folks that aren't familiar with this part of safepointing... > > SafepointSynchronize::arm_safepoint() calls > Interpreter::notice_safepoints() > which calls calls copy_table(). So we're not at a safepoint yet, and, > in fact, > we're trying to bring those pesky JavaThreads to a safepoint... > > SafepointSynchronize::disarm_safepoint() calls > Interpreter::ignore_safepoints() > which also calls copy_table(). However, we did that before we have > woken the > JavaThreads that are blocked for the safepoint so that use of > copy_table is safe: > > > // Release threads lock, so threads can be created/destroyed again. > Threads_lock->unlock(); > > // Wake threads after local state is correctly set. > _wait_barrier->disarm(); > } > > The 'Threads_lock->unlock()' should synchronize memory so that the > restored > table should be properly synced out to memory... > > >> Having said that, the fix for that issue seems like a separate RFE, >> because it has been sitting there for a lot longer than TLH has been >> around. > > Yes I would like to keep the copy_table() issue for a separate bug > (not RFE). > I'll file a follow up bug after the dust settles for 8227117. > > Thanks again for the review!
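The difference Erik describes, between a plain word loop that the compiler may lower to memcpy and a copy in which every word is accessed atomically, can be sketched in standard C++. This is an illustration under stated assumptions, not the JDK-8227338 fix: the table slots are declared std::atomic<address> (HotSpot's real tables are plain address arrays), and copy_table_atomic is a hypothetical helper, not Copy::disjoint_words_atomic() or HotSpot's Atomic::load/store. Because each slot access is an atomic operation, a conforming compiler should not replace the loop with a memcpy whose per-word atomicity is not guaranteed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Local stand-in for HotSpot's `address` typedef (a byte pointer).
using address = unsigned char*;

// Hypothetical sketch: copy a dispatch table slot by slot, where every slot
// is a std::atomic<address>. Each slot is loaded and stored as one
// indivisible operation, so a concurrent reader of a slot sees either the
// old or the new entry, never a half-written (torn) pointer.
// Relaxed ordering provides per-slot atomicity only; publishing the whole
// update in order to other threads is left to separate synchronization
// (for example a lock release, like the Threads_lock unlock mentioned in
// the thread).
void copy_table_atomic(const std::atomic<address>* from,
                       std::atomic<address>* to, std::size_t size) {
  for (std::size_t i = 0; i < size; ++i) {
    to[i].store(from[i].load(std::memory_order_relaxed),
                std::memory_order_relaxed);
  }
}
```

Relaxed ordering is chosen here because the concern raised is tearing of individual entries; stronger orderings (acquire/release) would be needed only if readers must see the slots change in a particular order.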
> > Dan > >> >> Thanks, >> /Erik >> >> On 2019-07-04 04:04, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> Robbin recently discovered this issue with Thread Local Handshakes. >>> Since >>> he's not available at the moment, I'm handling the issue: >>> >>> JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> When using Thread Local Handshakes, the normal interpreter table is >>> not restored after single stepping. This issue is caused by the >>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>> restore the normal interpreter table for the "off" case. >>> >>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>> SafepointSynchronize::end() has been refactored into >>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>> on the global safepoint branch. That matches up with the call to >>> Interpreter::notice_safepoints() that is also on the global safepoint >>> branch. >>> >>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>> to call Interpreter::ignore_safepoints() directly. >>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>> >>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>> >>> if (_on) { >>> Interpreter::notice_safepoints(); >>> + } else { >>> + Interpreter::ignore_safepoints(); >>> } >>> >>> Everything else is just new logging support for future debugging of >>> interpreter table management and single stepping. >>> >>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>> >>> Thanks, in advance, for questions, comments or suggestions. >>> >>> Dan >>> > > From daniel.daugherty at oracle.com Fri Jul 5 20:24:11 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Fri, 5 Jul 2019 16:24:11 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> Message-ID: <27269b98-226c-d8f9-b7c3-77e7fd67d976@oracle.com> Greetings, Here's the new webrev URLs after CR0: Full: http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.full/ Inc: http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.inc/ Only src/hotspot/share/prims/jvmtiEventController.cpp changed in this round. The only code change was this: VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) - : _on(on != 0) + : _on(on) { } The other change is a header comment block rewrite... Thanks, in advance, for questions, comments or suggestions. Dan On 7/3/19 10:04 PM, Daniel D. Daugherty wrote: > Greetings, > > Robbin recently discovered this issue with Thread Local Handshakes. Since > he's not available at the moment, I'm handling the issue: > > JDK-8227117 normal interpreter table is not restored after single > stepping with TLH > https://bugs.openjdk.java.net/browse/JDK-8227117 > > When using Thread Local Handshakes, the normal interpreter table is > not restored after single stepping. This issue is caused by the > VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to > restore the normal interpreter table for the "off" case. > > Prior to Thread Local Handshakes, this was a valid assumption to make. > SafepointSynchronize::end() has been refactored into > disarm_safepoint() and it only calls Interpreter::ignore_safepoints() > on the global safepoint branch. That matches up with the call to > Interpreter::notice_safepoints() that is also on the global safepoint > branch. > > The solution is for the VM_ChangeSingleStep VM-op for the "off" case > to call Interpreter::ignore_safepoints() directly.
> > Here's the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ > > The fix is just a small addition to VM_ChangeSingleStep::doit(): > > if (_on) { > Interpreter::notice_safepoints(); > + } else { > + Interpreter::ignore_safepoints(); > } > > Everything else is just new logging support for future debugging of > interpreter table management and single stepping. > > Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. > Mach5 Tier[4-6] on standard Oracle platforms is running now. > > Thanks, in advance, for questions, comments or suggestions. > > Dan > > From erik.osterlund at oracle.com Fri Jul 5 20:41:45 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Fri, 5 Jul 2019 22:41:45 +0200 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <610e1384-71a0-9b14-3bff-69f22eaee3ed@oracle.com> Message-ID: Thanks Dan! /Erik On 5 Jul 2019, at 21:53, Daniel D. Daugherty wrote: >> I'll file a follow up bug after the dust settles for 8227117. > > I filed the following: > > JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer > https://bugs.openjdk.java.net/browse/JDK-8227338 > > Dan > > >> On 7/5/19 1:07 PM, Daniel D. Daugherty wrote: >>> On 7/4/19 3:10 AM, Erik Österlund wrote: >>> Hi Dan, >>> >>> Thanks for picking this up. The change looks good. >> >> Thanks! Of course, just the size of the comment below makes me wonder >> what I got myself into... :-) And I was so happy that the non-logging >> part of the fix was an else-statement with _one_ line... >> >> >>> However, when reviewing this, I looked at the code for actually restoring the table (ignore/notice safepoints). It copies the dispatch table for the interpreter. There is a comment stating it is important the copying is atomic for MT-safety, and I can definitely see why.
However, the copying the line after that comment is in fact not atomic. >> >> Actually, the comment doesn't mention 'atomic', but that's probably >> because the code and the comment are very, very old. It mentions >> 'word wise for MT safety' and I agree that 'atomic' is what the >> person likely meant... >> >> The history: >> >> $ sgv src/share/vm/interpreter/templateInterpreter.cpp | grep 'The copy has to occur word wise for MT safety' >> 1.1 // Copy non-overlapping tables. The copy has to occur word wise for MT safety. >> >> $ sp -r1.1 src/share/vm/interpreter/templateInterpreter.cpp >> src/share/vm/interpreter/SCCS/s.templateInterpreter.cpp: >> >> D 1.1 07/08/29 13:42:26 sgoldman 1 0 00600/00000/00000 >> MRs: >> COMMENTS: >> 6571248 - continuation_for is specialized for template interpreter >> >> Hmmm... I expected that comment to be even older... ahhhh... a little >> more poking around and I found: >> >> $ sgv -r1.147 src/share/vm/interpreter/interpreter.cpp | grep 'The copy has to occur word wise for MT safety' >> 1.147 // Copy non-overlapping tables. The copy has to occur word wise for MT safety. >> >> $ sp -r1.147 src/share/vm/interpreter/interpreter.cpp >> src/share/vm/interpreter/SCCS/s.interpreter.cpp: >> >> D 1.147 99/02/17 10:14:36 steffen 235 233 00008/00002/00762 >> MRs: >> COMMENTS: >> >> This makes more sense (timeline wise) and dates back to when all >> of the interpreter was in vm/interpreter/interpreter.cpp. >> >> >>> Here is the copying code in templateInterpreter.cpp: >>> >>> static inline void copy_table(address* from, address* to, int size) { >>> // Copy non-overlapping tables. The copy has to occur word wise for MT safety. >>> while (size-- > 0) *to++ = *from++; >>> } >>> >>> Copying using a loop of non-volatile loads and stores can and definitely will on some compilers turn into memcpy calls instead as the compiler (correctly) considers that an equivalent transformation. >> >> Yet another C++ compiler optimization land mine... sigh... 
>> >> >>> And memcpy does not guarantee atomicity. Indeed on some platforms it is not atomic. On some platforms it will even enjoy out-of-thin-air values. >> >> That last bit is scary... >> >> >>> Perhaps Copy::disjoint_words_atomic() would be a better choice for atomic word copying. If not, at the very least we should use Atomic::load/store here. >> >> Copy::disjoint_words_atomic() sounds appealing... >> >> For those folks that aren't familiar with this part of safepointing... >> >> SafepointSynchronize::arm_safepoint() calls Interpreter::notice_safepoints() >> which calls calls copy_table(). So we're not at a safepoint yet, and, in fact, >> we're trying to bring those pesky JavaThreads to a safepoint... >> >> SafepointSynchronize::disarm_safepoint() calls Interpreter::ignore_safepoints() >> which also calls copy_table(). However, we did that before we have woken the >> JavaThreads that are blocked for the safepoint so that use of copy_table is safe: >> >> >> // Release threads lock, so threads can be created/destroyed again. >> Threads_lock->unlock(); >> >> // Wake threads after local state is correctly set. >> _wait_barrier->disarm(); >> } >> >> The 'Threads_lock->unlock()' should synchronize memory so that the restored >> table should be properly synced out to memory... >> >> >>> Having said that, the fix for that issue seems like a separate RFE, because it has been sitting there for a lot longer than TLH has been around. >> >> Yes I would like to keep the copy_table() issue for a separate bug (not RFE). >> I'll file a follow up bug after the dust settles for 8227117. >> >> Thanks again for the review! >> >> Dan >> >>> >>> Thanks, >>> /Erik >>> >>>> On 2019-07-04 04:04, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> Robbin recently discovered this issue with Thread Local Handshakes. 
Since >>>> he's not available at the moment, I'm handling the issue: >>>> >>>> JDK-8227117 normal interpreter table is not restored after single stepping with TLH >>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>> >>>> When using Thread Local Handshakes, the normal interpreter table is >>>> not restored after single stepping. This issue is caused by the >>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>>> restore the normal interpreter table for the "off" case. >>>> >>>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>>> SafepointSynchronize::end() has been refactored into >>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>>> on the global safepoint branch. That matches up with the call to >>>> Interpreter::notice_safepoints() that is also on the global safepoint >>>> branch. >>>> >>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>>> to call Interpreter::ignore_safepoints() directly. >>>> >>>> Here's the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>>> >>>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>>> >>>> if (_on) { >>>> Interpreter::notice_safepoints(); >>>> + } else { >>>> + Interpreter::ignore_safepoints(); >>>> } >>>> >>>> Everything else is just new logging support for future debugging of >>>> interpreter table management and single stepping. >>>> >>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>>> >>>> Thanks, in advance, for questions, comments or suggestions. 
>>>> >>>> Dan >>>> >> >> > From david.holmes at oracle.com Fri Jul 5 21:06:35 2019 From: david.holmes at oracle.com (David Holmes) Date: Sat, 6 Jul 2019 07:06:35 +1000 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <27269b98-226c-d8f9-b7c3-77e7fd67d976@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <27269b98-226c-d8f9-b7c3-77e7fd67d976@oracle.com> Message-ID: <62ba86a4-0258-fb4d-ea54-0cf8640e646f@oracle.com> Looks good! Thanks, David On 6/07/2019 6:24 am, Daniel D. Daugherty wrote: > Greetings, > > Here's the new webrev URLs after CR0: > > Full: > > http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.full/ > > Inc: > > http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.inc/ > > Only src/hotspot/share/prims/jvmtiEventController.cpp changed in this > round. The only code change was this: > > VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) > - : _on(on != 0) > + : _on(on) > { > } > > The other change is a header comment block rewrite... > > Thanks, in advance, for questions, comments or suggestions. > > Dan > > > > On 7/3/19 10:04 PM, Daniel D. Daugherty wrote: >> Greetings, >> >> Robbin recently discovered this issue with Thread Local Handshakes. Since >> he's not available at the moment, I'm handling the issue: >> >> JDK-8227117 normal interpreter table is not restored after single >> stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> When using Thread Local Handshakes, the normal interpreter table is >> not restored after single stepping. This issue is caused by the >> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >> restore the normal interpreter table for the "off" case. >> >> Prior to Thread Local Handshakes, this was a valid assumption to make.
>> SafepointSynchronize::end() has been refactored into >> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >> on the global safepoint branch. That matches up with the call to >> Interpreter::notice_safepoints() that is also on the global safepoint >> branch. >> >> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >> to call Interpreter::ignore_safepoints() directly. >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >> >> The fix is just a small addition to VM_ChangeSingleStep::doit(): >> >> if (_on) { >> Interpreter::notice_safepoints(); >> + } else { >> + Interpreter::ignore_safepoints(); >> } >> >> Everything else is just new logging support for future debugging of >> interpreter table management and single stepping. >> >> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >> Mach5 Tier[4-6] on standard Oracle platforms is running now. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> >> > From daniel.daugherty at oracle.com Fri Jul 5 21:07:30 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 5 Jul 2019 17:07:30 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <62ba86a4-0258-fb4d-ea54-0cf8640e646f@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <27269b98-226c-d8f9-b7c3-77e7fd67d976@oracle.com> <62ba86a4-0258-fb4d-ea54-0cf8640e646f@oracle.com> Message-ID: Thanks for the quick re-review! Dan P.S. Aren't you on vacation?!?!? :-) On 7/5/19 5:06 PM, David Holmes wrote: > Looks good! > > Thanks, > David > > On 6/07/2019 6:24 am, Daniel D.
Daugherty wrote: >> Greetings, >> >> Here's the new webrev URLs after CR0: >> >> Full: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.full/ >> >> Inc: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.inc/ >> >> Only src/hotspot/share/prims/jvmtiEventController.cpp changed in this >> round. The only code change was this: >> >> VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >> - : _on(on != 0) >> + : _on(on) >> { >> } >> >> The other change is a header comment block rewrite... >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> >> >> >> On 7/3/19 10:04 PM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> Robbin recently discovered this issue with Thread Local Handshakes. >>> Since >>> he's not available at the moment, I'm handling the issue: >>> >>> JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> When using Thread Local Handshakes, the normal interpreter table is >>> not restored after single stepping. This issue is caused by the >>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>> restore the normal interpreter table for the "off" case. >>> >>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>> SafepointSynchronize::end() has been refactored into >>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>> on the global safepoint branch. That matches up with the call to >>> Interpreter::notice_safepoints() that is also on the global safepoint >>> branch. >>> >>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>> to call Interpreter::ignore_safepoints() directly. >>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>> >>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>> >>> if (_on) { >>>
Interpreter::notice_safepoints(); >>> + } else { >>> + Interpreter::ignore_safepoints(); >>> } >>> >>> Everything else is just new logging support for future debugging of >>> interpreter table management and single stepping. >>> >>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>> >>> Thanks, in advance, for questions, comments or suggestions. >>> >>> Dan >>> >>> >> From daniel.daugherty at oracle.com Sat Jul 6 13:53:04 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sat, 6 Jul 2019 09:53:04 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer Message-ID: Greetings, During the code review for the following fix: JDK-8227117 normal interpreter table is not restored after single stepping with TLH https://bugs.openjdk.java.net/browse/JDK-8227117 Erik O. noticed a potential race with templateInterpreter.cpp: copy_table() depending on C++ compiler optimizations. The following bug is being used to fix this issue: JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer https://bugs.openjdk.java.net/browse/JDK-8227338 Here's the webrev URL: http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. Mach5 tier[4-6] is running now. It has also been tested with the manual jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. Thanks, in advance, for questions, comments or suggestions. Dan From david.holmes at oracle.com Sat Jul 6 22:06:34 2019 From: david.holmes at oracle.com (David Holmes) Date: Sun, 7 Jul 2019 08:06:34 +1000 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: References: Message-ID: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> Hi Dan, On 6/07/2019 11:53 pm, Daniel D.
Daugherty wrote: > Greetings, > > During the code review for the following fix: > >     JDK-8227117 normal interpreter table is not restored after single > stepping with TLH >     https://bugs.openjdk.java.net/browse/JDK-8227117 > > Erik O. noticed a potential race with templateInterpreter.cpp: copy_table() > depending on C++ compiler optimizations. The following bug is being used > to fix this issue: > >     JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >     https://bugs.openjdk.java.net/browse/JDK-8227338 > > Here's the webrev URL: > >     http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ So the original code uses a loop to copy, while the new code calls Copy::disjoint_words_atomic, but the implementation of that on x64 is just a loop same as the original AFAICS: static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* to, size_t count) { #ifdef AMD64 switch (count) { case 8: to[7] = from[7]; case 7: to[6] = from[6]; case 6: to[5] = from[5]; case 5: to[4] = from[4]; case 4: to[3] = from[3]; case 3: to[2] = from[2]; case 2: to[1] = from[1]; case 1: to[0] = from[0]; case 0: break; default: while (count-- > 0) { *to++ = *from++; } break; } #else David ----- > This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. > Mach5 tier[4-6] is running now. It has also been tested with the manual > jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. > > Thanks, in advance, for questions, comments or suggestions. > > Dan From daniel.daugherty at oracle.com Sun Jul 7 00:05:23 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Sat, 6 Jul 2019 20:05:23 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> Message-ID: <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> On 7/6/19 6:06 PM, David Holmes wrote: > Hi Dan, > > On 6/07/2019 11:53 pm, Daniel D. Daugherty wrote: >> Greetings, >> >> During the code review for the following fix: >> >>      JDK-8227117 normal interpreter table is not restored after >> single stepping with TLH >>      https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> Erik O. noticed a potential race with templateInterpreter.cpp: >> copy_table() >> depending on C++ compiler optimizations. The following bug is being used >> to fix this issue: >> >>      JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >>      https://bugs.openjdk.java.net/browse/JDK-8227338 >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ > > So the original code uses a loop to copy, while the new code calls > Copy::disjoint_words_atomic, but the implementation of that on x64 is > just a loop same as the original AFAICS: Yup. I figure Erik O. will jump in here with his reasoning... :-) Dan > > static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* > to, size_t count) { > #ifdef AMD64 >   switch (count) { >   case 8:  to[7] = from[7]; >   case 7:  to[6] = from[6]; >   case 6:  to[5] = from[5]; >   case 5:  to[4] = from[4]; >   case 4:  to[3] = from[3]; >   case 3:  to[2] = from[2]; >   case 2:  to[1] = from[1]; >   case 1:  to[0] = from[0]; >   case 0:  break; >   default: >     while (count-- > 0) { >       *to++ = *from++; >     } >     break; >   } > #else > > David > ----- > >> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual >> platforms. >> Mach5 tier[4-6] is running now.
It has also been tested with the manual >> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan From daniel.daugherty at oracle.com Sun Jul 7 00:46:26 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sat, 6 Jul 2019 20:46:26 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> Message-ID: <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> Added Erik O. to the "To:" list... On 7/6/19 8:05 PM, Daniel D. Daugherty wrote: > On 7/6/19 6:06 PM, David Holmes wrote: >> Hi Dan, >> >> On 6/07/2019 11:53 pm, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> During the code review for the following fix: >>> >>>      JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>>      https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> Erik O. noticed a potential race with templateInterpreter.cpp: >>> copy_table() >>> depending on C++ compiler optimizations. The following bug is being >>> used >>> to fix this issue: >>> >>>      JDK-8227338 templateInterpreter.cpp: copy_table() needs to be >>> safer >>>      https://bugs.openjdk.java.net/browse/JDK-8227338 >>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >> >> So the original code uses a loop to copy, while the new code calls >> Copy::disjoint_words_atomic, but the implementation of that on x64 is >> just a loop same as the original AFAICS: > > Yup. I figure Erik O. will jump in here with his reasoning... :-) Thinking about it more... I think the answer is that we are switching to calling code that specifies the type of behavior that we need:     Copy::disjoint_words_atomic() is what we need when we're not at a safepoint.
If, down the road, we find that the compiler does something with pd_disjoint_words_atomic() that breaks our expectation for pd_disjoint_words_atomic(), then we fix that version of pd_disjoint_words_atomic() and all the callers will be good again... Or something like that... The version in src/hotspot/os_cpu/solaris_x86/copy_solaris_x86.inline.hpp happens to be exactly our loop with no switch statement... which is particularly funny given Erik's observations about what at least one Solaris X64 compiler did to loops... Still, I'm just guessing here on a Saturday night... hopefully Erik will chime in here... Dan > > Dan > > >> >> static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* >> to, size_t count) { >> #ifdef AMD64 >>   switch (count) { >>   case 8:  to[7] = from[7]; >>   case 7:  to[6] = from[6]; >>   case 6:  to[5] = from[5]; >>   case 5:  to[4] = from[4]; >>   case 4:  to[3] = from[3]; >>   case 3:  to[2] = from[2]; >>   case 2:  to[1] = from[1]; >>   case 1:  to[0] = from[0]; >>   case 0:  break; >>   default: >>     while (count-- > 0) { >>       *to++ = *from++; >>     } >>     break; >>   } >> #else >> >> David >> ----- >> >>> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual >>> platforms. >>> Mach5 tier[4-6] is running now. It has also been tested with the manual >>> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >>> >>> Thanks, in advance, for questions, comments or suggestions.
>>> >>> Dan > From erik.osterlund at oracle.com Sun Jul 7 08:48:15 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Sun, 7 Jul 2019 10:48:15 +0200 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> Message-ID: <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> Yeah that switch statement code and yet another plain non-volatile load/store loop looks like complete nonsense unfortunately. It should at least use Atomic::load/store. Fortunately, on x86_64, I believe it will in practice yield word atomic copying anyway by chance. But it should be fixed anyway. *sigh* The real danger is SPARC though and its BIS instructions. I don't have the code in front of me, but I really hope not to see that switch statement and non-volatile loop in that pd_disjoint_words_atomic() function. And I agree that the atomic copying API should be used when we need atomic copying. And if it turns out the implementation of that API is not atomic, it should be fixed in that atomic copying API. So I think this change looks good. But it looks like we are not done yet. :c Thanks, /Erik > On 7 Jul 2019, at 02:46, Daniel D. Daugherty wrote: > > Added Erik O. to the "To:" list... > > >> On 7/6/19 8:05 PM, Daniel D. Daugherty wrote: >>> On 7/6/19 6:06 PM, David Holmes wrote: >>> Hi Dan, >>> >>>> On 6/07/2019 11:53 pm, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> During the code review for the following fix: >>>> >>>> JDK-8227117 normal interpreter table is not restored after single stepping with TLH >>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>> >>>> Erik O. noticed a potential race with templateInterpreter.cpp: copy_table() >>>> depending on C++ compiler optimizations.
The following bug is being used >>>> to fix this issue: >>>> >>>> JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >>>> https://bugs.openjdk.java.net/browse/JDK-8227338 >>>> >>>> Here's the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >>> >>> So the original code uses a loop to copy, while the new code calls Copy::disjoint_words_atomic, but the implementation of that on x64 is just a loop same as the original AFAICS: >> >> Yup. I figure Erik O. will jump in here with his reasoning... :-) > > Thinking about it more... I think the answer is that we are switching to > calling code that specifies the type of behavior that we need: > > Copy::disjoint_words_atomic() > > is what we need when we're not at a safepoint. If, down the road, we find > that the compiler does something with pd_disjoint_words_atomic() that > breaks our expectation for pd_disjoint_words_atomic(), then we fix that > version of pd_disjoint_words_atomic() and all the callers will be good > again... Or something like that... > > The version in src/hotspot/os_cpu/solaris_x86/copy_solaris_x86.inline.hpp > happens to be exactly our loop with no switch statement... which is > particularly funny given Erik's observations about what at least one > Solaris X64 compiler did to loops... > > Still, I'm just guessing here on a Saturday night... hopefully Erik > will chime in here... 
> > Dan > > > >> >> Dan >> >> >>> >>> static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* to, size_t count) { >>> #ifdef AMD64 >>> switch (count) { >>> case 8: to[7] = from[7]; >>> case 7: to[6] = from[6]; >>> case 6: to[5] = from[5]; >>> case 5: to[4] = from[4]; >>> case 4: to[3] = from[3]; >>> case 3: to[2] = from[2]; >>> case 2: to[1] = from[1]; >>> case 1: to[0] = from[0]; >>> case 0: break; >>> default: >>> while (count-- > 0) { >>> *to++ = *from++; >>> } >>> break; >>> } >>> #else >>> >>> David >>> ----- >>> >>>> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. >>>> Mach5 tier[4-6] is running now. It has also been tested with the manual >>>> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >>>> >>>> Thanks, in advance, for questions, comments or suggestions. >>>> >>>> Dan >> > From serguei.spitsyn at oracle.com Sun Jul 7 09:16:00 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Sun, 7 Jul 2019 02:16:00 -0700 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <27269b98-226c-d8f9-b7c3-77e7fd67d976@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <27269b98-226c-d8f9-b7c3-77e7fd67d976@oracle.com> Message-ID: Hi Dan, The update looks good to me. Thanks, Serguei On 7/5/19 13:24, Daniel D. Daugherty wrote: > Greetings, > > Here's the new webrev URLs after CR0: > > Full: > > http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.full/ > > Inc: > > http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.inc/ > > Only src/hotspot/share/prims/jvmtiEventController.cpp changed in this > round. The only code change was this: > >  VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) > -  : _on(on != 0) > +  : _on(on) >  { >  } > > The other change is a header comment block rewrite... > > Thanks, in advance, for questions, comments or suggestions. > > Dan > > > > On 7/3/19 10:04 PM, Daniel D.
Daugherty wrote: >> Greetings, >> >> Robbin recently discovered this issue with Thread Local Handshakes. >> Since >> he's not available at the moment, I'm handling the issue: >> >>     JDK-8227117 normal interpreter table is not restored after single >> stepping with TLH >>     https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> When using Thread Local Handshakes, the normal interpreter table is >> not restored after single stepping. This issue is caused by the >> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >> restore the normal interpreter table for the "off" case. >> >> Prior to Thread Local Handshakes, this was a valid assumption to make. >> SafepointSynchronize::end() has been refactored into >> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >> on the global safepoint branch. That matches up with the call to >> Interpreter::notice_safepoints() that is also on the global safepoint >> branch. >> >> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >> to call Interpreter::ignore_safepoints() directly. >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >> >> The fix is just a small addition to VM_ChangeSingleStep::doit(): >> >>    if (_on) { >>      Interpreter::notice_safepoints(); >> +  } else { >> +    Interpreter::ignore_safepoints(); >>    } >> >> Everything else is just new logging support for future debugging of >> interpreter table management and single stepping. >> >> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >> Mach5 Tier[4-6] on standard Oracle platforms is running now. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> >> > From daniel.daugherty at oracle.com Sun Jul 7 13:03:05 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Sun, 7 Jul 2019 09:03:05 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <27269b98-226c-d8f9-b7c3-77e7fd67d976@oracle.com> Message-ID: <3cadb8c0-0e31-7e77-afcf-21dc2c50b602@oracle.com> Thanks for the re-review! Dan On 7/7/19 5:16 AM, serguei.spitsyn at oracle.com wrote: > Hi Dan, > > The update looks good to me. > > Thanks, > Serguei > > > On 7/5/19 13:24, Daniel D. Daugherty wrote: >> Greetings, >> >> Here's the new webrev URLs after CR0: >> >> Full: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.full/ >> >> Inc: >> >> http://cr.openjdk.java.net/~dcubed/8227117-webrev/1_for_jdk14.inc/ >> >> Only src/hotspot/share/prims/jvmtiEventController.cpp changed in this >> round. The only code change was this: >> >>  VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >> -  : _on(on != 0) >> +  : _on(on) >>  { >>  } >> >> The other change is a header comment block rewrite... >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> >> >> >> On 7/3/19 10:04 PM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> Robbin recently discovered this issue with Thread Local Handshakes. >>> Since >>> he's not available at the moment, I'm handling the issue: >>> >>>     JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>>     https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> When using Thread Local Handshakes, the normal interpreter table is >>> not restored after single stepping. This issue is caused by the >>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>> restore the normal interpreter table for the "off" case. >>> >>> Prior to Thread Local Handshakes, this was a valid assumption to make.
>>> SafepointSynchronize::end() has been refactored into >>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>> on the global safepoint branch. That matches up with the call to >>> Interpreter::notice_safepoints() that is also on the global safepoint >>> branch. >>> >>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>> to call Interpreter::ignore_safepoints() directly. >>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>> >>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>> >>>    if (_on) { >>>      Interpreter::notice_safepoints(); >>> +  } else { >>> +    Interpreter::ignore_safepoints(); >>>    } >>> >>> Everything else is just new logging support for future debugging of >>> interpreter table management and single stepping. >>> >>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>> >>> Thanks, in advance, for questions, comments or suggestions. >>> >>> Dan >>> >>> >> > > From daniel.daugherty at oracle.com Sun Jul 7 13:21:41 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sun, 7 Jul 2019 09:21:41 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> Message-ID: Erik, Thanks for chiming in on this thread... On 7/7/19 4:48 AM, Erik Osterlund wrote: > Yeah that switch statement code and yet another plain non-volatile load/store loop looks like complete nonsense unfortunately. It should at least use Atomic::load/store. > > Fortunately, on x86_64, I believe it will in practice yield word atomic copying anyway by chance.
But it should be fixed anyway. *sigh* > > The real danger is SPARC though and its BIS instructions. I don't have the code in front of me, but I really hope not to see that switch statement and non-volatile loop in that pd_disjoint_words_atomic() function. > > And I agree that the atomic copying API should be used when we need atomic copying. And if it turns out the implementation of that API is not atomic, it should be fixed in that atomic copying API. Okay, so I think we're on the same page w.r.t. this fix (8227338). David, do you concur that this fix can move forward? > So I think this change looks good. Thanks! > But it looks like we are not done yet. :c So we need another bug:     JDK-8227369 pd_disjoint_words_atomic() needs to be atomic     https://bugs.openjdk.java.net/browse/JDK-8227369 However, I'll leave that for you (or someone else) to take. Thanks for the review. Dan > > Thanks, > /Erik > >> On 7 Jul 2019, at 02:46, Daniel D. Daugherty wrote: >> >> Added Erik O. to the "To:" list... >> >> >>> On 7/6/19 8:05 PM, Daniel D. Daugherty wrote: >>>> On 7/6/19 6:06 PM, David Holmes wrote: >>>> Hi Dan, >>>> >>>>> On 6/07/2019 11:53 pm, Daniel D. Daugherty wrote: >>>>> Greetings, >>>>> >>>>> During the code review for the following fix: >>>>> >>>>> JDK-8227117 normal interpreter table is not restored after single stepping with TLH >>>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>>> >>>>> Erik O. noticed a potential race with templateInterpreter.cpp: copy_table() >>>>> depending on C++ compiler optimizations.
The following bug is being used >>>>> to fix this issue: >>>>> >>>>> JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >>>>> https://bugs.openjdk.java.net/browse/JDK-8227338 >>>>> >>>>> Here's the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >>>> So the original code uses a loop to copy, while the new code calls Copy::disjoint_words_atomic, but the implementation of that on x64 is just a loop same as the original AFAICS: >>> Yup. I figure Erik O. will jump in here with his reasoning... :-) >> Thinking about it more... I think the answer is that we are switching to >> calling code that specifies the type of behavior that we need: >> >> Copy::disjoint_words_atomic() >> >> is what we need when we're not at a safepoint. If, down the road, we find >> that the compiler does something with pd_disjoint_words_atomic() that >> breaks our expectation for pd_disjoint_words_atomic(), then we fix that >> version of pd_disjoint_words_atomic() and all the callers will be good >> again... Or something like that... >> >> The version in src/hotspot/os_cpu/solaris_x86/copy_solaris_x86.inline.hpp >> happens to be exactly our loop with no switch statement... which is >> particularly funny given Erik's observations about what at least one >> Solaris X64 compiler did to loops... >> >> Still, I'm just guessing here on a Saturday night... hopefully Erik >> will chime in here... 
>> >> Dan >> >> >> >>> Dan >>> >>> >>>> static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* to, size_t count) { >>>> #ifdef AMD64 >>>> switch (count) { >>>> case 8: to[7] = from[7]; >>>> case 7: to[6] = from[6]; >>>> case 6: to[5] = from[5]; >>>> case 5: to[4] = from[4]; >>>> case 4: to[3] = from[3]; >>>> case 3: to[2] = from[2]; >>>> case 2: to[1] = from[1]; >>>> case 1: to[0] = from[0]; >>>> case 0: break; >>>> default: >>>> while (count-- > 0) { >>>> *to++ = *from++; >>>> } >>>> break; >>>> } >>>> #else >>>> >>>> David >>>> ----- >>>> >>>>> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. >>>>> Mach5 tier[4-6] is running now. It has also been tested with the manual >>>>> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >>>>> >>>>> Thanks, in advance, for questions, comments or suggestions. >>>>> >>>>> Dan From patricio.chilano.mateo at oracle.com Sun Jul 7 19:09:13 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Sun, 7 Jul 2019 15:09:13 -0400 Subject: RFR 8191890: Biased locking still uses the inferior stop the world safepoint for revocation Message-ID: <4df1d772-bea5-0921-93fb-d6c20f29bcdd@oracle.com> Hi all, Below is the webrev for v05. This is just v04 on top of a new baseline that includes the backout of 8221734 and other changes made to biasedLocking code by 8225702 and 8225344. The only difference between v05 and v04 is the use of SafepointSynchronize::safepoint_id() instead of SafepointSynchronize::safepoint_counter() introduced by 8225702, and not having to remove method BiasedLocking::revoke_own_locks_in_handshake() and to edit method Deoptimization::revoke_using_handshake() which were actually removed by the backout of 8221734. Full Webrev: http://cr.openjdk.java.net/~pchilanomate/8191890/v05/webrev/ Tested with tiers1-7. Running another round now. Thanks! 
Patricio From david.holmes at oracle.com Mon Jul 8 00:08:44 2019 From: david.holmes at oracle.com (David Holmes) Date: Mon, 8 Jul 2019 10:08:44 +1000 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> Message-ID: On 7/07/2019 6:48 pm, Erik Osterlund wrote: > Yeah that switch statement code and yet another plain non-volatile load/store loop looks like complete nonsense unfortunately. It should at least use Atomic::load/store. > > Fortunately, on x86_64, I believe it will in practice yield word atomic copying anyway by chance. But it should be fixed anyway. *sigh* The requirement is for atomic word accesses, which properly aligned addresses will always yield - we rely on that guarantee throughout to avoid word-tearing. The issue is that the loop may be converted to a 'bulk' copying operation that may not provide atomic word accesses. So unless the Atomic::load/store will prevent the loop from being converted they are not in and of themselves required for correctness (else we need to use them nearly everywhere). > The real danger is SPARC though and its BIS instructions. I don't have the code in front of me, but I really hope not to see that switch statement and non-volatile loop in that pd_disjoint_words_atomic() function. sparc uses the same loop. Let's face it, almost nobody expects the compiler to do these kinds of transformations. :( > And I agree that the atomic copying API should be used when we need atomic copying. And if it turns out the implementation of that API is not atomic, it should be fixed in that atomic copying API. I agree to some extent, but we assume atomic load/stores of words all over the place - and rightly so.
The issue here is that we need to hide the loop inside an API that we can somehow prevent the C++ compiler from screwing up. It's hardly intuitive or obvious when this is needed e.g if I simply copy three adjacent words without a loop could the compiler convert that to a block move that is non-atomic? > So I think this change looks good. But it looks like we are not done yet. :c I agree that changing the current code to use the atomic copy API to convey intent is fine. Cheers, David ----- > Thanks, > /Erik > >> On 7 Jul 2019, at 02:46, Daniel D. Daugherty wrote: >> >> Added Erik O. to the "To:" list... >> >> >>> On 7/6/19 8:05 PM, Daniel D. Daugherty wrote: >>>> On 7/6/19 6:06 PM, David Holmes wrote: >>>> Hi Dan, >>>> >>>>> On 6/07/2019 11:53 pm, Daniel D. Daugherty wrote: >>>>> Greetings, >>>>> >>>>> During the code review for the following fix: >>>>> >>>>> JDK-8227117 normal interpreter table is not restored after single stepping with TLH >>>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>>> >>>>> Erik O. noticed a potential race with templateInterpreter.cpp: copy_table() >>>>> depending on C++ compiler optimizations. The following bug is being used >>>>> to fix this issue: >>>>> >>>>> JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >>>>> https://bugs.openjdk.java.net/browse/JDK-8227338 >>>>> >>>>> Here's the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >>>> >>>> So the original code uses a loop to copy, while the new code calls Copy::disjoint_words_atomic, but the implementation of that on x64 is just a loop same as the original AFAICS: >>> >>> Yup. I figure Erik O. will jump in here with his reasoning... :-) >> >> Thinking about it more... I think the answer is that we are switching to >> calling code that specifies the type of behavior that we need: >> >> Copy::disjoint_words_atomic() >> >> is what we need when we're not at a safepoint. 
If, down the road, we find >> that the compiler does something with pd_disjoint_words_atomic() that >> breaks our expectation for pd_disjoint_words_atomic(), then we fix that >> version of pd_disjoint_words_atomic() and all the callers will be good >> again... Or something like that... >> >> The version in src/hotspot/os_cpu/solaris_x86/copy_solaris_x86.inline.hpp >> happens to be exactly our loop with no switch statement... which is >> particularly funny given Erik's observations about what at least one >> Solaris X64 compiler did to loops... >> >> Still, I'm just guessing here on a Saturday night... hopefully Erik >> will chime in here... >> >> Dan >> >> >> >>> >>> Dan >>> >>> >>>> >>>> static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* to, size_t count) { >>>> #ifdef AMD64 >>>> switch (count) { >>>> case 8: to[7] = from[7]; >>>> case 7: to[6] = from[6]; >>>> case 6: to[5] = from[5]; >>>> case 5: to[4] = from[4]; >>>> case 4: to[3] = from[3]; >>>> case 3: to[2] = from[2]; >>>> case 2: to[1] = from[1]; >>>> case 1: to[0] = from[0]; >>>> case 0: break; >>>> default: >>>> while (count-- > 0) { >>>> *to++ = *from++; >>>> } >>>> break; >>>> } >>>> #else >>>> >>>> David >>>> ----- >>>> >>>>> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. >>>>> Mach5 tier[4-6] is running now. It has also been tested with the manual >>>>> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >>>>> >>>>> Thanks, in advance, for questions, comments or suggestions. 
>>>>> >>>>> Dan >>> >> > From jianglizhou at google.com Mon Jul 8 00:12:39 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Sun, 7 Jul 2019 17:12:39 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> Message-ID: Hi Calvin, Per our off-mailing-list email exchange from the previous code review for https://bugs.openjdk.java.net/browse/JDK-8211723, I created https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove SharedPathsMiscInfo'. I think the crash caused by premature runtime accessing of _paths_misc_info_size should be handled as part of JDK-8227370, rather than further patching up the SharedPathsMiscInfo. Thanks and regards, Jiangli On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: > > bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > > webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > > This bug was found during a bootcycle build when a shared archive built > by a 64-bit JDK version is used by a 32-bit JDK version. It is due to > some of the important header fields such as the _jvm_ident were not > checked prior to accessing other fields such as the _paths_misc_info_size. > > This fix involves checking most of the fields in CDSFileMapHeaderBase > before accessing other fields. > > Testing: tiers 1-3. > > thanks, > > Calvin > From ioi.lam at oracle.com Mon Jul 8 04:01:59 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Sun, 7 Jul 2019 21:01:59 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> Message-ID: Hi Calvin, These changes look good to me.
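A minimal sketch of the check ordering described in the RFR: validate the identity fields (magic, version, VM identification string) before interpreting any size field. This way, an archive written by a different VM build, such as a 64-bit archive opened by a 32-bit VM, is rejected before a field like the paths-misc-info size is ever used. ArchiveHeader, header_is_usable and the constant values are hypothetical and do not reproduce HotSpot's CDSFileMapHeaderBase or FileMapInfo code. The sketch also assumes the identity fields sit at the same offsets in every build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical archive header. The identity fields come first so they can be
// checked before any layout-dependent field is trusted.
struct ArchiveHeader {
  std::uint32_t magic;
  std::uint32_t version;
  char          jvm_ident[32];    // build identifier, NUL-padded
  std::uint64_t paths_info_size;  // meaningful only if the checks above pass
};

constexpr std::uint32_t kArchiveMagic   = 0x41524348;  // hypothetical value
constexpr std::uint32_t kArchiveVersion = 1;           // hypothetical value

// Returns true only if the header was written by a matching VM build and its
// size field fits inside the file.
bool header_is_usable(const ArchiveHeader& h, const char* expected_ident, std::size_t file_size) {
  assert(expected_ident != nullptr);
  if (file_size < sizeof(ArchiveHeader)) {
    return false;  // too short to contain a header at all
  }
  if (h.magic != kArchiveMagic || h.version != kArchiveVersion) {
    return false;  // not an archive in the expected format
  }
  if (std::strncmp(h.jvm_ident, expected_ident, sizeof(h.jvm_ident)) != 0) {
    return false;  // written by a different VM build, e.g. 64-bit vs 32-bit
  }
  // Only now is the size field interpreted.
  return h.paths_info_size <= file_size - sizeof(ArchiveHeader);
}
```

The design point is purely one of ordering: a field whose meaning depends on the writer's layout is read only after the fields that identify the writer have matched.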
Thanks - Ioi On 7/3/19 5:59 PM, Calvin Cheung wrote: > bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > > webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > > This bug was found during a bootcycle build when a shared archive > built by a 64-bit JDK version is used by a 32-bit JDK version. It is > due to some of the important header fields such as the _jvm_ident were > not checked prior to accessing other fields such as the > _paths_misc_info_size. > > This fix involves checking most of the fields in CDSFileMapHeaderBase > before accessing other fields. > > Testing: tiers 1-3. > > thanks, > > Calvin > From fweimer at redhat.com Mon Jul 8 09:27:27 2019 From: fweimer at redhat.com (Florian Weimer) Date: Mon, 08 Jul 2019 11:27:27 +0200 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: (Jiangli Zhou's message of "Wed, 3 Jul 2019 11:22:53 -0700") References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> Message-ID: <875zocyiyo.fsf@oldenburg2.str.redhat.com> * Jiangli Zhou: > As you, Florian, Thomas all made great contributions to this > workaround, I should list all of you as both contributors and > reviewers in the changeset. If there is any objection, please let me > know. Can you share a link with the final patch? I would like to have another look.
Thanks, Florian From serguei.spitsyn at oracle.com Mon Jul 8 11:28:05 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 8 Jul 2019 04:28:05 -0700 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: References: Message-ID: <1e32e9fa-8c37-de8d-25c3-3cb2db5ea38d@oracle.com> Hi Dan, This looks good to me as far as just discussed the atomic copy is considered to be a separate issue. Thanks, Serguei On 7/6/19 06:53, Daniel D. Daugherty wrote: > Greetings, > > During the code review for the following fix: > >     JDK-8227117 normal interpreter table is not restored after single > stepping with TLH >     https://bugs.openjdk.java.net/browse/JDK-8227117 > > Erik O. noticed a potential race with templateInterpreter.cpp: > copy_table() > depending on C++ compiler optimizations. The following bug is being used > to fix this issue: > >     JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >     https://bugs.openjdk.java.net/browse/JDK-8227338 > > Here's the webrev URL: > >     http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ > > This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. > Mach5 tier[4-6] is running now. It has also been tested with the manual > jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. > > Thanks, in advance, for questions, comments or suggestions. > > Dan From daniel.daugherty at oracle.com Mon Jul 8 12:36:05 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 8 Jul 2019 08:36:05 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <1e32e9fa-8c37-de8d-25c3-3cb2db5ea38d@oracle.com> References: <1e32e9fa-8c37-de8d-25c3-3cb2db5ea38d@oracle.com> Message-ID: <196c4b07-4d3f-3878-6889-80d6380b2d6d@oracle.com> Thanks for the review Serguei.
Dan On 7/8/19 7:28 AM, serguei.spitsyn at oracle.com wrote: > Hi Dan, > > This looks good to me as far as just discussed the atomic copy is > considered to be a separate issue. > > Thanks, > Serguei > > > On 7/6/19 06:53, Daniel D. Daugherty wrote: >> Greetings, >> >> During the code review for the following fix: >> >> JDK-8227117 normal interpreter table is not restored after single >> stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> Erik O. noticed a potential race with templateInterpreter.cpp: >> copy_table() >> depending on C++ compiler optimizations. The following bug is being used >> to fix this issue: >> >> JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >> https://bugs.openjdk.java.net/browse/JDK-8227338 >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >> >> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual >> platforms. >> Mach5 tier[4-6] is running now. It has also been tested with the manual >> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan > From daniel.daugherty at oracle.com Mon Jul 8 12:35:34 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 8 Jul 2019 08:35:34 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> Message-ID: <05894efc-c4eb-1877-fc41-60311567e154@oracle.com> Hi David, On 7/7/19 8:08 PM, David Holmes wrote: > On 7/07/2019 6:48 pm, Erik Osterlund wrote: >> Yeah that switch statement code and yet another plain non-volatile >> load/store loop looks like complete nonsense unfortunately. It should >> at least use Atomic::load/store.
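The concern in this thread is that a plain element-by-element copy loop lets the compiler replace it with a bulk block move. Such a move is not guaranteed to access each word atomically. Below is a minimal sketch of the alternative. It assumes that aligned word-sized accesses compile to single instructions on the target, which holds on mainstream 64-bit CPUs but is not guaranteed by ISO C++. The sketch copies through volatile pointers, so every word is loaded and stored individually. `Word` and `copy_words_per_word` are hypothetical names. HotSpot itself would express this with `Atomic::load`/`Atomic::store` or `Copy::disjoint_words_atomic`, as suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical word type standing in for HotSpot's HeapWord-sized entries.
typedef std::uintptr_t Word;

// Copies `count` words from `from` to `to`, one word-sized load and one
// word-sized store at a time. Volatile accesses must be performed exactly as
// written, so the compiler cannot fuse this loop into a memcpy-style block
// move. Atomicity of each aligned word access is assumed from the platform,
// not promised by the C++ standard.
inline void copy_words_per_word(const Word* from, Word* to, std::size_t count) {
  const volatile Word* src = from;
  volatile Word* dst = to;
  for (std::size_t i = 0; i < count; i++) {
    dst[i] = src[i];
  }
}
```

The volatile route costs some optimization freedom, which is acceptable for a small, rarely copied table such as the interpreter dispatch table.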
>> >> Fortunately, on x86_64, I believe it will in practice yield word >> atomic copying anyway by chance. But it should be fixed anyway. *sigh* > > The requirement is for atomic word accesses, which properly aligned > addresses will always yield - we rely on that guarantee throughout to > avoid word-tearing. The issue is that the loop may be converted to a > 'bulk' copying operation that may not provide atomic word accesses. So > unless the Atomic::load/store will prevent the loop from being > converted they are not in and of themselves required for correctness > (else we need to use them nearly everywhere). > >> The real danger is SPARC though and its BIS instructions. I don't >> have the code in front of me, but I really hope not to see that >> switch statement and non-volatile loop in that >> pd_disjoint_words_atomic() function. > > sparc uses the same loop. > > Let's face it, almost nobody expects the compiler to do these kinds > of transformations. :( > >> And I agree that the atomic copying API should be used when we need >> atomic copying. And if it turns out the implementation of that API is >> not atomic, it should be fixed in that atomic copying API. > > I agree to some extent, but we assume atomic load/stores of words all > over the place - and rightly so. The issue here is that we need to > hide the loop inside an API that we can somehow prevent the C++ > compiler from screwing up. It's hardly intuitive or obvious when this > is needed, e.g. if I simply copy three adjacent words without a loop > could the compiler convert that to a block move that is non-atomic? > >> So I think this change looks good. But it looks like we are not done >> yet. :c > > I agree that changing the current code to use the atomic copy API to > convey intent is fine. Thanks. Dan > > Cheers, > David > ----- > >> Thanks, >> /Erik >> >>> On 7 Jul 2019, at 02:46, Daniel D. Daugherty >>> wrote: >>> >>> Added Erik O. to the "To:" list... >>> >>> >>>> On 7/6/19 8:05 PM, Daniel D.
Daugherty wrote: >>>>> On 7/6/19 6:06 PM, David Holmes wrote: >>>>> Hi Dan, >>>>> >>>>>> On 6/07/2019 11:53 pm, Daniel D. Daugherty wrote: >>>>>> Greetings, >>>>>> >>>>>> During the code review for the following fix: >>>>>> >>>>>> JDK-8227117 normal interpreter table is not restored after >>>>>> single stepping with TLH >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>>>> >>>>>> Erik O. noticed a potential race with templateInterpreter.cpp: >>>>>> copy_table() >>>>>> depending on C++ compiler optimizations. The following bug is >>>>>> being used >>>>>> to fix this issue: >>>>>> >>>>>> JDK-8227338 templateInterpreter.cpp: copy_table() needs to >>>>>> be safer >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227338 >>>>>> >>>>>> Here's the webrev URL: >>>>>> >>>>>> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >>>>> >>>>> So the original code uses a loop to copy, while the new code calls >>>>> Copy::disjoint_words_atomic, but the implementation of that on x64 >>>>> is just a loop same as the original AFAICS: >>>> >>>> Yup. I figure Erik O. will jump in here with his reasoning... :-) >>> >>> Thinking about it more... I think the answer is that we are >>> switching to >>> calling code that specifies the type of behavior that we need: >>> >>> Copy::disjoint_words_atomic() >>> >>> is what we need when we're not at a safepoint. If, down the road, we >>> find >>> that the compiler does something with pd_disjoint_words_atomic() that >>> breaks our expectation for pd_disjoint_words_atomic(), then we fix that >>> version of pd_disjoint_words_atomic() and all the callers will be good >>> again... Or something like that... >>> >>> The version in >>> src/hotspot/os_cpu/solaris_x86/copy_solaris_x86.inline.hpp >>> happens to be exactly our loop with no switch statement... which is >>> particularly funny given Erik's observations about what at least one >>> Solaris X64 compiler did to loops...
>>> >>> Still, I'm just guessing here on a Saturday night... hopefully Erik >>> will chime in here... >>> >>> Dan >>> >>> >>> >>>> >>>> Dan >>>> >>>> >>>>> >>>>> static void pd_disjoint_words_atomic(const HeapWord* from, >>>>> HeapWord* to, size_t count) { >>>>> #ifdef AMD64 >>>>>   switch (count) { >>>>>   case 8:  to[7] = from[7]; >>>>>   case 7:  to[6] = from[6]; >>>>>   case 6:  to[5] = from[5]; >>>>>   case 5:  to[4] = from[4]; >>>>>   case 4:  to[3] = from[3]; >>>>>   case 3:  to[2] = from[2]; >>>>>   case 2:  to[1] = from[1]; >>>>>   case 1:  to[0] = from[0]; >>>>>   case 0:  break; >>>>>   default: >>>>>     while (count-- > 0) { >>>>>       *to++ = *from++; >>>>>     } >>>>>     break; >>>>>   } >>>>> #else >>>>> >>>>> David >>>>> ----- >>>>> >>>>>> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual >>>>>> platforms. >>>>>> Mach5 tier[4-6] is running now. It has also been tested with the >>>>>> manual >>>>>> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >>>>>> >>>>>> Thanks, in advance, for questions, comments or suggestions. >>>>>> >>>>>> Dan >>>> >>> >> From jianglizhou at google.com Mon Jul 8 14:27:11 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 8 Jul 2019 07:27:11 -0700 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: <875zocyiyo.fsf@oldenburg2.str.redhat.com> References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> <875zocyiyo.fsf@oldenburg2.str.redhat.com> Message-ID: Hi Florian, Here is the full webrev: http://cr.openjdk.java.net/~jiangli/8225035/webrev.05/, including the additional comments above get_static_tls_area_size.
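Earlier in this thread, David suggested computing the TLS adjustment once and reusing it. Below is a minimal sketch of that idea. Like David, it assumes the minimum-stack probe returns the same value for the life of the process. The probe is passed in as a function pointer, so the sketch can be tested without glibc's private `__pthread_get_minstack`. The formula follows the one quoted earlier: minstack minus one page minus `PTHREAD_STACK_MIN`, or zero. The cached value lives in a function-local static, which C++11 initializes exactly once, even when the first calls are concurrent. `MinstackFn`, `compute_tls_adjustment` and `cached_tls_adjustment` are hypothetical names, not the code in webrev.05.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical probe type; in the VM this role is played by a function
// pointer looked up at startup (possibly absent, hence the null check).
typedef std::size_t (*MinstackFn)();

// Pure computation of the static TLS adjustment.
inline std::size_t compute_tls_adjustment(MinstackFn get_minstack,
                                          std::size_t page_size,
                                          std::size_t pthread_stack_min) {
  if (get_minstack == nullptr) {
    return 0;  // probe unavailable: no adjustment
  }
  std::size_t minstack = get_minstack();
  if (minstack > page_size + pthread_stack_min) {
    return minstack - page_size - pthread_stack_min;
  }
  return 0;
}

// Computes once and caches. The function-local static is initialized on the
// first call only; later calls return the cached value and ignore arguments.
inline std::size_t cached_tls_adjustment(MinstackFn get_minstack,
                                         std::size_t page_size,
                                         std::size_t pthread_stack_min) {
  static const std::size_t tls_size =
      compute_tls_adjustment(get_minstack, page_size, pthread_stack_min);
  return tls_size;
}
```

Compared with a separate `static bool tls_size_inited` flag, the function-local static also avoids a data race if two threads reach the first computation at the same time. Whether that can happen in the VM depends on when the first thread creation occurs.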
Best regards, Jiangli On Mon, Jul 8, 2019 at 2:27 AM Florian Weimer wrote: > > * Jiangli Zhou: > > > As you, Florian, Thomas all made great contributions to this > > workaround, I should list all of you as both contributors and > > reviewers in the changeset. If there is any objection, please let me > > know. > > Can you share a link with the final patch? I would like to have another > look. > > Thanks, > Florian From Roger.Riggs at oracle.com Mon Jul 8 14:55:55 2019 From: Roger.Riggs at oracle.com (Roger Riggs) Date: Mon, 8 Jul 2019 10:55:55 -0400 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: References: <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> <875zocyiyo.fsf@oldenburg2.str.redhat.com> Message-ID: <871f39c4-0bf9-35ac-a411-3ac10d012009@oracle.com> Hi, src/hotspot/os/linux/os_linux.cpp: 849: typo in comment: "allocats" ->? "allocates" Will there be a release note describing the behavior and possibly the relation to the system property "jdk.lang.processReaperUseDefaultStackSize" added by 8086278? [1]. If so, add a label release-note=yes to the issue and create a subtask with for the "Release Note:.....". Thanks, Roger [1] https://bugs.openjdk.java.net/browse/JDK-8086278 On 7/8/19 10:27 AM, Jiangli Zhou wrote: > Hi Florian, > > Here is the full webrev: > http://cr.openjdk.java.net/~jiangli/8225035/webrev.05/, including the > additional comments above get_static_tls_area_size. > > Best regards, > Jiangli > > On Mon, Jul 8, 2019 at 2:27 AM Florian Weimer wrote: >> * Jiangli Zhou: >> >>> As you, Florian, Thomas all made great contributions to this >>> workaround, I should list all of you as both contributors and >>> reviewers in the changeset. 
If there is any objection, please let me >>> know. >> Can you share a link with the final patch? I would like to have another >> look. >> >> Thanks, >> Florian From coleen.phillimore at oracle.com Mon Jul 8 15:30:02 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 11:30:02 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> Message-ID: <6a9be07d-c290-f06e-15fe-2ae80abaf10d@oracle.com> The change and comment look good.? I have a question below though: On 7/5/19 1:12 PM, Daniel D. Daugherty wrote: > On 7/4/19 3:13 AM, David Holmes wrote: >> Hi Dan, >> >> On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> Robbin recently discovered this issue with Thread Local Handshakes. >>> Since >>> he's not available at the moment, I'm handling the issue: >>> >>> ???? JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>> ???? https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> When using Thread Local Handshakes, the normal interpreter table is >>> not restored after single stepping. This issue is caused by the >>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>> restore the normal interpreter table for the "off" case. >> >> So the result of this is that debugging tests may run more slowly >> overall? > > Not just tests. An interactive debugging session would also be affected. > After single stepping once, we won't ever switch back to the normal table > so we'll be stuck with the safepoint interpreter dispatch table. I'm trying to think if there's a good assertion to test this, but I don't think there is.? Maybe renaming the safept_table to breakpoint_table would be good to do to make it clear, once TLH is the only way to safepoint. 
Thanks, Coleen > > >> >>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>> SafepointSynchronize::end() has been refactored into >>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>> on the global safepoint branch. That matches up with the call to >>> Interpreter::notice_safepoints() that is also on the global safepoint >>> branch. >>> >>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>> to call Interpreter::ignore_safepoints() directly. >>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>> >>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>> >>>     if (_on) { >>>       Interpreter::notice_safepoints(); >>> +   } else { >>> +     Interpreter::ignore_safepoints(); >>>     } >> >> Looks good - thanks for the detailed analysis in the bug report. >> >> I have one additional request from looking at related code - can you >> fix this confused initializer: >> >> VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >>   : _on(on != 0) >> { >> } > > Yes, I can fix that. > > >> as _on and on are both bool the assignment can be direct and we >> shouldn't be comparing a bool to 0 as a matter of style. Thanks. >> >>> Everything else is just new logging support for future debugging of >>> interpreter table management and single stepping. >> >> Logging looks good too. > > Thanks. > > Thanks for the review. > > Dan > > >> >> Thanks, >> David >> ----- >> >>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>> >>> Thanks, in advance, for questions, comments or suggestions. >>> >>> Dan >>> > From daniel.daugherty at oracle.com Mon Jul 8 15:34:47 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Mon, 8 Jul 2019 11:34:47 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <6a9be07d-c290-f06e-15fe-2ae80abaf10d@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> <6a9be07d-c290-f06e-15fe-2ae80abaf10d@oracle.com> Message-ID: On 7/8/19 11:30 AM, coleen.phillimore at oracle.com wrote: > > The change and comment look good. I have a question below though: Thanks for the review. I've already committed the changeset in preparation for pushing. Hope you don't mind if I don't list you as a reviewer... More below... > > On 7/5/19 1:12 PM, Daniel D. Daugherty wrote: >> On 7/4/19 3:13 AM, David Holmes wrote: >>> Hi Dan, >>> >>> On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> Robbin recently discovered this issue with Thread Local Handshakes. >>>> Since >>>> he's not available at the moment, I'm handling the issue: >>>> >>>> JDK-8227117 normal interpreter table is not restored after >>>> single stepping with TLH >>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>> >>>> When using Thread Local Handshakes, the normal interpreter table is >>>> not restored after single stepping. This issue is caused by the >>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>>> restore the normal interpreter table for the "off" case. >>> >>> So the result of this is that debugging tests may run more slowly >>> overall? >> >> Not just tests. An interactive debugging session would also be affected. >> After single stepping once, we won't ever switch back to the normal >> table >> so we'll be stuck with the safepoint interpreter dispatch table. > > > I'm trying to think if there's a good assertion to test this, but I > don't think there is. Maybe renaming the safept_table to > breakpoint_table would be good to do to make it clear, once TLH is the > only way to safepoint.
Definitely something to keep in mind for the future. I don't know if there is a bug for eventually phasing out global safepoints... Dan > > Thanks, > Coleen >> >> >>> >>>> Prior to Thread Local Handshakes, this was a valid assumption to make. >>>> SafepointSynchronize::end() has been refactored into >>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>>> on the global safepoint branch. That matches up with the call to >>>> Interpreter::notice_safepoints() that is also on the global safepoint >>>> branch. >>>> >>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>>> to call Interpreter::ignore_safepoints() directly. >>>> >>>> Here's the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>>> >>>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>>> >>>>     if (_on) { >>>>       Interpreter::notice_safepoints(); >>>> +   } else { >>>> +     Interpreter::ignore_safepoints(); >>>>     } >>> >>> Looks good - thanks for the detailed analysis in the bug report. >>> >>> I have one additional request from looking at related code - can you >>> fix this confused initializer: >>> >>> VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >>>   : _on(on != 0) >>> { >>> } >> >> Yes, I can fix that. >> >> >>> as _on and on are both bool the assignment can be direct and we >>> shouldn't be comparing a bool to 0 as a matter of style. Thanks. >>> >>>> Everything else is just new logging support for future debugging of >>>> interpreter table management and single stepping. >>> >>> Logging looks good too. >> >> Thanks. >> >> Thanks for the review. >> >> Dan >> >> >>> >>> Thanks, >>> David >>> ----- >>> >>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle platforms. >>>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>>> >>>> Thanks, in advance, for questions, comments or suggestions.
>>>> >>>> Dan >>>> >> > From daniel.daugherty at oracle.com Mon Jul 8 15:42:44 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 8 Jul 2019 11:42:44 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> <6a9be07d-c290-f06e-15fe-2ae80abaf10d@oracle.com> Message-ID: <6daf6fc1-eb33-f16e-9133-4ded32344f9a@oracle.com> Added back serviceability-dev at ... Coleen, please double check before using reply-to-list... if there's more than one list, that feature doesn't work right... On 7/8/19 11:34 AM, Daniel D. Daugherty wrote: > On 7/8/19 11:30 AM, coleen.phillimore at oracle.com wrote: >> >> The change and comment look good. I have a question below though: > > Thanks for the review. I've already committed the changeset > in preparation for pushing. Hope you don't mind if I don't > list you as a reviewer... I was able to use "hg rollback" and redo the patch to include you in the list of reviewers... Dan > > More below... > > >> >> On 7/5/19 1:12 PM, Daniel D. Daugherty wrote: >>> On 7/4/19 3:13 AM, David Holmes wrote: >>>> Hi Dan, >>>> >>>> On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: >>>>> Greetings, >>>>> >>>>> Robbin recently discovered this issue with Thread Local >>>>> Handshakes. Since >>>>> he's not available at the moment, I'm handling the issue: >>>>> >>>>> JDK-8227117 normal interpreter table is not restored after >>>>> single stepping with TLH >>>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>>> >>>>> When using Thread Local Handshakes, the normal interpreter table is >>>>> not restored after single stepping. This issue is caused by the >>>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>>>> restore the normal interpreter table for the "off" case.
>>>> >>>> So the result of this is that debugging tests may run more slowly >>>> overall? >>> >>> Not just tests. An interactive debugging session would also be >>> affected. >>> After single stepping once, we won't ever switch back to the normal >>> table >>> so we'll be stuck with the safepoint interpreter dispatch table. >> >> >> I'm trying to think if there's a good assertion to test this, but I >> don't think there is. Maybe renaming the safept_table to >> breakpoint_table would be good to do to make it clear, once TLH is >> the only way to safepoint. > > Definitely something to keep in mind for the future. I don't > know if there is a bug for eventually phasing out global > safepoints... > > Dan > > >> >> Thanks, >> Coleen >>> >>> >>>> >>>>> Prior to Thread Local Handshakes, this was a valid assumption to >>>>> make. >>>>> SafepointSynchronize::end() has been refactored into >>>>> disarm_safepoint() and it only calls Interpreter::ignore_safepoints() >>>>> on the global safepoint branch. That matches up with the call to >>>>> Interpreter::notice_safepoints() that is also on the global safepoint >>>>> branch. >>>>> >>>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>>>> to call Interpreter::ignore_safepoints() directly. >>>>> >>>>> Here's the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>>>> >>>>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>>>> >>>>>     if (_on) { >>>>>       Interpreter::notice_safepoints(); >>>>> +   } else { >>>>> +     Interpreter::ignore_safepoints(); >>>>>     } >>>> >>>> Looks good - thanks for the detailed analysis in the bug report. >>>> >>>> I have one additional request from looking at related code - can you >>>> fix this confused initializer: >>>> >>>> VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >>>>   : _on(on != 0) >>>> { >>>> } >>> >>> Yes, I can fix that.
>>> >>> >>>> as _on and on are both bool the assignment can be direct and we >>>> shouldn't be comparing a bool to 0 as a matter of style. Thanks. >>>> >>>>> Everything else is just new logging support for future debugging of >>>>> interpreter table management and single stepping. >>>> >>>> Logging looks good too. >>> >>> Thanks. >>> >>> Thanks for the review. >>> >>> Dan >>> >>> >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle >>>>> platforms. >>>>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>>>> >>>>> Thanks, in advance, for questions, comments or suggestions. >>>>> >>>>> Dan >>>>> >>> >> > > From jianglizhou at google.com Mon Jul 8 16:05:18 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 8 Jul 2019 09:05:18 -0700 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: <871f39c4-0bf9-35ac-a411-3ac10d012009@oracle.com> References: <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> <875zocyiyo.fsf@oldenburg2.str.redhat.com> <871f39c4-0bf9-35ac-a411-3ac10d012009@oracle.com> Message-ID: Hi Roger, On Mon, Jul 8, 2019 at 7:56 AM Roger Riggs wrote: > > Hi, > > src/hotspot/os/linux/os_linux.cpp: > > 849: typo in comment: "allocats" -> "allocates" Fixed in place. Thanks! > > > > Will there be a release note describing the behavior and possibly the > relation to the > system property "jdk.lang.processReaperUseDefaultStackSize" added by > 8086278 [1]. A release note sounds like a good idea. The "jdk.lang.processReaperUseDefaultStackSize" system property no longer seems necessary with this more general workaround, but that can be made as a separate decision. Thoughts?
> > If so, add a label release-note=yes to the issue and create a subtask > for the "Release Note:.....". Will do. The TLS issue and the new AdjustStackSizeForTLS option are Linux only; is there any additional process for the release note? Best regards, Jiangli > > Thanks, Roger > > > > [1] https://bugs.openjdk.java.net/browse/JDK-8086278 > > On 7/8/19 10:27 AM, Jiangli Zhou wrote: > > Hi Florian, > > > > Here is the full webrev: > > http://cr.openjdk.java.net/~jiangli/8225035/webrev.05/, including the > > additional comments above get_static_tls_area_size. > > > > Best regards, > > Jiangli > > > > On Mon, Jul 8, 2019 at 2:27 AM Florian Weimer wrote: > >> * Jiangli Zhou: > >> > >>> As you, Florian, Thomas all made great contributions to this > >>> workaround, I should list all of you as both contributors and > >>> reviewers in the changeset. If there is any objection, please let me > >>> know. > >> Can you share a link with the final patch? I would like to have another > >> look. > >> > >> Thanks, > >> Florian > From coleen.phillimore at oracle.com Mon Jul 8 16:05:52 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 12:05:52 -0400 Subject: RFR(XXS): 8227117: normal interpreter table is not restored after single stepping with TLH In-Reply-To: <6daf6fc1-eb33-f16e-9133-4ded32344f9a@oracle.com> References: <5a4b9d27-a4d1-a0af-cfd9-1982d3508b7d@oracle.com> <784a7d5e-da81-22a1-20f2-a507e958a1e3@oracle.com> <6a9be07d-c290-f06e-15fe-2ae80abaf10d@oracle.com> <6daf6fc1-eb33-f16e-9133-4ded32344f9a@oracle.com> Message-ID: <7c316369-e9cc-f891-0d19-2b6890947ff9@oracle.com> On 7/8/19 11:42 AM, Daniel D. Daugherty wrote: > Added back serviceability-dev at ... > > Coleen, please double check before using reply-to-list... if there's more > than one list, that feature doesn't work right... Sometimes Reply-All in my mailer gets all the lists and sometimes it doesn't. I'll check next time that I get them all.
> > On 7/8/19 11:34 AM, Daniel D. Daugherty wrote: >> On 7/8/19 11:30 AM, coleen.phillimore at oracle.com wrote: >>> >>> The change and comment look good. I have a question below though: >> >> Thanks for the review. I've already committed the changeset >> in preparation for pushing. Hope you don't mind if I don't >> list you as a reviewer... > > I was able to use "hg rollback" and redo the patch to include > you in the list of reviewers... Thanks, I didn't mind not being listed but I was interested in the change and how it escaped our testing. Coleen > > Dan > > >> >> More below... >> >> >>> >>> On 7/5/19 1:12 PM, Daniel D. Daugherty wrote: >>>> On 7/4/19 3:13 AM, David Holmes wrote: >>>>> Hi Dan, >>>>> >>>>> On 4/07/2019 12:04 pm, Daniel D. Daugherty wrote: >>>>>> Greetings, >>>>>> >>>>>> Robbin recently discovered this issue with Thread Local >>>>>> Handshakes. Since >>>>>> he's not available at the moment, I'm handling the issue: >>>>>> >>>>>> JDK-8227117 normal interpreter table is not restored after >>>>>> single stepping with TLH >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>>>>> >>>>>> When using Thread Local Handshakes, the normal interpreter table is >>>>>> not restored after single stepping. This issue is caused by the >>>>>> VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to >>>>>> restore the normal interpreter table for the "off" case. >>>>> >>>>> So the result of this is that debugging tests may run more slowly >>>>> overall? >>>> >>>> Not just tests. An interactive debugging session would also be >>>> affected. >>>> After single stepping once, we won't ever switch back to the normal >>>> table >>>> so we'll be stuck with the safepoint interpreter dispatch table. >>> >>> >>> I'm trying to think if there's a good assertion to test this, but I >>> don't think there is. Maybe renaming the safept_table to >>> breakpoint_table would be good to do to make it clear, once TLH is >>> the only way to safepoint.
>> >> Definitely something to keep in mind for the future. I don't >> know if there is a bug for eventually phasing out global >> safepoints... >> >> Dan >> >> >>> >>> Thanks, >>> Coleen >>>> >>>> >>>>> >>>>>> Prior to Thread Local Handshakes, this was a valid assumption to >>>>>> make. >>>>>> SafepointSynchronize::end() has been refactored into >>>>>> disarm_safepoint() and it only calls >>>>>> Interpreter::ignore_safepoints() >>>>>> on the global safepoint branch. That matches up with the call to >>>>>> Interpreter::notice_safepoints() that is also on the global >>>>>> safepoint >>>>>> branch. >>>>>> >>>>>> The solution is for the VM_ChangeSingleStep VM-op for the "off" case >>>>>> to call Interpreter::ignore_safepoints() directly. >>>>>> >>>>>> Here's the webrev URL: >>>>>> >>>>>> http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/ >>>>>> >>>>>> The fix is just a small addition to VM_ChangeSingleStep::doit(): >>>>>> >>>>>>     if (_on) { >>>>>>       Interpreter::notice_safepoints(); >>>>>> +   } else { >>>>>> +     Interpreter::ignore_safepoints(); >>>>>>     } >>>>> >>>>> Looks good - thanks for the detailed analysis in the bug report. >>>>> >>>>> I have one additional request from looking at related code - can >>>>> you fix this confused initializer: >>>>> >>>>> VM_ChangeSingleStep::VM_ChangeSingleStep(bool on) >>>>>   : _on(on != 0) >>>>> { >>>>> } >>>> >>>> Yes, I can fix that. >>>> >>>> >>>>> as _on and on are both bool the assignment can be direct and we >>>>> shouldn't be comparing a bool to 0 as a matter of style. Thanks. >>>>> >>>>>> Everything else is just new logging support for future debugging of >>>>>> interpreter table management and single stepping. >>>>> >>>>> Logging looks good too. >>>> >>>> Thanks. >>>> >>>> Thanks for the review. >>>> >>>> Dan >>>> >>>> >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> Tested this fix with Mach5 Tier[1-3] on the standard Oracle >>>>>> platforms.
>>>>>> Mach5 Tier[4-6] on standard Oracle platforms is running now. >>>>>> >>>>>> Thanks, in advance, for questions, comments or suggestions. >>>>>> >>>>>> Dan >>>>>> >>>> >>> >> >> > From calvin.cheung at oracle.com Mon Jul 8 16:59:21 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 8 Jul 2019 09:59:21 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> Message-ID: <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Hi Jiangli, On 7/7/19 5:12 PM, Jiangli Zhou wrote: > Hi Calvin, > > Per our off-mailing-list email exchange from the previous code review > for https://bugs.openjdk.java.net/browse/JDK-8211723, I created > https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove > SharedPathsMiscInfo' Thanks for filing the RFE. > . I think the crash caused by premature runtime accessing of > _paths_misc_info_size should be handled as part of JDK-8227370, rather > than further patching up the SharedPathsMiscInfo My current patch involves checking most of the fields in CDSFileMapHeaderBase before accessing other fields. This part is applicable to other fields, not only to the _paths_misc_info_size. This bug existed for a while and I think it would be a good backport candidate for 11u. The patch for JDK-8211723 and the follow-up RFE JDK-8227370 do not need to be backported to 11u. I'd like to fix this bug first and then handle JDK-8227370 as a separate changeset. thanks, Calvin > > Thanks and regards, > Jiangli > > On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: >> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 >> >> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ >> >> This bug was found during a bootcycle build when a shared archive built >> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to
It is due to >> some of the important header fields such as the _jvm_ident was not >> checked prior to accessinng other fields such as the _paths_misc_info_size. >> >> This fix involves checking most the fields in CDSFileMapHeaderBase >> before accessing other fields. >> >> Testing: tiers 1-3. >> >> thanks, >> >> Calvin >> From jianglizhou at google.com Mon Jul 8 17:25:41 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 8 Jul 2019 10:25:41 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Message-ID: Hi Calvin, On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: > > Hi Jiangli, > > On 7/7/19 5:12 PM, Jiangli Zhou wrote: > > Hi Calvin, > > > > Per our off-mailing-list email exchange from the previous code review > > for https://bugs.openjdk.java.net/browse/JDK-8211723, I created > > https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove > > SharedPathsMiscInfo' > Thanks for filing the RFE. > > . I think the crash caused by premature runtime accessing of > > _paths_misc_info_size should be handled as part of JDK-8227370, rather > > than further patching up the SharedPathsMiscInfo > > My current patch involves checking most the fields in > CDSFileMapHeaderBase before accessing other fields. This part is > applicable to other fields, not only to the _paths_misc_info_size. This > bug existed for a while and I think it would be a good backport > candidate for 11u. The patch for JDK-8211723 and the follow-up RFE > JDK-8227370 are not necessary to be backported to 11u. I'd like to fix > this bug first and then handle JDK-8227370 as a separate changeset. That sounds like a good plan. A fix targeted for backporting should have a clean-cut (less dependency) and controlled scope. 
Addressing this incrementally in separate changesets is a suitable approach. I took a quick look over the weekend and noticed some issues with your current patch. That's why I suggested going with the complete removal without spending extra effort on SharedPathsMiscInfo. I will need to take a closer look and try to get back to you later today. Best regards, Jiangli > > thanks, > > Calvin > > > > > Thanks and regards, > > Jiangli > > > > On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: > >> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > >> > >> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > >> > >> This bug was found during a bootcycle build when a shared archive built > >> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to > >> some of the important header fields such as the _jvm_ident was not > >> checked prior to accessing other fields such as the _paths_misc_info_size. > >> > >> This fix involves checking most the fields in CDSFileMapHeaderBase > >> before accessing other fields. > >> > >> Testing: tiers 1-3. > >> > >> thanks, > >> > >> Calvin > >> From daniil.x.titov at oracle.com Mon Jul 8 18:42:31 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 08 Jul 2019 11:42:31 -0700 Subject: RFR: 8170299: Debugger does not stop inside the low memory notifications code Message-ID: <92E6C72E-E1DC-48CB-9965-628AE5C3F9AB@oracle.com> Hi Serguei, Please review the new version of the fix that corrects the order of include statements in src/hotspot/share/runtime/notificationThread.cpp. The list of include statements doesn't contain #include "runtime/mutexLocker.hpp" since this include file is already included by runtime/interfaceSupport.inline.hpp that is in this list. I don't think we need the following function: static bool is_notification_thread(Thread* thread); For the ServiceThread the function is_service_thread(Thread* thread) is used only once in the code.
It is used inside JvmtiDeferredEvent::post() to assert that the proper thread is used to post these events. Low memory, GC and diagnostic command notification never had such asserts so I'm not sure we need to introduce them regarding new NotificationThread. Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 Thanks! --Daniil On 7/3/19, 9:02 PM, "serguei.spitsyn at oracle.com" wrote: Hi Daniil, I've not finished my review but it looks good in general. A couple of quick comments. https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.hpp.html I wonder if this function is also needed: static bool is_notification_thread(Thread* thread); https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.cpp.html I wonder why this include statement is missed: #include "runtime/mutexLocker.hpp" Also, these have to be correctly ordered: 29 #include "runtime/notificationThread.hpp" 30 #include "services/lowMemoryDetector.hpp" 31 #include "services/gcNotifier.hpp" 32 #include "services/diagnosticArgument.hpp" 33 #include "services/diagnosticFramework.hpp" Thanks, Serguei On 7/3/19 8:04 PM, Daniil Titov wrote: > Please review the change that fixes the problem with the debugger not stopping in the low memory notification code. > > The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view that prevents the debugger from stopping in it. > > The fix introduces new NotificationThread that is visible to the external view and offloads the ServiceThread from sending low memory and other notifications that could result in Java calls ( GC and diagnostic commands notifications) by moving these activities in this new NotificationThread. > > Testing: Mach5 tier1,tier2 and tier3 tests succeeded.
> > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > Thanks! > --Daniil > > From coleen.phillimore at oracle.com Mon Jul 8 19:37:14 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 15:37:14 -0400 Subject: RFR 8191890: Biased locking still uses the inferior stop the world safepoint for revocation In-Reply-To: <4df1d772-bea5-0921-93fb-d6c20f29bcdd@oracle.com> References: <4df1d772-bea5-0921-93fb-d6c20f29bcdd@oracle.com> Message-ID: This looks good! Coleen On 7/7/19 3:09 PM, Patricio Chilano wrote: > Hi all, > > Below is the webrev for v05. This is just v04 on top of a new baseline > that includes the backout of 8221734 and other changes made to > biasedLocking code by 8225702 and 8225344. > The only difference between v05 and v04 is the use of > SafepointSynchronize::safepoint_id() instead of > SafepointSynchronize::safepoint_counter() introduced by 8225702, and > not having to remove method > BiasedLocking::revoke_own_locks_in_handshake() and to edit method > Deoptimization::revoke_using_handshake() which were actually removed > by the backout of 8221734. > > Full Webrev: > http://cr.openjdk.java.net/~pchilanomate/8191890/v05/webrev/ > > > Tested with tiers1-7. Running another round now. > > Thanks! > Patricio From patricio.chilano.mateo at oracle.com Mon Jul 8 19:57:40 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Mon, 8 Jul 2019 15:57:40 -0400 Subject: RFR 8191890: Biased locking still uses the inferior stop the world safepoint for revocation In-Reply-To: References: <4df1d772-bea5-0921-93fb-d6c20f29bcdd@oracle.com> Message-ID: <9f32c329-7cd1-22c6-30dc-8b0b84f23baa@oracle.com> Thanks Coleen! Patricio On 7/8/19 3:37 PM, coleen.phillimore at oracle.com wrote: > > This looks good! > Coleen > > On 7/7/19 3:09 PM, Patricio Chilano wrote: >> Hi all, >> >> Below is the webrev for v05. 
This is just v04 on top of a new >> baseline that includes the backout of 8221734 and other changes made >> to biasedLocking code by 8225702 and 8225344. >> The only difference between v05 and v04 is the use of >> SafepointSynchronize::safepoint_id() instead of >> SafepointSynchronize::safepoint_counter() introduced by 8225702, and >> not having to remove method >> BiasedLocking::revoke_own_locks_in_handshake() and to edit method >> Deoptimization::revoke_using_handshake() which were actually removed >> by the backout of 8221734. >> >> Full Webrev: >> http://cr.openjdk.java.net/~pchilanomate/8191890/v05/webrev/ >> >> >> Tested with tiers1-7. Running another round now. >> >> Thanks! >> Patricio > From Roger.Riggs at oracle.com Mon Jul 8 19:59:13 2019 From: Roger.Riggs at oracle.com (Roger Riggs) Date: Mon, 8 Jul 2019 15:59:13 -0400 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: References: <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> <875zocyiyo.fsf@oldenburg2.str.redhat.com> <871f39c4-0bf9-35ac-a411-3ac10d012009@oracle.com> Message-ID: <2c3e8f63-9e3b-6ec3-c766-876afb16727c@oracle.com> Hi, On 7/8/19 12:05 PM, Jiangli Zhou wrote: > Hi Roger, > > On Mon, Jul 8, 2019 at 7:56 AM Roger Riggs wrote: >> Hi, >> >> src/hotspot/os/linux/os_linux.cpp: >> >> 849: typo in comment: "allocats" -> "allocates" > Fixed in place. Thanks! > >> >> >> Will there be a release note describing the behavior and possibly the >> relation to the >> system property "jdk.lang.processReaperUseDefaultStackSize" added by >> 8086278 [1]. > A release note sounds like a good idea. The > "jdk.lang.processReaperUseDefaultStackSize" system property seems no > longer necessary with this more-general workaround, but that can be > made as separate decision. Thoughts? 
yes, there's a separate issue filed to mitigate unexpected stack needs of the Process Reaper. JDK-8217475 That might still be useful to mitigate unexpected memory use during initialization. >> If so, add a label release-note=yes to the issue and create a subtask >> with for the "Release Note:.....". > Will do. The TLS issue and the new AdjustStackSizeForTLS option are > Linux only, any additional process regarding the release note? There is a proposal for a code-tools project to formalize the generation of release notes though I haven't seen a response. https://mail.openjdk.java.net/pipermail/code-tools-dev/2019-April/000527.html There are some conventions for labeling release notes and process but I don't have them all in one place. The subtask should have a label = "release-note"; There are labels starting with "RN-" that indicate what kind of change it is. RN-NewFeature might be appropriate. The first line should be written to be brief and specific about the change in behavior since it may be used in a list of release notes. Take a look at another release note for example: https://bugs.openjdk.java.net/browse/JDK-8223588 Regards, Roger > > Best regards, > Jiangli >> Thanks, Roger >> >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8086278 >> >> On 7/8/19 10:27 AM, Jiangli Zhou wrote: >>> Hi Florian, >>> >>> Here is the full webrev: >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.05/, including the >>> additional comments above get_static_tls_area_size. >>> >>> Best regards, >>> Jiangli >>> >>> On Mon, Jul 8, 2019 at 2:27 AM Florian Weimer wrote: >>>> * Jiangli Zhou: >>>> >>>>> As you, Florian, Thomas all made great contributions to this >>>>> workaround, I should list all of you as both contributors and >>>>> reviewers in the changeset. If there is any objection, please let me >>>>> know. >>>> Can you share a link with the final patch? I would like to have another >>>> look. 
>>>> >>>> Thanks, >>>> Florian From jianglizhou at google.com Mon Jul 8 20:34:55 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 8 Jul 2019 13:34:55 -0700 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: <2c3e8f63-9e3b-6ec3-c766-876afb16727c@oracle.com> References: <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> <875zocyiyo.fsf@oldenburg2.str.redhat.com> <871f39c4-0bf9-35ac-a411-3ac10d012009@oracle.com> <2c3e8f63-9e3b-6ec3-c766-876afb16727c@oracle.com> Message-ID: Hi Roger, Thanks for the information! Best regards, Jiangli On Mon, Jul 8, 2019 at 12:59 PM Roger Riggs wrote: > > Hi, > > > On 7/8/19 12:05 PM, Jiangli Zhou wrote: > > Hi Roger, > > On Mon, Jul 8, 2019 at 7:56 AM Roger Riggs wrote: > > Hi, > > src/hotspot/os/linux/os_linux.cpp: > > 849: typo in comment: "allocats" -> "allocates" > > Fixed in place. Thanks! > > > > Will there be a release note describing the behavior and possibly the > relation to the > system property "jdk.lang.processReaperUseDefaultStackSize" added by > 8086278 [1]. > > A release note sounds like a good idea. The > "jdk.lang.processReaperUseDefaultStackSize" system property seems no > longer necessary with this more-general workaround, but that can be > made as separate decision. Thoughts? > > yes, there's a separate issue filed to mitigate unexpected stack needs of the Process Reaper. > JDK-8217475 > > That might still be useful to mitigate unexpected memory use during initialization. > > If so, add a label release-note=yes to the issue and create a subtask > with for the "Release Note:.....". > > Will do. The TLS issue and the new AdjustStackSizeForTLS option are > Linux only, any additional process regarding the release note? 
> > > There is a proposal for a code-tools project to formalize the generation of release notes > though I haven't seen a response. > > https://mail.openjdk.java.net/pipermail/code-tools-dev/2019-April/000527.html > > There are some conventions for labeling release notes and process but I don't have them all in one place. > > The subtask should have a label = "release-note"; > There are labels starting with "RN-" that indicate what kind of change it is. > RN-NewFeature might be appropriate. > The first line should be written to be brief and specific about the change in behavior since > it may be used in a list of release notes. > Take a look at another release note for example: https://bugs.openjdk.java.net/browse/JDK-8223588 > > Regards, Roger > > > > > Best regards, > Jiangli > > Thanks, Roger > > > > [1] https://bugs.openjdk.java.net/browse/JDK-8086278 > > On 7/8/19 10:27 AM, Jiangli Zhou wrote: > > Hi Florian, > > Here is the full webrev: > http://cr.openjdk.java.net/~jiangli/8225035/webrev.05/, including the > additional comments above get_static_tls_area_size. > > Best regards, > Jiangli > > On Mon, Jul 8, 2019 at 2:27 AM Florian Weimer wrote: > > * Jiangli Zhou: > > As you, Florian, Thomas all made great contributions to this > workaround, I should list all of you as both contributors and > reviewers in the changeset. If there is any objection, please let me > know. > > Can you share a link with the final patch? I would like to have another > look. > > Thanks, > Florian > > From daniel.daugherty at oracle.com Mon Jul 8 21:37:32 2019 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Mon, 8 Jul 2019 17:37:32 -0400 Subject: RFR 8191890: Biased locking still uses the inferior stop the world safepoint for revocation In-Reply-To: <4df1d772-bea5-0921-93fb-d6c20f29bcdd@oracle.com> References: <4df1d772-bea5-0921-93fb-d6c20f29bcdd@oracle.com> Message-ID: <6526490c-8a50-f00f-138e-9a16c2de1840@oracle.com> On 7/7/19 3:09 PM, Patricio Chilano wrote: > Hi all, > > Below is the webrev for v05. This is just v04 on top of a new baseline > that includes the backout of 8221734 and other changes made to > biasedLocking code by 8225702 and 8225344. > The only difference between v05 and v04 is the use of > SafepointSynchronize::safepoint_id() instead of > SafepointSynchronize::safepoint_counter() introduced by 8225702, and > not having to remove method > BiasedLocking::revoke_own_locks_in_handshake() and to edit method > Deoptimization::revoke_using_handshake() which were actually removed > by the backout of 8221734. > > Full Webrev: > http://cr.openjdk.java.net/~pchilanomate/8191890/v05/webrev/ > src/hotspot/share/interpreter/interpreterRuntime.cpp No comments. src/hotspot/share/runtime/biasedLocking.cpp No comments. src/hotspot/share/runtime/biasedLocking.hpp No comments. src/hotspot/share/runtime/deoptimization.cpp No comments. src/hotspot/share/runtime/handshake.cpp No comments. src/hotspot/share/runtime/vmOperations.hpp No comment. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/VMOps.java The copyright year needs to be updated. test/jdk/jdk/jfr/event/runtime/TestBiasedLockRevocationEvents.java No comments. I did the first pass review using: $ jfilemerge -r open.8191890.v4.patch open.8191890.v5.patch and nothing jumped out at me. I did a quick review of the v5 webrev and the only thing I noticed was the copyright year mentioned above. I have to repeat a comment from the v01 code review: > Outstanding job on a very arcane and complicated part of the system. Thanks for sticking with this fix!
Dan > > Tested with tiers1-7. Running another round now. > > Thanks! > Patricio From daniel.daugherty at oracle.com Mon Jul 8 22:24:18 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 8 Jul 2019 18:24:18 -0400 Subject: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) In-Reply-To: <6E7B043A-4647-4931-977C-1854CA7EBEC1@oracle.com> References: <4C4212D0-BFFF-4C85-ACC6-05200F220C3F@oracle.com> <2d6dede1-aa79-99ce-a823-773fa2e19827@oracle.com> <6E7B043A-4647-4931-977C-1854CA7EBEC1@oracle.com> Message-ID: On 6/29/19 12:06 PM, Daniil Titov wrote: > Hi Serguei and David, > > Serguei is right, ThreadTable::find_thread(java_tid) cannot return a JavaThread with an unmatched java_tid. > > Please find a new version of the fix that includes the changes Serguei suggested. > > Regarding the concern about maintaining the thread table when it may never even be queried, one of > the options could be to add ThreadTable::isEnabled flag, set it to "false" by default, and wrap the calls to the thread table > in ThreadsSMRSupport add_thread() and remove_thread() methods to check this flag. > > When ThreadsList::find_JavaThread_from_java_tid() is called for the first time it could check if ThreadTable::isEnabled > is on and if not then set it on and populate the thread table with all existing threads from the thread list. I have the same concerns as David H. about this new ThreadTable. ThreadsList::find_JavaThread_from_java_tid() is only called from code in src/hotspot/share/services/management.cpp so I think that table needs to be enabled and populated only if it is going to be used. I've taken a look at the webrev below and I see that David has followed up with additional comments. Before I do a crawl through code review for this, I would like to see the ThreadTable stuff made optional and David's other comments addressed.
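A minimal sketch of the lazy-enable idea Daniil describes (table disabled by default, populated from the thread list on the first tid lookup, maintained by add/remove only afterwards). Everything here is hypothetical: `FakeThread` stands in for `JavaThread`, `ThreadRegistry` is not HotSpot code, and a single `std::mutex` with `std::unordered_map` replaces `ThreadsList` and `ConcurrentHashTable` purely to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for JavaThread: just a tid and an "exiting" flag.
struct FakeThread {
  int64_t tid;
  bool exiting;
};

// Hypothetical registry: the thread list is always maintained (like a
// ThreadsList), but the tid -> thread index is only built on first lookup,
// so programs that never query by tid never pay for the table.
class ThreadRegistry {
 public:
  void add_thread(FakeThread* t) {
    assert(t != nullptr);
    std::lock_guard<std::mutex> guard(_lock);
    _threads.push_back(t);
    if (_table_enabled) _table[t->tid] = t;  // index maintained only once enabled
  }

  void remove_thread(FakeThread* t) {
    std::lock_guard<std::mutex> guard(_lock);
    _threads.erase(std::remove(_threads.begin(), _threads.end(), t), _threads.end());
    if (_table_enabled) _table.erase(t->tid);
  }

  // The first call enables the table and populates it from the current list.
  // Enabling happens under the same lock as add/remove, so a thread that
  // starts or exits concurrently can be neither missed nor left behind.
  FakeThread* find_by_tid(int64_t tid) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_table_enabled) {
      _table_enabled = true;
      for (FakeThread* t : _threads) _table[t->tid] = t;
    }
    auto it = _table.find(tid);
    // Mirrors the "ignore threads that are exiting" check in the webrev.
    if (it == _table.end() || it->second->exiting) return nullptr;
    return it->second;
  }

  bool table_enabled() {
    std::lock_guard<std::mutex> guard(_lock);
    return _table_enabled;
  }

 private:
  std::mutex _lock;
  bool _table_enabled = false;
  std::vector<FakeThread*> _threads;
  std::unordered_map<int64_t, FakeThread*> _table;
};
```

The single lock keeps the sketch short but serializes every lookup; the webrev itself uses `ConcurrentHashTable`, and a real version would still need the enable-and-populate step to be race-free with `ThreadsSMRSupport` add_thread()/remove_thread() without a global lock on the lookup path.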
Another possible optimization is for callers of find_JavaThread_from_java_tid() to save the calling thread's tid value before they loop and if the current tid == saved_tid then use the current JavaThread* instead of calling find_JavaThread_from_java_tid() to get the JavaThread*. Dan > > Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.02/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > Thanks! > --Daniil > > From: > Organization: Oracle Corporation > Date: Friday, June 28, 2019 at 7:56 PM > To: Daniil Titov , OpenJDK Serviceability , "hotspot-runtime-dev at openjdk.java.net" , "jmx-dev at openjdk.java.net" > Subject: Re: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) > > Hi Daniil, > > I have several quick comments. > > The indent in the hotspot c/c++ files has to be 2, not 4. > > https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/src/hotspot/share/runtime/threadSMR.cpp.frames.html > 614 JavaThread* ThreadsList::find_JavaThread_from_java_tid(jlong java_tid) const { > 615 JavaThread* java_thread = ThreadTable::find_thread(java_tid); > 616 if (java_thread == NULL && java_tid == PMIMORDIAL_JAVA_TID) { > 617 // ThreadsSMRSupport::add_thread() is not called for the primordial > 618 // thread. Thus, we find this thread with a linear search and add it > 619 // to the thread table. > 620 for (uint i = 0; i < length(); i++) { > 621 JavaThread* thread = thread_at(i); > 622 if (is_valid_java_thread(java_tid,thread)) { > 623 ThreadTable::add_thread(java_tid, thread); > 624 return thread; > 625 } > 626 } > 627 } else if (java_thread != NULL && is_valid_java_thread(java_tid, java_thread)) { > 628 return java_thread; > 629 } > 630 return NULL; > 631 } > 632 bool ThreadsList::is_valid_java_thread(jlong java_tid, JavaThread* java_thread) { > 633 oop tobj = java_thread->threadObj(); > 634 // Ignore the thread if it hasn't run yet, has exited > 635 // or is starting to exit. 
> 636 return (tobj != NULL && !java_thread->is_exiting() && > 637 java_tid == java_lang_Thread::thread_id(tobj)); > 638 } > > 615 JavaThread* java_thread = ThreadTable::find_thread(java_tid); > > I'd suggest to rename find_thread() to find_thread_by_tid(). > > A space is missed after the comma: > 622 if (is_valid_java_thread(java_tid,thread)) { > > An empty line is needed before L632. > > The name 'is_valid_java_thread' looks wrong (or confusing) to me. > Something like 'is_alive_java_thread_with_tid()' would be better. > It'd be better to list parameters in the opposite order. > > The call to is_valid_java_thread() is confusing: > 627 } else if (java_thread != NULL && is_valid_java_thread(java_tid, java_thread)) { > > Why would the call ThreadTable::find_thread(java_tid) return a JavaThread with an unmatched java_tid? > > > Thanks, > Serguei > > On 6/28/19, 9:40 PM, "David Holmes" wrote: > > Hi Daniil, > > The definition and use of this hashtable (yet another hashtable > implementation!) will need careful examination. We have to be concerned > about the cost of maintaining it when it may never even be queried. You > would need to look at footprint cost and performance impact. > > Unfortunately I'm just about to board a plane and will be out for the > next few days. I will try to look at this asap next week, but we will > need a lot more data on it. > > Thanks, > David > > On 6/28/19 3:31 PM, Daniil Titov wrote: > Please review the change that improves performance of ThreadMXBean MXBean methods returning the > information for specific threads. The change introduces the thread table that uses ConcurrentHashTable > to store one-to-one the mapping between the thread ids and JavaThread objects and replaces the linear > search over the thread list in ThreadsList::find_JavaThread_from_java_tid(jlong tid) method with the lookup > in the thread table. > > Testing: Mach5 tier1,tier2 and tier3 tests successfully passed.
> > Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > Thanks! > > Best regards, > Daniil > > > > > > > From kim.barrett at oracle.com Mon Jul 8 23:00:12 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 8 Jul 2019 19:00:12 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> Message-ID: <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> > On Jul 7, 2019, at 8:08 PM, David Holmes wrote: > > On 7/07/2019 6:48 pm, Erik Osterlund wrote: >> The real danger is SPARC though and its BIS instructions. I don't have the code in front of me, but I really hope not to see that switch statement and non-volatile loop in that pd_disjoint_words_atomic() function. > > sparc uses the same loop. > > Let's face it, almost nobody expects the compiler to do these kinds of transformations. :( See JDK-8131330 and JDK-8142368, where we saw exactly this sort of transformation from a fill-loop to memset (which may use BIS, and indeed empirically does in some cases). The loops in question seem trivially convertible to memcpy/memmove. Also see JDK-8142349. >> And I agree that the atomic copying API should be used when we need atomic copying. And if it turns out the implementation of that API is not atomic, it should be fixed in that atomic copying API. > > I agree to some extent, but we assume atomic load/stores of words all over the place - and rightly so. The issue here is that we need to hide the loop inside an API that we can somehow prevent the C++ compiler from screwing up. It's hardly intuitive or obvious when this is needed e.g. if I simply copy three adjacent words without a loop could the compiler convert that to a block move that is non-atomic?
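On the question of whether the compiler may turn a plain word-by-word copy into a non-atomic block move, here is a minimal sketch of two ways to forbid that. It assumes that naturally aligned, word-sized loads and stores are single instructions on the target (HotSpot relies on this; the C++ standard does not promise it for `volatile`). `Word`, `copy_words_volatile()` and `copy_words_atomic()` are hypothetical names, not HotSpot's `Copy::disjoint_words_atomic()`: the first follows the volatile-destination idea suggested in this thread, the second shows the portable `std::atomic` alternative.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = std::uintptr_t;  // hypothetical stand-in for a HeapWord-sized entry

// Every volatile read and write must be performed as written by the abstract
// machine, so the compiler may not merge this loop into memcpy/memmove (which
// could use block stores such as SPARC BIS). Per-word atomicity still rests on
// the assumption stated above, not on the language.
void copy_words_volatile(const volatile Word* from, volatile Word* to, std::size_t count) {
  assert(count == 0 || (from != nullptr && to != nullptr));
  for (std::size_t i = 0; i < count; i++) {
    to[i] = from[i];
  }
}

// Portable alternative: each relaxed atomic store is indivisible, so a
// concurrent reader using atomic loads never observes a torn entry.
void copy_words_atomic(const Word* from, std::atomic<Word>* to, std::size_t count) {
  for (std::size_t i = 0; i < count; i++) {
    to[i].store(from[i], std::memory_order_relaxed);
  }
}
```

The `volatile` form keeps plain word storage and avoids the alignment constraints an atomic type might impose, which is the trade-off raised in this thread; the `std::atomic` form is the one the language itself guarantees to be tear-free.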
> >> So I think this change looks good. But it looks like we are not done yet. :c > > I agree that changing the current code to use the atomic copy API to convey intent is fine. I've been reserving Atomic::load/store for cases where the location "ought" to be declared std::atomic if we were using C++11 atomics (or alternatively some homebrew equivalent). Not all places where we do stuff "atomically" are appropriate for that though (consider card tables, being arrays of bytes, where using an atomic type might impose alignment constraints that are undesirable here). I *think* just using volatile here would likely be sufficient, e.g. we should have Copy::disjoint_words_atomic(const HeapWord* from, volatile HeapWord* to, size_t count) From kim.barrett at oracle.com Mon Jul 8 23:08:19 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 8 Jul 2019 19:08:19 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: References: Message-ID: <56F4779A-7C29-4ED1-A742-FD8B61326A76@oracle.com> > On Jul 6, 2019, at 9:53 AM, Daniel D. Daugherty wrote: > > Greetings, > > During the code review for the following fix: > > JDK-8227117 normal interpreter table is not restored after single stepping with TLH > https://bugs.openjdk.java.net/browse/JDK-8227117 > > Erik O. noticed a potential race with templateInterpreter.cpp: copy_table() > depending on C++ compiler optimizations. The following bug is being used > to fix this issue: > > JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer > https://bugs.openjdk.java.net/browse/JDK-8227338 > > Here's the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ > > This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. > Mach5 tier[4-6] is running now. It has also been tested with the manual > jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. > > Thanks, in advance, for questions, comments or suggestions.
> > Dan [This review is ignoring the issues around the current implementation of atomic copies discussed elsewhere in this thread. I assume those will be addressed elsewhere.] ------------------------------------------------------------------------------ src/hotspot/share/interpreter/templateInterpreter.cpp 286 while (size-- > 0) *to++ = *from++; [pre-existing] This ought to be using Copy::disjoint_words. That's even more obvious in conjunction with the change to use Copy::disjoint_words_atomic in the non-safepoint case. ------------------------------------------------------------------------------ src/hotspot/share/interpreter/templateInterpreter.cpp 284 if (SafepointSynchronize::is_at_safepoint()) { I wonder how much benefit we really get from having distinct safepoint and non-safepoint cases, rather than just unconditionally using Copy::disjoint_words_atomic. ------------------------------------------------------------------------------ From daniil.x.titov at oracle.com Mon Jul 8 23:11:44 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 08 Jul 2019 16:11:44 -0700 Subject: RFR: 8170299: Debugger does not stop inside the low memory notifications code In-Reply-To: <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com> References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com> <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com> Message-ID: <2BDA2105-D987-438C-BC2A-052496F07B7F@oracle.com> Hi David, Sure, I will put it on hold until you are back from vacation. Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 Have a nice vacation! Best regards, Daniil On 7/3/19, 11:47 PM, "David Holmes" wrote: Hi Daniil, On 4/07/2019 1:04 pm, Daniil Titov wrote: > Please review the change that fixes the problem with the debugger not stopping in the low memory notification code.
> > The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view that prevents the debugger from stopping in it. > > The fix introduces new NotificationThread that is visible to the external view and offloads the ServiceThread from sending low memory and other notifications that could result in Java calls ( GC and diagnostic commands notifications) by moving these activities in this new NotificationThread. There is a long and unfortunate history with this bug. The original incarnation of this fix was introducing a new thread at the Java library level, and I had some concerns about that: http://mail.openjdk.java.net/pipermail/serviceability-dev/2017-December/022612.html That effort was resurrected at: http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-July/024466.html and http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-August/024849.html but was left somewhat in limbo. There was a lot of doubt about the right way to fix this bug and whether introducing a new thread was too disruptive. But introducing a new thread in the VM also has the same set of concerns! This needs consideration by the runtime team before going ahead. Introducing a new thread like this needs to be examined in detail - particularly the synchronization interactions with other threads. It also introduces another monitor designated safepoint-never at a time when we are in the process of cleaning up monitors so that JavaThreads will only use safepoint-check-always monitors. Unfortunately I'm about to head out for two weeks vacation, and a number of other key runtime folk are also on vacation, but I'd ask that you hold off on this until we can look at it in more detail. Thanks, David ----- > Testing: Mach5 tier1,tier2 and tier3 tests succeeded. > > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > Thanks!
> --Daniil > > From kim.barrett at oracle.com Mon Jul 8 23:15:43 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 8 Jul 2019 19:15:43 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> Message-ID: <6A2FA4AE-8E22-4949-B6CC-B1587652DF6C@oracle.com> > On Jul 8, 2019, at 7:00 PM, Kim Barrett wrote: > Copy::disjoint_words_atomic(const HeapWord* from,volatile HeapWord* to, size_t count) Or maybe Copy::disjoint_words_atomic(const volatile HeapWord* from, volatile HeapWord* to, size_t count) From daniil.x.titov at oracle.com Mon Jul 8 23:24:15 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 08 Jul 2019 16:24:15 -0700 Subject: RFR: 8170299: Debugger does not stop inside the low memory notifications code In-Reply-To: <9e1ac22e-e425-1623-2e49-91418711d1c7@oracle.com> References: <92E6C72E-E1DC-48CB-9965-628AE5C3F9AB@oracle.com> <9e1ac22e-e425-1623-2e49-91418711d1c7@oracle.com> Message-ID: <5324D2D0-94CD-4A79-ADEB-064E713D34F7@oracle.com> Hi Serguei, I will put it on hold as David asked but before doing so I just wanted to give a quick reply to the questions you asked. Thanks! Best regards, Daniil From: "serguei.spitsyn at oracle.com" Date: Monday, July 8, 2019 at 3:09 PM To: Daniil Titov , OpenJDK Serviceability , "hotspot-runtime-dev at openjdk.java.net" , "jmx-dev at openjdk.java.net" , David Holmes Subject: Re: RFR: 8170299: Debugger does not stop inside the low memory notifications code Hi Daniil, Did you see a message from David Holmes? I do not see your reply. 
Specifically, David asked us to hold off on this while he is on vacation for two weeks: > But introducing a new thread in the VM also has the same set of concerns! > This needs consideration by the runtime team before going ahead. > Introducing a new thread like this needs to be examined in detail - > particularly the synchronization interactions with other threads. > > It also introduces another monitor designated safepoint-never at a time > when we are in the process of cleaning up monitors so that > JavaThreads will only use safepoint-check-always monitors. > > Unfortunately I'm about to head out for two weeks vacation, and > a number of other key runtime folk are also on vacation. > But I'd ask that you hold off on this until we can look at it in more detail. In fact, I was expecting this kind of concern from David. Thanks, Serguei On 7/8/19 11:42, Daniil Titov wrote: Hi Serguei, Please review the new version of the fix that corrects the order of include statements in src/hotspot/share/runtime/notificationThread.cpp. The list of include statements doesn't contain #include "runtime/mutexLocker.hpp" since this include file is already included by runtime/interfaceSupport.inline.hpp that is in this list. I don't think we need the following function: static bool is_notification_thread(Thread* thread); For the ServiceThread the function is_service_thread(Thread* thread) is used only once in the code. It is used inside JvmtiDeferredEvent::post() to assert that the proper thread is used to post these events. Low memory, GC and diagnostic command notification never had such asserts so I'm not sure we need to introduce them regarding new NotificationThread. Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 Thanks! --Daniil On 7/3/19, 9:02 PM, "serguei.spitsyn at oracle.com" wrote: Hi Daniil, I've not finished my review but it looks good in general. A couple of quick comments.
https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.hpp.html I wonder if this function is also needed: static bool is_notification_thread(Thread* thread); https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.cpp.html I wonder why this include statement is missing: #include "runtime/mutexLocker.hpp" Also, these have to be correctly ordered: 29 #include "runtime/notificationThread.hpp" 30 #include "services/lowMemoryDetector.hpp" 31 #include "services/gcNotifier.hpp" 32 #include "services/diagnosticArgument.hpp" 33 #include "services/diagnosticFramework.hpp" Thanks, Serguei On 7/3/19 8:04 PM, Daniil Titov wrote: > Please review the change that fixes the problem with the debugger not stopping in the low memory notification code. > > The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view, which prevents the debugger from stopping in it. > > The fix introduces a new NotificationThread that is visible to the external view, and offloads from the ServiceThread the sending of low memory and other notifications that could result in Java calls (GC and diagnostic command notifications) by moving these activities to the new NotificationThread. > > Testing: Mach5 tier1, tier2 and tier3 tests succeeded. > > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > Thanks! > --Daniil
From jianglizhou at google.com Mon Jul 8 23:38:51 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 8 Jul 2019 16:38:51 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Message-ID: Hi Calvin, - src/hotspot/share/include/cds.h 36 #define NUM_CDS_REGIONS 8 The above change would need to be hand fixed when backporting to older versions. It's fine to include it in the current review, but it's better to create a separate bug and commit using that bug ID. So it will make the backports cleaner. -------- 39 #define CDS_END_MAGIC 0xf00babae What's the significance of the new end magic? Should the existing header validation be sufficient as long as it's done first? -------- - src/hotspot/share/memory/filemap.cpp 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != CDS_DYNAMIC_ARCHIVE_MAGIC) { 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : CDS_DYNAMIC_ARCHIVE_MAGIC; 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); 904 log_info(cds)(" actual: 0x%08x", _header->_magic); 905 FileMapInfo::fail_continue("The shared archive file has a bad magic number."); 906 return false; 907 } ... 964 if (is_static) { 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { 966 fail_continue("Incorrect static archive magic number"); 967 return false; 968 } There are two checks for _header->_magic in FileMapInfo::init_from_file now but behave differently. The second one can be removed. The first check at line 901 should check the _magic value based on the 'is_static' flag: unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : CDS_DYNAMIC_ARCHIVE_MAGIC; if (_header->_magic != expected_magic) { ... -------- Most of the work now in FileMapInfo::init_from_file should really belong to FileMapInfo::validate_header. 
It would be cleaner to simply FileMapInfo::init_from_file to be the following and move the rest to FileMapInfo::validate_header. Thoughts? 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { 889 size_t sz = is_static ? sizeof(FileMapHeader) : sizeof(DynamicArchiveHeader); 890 size_t n = os::read(fd, _header, (unsigned int)sz); 891 if (n != sz) { 892 fail_continue("Unable to read the file header."); 893 return false; 894 } 895 return true; } Best regards, Jiangli On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: > > Hi Calvin, > > On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: > > > > Hi Jiangli, > > > > On 7/7/19 5:12 PM, Jiangli Zhou wrote: > > > Hi Calvin, > > > > > > Per our off-mailing-list email exchange from the previous code review > > > for https://bugs.openjdk.java.net/browse/JDK-8211723, I created > > > https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove > > > SharedPathsMiscInfo' > > Thanks for filing the RFE. > > > . I think the crash caused by premature runtime accessing of > > > _paths_misc_info_size should be handled as part of JDK-8227370, rather > > > than further patching up the SharedPathsMiscInfo > > > > My current patch involves checking most the fields in > > CDSFileMapHeaderBase before accessing other fields. This part is > > applicable to other fields, not only to the _paths_misc_info_size. This > > bug existed for a while and I think it would be a good backport > > candidate for 11u. The patch for JDK-8211723 and the follow-up RFE > > JDK-8227370 are not necessary to be backported to 11u. I'd like to fix > > this bug first and then handle JDK-8227370 as a separate changeset. > > That sounds like a good plan. A fix targeted for backporting should > have a clean-cut (less dependency) and controlled scope. Addressing > this incrementally in separate changesets is a suitable approach. > > I took a quick look over the weekend and noticed some issues with your > current patch. 
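Jiangli's two suggestions above (a single magic check keyed on is_static, and an init_from_file that only reads the header while validate_header does the checking) can be sketched outside HotSpot as follows. This is an illustrative sketch, not the committed code: the magic values are placeholders standing in for CDS_ARCHIVE_MAGIC and CDS_DYNAMIC_ARCHIVE_MAGIC from cds.h, SketchHeader, read_header and check_magic are hypothetical names, and a byte buffer replaces os::read() on a file descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Placeholder magic values for illustration only; the authoritative
// constants are CDS_ARCHIVE_MAGIC / CDS_DYNAMIC_ARCHIVE_MAGIC in cds.h.
constexpr uint32_t kStaticMagic  = 0xf00baba2u;
constexpr uint32_t kDynamicMagic = 0xf00baba8u;

// Hypothetical, heavily trimmed header: only the magic is modeled.
struct SketchHeader {
  uint32_t magic;
};

// Step 1 (the "read only" init_from_file): copy the whole header or fail.
// A short read means nothing in the header can be trusted.
bool read_header(const unsigned char* buf, std::size_t len, SketchHeader* out) {
  if (len < sizeof(SketchHeader)) {
    std::fprintf(stderr, "Unable to read the file header.\n");
    return false;
  }
  std::memcpy(out, buf, sizeof(SketchHeader));
  return true;
}

// Step 2 (part of validate_header): one magic check whose expected value
// depends on the kind of archive being opened, so a dynamic archive is
// rejected when a static one is expected, and vice versa.
bool check_magic(const SketchHeader& h, bool is_static) {
  const uint32_t expected = is_static ? kStaticMagic : kDynamicMagic;
  if (h.magic != expected) {
    std::fprintf(stderr, "_magic expected: 0x%08x\n", static_cast<unsigned>(expected));
    std::fprintf(stderr, "         actual: 0x%08x\n", static_cast<unsigned>(h.magic));
    return false;
  }
  return true;
}
```

Keeping the read step free of semantic checks means every validation rule lives in one function, which is the motivation for moving the rest of the work into validate_header.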
That's why I suggested to go with the complete removal > without spending extra effort on SharedPathsMiscInfo. I will need to > take a closer look and try to get back to you later today. > > Best regards, > Jiangli > > > > > thanks, > > > > Calvin > > > > > > > > Thanks and regards, > > > Jiangli > > > > > > On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: > > >> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > > >> > > >> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > > >> > > >> This bug was found during a bootcycle build when a shared archive built > > >> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to > > >> some of the important header fields such as the _jvm_ident was not > > >> checked prior to accessinng other fields such as the _paths_misc_info_size. > > >> > > >> This fix involves checking most the fields in CDSFileMapHeaderBase > > >> before accessing other fields. > > >> > > >> Testing: tiers 1-3. > > >> > > >> thanks, > > >> > > >> Calvin > > >> From jianglizhou at google.com Mon Jul 8 23:45:43 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 8 Jul 2019 16:45:43 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Message-ID: -#define CURRENT_CDS_ARCHIVE_VERSION 5 +#define CURRENT_CDS_ARCHIVE_VERSION 6 I would also suggestion to not do the above change in this bug fix since that would make all older versions to use '6' when backported (unless hand merge is involved). Thanks, Jiangli On Mon, Jul 8, 2019 at 4:38 PM Jiangli Zhou wrote: > > Hi Calvin, > > - src/hotspot/share/include/cds.h > > 36 #define NUM_CDS_REGIONS 8 > > The above change would need to be hand fixed when backporting to older > versions. It's fine to include it in the current review, but it's > better to create a separate bug and commit using that bug ID. 
So it > will make the backports cleaner. > > -------- > > 39 #define CDS_END_MAGIC 0xf00babae > > What's the significance of the new end magic? Should the existing > header validation be sufficient as long as it's done first? > > -------- > > - src/hotspot/share/memory/filemap.cpp > > 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != > CDS_DYNAMIC_ARCHIVE_MAGIC) { > 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > CDS_DYNAMIC_ARCHIVE_MAGIC; > 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); > 904 log_info(cds)(" actual: 0x%08x", _header->_magic); > 905 FileMapInfo::fail_continue("The shared archive file has a bad > magic number."); > 906 return false; > 907 } > ... > > 964 if (is_static) { > 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { > 966 fail_continue("Incorrect static archive magic number"); > 967 return false; > 968 } > > There are two checks for _header->_magic in > FileMapInfo::init_from_file now but behave differently. The second one > can be removed. The first check at line 901 should check the _magic > value based on the 'is_static' flag: > > unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > CDS_DYNAMIC_ARCHIVE_MAGIC; > if (_header->_magic != expected_magic) { > ... > > -------- > > Most of the work now in FileMapInfo::init_from_file should really > belong to FileMapInfo::validate_header. It would be cleaner to simply > FileMapInfo::init_from_file to be the following and move the rest to > FileMapInfo::validate_header. Thoughts? > > 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { > 889 size_t sz = is_static ? 
sizeof(FileMapHeader) : > sizeof(DynamicArchiveHeader); > 890 size_t n = os::read(fd, _header, (unsigned int)sz); > 891 if (n != sz) { > 892 fail_continue("Unable to read the file header."); > 893 return false; > 894 } > 895 return true; > } > > Best regards, > > Jiangli > > > On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: > > > > Hi Calvin, > > > > On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: > > > > > > Hi Jiangli, > > > > > > On 7/7/19 5:12 PM, Jiangli Zhou wrote: > > > > Hi Calvin, > > > > > > > > Per our off-mailing-list email exchange from the previous code review > > > > for https://bugs.openjdk.java.net/browse/JDK-8211723, I created > > > > https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove > > > > SharedPathsMiscInfo' > > > Thanks for filing the RFE. > > > > . I think the crash caused by premature runtime accessing of > > > > _paths_misc_info_size should be handled as part of JDK-8227370, rather > > > > than further patching up the SharedPathsMiscInfo > > > > > > My current patch involves checking most the fields in > > > CDSFileMapHeaderBase before accessing other fields. This part is > > > applicable to other fields, not only to the _paths_misc_info_size. This > > > bug existed for a while and I think it would be a good backport > > > candidate for 11u. The patch for JDK-8211723 and the follow-up RFE > > > JDK-8227370 are not necessary to be backported to 11u. I'd like to fix > > > this bug first and then handle JDK-8227370 as a separate changeset. > > > > That sounds like a good plan. A fix targeted for backporting should > > have a clean-cut (less dependency) and controlled scope. Addressing > > this incrementally in separate changesets is a suitable approach. > > > > I took a quick look over the weekend and noticed some issues with your > > current patch. That's why I suggested to go with the complete removal > > without spending extra effort on SharedPathsMiscInfo. 
I will need to > > take a closer look and try to get back to you later today. > > > > Best regards, > > Jiangli > > > > > > > > thanks, > > > > > > Calvin > > > > > > > > > > > Thanks and regards, > > > > Jiangli > > > > > > > > On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: > > > >> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > > > >> > > > >> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > > > >> > > > >> This bug was found during a bootcycle build when a shared archive built > > > >> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to > > > >> some of the important header fields such as the _jvm_ident was not > > > >> checked prior to accessinng other fields such as the _paths_misc_info_size. > > > >> > > > >> This fix involves checking most the fields in CDSFileMapHeaderBase > > > >> before accessing other fields. > > > >> > > > >> Testing: tiers 1-3. > > > >> > > > >> thanks, > > > >> > > > >> Calvin > > > >> From patricio.chilano.mateo at oracle.com Tue Jul 9 00:10:20 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Mon, 8 Jul 2019 20:10:20 -0400 Subject: RFR 8191890: Biased locking still uses the inferior stop the world safepoint for revocation In-Reply-To: <6526490c-8a50-f00f-138e-9a16c2de1840@oracle.com> References: <4df1d772-bea5-0921-93fb-d6c20f29bcdd@oracle.com> <6526490c-8a50-f00f-138e-9a16c2de1840@oracle.com> Message-ID: <891323cf-10e1-450c-16ad-7196d5ea1c37@oracle.com> Hi Dan, On 7/8/19 5:37 PM, Daniel D. Daugherty wrote: > On 7/7/19 3:09 PM, Patricio Chilano wrote: >> Hi all, >> >> Below is the webrev for v05. This is just v04 on top of a new >> baseline that includes the backout of 8221734 and other changes made >> to biasedLocking code by 8225702 and 8225344. 
>> The only difference between v05 and v04 is the use of >> SafepointSynchronize::safepoint_id() instead of >> SafepointSynchronize::safepoint_counter() introduced by 8225702, and >> not having to remove method >> BiasedLocking::revoke_own_locks_in_handshake() and to edit method >> Deoptimization::revoke_using_handshake() which were actually removed >> by the backout of 8221734. >> >> Full Webrev: >> http://cr.openjdk.java.net/~pchilanomate/8191890/v05/webrev/ >> > > src/hotspot/share/interpreter/interpreterRuntime.cpp > No comments. > > src/hotspot/share/runtime/biasedLocking.cpp > No comments. > > src/hotspot/share/runtime/biasedLocking.hpp > No comments. > > src/hotspot/share/runtime/deoptimization.cpp > No comments. > > src/hotspot/share/runtime/handshake.cpp > No comments. > > src/hotspot/share/runtime/vmOperations.hpp > No comments. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/VMOps.java > The copyright year needs to be updated. Fixed! Do you need to see another webrev? > test/jdk/jdk/jfr/event/runtime/TestBiasedLockRevocationEvents.java > No comments. > > I did the first pass review using: > > $ jfilemerge -r open.8191890.v4.patch open.8191890.v5.patch > > and nothing jumped out at me. I did a quick review of the > v5 webrev and the only thing I noticed was the copyright > year mentioned above. > > I have to repeat a comment from the v01 code review: > > > Outstanding job on a very arcane and complicated part of the system. > > Thanks for sticking with this fix! Thanks Dan! :) Patricio > Dan > > >> >> Tested with tiers1-7. Running another round now. >> >> Thanks!
>> Patricio > From calvin.cheung at oracle.com Tue Jul 9 06:29:16 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 8 Jul 2019 23:29:16 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Message-ID: <53b972d6-a253-b2a7-e9f5-1d533753ca0d@oracle.com> On 7/8/19 4:38 PM, Jiangli Zhou wrote: > Hi Calvin, > > - src/hotspot/share/include/cds.h > > 36 #define NUM_CDS_REGIONS 8 > > The above change would need to be hand fixed when backporting to older > versions. It's fine to include it in the current review, but it's > better to create a separate bug and commit using that bug ID. So it > will make the backports cleaner. I don't think it is worthwhile filing a bug just for this line. I've added a comment as follows: 36 #define NUM_CDS_REGIONS 8 // this must be the same as MetaspaceShared::n_regions > > -------- > > 39 #define CDS_END_MAGIC 0xf00babae > > What's the significance of the new end magic? Should the existing > header validation be sufficient as long as it's done first? It seems unnecessary now. I got rid of it. > > -------- > > - src/hotspot/share/memory/filemap.cpp > > 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != > CDS_DYNAMIC_ARCHIVE_MAGIC) { > 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > CDS_DYNAMIC_ARCHIVE_MAGIC; > 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); > 904 log_info(cds)(" actual: 0x%08x", _header->_magic); > 905 FileMapInfo::fail_continue("The shared archive file has a bad > magic number."); > 906 return false; > 907 } > ... > > 964 if (is_static) { > 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { > 966 fail_continue("Incorrect static archive magic number"); > 967 return false; > 968 } > > There are two checks for _header->_magic in > FileMapInfo::init_from_file now but behave differently. 
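Calvin's new comment ties NUM_CDS_REGIONS to MetaspaceShared::n_regions by convention only. One possible way to make that relationship checked rather than documented is a compile-time assertion placed in a C++ file that includes cds.h (cds.h itself has to stay plain C). The sketch below is hypothetical and not what the webrev does: RegionsSketch stands in for MetaspaceShared, and SpaceInfoTableSketch is an illustrative consumer of the C-side constant.

```cpp
#include <cassert>
#include <cstddef>

// C-side constant, as in cds.h (which cannot see C++ enums).
#define NUM_CDS_REGIONS 8

// Hypothetical stand-in for the C++-side region enumeration; the real
// MetaspaceShared enum names each region individually.
struct RegionsSketch {
  enum { n_regions = 8 };
};

// Turns the "this must be the same as ..." comment into a build failure
// if the two definitions ever drift apart.
static_assert(NUM_CDS_REGIONS == RegionsSketch::n_regions,
              "NUM_CDS_REGIONS must equal MetaspaceShared::n_regions");

// Illustrative consumer sized by the C-side constant.
struct SpaceInfoTableSketch {
  std::size_t used[NUM_CDS_REGIONS];
};
```

The assertion costs nothing at run time and would also flag a mismatch introduced while hand-merging a backport.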
The second one > can be removed. The first check at line 901 should check the _magic > value based on the 'is_static' flag: > > unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > CDS_DYNAMIC_ARCHIVE_MAGIC; > if (_header->_magic != expected_magic) { > ... I've made the above change. > > -------- > > Most of the work now in FileMapInfo::init_from_file should really > belong to FileMapInfo::validate_header. It would be cleaner to simply > FileMapInfo::init_from_file to be the following and move the rest to > FileMapInfo::validate_header. Thoughts? > > 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { > 889 size_t sz = is_static ? sizeof(FileMapHeader) : > sizeof(DynamicArchiveHeader); > 890 size_t n = os::read(fd, _header, (unsigned int)sz); > 891 if (n != sz) { > 892 fail_continue("Unable to read the file header."); > 893 return false; > 894 } > 895 return true; > } The _file_offset will be based on the size_t n and some other fields (_paths_misc_info, SharedBaseAddress) will be set at lines 953 - 976. Also, there's the following check in validate_header(): 1859 if (!ClassLoader::check_shared_paths_misc_info(_paths_misc_info, _header->_paths_misc_info_size, is_static)) { If the SharedPathsMiscInfo could be removed (JDK-8227370), then it is possible that validate_header could be called within init_from_file. I think we should defer this until JDK-8227370. updated webrev:
http://cr.openjdk.java.net/~ccheung/8226406/webrev.01/ thanks, Calvin > > Best regards, > > Jiangli > > > On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: >> Hi Calvin, >> >> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: >>> Hi Jiangli, >>> >>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: >>>> Hi Calvin, >>>> >>>> Per our off-mailing-list email exchange from the previous code review >>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created >>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove >>>> SharedPathsMiscInfo' >>> Thanks for filing the RFE. >>>> . I think the crash caused by premature runtime accessing of >>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather >>>> than further patching up the SharedPathsMiscInfo >>> My current patch involves checking most the fields in >>> CDSFileMapHeaderBase before accessing other fields. This part is >>> applicable to other fields, not only to the _paths_misc_info_size. This >>> bug existed for a while and I think it would be a good backport >>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE >>> JDK-8227370 are not necessary to be backported to 11u. I'd like to fix >>> this bug first and then handle JDK-8227370 as a separate changeset. >> That sounds like a good plan. A fix targeted for backporting should >> have a clean-cut (less dependency) and controlled scope. Addressing >> this incrementally in separate changesets is a suitable approach. >> >> I took a quick look over the weekend and noticed some issues with your >> current patch. That's why I suggested to go with the complete removal >> without spending extra effort on SharedPathsMiscInfo. I will need to >> take a closer look and try to get back to you later today. 
>> >> Best regards, >> Jiangli >> >>> thanks, >>> >>> Calvin >>> >>>> Thanks and regards, >>>> Jiangli >>>> >>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 >>>>> >>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ >>>>> >>>>> This bug was found during a bootcycle build when a shared archive built >>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to >>>>> some of the important header fields such as the _jvm_ident was not >>>>> checked prior to accessinng other fields such as the _paths_misc_info_size. >>>>> >>>>> This fix involves checking most the fields in CDSFileMapHeaderBase >>>>> before accessing other fields. >>>>> >>>>> Testing: tiers 1-3. >>>>> >>>>> thanks, >>>>> >>>>> Calvin >>>>> From calvin.cheung at oracle.com Tue Jul 9 06:34:28 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 8 Jul 2019 23:34:28 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Message-ID: On 7/8/19 4:45 PM, Jiangli Zhou wrote: > -#define CURRENT_CDS_ARCHIVE_VERSION 5 > +#define CURRENT_CDS_ARCHIVE_VERSION 6 > > I would also suggestion to not do the above change in this bug fix > since that would make all older versions to use '6' when backported > (unless hand merge is involved). Since the _jvm_ident field has been moved to a different location, I think the CURRENT_CDS_ARCHIVE_VERSION should be updated. Even if the version stays the same, shared archive created by an older version of JVM cannot be used by the current JVM version. thanks, Calvin > > Thanks, > Jiangli > > On Mon, Jul 8, 2019 at 4:38 PM Jiangli Zhou wrote: >> Hi Calvin, >> >> - src/hotspot/share/include/cds.h >> >> 36 #define NUM_CDS_REGIONS 8 >> >> The above change would need to be hand fixed when backporting to older >> versions. 
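Calvin's fix rests on validating the header fields whose layout does not depend on the build (magic, version, VM identification string) before reading anything sized by pointer width. That ordering is why an archive written by a 64-bit VM can be rejected cleanly by a 32-bit VM instead of being misread. A minimal sketch of that ordering, assuming a hypothetical header prefix: the field order, the 64-byte ident length and all names are illustrative, not the actual CDSFileMapHeaderBase layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-layout prefix. It holds only 32-bit integers and a char
// array, so on common ABIs its size and field offsets are identical in 32-bit
// and 64-bit builds; size_t or pointer fields would not be.
struct ArchiveHeaderBaseSketch {
  uint32_t magic;
  int32_t  crc;
  int32_t  version;
  char     jvm_ident[64];  // hypothetical length
};

constexpr int32_t kCurrentArchiveVersion = 6;  // value proposed in this thread

// Check the layout-independent fields first. Only if they all match is it
// safe to interpret later, pointer-width-dependent fields such as sizes.
bool validate_base(const ArchiveHeaderBaseSketch& h, uint32_t expected_magic,
                   const char* running_vm_ident) {
  if (h.magic != expected_magic) return false;
  if (h.version != kCurrentArchiveVersion) return false;
  // Bounded compare: an unterminated ident in a corrupt file cannot overrun.
  return std::strncmp(h.jvm_ident, running_vm_ident, sizeof(h.jvm_ident)) == 0;
}
```

Bumping the version whenever the position of these fields changes is what lets an older archive fail the version check before its relocated fields are ever read.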
It's fine to include it in the current review, but it's >> better to create a separate bug and commit using that bug ID. So it >> will make the backports cleaner. >> >> -------- >> >> 39 #define CDS_END_MAGIC 0xf00babae >> >> What's the significance of the new end magic? Should the existing >> header validation be sufficient as long as it's done first? >> >> -------- >> >> - src/hotspot/share/memory/filemap.cpp >> >> 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != >> CDS_DYNAMIC_ARCHIVE_MAGIC) { >> 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : >> CDS_DYNAMIC_ARCHIVE_MAGIC; >> 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); >> 904 log_info(cds)(" actual: 0x%08x", _header->_magic); >> 905 FileMapInfo::fail_continue("The shared archive file has a bad >> magic number."); >> 906 return false; >> 907 } >> ... >> >> 964 if (is_static) { >> 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { >> 966 fail_continue("Incorrect static archive magic number"); >> 967 return false; >> 968 } >> >> There are two checks for _header->_magic in >> FileMapInfo::init_from_file now but behave differently. The second one >> can be removed. The first check at line 901 should check the _magic >> value based on the 'is_static' flag: >> >> unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : >> CDS_DYNAMIC_ARCHIVE_MAGIC; >> if (_header->_magic != expected_magic) { >> ... >> >> -------- >> >> Most of the work now in FileMapInfo::init_from_file should really >> belong to FileMapInfo::validate_header. It would be cleaner to simply >> FileMapInfo::init_from_file to be the following and move the rest to >> FileMapInfo::validate_header. Thoughts? >> >> 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { >> 889 size_t sz = is_static ? 
sizeof(FileMapHeader) : >> sizeof(DynamicArchiveHeader); >> 890 size_t n = os::read(fd, _header, (unsigned int)sz); >> 891 if (n != sz) { >> 892 fail_continue("Unable to read the file header."); >> 893 return false; >> 894 } >> 895 return true; >> } >> >> Best regards, >> >> Jiangli >> >> >> On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: >>> Hi Calvin, >>> >>> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: >>>> Hi Jiangli, >>>> >>>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: >>>>> Hi Calvin, >>>>> >>>>> Per our off-mailing-list email exchange from the previous code review >>>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created >>>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove >>>>> SharedPathsMiscInfo' >>>> Thanks for filing the RFE. >>>>> . I think the crash caused by premature runtime accessing of >>>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather >>>>> than further patching up the SharedPathsMiscInfo >>>> My current patch involves checking most the fields in >>>> CDSFileMapHeaderBase before accessing other fields. This part is >>>> applicable to other fields, not only to the _paths_misc_info_size. This >>>> bug existed for a while and I think it would be a good backport >>>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE >>>> JDK-8227370 are not necessary to be backported to 11u. I'd like to fix >>>> this bug first and then handle JDK-8227370 as a separate changeset. >>> That sounds like a good plan. A fix targeted for backporting should >>> have a clean-cut (less dependency) and controlled scope. Addressing >>> this incrementally in separate changesets is a suitable approach. >>> >>> I took a quick look over the weekend and noticed some issues with your >>> current patch. That's why I suggested to go with the complete removal >>> without spending extra effort on SharedPathsMiscInfo. I will need to >>> take a closer look and try to get back to you later today. 
>>> >>> Best regards, >>> Jiangli >>> >>>> thanks, >>>> >>>> Calvin >>>> >>>>> Thanks and regards, >>>>> Jiangli >>>>> >>>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: >>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 >>>>>> >>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ >>>>>> >>>>>> This bug was found during a bootcycle build when a shared archive built >>>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to >>>>>> some of the important header fields such as the _jvm_ident was not >>>>>> checked prior to accessinng other fields such as the _paths_misc_info_size. >>>>>> >>>>>> This fix involves checking most the fields in CDSFileMapHeaderBase >>>>>> before accessing other fields. >>>>>> >>>>>> Testing: tiers 1-3. >>>>>> >>>>>> thanks, >>>>>> >>>>>> Calvin >>>>>> From matthias.baesken at sap.com Tue Jul 9 07:15:20 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 9 Jul 2019 07:15:20 +0000 Subject: RFR(xs): 8227031: Print NMT statistics on fatal errors Message-ID: Hi Thomas, In wonder about the following : MemTracker::final_report is called also from print_statistics() : hotspot/share/runtime/java.cpp ----------------------------------------------------- void print_statistics() { ... 353 // Native memory tracking data 354 if (PrintNMTStatistics) { 355 MemTracker::final_report(tty); 356 } Would this mean that when called before from print_statistics() , we would not call it again from vmError because of the g_final_report_did_run check ? src/hotspot/share/services/memTracker.cpp ----------------------------------------------- 179 static volatile bool g_final_report_did_run = false; 180 void MemTracker::final_report(outputStream* output) { 181 // This function is called during both error reporting and normal VM exit. 182 // However, it should only ever run once. E.g. 
if the VM crashes after 183 // printing the final report during normal VM exit, it should not print 184 // the final report again. In addition, it should be guarded from 185 // recursive calls in case NMT reporting itself crashes. 186 if (Atomic::cmpxchg(true, &g_final_report_did_run, false) == false) { 187 NMT_TrackingLevel level = tracking_level(); 188 if (level >= NMT_summary) { 189 report(level == NMT_summary, output); 190 } 191 } 192 } Is this really what we want ? Of course we want to avoid printing it twice (or more than that ) from error reporting. But I think we would miss it from error reporting in some situations when we want it there . Otherwise looks okay to me . Best regards, Matthias >Hi all, > >We have -XX:+-PrintNMTStatistics, a very useful switch which will cause the >VM to print out the NMT statistics if the VM exits normally. > >Currently it does not work if the VM exits due to a fatal error. But >especially in fatal exits due to native OOM a NMT report would be very >helpful. > >JBS: https://bugs.openjdk.java.net/browse/JDK-8227031 > >cr: >http://cr.openjdk.java.net/~stuefe/webrevs/8227031-optionally-print-nmt-report-on-oom/webrev.00/webrev/index.html > >Changes in this patch: >- handle PrintNMTStatistics on fatal error >- make sure the final report is not called twice accidentally and it is not >called recursively due to secondary error handling >- change the Metaspace report portion of the NMT report to only include the >brief metaspace report - that one can be called at any time, it does not >lock nor require any resources. > >Please note: this will not work when we are in an OOM situation and request >a detailed NMT report; that scenario needs more work since NMT detailed >reports need memory as well. That is a separate issue. 
From thomas.stuefe at gmail.com Tue Jul 9 07:31:36 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 9 Jul 2019 09:31:36 +0200 Subject: RFR(xs): 8227031: Print NMT statistics on fatal errors In-Reply-To: References: Message-ID: Hi Matthias, On Tue, Jul 9, 2019 at 9:15 AM Baesken, Matthias wrote: > Hi Thomas, In wonder about the following : > > MemTracker::final_report is called also from print_statistics() : > > hotspot/share/runtime/java.cpp > ----------------------------------------------------- > void print_statistics() { > ... > 353 // Native memory tracking data > 354 if (PrintNMTStatistics) { > 355 MemTracker::final_report(tty); > 356 } > > > Would this mean that when called before from print_statistics() , we > would not call it again from vmError because of the > g_final_report_did_run check ? > > src/hotspot/share/services/memTracker.cpp > ----------------------------------------------- > > 179 static volatile bool g_final_report_did_run = false; > 180 void MemTracker::final_report(outputStream* output) { > 181 // This function is called during both error reporting and normal VM > exit. > 182 // However, it should only ever run once. E.g. if the VM crashes > after > 183 // printing the final report during normal VM exit, it should not > print > 184 // the final report again. In addition, it should be guarded from > 185 // recursive calls in case NMT reporting itself crashes. > 186 if (Atomic::cmpxchg(true, &g_final_report_did_run, false) == false) { > 187 NMT_TrackingLevel level = tracking_level(); > 188 if (level >= NMT_summary) { > 189 report(level == NMT_summary, output); > 190 } > 191 } > 192 } > > Is this really what we want ? Of course we want to avoid printing it > twice (or more than that ) from error reporting. > But I think we would miss it from error reporting in some situations when > we want it there . > > This is exactly what I wanted: MemTracker::final_report(tty) is supposed to print the final report.
It is called in two places, during normal VM shutdown (A) and during error handling (B). By only allowing the code to run once I get the behaviour I wanted: Case 1: normal shutdown: we execute (A) from before_exit(), all is well. Case 2: we crash before normal shutdown: we execute (B) from within the error handler. Case 3: we crash during normal shutdown: we executed already (A) and hence (B) is a noop Case 4: crash within (!) MemTracker::final_report(): Case 4.1: MemTracker::final_report() was called during normal shutdown and crashed (A): - we enter error handling, but we will not re-enter NMT reporting at (B) which is good since NMT reporting is not reentrant. Case 4.2: MemTracker::final_report() was called during error handling and crashed (B): - we enter the secondary signal handler, restart VMError::report_and_die(), but will not attempt to print NMT report again Especially Case 4 is important, since it can lead to hanging error reporting since NMT is not reentrant and will suffocate on its own lock. Arguably, Case 2 and 3 are "just" aesthetics and prevent seeing the same report twice. > Otherwise looks okay to me . > > Thanks! > Best regards, Matthias > > Cheers, Thomas > > > >Hi all, > > > >We have -XX:+-PrintNMTStatistics, a very useful switch which will cause > the > >VM to print out the NMT statistics if the VM exits normally. > > > >Currently it does not work if the VM exits due to a fatal error. But > >especially in fatal exits due to native OOM a NMT report would be very > >helpful. 
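The run-once behaviour Thomas describes (Cases 1 through 4) can be modelled outside HotSpot with std::atomic<bool> in place of Atomic::cmpxchg on a volatile bool. This is a sketch under that assumption; final_report_once and g_reports_printed are hypothetical names, and the report text is a placeholder.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Sketch only: std::atomic<bool> stands in for HotSpot's Atomic::cmpxchg.
static std::atomic<bool> g_final_report_did_run{false};
static int g_reports_printed = 0;  // illustration-only counter

// Prints the (placeholder) report at most once per process. The first caller
// wins the false -> true exchange; any later call (normal exit after a report,
// error reporting after normal exit, or re-entry from a secondary crash inside
// the report) sees 'true' and returns without touching the reporting code.
void final_report_once(std::FILE* out) {
  bool expected = false;
  if (g_final_report_did_run.compare_exchange_strong(expected, true)) {
    ++g_reports_printed;
    std::fputs("Native Memory Tracking: <report body>\n", out);
  }
}
```

Because the flag is set before the report is printed, a crash inside the report that re-enters error handling finds the flag already set and skips NMT reporting (Case 4), at the cost of never retrying a report that failed partway.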
> > > >JBS: https://bugs.openjdk.java.net/browse/JDK-8227031 > > > >cr: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227031-optionally-print-nmt-report-on-oom/webrev.00/webrev/index.html > > > >Changes in this patch: > >- handle PrintNMTStatistics on fatal error > >- make sure the final report is not called twice accidentally and it is > not > >called recursively due to secondary error handling > >- change the Metaspace report portion of the NMT report to only include > the > >brief metaspace report - that one can be called at any time, it does not > >lock nor require any resources. > > > >Please note: this will not work when we are in an OOM situation and > request > >a detailed NMT report; that scenario needs more work since NMT detailed > >reports need memory as well. That is a separate issue. > > > > From goetz.lindenmaier at sap.com Tue Jul 9 09:01:17 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 9 Jul 2019 09:01:17 +0000 Subject: RFR(S): 8227255: Switchable helpful NullPointerExceptions In-Reply-To: <8d4b125a-2c3b-bb84-6572-8cb903c68130@oracle.com> References: <8d4b125a-2c3b-bb84-6572-8cb903c68130@oracle.com> Message-ID: Hi David, I had made another separate issue for this: https://bugs.openjdk.java.net/browse/JDK-8221077 I assume you want this to be merged into 8218628, too? Best regards, Goetz. > -----Original Message----- > From: David Holmes > Sent: Donnerstag, 4. 
Juli 2019 22:36 > To: Lindenmaier, Goetz ; hotspot-runtime- > dev at openjdk.java.net > Cc: Coleen Phillimore (coleen.phillimore at oracle.com) > > Subject: Re: RFR(S): 8227255: Switchable helpful NullPointerExceptions > > Hi Goetz, > > On 4/07/2019 8:59 pm, Lindenmaier, Goetz wrote: > > Hi David, > > > > the implementation of the JEP is to be found here: > > https://bugs.openjdk.java.net/browse/JDK-8218628 > > http://cr.openjdk.java.net/~goetz/wr19/8218628-exMsg-NPE/12/ > > > > I thought it's good to keep different aspects in changes of their > > own > > I expect to see one issue being used to push a complete implementation > for a JEP. Independent parts unrelated to the actual JEP can be split > off but anything dependent on the JEP should be done all together. > > > also as I need two CSRs: one to mention there are new messages > > and one to mention there is a new flag. > > You can use one CSR, for 8218628, to cover all aspects. > > David > ------ > > > It also simplifies reviews a lot. > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Donnerstag, 4. Juli 2019 12:52 > >> To: Lindenmaier, Goetz ; hotspot-runtime- > >> dev at openjdk.java.net > >> Cc: Coleen Phillimore (coleen.phillimore at oracle.com) > >> > >> Subject: Re: RFR(S): 8227255: Switchable helpful NullPointerExceptions > >> > >> Sorry Goetz but I am confused by this. You have a JEP that is still in > >> draft but here you have a RFR for a change related to that JEP but not > >> the implementation of that JEP ??? I only expect to see one issue filed > >> to implement that JEP including the creation of the flag to > >> enable/disable it. The introduction of the flag should be part of the > >> JEP as well. > >> > >> That said you may as well get the CSR going in parallel with the JEP. > >> > >> David > >> > >> On 4/07/2019 8:35 pm, Lindenmaier, Goetz wrote: > >>> Hi, > >>> > >>> please review this small change. 
> >>> > >>> http://cr.openjdk.java.net/~goetz/wr19/8227255-NPE-switchable/01/ > >>> > >>> It will be part of JEP 8220715. > >>> > >>> https://bugs.openjdk.java.net/browse/JDK-8220715 > >>> > >>> The exception messages proposed there will first be > >>> > >>> off by default. After gathering experience, they > >>> > >>> will be turned on by default. > >>> > >>> I was asked to use a manageable flag so it can be switched > >>> > >>> by jcmd. > >>> > >>> The flag: SuppressCodeDetailsInExceptionMessages > >>> > >>> "Suppress" because the feature is meant to be on by > >>> > >>> default in the long run. Then you'll have to > >>> > >>> use -XX:_+_ if using the switch. > >>> > >>> "CodeDetails" tries to summarize the concerns with > >>> > >>> the message. > >>> > >>> The flag does not mention NPE so it can be used in > >>> > >>> other, similar cases. > >>> > >>> If there are no objections to the flag name, I'll file a > >>> > >>> CSR. Or should I wait with the CSR until the JEP is > >>> > >>> targeted? > >>> > >>> Best regards, > >>> > >>> Goetz. > >>> From goetz.lindenmaier at sap.com Tue Jul 9 09:06:51 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 9 Jul 2019 09:06:51 +0000 Subject: RFR(S): 8227255: Switchable helpful NullPointerExceptions In-Reply-To: References: Message-ID: Hi, any comments on the name of the flag or the corresponding documentation? (This change will be merged into 8218628...) Best regards, Goetz. > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Donnerstag, 4. Juli 2019 12:36 > To: hotspot-runtime-dev at openjdk.java.net > Cc: Coleen Phillimore (coleen.phillimore at oracle.com) > ; David Holmes > Subject: RFR(S): 8227255: Switchable helpful NullPointerExceptions > > Hi, > > please review this small change.
> http://cr.openjdk.java.net/~goetz/wr19/8227255-NPE-switchable/01/ > > It will be part of JEP 8220715 > https://bugs.openjdk.java.net/browse/JDK-8220715 > > The exception messages proposed there will first be > off by default. After gathering experience, they > will be turned on by default. > I was asked to use a manageable flag so it can be switched > by jcmd. > > The flag: SuppressCodeDetailsInExceptionMessages > "Suppress" because the feature is meant to be on by > default in the long run. Then you'll have to > use -XX:_+_ if using the switch. > "CodeDetails" tries to summarize the concerns with > the message. > > The flag does not mention NPE so it can be used in > other, similar cases. > > If there are no objections to the flag name, I'll file a > CSR. Or should I wait with the CSR until the JEP is > targeted? > > Best regards, > Goetz. From thomas.stuefe at gmail.com Tue Jul 9 09:23:33 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 9 Jul 2019 11:23:33 +0200 Subject: RFR[13, xs]: 8227275: Within native OOM error handling, assertions may hang the process Message-ID: Dear all, may I please have reviews for the following issue: JBS: https://bugs.openjdk.java.net/browse/JDK-8227275 cr: http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/ Summary: on OOM, we may fail to disarm the assertion poison page; this may lead to endless loops during error handling if assertions happen in native OOM scenarios. For more details, please see the JBS issue. Thanks, Thomas From daniel.daugherty at oracle.com Tue Jul 9 12:38:01 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Tue, 9 Jul 2019 08:38:01 -0400 Subject: RFR 8191890: Biased locking still uses the inferior stop the world safepoint for revocation In-Reply-To: <891323cf-10e1-450c-16ad-7196d5ea1c37@oracle.com> References: <4df1d772-bea5-0921-93fb-d6c20f29bcdd@oracle.com> <6526490c-8a50-f00f-138e-9a16c2de1840@oracle.com> <891323cf-10e1-450c-16ad-7196d5ea1c37@oracle.com> Message-ID: <9ec89683-528b-532b-9ee0-414fc6816ee2@oracle.com> On 7/8/19 8:10 PM, Patricio Chilano wrote: > Hi Dan, > > On 7/8/19 5:37 PM, Daniel D. Daugherty wrote: >> On 7/7/19 3:09 PM, Patricio Chilano wrote: >>> Hi all, >>> >>> Below is the webrev for v05. This is just v04 on top of a new >>> baseline that includes the backout of 8221734 and other changes made >>> to biasedLocking code by 8225702 and 8225344. >>> The only difference between v05 and v04 is the use of >>> SafepointSynchronize::safepoint_id() instead of >>> SafepointSynchronize::safepoint_counter() introduced by 8225702, and >>> not having to remove method >>> BiasedLocking::revoke_own_locks_in_handshake() and to edit method >>> Deoptimization::revoke_using_handshake() which were actually removed >>> by the backout of 8221734. >>> >>> Full Webrev: >>> http://cr.openjdk.java.net/~pchilanomate/8191890/v05/webrev/ >>> >> >> src/hotspot/share/interpreter/interpreterRuntime.cpp >> No comments. >> >> src/hotspot/share/runtime/biasedLocking.cpp >> No comments. >> >> src/hotspot/share/runtime/biasedLocking.hpp >> No comments. >> >> src/hotspot/share/runtime/deoptimization.cpp >> No comments. >> >> src/hotspot/share/runtime/handshake.cpp >> No comments. >> >> src/hotspot/share/runtime/vmOperations.hpp >> No comment. >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/VMOps.java >> The copyright year needs to be updated. > Fixed! Do you need to see another webrev? No new webrev needed. Sorry, I should have said that. Dan > >> test/jdk/jdk/jfr/event/runtime/TestBiasedLockRevocationEvents.java >>
No comments. >> >> I did the first pass review using: >> >> $ jfilemerge -r open.8191890.v4.patch open.8191890.v5.patch >> >> and nothing jumped out at me. I did a quick review of the >> v5 webrev and the only thing I noticed was the copyright >> year mentioned above. >> >> I have to repeat a comment from the v01 code review: >> >> > Outstanding job on a very arcane and complicated part of the system. >> >> Thanks for sticking with this fix! > Thanks Dan! :) > > > Patricio >> Dan >> >> >>> >>> Tested with tiers1-7. Running another round now. >>> >>> Thanks! >>> Patricio >> > From daniel.daugherty at oracle.com Tue Jul 9 13:13:10 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 9 Jul 2019 09:13:10 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> Message-ID: <5d53d895-a351-53c3-bdb0-26dbc63a76cb@oracle.com> Hi Kim, Thanks for the review. On 7/8/19 7:00 PM, Kim Barrett wrote: >> On Jul 7, 2019, at 8:08 PM, David Holmes wrote: >> >> On 7/07/2019 6:48 pm, Erik Osterlund wrote: >>> The real danger is SPARC though and its BIS instructions. I don't have the code in front of me, but I really hope not to see that switch statement and non-volatile loop in that pd_disjoint_words_atomic() function. >> sparc uses the same loop. >> >> Let's face it, almost nobody expects the compiler to do these kinds of transformations. :( > See JDK-8131330 and JDK-8142368, where we saw exactly this sort of transformation from a fill-loop > to memset (which may use BIS, and indeed empirically does in some cases). The loops in question > seem trivially convertible to memcpy/memmove.
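The concern here is that a plain word loop, such as the `while (size-- > 0) *to++ = *from++;` loop in copy_table(), places no per-word requirement on the compiler, so it may legally replace the loop with a call to memcpy/memmove, which in turn may use BIS block stores on SPARC. A minimal sketch of the volatile-based approach discussed in this thread follows. The name `copy_words_volatile` is hypothetical (it is not HotSpot's `Copy::disjoint_words_atomic`), and the sketch assumes that aligned, word-sized loads and stores are single-copy atomic on the target; `volatile` only prevents the loop from being merged or widened, and it provides no ordering between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the HotSpot implementation). Every read and write
// goes through a volatile lvalue, and volatile accesses are part of the
// program's observable behavior in C++, so in practice the compiler cannot
// turn this loop into memcpy/memmove or a block-store sequence. Whether each
// word is then moved by one machine instruction is an assumption about the
// target (aligned, word-sized accesses), not something the language promises.
void copy_words_volatile(const volatile std::uintptr_t* from,
                         volatile std::uintptr_t* to,
                         std::size_t count) {
  assert(from != nullptr || count == 0);
  assert(to != nullptr || count == 0);
  for (std::size_t i = 0; i < count; ++i) {
    to[i] = from[i];  // one volatile word load, one volatile word store
  }
}
```

A caller passes two non-overlapping word arrays and the number of words to copy. Making the real `pd_disjoint_words_atomic()` implementations behave like this is the subject of the follow-up bug JDK-8227369 mentioned in this thread.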
Very interesting reads. Thanks for pointing those out. src/hotspot/share/interpreter/templateInterpreter.cpp: DispatchTable TemplateInterpreter::_active_table; DispatchTable TemplateInterpreter::_normal_table; DispatchTable TemplateInterpreter::_safept_table; So it seems like changing _active_table to: volatile DispatchTable TemplateInterpreter::_active_table; might be a good idea... Do you concur? > Also see JDK-8142349. > >>> And I agree that the atomic copying API should be used when we need atomic copying. And if it turns out the implementation of that API is not atomic, it should be fixed in that atomic copying API. >> I agree to some extent, but we assume atomic loads/stores of words all over the place - and rightly so. The issue here is that we need to hide the loop inside an API that we can somehow prevent the C++ compiler from screwing up. It's hardly intuitive or obvious when this is needed, e.g. if I simply copy three adjacent words without a loop, could the compiler convert that to a block move that is non-atomic? >> >>> So I think this change looks good. But it looks like we are not done yet. :c >> I agree that changing the current code to use the atomic copy API to convey intent is fine. > I've been reserving Atomic::load/store for cases where the location "ought" to be declared std::atomic if > we were using C++11 atomics (or alternatively some homebrew equivalent). Not all places where we do > stuff "atomically" are appropriate for that though (consider card tables, being arrays of bytes, where using an > atomic type might impose alignment constraints that are undesirable here). I *think* just using volatile > here would likely be sufficient, e.g. we should have > > Copy::disjoint_words_atomic(const HeapWord* from, volatile HeapWord* to, size_t count) I think this part should be taken up in the follow-up bug that I filed: JDK-8227369 pd_disjoint_words_atomic() needs to be atomic
https://bugs.openjdk.java.net/browse/JDK-8227369 Thanks for chiming in on the review! Dan From daniel.daugherty at oracle.com Tue Jul 9 13:19:30 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 9 Jul 2019 09:19:30 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <56F4779A-7C29-4ED1-A742-FD8B61326A76@oracle.com> References: <56F4779A-7C29-4ED1-A742-FD8B61326A76@oracle.com> Message-ID: <6e2d84ad-5591-35bb-990c-134f6531ec9b@oracle.com> Hi Kim, Thanks for the review. On 7/8/19 7:08 PM, Kim Barrett wrote: >> On Jul 6, 2019, at 9:53 AM, Daniel D. Daugherty wrote: >> >> Greetings, >> >> During the code review for the following fix: >> >> JDK-8227117 normal interpreter table is not restored after single stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> Erik O. noticed a potential race with templateInterpreter.cpp: copy_table() >> depending on C++ compiler optimizations. The following bug is being used >> to fix this issue: >> >> JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >> https://bugs.openjdk.java.net/browse/JDK-8227338 >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >> >> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. >> Mach5 tier[4-6] is running now. It has also been tested with the manual >> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan > [This review is ignoring the issues around the current implementation > of atomic copies discussed elsewhere in this thread. I assume those > will be addressed elsewhere.] > > ------------------------------------------------------------------------------ > src/hotspot/share/interpreter/templateInterpreter.cpp > 286 while (size-- > 0) *to++ = *from++; > > [pre-existing] > > This ought to be using Copy::disjoint_words. 
That's even more obvious > in conjunction with the change to use Copy::disjoint_words_atomic in > the non-safepoint case. I can make that change. Is there a specific advantage/reason that you have in mind here? > ------------------------------------------------------------------------------ > src/hotspot/share/interpreter/templateInterpreter.cpp > 284 if (SafepointSynchronize::is_at_safepoint()) { > > I wonder how much benefit we really get from having distinct safepoint > and non-safepoint cases, rather than just unconditionally using > Copy::disjoint_words_atomic. Sorry, I don't know the answer to that. My intention was to use Copy::disjoint_words_atomic() only in the case where I knew that I needed it, so that there is no potential impact on existing uses at a safepoint. Thanks for the review. Dan > > ------------------------------------------------------------------------------ > From daniel.daugherty at oracle.com Tue Jul 9 13:20:28 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 9 Jul 2019 09:20:28 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <6A2FA4AE-8E22-4949-B6CC-B1587652DF6C@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> <6A2FA4AE-8E22-4949-B6CC-B1587652DF6C@oracle.com> Message-ID: On 7/8/19 7:15 PM, Kim Barrett wrote: >> On Jul 8, 2019, at 7:00 PM, Kim Barrett wrote: >> Copy::disjoint_words_atomic(const HeapWord* from, volatile HeapWord* to, size_t count) > Or maybe > > Copy::disjoint_words_atomic(const volatile HeapWord* from, volatile HeapWord* to, size_t count) > I think this part should be taken up in the follow-up bug that I filed: JDK-8227369 pd_disjoint_words_atomic() needs to be atomic
https://bugs.openjdk.java.net/browse/JDK-8227369 Thanks for chiming in on the review! Dan From daniel.daugherty at oracle.com Tue Jul 9 13:46:40 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 9 Jul 2019 09:46:40 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <5d53d895-a351-53c3-bdb0-26dbc63a76cb@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> <5d53d895-a351-53c3-bdb0-26dbc63a76cb@oracle.com> Message-ID: <1803e1c6-1a7b-b25d-2521-e68f8df63230@oracle.com> On 7/9/19 9:13 AM, Daniel D. Daugherty wrote: > Hi Kim, > > Thanks for the review. > > > On 7/8/19 7:00 PM, Kim Barrett wrote: >>> On Jul 7, 2019, at 8:08 PM, David Holmes >>> wrote: >>> >>> On 7/07/2019 6:48 pm, Erik Osterlund wrote: >>>> The real danger is SPARC though and its BIS instructions. I don't >>>> have the code in front of me, but I really hope not to see that >>>> switch statement and non-volatile loop in that >>>> pd_disjoint_words_atomic() function. >>> sparc uses the same loop. >>> >>> Let's face it, almost nobody expects the compiler to do these kinds >>> of transformations. :( >> See JDK-8131330 and JDK-8142368, where we saw exactly this sort of >> transformation from a fill-loop >> to memset (which may use BIS, and indeed empirically does in some >> cases). The loops in question >> seem trivially convertible to memcpy/memmove. > > Very interesting reads. Thanks for pointing those out.
> > src/hotspot/share/interpreter/templateInterpreter.cpp: > > DispatchTable TemplateInterpreter::_active_table; > DispatchTable TemplateInterpreter::_normal_table; > DispatchTable TemplateInterpreter::_safept_table; > > So it seems like changing _active_table to: > > volatile DispatchTable TemplateInterpreter::_active_table; > > might be a good idea... Do you concur? This change would require a bunch of additional changes so I'm not planning to make it (way too intrusive). Dan > > >> Also see JDK-8142349. >> >>>> And I agree that the atomic copying API should be used when we need >>>> atomic copying. And if it turns out the implementation of that API >>>> is not atomic, it should be fixed in that atomic copying API. >>> I agree to some extent, but we assume atomic loads/stores of words >>> all over the place - and rightly so. The issue here is that we need >>> to hide the loop inside an API that we can somehow prevent the C++ >>> compiler from screwing up. It's hardly intuitive or obvious when >>> this is needed, e.g. if I simply copy three adjacent words without a >>> loop, could the compiler convert that to a block move that is >>> non-atomic? >>> >>>> So I think this change looks good. But it looks like we are not >>>> done yet. :c >>> I agree that changing the current code to use the atomic copy API to >>> convey intent is fine. >> I've been reserving Atomic::load/store for cases where the location >> "ought" to be declared std::atomic if >> we were using C++11 atomics (or alternatively some homebrew >> equivalent). Not all places where we do >> stuff "atomically" are appropriate for that though (consider card >> tables, being arrays of bytes, where using an >> atomic type might impose alignment constraints that are >> undesirable here). I *think* just using volatile >> here would likely be sufficient, e.g. we should have >>
Copy::disjoint_words_atomic(const HeapWord* from, volatile >> HeapWord* to, size_t count) > > I think this part should be taken up in the follow-up bug that I filed: > > JDK-8227369 pd_disjoint_words_atomic() needs to be atomic > https://bugs.openjdk.java.net/browse/JDK-8227369 > > Thanks for chiming in on the review! > > Dan > > From coleen.phillimore at oracle.com Tue Jul 9 14:05:20 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 9 Jul 2019 10:05:20 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <1803e1c6-1a7b-b25d-2521-e68f8df63230@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> <5d53d895-a351-53c3-bdb0-26dbc63a76cb@oracle.com> <1803e1c6-1a7b-b25d-2521-e68f8df63230@oracle.com> Message-ID: On 7/9/19 9:46 AM, Daniel D. Daugherty wrote: > On 7/9/19 9:13 AM, Daniel D. Daugherty wrote: >> Hi Kim, >> >> Thanks for the review. >> >> >> On 7/8/19 7:00 PM, Kim Barrett wrote: >>>> On Jul 7, 2019, at 8:08 PM, David Holmes >>>> wrote: >>>> >>>> On 7/07/2019 6:48 pm, Erik Osterlund wrote: >>>>> The real danger is SPARC though and its BIS instructions. I don't >>>>> have the code in front of me, but I really hope not to see that >>>>> switch statement and non-volatile loop in that >>>>> pd_disjoint_words_atomic() function. >>>> sparc uses the same loop. >>>> >>>> Let's face it, almost nobody expects the compiler to do these >>>> kinds of transformations. :( >>> See JDK-8131330 and JDK-8142368, where we saw exactly this sort of >>> transformation from a fill-loop >>> to memset (which may use BIS, and indeed empirically does in some >>> cases). The loops in question >>> seem trivially convertible to memcpy/memmove. >> >> Very interesting reads.
Thanks for pointing those out. >> >> src/hotspot/share/interpreter/templateInterpreter.cpp: >> >> DispatchTable TemplateInterpreter::_active_table; >> DispatchTable TemplateInterpreter::_normal_table; >> DispatchTable TemplateInterpreter::_safept_table; >> >> So it seems like changing _active_table to: >> >> volatile DispatchTable TemplateInterpreter::_active_table; >> >> might be a good idea... Do you concur? > > This change would require a bunch of additional changes so I'm > not planning to make it (way too intrusive). Can you file an additional RFE to examine the uses of dispatch tables for when we only have handshakes for safepoints? And capture this idea of making the tables volatile? thanks, Coleen > > Dan > > >> >> >>> Also see JDK-8142349. >>> >>>>> And I agree that the atomic copying API should be used when we >>>>> need atomic copying. And if it turns out the implementation of >>>>> that API is not atomic, it should be fixed in that atomic copying >>>>> API. >>>> I agree to some extent, but we assume atomic loads/stores of words >>>> all over the place - and rightly so. The issue here is that we need >>>> to hide the loop inside an API that we can somehow prevent the C++ >>>> compiler from screwing up. It's hardly intuitive or obvious when >>>> this is needed, e.g. if I simply copy three adjacent words without a >>>> loop, could the compiler convert that to a block move that is >>>> non-atomic? >>>> >>>>> So I think this change looks good. But it looks like we are not >>>>> done yet. :c >>>> I agree that changing the current code to use the atomic copy API >>>> to convey intent is fine. >>> I've been reserving Atomic::load/store for cases where the location >>> "ought" to be declared std::atomic if >>> we were using C++11 atomics (or alternatively some homebrew >>> equivalent). Not all places where we do >>> stuff "atomically"
are appropriate for that though (consider card >>> tables, being arrays of bytes, where using an >>> atomic type might impose alignment constraints that are >>> undesirable here). I *think* just using volatile >>> here would likely be sufficient, e.g. we should have >>> >>> Copy::disjoint_words_atomic(const HeapWord* from, volatile >>> HeapWord* to, size_t count) >> >> I think this part should be taken up in the follow-up bug that I filed: >> >> JDK-8227369 pd_disjoint_words_atomic() needs to be atomic >> https://bugs.openjdk.java.net/browse/JDK-8227369 >> >> Thanks for chiming in on the review! >> >> Dan >> >> > From daniel.daugherty at oracle.com Tue Jul 9 14:09:45 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 9 Jul 2019 10:09:45 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: References: Message-ID: <4c7150d1-24be-9586-dda4-af60d1ef1b5b@oracle.com> Greetings, I've made one minor tweak based on Kim's code review. Here's the full webrev: http://cr.openjdk.java.net/~dcubed/8227338-webrev/1_for_jdk14.full/ Here's the incremental webrev: http://cr.openjdk.java.net/~dcubed/8227338-webrev/1_for_jdk14.inc/ Here's the context diff: $ hg diff diff -r 32fe92d8b539 src/hotspot/share/interpreter/templateInterpreter.cpp --- a/src/hotspot/share/interpreter/templateInterpreter.cpp Mon Jul 08 16:58:27 2019 -0400 +++ b/src/hotspot/share/interpreter/templateInterpreter.cpp Tue Jul 09 10:02:46 2019 -0400 @@ -283,7 +283,7 @@ // Copy non-overlapping tables. if (SafepointSynchronize::is_at_safepoint()) { // Nothing is using the table at a safepoint so skip atomic word copy. - while (size-- > 0) *to++ = *from++; + Copy::disjoint_words((HeapWord*)from, (HeapWord*)to, (size_t)size); } else { // Use atomic word copy when not at a safepoint for safety.
Copy::disjoint_words_atomic((HeapWord*)from, (HeapWord*)to, (size_t)size); Thanks, in advance, for questions, comments or suggestions. Dan On 7/6/19 9:53 AM, Daniel D. Daugherty wrote: > Greetings, > > During the code review for the following fix: > > JDK-8227117 normal interpreter table is not restored after single > stepping with TLH > https://bugs.openjdk.java.net/browse/JDK-8227117 > > Erik O. noticed a potential race with templateInterpreter.cpp: > copy_table() > depending on C++ compiler optimizations. The following bug is being used > to fix this issue: > > JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer > https://bugs.openjdk.java.net/browse/JDK-8227338 > > Here's the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ > > This fix has been tested via Mach5 Tier[1-3] on Oracle's usual platforms. > Mach5 tier[4-6] is running now. It has also been tested with the manual > jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. > > Thanks, in advance, for questions, comments or suggestions. > > Dan > From daniel.daugherty at oracle.com Tue Jul 9 14:17:36 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 9 Jul 2019 10:17:36 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> <5d53d895-a351-53c3-bdb0-26dbc63a76cb@oracle.com> <1803e1c6-1a7b-b25d-2521-e68f8df63230@oracle.com> Message-ID: <96c69abc-5450-688b-22d6-81879472ede0@oracle.com> > Can you file an additional RFE to examine the uses of dispatch tables > for when we only have handshakes for safepoints? And capture this > idea of making the tables volatile? Done.
JDK-8227443 TemplateInterpreter::_active_table needs to be reexamined https://bugs.openjdk.java.net/browse/JDK-8227443 Feel free to update the new RFE. Dan On 7/9/19 10:05 AM, coleen.phillimore at oracle.com wrote: > > > On 7/9/19 9:46 AM, Daniel D. Daugherty wrote: >> On 7/9/19 9:13 AM, Daniel D. Daugherty wrote: >>> Hi Kim, >>> >>> Thanks for the review. >>> >>> >>> On 7/8/19 7:00 PM, Kim Barrett wrote: >>>>> On Jul 7, 2019, at 8:08 PM, David Holmes >>>>> wrote: >>>>> >>>>> On 7/07/2019 6:48 pm, Erik Osterlund wrote: >>>>>> The real danger is SPARC though and its BIS instructions. I don't >>>>>> have the code in front of me, but I really hope not to see that >>>>>> switch statement and non-volatile loop in that >>>>>> pd_disjoint_words_atomic() function. >>>>> sparc uses the same loop. >>>>> >>>>> Let's face it, almost nobody expects the compiler to do these >>>>> kinds of transformations. :( >>>> See JDK-8131330 and JDK-8142368, where we saw exactly this sort of >>>> transformation from a fill-loop >>>> to memset (which may use BIS, and indeed empirically does in some >>>> cases). The loops in question >>>> seem trivially convertible to memcpy/memmove. >>> >>> Very interesting reads. Thanks for pointing those out. >>> >>> src/hotspot/share/interpreter/templateInterpreter.cpp: >>> >>> DispatchTable TemplateInterpreter::_active_table; >>> DispatchTable TemplateInterpreter::_normal_table; >>> DispatchTable TemplateInterpreter::_safept_table; >>> >>> So it seems like changing _active_table to: >>> >>> volatile DispatchTable TemplateInterpreter::_active_table; >>> >>> might be a good idea... Do you concur? >> >> This change would require a bunch of additional changes so I'm >> not planning to make it (way too intrusive). > > Can you file an additional RFE to examine the uses of dispatch tables > for when we only have handshakes for safepoints? And capture this > idea of making the tables volatile?
> > thanks, > Coleen >> >> Dan >> >> >>> >>> >>>> Also see JDK-8142349. >>>> >>>>>> And I agree that the atomic copying API should be used when we >>>>>> need atomic copying. And if it turns out the implementation of >>>>>> that API is not atomic, it should be fixed in that atomic copying >>>>>> API. >>>>> I agree to some extent, but we assume atomic loads/stores of words >>>>> all over the place - and rightly so. The issue here is that we >>>>> need to hide the loop inside an API that we can somehow prevent >>>>> the C++ compiler from screwing up. It's hardly intuitive or >>>>> obvious when this is needed, e.g. if I simply copy three adjacent >>>>> words without a loop, could the compiler convert that to a block >>>>> move that is non-atomic? >>>>> >>>>>> So I think this change looks good. But it looks like we are not >>>>>> done yet. :c >>>>> I agree that changing the current code to use the atomic copy API >>>>> to convey intent is fine. >>>> I've been reserving Atomic::load/store for cases where the location >>>> "ought" to be declared std::atomic if >>>> we were using C++11 atomics (or alternatively some homebrew >>>> equivalent). Not all places where we do >>>> stuff "atomically" are appropriate for that though (consider card >>>> tables, being arrays of bytes, where using an >>>> atomic type might impose alignment constraints that are >>>> undesirable here). I *think* just using volatile >>>> here would likely be sufficient, e.g. we should have >>>> >>>> Copy::disjoint_words_atomic(const HeapWord* from, volatile >>>> HeapWord* to, size_t count) >>> >>> I think this part should be taken up in the follow-up bug that I filed: >>> >>> JDK-8227369 pd_disjoint_words_atomic() needs to be atomic >>> https://bugs.openjdk.java.net/browse/JDK-8227369 >>> >>> Thanks for chiming in on the review!
>>> >>> Dan >>> >>> >> > From daniil.x.titov at oracle.com Tue Jul 9 15:37:42 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 09 Jul 2019 08:37:42 -0700 Subject: 8170299: Debugger does not stop inside the low memory notifications code In-Reply-To: <0221970f-ec8c-6ffc-836e-2adf6eb09eb0@oracle.com> References: <92E6C72E-E1DC-48CB-9965-628AE5C3F9AB@oracle.com> <9e1ac22e-e425-1623-2e49-91418711d1c7@oracle.com> <5324D2D0-94CD-4A79-ADEB-064E713D34F7@oracle.com> <0221970f-ec8c-6ffc-836e-2adf6eb09eb0@oracle.com> Message-ID: <1BABD981-5D4A-4018-B899-16EBD0D33C6E@oracle.com> Hi Serguei, I tried to answer your question regarding "runtime/mutexLocker.hpp" in my reply. > The list of Include statements doesn't contain "#include "runtime/mutexLocker.hpp" since this include file is already included by runtime/interfaceSupport.inline.hpp that is in this list. File "src/hotspot/share/runtime/notificationThread.cpp" includes "runtime/interfaceSupport.inline.hpp" and the header file "runtime/interfaceSupport.inline.hpp", in turn, includes "runtime/mutexLocker.hpp". Therefore, there is no need for an #include "runtime/mutexLocker.hpp" statement in the "src/hotspot/share/runtime/notificationThread.cpp" file, since the header file "runtime/mutexLocker.hpp" is already included. Thanks, Daniil From: "serguei.spitsyn at oracle.com" Date: Tuesday, July 9, 2019 at 1:37 AM To: Daniil Titov , OpenJDK Serviceability , "hotspot-runtime-dev at openjdk.java.net" , "jmx-dev at openjdk.java.net" , David Holmes Subject: Re: RFR: 8170299: Debugger does not stop inside the low memory notifications code Hi Daniil, On 7/8/19 16:24, Daniil Titov wrote: Hi Serguei, I will put it on hold as David asked, but before doing so I just wanted to give a quick reply to the questions you asked. You did not answer my question about the include: #include "runtime/mutexLocker.hpp" I'll provide a complete review after you sort out David's concerns. Thanks, Serguei Thanks!
Best regards, Daniil From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com Date: Monday, July 8, 2019 at 3:09 PM To: Daniil Titov mailto:daniil.x.titov at oracle.com, OpenJDK Serviceability mailto:serviceability-dev at openjdk.java.net, mailto:hotspot-runtime-dev at openjdk.java.net mailto:hotspot-runtime-dev at openjdk.java.net, mailto:jmx-dev at openjdk.java.net mailto:jmx-dev at openjdk.java.net, David Holmes mailto:david.holmes at oracle.com Subject: Re: RFR: 8170299: Debugger does not stop inside the low memory notifications code Hi Daniil, Did you see a message from David Holmes? I do not see your reply. Specifically, David asked to hold on with this while he is on vacation for two weeks: > But introducing a new thread in the VM also has the same set of concerns! > This needs consideration by the runtime team before going ahead. > Introducing a new thread likes this needs to be examined in detail - > particularly the synchronization interactions with other threads. > > It also introduces another monitor designated safepoint-never at a time > when we are in the process of cleaning up monitors so that > JavaThreads will only use safepoint-check-always monitors. > > Unfortunately I'm about to head out for two weeks vacation, and > a number of other key runtime folk are also on vacation. > But I'd ask that you hold off on this until we can look at it in more detail. In fact, I was expecting this kind of concerns from David. Thanks, Serguei On 7/8/19 11:42, Daniil Titov wrote: Hi Serguei, Please review the new version of the fix that corrects the order of include statements in src/hotspot/share/runtime/notificationThread.cpp. The list of Include statements doesn't contain "#include "runtime/mutexLocker.hpp" since this include file is already included by runtime/interfaceSupport.inline.hpp that is in this list. I don't think we need the following function: static bool is_notification_thread(Thread* thread);
For the ServiceThread the function is_service_thread(Thread* thread) is used only once in the code. It is used inside JVmtiDeferredEvent::post() to assert that the proper thread is used to post these events. Low memory, GC and diagnostic command notification never had such asserts so I'm not sure we need to introduce them regarding new NotificationThread. Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 Thanks! --Daniil On 7/3/19, 9:02 PM, mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com wrote: Hi Daniil, I've not finished my review but it looks good in general. A couple of quick comments. https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.hpp.html I wonder if this function is also needed: static bool is_notification_thread(Thread* thread); https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.cpp.html I wonder why this include statement is missed: #include "runtime/mutexLocker.hpp" Also, these have to be correctly ordred: 29 #include "runtime/notificationThread.hpp" 30 #include "services/lowMemoryDetector.hpp" 31 #include "services/gcNotifier.hpp" 32 #include "services/diagnosticArgument.hpp" 33 #include "services/diagnosticFramework.hpp" Thanks, Serguei On 7/3/19 8:04 PM, Daniil Titov wrote: > Please review the change the fixes the problem with the debugger not stopping in the low memory notification code. > > The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view that prevents the debugger from stopping in it. >
> The fix introduces new NotificationThread that is visible to the external view and offloads the ServiceThread from sending low memory and other notifications that could result in Java calls ( GC and diagnostic commands notifications) by moving these activities in this new NotificationThread. > > Testing: Mach5 tier1,tier2 and tier3 tests succeeded. > > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > Thanks! > --Daniil > > From serguei.spitsyn at oracle.com Tue Jul 9 16:02:11 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 Jul 2019 09:02:11 -0700 Subject: 8170299: Debugger does not stop inside the low memory notifications code In-Reply-To: <1BABD981-5D4A-4018-B899-16EBD0D33C6E@oracle.com> References: <92E6C72E-E1DC-48CB-9965-628AE5C3F9AB@oracle.com> <9e1ac22e-e425-1623-2e49-91418711d1c7@oracle.com> <5324D2D0-94CD-4A79-ADEB-064E713D34F7@oracle.com> <0221970f-ec8c-6ffc-836e-2adf6eb09eb0@oracle.com> <1BABD981-5D4A-4018-B899-16EBD0D33C6E@oracle.com> Message-ID: Hi Daniil, Missed it, sorry. Thanks, Serguei On 7/9/19 08:37, Daniil Titov wrote: > Hi Serguei, > > I tried to answer your question regarding "runtime/mutexLocker.hpp" in my reply. > >> The list of Include statements doesn't contain "#include "runtime/mutexLocker.hpp" since this include file is already included by runtime/interfaceSupport.inline.hpp that is in this list. > File "src/hotspot/share/runtime/notificationThread.cpp" includes "runtime/interfaceSupport.inline.hpp" and the header file "runtime/interfaceSupport.inline.hpp", in turns, includes "runtime/mutexLocker.hpp". Therefore, there is no need in having " #include "runtime/mutexLocker.hpp" statement in "src/hotspot/share/runtime/notificationThread.cpp" file since the header file "runtime/mutexLocker.hpp" is already included.
> > Thanks, > Daniil > > From: "serguei.spitsyn at oracle.com" > Date: Tuesday, July 9, 2019 at 1:37 AM > To: Daniil Titov , OpenJDK Serviceability , "hotspot-runtime-dev at openjdk.java.net" , "jmx-dev at openjdk.java.net" , David Holmes > Subject: Re: RFR: 8170299: Debugger does not stop inside the low memory notifications code > > Hi Daniil, > > > On 7/8/19 16:24, Daniil Titov wrote: > Hi Serguei, > > I will put it on hold as David asked but before doing so I just wanted to give a quick reply to the questions you asked. > > You did not answer my question about include: > #include "runtime/mutexLocker.hpp" > > I'll provide a complete review after you sort out the David's concerns. > > Thanks, > Serguei > > Thanks! > > Best regards, > Daniil > > > > From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com > Date: Monday, July 8, 2019 at 3:09 PM > To: Daniil Titov mailto:daniil.x.titov at oracle.com, OpenJDK Serviceability mailto:serviceability-dev at openjdk.java.net, mailto:hotspot-runtime-dev at openjdk.java.net mailto:hotspot-runtime-dev at openjdk.java.net, mailto:jmx-dev at openjdk.java.net mailto:jmx-dev at openjdk.java.net, David Holmes mailto:david.holmes at oracle.com > Subject: Re: RFR: 8170299: Debugger does not stop inside the low memory notifications code > > Hi Daniil, > > Did you see a message from David Holmes? > I do not see your reply. > > Specifically, David asked to hold on with this while he is on vacation for two weeks: > >> But introducing a new thread in the VM also has the same set of concerns! >> This needs consideration by the runtime team before going ahead. >> Introducing a new thread likes this needs to be examined in detail - >> particularly the synchronization interactions with other threads. >> >> It also introduces another monitor designated safepoint-never at a time >> when we are in the process of cleaning up monitors so that >> JavaThreads will only use safepoint-check-always monitors.
>> >> Unfortunately I'm about to head out for two weeks vacation, and >> a number of other key runtime folk are also on vacation. >> But I'd ask that you hold off on this until we can look at it in more detail. > > In fact, I was expecting this kind of concerns from David. > > Thanks, > Serguei > > > On 7/8/19 11:42, Daniil Titov wrote: > Hi Serguei, > > Please review the new version of the fix that corrects the order of include statements in src/hotspot/share/runtime/notificationThread.cpp. > > The list of Include statements doesn't contain "#include "runtime/mutexLocker.hpp" since this include file is already included by runtime/interfaceSupport.inline.hpp that is in this list. > > I don't think we need the following function: > static bool is_notification_thread(Thread* thread); > > For the ServiceThread the function is_service_thread(Thread* thread) is used only once in the code. It is used inside JVmtiDeferredEvent::post() to assert that the proper thread is used to post these events. Low memory, GC and diagnostic command notification never had such asserts so I'm not sure we need to introduce them regarding new NotificationThread. > > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.02/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > Thanks! > --Daniil > > > On 7/3/19, 9:02 PM, mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com wrote: > > Hi Daniil, > > I've not finished my review but it looks good in general. > > A couple of quick comments. > > > https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.hpp.html > > I wonder if this function is also needed: > static bool is_notification_thread(Thread* thread); > > > https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/src/hotspot/share/runtime/notificationThread.cpp.html > > I wonder why this include statement is missed: >
#include "runtime/mutexLocker.hpp" > > Also, these have to be correctly ordred: > > 29 #include "runtime/notificationThread.hpp" > 30 #include "services/lowMemoryDetector.hpp" > 31 #include "services/gcNotifier.hpp" > 32 #include "services/diagnosticArgument.hpp" > 33 #include "services/diagnosticFramework.hpp" > > > Thanks, > Serguei > > > On 7/3/19 8:04 PM, Daniil Titov wrote: > > Please review the change the fixes the problem with the debugger not stopping in the low memory notification code. > > > > The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view that prevents the debugger from stopping in it. > > > > The fix introduces new NotificationThread that is visible to the external view and offloads the ServiceThread from sending low memory and other notifications that could result in Java calls ( GC and diagnostic commands notifications) by moving these activities in this new NotificationThread. > > > > Testing: Mach5 tier1,tier2 and tier3 tests succeeded. > > > > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > > > Thanks! > > --Daniil > > > > > > > > > > > > > > From serguei.spitsyn at oracle.com Tue Jul 9 16:34:15 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 Jul 2019 09:34:15 -0700 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <4c7150d1-24be-9586-dda4-af60d1ef1b5b@oracle.com> References: <4c7150d1-24be-9586-dda4-af60d1ef1b5b@oracle.com> Message-ID: Hi Dan, This looks good too. Thanks, Serguei n 7/9/19 07:09, Daniel D. Daugherty wrote: > Greetings, > > I've made one minor tweak based on Kim's code review.
> > Here's the full webrev: > > http://cr.openjdk.java.net/~dcubed/8227338-webrev/1_for_jdk14.full/ > > Here's the incremental webrev: > > http://cr.openjdk.java.net/~dcubed/8227338-webrev/1_for_jdk14.inc/ > > Here's the context diff: > > $ hg diff > diff -r 32fe92d8b539 > src/hotspot/share/interpreter/templateInterpreter.cpp > --- a/src/hotspot/share/interpreter/templateInterpreter.cpp Mon Jul > 08 16:58:27 2019 -0400 > +++ b/src/hotspot/share/interpreter/templateInterpreter.cpp Tue Jul > 09 10:02:46 2019 -0400 > @@ -283,7 +283,7 @@ >   // Copy non-overlapping tables. >   if (SafepointSynchronize::is_at_safepoint()) { >     // Nothing is using the table at a safepoint so skip atomic word > copy. > -    while (size-- > 0) *to++ = *from++; > +    Copy::disjoint_words((HeapWord*)from, (HeapWord*)to, (size_t)size); >   } else { >     // Use atomic word copy when not at a safepoint for safety. >     Copy::disjoint_words_atomic((HeapWord*)from, (HeapWord*)to, > (size_t)size); > > Thanks, in advance, for questions, comments or suggestions. > > Dan > > > On 7/6/19 9:53 AM, Daniel D. Daugherty wrote: >> Greetings, >> >> During the code review for the following fix: >> >> JDK-8227117 normal interpreter table is not restored after single >> stepping with TLH >> https://bugs.openjdk.java.net/browse/JDK-8227117 >> >> Erik O. noticed a potential race with templateInterpreter.cpp: >> copy_table() >> depending on C++ compiler optimizations. The following bug is being used >> to fix this issue: >> >> JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >> https://bugs.openjdk.java.net/browse/JDK-8227338 >> >> Here's the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >> >> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual >> platforms. >> Mach5 tier[4-6] is running now. It has also been tested with the manual >> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits.
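The two copy paths in the revised copy_table() can be sketched outside HotSpot as follows. The `at_safepoint` parameter stands in for `SafepointSynchronize::is_at_safepoint()`, `std::memcpy` stands in for `Copy::disjoint_words`, and the volatile loop stands in for `Copy::disjoint_words_atomic`. All three substitutions are assumptions made so the example compiles on its own; this is a sketch, not the committed code.

```cpp
#include <cstddef>
#include <cstring>

typedef unsigned char* address;  // stand-in for HotSpot's `address` type

// Sketch of the safepoint / non-safepoint split discussed for copy_table().
void copy_table_sketch(address* from, address* to, int size, bool at_safepoint) {
  if (at_safepoint) {
    // No thread is reading the table, so any block copy is acceptable,
    // including a tuned memcpy that may use wide or non-temporal stores.
    std::memcpy(to, from, sizeof(address) * static_cast<std::size_t>(size));
  } else {
    // Other threads may load entries concurrently: write each entry with its
    // own word-sized store through a volatile lvalue.
    address volatile* dst = to;
    for (int i = 0; i < size; ++i) {
      dst[i] = from[i];
    }
  }
}
```

Keeping the branch lets the safepoint path use whatever block copy is fastest, which matters if this copy sits on the path out of a safepoint.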
>> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> > From daniel.daugherty at oracle.com Tue Jul 9 18:02:10 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 9 Jul 2019 14:02:10 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: References: <4c7150d1-24be-9586-dda4-af60d1ef1b5b@oracle.com> Message-ID: <329f7412-b5f7-16db-1646-93b2112efbd2@oracle.com> Serguei, Thanks for the quick re-review! Dan On 7/9/19 12:34 PM, serguei.spitsyn at oracle.com wrote: > Hi Dan, > > This looks good too. > > Thanks, > Serguei > > > n 7/9/19 07:09, Daniel D. Daugherty wrote: >> Greetings, >> >> I've made one minor tweak based on Kim's code review. >> >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~dcubed/8227338-webrev/1_for_jdk14.full/ >> >> Here's the incremental webrev: >> >> http://cr.openjdk.java.net/~dcubed/8227338-webrev/1_for_jdk14.inc/ >> >> Here's the context diff: >> >> $ hg diff >> diff -r 32fe92d8b539 >> src/hotspot/share/interpreter/templateInterpreter.cpp >> --- a/src/hotspot/share/interpreter/templateInterpreter.cpp Mon Jul >> 08 16:58:27 2019 -0400 >> +++ b/src/hotspot/share/interpreter/templateInterpreter.cpp Tue Jul >> 09 10:02:46 2019 -0400 >> @@ -283,7 +283,7 @@ >>   // Copy non-overlapping tables. >>   if (SafepointSynchronize::is_at_safepoint()) { >>     // Nothing is using the table at a safepoint so skip atomic word >> copy. >> -    while (size-- > 0) *to++ = *from++; >> +    Copy::disjoint_words((HeapWord*)from, (HeapWord*)to, (size_t)size); >>   } else { >>     // Use atomic word copy when not at a safepoint for safety. >>     Copy::disjoint_words_atomic((HeapWord*)from, (HeapWord*)to, >> (size_t)size); >> >> Thanks, in advance, for questions, comments or suggestions. >> >> Dan >> >> >> On 7/6/19 9:53 AM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> During the code review for the following fix: >>> >>>
JDK-8227117 normal interpreter table is not restored after >>> single stepping with TLH >>> https://bugs.openjdk.java.net/browse/JDK-8227117 >>> >>> Erik O. noticed a potential race with templateInterpreter.cpp: >>> copy_table() >>> depending on C++ compiler optimizations. The following bug is being >>> used >>> to fix this issue: >>> >>> JDK-8227338 templateInterpreter.cpp: copy_table() needs to be safer >>> https://bugs.openjdk.java.net/browse/JDK-8227338 >>> >>> Here's the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8227338-webrev/0_for_jdk14/ >>> >>> This fix has been tested via Mach5 Tier[1-3] on Oracle's usual >>> platforms. >>> Mach5 tier[4-6] is running now. It has also been tested with the manual >>> jdb test from JDK-8227117 using 'release' and 'fastdebug' bits. >>> >>> Thanks, in advance, for questions, comments or suggestions. >>> >>> Dan >>> >> > From coleen.phillimore at oracle.com Tue Jul 9 21:11:41 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 9 Jul 2019 17:11:41 -0400 Subject: RFR[13, xs]: 8227275: Within native OOM error handling, assertions may hang the process In-Reply-To: References: Message-ID: http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/src/hotspot/share/utilities/debug.cpp.udiff.html I don't understand why you don't just leave the poison page PROT_NONE and call this from handle_assert_poison_fault? +void disarm_assert_poison() { + g_assert_poison = &g_dummy; +} + Then you don't have to check that it succeeded. At this point, it doesn't matter.
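Coleen's point is that disarming can be a plain pointer store rather than a change of page protection. The following is a simplified model, not HotSpot's debug.cpp: it models only the pointer (the PROT_NONE page and the fault handler are left out), and `arm_assert_poison`, `assert_poison_armed`, and `touch_assert_poison` are hypothetical names. Retargeting the pointer at a dummy byte involves no system call, so, unlike an mprotect() call to unprotect the page, it cannot fail when native memory is exhausted.

```cpp
// Simplified model of the assert-poison pointer (names partly hypothetical).
// When armed, g_assert_poison would point at a page mapped PROT_NONE, so the
// first touch faults and a handler can capture context. Only the pointer is
// modeled here.
static char g_dummy = 0;
static char* volatile g_assert_poison = &g_dummy;

// Hypothetical: point the poison at a (would-be protected) page.
void arm_assert_poison(char* protected_page) { g_assert_poison = protected_page; }

// Disarming is a plain pointer store: no system call, nothing that can fail.
void disarm_assert_poison() { g_assert_poison = &g_dummy; }

bool assert_poison_armed() { return g_assert_poison != &g_dummy; }

// What an assertion would do first; harmless once disarmed.
void touch_assert_poison() { *g_assert_poison = 'X'; }
```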
Coleen On 7/9/19 5:23 AM, Thomas Stüfe wrote: > Dear all, > > may I please have reviews for the following issue: > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227275 > cr: > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/ > > Summary: on OOM, we may fail to disarm assertion poison page; this may lead > to endless loops during error handling if assertions happen in native OOM > scenarios. > > For more details, pls see the JBS issue. > > Thanks, Thomas From kim.barrett at oracle.com Tue Jul 9 23:05:50 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 9 Jul 2019 19:05:50 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <5d53d895-a351-53c3-bdb0-26dbc63a76cb@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> <5d53d895-a351-53c3-bdb0-26dbc63a76cb@oracle.com> Message-ID: <43A40540-A760-4C5A-B3CC-D69B276D8A60@oracle.com> > On Jul 9, 2019, at 9:13 AM, Daniel D. Daugherty wrote: > > Hi Kim, > > Thanks for the review. More like drive by commentary :) I've never really looked at the interpreter code, and make no claim to understand it at all. I *think* I understand what's going on with this change, but I don't think you should count me toward the requisite number of reviewers. > On 7/8/19 7:00 PM, Kim Barrett wrote: >>> On Jul 7, 2019, at 8:08 PM, David Holmes wrote: >>> >>> On 7/07/2019 6:48 pm, Erik Osterlund wrote: >>>> The real danger is SPARC though and its BIS instructions. I don't have the code in front of me, but I really hope not to see that switch statement and non-volatile loop in that pd_disjoint_words_atomic() function. >>> sparc uses the same loop.
>>> >>> Let's face it, almost no body expects the compiler to do these kinds of transformations. :( >> See JDK-8131330 and JDK-8142368, where we saw exactly this sort of transformation from a fill-loop >> to memset (which may use BIS, and indeed empirically does in some cases). The loops in question >> seem trivially convertible to memcpy/memmove. > > Very interesting reads. Thanks for pointing those out. > > src/hotspot/share/interpreter/templateInterpreter.cpp: > > DispatchTable TemplateInterpreter::_active_table; > DispatchTable TemplateInterpreter::_normal_table; > DispatchTable TemplateInterpreter::_safept_table; > > So it seems like changing _active_table to: > > volatile DispatchTable TemplateInterpreter::_active_table; > > might be a good idea... Do you concur? I suspect that might be a problem for various reasons. Reading ahead, I see you've run into at least some, and deferred this to a new RFE. So I think I'm not going to pretend to understand this code well enough to understand the ramifications of such a change. >> I've been reserving Atomic::load/store for cases where the location "ought" to be declared std::atomic if >> we were using C++11 atomics (or alternatively some homebrew equivalent). Not all places where we do >> stuff "atomically" is appropriate for that though (consider card tables, being arrays of bytes, where using an >> atomic type might impose alignment constraints that are undesirable here). I *think* just using volatile >> here would likely be sufficient, e.g. we should have >> >> Copy::disjoint_words_atomic(const HeapWord* from,volatile HeapWord* to, size_t count) > > I think this part should be taken up in the follow bug that I filed: > > JDK-8227369 pd_disjoint_words_atomic() needs to be atomic > https://bugs.openjdk.java.net/browse/JDK-8227369 Agreed. > > Thanks for chiming in on the review!
> > Dan From kim.barrett at oracle.com Tue Jul 9 23:17:11 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 9 Jul 2019 19:17:11 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <6e2d84ad-5591-35bb-990c-134f6531ec9b@oracle.com> References: <56F4779A-7C29-4ED1-A742-FD8B61326A76@oracle.com> <6e2d84ad-5591-35bb-990c-134f6531ec9b@oracle.com> Message-ID: <72858E7B-898B-44A8-B7EE-864B77EFBA8B@oracle.com> > On Jul 9, 2019, at 9:19 AM, Daniel D. Daugherty wrote: > On 7/8/19 7:08 PM, Kim Barrett wrote: >> src/hotspot/share/interpreter/templateInterpreter.cpp >> 286 while (size-- > 0) *to++ = *from++; >> >> [pre-existing] >> >> This ought to be using Copy::disjoint_words. That's even more obvious >> in conjunction with the change to use Copy::disjoint_words_atomic in >> the non-safepoint case. > > I can make that change. Is there a specific advantage/reason that you > have in mind here? Mostly a "if we're going to have these kinds of utility APIs because we think they are useful, then we really ought to use them" argument. One might benefit from some highly tuned memcpy-like thing provided by the per-platform implementation of Copy::disjoint_words. Of course, the code is sufficiently simple that a compiler has a reason chance of making the appropriate transformation even without us telling it. The non-performance reason is a named operation is easier to read and understand than the corresponding explicit loop. >> src/hotspot/share/interpreter/templateInterpreter.cpp >> 284 if (SafepointSynchronize::is_at_safepoint()) { >> >> I wonder how much benefit we really get from having distinct safepoint >> and non-safepoint cases, rather than just unconditionally using >> Copy::disjoint_words_atomic. > > Sorry, I don't know the answer to that. My intention was to use > Copy::disjoint_words_atomic() only in the case where I knew that > I needed it so no potential impact on existing uses at a safepoint.
Yeah, I just wasn't sure how performance critical this copy is. Hm, I see that it might affect the time to get out of a safepoint, so potentially getting a highly tuned platform-specific memcpy operation in the safepoint case might indeed be worthwhile. So okay. From jianglizhou at google.com Wed Jul 10 00:25:34 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Tue, 9 Jul 2019 17:25:34 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: <53b972d6-a253-b2a7-e9f5-1d533753ca0d@oracle.com> References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> <53b972d6-a253-b2a7-e9f5-1d533753ca0d@oracle.com> Message-ID: Hi Calvin, On Mon, Jul 8, 2019 at 11:31 PM Calvin Cheung wrote: > > > On 7/8/19 4:38 PM, Jiangli Zhou wrote: > > Hi Calvin, > > > > - src/hotspot/share/include/cds.h > > > > 36 #define NUM_CDS_REGIONS 8 > > > > The above change would need to be hand fixed when backporting to older > > versions. It's fine to include it in the current review, but it's > > better to create a separate bug and commit using that bug ID. So it > > will make the backports cleaner. > > I don't think it is worthwhile filing a bug just for this line.
> It seems unnecessary now. I got rid of it. > > > > -------- > > > > - src/hotspot/share/memory/filemap.cpp > > > > 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != > > CDS_DYNAMIC_ARCHIVE_MAGIC) { > > 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > > CDS_DYNAMIC_ARCHIVE_MAGIC; > > 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); > > 904 log_info(cds)(" actual: 0x%08x", _header->_magic); > > 905 FileMapInfo::fail_continue("The shared archive file has a bad > > magic number."); > > 906 return false; > > 907 } > > ... > > > > 964 if (is_static) { > > 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { > > 966 fail_continue("Incorrect static archive magic number"); > > 967 return false; > > 968 } > > > > There are two checks for _header->_magic in > > FileMapInfo::init_from_file now but behave differently. The second one > > can be removed. The first check at line 901 should check the _magic > > value based on the 'is_static' flag: > > > > unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > > CDS_DYNAMIC_ARCHIVE_MAGIC; > > if (_header->_magic != expected_magic) { > > ... > I've made the above change. > > > > -------- > > > > Most of the work now in FileMapInfo::init_from_file should really > > belong to FileMapInfo::validate_header. It would be cleaner to simply > > FileMapInfo::init_from_file to be the following and move the rest to > > FileMapInfo::validate_header. Thoughts? > > > > 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { > > 889 size_t sz = is_static ? sizeof(FileMapHeader) : > > sizeof(DynamicArchiveHeader); > > 890 size_t n = os::read(fd, _header, (unsigned int)sz); > > 891 if (n != sz) { > > 892 fail_continue("Unable to read the file header."); > > 893 return false; > > 894 } > > 895 return true; > > } > > The _file_offset will be based on the size_t n and some other fields > (_paths_misc_info, SharedBaseAddress) will be set at lines 953 - 976. 
> Also, there's the following check in validate_header(): > > 1859 if > (!ClassLoader::check_shared_paths_misc_info(_paths_misc_info, > _header->_paths_misc_info_size, is_static)) { > > If the SharedPathsMiscInfo could be removed (JDK-8227370), then it is > possible that validate_header could be called within init_from_file. I > think we should defer this until JDK-8227370. Ok for deferring it. Best, Jiangli > > updated webrev: > > http://cr.openjdk.java.net/~ccheung/8226406/webrev.01/ > > thanks, > > Calvin > > > > > Best regards, > > > > Jiangli > > > > > > On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: > >> Hi Calvin, > >> > >> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: > >>> Hi Jiangli, > >>> > >>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: > >>>> Hi Calvin, > >>>> > >>>> Per our off-mailing-list email exchange from the previous code review > >>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created > >>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove > >>>> SharedPathsMiscInfo' > >>> Thanks for filing the RFE. > >>>> . I think the crash caused by premature runtime accessing of > >>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather > >>>> than further patching up the SharedPathsMiscInfo > >>> My current patch involves checking most the fields in > >>> CDSFileMapHeaderBase before accessing other fields. This part is > >>> applicable to other fields, not only to the _paths_misc_info_size. This > >>> bug existed for a while and I think it would be a good backport > >>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE > >>> JDK-8227370 are not necessary to be backported to 11u. I'd like to fix > >>> this bug first and then handle JDK-8227370 as a separate changeset. > >> That sounds like a good plan. A fix targeted for backporting should > >> have a clean-cut (less dependency) and controlled scope. Addressing > >> this incrementally in separate changesets is a suitable approach. 
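The check ordering behind the 8226406 fix (validate the identifying header fields before trusting any size or offset field such as _paths_misc_info_size) can be sketched with a simplified header. The struct layout, the magic values, and the version number below are placeholders invented for illustration. They are not the real CDSFileMapHeaderBase or the constants in cds.h, and the sketch omits the _jvm_ident comparison. The short-read check mirrors the `n != sz` test in init_from_file, and the magic check follows Jiangli's suggestion of deriving the expected value from `is_static`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Placeholder header layout; NOT the real CDS archive header.
struct ArchiveHeader {
  std::uint32_t magic;
  std::uint32_t version;
  std::uint32_t paths_misc_info_size;  // only trustworthy after magic/version pass
};

// Placeholder constants for illustration only.
const std::uint32_t kStaticMagic    = 0xA0000001u;
const std::uint32_t kDynamicMagic   = 0xA0000002u;
const std::uint32_t kCurrentVersion = 6;

enum class HeaderStatus { Ok, ShortRead, BadMagic, BadVersion, BadSize };

// `buf`/`len` hold the whole archive file contents.
HeaderStatus validate_header(const unsigned char* buf, std::size_t len,
                             bool is_static, ArchiveHeader* out) {
  if (len < sizeof(ArchiveHeader)) return HeaderStatus::ShortRead;
  ArchiveHeader h;
  std::memcpy(&h, buf, sizeof h);  // avoid unaligned/aliased access to buf
  std::uint32_t expected_magic = is_static ? kStaticMagic : kDynamicMagic;
  if (h.magic != expected_magic) return HeaderStatus::BadMagic;
  if (h.version != kCurrentVersion) return HeaderStatus::BadVersion;
  // Only now use a size field to drive further reads, and bound it by the file.
  if (h.paths_misc_info_size > len - sizeof(ArchiveHeader)) return HeaderStatus::BadSize;
  *out = h;
  return HeaderStatus::Ok;
}
```

An archive produced by a different JVM build (for example 64-bit vs 32-bit) would be rejected at the magic or version step before its size fields are ever read.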
> >> > >> I took a quick look over the weekend and noticed some issues with your > >> current patch. That's why I suggested to go with the complete removal > >> without spending extra effort on SharedPathsMiscInfo. I will need to > >> take a closer look and try to get back to you later today. > >> > >> Best regards, > >> Jiangli > >> > >>> thanks, > >>> > >>> Calvin > >>> > >>>> Thanks and regards, > >>>> Jiangli > >>>> > >>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: > >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > >>>>> > >>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > >>>>> > >>>>> This bug was found during a bootcycle build when a shared archive built > >>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to > >>>>> some of the important header fields such as the _jvm_ident was not > >>>>> checked prior to accessinng other fields such as the _paths_misc_info_size. > >>>>> > >>>>> This fix involves checking most the fields in CDSFileMapHeaderBase > >>>>> before accessing other fields. > >>>>> > >>>>> Testing: tiers 1-3. > >>>>> > >>>>> thanks, > >>>>> > >>>>> Calvin > >>>>> From jianglizhou at google.com Wed Jul 10 00:32:00 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Tue, 9 Jul 2019 17:32:00 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Message-ID: On Mon, Jul 8, 2019 at 11:35 PM Calvin Cheung wrote: > > > On 7/8/19 4:45 PM, Jiangli Zhou wrote: > > -#define CURRENT_CDS_ARCHIVE_VERSION 5 > > +#define CURRENT_CDS_ARCHIVE_VERSION 6 > > > > I would also suggestion to not do the above change in this bug fix > > since that would make all older versions to use '6' when backported > > (unless hand merge is involved). 
> > Since the _jvm_ident field has been moved to a different location, I > think the CURRENT_CDS_ARCHIVE_VERSION should be updated. Even if the > version stays the same, shared archive created by an older version of > JVM cannot be used by the current JVM version. Can you please clarify the reason for moving the field? It's confusing for all different JDK versions to have the same CURRENT_CDS_ARCHIVE_VERSION but with significantly different archive layouts. Best, Jiangli > > thanks, > > Calvin > > > > > Thanks, > > Jiangli > > > > On Mon, Jul 8, 2019 at 4:38 PM Jiangli Zhou wrote: > >> Hi Calvin, > >> > >> - src/hotspot/share/include/cds.h > >> > >> 36 #define NUM_CDS_REGIONS 8 > >> > >> The above change would need to be hand fixed when backporting to older > >> versions. It's fine to include it in the current review, but it's > >> better to create a separate bug and commit using that bug ID. So it > >> will make the backports cleaner. > >> > >> -------- > >> > >> 39 #define CDS_END_MAGIC 0xf00babae > >> > >> What's the significance of the new end magic? Should the existing > >> header validation be sufficient as long as it's done first? > >> > >> -------- > >> > >> - src/hotspot/share/memory/filemap.cpp > >> > >> 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != > >> CDS_DYNAMIC_ARCHIVE_MAGIC) { > >> 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > >> CDS_DYNAMIC_ARCHIVE_MAGIC; > >> 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); > >> 904 log_info(cds)(" actual: 0x%08x", _header->_magic); > >> 905 FileMapInfo::fail_continue("The shared archive file has a bad > >> magic number."); > >> 906 return false; > >> 907 } > >> ... > >> > >> 964 if (is_static) { > >> 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { > >> 966 fail_continue("Incorrect static archive magic number"); > >> 967 return false; > >> 968 } > >> > >> There are two checks for _header->_magic in > >> FileMapInfo::init_from_file now but behave differently. 
The second one > >> can be removed. The first check at line 901 should check the _magic > >> value based on the 'is_static' flag: > >> > >> unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > >> CDS_DYNAMIC_ARCHIVE_MAGIC; > >> if (_header->_magic != expected_magic) { > >> ... > >> > >> -------- > >> > >> Most of the work now in FileMapInfo::init_from_file should really > >> belong to FileMapInfo::validate_header. It would be cleaner to simply > >> FileMapInfo::init_from_file to be the following and move the rest to > >> FileMapInfo::validate_header. Thoughts? > >> > >> 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { > >> 889 size_t sz = is_static ? sizeof(FileMapHeader) : > >> sizeof(DynamicArchiveHeader); > >> 890 size_t n = os::read(fd, _header, (unsigned int)sz); > >> 891 if (n != sz) { > >> 892 fail_continue("Unable to read the file header."); > >> 893 return false; > >> 894 } > >> 895 return true; > >> } > >> > >> Best regards, > >> > >> Jiangli > >> > >> > >> On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: > >>> Hi Calvin, > >>> > >>> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: > >>>> Hi Jiangli, > >>>> > >>>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: > >>>>> Hi Calvin, > >>>>> > >>>>> Per our off-mailing-list email exchange from the previous code review > >>>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created > >>>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove > >>>>> SharedPathsMiscInfo' > >>>> Thanks for filing the RFE. > >>>>> . I think the crash caused by premature runtime accessing of > >>>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather > >>>>> than further patching up the SharedPathsMiscInfo > >>>> My current patch involves checking most the fields in > >>>> CDSFileMapHeaderBase before accessing other fields. This part is > >>>> applicable to other fields, not only to the _paths_misc_info_size. 
This > >>>> bug existed for a while and I think it would be a good backport > >>>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE > >>>> JDK-8227370 are not necessary to be backported to 11u. I'd like to fix > >>>> this bug first and then handle JDK-8227370 as a separate changeset. > >>> That sounds like a good plan. A fix targeted for backporting should > >>> have a clean-cut (less dependency) and controlled scope. Addressing > >>> this incrementally in separate changesets is a suitable approach. > >>> > >>> I took a quick look over the weekend and noticed some issues with your > >>> current patch. That's why I suggested to go with the complete removal > >>> without spending extra effort on SharedPathsMiscInfo. I will need to > >>> take a closer look and try to get back to you later today. > >>> > >>> Best regards, > >>> Jiangli > >>> > >>>> thanks, > >>>> > >>>> Calvin > >>>> > >>>>> Thanks and regards, > >>>>> Jiangli > >>>>> > >>>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: > >>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > >>>>>> > >>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > >>>>>> > >>>>>> This bug was found during a bootcycle build when a shared archive built > >>>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to > >>>>>> some of the important header fields such as the _jvm_ident was not > >>>>>> checked prior to accessinng other fields such as the _paths_misc_info_size. > >>>>>> > >>>>>> This fix involves checking most the fields in CDSFileMapHeaderBase > >>>>>> before accessing other fields. > >>>>>> > >>>>>> Testing: tiers 1-3. 
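The ordering problem described in the RFR above can be illustrated with a small sketch: trusting a size field such as _paths_misc_info_size before the identity fields have been checked. This is a minimal, hypothetical illustration, not the HotSpot code. HeaderSketch, kMagic, kVersion and the helper names are assumptions, and the real FileMapHeader has many more fields. The point it shows is the order of the checks: identity first, size fields only afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified, hypothetical header; the real CDSFileMapHeaderBase and
// FileMapHeader in HotSpot contain many more fields.
struct HeaderSketch {
  uint32_t magic;           // identifies the file type
  int32_t  version;         // archive format version
  char     jvm_ident[32];   // identifies the VM that wrote the archive
  uint64_t misc_info_size;  // only trustworthy once the fields above match
};

const uint32_t kMagic   = 0xf00baba2;  // placeholder, not necessarily cds.h's value
const int32_t  kVersion = 5;           // placeholder

// Checks the identity fields first. In this sketch, a header written by a
// different VM build (e.g. a 64-bit archive read by a 32-bit VM) is rejected
// here instead of having its size fields misread.
bool identity_fields_match(const HeaderSketch& h, const char* expected_ident) {
  if (h.magic != kMagic) return false;
  if (h.version != kVersion) return false;
  // Bounded compare: a corrupt header may lack a terminating NUL.
  return std::strncmp(h.jvm_ident, expected_ident, sizeof(h.jvm_ident)) == 0;
}

// Returns the misc-info size, or -1 if the header must not be trusted.
long long misc_info_size_or_fail(const HeaderSketch& h, const char* expected_ident) {
  if (!identity_fields_match(h, expected_ident)) {
    return -1;  // fail before touching any size or offset field
  }
  return static_cast<long long>(h.misc_info_size);
}
```

A caller would run misc_info_size_or_fail() before any code that uses the size to allocate or read further data from the archive.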
> >>>>>> > >>>>>> thanks, > >>>>>> > >>>>>> Calvin > >>>>>> From mikhailo.seledtsov at oracle.com Wed Jul 10 02:45:38 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Tue, 9 Jul 2019 19:45:38 -0700 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> Message-ID: Hi Bob, On 7/3/19 7:49 AM, Bob Vandette wrote: > Very nice addition to ensuring support for popular docker use cases. Thank you for reviewing the change. > > A few comments on the TestJcmdWithSideCar.java > > 1. Shouldn't you use @requires to only run this test on Linux x64? > CanTestDocker should > protect us but your test wouldn't run on windows if we added docker > support there. It looks like all other Docker tests in this directory use "@requires docker.support". This can be handy for other platforms, such as Linux-arm and Linux-PPC. Also, for development purposes, a user can run the tests on Mac with a simple temporary modification to docker.support. Unless you have a strong position on this, I would like to continue using just "@requires docker.support". > > 2. Why is this repeated? > 149 "--pid=container:" + MAIN_CONTAINER_NAME, > 150 "--pid=container:" + MAIN_CONTAINER_NAME, Good catch. Will fix it. > 3. I'm a little concerned about the built in fixed delays especially > the startMainContainer one. > We don't want any intermittent test failures. Could you maybe add a > DockerThread.checkIsAlive > function and call that every second for 20 seconds and then give up? Sounds good, I will do it the way you recommend. > > What tier are you adding this test to? > > Thanks, > Bob. > I will upload the second webrev after I update the changes and retest. Thank you, Misha > >> On Jul 2, 2019, at 6:24 PM, mikhailo.seledtsov at oracle.com >> wrote: >> >> Please review this new test that uses a Docker sidecar pattern to >> manage/monitor JVM running in the main payload container.
>> >> Sidecar is a common pattern used in the cloud environments for >> monitoring among other uses. In side car pattern the main >> application/service container that runs the payload is paired with a >> sidecar container. It is achieved by sharing certain namespace >> aspects between the two containers such as PID namespace, specific >> sub-directories, IPC and more. >> >> This test implements the following cases: >> - "jcmd -l" to list java processes running in "main" container from >> the "sidecar" container >> - "jhsdb jinfo" in the sidecar configuration >> - jcmd >> >> This change also builds a basis for more test cases in the future. >> >> Minor changes were done to DockerTestUtils: >> - changing access to DOCKER_COMMAND constant to public >> - minor spelling and terminology corrections >> >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 >> Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ >> Testing: >> 1. ran Docker tests on Linux-x64 - PASS >> 2. Running Docker tests in test cluster - in progress >> >> >> Thank you, >> Misha >> > From thomas.stuefe at gmail.com Wed Jul 10 05:38:11 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 10 Jul 2019 07:38:11 +0200 Subject: RFR[13, xs]: 8227275: Within native OOM error handling, assertions may hang the process In-Reply-To: References: Message-ID: Hi Coleen, thanks for looking at it! Remarks below. On Tue, Jul 9, 2019 at 11:12 PM wrote: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/src/hotspot/share/utilities/debug.cpp.udiff.html > > I don't understand why you don't just leave the poison page PROT_NONE > and call this from handle_assert_poison_fault? > > +void disarm_assert_poison() { > + g_assert_poison = &g_dummy; > +} > + > > Then you don't have to check that it succeeded. At this point, it > doesn't matter. > > Because unfortunately that does not work.
At that point it is too late. handle_assert_poison_fault() is called from the signal handler to handle a poison touch SIGSEGV. Once the fault is handled, the signal handler returns to the caller, i.e. jumps back to the instruction that triggered the SEGV. There, the poison page address is already loaded into a register and cannot be changed. The only other choice we have, besides removing the write protection from the poison page, is not to return to the caller. That is what happens when I return false from handle_assert_poison_fault(). In that case the signal handler proceeds as if this were a real SEGV. Cheers, Thomas Coleen > > On 7/9/19 5:23 AM, Thomas Stüfe wrote: > > Dear all, > > > > may I please have reviews for the following issue: > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227275 > > cr: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/ > > > > Summary: on OOM, we may fail to disarm assertion poison page; this may > lead > > to endless loops during error handling if assertions happen in native OOM > > scenarios. > > > > For more details, pls see the JBS issue. > > > > Thanks, Thomas > > From calvin.cheung at oracle.com Wed Jul 10 07:13:33 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 10 Jul 2019 00:13:33 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> <53b972d6-a253-b2a7-e9f5-1d533753ca0d@oracle.com> Message-ID: On 7/9/19 5:25 PM, Jiangli Zhou wrote: > Hi Calvin, > > On Mon, Jul 8, 2019 at 11:31 PM Calvin Cheung wrote: >> >> On 7/8/19 4:38 PM, Jiangli Zhou wrote: >>> Hi Calvin, >>> >>> - src/hotspot/share/include/cds.h >>> >>> 36 #define NUM_CDS_REGIONS 8 >>> >>> The above change would need to be hand fixed when backporting to older >>> versions.
It's fine to include it in the current review, but it's >>> better to create a separate bug and commit using that bug ID. So it >>> will make the backports cleaner. >> I don't think it is worthwhile filing a bug just for this line. >> >> I've added a comment as follows: >> >> 36 #define NUM_CDS_REGIONS 8 // this must be the same as >> MetaspaceShared::n_regions > The issue is that one needs to manually change or revert the above > when backporting the change to older JDK versions, so the backport is > not clean and introduces risks. Committing it under a separate bug id > will avoid that issue. Or, you could combine the above it with some > other change later. In general, it's a good practice to avoid > combining unrelated changes in one changeset (with single bug id). I've reverted the changes in cds.h and filemap.hpp and filed the following bug for the NUM_CDS_REGIONS adjustment: https://bugs.openjdk.java.net/browse/JDK-8227496 Updated webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.02/ thanks, Calvin > >>> -------- >>> >>> 39 #define CDS_END_MAGIC 0xf00babae >>> >>> What's the significance of the new end magic? Should the existing >>> header validation be sufficient as long as it's done first? >> It seems unnecessary now. I got rid of it. >>> -------- >>> >>> - src/hotspot/share/memory/filemap.cpp >>> >>> 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != >>> CDS_DYNAMIC_ARCHIVE_MAGIC) { >>> 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : >>> CDS_DYNAMIC_ARCHIVE_MAGIC; >>> 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); >>> 904 log_info(cds)(" actual: 0x%08x", _header->_magic); >>> 905 FileMapInfo::fail_continue("The shared archive file has a bad >>> magic number."); >>> 906 return false; >>> 907 } >>> ...
>>> >>> 964 if (is_static) { >>> 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { >>> 966 fail_continue("Incorrect static archive magic number"); >>> 967 return false; >>> 968 } >>> >>> There are two checks for _header->_magic in >>> FileMapInfo::init_from_file now but behave differently. The second one >>> can be removed. The first check at line 901 should check the _magic >>> value based on the 'is_static' flag: >>> >>> unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : >>> CDS_DYNAMIC_ARCHIVE_MAGIC; >>> if (_header->_magic != expected_magic) { >>> ... >> I've made the above change. >>> -------- >>> >>> Most of the work now in FileMapInfo::init_from_file should really >>> belong to FileMapInfo::validate_header. It would be cleaner to simply >>> FileMapInfo::init_from_file to be the following and move the rest to >>> FileMapInfo::validate_header. Thoughts? >>> >>> 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { >>> 889 size_t sz = is_static ? sizeof(FileMapHeader) : >>> sizeof(DynamicArchiveHeader); >>> 890 size_t n = os::read(fd, _header, (unsigned int)sz); >>> 891 if (n != sz) { >>> 892 fail_continue("Unable to read the file header."); >>> 893 return false; >>> 894 } >>> 895 return true; >>> } >> The _file_offset will be based on the size_t n and some other fields >> (_paths_misc_info, SharedBaseAddress) will be set at lines 953 - 976. >> Also, there's the following check in validate_header(): >> >> 1859 if >> (!ClassLoader::check_shared_paths_misc_info(_paths_misc_info, >> _header->_paths_misc_info_size, is_static)) { >> >> If the SharedPathsMiscInfo could be removed (JDK-8227370), then it is >> possible that validate_header could be called within init_from_file. I >> think we should defer this until JDK-8227370. > Ok for deferring it. 
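Jiangli's two suggestions can be sketched roughly as follows: a single magic check keyed on is_static, and an init_from_file() that only reads the raw header. The magic constants and the function names magic_ok and read_header are placeholders chosen for illustration, and std::istream stands in for the fd and os::read() used in filemap.cpp. Check cds.h and filemap.cpp for the real definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Placeholder values standing in for CDS_ARCHIVE_MAGIC and
// CDS_DYNAMIC_ARCHIVE_MAGIC.
const uint32_t kStaticMagic  = 0xf00baba2;
const uint32_t kDynamicMagic = 0xf00baba8;

// A single check keyed on the archive kind: a dynamic archive's magic is
// rejected when a static archive is expected, and vice versa.
bool magic_ok(uint32_t actual, bool is_static) {
  const uint32_t expected = is_static ? kStaticMagic : kDynamicMagic;
  if (actual != expected) {
    std::fprintf(stderr, "_magic expected: 0x%08x actual: 0x%08x\n",
                 static_cast<unsigned>(expected), static_cast<unsigned>(actual));
    return false;
  }
  return true;
}

// Reduced init_from_file(): read exactly sz header bytes and nothing else.
// Field validation is left to a separate validate step.
bool read_header(std::istream& in, char* buf, std::size_t sz) {
  in.read(buf, static_cast<std::streamsize>(sz));
  return static_cast<std::size_t>(in.gcount()) == sz;  // short read => failure
}
```

Keeping the read and the validation separate means the validation step never runs on a partially read header.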
> > Best, > Jiangli > >> updated webrev: >> >> http://cr.openjdk.java.net/~ccheung/8226406/webrev.01/ >> >> thanks, >> >> Calvin >> >>> Best regards, >>> >>> Jiangli >>> >>> >>> On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: >>>> Hi Calvin, >>>> >>>> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: >>>>> Hi Jiangli, >>>>> >>>>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: >>>>>> Hi Calvin, >>>>>> >>>>>> Per our off-mailing-list email exchange from the previous code review >>>>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove >>>>>> SharedPathsMiscInfo' >>>>> Thanks for filing the RFE. >>>>>> . I think the crash caused by premature runtime accessing of >>>>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather >>>>>> than further patching up the SharedPathsMiscInfo >>>>> My current patch involves checking most the fields in >>>>> CDSFileMapHeaderBase before accessing other fields. This part is >>>>> applicable to other fields, not only to the _paths_misc_info_size. This >>>>> bug existed for a while and I think it would be a good backport >>>>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE >>>>> JDK-8227370 are not necessary to be backported to 11u. I'd like to fix >>>>> this bug first and then handle JDK-8227370 as a separate changeset. >>>> That sounds like a good plan. A fix targeted for backporting should >>>> have a clean-cut (less dependency) and controlled scope. Addressing >>>> this incrementally in separate changesets is a suitable approach. >>>> >>>> I took a quick look over the weekend and noticed some issues with your >>>> current patch. That's why I suggested to go with the complete removal >>>> without spending extra effort on SharedPathsMiscInfo. I will need to >>>> take a closer look and try to get back to you later today. 
>>>> >>>> Best regards, >>>> Jiangli >>>> >>>>> thanks, >>>>> >>>>> Calvin >>>>> >>>>>> Thanks and regards, >>>>>> Jiangli >>>>>> >>>>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: >>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 >>>>>>> >>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ >>>>>>> >>>>>>> This bug was found during a bootcycle build when a shared archive built >>>>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to >>>>>>> some of the important header fields such as the _jvm_ident was not >>>>>>> checked prior to accessinng other fields such as the _paths_misc_info_size. >>>>>>> >>>>>>> This fix involves checking most the fields in CDSFileMapHeaderBase >>>>>>> before accessing other fields. >>>>>>> >>>>>>> Testing: tiers 1-3. >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Calvin >>>>>>> From calvin.cheung at oracle.com Wed Jul 10 07:28:53 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 10 Jul 2019 00:28:53 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Message-ID: On 7/9/19 5:32 PM, Jiangli Zhou wrote: > On Mon, Jul 8, 2019 at 11:35 PM Calvin Cheung wrote: >> >> On 7/8/19 4:45 PM, Jiangli Zhou wrote: >>> -#define CURRENT_CDS_ARCHIVE_VERSION 5 >>> +#define CURRENT_CDS_ARCHIVE_VERSION 6 >>> >>> I would also suggestion to not do the above change in this bug fix >>> since that would make all older versions to use '6' when backported >>> (unless hand merge is involved). >> Since the _jvm_ident field has been moved to a different location, I >> think the CURRENT_CDS_ARCHIVE_VERSION should be updated. Even if the >> version stays the same, shared archive created by an older version of >> JVM cannot be used by the current JVM version. > Can you please clarify the reason for moving the field? 
One advantage is that there's currently a 4-byte gap between _version and _space. With the _jvm_ident field placed after _version, the first 4 fields are 4-byte aligned. Anyway, I've reverted the change in my latest webrev.
struct CDSFileMapHeaderBase {
  unsigned int _magic;           // identify file type
  int          _crc;             // header crc checksum
  int          _version;         // must be CURRENT_CDS_ARCHIVE_VERSION
  struct CDSFileMapRegion _space[NUM_CDS_REGIONS];
};
> > It's confusing for all different JDK versions to have the same > CURRENT_CDS_ARCHIVE_VERSION but with significantly different archive > layouts. I've checked the change history of the CURRENT_CDS_ARCHIVE_VERSION. The last update was for the following bug fix: 8208658: Make CDS archived heap regions usable even if compressed oop encoding has changed. Because 2 fields were added to the header for the dynamic CDS archive, the version should have been updated again. Should I file a bug to update the version? thanks, Calvin > > Best, > Jiangli >> thanks, >> >> Calvin >> >>> Thanks, >>> Jiangli >>> >>> On Mon, Jul 8, 2019 at 4:38 PM Jiangli Zhou wrote: >>>> Hi Calvin, >>>> >>>> - src/hotspot/share/include/cds.h >>>> >>>> 36 #define NUM_CDS_REGIONS 8 >>>> >>>> The above change would need to be hand fixed when backporting to older >>>> versions. It's fine to include it in the current review, but it's >>>> better to create a separate bug and commit using that bug ID. So it >>>> will make the backports cleaner. >>>> >>>> -------- >>>> >>>> 39 #define CDS_END_MAGIC 0xf00babae >>>> >>>> What's the significance of the new end magic? Should the existing >>>> header validation be sufficient as long as it's done first? >>>> >>>> -------- >>>> >>>> - src/hotspot/share/memory/filemap.cpp >>>> >>>> 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != >>>> CDS_DYNAMIC_ARCHIVE_MAGIC) { >>>> 902 unsigned int expected_magic = is_static ?
CDS_ARCHIVE_MAGIC : >>>> CDS_DYNAMIC_ARCHIVE_MAGIC; >>>> 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); >>>> 904 log_info(cds)(" actual: 0x%08x", _header->_magic); >>>> 905 FileMapInfo::fail_continue("The shared archive file has a bad >>>> magic number."); >>>> 906 return false; >>>> 907 } >>>> ... >>>> >>>> 964 if (is_static) { >>>> 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { >>>> 966 fail_continue("Incorrect static archive magic number"); >>>> 967 return false; >>>> 968 } >>>> >>>> There are two checks for _header->_magic in >>>> FileMapInfo::init_from_file now but behave differently. The second one >>>> can be removed. The first check at line 901 should check the _magic >>>> value based on the 'is_static' flag: >>>> >>>> unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : >>>> CDS_DYNAMIC_ARCHIVE_MAGIC; >>>> if (_header->_magic != expected_magic) { >>>> ... >>>> >>>> -------- >>>> >>>> Most of the work now in FileMapInfo::init_from_file should really >>>> belong to FileMapInfo::validate_header. It would be cleaner to simply >>>> FileMapInfo::init_from_file to be the following and move the rest to >>>> FileMapInfo::validate_header. Thoughts? >>>> >>>> 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { >>>> 889 size_t sz = is_static ? 
sizeof(FileMapHeader) : >>>> sizeof(DynamicArchiveHeader); >>>> 890 size_t n = os::read(fd, _header, (unsigned int)sz); >>>> 891 if (n != sz) { >>>> 892 fail_continue("Unable to read the file header."); >>>> 893 return false; >>>> 894 } >>>> 895 return true; >>>> } >>>> >>>> Best regards, >>>> >>>> Jiangli >>>> >>>> >>>> On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: >>>>> Hi Calvin, >>>>> >>>>> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: >>>>>> Hi Jiangli, >>>>>> >>>>>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: >>>>>>> Hi Calvin, >>>>>>> >>>>>>> Per our off-mailing-list email exchange from the previous code review >>>>>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove >>>>>>> SharedPathsMiscInfo' >>>>>> Thanks for filing the RFE. >>>>>>> . I think the crash caused by premature runtime accessing of >>>>>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather >>>>>>> than further patching up the SharedPathsMiscInfo >>>>>> My current patch involves checking most the fields in >>>>>> CDSFileMapHeaderBase before accessing other fields. This part is >>>>>> applicable to other fields, not only to the _paths_misc_info_size. This >>>>>> bug existed for a while and I think it would be a good backport >>>>>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE >>>>>> JDK-8227370 are not necessary to be backported to 11u. I'd like to fix >>>>>> this bug first and then handle JDK-8227370 as a separate changeset. >>>>> That sounds like a good plan. A fix targeted for backporting should >>>>> have a clean-cut (less dependency) and controlled scope. Addressing >>>>> this incrementally in separate changesets is a suitable approach. >>>>> >>>>> I took a quick look over the weekend and noticed some issues with your >>>>> current patch. That's why I suggested to go with the complete removal >>>>> without spending extra effort on SharedPathsMiscInfo. 
>>>>> I will need to >>>>> take a closer look and try to get back to you later today. >>>>> >>>>> Best regards, >>>>> Jiangli >>>>> >>>>>> thanks, >>>>>> >>>>>> Calvin >>>>>> >>>>>>> Thanks and regards, >>>>>>> Jiangli >>>>>>> >>>>>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: >>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 >>>>>>>> >>>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ >>>>>>>> >>>>>>>> This bug was found during a bootcycle build when a shared archive built >>>>>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to >>>>>>>> some of the important header fields such as the _jvm_ident was not >>>>>>>> checked prior to accessinng other fields such as the _paths_misc_info_size. >>>>>>>> >>>>>>>> This fix involves checking most the fields in CDSFileMapHeaderBase >>>>>>>> before accessing other fields. >>>>>>>> >>>>>>>> Testing: tiers 1-3. >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> Calvin >>>>>>>> From martin.doerr at sap.com Wed Jul 10 09:45:40 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 10 Jul 2019 09:45:40 +0000 Subject: RFR[13, xs]: 8227275: Within native OOM error handling, assertions may hang the process In-Reply-To: References: Message-ID: Hi Thomas, thanks for the explanations. Your fix looks good to me (except the missing include in vmError.cpp we discussed offline). Best regards, Martin > -----Original Message----- > From: hotspot-runtime-dev bounces at openjdk.java.net> On Behalf Of Thomas Stüfe > Sent: Mittwoch, 10. Juli 2019 07:38 > To: Coleen Phillmore > Cc: Hotspot dev runtime > Subject: Re: RFR[13, xs]: 8227275: Within native OOM error handling, > assertions may hang the process > > Hi Coleen, > > thanks for looking at it! Remarks below.
> > > > On Tue, Jul 9, 2019 at 11:12 PM wrote: > > > > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/src/hotspot/share/utilities/debug.cpp.udiff.html > > > > I don't understand why you don't just leave the poison page PROT_NONE > > and call this from handle_assert_poison_fault? > > > > +void disarm_assert_poison() { > > + g_assert_poison = &g_dummy; > > +} > > + > > > > Then you don't have to check that it succeeded. At this point, it > > doesn't matter. > > > > > Because unfortunately that does not work. At the point it is too late. > > handle_assert_poison_fault() is called from the signal handler to handle a > poison touch SIGSEGV. When it is handled, it will return to the caller - > jumps back to the instruction triggering the SEGV. There, poison page > address is already loaded into a register and cannot be changed. > > The only other choice we have, beside removing the write protection from > the poison page, is not to return to the caller. That is what happens when > I return false from handle_assert_poison_fault(). In that case the signal > handler proceeds as if this were a real SEGV. > > Cheers, Thomas > > > Coleen > > > > On 7/9/19 5:23 AM, Thomas Stüfe wrote: > > > Dear all, > > > > > > may I please have reviews for the following issue: > > > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227275 > > > cr: > > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/ > > > > > > Summary: on OOM, we may fail to disarm assertion poison page; this may > > lead > > > to endless loops during error handling if assertions happen in native OOM > > > scenarios. > > > > > > For more details, pls see the JBS issue.
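Thomas's point can be reproduced outside HotSpot: redirecting g_assert_poison from inside the signal handler comes too late. The following is a Linux-only sketch, not HotSpot code. It assumes POSIX sigaction, mmap and mprotect, and a hypothetical g_poison_page in place of HotSpot's g_assert_poison. It shows that returning from a SIGSEGV handler re-executes the faulting store with its target address already loaded. The handler therefore has to make the page writable for the retry to succeed.

```cpp
#include <cassert>
#include <csignal>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical stand-in for the write-protected assertion poison page.
static char* g_poison_page = nullptr;
static long g_page_size = 0;
static volatile sig_atomic_t g_faults = 0;

static void segv_handler(int, siginfo_t* info, void*) {
  char* addr = static_cast<char*>(info->si_addr);
  if (g_poison_page != nullptr && addr >= g_poison_page &&
      addr < g_poison_page + g_page_size) {
    g_faults = g_faults + 1;
    // Returning from the handler restarts the faulting store, whose target
    // address is already in a register. Changing a global pointer here would
    // not help; only making the page writable lets the retry succeed.
    // (mprotect is not on the POSIX async-signal-safe list, but this pattern
    // is commonly used on Linux.)
    mprotect(g_poison_page, static_cast<size_t>(g_page_size),
             PROT_READ | PROT_WRITE);
    return;
  }
  // Not our page: restore the default action. The re-executed instruction
  // then faults again and terminates the process like a real crash.
  signal(SIGSEGV, SIG_DFL);
}

// Maps one PROT_NONE page and installs the handler.
bool arm_poison() {
  g_page_size = sysconf(_SC_PAGESIZE);
  void* p = mmap(nullptr, static_cast<size_t>(g_page_size), PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  g_poison_page = static_cast<char*>(p);
  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Writes into the protected page: the first attempt faults, the handler
// unprotects the page, and execution resumes at the same store.
int touch_poison(int value) {
  volatile int* target = reinterpret_cast<volatile int*>(g_poison_page);
  *target = value;
  return *target;
}
```

If the handler instead only re-pointed a global, the same store would fault again on every return. This is the endless loop described in the RFR summary.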
> > > > > > Thanks, Thomas > > > > From thomas.stuefe at gmail.com Wed Jul 10 09:59:23 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 10 Jul 2019 11:59:23 +0200 Subject: RFR[13, xs]: 8227275: Within native OOM error handling, assertions may hang the process In-Reply-To: References: Message-ID: Thanks Martin! On Wed, Jul 10, 2019 at 11:45 AM Doerr, Martin wrote: > Hi Thomas, > > thanks for the explanations. Your fix looks good to me (except the missing > include in vmError.cpp we discussed offline). > > The include was not missing; instead, the call to disarm_poison_page() must be enclosed in #ifdef CAN_SHOW_REGISTERS_ON_ASSERT. I fixed the patch in place. Cheers Thomas > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-runtime-dev > bounces at openjdk.java.net> On Behalf Of Thomas Stüfe > > Sent: Mittwoch, 10. Juli 2019 07:38 > > To: Coleen Phillmore > > Cc: Hotspot dev runtime > > Subject: Re: RFR[13, xs]: 8227275: Within native OOM error handling, > > assertions may hang the process > > > > Hi Coleen, > > > > thanks for looking at it! Remarks below. > > > > On Tue, Jul 9, 2019 at 11:12 PM wrote: > > > > > > > > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/src/hotspot/share/utilities/debug.cpp.udiff.html > > > > > > I don't understand why you don't just leave the poison page PROT_NONE > > > and call this from handle_assert_poison_fault? > > > > > > +void disarm_assert_poison() { > > > + g_assert_poison = &g_dummy; > > > +} > > > + > > > > > > Then you don't have to check that it succeeded. At this point, it > > > doesn't matter. > > > > > > > > Because unfortunately that does not work. At the point it is too late. > > > > handle_assert_poison_fault() is called from the signal handler to handle > a > > poison touch SIGSEGV. When it is handled, it will return to the caller - > > jumps back to the instruction triggering the SEGV.
There, poison page > > address is already loaded into a register and cannot be changed. > > > > The only other choice we have, beside removing the write protection from > > the poison page, is not to return to the caller. That is what happens > when > > I return false from handle_assert_poison_fault(). In that case the signal > > handler proceeds as if this were a real SEGV. > > > > Cheers, Thomas > > > > > > Coleen > > > > > > On 7/9/19 5:23 AM, Thomas Stüfe wrote: > > > > Dear all, > > > > > > > > may I please have reviews for the following issue: > > > > > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227275 > > > > cr: > > > > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/ > > > > > > > > Summary: on OOM, we may fail to disarm assertion poison page; this > may > > > lead > > > > to endless loops during error handling if assertions happen in > native OOM > > > > scenarios. > > > > > > > > For more details, pls see the JBS issue. > > > > > > > > Thanks, Thomas > > > > > > > From harold.seigel at oracle.com Wed Jul 10 12:09:50 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Wed, 10 Jul 2019 08:09:50 -0400 Subject: RFR 8226798: JVM crash in klassItable::initialize_itable_for_interface(int, InstanceKlass*, bool, Thread*) Message-ID: <680017b5-a8d9-2e30-f452-c994162c8a44@oracle.com> Hi, Please review this JDK-14 fix for 8226798. At class load time, the JVM was incorrectly calculating the size of a class's vtable in cases where a super class, in another package, contained a package private method that was also in a super interface. Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8226798/webrev/index.html JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226798 The fix was regression tested by running Mach5 tiers 1 and 2 tests and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64.
Thanks, Harold From martin.doerr at sap.com Wed Jul 10 12:40:01 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 10 Jul 2019 12:40:01 +0000 Subject: RFR(xs): 8227031: Print NMT statistics on fatal errors In-Reply-To: References: Message-ID: Hi Thomas, looks good to me. vmError.cpp: // Print out NMT statistics if this was desired. What is the value of this comment? I suggest removing it if we don't have anything important to explain. Best regards, Martin > -----Original Message----- > From: hotspot-runtime-dev bounces at openjdk.java.net> On Behalf Of Thomas Stüfe > Sent: Dienstag, 9. Juli 2019 09:32 > To: Baesken, Matthias > Cc: Hotspot dev runtime > Subject: Re: RFR(xs): 8227031: Print NMT statistics on fatal errors > > Hi Matthias, > > > On Tue, Jul 9, 2019 at 9:15 AM Baesken, Matthias > > wrote: > > > Hi Thomas, In wonder about the following : > > > > MemTracker::final_report is called also from print_statistics() : > > > > hotspot/share/runtime/java.cpp > > ----------------------------------------------------- > > void print_statistics() { > > ... > > 353 // Native memory tracking data > > 354 if (PrintNMTStatistics) { > > 355 MemTracker::final_report(tty); > > 356 } > > > > > > Would this mean that when called before from print_statistics() , we > > would not call it again from vmError because of the > > g_final_report_did_run check ? > > > > src/hotspot/share/services/memTracker.cpp > > ----------------------------------------------- > > > > 179 static volatile bool g_final_report_did_run = false; > > 180 void MemTracker::final_report(outputStream* output) { > > 181 // This function is called during both error reporting and normal VM > > exit. > > 182 // However, it should only ever run once. E.g. if the VM crashes > > after > > 183 // printing the final report during normal VM exit, it should not > > print > > 184 // the final report again. In addition, it should be guarded from > > 185 // recursive calls in case NMT reporting itself crashes.
> > 186 if (Atomic::cmpxchg(true, &g_final_report_did_run, false) == false) { > > 187 NMT_TrackingLevel level = tracking_level(); > > 188 if (level >= NMT_summary) { > > 189 report(level == NMT_summary, output); > > 190 } > > 191 } > > 192 } > > > > Is this really what we want ? Of course we want to avoid printing it > > twice (or more than that ) from error reporting. > > But I think we would miss it from error reporting in some situations when > > we want it there . > > > > > This is exactly what I wanted: > > MemTracker::final_report(tty) is supposed to print the final report. It is > called in two places, during normal VM shutdown (A) and during error > handling (B). By only allowing the code to run once I get the behaviour I > wanted: > > Case 1: normal shutdown: we execute (A) from before_exit(), all is well. > Case 2: we crash before normal shutdown: we execute (B) from within the > error handler. > Case 3: we crash during normal shutdown: we executed already (A) and > hence > (B) is a noop > Case 4: crash within (!) MemTracker::final_report(): > Case 4.1: MemTracker::final_report() was called during normal shutdown > and crashed (A): - we enter error handling, but we will not re-enter NMT > reporting at (B) which is good since NMT reporting is not reentrant. > Case 4.2: MemTracker::final_report() was called during error handling > and crashed (B): - we enter the secondary signal handler, restart > VMError::report_and_die(), but will not attempt to print NMT report again > > Especially Case 4 is important, since it can lead to hanging error > reporting since NMT is not reentrant and will suffocate on its own lock. > > Arguably, Case 2 and 3 are "just" aesthetics and prevent seeing the same > report twice. > > > > > Otherwise looks okay to me . > > > > > Thanks! 
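Thomas's cases 1 to 4 all rely on one run-once guard in MemTracker::final_report(), and that guard reduces to a compare-and-exchange on a flag. The following minimal sketch uses std::atomic in place of HotSpot's Atomic::cmpxchg. The names final_report_sketch, demo_report_body and g_output are hypothetical, and g_output stands in for the outputStream.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-ins: g_output plays the role of the outputStream.
static std::atomic<bool> g_final_report_did_run{false};
static std::string g_output;

// Runs report_body at most once per process. The first caller flips the flag
// from false to true; every later or nested caller sees true and returns,
// mirroring Atomic::cmpxchg(true, &g_final_report_did_run, false) == false.
void final_report_sketch(void (*report_body)()) {
  bool expected = false;
  if (g_final_report_did_run.compare_exchange_strong(expected, true)) {
    report_body();
  }
}

// Example body that re-enters final_report_sketch, as if printing the report
// had crashed and error handling tried to print it again (case 4).
void demo_report_body() {
  g_output += "NMT summary;";
  final_report_sketch(demo_report_body);  // no-op: flag already set
}
```

Calling final_report_sketch(demo_report_body) once at "shutdown" and again from a simulated later crash leaves exactly one report in g_output. The nested call made from inside the body also returns immediately, so the non-reentrant report code is never entered twice.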
> > > > Best regards, Matthias > > > > > Cheers, Thomas > > > > > > > >Hi all, > > > > > >We have -XX:+-PrintNMTStatistics, a very useful switch which will cause > > the > > >VM to print out the NMT statistics if the VM exits normally. > > > > > >Currently it does not work if the VM exits due to a fatal error. But > > >especially in fatal exits due to native OOM a NMT report would be very > > >helpful. > > > > > >JBS: https://bugs.openjdk.java.net/browse/JDK-8227031 > > > > > >cr: > > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227031-optionally-print- > nmt-report-on-oom/webrev.00/webrev/index.html > > > > > >Changes in this patch: > > >- handle PrintNMTStatistics on fatal error > > >- make sure the final report is not called twice accidentally and it is > > not > > >called recursively due to secondary error handling > > >- change the Metaspace report portion of the NMT report to only include > > the > > >brief metaspace report - that one can be called at any time, it does not > > >lock nor require any resources. > > > > > >Please note: this will not work when we are in an OOM situation and > > request > > >a detailed NMT report; that scenario needs more work since NMT > detailed > > >reports need memory as well. That is a separate issue. > > > > > > > > From thomas.stuefe at gmail.com Wed Jul 10 13:02:17 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 10 Jul 2019 15:02:17 +0200 Subject: RFR(xs): 8227031: Print NMT statistics on fatal errors In-Reply-To: References: Message-ID: Hi Martin, thanks for the review! I will remove the unnecessary comment before pushing. Cheers, Thomas On Wed, Jul 10, 2019 at 2:40 PM Doerr, Martin wrote: > Hi Thomas, > > looks good to me. > > vmError.cpp: > // Print out NMT statistics if this was desired. > > What is the value of this comment? > I suggest to remove it if we don't have anything important to explain.
> > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-runtime-dev > bounces at openjdk.java.net> On Behalf Of Thomas Stüfe > > Sent: Tuesday, 9 July 2019 09:32 > > To: Baesken, Matthias > > Cc: Hotspot dev runtime > > Subject: Re: RFR(xs): 8227031: Print NMT statistics on fatal errors > > > > Hi Matthias, > > > > > > On Tue, Jul 9, 2019 at 9:15 AM Baesken, Matthias > > > > wrote: > > > > > Hi Thomas, I wonder about the following : > > > > > > MemTracker::final_report is called also from print_statistics() : > > > > > > hotspot/share/runtime/java.cpp > > > ----------------------------------------------------- > > > void print_statistics() { > > > ... > > > 353 // Native memory tracking data > > > 354 if (PrintNMTStatistics) { > > > 355 MemTracker::final_report(tty); > > > 356 } > > > > > > > > > Would this mean that when called before from print_statistics() , > we > > > would not call it again from vmError because of the > > > g_final_report_did_run check ? > > > > > > src/hotspot/share/services/memTracker.cpp > > > ----------------------------------------------- > > > > > > 179 static volatile bool g_final_report_did_run = false; > > > 180 void MemTracker::final_report(outputStream* output) { > > > 181 // This function is called during both error reporting and > normal VM > > > exit. > > > 182 // However, it should only ever run once. E.g. if the VM crashes > > > after > > > 183 // printing the final report during normal VM exit, it should not > > > print > > > 184 // the final report again. In addition, it should be guarded from > > > 185 // recursive calls in case NMT reporting itself crashes. > > > 186 if (Atomic::cmpxchg(true, &g_final_report_did_run, false) == > false) { > > > 187 NMT_TrackingLevel level = tracking_level(); > > > 188 if (level >= NMT_summary) { > > > 189 report(level == NMT_summary, output); > > > 190 } > > > 191 } > > > 192 } > > > > > > Is this really what we want ?
Of course we want to avoid printing it > > > twice (or more than that ) from error reporting. > > > But I think we would miss it from error reporting in some situations > when > > > we want it there . > > > > > > > > This is exactly what I wanted: > > > > MemTracker::final_report(tty) is supposed to print the final report. It > is > > called in two places, during normal VM shutdown (A) and during error > > handling (B). By only allowing the code to run once I get the behaviour I > > wanted: > > > > Case 1: normal shutdown: we execute (A) from before_exit(), all is well. > > Case 2: we crash before normal shutdown: we execute (B) from within the > > error handler. > > Case 3: we crash during normal shutdown: we executed already (A) and > > hence > > (B) is a noop > > Case 4: crash within (!) MemTracker::final_report(): > > Case 4.1: MemTracker::final_report() was called during normal > shutdown > > and crashed (A): - we enter error handling, but we will not re-enter NMT > > reporting at (B) which is good since NMT reporting is not reentrant. > > Case 4.2: MemTracker::final_report() was called during error handling > > and crashed (B): - we enter the secondary signal handler, restart > > VMError::report_and_die(), but will not attempt to print NMT report again > > > > Especially Case 4 is important, since it can lead to hanging error > > reporting since NMT is not reentrant and will suffocate on its own lock. > > > > Arguably, Case 2 and 3 are "just" aesthetics and prevent seeing the same > > report twice. > > > > > > > > > Otherwise looks okay to me . > > > > > > > > Thanks! > > > > > > > Best regards, Matthias > > > > > > > > Cheers, Thomas > > > > > > > > > > > >Hi all, > > > > > > > >We have -XX:+-PrintNMTStatistics, a very useful switch which will > cause > > > the > > > >VM to print out the NMT statistics if the VM exits normally. > > > > > > > >Currently it does not work if the VM exits due to a fatal error. 
But > > > >especially in fatal exits due to native OOM a NMT report would be very > > > >helpful. > > > > > > > >JBS: https://bugs.openjdk.java.net/browse/JDK-8227031 > > > > > > > >cr: > > > > > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227031-optionally-print- > > nmt-report-on-oom/webrev.00/webrev/index.html > > > > > > > >Changes in this patch: > > > >- handle PrintNMTStatistics on fatal error > > > >- make sure the final report is not called twice accidentally and it > is > > > not > > > >called recursively due to secondary error handling > > > >- change the Metaspace report portion of the NMT report to only > include > > > the > > > >brief metaspace report - that one can be called at any time, it does > not > > > >lock nor require any resources. > > > > > > > >Please note: this will not work when we are in an OOM situation and > > > request > > > >a detailed NMT report; that scenario needs more work since NMT > > detailed > > > >reports need memory as well. That is a separate issue. > > > > > > > > > > > > > From coleen.phillimore at oracle.com Wed Jul 10 13:02:39 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 10 Jul 2019 09:02:39 -0400 Subject: RFR[13, xs]: 8227275: Within native OOM error handling, assertions may hang the process In-Reply-To: References: Message-ID: <98a9def9-4b57-d181-cde2-aa14f82d643c@oracle.com> On 7/10/19 1:38 AM, Thomas Stüfe wrote: > > Hi Coleen, > > thanks for looking at it! Remarks below. > > On Tue, Jul 9, 2019 at 11:12 PM > wrote: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/src/hotspot/share/utilities/debug.cpp.udiff.html > > I don't understand why you don't just leave the poison page PROT_NONE > and call this from handle_assert_poison_fault? > > +void disarm_assert_poison() { > + g_assert_poison = &g_dummy; > +} > + > > Then you don't have to check that it succeeded. At this point, it > doesn't matter.
> > > Because unfortunately that does not work. At the point it is too late. > > handle_assert_poison_fault() is called from the signal handler to > handle a poison touch SIGSEGV. When it is handled, it will return to > the caller - jumps back to the instruction triggering the SEGV. There, > poison page address is already loaded into a register and cannot be > changed. Oh, I see. Thank you for the explanation. Looks good to me then. Coleen > > The only other choice we have, beside removing the write protection > from the poison page, is not to return to the caller. That is what > happens when I return false from handle_assert_poison_fault(). In that > case the signal handler proceeds as if this were a real SEGV. > > Cheers, Thomas > > > Coleen > > On 7/9/19 5:23 AM, Thomas Stüfe wrote: > > Dear all, > > > > may I please have reviews for the following issue: > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227275 > > cr: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/ > > > > Summary: on OOM, we may fail to disarm assertion poison page; > this may lead > > to endless loops during error handling if assertions happen in > native OOM > > scenarios. > > > > For more details, pls see the JBS issue. > > > > Thanks, Thomas > From thomas.stuefe at gmail.com Wed Jul 10 13:04:38 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 10 Jul 2019 15:04:38 +0200 Subject: RFR[13, xs]: 8227275: Within native OOM error handling, assertions may hang the process In-Reply-To: <98a9def9-4b57-d181-cde2-aa14f82d643c@oracle.com> References: <98a9def9-4b57-d181-cde2-aa14f82d643c@oracle.com> Message-ID: Great, thanks! .. Thomas On Wed, Jul 10, 2019 at 3:02 PM wrote: > > > On 7/10/19 1:38 AM, Thomas Stüfe wrote: > > > Hi Coleen, > > thanks for looking at it! Remarks below.
> > On Tue, Jul 9, 2019 at 11:12 PM wrote: > >> >> >> http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/src/hotspot/share/utilities/debug.cpp.udiff.html >> >> I don't understand why you don't just leave the poison page PROT_NONE >> and call this from handle_assert_poison_fault? >> >> +void disarm_assert_poison() { >> + g_assert_poison = &g_dummy; >> +} >> + >> >> Then you don't have to check that it succeeded. At this point, it >> doesn't matter. >> >> > Because unfortunately that does not work. At the point it is too late. > > handle_assert_poison_fault() is called from the signal handler to handle a > poison touch SIGSEGV. When it is handled, it will return to the caller - > jumps back to the instruction triggering the SEGV. There, poison page > address is already loaded into a register and cannot be changed. > > > Oh, I see. Thank you for the explanation. Looks good to me then. > > Coleen > > > The only other choice we have, beside removing the write protection from > the poison page, is not to return to the caller. That is what happens when > I return false from handle_assert_poison_fault(). In that case the signal > handler proceeds as if this were a real SEGV. > > Cheers, Thomas > > > Coleen >> >> On 7/9/19 5:23 AM, Thomas Stüfe wrote: >> > Dear all, >> > >> > may I please have reviews for the following issue: >> > >> > JBS: https://bugs.openjdk.java.net/browse/JDK-8227275 >> > cr: >> > >> http://cr.openjdk.java.net/~stuefe/webrevs/8227275-native-oom-hanging-assertions/webrev.00/webrev/ >> > >> > Summary: on OOM, we may fail to disarm assertion poison page; this may >> lead >> > to endless loops during error handling if assertions happen in native >> OOM >> > scenarios. >> > >> > For more details, pls see the JBS issue. >> > >> > Thanks, Thomas >> >> > From daniel.daugherty at oracle.com Wed Jul 10 13:32:21 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Wed, 10 Jul 2019 09:32:21 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <43A40540-A760-4C5A-B3CC-D69B276D8A60@oracle.com> References: <72e29c28-7a97-36a0-a633-45d840ccbfb5@oracle.com> <06a48457-5ea3-1801-1409-fcb5f2e5d5b8@oracle.com> <9d02d527-76e0-8460-df38-bc80a0c8ef88@oracle.com> <068FF99F-5AAB-47CD-8DAE-C4435C4967F0@oracle.com> <6D2140BE-BBA0-4530-8A5B-E5B791F94767@oracle.com> <5d53d895-a351-53c3-bdb0-26dbc63a76cb@oracle.com> <43A40540-A760-4C5A-B3CC-D69B276D8A60@oracle.com> Message-ID: <1e27ac1d-2a83-b688-6484-a5f6fa814b86@oracle.com> On 7/9/19 7:05 PM, Kim Barrett wrote: >> On Jul 9, 2019, at 9:13 AM, Daniel D. Daugherty wrote: >> >> Hi Kim, >> >> Thanks for the review. > More like drive by commentary :) Your commentary, drive by or otherwise, is always appreciated... :-) > I've never really looked at the interpreter code, and make no > claim to understand it at all. I *think* I understand what's going on with this change, but I don't > think you should count me toward the requisite number of reviewers. I have three (R)eviewers at the moment so no worries on that account. Since one of your comments motivated a change to the code, I plan to list you as a reviewer... > >> On 7/8/19 7:00 PM, Kim Barrett wrote: >>>> On Jul 7, 2019, at 8:08 PM, David Holmes wrote: >>>> >>>> On 7/07/2019 6:48 pm, Erik Osterlund wrote: >>>>> The real danger is SPARC though and its BIS instructions. I don't have the code in front of me, but I really hope not to see that switch statement and non-volatile loop in that pd_disjoint_words_atomic() function. >>>> sparc uses the same loop. >>>> >>>> Let's face it, almost no body expects the compiler to do these kinds of transformations. :( >>> See JDK-8131330 and JDK-8142368, where we saw exactly this sort of transformation from a fill-loop >>> to memset (which may use BIS, and indeed empirically does in some cases). The loops in question
The loops in question >>> seem trivially convertible to memcpy/memmove. >> Very interesting reads. Thanks for pointing those out. >> >> src/hotspot/share/interpreter/templateInterpreter.cpp: >> >> DispatchTable TemplateInterpreter::_active_table; >> DispatchTable TemplateInterpreter::_normal_table; >> DispatchTable TemplateInterpreter::_safept_table; >> >> So it seems like changing _active_table to: >> >> volatile DispatchTable TemplateInterpreter::_active_table; >> >> might be a good idea... Do you concur? > I suspect that might be a problem for various reasons. Reading ahead, I see you?ve run into at > least some, and deferred this to a new RFE. So I think I?m not going to pretend to understand > this code well enough to understand the ramifications of such a change. Agreed. Doing this fix for Robbin (JDK-8227117) has turned into quite the adventure... Seems to be the story of my life right now... :-) > >>> I?ve been reserving Atomic::load/store for cases where the location ?ought? to be declared std::atomic if >>> we were using C++11 atomics (or alternatively some homebrew equivalent). Not all places where we do >>> stuff ?atomically? is appropriate for that though (consider card tables, being arrays of bytes, where using an >>> atomic type might impose alignment constraints that are undesirable here). I *think* just using volatile >>> here would likely be sufficient, e.g. we should have >>> >>> Copy::disjoint_words_atomic(const HeapWord* from,volatile HeapWord* to, size_t count) >> I think this part should be taken up in the follow bug that I filed: >> >> JDK-8227369 pd_disjoint_words_atomic() needs to be atomic >> https://bugs.openjdk.java.net/browse/JDK-8227369 > Agreed. I pasted the above comment and the follow up comment into JDK-8227369 yesterday... Thanks again for chiming in... Dan > >> Thanks for chiming in on the review! >> >> Dan > From daniel.daugherty at oracle.com Wed Jul 10 13:38:21 2019 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Wed, 10 Jul 2019 09:38:21 -0400 Subject: RFR(XXS): 8227338: templateInterpreter.cpp: copy_table() needs to be safer In-Reply-To: <72858E7B-898B-44A8-B7EE-864B77EFBA8B@oracle.com> References: <56F4779A-7C29-4ED1-A742-FD8B61326A76@oracle.com> <6e2d84ad-5591-35bb-990c-134f6531ec9b@oracle.com> <72858E7B-898B-44A8-B7EE-864B77EFBA8B@oracle.com> Message-ID: On 7/9/19 7:17 PM, Kim Barrett wrote: >> On Jul 9, 2019, at 9:19 AM, Daniel D. Daugherty wrote: >> On 7/8/19 7:08 PM, Kim Barrett wrote: >>> src/hotspot/share/interpreter/templateInterpreter.cpp >>> 286 while (size-- > 0) *to++ = *from++; >>> >>> [pre-existing] >>> >>> This ought to be using Copy::disjoint_words. That's even more obvious >>> in conjunction with the change to use Copy::disjoint_words_atomic in >>> the non-safepoint case. >> I can make that change. Is there a specific advantage/reason that you >> have in mind here? > Mostly a "if we're going to have these kinds of utility APIs because we think they > are useful, then we really ought to use them" argument. I like that reasoning. > One might benefit from > some highly tuned memcpy-like thing provided by the per-platform implementation > of Copy::disjoint_words. I wondered about that, but I think we're trying to get away from such hand coding (assuming assembly here)... > Of course, the code is sufficiently simple that a compiler > has a reasonable chance of making the appropriate transformation even without us > telling it. :-) And the chance to make an inappropriate transformation, but we'll deal with that if it happens (in one place)... > The non-performance reason is a named operation is easier to read and > understand than the corresponding explicit loop. Also agreed. We are communicating intent by calling that function. So I did make that change in the CR1 round...
> >>> src/hotspot/share/interpreter/templateInterpreter.cpp >>> 284 if (SafepointSynchronize::is_at_safepoint()) { >>> >>> I wonder how much benefit we really get from having distinct safepoint >>> and non-safepoint cases, rather than just unconditionally using >>> Copy::disjoint_words_atomic. >> Sorry, I don't know the answer to that. My intention was to use >> Copy::disjoint_words_atomic() only in the case where I knew that >> I needed it so no potential impact on existing uses at a safepoint. > Yeah, I just wasn't sure how performance critical this copy is. Hm, I see that it might > affect the time to get out of a safepoint, so potentially getting a highly tuned > platform-specific memcpy operation in the safepoint case might indeed be worthwhile. > So okay. Someone said something about a bunch of small performance improvements add up over time... or something about death from a thousand cuts... I can never keep those things straight... :-) Thanks again for chiming in on the review thread. Dan From claes.redestad at oracle.com Wed Jul 10 14:56:07 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 10 Jul 2019 16:56:07 +0200 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name Message-ID: Hi, reportedly the uname syscall taken to initialize logDecorations can carry a small but measurable startup cost on some systems/platforms, so the not-used-by-default _host_name should be lazily initialized. Webrev: http://cr.openjdk.java.net/~redestad/8227527/open.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8227527 Testing: t1-3 Thanks! /Claes From gerard.ziemski at oracle.com Wed Jul 10 15:13:27 2019 From: gerard.ziemski at oracle.com (gerard ziemski) Date: Wed, 10 Jul 2019 10:13:27 -0500 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name In-Reply-To: References: Message-ID: <04c33892-0c5d-113b-3d2e-050a249bf806@oracle.com> Looks good, thank you for fixing this!
cheers On 7/10/19 9:56 AM, Claes Redestad wrote: > Hi, > > reportedly the uname syscall taken to initialize logDecorations can > carry a small but measurable startup cost on some systems/platforms, so > the not-used-by-default _host_name should be lazily initialized. > > Webrev: http://cr.openjdk.java.net/~redestad/8227527/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8227527 > Testing: t1-3 > > Thanks! > > /Claes From claes.redestad at oracle.com Wed Jul 10 15:37:19 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 10 Jul 2019 17:37:19 +0200 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name In-Reply-To: <04c33892-0c5d-113b-3d2e-050a249bf806@oracle.com> References: <04c33892-0c5d-113b-3d2e-050a249bf806@oracle.com> Message-ID: On 2019-07-10 17:13, gerard ziemski wrote: > Looks good, thank you for fixing this! Thanks for reviewing, Gerard! /Claes From karen.kinnear at oracle.com Wed Jul 10 16:22:53 2019 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Wed, 10 Jul 2019 12:22:53 -0400 Subject: RFR 8226798: JVM crash in klassItable::initialize_itable_for_interface(int, InstanceKlass*, bool, Thread*) In-Reply-To: <680017b5-a8d9-2e30-f452-c994162c8a44@oracle.com> References: <680017b5-a8d9-2e30-f452-c994162c8a44@oracle.com> Message-ID: <8F557F98-C399-4785-A289-450B5F88BB0F@oracle.com> Harold, Thank you for figuring out a fix for this. The code looks good for the fix. Thank you for the assertions. A suggestion on the code: After the initial large loop, instead of if (!found_pkg_prvt_method) { do the check } Replace that with "If found_pkg_prvt_method is set, then the ONLY matching method in the superclasses is package private in another package. That matching method will prevent a miranda vtable entry from being created. Because the target method can not override the package private method in another package, then it needs to be the root for its own vtable entry."
if (found_pkg_prvt_method) { return true; } Then leave the old code and comment alone. Suggestion on the first set of comments: "But, that package private method does "override" any matching methods in super interfaces, so there will be no miranda vtable entry created. So, set flag to TRUE for use below, in case there are no methods in super classes that this target method overrides." thank you so much, Karen > On Jul 10, 2019, at 8:09 AM, Harold Seigel wrote: > > Hi, > > Please review this JDK-14 fix for 8226798. At class load time, the JVM was incorrectly calculating the size of a class's vtable in cases where a super class, in another package, contained a package private method that was also in a super interface. > > Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8226798/webrev/index.html > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226798 > > The fix was regression tested by running Mach5 tiers 1 and 2 tests and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64. > > Thanks, Harold > From sgehwolf at redhat.com Wed Jul 10 17:40:02 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Wed, 10 Jul 2019 19:40:02 +0200 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> Message-ID: Hi Misha, On Tue, 2019-07-02 at 15:24 -0700, mikhailo.seledtsov at oracle.com wrote: > Please review this new test that uses a Docker sidecar pattern to > manage/monitor JVM running in the main payload container. > > Sidecar is a common pattern used in the cloud environments for > monitoring among other uses. In side car pattern the main > application/service container that runs the payload is paired with a > sidecar container.
It is achieved by sharing certain namespace > aspects > between the two containers such as PID namespace, specific > sub-directories, IPC and more. > > This test implements the following cases: > - "jcmd -l" to list java processes running in "main" container > from > the "sidecar" container > - "jhsdb jinfo" in the sidecar configuration > - jcmd > > This change also builds a basis for more test cases in the future. > > Minor changes were done to DockerTestUtils: > - changing access to DOCKER_COMMAND constant to public > - minor spelling and terminology corrections > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 > Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ > Testing: > 1. ran Docker tests on Linux-x64 - PASS > 2. Running Docker tests in test cluster - in progress > // JCMD does not work in sidecar configuration, except for "jcmd -l". // Including this test case to assist in reproduction of the problem. // t.assertIsAlive(); // testCase03(mainProcPid); FWIW, "jcmd -l" doesn't work in this case either. It only sees itself as far as I can tell. It should see the JVM of the host container too. That issue can be fixed by creating a shared /tmp filesystem and mounting into both containers. What's more, this seems to be a case of AttachListener::is_init_trigger[1] and VirtualMachineImpl.createAttachFile[2] disagreeing. The former looks in $(pwd)/.attach_pid or /tmp/.attach_pid and the latter creates it in /proc//root/tmp/.attach_pid. There seems to be more issues involved. As attaching to a JVM inside a container doesn't seem to work from outside which is supposed to be fixed with JDK-8179498. That alone seems to warrant a bug. private static DockerThread startMainContainer() throws Exception { // start "main" container (the observee) DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); opts.addDockerOpts("--cap-add=SYS_PTRACE", "--ipc=shareable") Is '--ipc=shareable' really needed? 
It's not a supported option for my docker here :-( Thanks, Severin [1] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/hotspot/os/linux/attachListener_linux.cpp#l500 [2] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#l295 From lois.foltan at oracle.com Wed Jul 10 18:02:11 2019 From: lois.foltan at oracle.com (Lois Foltan) Date: Wed, 10 Jul 2019 14:02:11 -0400 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name In-Reply-To: References: Message-ID: On 7/10/2019 10:56 AM, Claes Redestad wrote: > Hi, > > reportedly the uname syscall taken to initialize logDecorations can > carry a small but measurable startup cost on some systems/platforms, so > the not-used-by-default _host_name should be lazily initialized. > > Webrev: http://cr.openjdk.java.net/~redestad/8227527/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8227527 > Testing: t1-3 > > Thanks! > > /Claes Hi Claes, This looks good. One minor comment: share/logging/logDecorations.cpp: - line #53 - 55: I assume that if old_value is not equal to NULL implies that host_name and old_value should give you the same host name string, correct? You could always add an assert in the if statement if you think warranted, "assert(strcmp(old_value, host_name) != 0, "comment..."); Thanks, Lois From harold.seigel at oracle.com Wed Jul 10 18:34:04 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Wed, 10 Jul 2019 14:34:04 -0400 Subject: RFR 8226798: JVM crash in klassItable::initialize_itable_for_interface(int, InstanceKlass*, bool, Thread*) In-Reply-To: <8F557F98-C399-4785-A289-450B5F88BB0F@oracle.com> References: <680017b5-a8d9-2e30-f452-c994162c8a44@oracle.com> <8F557F98-C399-4785-A289-450B5F88BB0F@oracle.com> Message-ID: <26b62558-50e2-980e-31b7-8707bfba2ef1@oracle.com> Hi Karen, Thanks for looking at this. Please review this updated webrev that includes your suggestions.
http://cr.openjdk.java.net/~hseigel/bug_8226798.2/webrev/index.html Thanks! Harold On 7/10/2019 12:22 PM, Karen Kinnear wrote: > Harold, > > Thank you for figuring out a fix for this. The code looks good for the fix. Thank you > for the assertions. > > A suggestion on the code: > > After the initial large loop, instead of if (!found_pkg_prvt_method) { do the check } > Replace that with > > "If found_pkg_prvt_method is set, then the ONLY matching method in the > superclasses is package private in another package. That matching method will > prevent a miranda vtable entry from being created. Because the target method can not > override the package private method in another package, then it needs to be the root > for its own vtable entry." > if (found_pkg_prvt_method) { > return true; > } > > Then leave the old code and comment alone. > > Suggestion on the first set of comments: > > "But, that package private method does "override" any matching methods in super interfaces, > so there will be no miranda vtable entry created. So, set flag to TRUE for use below, in case there are no > methods in super classes that this target method overrides." > > thank you so much, > Karen > >> On Jul 10, 2019, at 8:09 AM, Harold Seigel wrote: >> >> Hi, >> >> Please review this JDK-14 fix for 8226798. At class load time, the JVM was incorrectly calculating the size of a class's vtable in cases where a super class, in another package, contained a package private method that was also in a super interface. >> >> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8226798/webrev/index.html >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226798 >> >> The fix was regression tested by running Mach5 tiers 1 and 2 tests and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64.
>> >> Thanks, Harold >> From karen.kinnear at oracle.com Wed Jul 10 19:04:01 2019 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Wed, 10 Jul 2019 15:04:01 -0400 Subject: RFR 8226798: JVM crash in klassItable::initialize_itable_for_interface(int, InstanceKlass*, bool, Thread*) In-Reply-To: <26b62558-50e2-980e-31b7-8707bfba2ef1@oracle.com> References: <680017b5-a8d9-2e30-f452-c994162c8a44@oracle.com> <8F557F98-C399-4785-A289-450B5F88BB0F@oracle.com> <26b62558-50e2-980e-31b7-8707bfba2ef1@oracle.com> Message-ID: Looks good to me. Thank you very much, Karen > On Jul 10, 2019, at 2:34 PM, Harold Seigel > wrote: > > Hi Karen, > > Thanks for looking at this. > > Please review this updated webrev that includes your suggestions. > > http://cr.openjdk.java.net/~hseigel/bug_8226798.2/webrev/index.html > Thanks! Harold > > On 7/10/2019 12:22 PM, Karen Kinnear wrote: >> Harold, >> >> Thank you for figuring out a fix for this. The code looks good for the fix. Thank you >> for the assertions. >> >> A suggestion on the code: >> >> After the initial large loop, instead of if (!found_pkg_prvt_method) { do the check } >> Replace that with >> >> "If found_pkg_prvt_method is set, then the ONLY matching method in the >> superclasses is package private in another package. That matching method will >> prevent a miranda vtable entry from being created. Because the target method can not >> override the package private method in another package, then it needs to be the root >> for its own vtable entry." >> if (found_pkg_prvt_method) { >> return true; >> } >> >> Then leave the old code and comment alone. >> >> Suggestion on the first set of comments: >> >> "But, that package private method does "override" any matching methods in super interfaces, >> so there will be no miranda vtable entry created. So, set flag to TRUE for use below, in case there are no >> methods in super classes that this target method overrides."
>> >> thank you so much, >> Karen >> >>> On Jul 10, 2019, at 8:09 AM, Harold Seigel wrote: >>> >>> Hi, >>> >>> Please review this JDK-14 fix for 8226798. At class load time, the JVM was incorrectly calculating the size of a class's vtable in cases where a super class, in another package, contained a package private method that was also in a super interface. >>> >>> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8226798/webrev/index.html >>> >>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226798 >>> >>> The fix was regression tested by running Mach5 tiers 1 and 2 tests and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64. >>> >>> Thanks, Harold >>> From harold.seigel at oracle.com Wed Jul 10 19:05:22 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Wed, 10 Jul 2019 15:05:22 -0400 Subject: RFR 8226798: JVM crash in klassItable::initialize_itable_for_interface(int, InstanceKlass*, bool, Thread*) In-Reply-To: References: <680017b5-a8d9-2e30-f452-c994162c8a44@oracle.com> <8F557F98-C399-4785-A289-450B5F88BB0F@oracle.com> <26b62558-50e2-980e-31b7-8707bfba2ef1@oracle.com> Message-ID: <3ef67c55-1e6a-2592-856f-5f01e53b04e6@oracle.com> Thanks again! Harold On 7/10/2019 3:04 PM, Karen Kinnear wrote: > Looks good to me. > > Thank you very much, > Karen > >> On Jul 10, 2019, at 2:34 PM, Harold Seigel > > wrote: >> >> Hi Karen, >> >> Thanks for looking at this. >> >> Please review this updated webrev that includes your suggestions. >> >> http://cr.openjdk.java.net/~hseigel/bug_8226798.2/webrev/index.html >> >> Thanks! Harold >> >> On 7/10/2019 12:22 PM, Karen Kinnear wrote: >>> Harold, >>> >>> Thank you for figuring out a fix for this. The code looks good for the fix. Thank you >>> for the assertions. 
>>> >>> A suggestion on the code: >>> >>> After the initial large loop, instead of if (!found_pkg_prvt_method) { do the check } >>> Replace that with >>> >>> "If found_pkg_prvt_method is set, then the ONLY matching method in the >>> superclasses is package private in another package. That matching method will >>> prevent a miranda vtable entry from being created. Because the target method can not >>> override the package private method in another package, then it needs to be the root >>> for its own vtable entry." >>> if (found_pkg_prvt_method) { >>> return true; >>> } >>> >>> Then leave the old code and comment alone. >>> >>> >>> Suggestion on the first set of comments: >>> >>> "But, that package private method does "override" any matching methods in super interfaces, >>> so there will be no miranda vtable entry created. So, set flag to TRUE for use below, in case there are no >>> methods in super classes that this target method overrides." >>> >>> thank you so much, >>> Karen >>> >>>> On Jul 10, 2019, at 8:09 AM, Harold Seigel wrote: >>>> >>>> Hi, >>>> >>>> Please review this JDK-14 fix for 8226798. At class load time, the JVM was incorrectly calculating the size of a class's vtable in cases where a super class, in another package, contained a package private method that was also in a super interface. >>>> >>>> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8226798/webrev/index.html >>>> >>>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226798 >>>> >>>> The fix was regression tested by running Mach5 tiers 1 and 2 tests and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64.
>>>> >>>> Thanks, Harold >>>> > From claes.redestad at oracle.com Wed Jul 10 19:30:03 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 10 Jul 2019 21:30:03 +0200 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name In-Reply-To: References: Message-ID: Hi, On 2019-07-10 20:02, Lois Foltan wrote: > On 7/10/2019 10:56 AM, Claes Redestad wrote: >> Hi, >> >> reportedly the uname syscall taken to initialize logDecorations can >> carry a small but measurable startup cost on some systems/platforms, so >> the not-used-by-default _host_name should be lazily initialized. >> >> Webrev: http://cr.openjdk.java.net/~redestad/8227527/open.00/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8227527 >> Testing: t1-3 >> >> Thanks! >> >> /Claes > Hi Claes, > > This looks good. thanks, Lois! > One minor comment: > > share/logging/logDecorations.cpp: > - line #53 - 55: I assume that if old_value is not equal to NULL implies > that host_name and old_value should give you the same host name string, > correct? > You could always add an assert in the if statement if you think > warranted, "assert(strcmp(old_value, host_name) != 0, "comment..."); In most cases they'll be the same, yes, but such an assert could trigger if the host name is being changed and 2 or more threads are racing to init _host_name. Extremely unlikely, but if it ever happened I think we should just use the installed value. This isn't very different from the behavior today (host name changes are ignored), while observing a sudden host name change during execution could be very surprising for various log parsers. Does that sound reasonable? /Claes From thomas.stuefe at gmail.com Wed Jul 10 20:15:53 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 10 Jul 2019 22:15:53 +0200 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name In-Reply-To: References: Message-ID: Hi Claes, This looks all very good.
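The race Claes describes follows the usual compare-and-swap lazy-initialization pattern. Each racing thread builds its own heap copy of the name, exactly one copy gets installed, and every losing thread frees its copy and returns the installed value, even if that value differs from the name it resolved itself. What follows is a minimal sketch using `std::atomic`. The global `g_host_name` is a hypothetical stand-in for `_host_name`, and an injected resolver replaces the platform host-name lookup. Returning the placeholder `"???"` when allocation fails is an assumption and is not taken from the webrev.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for LogDecorations::_host_name.
static std::atomic<const char*> g_host_name{nullptr};

// The resolver is injected so the sketch stays portable and testable;
// a real implementation would query the platform (for example via uname).
using Resolver = const char* (*)();

const char* lazy_host_name(Resolver resolve) {
  assert(resolve != nullptr);
  // Fast path: some thread already installed a name.
  const char* installed = g_host_name.load(std::memory_order_acquire);
  if (installed != nullptr) {
    return installed;
  }
  // Slow path: make a private heap copy (what strdup would do).
  const char* name = resolve();
  const std::size_t len = std::strlen(name) + 1;
  char* copy = static_cast<char*>(std::malloc(len));
  if (copy == nullptr) {
    return "???";  // degrade gracefully instead of aborting on native OOM
  }
  std::memcpy(copy, name, len);
  const char* expected = nullptr;
  if (g_host_name.compare_exchange_strong(expected, copy,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
    return copy;  // won the race: our copy is now the shared value
  }
  // Lost the race: keep the installed value even if it differs from ours
  // (e.g. the host was renamed in between), and release our copy.
  std::free(copy);
  return expected;
}
```

Because the loser keeps whatever was installed first, a host name change during startup is ignored instead of being reported inconsistently, which matches the behavior Claes argues for.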
My only remark is not directed at your patch, but I would prefer diagnostic code like logging not to crash at a native OOM (I refer to the strdup), but rather handle it gracefully, e.g. by just printing "???" as hostname. Cheers, Thomas On Wed, Jul 10, 2019, 16:55 Claes Redestad wrote: > Hi, > > reportedly the uname syscall taken to initialize logDecorations can > carry a small but measurable startup cost on some systems/platforms, so > the not-used-by-default _host_name should be lazily initialized. > > Webrev: http://cr.openjdk.java.net/~redestad/8227527/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8227527 > Testing: t1-3 > > Thanks! > > /Claes > From jianglizhou at google.com Wed Jul 10 20:26:21 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 10 Jul 2019 13:26:21 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> <53b972d6-a253-b2a7-e9f5-1d533753ca0d@oracle.com> Message-ID: The updates look ok. We do have cdsoffsets.cpp in JDK 11 (cds.h doesn't exist in JDK 11), so it shouldn't cause backport issue. Best, Jiangli On Wed, Jul 10, 2019 at 12:14 AM Calvin Cheung wrote: > > > On 7/9/19 5:25 PM, Jiangli Zhou wrote: > > Hi Calvin, > > > > On Mon, Jul 8, 2019 at 11:31 PM Calvin Cheung wrote: > >> > >> On 7/8/19 4:38 PM, Jiangli Zhou wrote: > >>> Hi Calvin, > >>> > >>> - src/hotspot/share/include/cds.h > >>> > >>> 36 #define NUM_CDS_REGIONS 8 > >>> > >>> The above change would need to be hand fixed when backporting to older > >>> versions. It's fine to include it in the current review, but it's > >>> better to create a separate bug and commit using that bug ID. So it > >>> will make the backports cleaner. > >> I don't think it is worthwhile filing a bug just for this line. 
> >> > >> I've added a comment as follows: > >> > >> 36 #define NUM_CDS_REGIONS 8 // this must be the same as > >> MetaspaceShared::n_regions > > The issue is that one needs to manually change or revert the above > > when backporting the change to older JDK versions, so the backport is > > not clean and introduces risks. Committing it under a separate bug id > > will avoid that issue. Or, you could combine the above it with some > > other change later. In general, it's a good practice to avoid > > combining unrelated changes in one changeset (with single bug id). > > I've reverted the changes in cds.h and filemap.hpp and filed the > following bug for the NUM_CDS_REGIONS adjustment: > > https://bugs.openjdk.java.net/browse/JDK-8227496 > > Updated webrev: > > http://cr.openjdk.java.net/~ccheung/8226406/webrev.02/ > > thanks, > > Calvin > > > > >>> -------- > >>> > >>> 39 #define CDS_END_MAGIC 0xf00babae > >>> > >>> What's the significance of the new end magic? Should the existing > >>> header validation be sufficient as long as it's done first? > >> It seems unnecessary now. I got rid of it. > >>> -------- > >>> > >>> - src/hotspot/share/memory/filemap.cpp > >>> > >>> 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != > >>> CDS_DYNAMIC_ARCHIVE_MAGIC) { > >>> 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > >>> CDS_DYNAMIC_ARCHIVE_MAGIC; > >>> 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); > >>> 904 log_info(cds)(" actual: 0x%08x", _header->_magic); > >>> 905 FileMapInfo::fail_continue("The shared archive file has a bad > >>> magic number."); > >>> 906 return false; > >>> 907 } > >>> ... > >>> > >>> 964 if (is_static) { > >>> 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { > >>> 966 fail_continue("Incorrect static archive magic number"); > >>> 967 return false; > >>> 968 } > >>> > >>> There are two checks for _header->_magic in > >>> FileMapInfo::init_from_file now but behave differently. 
The second one > >>> can be removed. The first check at line 901 should check the _magic > >>> value based on the 'is_static' flag: > >>> > >>> unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > >>> CDS_DYNAMIC_ARCHIVE_MAGIC; > >>> if (_header->_magic != expected_magic) { > >>> ... > >> I've made the above change. > >>> -------- > >>> > >>> Most of the work now in FileMapInfo::init_from_file should really > >>> belong to FileMapInfo::validate_header. It would be cleaner to simply > >>> FileMapInfo::init_from_file to be the following and move the rest to > >>> FileMapInfo::validate_header. Thoughts? > >>> > >>> 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { > >>> 889 size_t sz = is_static ? sizeof(FileMapHeader) : > >>> sizeof(DynamicArchiveHeader); > >>> 890 size_t n = os::read(fd, _header, (unsigned int)sz); > >>> 891 if (n != sz) { > >>> 892 fail_continue("Unable to read the file header."); > >>> 893 return false; > >>> 894 } > >>> 895 return true; > >>> } > >> The _file_offset will be based on the size_t n and some other fields > >> (_paths_misc_info, SharedBaseAddress) will be set at lines 953 - 976. > >> Also, there's the following check in validate_header(): > >> > >> 1859 if > >> (!ClassLoader::check_shared_paths_misc_info(_paths_misc_info, > >> _header->_paths_misc_info_size, is_static)) { > >> > >> If the SharedPathsMiscInfo could be removed (JDK-8227370), then it is > >> possible that validate_header could be called within init_from_file. I > >> think we should defer this until JDK-8227370. > > Ok for deferring it. 
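The ordering problem behind 8226406 can be sketched as follows: validate the fields whose offsets do not depend on the build (magic, version, VM identity string) before trusting any size or offset field that comes after them. In this sketch the prefix contains only 4-byte integers and a char array, so on common ABIs it has the same layout in 32-bit and 64-bit builds. `HeaderPrefix`, `validate_prefix`, the 32-byte identity field and all constant values are hypothetical placeholders, not the real `CDSFileMapHeaderBase` or the values in `cds.h`. As Jiangli suggests, the magic check selects its expected value from `is_static`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Placeholder values; the real constants are defined in HotSpot's cds.h.
constexpr std::uint32_t kStaticMagic     = 0x5A5A0001u;  // hypothetical
constexpr std::uint32_t kDynamicMagic    = 0x5A5A0002u;  // hypothetical
constexpr std::int32_t  kExpectedVersion = 5;            // hypothetical

// Hypothetical fixed-layout prefix: only 4-byte integers and a char array,
// so field offsets do not depend on pointer size.
struct HeaderPrefix {
  std::uint32_t magic;
  std::int32_t  crc;
  std::int32_t  version;
  char          jvm_ident[32];
};
static_assert(sizeof(HeaderPrefix) == 44, "prefix must contain no padding");

enum class HeaderCheck { Ok, TooShort, BadMagic, BadVersion, BadIdent };

// Validates only the prefix. Callers should read build-dependent fields
// (sizes, offsets) only after this returns HeaderCheck::Ok.
HeaderCheck validate_prefix(const unsigned char* buf, std::size_t len,
                            bool is_static, const char* expected_ident) {
  assert(buf != nullptr && expected_ident != nullptr);
  if (len < sizeof(HeaderPrefix)) {
    return HeaderCheck::TooShort;
  }
  HeaderPrefix h;
  std::memcpy(&h, buf, sizeof(h));  // copy out to avoid misaligned reads
  const std::uint32_t expected_magic = is_static ? kStaticMagic : kDynamicMagic;
  if (h.magic != expected_magic) {
    return HeaderCheck::BadMagic;
  }
  if (h.version != kExpectedVersion) {
    return HeaderCheck::BadVersion;
  }
  if (std::strncmp(h.jvm_ident, expected_ident, sizeof(h.jvm_ident)) != 0) {
    return HeaderCheck::BadIdent;
  }
  return HeaderCheck::Ok;
}
```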
> > > > Best, > > Jiangli > > > >> updated webrev: > >> > >> http://cr.openjdk.java.net/~ccheung/8226406/webrev.01/ > >> > >> thanks, > >> > >> Calvin > >> > >>> Best regards, > >>> > >>> Jiangli > >>> > >>> > >>> On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: > >>>> Hi Calvin, > >>>> > >>>> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: > >>>>> Hi Jiangli, > >>>>> > >>>>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: > >>>>>> Hi Calvin, > >>>>>> > >>>>>> Per our off-mailing-list email exchange from the previous code review > >>>>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created > >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove > >>>>>> SharedPathsMiscInfo' > >>>>> Thanks for filing the RFE. > >>>>>> . I think the crash caused by premature runtime accessing of > >>>>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather > >>>>>> than further patching up the SharedPathsMiscInfo > >>>>> My current patch involves checking most the fields in > >>>>> CDSFileMapHeaderBase before accessing other fields. This part is > >>>>> applicable to other fields, not only to the _paths_misc_info_size. This > >>>>> bug existed for a while and I think it would be a good backport > >>>>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE > >>>>> JDK-8227370 are not necessary to be backported to 11u. I'd like to fix > >>>>> this bug first and then handle JDK-8227370 as a separate changeset. > >>>> That sounds like a good plan. A fix targeted for backporting should > >>>> have a clean-cut (less dependency) and controlled scope. Addressing > >>>> this incrementally in separate changesets is a suitable approach. > >>>> > >>>> I took a quick look over the weekend and noticed some issues with your > >>>> current patch. That's why I suggested to go with the complete removal > >>>> without spending extra effort on SharedPathsMiscInfo. I will need to > >>>> take a closer look and try to get back to you later today. 
> >>>> > >>>> Best regards, > >>>> Jiangli > >>>> > >>>>> thanks, > >>>>> > >>>>> Calvin > >>>>> > >>>>>> Thanks and regards, > >>>>>> Jiangli > >>>>>> > >>>>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: > >>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > >>>>>>> > >>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > >>>>>>> > >>>>>>> This bug was found during a bootcycle build when a shared archive built > >>>>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to > >>>>>>> some of the important header fields such as the _jvm_ident was not > >>>>>>> checked prior to accessinng other fields such as the _paths_misc_info_size. > >>>>>>> > >>>>>>> This fix involves checking most the fields in CDSFileMapHeaderBase > >>>>>>> before accessing other fields. > >>>>>>> > >>>>>>> Testing: tiers 1-3. > >>>>>>> > >>>>>>> thanks, > >>>>>>> > >>>>>>> Calvin > >>>>>>> From lois.foltan at oracle.com Wed Jul 10 20:29:13 2019 From: lois.foltan at oracle.com (Lois Foltan) Date: Wed, 10 Jul 2019 16:29:13 -0400 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name In-Reply-To: References: Message-ID: <30b8523b-4471-bc90-82a5-62058009e5a8@oracle.com> On 7/10/2019 3:30 PM, Claes Redestad wrote: > Hi, > > On 2019-07-10 20:02, Lois Foltan wrote: >> On 7/10/2019 10:56 AM, Claes Redestad wrote: >>> Hi, >>> >>> reportedly the uname syscall taken to initialize logDecorations can >>> carry a small but measurable startup cost on some systems/platforms, so >>> the not-used-by-default _host_name should be lazily initialized. >>> >>> Webrev: http://cr.openjdk.java.net/~redestad/8227527/open.00/ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8227527 >>> Testing: t1-3 >>> >>> Thanks! >>> >>> /Claes >> Hi Claes, >> >> This looks good. > > thanks, Lois! > >>
One minor comment: >> >> share/logging/logDecorations.cpp: >> - line #53 - 55: I assume that if old_value is not equal to NULL >> implies that host_name and old_value should give you the same host >> name string, correct? >> You could always add an assert in the if statement if you think >> warranted, "assert(strcmp(old_value, host_name) != 0, "comment..."); > > In most cases they'll be the same, yes, but such an assert could trigger > if the host name is being changed and 2 or more threads are racing to > init _host_name. Extremely unlikely, but if it ever happened I think we > should just use the installed value. This isn't very different from the > behavior today (host name changes are ignored), while observing a sudden > host name change during execution could be very surprising for various > log parsers. > > Does that sound reasonable? Yes, thank you! Lois > > /Claes From lois.foltan at oracle.com Wed Jul 10 20:29:38 2019 From: lois.foltan at oracle.com (Lois Foltan) Date: Wed, 10 Jul 2019 16:29:38 -0400 Subject: RFR 8226798: JVM crash in klassItable::initialize_itable_for_interface(int, InstanceKlass*, bool, Thread*) In-Reply-To: References: <680017b5-a8d9-2e30-f452-c994162c8a44@oracle.com> <8F557F98-C399-4785-A289-450B5F88BB0F@oracle.com> <26b62558-50e2-980e-31b7-8707bfba2ef1@oracle.com> Message-ID: <88bdbd3d-754d-c139-04d2-132b373f19db@oracle.com> +1. Lois On 7/10/2019 3:04 PM, Karen Kinnear wrote: > Looks good to me. > > Thank you very much, > Karen > >> On Jul 10, 2019, at 2:34 PM, Harold Seigel > wrote: >> >> Hi Karen, >> >> Thanks for looking at this. >> >> Please review this updated webrev that includes your suggestions. >> >> http://cr.openjdk.java.net/~hseigel/bug_8226798.2/webrev/index.html >> Thanks! Harold >> >> On 7/10/2019 12:22 PM, Karen Kinnear wrote: >>> Harold, >>> >>> Thank you for figuring out a fix for this. The code looks good for the fix. Thank you >>> for the assertions.
>>> >>> A suggestion on the code: >>> >>> After the initial large loop, instead of if (!found_pkg_prvt_method) { do the check } >>> Replace that with >>> >>> "If found_pkg_prvt_method is set, then the ONLY matching method in the >>> superclasses is package private in another package. That matching method will >>> prevent a miranda vtable entry from being created. Because the target method can not >>> override the package private method in another package, then it needs to be the root >>> for its own vtable entry." >>> if (found_pkg_prvt_method) { >>> return true; >>> } >>> >>> Then leave the old code and comment alone. >>> >>> >>> Suggestion on the first set of comments: >>> >>> "But, that package private method does "override" any matching methods in super interfaces, >>> so there will be no miranda vtable entry created. So, set flag to TRUE for use below, in case there are no >>> methods in super classes that this target method overrides." >>> >>> thank you so much, >>> Karen >>> >>>> On Jul 10, 2019, at 8:09 AM, Harold Seigel wrote: >>>> >>>> Hi, >>>> >>>> Please review this JDK-14 fix for 8226798. At class load time, the JVM was incorrectly calculating the size of a class's vtable in cases where a super class, in another package, contained a package private method that was also in a super interface. >>>> >>>> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8226798/webrev/index.html >>>> >>>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226798 >>>> >>>> The fix was regression tested by running Mach5 tiers 1 and 2 tests and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64.
>>>> >>>> Thanks, Harold >>>> From harold.seigel at oracle.com Wed Jul 10 20:30:55 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Wed, 10 Jul 2019 16:30:55 -0400 Subject: RFR 8226798: JVM crash in klassItable::initialize_itable_for_interface(int, InstanceKlass*, bool, Thread*) In-Reply-To: <88bdbd3d-754d-c139-04d2-132b373f19db@oracle.com> References: <680017b5-a8d9-2e30-f452-c994162c8a44@oracle.com> <8F557F98-C399-4785-A289-450B5F88BB0F@oracle.com> <26b62558-50e2-980e-31b7-8707bfba2ef1@oracle.com> <88bdbd3d-754d-c139-04d2-132b373f19db@oracle.com> Message-ID: <49143109-fed2-99d3-07ec-b823d71a7933@oracle.com> Thanks Lois! Harold On 7/10/2019 4:29 PM, Lois Foltan wrote: > +1. > Lois > > On 7/10/2019 3:04 PM, Karen Kinnear wrote: >> Looks good to me. >> >> Thank you very much, >> Karen >> >>> On Jul 10, 2019, at 2:34 PM, Harold Seigel >> > wrote: >>> >>> Hi Karen, >>> >>> Thanks for looking at this. >>> >>> Please review this updated webrev that includes your suggestions. >>> >>> http://cr.openjdk.java.net/~hseigel/bug_8226798.2/webrev/index.html >>> >>> Thanks! Harold >>> >>> On 7/10/2019 12:22 PM, Karen Kinnear wrote: >>>> Harold, >>>> >>>> Thank you for figuring out a fix for this. The code looks good for >>>> the fix. Thank you >>>> for the assertions. >>>> >>>> A suggestion on the code: >>>> >>>> After the initial large loop, instead of if >>>> (!found_pkg_prvt_method) { do the check } >>>> Replace that with >>>> >>>> "If found_pkg_prvt_method is set, then the ONLY matching method in the >>>> superclasses is package private in another package. That matching >>>> method will >>>> prevent a miranda vtable entry from being created. Because the >>>> target method can not >>>> override the package private method in another package, then it >>>> needs to be the root >>>> for its own vtable entry." >>>> if (found_pkg_prvt_method) { >>>> return true; >>>> } >>>> >>>> Then leave the old code and comment alone. >>>> >>>>
>>>> Suggestion on the first set of comments: >>>> >>>> "But, that package private method does "override" any matching >>>> methods in super interfaces, >>>> so there will be no miranda vtable entry created. So, set flag to >>>> TRUE for use below, in case there are no >>>> methods in super classes that this target method overrides." >>>> >>>> thank you so much, >>>> Karen >>>> >>>>> On Jul 10, 2019, at 8:09 AM, Harold Seigel >>>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> Please review this JDK-14 fix for 8226798. At class load time, >>>>> the JVM was incorrectly calculating the size of a class's vtable >>>>> in cases where a super class, in another package, contained a >>>>> package private method that was also in a super interface. >>>>> >>>>> Open Webrev: >>>>> http://cr.openjdk.java.net/~hseigel/bug_8226798/webrev/index.html >>>>> >>>>> >>>>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8226798 >>>>> >>>>> >>>>> The fix was regression tested by running Mach5 tiers 1 and 2 tests >>>>> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by >>>>> running Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM >>>>> tests on Linux-x64.
>>>>> >>>>> Thanks, Harold >>>>> > From jianglizhou at google.com Wed Jul 10 21:06:55 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 10 Jul 2019 14:06:55 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> Message-ID: On Wed, Jul 10, 2019 at 12:29 AM Calvin Cheung wrote: > > > On 7/9/19 5:32 PM, Jiangli Zhou wrote: > > On Mon, Jul 8, 2019 at 11:35 PM Calvin Cheung wrote: > >> > >> On 7/8/19 4:45 PM, Jiangli Zhou wrote: > >>> -#define CURRENT_CDS_ARCHIVE_VERSION 5 > >>> +#define CURRENT_CDS_ARCHIVE_VERSION 6 > >>> > >>> I would also suggestion to not do the above change in this bug fix > >>> since that would make all older versions to use '6' when backported > >>> (unless hand merge is involved). > >> Since the _jvm_ident field has been moved to a different location, I > >> think the CURRENT_CDS_ARCHIVE_VERSION should be updated. Even if the > >> version stays the same, shared archive created by an older version of > >> JVM cannot be used by the current JVM version. > > Can you please clarify the reason for moving the field? > One advantage is that there's currently a 4-byte gap between _version > and _space. Placing the _jvm_ident field after _version, the first 4 > fields will be 4-byte aligned. Anyway, I've reverted the change in my > latest webrev. > 57 struct CDSFileMapHeaderBase { > 58 unsigned int _magic; // identify file type > 59 int _crc; // header crc checksum > 60 int _version; // must be > CURRENT_CDS_ARCHIVE_VERSION > 61 struct CDSFileMapRegion _space[NUM_CDS_REGIONS]; > 62 }; > > > > It's confusing for all different JDK versions to have the same > > CURRENT_CDS_ARCHIVE_VERSION but with significantly different archive > > layouts. > > I've checked the change history of the CURRENT_CDS_ARCHIVE_VERSION. 
Last > update was for the following bug fix: > > 8208658: Make CDS archived heap regions usable even if compressed oop > encoding has changed > > Since there were 2 fields added to the header for the dynamic CDS > archive, the version should have been updated again. Should I file a bug > to update the version? Filing a bug sounds good. Best, Jiangli > > thanks, > > Calvin > > > > > Best, > > Jiangli > >> thanks, > >> > >> Calvin > >> > >>> Thanks, > >>> Jiangli > >>> > >>> On Mon, Jul 8, 2019 at 4:38 PM Jiangli Zhou wrote: > >>>> Hi Calvin, > >>>> > >>>> - src/hotspot/share/include/cds.h > >>>> > >>>> 36 #define NUM_CDS_REGIONS 8 > >>>> > >>>> The above change would need to be hand fixed when backporting to older > >>>> versions. It's fine to include it in the current review, but it's > >>>> better to create a separate bug and commit using that bug ID. So it > >>>> will make the backports cleaner. > >>>> > >>>> -------- > >>>> > >>>> 39 #define CDS_END_MAGIC 0xf00babae > >>>> > >>>> What's the significance of the new end magic? Should the existing > >>>> header validation be sufficient as long as it's done first? > >>>> > >>>> -------- > >>>> > >>>> - src/hotspot/share/memory/filemap.cpp > >>>> > >>>> 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != > >>>> CDS_DYNAMIC_ARCHIVE_MAGIC) { > >>>> 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > >>>> CDS_DYNAMIC_ARCHIVE_MAGIC; > >>>> 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); > >>>> 904 log_info(cds)(" actual: 0x%08x", _header->_magic); > >>>> 905 FileMapInfo::fail_continue("The shared archive file has a bad > >>>> magic number."); > >>>> 906 return false; > >>>> 907 } > >>>> ... 
> >>>> > >>>> 964 if (is_static) { > >>>> 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { > >>>> 966 fail_continue("Incorrect static archive magic number"); > >>>> 967 return false; > >>>> 968 } > >>>> > >>>> There are two checks for _header->_magic in > >>>> FileMapInfo::init_from_file now but behave differently. The second one > >>>> can be removed. The first check at line 901 should check the _magic > >>>> value based on the 'is_static' flag: > >>>> > >>>> unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : > >>>> CDS_DYNAMIC_ARCHIVE_MAGIC; > >>>> if (_header->_magic != expected_magic) { > >>>> ... > >>>> > >>>> -------- > >>>> > >>>> Most of the work now in FileMapInfo::init_from_file should really > >>>> belong to FileMapInfo::validate_header. It would be cleaner to simply > >>>> FileMapInfo::init_from_file to be the following and move the rest to > >>>> FileMapInfo::validate_header. Thoughts? > >>>> > >>>> 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { > >>>> 889 size_t sz = is_static ? sizeof(FileMapHeader) : > >>>> sizeof(DynamicArchiveHeader); > >>>> 890 size_t n = os::read(fd, _header, (unsigned int)sz); > >>>> 891 if (n != sz) { > >>>> 892 fail_continue("Unable to read the file header."); > >>>> 893 return false; > >>>> 894 } > >>>> 895 return true; > >>>> } > >>>> > >>>> Best regards, > >>>> > >>>> Jiangli > >>>> > >>>> > >>>> On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: > >>>>> Hi Calvin, > >>>>> > >>>>> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: > >>>>>> Hi Jiangli, > >>>>>> > >>>>>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: > >>>>>>> Hi Calvin, > >>>>>>> > >>>>>>> Per our off-mailing-list email exchange from the previous code review > >>>>>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created > >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove > >>>>>>> SharedPathsMiscInfo' > >>>>>> Thanks for filing the RFE. > >>>>>>> . 
I think the crash caused by premature runtime accessing of > >>>>>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather > >>>>>>> than further patching up the SharedPathsMiscInfo > >>>>>> My current patch involves checking most the fields in > >>>>>> CDSFileMapHeaderBase before accessing other fields. This part is > >>>>>> applicable to other fields, not only to the _paths_misc_info_size. This > >>>>>> bug existed for a while and I think it would be a good backport > >>>>>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE > >>>>>> JDK-8227370 are not necessary to be backported to 11u. I'd like to fix > >>>>>> this bug first and then handle JDK-8227370 as a separate changeset. > >>>>> That sounds like a good plan. A fix targeted for backporting should > >>>>> have a clean-cut (less dependency) and controlled scope. Addressing > >>>>> this incrementally in separate changesets is a suitable approach. > >>>>> > >>>>> I took a quick look over the weekend and noticed some issues with your > >>>>> current patch. That's why I suggested to go with the complete removal > >>>>> without spending extra effort on SharedPathsMiscInfo. I will need to > >>>>> take a closer look and try to get back to you later today. > >>>>> > >>>>> Best regards, > >>>>> Jiangli > >>>>> > >>>>>> thanks, > >>>>>> > >>>>>> Calvin > >>>>>> > >>>>>>> Thanks and regards, > >>>>>>> Jiangli > >>>>>>> > >>>>>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: > >>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 > >>>>>>>> > >>>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ > >>>>>>>> > >>>>>>>> This bug was found during a bootcycle build when a shared archive built > >>>>>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to > >>>>>>>> some of the important header fields such as the _jvm_ident was not > >>>>>>>> checked prior to accessinng other fields such as the _paths_misc_info_size. 
> >>>>>>>> > >>>>>>>> This fix involves checking most the fields in CDSFileMapHeaderBase > >>>>>>>> before accessing other fields. > >>>>>>>> > >>>>>>>> Testing: tiers 1-3. > >>>>>>>> > >>>>>>>> thanks, > >>>>>>>> > >>>>>>>> Calvin > >>>>>>>> From calvin.cheung at oracle.com Wed Jul 10 21:54:42 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 10 Jul 2019 14:54:42 -0700 Subject: RFR(S) 8226406: JVM fails to detect mismatched or corrupt CDS archive In-Reply-To: References: <1c5ab850-8b93-0b1f-3be4-8ba9f6eb469d@oracle.com> <4f7f0d69-9729-2d98-1437-83e5276b041c@oracle.com> <53b972d6-a253-b2a7-e9f5-1d533753ca0d@oracle.com> Message-ID: Thanks for taking another look. Calvin On 7/10/19 1:26 PM, Jiangli Zhou wrote: > The updates look ok. We do have cdsoffsets.cpp in JDK 11 (cds.h > doesn't exist in JDK 11), so it shouldn't cause backport issue. > > Best, > Jiangli > > On Wed, Jul 10, 2019 at 12:14 AM Calvin Cheung wrote: >> >> On 7/9/19 5:25 PM, Jiangli Zhou wrote: >>> Hi Calvin, >>> >>> On Mon, Jul 8, 2019 at 11:31 PM Calvin Cheung wrote: >>>> On 7/8/19 4:38 PM, Jiangli Zhou wrote: >>>>> Hi Calvin, >>>>> >>>>> - src/hotspot/share/include/cds.h >>>>> >>>>> 36 #define NUM_CDS_REGIONS 8 >>>>> >>>>> The above change would need to be hand fixed when backporting to older >>>>> versions. It's fine to include it in the current review, but it's >>>>> better to create a separate bug and commit using that bug ID. So it >>>>> will make the backports cleaner. >>>> I don't think it is worthwhile filing a bug just for this line. >>>> >>>> I've added a comment as follows: >>>> >>>> 36 #define NUM_CDS_REGIONS 8 // this must be the same as >>>> MetaspaceShared::n_regions >>> The issue is that one needs to manually change or revert the above >>> when backporting the change to older JDK versions, so the backport is >>> not clean and introduces risks. Committing it under a separate bug id >>> will avoid that issue. 
Or, you could combine the above it with some >>> other change later. In general, it's a good practice to avoid >>> combining unrelated changes in one changeset (with single bug id). >> I've reverted the changes in cds.h and filemap.hpp and filed the >> following bug for the NUM_CDS_REGIONS adjustment: >> >> https://bugs.openjdk.java.net/browse/JDK-8227496 >> >> Updated webrev: >> >> http://cr.openjdk.java.net/~ccheung/8226406/webrev.02/ >> >> thanks, >> >> Calvin >> >>>>> -------- >>>>> >>>>> 39 #define CDS_END_MAGIC 0xf00babae >>>>> >>>>> What's the significance of the new end magic? Should the existing >>>>> header validation be sufficient as long as it's done first? >>>> It seems unnecessary now. I got rid of it. >>>>> -------- >>>>> >>>>> - src/hotspot/share/memory/filemap.cpp >>>>> >>>>> 901 if (_header->_magic != CDS_ARCHIVE_MAGIC && _header->_magic != >>>>> CDS_DYNAMIC_ARCHIVE_MAGIC) { >>>>> 902 unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : >>>>> CDS_DYNAMIC_ARCHIVE_MAGIC; >>>>> 903 log_info(cds)("_magic expected: 0x%08x", expected_magic); >>>>> 904 log_info(cds)(" actual: 0x%08x", _header->_magic); >>>>> 905 FileMapInfo::fail_continue("The shared archive file has a bad >>>>> magic number."); >>>>> 906 return false; >>>>> 907 } >>>>> ... >>>>> >>>>> 964 if (is_static) { >>>>> 965 if (_header->_magic != CDS_ARCHIVE_MAGIC) { >>>>> 966 fail_continue("Incorrect static archive magic number"); >>>>> 967 return false; >>>>> 968 } >>>>> >>>>> There are two checks for _header->_magic in >>>>> FileMapInfo::init_from_file now but behave differently. The second one >>>>> can be removed. The first check at line 901 should check the _magic >>>>> value based on the 'is_static' flag: >>>>> >>>>> unsigned int expected_magic = is_static ? CDS_ARCHIVE_MAGIC : >>>>> CDS_DYNAMIC_ARCHIVE_MAGIC; >>>>> if (_header->_magic != expected_magic) { >>>>> ... >>>> I've made the above change. 
>>>>> -------- >>>>> >>>>> Most of the work now in FileMapInfo::init_from_file should really >>>>> belong to FileMapInfo::validate_header. It would be cleaner to simply >>>>> FileMapInfo::init_from_file to be the following and move the rest to >>>>> FileMapInfo::validate_header. Thoughts? >>>>> >>>>> 888 bool FileMapInfo::init_from_file(int fd, bool is_static) { >>>>> 889 size_t sz = is_static ? sizeof(FileMapHeader) : >>>>> sizeof(DynamicArchiveHeader); >>>>> 890 size_t n = os::read(fd, _header, (unsigned int)sz); >>>>> 891 if (n != sz) { >>>>> 892 fail_continue("Unable to read the file header."); >>>>> 893 return false; >>>>> 894 } >>>>> 895 return true; >>>>> } >>>> The _file_offset will be based on the size_t n and some other fields >>>> (_paths_misc_info, SharedBaseAddress) will be set at lines 953 - 976. >>>> Also, there's the following check in validate_header(): >>>> >>>> 1859 if >>>> (!ClassLoader::check_shared_paths_misc_info(_paths_misc_info, >>>> _header->_paths_misc_info_size, is_static)) { >>>> >>>> If the SharedPathsMiscInfo could be removed (JDK-8227370), then it is >>>> possible that validate_header could be called within init_from_file. I >>>> think we should defer this until JDK-8227370. >>> Ok for deferring it. >>> >>> Best, >>> Jiangli >>> >>>> updated webrev: >>>> >>>> http://cr.openjdk.java.net/~ccheung/8226406/webrev.01/ >>>> >>>> thanks, >>>> >>>> Calvin >>>> >>>>> Best regards, >>>>> >>>>> Jiangli >>>>> >>>>> >>>>> On Mon, Jul 8, 2019 at 10:25 AM Jiangli Zhou wrote: >>>>>> Hi Calvin, >>>>>> >>>>>> On Mon, Jul 8, 2019 at 10:00 AM Calvin Cheung wrote: >>>>>>> Hi Jiangli, >>>>>>> >>>>>>> On 7/7/19 5:12 PM, Jiangli Zhou wrote: >>>>>>>> Hi Calvin, >>>>>>>> >>>>>>>> Per our off-mailing-list email exchange from the previous code review >>>>>>>> for https://bugs.openjdk.java.net/browse/JDK-8211723, I created >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227370, 'Remove >>>>>>>> SharedPathsMiscInfo' >>>>>>> Thanks for filing the RFE. 
>>>>>>>> . I think the crash caused by premature runtime access to >>>>>>>> _paths_misc_info_size should be handled as part of JDK-8227370, rather >>>>>>>> than further patching up the SharedPathsMiscInfo >>>>>>> My current patch involves checking most of the fields in >>>>>>> CDSFileMapHeaderBase before accessing other fields. This part is >>>>>>> applicable to other fields, not only to the _paths_misc_info_size. This >>>>>>> bug has existed for a while and I think it would be a good backport >>>>>>> candidate for 11u. The patch for JDK-8211723 and the follow-up RFE >>>>>>> JDK-8227370 do not need to be backported to 11u. I'd like to fix >>>>>>> this bug first and then handle JDK-8227370 as a separate changeset. >>>>>> That sounds like a good plan. A fix targeted for backporting should >>>>>> have a clean-cut (fewer dependencies) and controlled scope. Addressing >>>>>> this incrementally in separate changesets is a suitable approach. >>>>>> >>>>>> I took a quick look over the weekend and noticed some issues with your >>>>>> current patch. That's why I suggested going with the complete removal >>>>>> without spending extra effort on SharedPathsMiscInfo. I will need to >>>>>> take a closer look and try to get back to you later today. >>>>>> >>>>>> Best regards, >>>>>> Jiangli >>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Calvin >>>>>>> >>>>>>>> Thanks and regards, >>>>>>>> Jiangli >>>>>>>> >>>>>>>> On Wed, Jul 3, 2019 at 5:59 PM Calvin Cheung wrote: >>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8226406 >>>>>>>>> >>>>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8226406/webrev.00/ >>>>>>>>> >>>>>>>>> This bug was found during a bootcycle build when a shared archive built >>>>>>>>> by a 64-bit JDK version is used by a 32-bit JDK version. It is due to >>>>>>>>> some of the important header fields, such as the _jvm_ident, not being >>>>>>>>> checked prior to accessing other fields such as the _paths_misc_info_size.
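The underlying idea is to check the identifying fields before trusting any size field that follows them. The sketch below shows one way to do that. It assumes a simplified header whose leading fields are fixed-width, so that a VM of either word size reads them at the same offsets. HeaderPrefix, HeaderFull, and checked_misc_info_size are hypothetical names and do not match the real CDS structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified header layout (not the real CDSFileMapHeaderBase).
// The leading fields are fixed-width so both 32- and 64-bit readers agree on them.
struct HeaderPrefix {
  uint32_t magic;
  char jvm_ident[32];
};

struct HeaderFull {
  HeaderPrefix prefix;
  uint32_t paths_misc_info_size;
};

// Validate the identifying fields first; only then is the size field used.
// Returns -1 to reject the archive instead of acting on a garbage size.
long checked_misc_info_size(const unsigned char* buf, std::size_t len,
                            uint32_t expected_magic, const char* expected_ident) {
  if (len < sizeof(HeaderFull)) return -1;          // too short to hold a header
  HeaderFull h;
  std::memcpy(&h, buf, sizeof(h));                   // copy out to avoid unaligned reads
  if (h.prefix.magic != expected_magic) return -1;   // wrong kind of archive
  if (std::strncmp(h.prefix.jvm_ident, expected_ident,
                   sizeof(h.prefix.jvm_ident)) != 0) {
    return -1;                                       // written by a different VM build
  }
  if (h.paths_misc_info_size > len - sizeof(HeaderFull)) return -1;  // bound the size
  return static_cast<long>(h.paths_misc_info_size);
}
```

The ordering is the point of the sketch. The size field is never read for a decision until magic and ident have matched, and even then it is bounded by the actual buffer length.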
>>>>>>>>> >>>>>>>>> This fix involves checking most of the fields in CDSFileMapHeaderBase >>>>>>>>> before accessing other fields. >>>>>>>>> >>>>>>>>> Testing: tiers 1-3. >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> Calvin >>>>>>>>> From claes.redestad at oracle.com Thu Jul 11 12:46:19 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 11 Jul 2019 14:46:19 +0200 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name In-Reply-To: References: Message-ID: Hi Thomas, On 2019-07-10 22:15, Thomas Stüfe wrote: > Hi Claes, > > This looks all very good. thanks, > > My only remark is not directed at your patch, but I would prefer > diagnostic code like logging not to crash at a native OOM (I refer to > the strdup), but rather handle it gracefully, e.g. by just printing > "???" as hostname. Since this os::strdup_check_oom is used throughout the UL code, would you mind opening an RFE to have that evaluated more systematically? Or do you insist on a point fix here and now? Although highly unlikely, one could argue my patch here might defer the strdup to a point in time where the first-ever logging attempt is in response to a native OOM situation, which might make a crash ever so slightly more likely. While far-fetched, it is an argument for pre-emptively addressing this here and now. Thanks! /Claes From thomas.stuefe at gmail.com Thu Jul 11 13:25:17 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 11 Jul 2019 15:25:17 +0200 Subject: RFR [S]: 8227527: LogDecorations should lazily resolve host name In-Reply-To: References: Message-ID: Hi Claes, On Thu, Jul 11, 2019, 14:45 Claes Redestad wrote: > Hi Thomas, > > On 2019-07-10 22:15, Thomas Stüfe wrote: > > Hi Claes, > > > > This looks all very good.
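Here is a sketch of the graceful fallback Thomas describes: resolve the host name lazily, and print "???" instead of crashing if the copy cannot be allocated. LazyHostName and its Resolver callback are hypothetical. The real code uses os::strdup_check_oom; this sketch deliberately replaces it with a plain malloc that is allowed to fail. It is single-threaded, and synchronizing the first call is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch, not the LogDecorations code.
class LazyHostName {
 public:
  // Fills buf (for example by wrapping gethostname()); returns false on failure.
  using Resolver = bool (*)(char* buf, std::size_t len);

  explicit LazyHostName(Resolver resolver) : resolver_(resolver) {}
  ~LazyHostName() { std::free(copy_); }
  LazyHostName(const LazyHostName&) = delete;
  LazyHostName& operator=(const LazyHostName&) = delete;

  // Resolves on first use only; later calls return the cached result.
  const char* get() {
    if (!resolved_) {
      resolved_ = true;
      resolve();
    }
    return copy_ != nullptr ? copy_ : "???";
  }

 private:
  void resolve() {
    char buf[256];
    if (!resolver_(buf, sizeof(buf))) return;         // lookup failed: keep the fallback
    buf[sizeof(buf) - 1] = '\0';
    const std::size_t n = std::strlen(buf) + 1;
    char* copy = static_cast<char*>(std::malloc(n));  // may be nullptr under native OOM
    if (copy == nullptr) return;                      // degrade instead of aborting
    std::memcpy(copy, buf, n);
    copy_ = copy;
  }

  Resolver resolver_;
  bool resolved_ = false;
  char* copy_ = nullptr;
};
```

The trade-off Claes raises still applies. Deferring the allocation means it may first happen under memory pressure, and the "???" fallback is what keeps that case from becoming a crash.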
> > thanks, > > > > > My only remark is not directed at your patch, but I would prefer > > diagnostic code like logging not to crash at a native OOM (I refer to > > the strdup), but rather handle it gracefully, e.g. by just printing > > "???" as hostname. > > since this os::strdup_check_oom is used throughout the UL code, would > you mind opening an RFE to have that evaluated more systematically? Or > do you insist on a point fix here and now? > > Although highly unlikely, one could argue my patch here might defer the > strdup to a point in time where the first ever logging attempted is in > response to a native OOM situation, which might be making a crash ever > so slightly more possible. While far-fetched, it is an argument for > pre-emptively addressing this here an now. > > Thanks! > > /Claes > The patch is fine as it is. I will open a separate rfe for the oom issue. Thanks, Thomas > From martin.doerr at sap.com Thu Jul 11 15:31:50 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 11 Jul 2019 15:31:50 +0000 Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should get inlined Message-ID: Hi, the simple function Arena::inc_bytes_allocated can be found as CPU consuming when profiling the fastdbg build. It is located in a cpp file. It should better get inlined to improve the performance of the fastdbg VM which is important for complex tests. In addition, atomic 8-Byte load and store functions are available on all platforms, so the "#if defined ..." can get removed. Here's my proposal: http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated/webrev.00/ Feedback is welcome. Best regards, Martin From daniel.daugherty at oracle.com Thu Jul 11 19:49:11 2019 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Thu, 11 Jul 2019 15:49:11 -0400 Subject: RFR(L) 8153224 Monitor deflation prolong safepoints (CR5/v2.05/8-for-jdk13) In-Reply-To: References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com> <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com> <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com> Message-ID: <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com> Greetings, I've been focused on chasing down and fixing the test failures that only pop up rarely. So this round consists primarily of fixes for races, with a few additional fixes that came from Karen's review of CR4. Thanks Karen! I have attached the list of fixes from CR4 to CR5 instead of putting it in the main body of this email. Main bug URL: JDK-8153224 Monitor deflation prolong safepoints https://bugs.openjdk.java.net/browse/JDK-8153224 The project is currently baselined on jdk-13+29. This will likely be the last JDK13 baseline for this project and I'll roll to the JDK14 (jdk/jdk) repo soon... Here's the full webrev URL: http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/ Here's the incremental webrev URL: http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/ I have not yet checked the OpenJDK wiki to see if it needs any updates to match the CR5 changes: https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26) This version of the patch has been thru Mach5 tier[1-3] testing on Oracle's usual set of platforms. Mach5 tier[4-6] is running now and Mach5 tier[78] will follow. I'll kick off the usual stress testing on Linux-X64, macOSX and Solaris-X64 as those machines become available. Since I haven't made any performance changes in this round, I'll only be running SPECjbb2015 to gather the latest monitorinflation logs. Next up: - We're still seeing 4-5% lower performance with SPECjbb2015 on Linux-X64 and we've determined that some of that comes from contention on the gListLock.
So I'm going to investigate removing the gListLock. Yes, another lock-free set of changes is coming! - Of course, going lock-free often causes new races and new failures, so that's a good reason to isolate those changes in their own round (and not hold up CR5/v2.05/8-for-jdk13 anymore). - I finally have a potential fix for the Win* failure with gc/g1/humongousObjects/TestHumongousClassLoader.java, but I haven't run it through Mach5 yet, so it'll be in the next round. - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some monitor-related failures there. I suspect that I need to go take a look at the C2 RTM macro assembler code and look for things that might conflict with Async Monitor Deflation. If you're interested in that kind of issue, then see the macroAssembler_x86.cpp sanity check that I added in this round! Thanks, in advance, for any questions, comments or suggestions. Dan On 5/26/19 8:30 PM, Daniel D. Daugherty wrote: > Greetings, > > I have a fix for an issue that came up during performance testing. > Many thanks to Robbin for diagnosing the issue in his SPECjbb2015 > experiments. > > Here's the list of changes from CR3 to CR4. The list is a bit > verbose due to the complexity of the issue, but the changes > themselves are not that big. > > Functional: > - Change SafepointSynchronize::is_cleanup_needed() from calling > ObjectSynchronizer::is_cleanup_needed() to calling > ObjectSynchronizer::is_safepoint_deflation_needed(): > - is_safepoint_deflation_needed() returns the result of > monitors_used_above_threshold() for safepoint-based > monitor deflation (!AsyncDeflateIdleMonitors). > - For AsyncDeflateIdleMonitors, it only returns true if > there is a special deflation request, e.g., System.gc() > - This solves a bug where there are a bunch of Cleanup > safepoints that simply request async deflation which >
keeps the async JavaThreads from making progress on > their async deflation work. > - Add AsyncDeflationInterval diagnostic option. Description: > Async deflate idle monitors every so many milliseconds when > MonitorUsedDeflationThreshold is exceeded (0 is off). > - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with > ObjectSynchronizer::is_async_deflation_needed(): > - is_async_deflation_needed() returns true when > is_async_cleanup_requested() is true or when > monitors_used_above_threshold() is true (but no more often than > AsyncDeflationInterval). > - if AsyncDeflateIdleMonitors, Service_lock->wait() now waits for > at most GuaranteedSafepointInterval millis: > - This allows is_async_deflation_needed() to be checked at > the same interval as GuaranteedSafepointInterval. > (default is 1000 millis/1 second) > - Once is_async_deflation_needed() has returned true, it > generally cannot return true again for AsyncDeflationInterval. > This is to prevent async deflation from swamping the > ServiceThread. > - The ServiceThread still handles async deflation of the global > in-use list and now it also marks JavaThreads for async deflation > of their in-use lists. > - The ServiceThread will check for async deflation work every > GuaranteedSafepointInterval. > - A safepoint can still cause the ServiceThread to check for > async deflation work via is_async_deflation_requested. > - Refactor code from ObjectSynchronizer::is_cleanup_needed() into > monitors_used_above_threshold() and remove is_cleanup_needed(). > - In addition to System.gc(), the VM_Exit VM op and the final > VMThread safepoint now set the is_special_deflation_requested > flag to reduce the in-use monitor population that is reported by > ObjectSynchronizer::log_in_use_monitor_details() at VM exit. > > Test update: >
- test/hotspot/gtest/oops/test_markOop.cpp is updated to work with > AsyncDeflateIdleMonitors. > > Collateral: > - Add/clarify/update some logging messages. > > Cleanup: > - Updated comments based on Karen's code review. > - Change 'special cleanup' -> 'special deflation' and > 'async cleanup' -> 'async deflation'. > - comment and function name changes > - Clarify MonitorUsedDeflationThreshold description. > > > Main bug URL: > > JDK-8153224 Monitor deflation prolong safepoints > https://bugs.openjdk.java.net/browse/JDK-8153224 > > The project is currently baselined on jdk-13+22. > > Here's the full webrev URL: > > http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/ > > Here's the incremental webrev URL: > > http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/ > > I have not updated the OpenJDK wiki to reflect the CR4 changes: > > https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation > > The wiki doesn't say a whole lot about the async deflation invocation > mechanism so I have to figure out how to add that content. > > This version of the patch has been thru Mach5 tier[1-8] testing on > Oracle's usual set of platforms. My Solaris-X64 stress kit run is > running now. Kitchensink8H on product, fastdebug, and slowdebug bits > are running on Linux-X64, MacOSX and Solaris-X64. I still have to run > my stress kit on Linux-X64. I still have to run the SPECjbb2015 > baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64. > > Thanks, in advance, for any questions, comments or suggestions. > > Dan > > On 5/6/19 11:52 AM, Daniel D. Daugherty wrote: >> Greetings, >> >> I had some discussions with Karen about a race that was in the >> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was >> theoretical and I had no test failures due to it. The fix is pretty
The fix is pretty >> simple: remove the special case code for async deflation in the >> ObjectMonitor::enter() function and rely solely on the ref_count >> for ObjectMonitor::enter() protection. >> >> During those discussions Karen also floated the idea of using the >> ref_count field instead of the contentions field for the Async >> Monitor Deflation protocol. I decided to go ahead and code up that >> change and I have run it through the usual stress and Mach5 testing >> with no issues. It's also known as v2.03 (for those for with the >> patches) and as webrev/6-for-jdk13 (for those with webrev URLs). >> Sorry for all the names... >> >> Main bug URL: >> >> ??? JDK-8153224 Monitor deflation prolong safepoints >> ??? https://bugs.openjdk.java.net/browse/JDK-8153224 >> >> The project is currently baselined on jdk-13+18. >> >> Here's the full webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/ >> >> Here's the incremental webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/ >> >> I have also updated the OpenJDK wiki to reflect the CR3 changes: >> >> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation >> >> This version of the patch has been thru Mach5 tier[1-8] testing on >> Oracle's usual set of platforms. My Solaris-X64 stress kit run had >> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits >> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and >> Solaris-X64 release had the usual "Too large time diff" complaints. >> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on >> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64 >> stress kit is running right now. >> >> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather >> the results and analyze them. >> >> Thanks, in advance, for any questions, comments or suggestions. >> >> Dan >> >> >> On 4/25/19 12:38 PM, Daniel D. 
Daugherty wrote: >>> Greetings, >>> >>> I have a small but important bug fix for the Async Monitor Deflation >>> project ready to go. It's also known as v2.02 (for those with the >>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry >>> for all the names... >>> >>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch >>> is out of our hair. >>> >>> Main bug URL: >>> >>> JDK-8153224 Monitor deflation prolong safepoints >>> https://bugs.openjdk.java.net/browse/JDK-8153224 >>> >>> The project is currently baselined on jdk-13+17. >>> >>> Here's the full webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/ >>> >>> Here's the incremental webrev URL (JDK-8153224): >>> >>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/ >>> >>> I still have to update the OpenJDK wiki to reflect the CR2 changes: >>> >>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation >>> >>> This version of the patch has been thru Mach5 tier[1-6] testing on >>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now. >>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running >>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX >>> and Solaris-X64. 12-hour Inflate2 runs are running now on product, >>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64. >>> I'll start my stress kit on Linux-X64 sometime on Sunday (after >>> my jdk-13+18 stress run is done). >>> >>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress >>> testing is done. >>> >>> Thanks, in advance, for any questions, comments or suggestions. >>> >>> Dan >>> >>> >>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> I finally have CR1 for the Async Monitor Deflation project ready to >>>> go. It's also known as v2.01 (for those with the patches) and as >>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
Sorry for all the >>>> names... >>>> >>>> Main bug URL: >>>> >>>> ??? JDK-8153224 Monitor deflation prolong safepoints >>>> ??? https://bugs.openjdk.java.net/browse/JDK-8153224 >>>> >>>> Baseline bug fixes URL: >>>> >>>> ??? JDK-8222295 more baseline cleanups from Async Monitor Deflation >>>> project >>>> ??? https://bugs.openjdk.java.net/browse/JDK-8222295 >>>> >>>> The project is currently baselined on jdk-13+15. >>>> >>>> Here's the webrev for the latest baseline changes (JDK-8222295): >>>> >>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 >>>> >>>> Here's the full webrev URL (JDK-8153224 only): >>>> >>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ >>>> >>>> Here's the incremental webrev URL (JDK-8153224): >>>> >>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/ >>>> >>>> So I'm looking for reviews for both JDK-8222295 and the latest version >>>> of JDK-8153224... >>>> >>>> I still have to update the OpenJDK wiki to reflect the CR changes: >>>> >>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation >>>> >>>> This version of the patch has been thru Mach5 tier[1-3] testing on >>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and >>>> Mach5 tier[78] will be run later today. My stress kit on Solaris-X64 >>>> is running now. Linux-X64 stress testing will start on Sunday. I'm >>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor >>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64. >>>> >>>> Thanks, in advance, for any questions, comments or suggestions. >>>> >>>> Dan >>>> >>>> >>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote: >>>>> Greetings, >>>>> >>>>> Welcome to the OpenJDK review thread for my port of Carsten's work >>>>> on: >>>>> >>>>> ??? JDK-8153224 Monitor deflation prolong safepoints >>>>> ??? 
https://bugs.openjdk.java.net/browse/JDK-8153224 >>>>> >>>>> Here's a link to the OpenJDK wiki that describes my port: >>>>> >>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation >>>>> >>>>> Here's the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/ >>>>> >>>>> Here's a link to Carsten's original webrev: >>>>> >>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/ >>>>> >>>>> Earlier versions of this patch have been through several rounds of >>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and >>>>> Roman for their preliminary code review comments. A very special >>>>> thanks to Robbin and Roman for building and testing the patch in >>>>> their own environments (including SPECjbb2015). >>>>> >>>>> This version of the patch has been thru Mach5 tier[1-8] testing on >>>>> Oracle's usual set of platforms. Earlier versions have been run >>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers >>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink >>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug >>>>> and slowdebug). Earlier versions have run my monitor inflation stress >>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, >>>>> fastdebug and slowdebug). >>>>> >>>>> All of the testing done on earlier versions will be redone on the >>>>> latest version of the patch. >>>>> >>>>> Thanks, in advance, for any questions, comments or suggestions. >>>>> >>>>> Dan >>>>> >>>>> P.S. >>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java >>>>> is currently failing in -Xcomp mode on Win* only. I've been trying >>>>> to characterize/analyze this failure for more than a week now. At >>>>> this point I'm convinced that Async Monitor Deflation is aggravating >>>>> an existing bug. However, I plan to have a better handle on that >>>>> failure before these bits are pushed to the jdk/jdk repo.
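The ref_count-based handshake that these rounds keep refining can be modeled in a few dozen lines. In the model below, the deflater installs DEFLATER_MARKER as the owner, then tries to drive ref_count negative, then re-checks the owner. An entering thread that holds a reference can take over a monitor that only went through the first part. This is a heavily simplified model, not HotSpot code. Monitor, try_deflate, try_enter, and kDeflatedRefCount are hypothetical names. The three-step structure is modeled after the protocol's description in "parts" rather than copied from the patch. Waiters, object headers, and list management are omitted, and std::atomic stands in for HotSpot's Atomic class.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical model; a false return from either function just means "retry".
static int marker_storage;                       // only its address is used
void* const DEFLATER_MARKER = &marker_storage;   // sentinel owner value
constexpr int kDeflatedRefCount = -INT_MAX;      // analogous to -max_jint

struct Monitor {
  std::atomic<void*> owner{nullptr};
  std::atomic<int> ref_count{0};
};

// Deflater side. Returns true if the monitor was deflated.
bool try_deflate(Monitor& m) {
  void* expected = nullptr;
  // Part 1: claim an unowned monitor by installing the marker.
  if (!m.owner.compare_exchange_strong(expected, DEFLATER_MARKER)) return false;
  // Part 2: make ref_count negative, but only if nobody holds a reference.
  int zero = 0;
  if (!m.ref_count.compare_exchange_strong(zero, kDeflatedRefCount)) {
    void* marker = DEFLATER_MARKER;              // back out only if still our marker
    m.owner.compare_exchange_strong(marker, nullptr);
    return false;
  }
  // Part 3: an entering thread may have replaced the marker before part 2.
  if (m.owner.load() != DEFLATER_MARKER) {
    m.ref_count.fetch_add(INT_MAX);              // undo part 2: we lost the race
    return false;
  }
  return true;
}

// Enter side. Returns true if `self` now owns the monitor.
bool try_enter(Monitor& m, void* self) {
  if (m.ref_count.fetch_add(1) < 0) {            // already deflated
    m.ref_count.fetch_sub(1);
    return false;                                // caller must use a new monitor
  }
  void* expected = nullptr;
  bool owned = m.owner.compare_exchange_strong(expected, self);
  if (!owned && expected == DEFLATER_MARKER) {
    // The deflater only got through part 1; take over from the marker.
    owned = m.owner.compare_exchange_strong(expected, self);
  }
  m.ref_count.fetch_sub(1);
  return owned;
}
```

The re-check in part 3 closes a specific window. Suppose an enterer replaces the marker and drops its reference before the deflater's part 2 succeeds. The deflater must then notice that the owner changed and back out, rather than deflate an owned monitor.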
>>>>> >>>> >>>> >>> >>> >> >> > > -------------- next part -------------- Functional: - Add sanity check to MacroAssembler::fast_lock() to verify that the ObjectMonitor's object field still refers to the object that we have just locked with the C2 fast enter optimization. - Replace raw references to _ref_count with calls to ref_count() to see the proper version of the value. Fixed rare racy assert() and guarantee() failures. - Add ObjectMonitor::owner_is_DEFLATER_MARKER() accessor to do a proper load_acquire of the owner field. - Update ObjectMonitorHandle::save_om_ptr() and ObjectSynchronizer::deflate_monitor_using_JT() to use owner_is_DEFLATER_MARKER(). Fixed racy guarantee() failures. - Allow the owner == DEFLATER_MARKER value to linger until a deflated ObjectMonitor is reused for an enter operation. This prevents the C2 ObjectMonitor enter optimization from racing with async deflation. Fixed rare "Non-balanced monitor enter/exit!" assertion failures. - Add/redo fast path for acquiring an ObjectMonitor that only went thru the first part of the Async Deflation protocol to: ObjectMonitor::enter(add), ObjectMonitor::EnterI(redo * 2), ObjectMonitor::ReenterI(redo), ObjectSynchronizer::quick_enter(add). Yes, this also handles the lingering owner == DEFLATER_MARKER value. - ObjectSynchronizer::deflate_monitor() needs to check ref_count() in order not to deflate ObjectMonitor* that are in use across a safepoint. Fixed rare racy assert() and guarantee() failures. - ObjectSynchronizer::deflate_monitor() can clear a lingering owner == DEFLATER_MARKER value because the C2 ObjectMonitor enter optimization race cannot happen over a safepoint. Test update: - None this round. Collateral: - None this round. Cleanup: - Add comments requested by Karen in her CR4 code review. - VM_Exit::doit_prologue() and the regular VM exit path should only make a special deflation request if we have logging enabled. Karen, thanks for catching this!
- Merge ObjectMonitor::is_busy() and ObjectMonitor::is_busy_async() into just is_busy(). - Update is_busy_to_string() to do selective output of the owner field: the DEFLATER_MARKER value is not shown as a busy value. - Refactor the MonitorBound option check into is_MonitorBound_exceeded() and add support to ObjectSynchronizer::is_async_deflation_needed(); the existing MonitorBound code in omAlloc(), including InduceScavenge(), now uses is_MonitorBound_exceeded() and is not used when AsyncDeflateIdleMonitors == true. - ObjectMonitor::dec_ref_count() and ObjectMonitor::inc_ref_count() should use ADIM_guarantee() instead of guarantee(). - Add and clarify various comments. From mikhailo.seledtsov at oracle.com Fri Jul 12 00:58:18 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Thu, 11 Jul 2019 17:58:18 -0700 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> Message-ID: <47b20d2a-02b9-33b8-b2e4-6022f809e139@oracle.com> Hi Severin, Thank you for taking a look at this change. On 7/10/19 10:40 AM, Severin Gehwolf wrote: > Hi Misha, > > On Tue, 2019-07-02 at 15:24 -0700, mikhailo.seledtsov at oracle.com wrote: >> Please review this new test that uses a Docker sidecar pattern to >> manage/monitor a JVM running in the main payload container. >> >> Sidecar is a common pattern used in cloud environments for >> monitoring, among other uses. In the sidecar pattern, the main >> application/service container that runs the payload is paired with a >> sidecar container. This is achieved by sharing certain namespace >> aspects >> between the two containers, such as the PID namespace, specific >> sub-directories, IPC and more.
>> >> This test implements the following cases: >> - "jcmd -l" to list Java processes running in the "main" container >> from >> the "sidecar" container >> - "jhsdb jinfo" in the sidecar configuration >> - jcmd >> >> This change also builds a basis for more test cases in the future. >> >> Minor changes were made to DockerTestUtils: >> - changing access to the DOCKER_COMMAND constant to public >> - minor spelling and terminology corrections >> >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 >> Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ >> Testing: >> 1. ran Docker tests on Linux-x64 - PASS >> 2. Running Docker tests in test cluster - in progress >> > > // JCMD does not work in sidecar configuration, except for "jcmd -l". > // Including this test case to assist in reproduction of the problem. > // t.assertIsAlive(); > // testCase03(mainProcPid); > > FWIW, "jcmd -l" doesn't work in this case either. It only sees itself > as far as I can tell. In my experiment it does work. Here are parts of the test log: first the command that runs jcmd in a sidecar container, then the output of that container: """ [COMMAND] /usr/local/bin/docker run --tty=true --rm --cap-add=SYS_PTRACE --sig-proxy=true --pid=container:test-container-main --ipc=container:test-container-main -v /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/ jdk-internal:test-jfr-jcmd /jdk/bin/jcmd -l [2019-07-12T00:26:29.083764Z] Gathering output for process 8703 [ELAPSED: 5 ms] [STDERR] [STDOUT] 1 EventGeneratorLoop 15 23 jdk.jcmd/sun.tools.jcmd.JCmd -l """ The output shows two processes; one is EventGeneratorLoop with a PID of 1 (as expected). This is possible because the containers share certain namespaces and mounted volumes in a 'sidecar' configuration.
In this case, the containers share the PID namespace (--pid=container:test-container-main) and share volumes mounted as "/tmp" inside the container (-v /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/) > It should see the JVM of the host container too. > That issue can be fixed by creating a shared /tmp filesystem and > mounting it into both containers. The test does that. > > What's more, this seems to be a case of AttachListener::is_init_trigger[1] and > VirtualMachineImpl.createAttachFile[2] disagreeing. The former looks in > $(pwd)/.attach_pid or /tmp/.attach_pid and the latter creates > it in /proc//root/tmp/.attach_pid. > > There seem to be more issues involved. Attaching to a JVM inside a > container from outside doesn't seem to work either, which is supposed to be > fixed with JDK-8179498. That alone seems to warrant a bug. You are describing a slightly different use case / pattern, but I agree it does not seem to work. I am happy to hear confirmation of that. The pattern addressed in this test is a sidecar, where both the observer and observee run in containers; the containers are 'friendly' by sharing certain aspects of namespaces. The use case you are describing is somewhat different, if I understand correctly: the observer runs on a host machine, and the observee runs in a container. The observer tries to use jcmd to list the Java processes running in container(s), and issue commands, but that fails. I can create a bug for that, and a simple test case. > > private static DockerThread startMainContainer() throws Exception { > // start "main" container (the observee) > DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); > opts.addDockerOpts("--cap-add=SYS_PTRACE", "--ipc=shareable") > > Is '--ipc=shareable' really needed? It's not a supported option for my > docker here :-( I have removed the '--ipc=shareable' and the test still works. I think this is extra stuff that is not necessary for this test case, so I will remove it.
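The sidecar command line from the test log above can be assembled programmatically, which makes the shared-namespace flags explicit. sidecar_run_args is a hypothetical helper; the actual test builds its options with DockerRunOptions in Java. The flags are exactly those in the logged command. The comments give a plausible reason for each, and assume the main container mounts the same host directory as its /tmp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper mirroring the logged `docker run` command.
std::vector<std::string> sidecar_run_args(const std::string& main_container,
                                          const std::string& host_scratch_dir,
                                          const std::string& image,
                                          const std::vector<std::string>& command) {
  std::vector<std::string> args = {
      "run", "--tty=true", "--rm",
      "--cap-add=SYS_PTRACE",               // presumably for ptrace-based tools such as jhsdb
      "--sig-proxy=true",
      "--pid=container:" + main_container,  // join the main container's PID namespace
      "--ipc=container:" + main_container,  // join its IPC namespace
      "-v", host_scratch_dir + ":/tmp/",    // same host dir as /tmp in both containers (assumed)
      image};
  args.insert(args.end(), command.begin(), command.end());
  return args;
}
```

Calling it with "test-container-main", an image name, and {"/jdk/bin/jcmd", "-l"} reproduces the argument order of the logged command, minus the docker binary path.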
I will incorporate changes from your and Bob's reviews, run some testing, and post an updated webrev. Thank you, Misha > > Thanks, > Severin > > [1] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/hotspot/os/linux/attachListener_linux.cpp#l500 > [2] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#l295 > From jianglizhou at google.com Fri Jul 12 03:04:04 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Thu, 11 Jul 2019 20:04:04 -0700 Subject: RFR (XS): 8227582: runtime/TLS/testtls.sh fails on x86_32 Message-ID: Please review the following test change to disable the negative test case for the glibc TLS issue. As there are potentially different failure modes, the negative test case is not completely suitable for regular testing. For example, a hang could potentially cause a timeout failure. webrev: http://cr.openjdk.java.net/~jiangli/8227582/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8227582 Best Regards, Jiangli From matthias.baesken at sap.com Fri Jul 12 05:54:39 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Fri, 12 Jul 2019 05:54:39 +0000 Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should get inlined In-Reply-To: References: Message-ID: Hi Martin, looks good to me. Please increase the copyright year in http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated/webrev.00/src/hotspot/share/memory/arena.cpp.frames.html but no new webrev is needed. Best regards, Matthias From: Doerr, Martin Sent: Thursday, 11 July 2019 17:32 To: hotspot-runtime-dev at openjdk.java.net Cc: Baesken, Matthias ; Lindenmaier, Goetz ; Claes Redestad Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should get inlined Hi, the simple function Arena::inc_bytes_allocated can be found as CPU consuming when profiling the fastdbg build. It is located in a cpp file.
It should better get inlined to improve the performance of the fastdbg VM which is important for complex tests. In addition, atomic 8-Byte load and store functions are available on all platforms, so the "#if defined ..." can get removed. Here's my proposal: http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated/webrev.00/ Feedback is welcome. Best regards, Martin From chengjingwei1 at huawei.com Fri Jul 12 06:50:20 2019 From: chengjingwei1 at huawei.com (chengjingwei (A)) Date: Fri, 12 Jul 2019 06:50:20 +0000 Subject: [8u-dev] Deadlock involving FileSystems.getDefault and System.loadLibrary Message-ID: Hi there, We got a bug report from our customers some time ago; the situation was the same as described in http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-January/050819.html So we adopted the patch proposed in https://bugs.openjdk.java.net/browse/JDK-8194653 and https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-April/059811.html But since then, we've encountered two more issues: 1. The Win32 JDK would crash in a Win64 environment when running the AsynchronousFileChannel test in the JCK test suite 2. The Win32 JDK sometimes throws TimeoutException when running the AsynchronousChannelGroup test in the JCK test suite We are pretty sure that these issues came up right after we applied this patch. As for the root cause, we are still investigating. If someone is interested in these newly introduced issues, I'd like to post more details about them. From shade at redhat.com Fri Jul 12 08:14:31 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 12 Jul 2019 10:14:31 +0200 Subject: RFR (XS): 8227582: runtime/TLS/testtls.sh fails on x86_32 In-Reply-To: References: Message-ID: <53111924-0e78-3626-c1f0-d7500b246893@redhat.com> On 7/12/19 5:04 AM, Jiangli Zhou wrote: > Please review the following test change to disable the negative test > case for the glibc TLS issue.
As there are potentially different > failure modes, the negative test case is not completely suitable for > regular testing. For example, a hang could potentially cause timeout > failure. > > webrev: http://cr.openjdk.java.net/~jiangli/8227582/webrev.00/ Looks good to me. This unbreaks the test (and tier1) on x86_32. -- Thanks, -Aleksey From goetz.lindenmaier at sap.com Fri Jul 12 08:26:09 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 12 Jul 2019 08:26:09 +0000 Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should get inlined In-Reply-To: References: Message-ID: Hi Martin, thanks for looking at the timeouts we get with the jtreg tests on ppc. Inlining inc_bytes_allocated looks like a step forward. But why do you remove the #if in inc_stat_counter()? It is not there because it's not implemented on other platforms, but because SPARC and X86 have (had) 32-bit variants. Actually, your change should slow down the code on PPC & others. I think the right #define here is #ifndef LP64. And you now need that in inc_bytes_allocated, too. Best, Goetz. > -----Original Message----- > From: Doerr, Martin > Sent: Donnerstag, 11. Juli 2019 17:32 > To: hotspot-runtime-dev at openjdk.java.net > Cc: Baesken, Matthias ; Lindenmaier, Goetz > ; Claes Redestad > Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should > get inlined > > Hi, > > > > the simple function Arena::inc_bytes_allocated can be found as CPU consuming > when profiling the fastdbg build. It is located in a cpp file. > It should better get inlined to improve the performance of the fastdbg VM > which is important for complex tests. > In addition, atomic 8-Byte load and store functions are available on all > platforms, so the "#if defined ..." can get removed. > > > > Here's my proposal: > > http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated > /webrev.00/ > > > > Feedback is welcome. 
> > > > Best regards, > > Martin > > From sgehwolf at redhat.com Fri Jul 12 09:12:37 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Fri, 12 Jul 2019 11:12:37 +0200 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: <47b20d2a-02b9-33b8-b2e4-6022f809e139@oracle.com> References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> <47b20d2a-02b9-33b8-b2e4-6022f809e139@oracle.com> Message-ID: Hi Misha, On Thu, 2019-07-11 at 17:58 -0700, mikhailo.seledtsov at oracle.com wrote: > Hi Severin, > > Thank you for taking a look at this change. > > On 7/10/19 10:40 AM, Severin Gehwolf wrote: > > Hi Misha, > > > > On Tue, 2019-07-02 at 15:24 -0700, mikhailo.seledtsov at oracle.com wrote: > > > Please review this new test that uses a Docker sidecar pattern to > > > manage/monitor JVM running in the main payload container. > > > > > > Sidecar is a common pattern used in the cloud environments for > > > monitoring among other uses. In side car pattern the main > > > application/service container that runs the payload is paired with a > > > sidecar container. It is achieved by sharing certain namespace > > > aspects > > > between the two containers such as PID namespace, specific > > > sub-directories, IPC and more. > > > > > > This test implements the following cases: > > > - "jcmd -l" to list java processes running in "main" container > > > from > > > the "sidecar" container > > > - "jhsdb jinfo" in the sidecar configuration > > > - jcmd > > > > > > This change also builds a basis for more test cases in the future. > > > > > > Minor changes were done to DockerTestUtils: > > > - changing access to DOCKER_COMMAND constant to public > > > - minor spelling and terminology corrections > > > > > > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 > > > Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ > > > Testing: > > > 1. ran Docker tests on Linux-x64 - PASS > > > 2. 
Running Docker tests in test cluster - in progress > > > > > > > // JCMD does not work in sidecar configuration, except for "jcmd -l". > > // Including this test case to assist in reproduction of the problem. > > // t.assertIsAlive(); > > // testCase03(mainProcPid); > > > > FWIW, "jcmd -l" doesn't work in this case either. It only sees itself > > as far as I can tell. > > In my experiment it does work. Here are parts of the test log, first the > command that runs jcmd in a sidecar container, then the output of that > container: > > """ > > [COMMAND] > > /usr/local/bin/docker run --tty=true --rm --cap-add=SYS_PTRACE > --sig-proxy=true --pid=container:test-container-main > --ipc=container:test-container-main -v > /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/ > jdk-internal:test-jfr-jcmd /jdk/bin/jcmd -l > > [2019-07-12T00:26:29.083764Z] Gathering output for process 8703 > [ELAPSED: 5 ms] > [STDERR] > > [STDOUT] > 1 EventGeneratorLoop 15 > 23 jdk.jcmd/sun.tools.jcmd.JCmd -l > > """ > > The output shows 2 processes, one is EventGeneratorLoop with PID of 1 > (as expected). This is possible because the containers share certain > namespaces and mounted volumes in a 'sidecar' configuration. In this > case, containers share the PID namespace > (--pid=container:test-container-main) and share volumes mounted as > "/tmp" inside the container (-v > /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/) > Right, sorry. Perhaps this code should get a comment that sharing /tmp between sidecar and host container is needed for jvmstat - used internally by the attach mechanism - to work. 
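Severin's point about sharing /tmp can be illustrated with a minimal C++17 sketch. It assumes the usual HotSpot convention that a JVM running with -XX:+UsePerfData publishes its perf data file as <tmpdir>/hsperfdata_<user>/<pid>, and that listing those names is roughly how jvmstat (and therefore "jcmd -l") discovers local JVMs. The helper name list_perfdata_pids is hypothetical. Unlike the real jvmstat code, it only looks at entry names and never opens or validates the files.

```cpp
#include <cassert>
#include <filesystem>
#include <set>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: return the PIDs of JVMs that publish perf data under
// tmp_dir, assuming the layout <tmp_dir>/hsperfdata_<user>/<pid>.
// Only entry names are inspected; real jvmstat also opens and validates
// each file before reporting the JVM.
std::set<long> list_perfdata_pids(const fs::path& tmp_dir) {
  std::set<long> pids;
  std::error_code ec;
  for (const fs::directory_entry& dir : fs::directory_iterator(tmp_dir, ec)) {
    const std::string name = dir.path().filename().string();
    if (name.rfind("hsperfdata_", 0) != 0 || !dir.is_directory(ec)) {
      continue;  // not a per-user perf data directory
    }
    for (const fs::directory_entry& entry : fs::directory_iterator(dir.path(), ec)) {
      const std::string file = entry.path().filename().string();
      // Entries are named after the JVM's PID as seen in its own PID
      // namespace, so only a process sharing that namespace can use them.
      const bool numeric = !file.empty() && file.size() < 10 &&
                           file.find_first_not_of("0123456789") == std::string::npos;
      if (numeric) {
        pids.insert(std::stol(file));
      }
    }
  }
  return pids;
}
```

In the sidecar configuration, calling list_perfdata_pids("/tmp") inside the sidecar would report the main container's JVM as PID 1 only because both the /tmp volume and the PID namespace are shared. If either is missing, the name is not visible or does not map to a reachable process.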
See HotSpotAttachProvider.testAttachable(): + String[] command = new String[] { + DockerTestUtils.DOCKER_COMMAND, "run", + "--tty=true", "--rm", + "--cap-add=SYS_PTRACE", "--sig-proxy=true", + "--pid=container:" + MAIN_CONTAINER_NAME, + "--pid=container:" + MAIN_CONTAINER_NAME, + "--ipc=container:" + MAIN_CONTAINER_NAME, + "-v", WORK_DIR + ":/tmp/", I believe -XX:+UsePerfData would be in order too as I don't think things would work if that default changed. > > What's more, this seems to be a case of AttachListener::is_init_trigger[1] and > > VirtualMachineImpl.createAttachFile[2] disagreeing. The former looks in > > $(pwd)/.attach_pid or /tmp/.attach_pid and the latter creates > > it in /proc//root/tmp/.attach_pid. This seems to be the cause for why testCase03 doesn't work. Perhaps this deserves a bug and I can help fix it. While looking at that, I discovered what I said below, which is a different case I know. > > There seem to be more issues involved. As attaching to a JVM inside a > > container doesn't seem to work from outside which is supposed to be > > fixed with JDK-8179498. That alone seems to warrant a bug. > > You are describing a slightly different use case / pattern, but I agree > it does not seem to work. I am happy to hear confirmation of that. I was pointing out that JDK-8179498 seems to have regressed. It's unrelated but should be taken into account when fixing the above issue. > The pattern addressed in this test is a side car, where both the > observer and observee run in containers; the containers are 'friendly' > by sharing certain aspects of namespaces. Yes. > The use case you are describing is somewhat different, if I understand > correctly: the observer runs on a host machine, and observee runs in a > container. Observer tries to use jcmd to list the java processes running > in container(s), and issue commands, but that fails. I can create a bug > for that, and a simple test case.
There should be a bug and a test so that it cannot again regress. JDK-8193710 is also related, but the fix for that bug didn't have a test either :( That's this one which needs fixing: https://bugs.openjdk.java.net/browse/JDK-8195809 > > private static DockerThread startMainContainer() throws Exception { > > // start "main" container (the observee) > > DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); > > opts.addDockerOpts("--cap-add=SYS_PTRACE", "--ipc=shareable") > > > > Is '--ipc=shareable' really needed? It's not a supported option for my > > docker here :-( > > I have removed the '--ipc=shareable' and the test still works. I think > this is extra stuff that is not necessary for this test case, so I will > remove it. Excellent! > > I will incorporate changes from your and Bob's review, run some testing, > and post an updated webrev. Thanks, Severin > > [1] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/hotspot/os/linux/attachListener_linux.cpp#l500 > > [2] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#l295 > > From martin.doerr at sap.com Fri Jul 12 09:47:47 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 12 Jul 2019 09:47:47 +0000 Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should get inlined In-Reply-To: References: Message-ID: Hi Götz and Matthias, thanks for reviewing. Right, using a simple addition instead of the atomics may allow more optimizations. If nobody insists on treating the statistics values as volatile, I prefer this version: http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated/webrev.01/ Best regards, Martin > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Freitag, 12.
Juli 2019 10:26 > To: Doerr, Martin ; hotspot-runtime- > dev at openjdk.java.net > Cc: Baesken, Matthias ; Claes Redestad > > Subject: RE: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated > should get inlined > > Hi Martin, > > thanks for looking at the timeouts we get with the jtreg tests > on ppc. Inlining inc_bytes_allocated looks like a step forward. > > But why do you remove the #if in inc_stat_counter()? > It is not there because it's not implemented on other platforms, > but because SPARC and X86 have (had) 32-bit variants. > Actually, your change should slow down the code on > PPC & others. > > I think the right #define here is #ifndef LP64. > And you now need that in inc_bytes_allocated, too. > > Best, > Goetz. > > > > -----Original Message----- > > From: Doerr, Martin > > Sent: Donnerstag, 11. Juli 2019 17:32 > > To: hotspot-runtime-dev at openjdk.java.net > > Cc: Baesken, Matthias ; Lindenmaier, Goetz > > ; Claes Redestad > > > Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should > > get inlined > > > > Hi, > > > > > > > > the simple function Arena::inc_bytes_allocated can be found as CPU > consuming > > when profiling the fastdbg build. It is located in a cpp file. > > It should better get inlined to improve the performance of the fastdbg VM > > which is important for complex tests. > > In addition, atomic 8-Byte load and store functions are available on all > > platforms, so the "#if defined ..." can get removed. > > > > > > > > Here's my proposal: > > > > > http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocat > ed > > /webrev.00/ > > > > > > > > Feedback is welcome. 
> > > > > > > > Best regards, > > > > Martin > > > > From thomas.stuefe at gmail.com Fri Jul 12 10:04:34 2019 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Fri, 12 Jul 2019 12:04:34 +0200 Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should get inlined In-Reply-To: References: Message-ID: Hi Martin, I would think using it non-atomically makes the counter less useful, since you may lose concurrent updates, and since this is a hot path (arena malloc) we may see a counter which is smaller than reality. But I wonder if we even need it at all. This is the counter of bytes allocated from arena chunks, in total. We already have bytes-in-arena-chunks, via NMT. I guess that this counter will be trailing bytes-in-arena-chunks quite closely. For the purpose of "how much memory is allocated in arenas" the NMT counter works better. I also assume that this counter was introduced when there was no NMT to track memory. This counter is only used when printing allocation stats with PrintMallocStatistics. I would even think about removing that PrintMallocStatistics functionality altogether, since it is better covered with NMT. For now I would just remove the counter, rather than making it non-atomic. Cheers, Thomas On Fri, Jul 12, 2019 at 11:48 AM Doerr, Martin wrote: > Hi Götz and Matthias, > > thanks for reviewing. > > Right, using a simple addition instead of the atomics may allow more > optimizations. > If nobody insists on treating the statistics values as volatile, I prefer > this version: > > http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated/webrev.01/ > > Best regards, > Martin > > > > -----Original Message----- > > From: Lindenmaier, Goetz > > Sent: Freitag, 12.
Juli 2019 10:26 > > To: Doerr, Martin ; hotspot-runtime- > > dev at openjdk.java.net > > Cc: Baesken, Matthias ; Claes Redestad > > > > Subject: RE: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated > > should get inlined > > > > Hi Martin, > > > > thanks for looking at the timeouts we get with the jtreg tests > > on ppc. Inlining inc_bytes_allocated looks like a step forward. > > > > But why do you remove the #if in inc_stat_counter()? > > It is not there because it's not implemented on other platforms, > > but because SPARC and X86 have (had) 32-bit variants. > > Actually, your change should slow down the code on > > PPC & others. > > > > I think the right #define here is #ifndef LP64. > > And you now need that in inc_bytes_allocated, too. > > > > Best, > > Goetz. > > > > > > > -----Original Message----- > > > From: Doerr, Martin > > > Sent: Donnerstag, 11. Juli 2019 17:32 > > > To: hotspot-runtime-dev at openjdk.java.net > > > Cc: Baesken, Matthias ; Lindenmaier, Goetz > > > ; Claes Redestad > > > > > Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated > should > > > get inlined > > > > > > Hi, > > > > > > > > > > > > the simple function Arena::inc_bytes_allocated can be found as CPU > > consuming > > > when profiling the fastdbg build. It is located in a cpp file. > > > It should better get inlined to improve the performance of the fastdbg > VM > > > which is important for complex tests. > > > In addition, atomic 8-Byte load and store functions are available on > all > > > platforms, so the "#if defined ..." can get removed. > > > > > > > > > > > > Here's my proposal: > > > > > > > > http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocat > > ed > > > /webrev.00/ > > > > > > > > > > > > Feedback is welcome. 
> > > > > > > > > > > > Best regards, > > > > > > Martin > > > > > > > > From martin.doerr at sap.com Fri Jul 12 10:22:48 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 12 Jul 2019 10:22:48 +0000 Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should get inlined In-Reply-To: References: Message-ID: Hi Thomas, thanks for looking into this issue. Note that the counter is already non-atomic. We're not using Atomic::add. It's not supposed to be precise. We only use Atomic::load + store to avoid word-tearing on 32 bit. For 64 bit platforms, my latest version only removes the "volatile" treatment on x86 and SPARC which is part of the Atomic::load + store. But I'd also appreciate removing the functionality. Best regards, Martin From: Thomas Stüfe Sent: Freitag, 12. Juli 2019 12:05 To: Doerr, Martin Cc: Lindenmaier, Goetz ; hotspot-runtime-dev at openjdk.java.net; Baesken, Matthias Subject: Re: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should get inlined Hi Martin, I would think using it non-atomically makes the counter less useful, since you may lose concurrent updates, and since this is a hot path (arena malloc) we may see a counter which is smaller than reality. But I wonder if we even need it at all. This is the counter of bytes allocated from arena chunks, in total. We already have bytes-in-arena-chunks, via NMT. I guess that this counter will be trailing bytes-in-arena-chunks quite closely. For the purpose of "how much memory is allocated in arenas" the NMT counter works better. I also assume that this counter was introduced when there was no NMT to track memory. This counter is only used when printing allocation stats with PrintMallocStatistics. I would even think about removing that PrintMallocStatistics functionality altogether, since it is better covered with NMT. For now I would just remove the counter, rather than making it non-atomic.
Cheers, Thomas On Fri, Jul 12, 2019 at 11:48 AM Doerr, Martin > wrote: Hi Götz and Matthias, thanks for reviewing. Right, using a simple addition instead of the atomics may allow more optimizations. If nobody insists on treating the statistics values as volatile, I prefer this version: http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated/webrev.01/ Best regards, Martin > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Freitag, 12. Juli 2019 10:26 > To: Doerr, Martin >; hotspot-runtime- > dev at openjdk.java.net > Cc: Baesken, Matthias >; Claes Redestad > > > Subject: RE: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated > should get inlined > > Hi Martin, > > thanks for looking at the timeouts we get with the jtreg tests > on ppc. Inlining inc_bytes_allocated looks like a step forward. > > But why do you remove the #if in inc_stat_counter()? > It is not there because it's not implemented on other platforms, > but because SPARC and X86 have (had) 32-bit variants. > Actually, your change should slow down the code on > PPC & others. > > I think the right #define here is #ifndef LP64. > And you now need that in inc_bytes_allocated, too. > > Best, > Goetz. > > > > -----Original Message----- > > From: Doerr, Martin > > Sent: Donnerstag, 11. Juli 2019 17:32 > > To: hotspot-runtime-dev at openjdk.java.net > > Cc: Baesken, Matthias >; Lindenmaier, Goetz > > >; Claes Redestad > > > > Subject: RFR(S): 8227597: [fastdbg build] Arena::inc_bytes_allocated should > > get inlined > > > > Hi, > > > > > > > > the simple function Arena::inc_bytes_allocated can be found as CPU > consuming > > when profiling the fastdbg build. It is located in a cpp file. > > It should better get inlined to improve the performance of the fastdbg VM > > which is important for complex tests. > > In addition, atomic 8-Byte load and store functions are available on all > > platforms, so the "#if defined ..." can get removed.
> > > > > > > > Here's my proposal: > > > > > http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocat > ed > > /webrev.00/ > > > > > > > > Feedback is welcome. > > > > > > > > Best regards, > > > > Martin > > > > From jianglizhou at google.com Fri Jul 12 14:30:30 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Fri, 12 Jul 2019 07:30:30 -0700 Subject: RFR (XS): 8227582: runtime/TLS/testtls.sh fails on x86_32 In-Reply-To: <53111924-0e78-3626-c1f0-d7500b246893@redhat.com> References: <53111924-0e78-3626-c1f0-d7500b246893@redhat.com> Message-ID: Thanks! Best Regards, Jiangli On Fri, Jul 12, 2019 at 1:14 AM Aleksey Shipilev wrote: > > On 7/12/19 5:04 AM, Jiangli Zhou wrote: > > Please review the following test change to disable the negative test > > case for the glibc TLS issue. As there are potentially different > > failure modes, the negative test case is not completely suitable for > > regular testing. For example, a hang could potentially cause timeout > > failure. > > > > webrev: http://cr.openjdk.java.net/~jiangli/8227582/webrev.00/ > > Looks good to me. > > This unbreaks the test (and tier1) on x86_32. > > -- > Thanks, > -Aleksey > From calvin.cheung at oracle.com Fri Jul 12 20:05:30 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Fri, 12 Jul 2019 13:05:30 -0700 Subject: [13] RFR(T) 8227496: Update NUM_CDS_REGIONS and CURRENT_CDS_ARCHIVE_VERSION in cds.h Message-ID: JBS: https://bugs.openjdk.java.net/browse/JDK-8227496 webrev: http://cr.openjdk.java.net/~ccheung/8227496/webrev.00/ This change was originally proposed as part of the change for JDK-8226406. It was suggested to have this in a separate changeset so that JDK-8226406 could be backported to 11u easier. Testing: mach5 tiers 1 - 3. 
thanks, Calvin From jianglizhou at google.com Sat Jul 13 00:53:11 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Fri, 12 Jul 2019 17:53:11 -0700 Subject: [13] RFR(T) 8227496: Update NUM_CDS_REGIONS and CURRENT_CDS_ARCHIVE_VERSION in cds.h In-Reply-To: References: Message-ID: Looks ok. Best, Jiangli On Fri, Jul 12, 2019, 1:05 PM Calvin Cheung wrote: > JBS: https://bugs.openjdk.java.net/browse/JDK-8227496 > > webrev: http://cr.openjdk.java.net/~ccheung/8227496/webrev.00/ > > This change was originally proposed as part of the change for > JDK-8226406. It was suggested to have this in a separate changeset so > that JDK-8226406 could be backported to 11u easier. > > Testing: mach5 tiers 1 - 3. > > thanks, > > Calvin > > From calvin.cheung at oracle.com Mon Jul 15 16:56:29 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 15 Jul 2019 09:56:29 -0700 Subject: [13] RFR(T) 8227496: Update NUM_CDS_REGIONS and CURRENT_CDS_ARCHIVE_VERSION in cds.h In-Reply-To: References: Message-ID: <8c97212b-d102-bb96-cc67-6d7fc2a3ccab@oracle.com> Thanks, Jiangli. I've pushed the changes. Calvin On 7/12/19 5:53 PM, Jiangli Zhou wrote: > Looks ok. > > Best, > Jiangli > > On Fri, Jul 12, 2019, 1:05 PM Calvin Cheung > wrote: > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227496 > > webrev: http://cr.openjdk.java.net/~ccheung/8227496/webrev.00/ > > This change was originally proposed as part of the change for > JDK-8226406. It was suggested to have this in a separate changeset so > that JDK-8226406 could be backported to 11u easier. > > Testing: mach5 tiers 1 - 3. > > thanks, > > Calvin > From harold.seigel at oracle.com Mon Jul 15 19:05:24 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Mon, 15 Jul 2019 15:05:24 -0400 Subject: RFR (T) 8227690: Problem list JCK test vm/classfmt/clf/clfver001/clfver00101m029 Message-ID: Hi, Please review this trivial change to problem list JCK test vm/classfmt/clf/clfver001/clfver00101m029 until JCK-14 is available. 
Closed Webrev: http://javaweb.us.oracle.com/~hseigel/webrev/bug_8227690/webrev/index.html JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8227690 Thanks, Harold From harold.seigel at oracle.com Mon Jul 15 19:28:34 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Mon, 15 Jul 2019 15:28:34 -0400 Subject: RFR (T) 8227690: Problem list JCK test vm/classfmt/clf/clfver001/clfver00101m029 In-Reply-To: References: Message-ID: <14e20029-2083-3c58-adaa-934d5e6fb21d@oracle.com> This change is being withdrawn. Harold On 7/15/2019 3:05 PM, Harold Seigel wrote: > > Hi, > > Please review this trivial change to problem list JCK test > vm/classfmt/clf/clfver001/clfver00101m029 until JCK-14 is available. > > Closed Webrev: > http://javaweb.us.oracle.com/~hseigel/webrev/bug_8227690/webrev/index.html > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8227690 > > Thanks, Harold > From martin.doerr at sap.com Mon Jul 15 20:16:37 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 15 Jul 2019 20:16:37 +0000 Subject: RFR(S): 8227692: Remove develop feature PrintMallocStatistics Message-ID: Hi, as announced on hotspot-dev, I'd like to remove the debug build feature for allocation statistics "AllocStats" (controlled by develop flag -XX:+PrintMallocStatistics). I've closed JDK-8227597 which was a proposal to reduce the performance impact of it, but several people have suggested to remove this feature which is even better IMHO. Bug: https://bugs.openjdk.java.net/browse/JDK-8227692 Webrev: http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/webrev.00/ I've also taken over the reworked inc_stat_counter from JDK-8227597 (allocation.inline.hpp). Please review. 
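For reference, here is a minimal C++ sketch of the split Goetz suggested for inc_stat_counter(). On 64-bit platforms it uses a plain add. On 32-bit platforms it uses an untorn 64-bit load and store. The sketch uses std::atomic as a stand-in for HotSpot's Atomic::load/Atomic::store, and the stat_counter_t alias is hypothetical. It is not the code in either webrev. As Thomas and Martin note in the thread, neither variant is an atomic increment, so concurrent updates can still be lost.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

#if defined(_LP64) || defined(_WIN64)
// 64-bit: a naturally aligned 8-byte store is not expected to tear, so a plain
// read-modify-write is used. Concurrent increments race and may be lost; that
// is tolerated only because the value is a debug statistic.
using stat_counter_t = std::uint64_t;

inline void inc_stat_counter(stat_counter_t* dest, std::uint64_t add_value) {
  *dest += add_value;
}
#else
// 32-bit: load and store the 64-bit value as single atomic operations so a
// reader never observes half of an update (word tearing). This is still a
// separate load and store, not an atomic add, so updates can be lost.
using stat_counter_t = std::atomic<std::uint64_t>;

inline void inc_stat_counter(stat_counter_t* dest, std::uint64_t add_value) {
  const std::uint64_t value = dest->load(std::memory_order_relaxed);
  dest->store(value + add_value, std::memory_order_relaxed);
}
#endif
```

If exact totals were required, the 32-bit branch would need a real atomic add, such as fetch_add. That cost is what the thread is trying to avoid on a hot allocation path.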
Best regards, Martin From calvin.cheung at oracle.com Tue Jul 16 04:31:50 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 15 Jul 2019 21:31:50 -0700 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out Message-ID: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ Increase the timeout to 1000s and add the -XX:-CreateCoredumpOnCrash option to disable coredump. Testing: on 2 macosx hosts on which the timeout was observed. thanks, Calvin From christoph.langer at sap.com Tue Jul 16 11:42:35 2019 From: christoph.langer at sap.com (Langer, Christoph) Date: Tue, 16 Jul 2019 11:42:35 +0000 Subject: RFR(S) 8227435: Perf::attach() should not throw a java.lang.Exception In-Reply-To: References: Message-ID: Hi Ralf, looks good. Prior to pushing you'll have to take care of the copyright years in the files you touched. cc-ing hotspot-runtime, because I think it affects this area, too. Thanks Christoph From: serviceability-dev On Behalf Of Schmelter, Ralf Sent: Montag, 15. Juli 2019 10:10 To: OpenJDK Serviceability Subject: [CAUTION] RFR(S) 8227435: Perf::attach() should not throw a java.lang.Exception Please review this small change. It changes the exception which will be thrown when the perf file has not yet the correct size. Instead of throwing the (not declared) java.lang.Exception, we will now throw java.io.IOException, which is expected by the calling code. webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8227435/webrev.0/ bugreport: https://bugs.openjdk.java.net/browse/JDK-8227435 Best regards, Ralf From daniel.daugherty at oracle.com Tue Jul 16 12:56:06 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 16 Jul 2019 08:56:06 -0400 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> Message-ID: On 7/16/19 12:31 AM, Calvin Cheung wrote: > bug: https://bugs.openjdk.java.net/browse/JDK-8227646 > > webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java Does the test intentionally crash in one or more of the test cases?
Daugherty) Date: Tue, 16 Jul 2019 08:56:06 -0400 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> Message-ID: On 7/16/19 12:31 AM, Calvin Cheung wrote: > bug: https://bugs.openjdk.java.net/browse/JDK-8227646 > > webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java ??? Does the test intentionally crash in one or more of the test cases? ? ? If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. ??? I don't think '-XX:-CreateCoredumpOnCrash' will prevent the timeout ??? handling mechanism from trying to capture a core file in the case ??? of a timeout. ??? The test currently timed out with a default total timeout value of ??? 480 seconds; that 480 comes from the default timeout value of 120 ??? seconds and the default timeout factor of 4 (480 == 120 * 4). ??? The 'timeout=1000' value will get you a total timeout value of 4000. ??? I suspect that is not what you want. ??? If you specify 'timeout=240', you'll get a total timeout value of ??? 960 seconds (240 * 4). Dan > > Increase the timeout to 1000s and add the -XX:-CreateCoredumpOnCrash > option to disable coredump. > > Testing: on 2 macosx hosts on which the timeout was observed. > > > thanks, > > Calvin > From martin.doerr at sap.com Tue Jul 16 13:31:02 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 16 Jul 2019 13:31:02 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime Message-ID: Hi, the current implementation of FastJNIAccessors ignores the flag -XX:+UseFastJNIAccessors when the JVMTI capability "can_post_field_access" is enabled. This is an unnecessary restriction which makes field accesses (GetField) from native code slower when a JVMTI agent is attached which enables this capability. 
A better implementation would check at runtime if an agent actually wants to receive field access events. Note that the bytecode interpreter already uses this better implementation by checking if field access watch events were requested (JvmtiExport::_field_access_count != 0). I have implemented such a runtime check on all platforms which currently support FastJNIAccessors. My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a micro benchmark: test-support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/FastGetField/FastGetField.jtr shows the duration of 10000 iterations with and without UseFastJNIAccessors (JVMTI agent gets attached in both runs). My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with FastJNIAccessors and 11.2ms without it. Webrev: http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ We have run the test on 64 bit x86 platforms, SPARC and aarch64. (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute them later.) My webrev contains 32 bit implementations for x86 and arm, but completely untested. It'd be great if somebody could volunteer to review and test these platforms. Please review. Best regards, Martin From calvin.cheung at oracle.com Tue Jul 16 14:59:33 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Tue, 16 Jul 2019 07:59:33 -0700 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> Message-ID: <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> Dan, Thanks for your review! On 7/16/19 5:56 AM, Daniel D. Daugherty wrote: > On 7/16/19 12:31 AM, Calvin Cheung wrote: >> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 >> >> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ > > test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > Does the test intentionally crash in one or more of the test cases? >
If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. > I don't think '-XX:-CreateCoredumpOnCrash' will prevent the timeout > handling mechanism from trying to capture a core file in the case > of a timeout. No, the test does not crash intentionally. Thanks for clarifying the -XX:-CreateCoredumpOnCrash. I will revert the change. > > The test currently timed out with a default total timeout value of > 480 seconds; that 480 comes from the default timeout value of 120 > seconds and the default timeout factor of 4 (480 == 120 * 4). > > The 'timeout=1000' value will get you a total timeout value of 4000. > I suspect that is not what you want. > > If you specify 'timeout=240', you'll get a total timeout value of > 960 seconds (240 * 4). I've seen the total elapsed time for the test got very close to 960s. So to be on the safe side, I would set the timeout=300 as follows: diff --git a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java @@ -35,7 +35,7 @@ * @build sun.hotspot.WhiteBox * @compile test-classes/Hello.java * @run driver ClassFileInstaller sun.hotspot.WhiteBox - * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI SharedArchiveConsistency + * @run main/othervm/timeout=300 -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI SharedArchiveConsistency */ import jdk.test.lib.process.OutputAnalyzer; import jdk.test.lib.Utils; I will do more testing with the above timeout before pushing the change. Let me know if you'd like to see another webrev. thanks, Calvin > > Dan > > >> >> Increase the timeout to 1000s and add the -XX:-CreateCoredumpOnCrash >> option to disable coredump. >> >> Testing: on 2 macosx hosts on which the timeout was observed.
>> >> >> thanks, >> >> Calvin >> > From ioi.lam at oracle.com Tue Jul 16 15:25:40 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 16 Jul 2019 08:25:40 -0700 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> Message-ID: <6fd396d2-9368-2b8d-c150-39cc2cfd8217@oracle.com> Hi Calvin, Since the test is stuck here at the timeout: at sun.nio.ch.FileDispatcherImpl.force0(java.base at 13-ea/Native Method) at sun.nio.ch.FileDispatcherImpl.force(java.base at 13-ea/FileDispatcherImpl.java:82) at sun.nio.ch.FileChannelImpl.force(java.base at 13-ea/FileChannelImpl.java:461) at SharedArchiveConsistency.writeData(SharedArchiveConsistency.java:166) Maybe we should remove the calls to FileChannel.force()? According to the javadoc, this call is for "ensuring that critical information is not lost in the event of a system crash", which I think is not necessary in our test. src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c: JNIEXPORT jint JNICALL Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this, jobject fdo, jboolean md) { jint fd = fdval(env, fdo); int result = 0; #ifdef MACOSX result = fcntl(fd, F_FULLFSYNC); if (result == -1 && errno == ENOTSUP) { /* Try fsync() in case F_FULLSYUNC is not implemented on the file system. */ result = fsync(fd); } https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented
on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems. The operation may take quite a while to complete. Thanks - Ioi On 7/16/19 7:59 AM, Calvin Cheung wrote: > Dan, > > Thanks for your review! > > On 7/16/19 5:56 AM, Daniel D. Daugherty wrote: >> On 7/16/19 12:31 AM, Calvin Cheung wrote: >>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 >>> >>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ >> >> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >> Does the test intentionally crash in one or more of the test cases? >> If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. >> I don't think '-XX:-CreateCoredumpOnCrash' will prevent the timeout >> handling mechanism from trying to capture a core file in the case >> of a timeout. > No, the test does not crash intentionally. Thanks for clarifying the > -XX:-CreateCoredumpOnCrash. I will revert the change. >> >> The test currently timed out with a default total timeout value of >> 480 seconds; that 480 comes from the default timeout value of 120 >> seconds and the default timeout factor of 4 (480 == 120 * 4). >> >> The 'timeout=1000' value will get you a total timeout value of 4000. >> I suspect that is not what you want. >> >> If you specify 'timeout=240', you'll get a total timeout value of >> 960 seconds (240 * 4). > > I've seen the total elapsed time for the test got very close to 960s. > So to be on the safe side, I would set the timeout=300 as follows: > > diff --git > a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > @@ -35,7 +35,7 @@ > * @build sun.hotspot.WhiteBox > * @compile test-classes/Hello.java >
* @run driver ClassFileInstaller sun.hotspot.WhiteBox > - * @run main/othervm -Xbootclasspath/a:. > -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI SharedArchiveConsistency > + * @run main/othervm/timeout=300 -Xbootclasspath/a:. > -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI SharedArchiveConsistency > */ > import jdk.test.lib.process.OutputAnalyzer; > import jdk.test.lib.Utils; > > I will do more testing with the above timeout before pushing the change. > > Let me know if you'd like to see another webrev. > > thanks, > > Calvin > >> >> Dan >> >> >>> >>> Increase the timeout to 1000s and add the -XX:-CreateCoredumpOnCrash >>> option to disable coredump. >>> >>> Testing: on 2 macosx hosts on which the timeout was observed. >>> >>> >>> thanks, >>> >>> Calvin >>> >> From coleen.phillimore at oracle.com Tue Jul 16 15:37:57 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 16 Jul 2019 11:37:57 -0400 Subject: RFR(S): 8227692: Remove develop feature PrintMallocStatistics In-Reply-To: References: Message-ID: This looks good, also the reworking inc_stat_counter that seems to still be used by the c2 compiler. Thanks! Coleen On 7/15/19 4:16 PM, Doerr, Martin wrote: > Hi, > > as announced on hotspot-dev, I'd like to remove the debug build feature for allocation statistics "AllocStats" (controlled by develop flag -XX:+PrintMallocStatistics). > I've closed JDK-8227597 which was a proposal to reduce the performance impact of it, but several people have suggested to remove this feature which is even better IMHO. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8227692 > > Webrev: > http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/webrev.00/ > > I've also taken over the reworked inc_stat_counter from JDK-8227597 (allocation.inline.hpp). > Please review. > > Best regards, > Martin > From daniel.daugherty at oracle.com Tue Jul 16 15:37:48 2019 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Tue, 16 Jul 2019 11:37:48 -0400 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> Message-ID: <54d52e40-b73c-b15f-2edb-b6ff48db2cef@oracle.com> On 7/16/19 10:59 AM, Calvin Cheung wrote: > Dan, > > Thanks for your review! > > On 7/16/19 5:56 AM, Daniel D. Daugherty wrote: >> On 7/16/19 12:31 AM, Calvin Cheung wrote: >>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 >>> >>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ >> >> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >> Does the test intentionally crash in one or more of the test cases? >> If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. >> I don't think '-XX:-CreateCoredumpOnCrash' will prevent the timeout >> handling mechanism from trying to capture a core file in the case >> of a timeout. > No, the test does not crash intentionally. Thanks for clarifying the > -XX:-CreateCoredumpOnCrash. I will revert the change. Thanks. >> >> The test currently timed out with a default total timeout value of >> 480 seconds; that 480 comes from the default timeout value of 120 >> seconds and the default timeout factor of 4 (480 == 120 * 4). >> >> The 'timeout=1000' value will get you a total timeout value of 4000. >> I suspect that is not what you want. >> >> If you specify 'timeout=240', you'll get a total timeout value of >> 960 seconds (240 * 4). > > I've seen the total elapsed time for the test got very close to 960s.
> So to be on the safe side, I would set the timeout=300 as follows: > > diff --git > a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > @@ -35,7 +35,7 @@ > * @build sun.hotspot.WhiteBox > * @compile test-classes/Hello.java > * @run driver ClassFileInstaller sun.hotspot.WhiteBox > - * @run main/othervm -Xbootclasspath/a:. > -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI SharedArchiveConsistency > + * @run main/othervm/timeout=300 -Xbootclasspath/a:. > -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI SharedArchiveConsistency > */ > import jdk.test.lib.process.OutputAnalyzer; > import jdk.test.lib.Utils; > > I will do more testing with the above timeout before pushing the change. > > Let me know if you'd like to see another webrev. I'm good with the 'timeout=300' value. No need for another webrev. Dan > > thanks, > > Calvin > >> >> Dan >> >> >>> >>> Increase the timeout to 1000s and add the -XX:-CreateCoredumpOnCrash >>> option to disable coredump. >>> >>> Testing: on 2 macosx hosts on which the timeout was observed. >>> >>> >>> thanks, >>> >>> Calvin >>> >> From daniel.daugherty at oracle.com Tue Jul 16 16:04:03 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 16 Jul 2019 12:04:03 -0400 Subject: RFR(S): 8227692: Remove develop feature PrintMallocStatistics In-Reply-To: References: Message-ID: <84fbdef1-3735-7344-2d97-ebcc38213140@oracle.com> For anyone that happens to be searching JBS for what happened to the '-XX:+PrintMallocStatistics' option, you might want to include some guidance on how they get the equivalent information from NMT... A short note in JDK-8227692 should suffice...
Dan On 7/15/19 4:16 PM, Doerr, Martin wrote: > Hi, > > as announced on hotspot-dev, I'd like to remove the debug build feature for allocation statistics "AllocStats" (controlled by develop flag -XX:+PrintMallocStatistics). > I've closed JDK-8227597 which was a proposal to reduce the performance impact of it, but several people have suggested to remove this feature which is even better IMHO. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8227692 > > Webrev: > http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/webrev.00/ > > I've also taken over the reworked inc_stat_counter from JDK-8227597 (allocation.inline.hpp). > Please review. > > Best regards, > Martin > From calvin.cheung at oracle.com Tue Jul 16 16:09:18 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Tue, 16 Jul 2019 09:09:18 -0700 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: <6fd396d2-9368-2b8d-c150-39cc2cfd8217@oracle.com> References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> <6fd396d2-9368-2b8d-c150-39cc2cfd8217@oracle.com> Message-ID: Hi Ioi, I found 2 fc.force(true) calls in the test. I've removed both and testing it without increasing the timeout value. I test it by running the hotspot_tier2_runtime test group 10 times on mac hosts. Each iteration takes about 30 min. Will let you know about the results. thanks, Calvin On 7/16/19 8:25 AM, Ioi Lam wrote: > HI Calvin, > > Since the test is stuck at here at the timeout: > > at sun.nio.ch.FileDispatcherImpl.force0(java.base at 13-ea/Native Method) > at > sun.nio.ch.FileDispatcherImpl.force(java.base at 13-ea/FileDispatcherImpl.java:82) > at > sun.nio.ch.FileChannelImpl.force(java.base at 13-ea/FileChannelImpl.java:461) > at SharedArchiveConsistency.writeData(SharedArchiveConsistency.java:166) > > Maybe we should remove the calls to FileChannel.force()? 
According to > the javadoc, this call is for "ensuring that critical information is > not lost in the event of a system crash", which I think is not > necessary in our test. > > src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c: > > JNIEXPORT jint JNICALL > Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this, > jobject fdo, jboolean md) > { > jint fd = fdval(env, fdo); > int result = 0; > > #ifdef MACOSX > result = fcntl(fd, F_FULLFSYNC); > if (result == -1 && errno == ENOTSUP) { > /* Try fsync() in case F_FULLSYUNC is not implemented on the > file system. */ > result = fsync(fd); > } > > https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html > > > F_FULLFSYNC Does the same thing as fsync(2) then asks the > drive to > flush all buffered data to the permanent storage > device (arg is ignored). This is currently > implemented > on HFS, MS-DOS (FAT), and Universal Disk Format > (UDF) file systems. The operation may take > quite a > while to complete. > > Thanks > - Ioi > > > On 7/16/19 7:59 AM, Calvin Cheung wrote: >> Dan, >> >> Thanks for your review! >> >> On 7/16/19 5:56 AM, Daniel D. Daugherty wrote: >>> On 7/16/19 12:31 AM, Calvin Cheung wrote: >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 >>>> >>>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ >>> >>> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>> Does the test intentionally crash in one or more of the test cases? >>> If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. >>> I don't think '-XX:-CreateCoredumpOnCrash' will prevent the timeout >>> handling mechanism from trying to capture a core file in the case >>> of a timeout.
>> No, the test does not crash intentionally. Thanks for clarifying the >> -XX:-CreateCoredumpOnCrash. I will revert the change. >>> >>> The test currently timed out with a default total timeout value of >>> 480 seconds; that 480 comes from the default timeout value of 120 >>> seconds and the default timeout factor of 4 (480 == 120 * 4). >>> >>> The 'timeout=1000' value will get you a total timeout value of >>> 4000. >>> I suspect that is not what you want. >>> >>> If you specify 'timeout=240', you'll get a total timeout value of >>> 960 seconds (240 * 4). >> >> I've seen the total elapsed time for the test got very close to 960s. >> So to be on the safe side, I would set the timeout=300 as follows: >> >> diff --git >> a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >> b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >> --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >> +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >> @@ -35,7 +35,7 @@ >> * @build sun.hotspot.WhiteBox >> * @compile test-classes/Hello.java >> * @run driver ClassFileInstaller sun.hotspot.WhiteBox >> - * @run main/othervm -Xbootclasspath/a:. >> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI SharedArchiveConsistency >> + * @run main/othervm/timeout=300 -Xbootclasspath/a:. >> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI SharedArchiveConsistency >> */ >> import jdk.test.lib.process.OutputAnalyzer; >> import jdk.test.lib.Utils; >> >> I will do more testing with the above timeout before pushing the change. >> >> Let me know if you'd like to see another webrev. >> >> thanks, >> >> Calvin >> >>> >>> Dan >>> >>> >>>> >>>> Increase the timeout to 1000s and add the >>>> -XX:-CreateCoredumpOnCrash option to disable coredump. >>>> >>>> Testing: on 2 macosx hosts on which the timeout was observed.
>>>> >>>> >>>> thanks, >>>> >>>> Calvin >>>> >>> > From richard.reingruber at sap.com Tue Jul 16 16:15:18 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 16 Jul 2019 16:15:18 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for better performance when debugging Message-ID: Hi, Please review this implementation for the RFE to enable escape analysis when debugging. Bug: https://bugs.openjdk.java.net/browse/JDK-8227745 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.0/ In short the JVMTI implementation is changed to revert EA based optimizations just before objects escape through JVMTI. At runtime there is no escape information for each object in scope. Instead each scope is annotated, if non-escaping objects exist and if some are passed as parameters. If a JVMTI agent accesses a reference on stack, then the owning compiled frame C is deoptimized, if any non-escaping object is in scope. Scalar replaced objects are reallocated on the heap and objects with eliminated locking are relocked. This is called "deoptimizing objects" for short. If the agent accesses a reference in a callee frame of C and C is passing any non-escaping object as argument then C and its objects are deoptimized as well. Deoptimized objects are kept as deferred updates (preexisting JavaThread::_deferred_locals_updates). Either all objects of a compiled frame are deoptimized or none. It is annotated at the corresponding deferred updates if it happened already in order to avoid doing it twice. There is preexisting code to deoptimize objects when deoptimizing a compiled frame. The code was extended to be able to deoptimize objects of a frame that is not the top frame and to let another thread than the owning thread do it. Thanks, Richard. 
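In the SharedArchiveConsistency thread, Ioi suggested dropping the FileChannel.force() calls. On macOS those calls reach fcntl(F_FULLFSYNC), which the quoted man page warns "may take quite a while to complete". The sketch below illustrates the idea under stated assumptions: the class name PatchWithoutForce and the writeData(Path, long, byte[]) signature are hypothetical, not the test's actual helper. It only assumes the test patches bytes in an existing archive file that another JVM then opens. Skipping force() gives up the guarantee that the bytes reach stable storage before a system crash. On POSIX systems, however, the data is visible to any later read of the file once the write returns, and that visibility is all such a test relies on.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PatchWithoutForce {
    // Hypothetical helper (not the test's real writeData): overwrite `data`
    // at byte `offset` of an existing file. There is deliberately no
    // fc.force(true) call, so no fsync / F_FULLFSYNC round trip to the disk.
    static void writeData(Path file, long offset, byte[] data) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // Positional write: loop until the whole buffer has been written.
            while (buf.hasRemaining()) {
                offset += fc.write(buf, offset);
            }
        } // close() releases the descriptor; it does not flush to stable storage
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("archive", ".jsa");
        Files.write(tmp, new byte[8]);                  // eight zero bytes
        writeData(tmp, 2, new byte[] {(byte) 0xCA, (byte) 0xFE});
        byte[] back = Files.readAllBytes(tmp);          // a fresh read sees the patch
        System.out.println(back[2] == (byte) 0xCA && back[3] == (byte) 0xFE); // prints true
        Files.delete(tmp);
    }
}
```

The same pattern applies to any test that only needs a later process to observe modified bytes. Keep force() where durability across a crash actually matters.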
From mikhailo.seledtsov at oracle.com Tue Jul 16 16:48:46 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Tue, 16 Jul 2019 09:48:46 -0700 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> <47b20d2a-02b9-33b8-b2e4-6022f809e139@oracle.com> Message-ID: <79e78a82-4352-59b7-0a72-16bb18549868@oracle.com> Hi Severin, Bob, Here is an updated webrev that should address all of your feedback: http://cr.openjdk.java.net/~mseledtsov/8227122.01/ To summarize the changes since webrev 00: - using 'docker ps' to wait until the "main" container starts - removed use of --ipc=shareable (not needed) - added comments regarding sharing of /tmp - using docker volumes ("--volumes-from") to share /tmp between the containers instead of mapping to host directories (avoids potential access/permission problems) - few other minor changes and cleanup Testing: ran this test on OL 7.6 and on variety of Linux nodes in the lab, a number of times - all PASS See more of comments inline below On 7/12/19 2:12 AM, Severin Gehwolf wrote: > Hi Misha, > > On Thu, 2019-07-11 at 17:58 -0700, mikhailo.seledtsov at oracle.com wrote: >> Hi Severin, >> >> Thank you for taking a look at this change. >> >> On 7/10/19 10:40 AM, Severin Gehwolf wrote: >>> Hi Misha, >>> >>> On Tue, 2019-07-02 at 15:24 -0700, mikhailo.seledtsov at oracle.com wrote: >>>> Please review this new test that uses a Docker sidecar pattern to >>>> manage/monitor JVM running in the main payload container. >>>> >>>> Sidecar is a common pattern used in the cloud environments for >>>> monitoring among other uses. In side car pattern the main >>>> application/service container that runs the payload is paired with a >>>> sidecar container.
It is achieved by sharing certain namespace >>>> aspects >>>> between the two containers such as PID namespace, specific >>>> sub-directories, IPC and more. >>>> >>>> This test implements the following cases: >>>> - "jcmd -l" to list java processes running in "main" container >>>> from >>>> the "sidecar" container >>>> - "jhsdb jinfo" in the sidecar configuration >>>> - jcmd >>>> >>>> This change also builds a basis for more test cases in the future. >>>> >>>> Minor changes were done to DockerTestUtils: >>>> - changing access to DOCKER_COMMAND constant to public >>>> - minor spelling and terminology corrections >>>> >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 >>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ >>>> Testing: >>>> 1. ran Docker tests on Linux-x64 - PASS >>>> 2. Running Docker tests in test cluster - in progress >>>> >>> // JCMD does not work in sidecar configuration, except for "jcmd -l". >>> // Including this test case to assist in reproduction of the problem. >>> // t.assertIsAlive(); >>> // testCase03(mainProcPid); >>> >>> FWIW, "jcmd -l" doesn't work in this case either. It only sees itself >>> as far as I can tell. >> In my experiment it does work. Here are parts of the test log, first the >> command that runs jcmd in a sidecar container, then the output of that >> container: >> >> """ >> >> [COMMAND] >> >> /usr/local/bin/docker run --tty=true --rm --cap-add=SYS_PTRACE >> --sig-proxy=true --pid=container:test-container-main >> --ipc=container:test-container-main -v >> /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/ >> jdk-internal:test-jfr-jcmd /jdk/bin/jcmd -l >> >> [2019-07-12T00:26:29.083764Z] Gathering output for process 8703 >> [ELAPSED: 5 ms] >> [STDERR] >> >> [STDOUT] >> 1 EventGeneratorLoop 15 >> 23 jdk.jcmd/sun.tools.jcmd.JCmd -l >> >> """ >> >> The output shows 2 processes, one is EventGeneratorLoop with PID of 1 >> (as expected). 
This is possible because the containers share certain >> namespaces and mounted volumes in a 'sidecar' configuration. In this >> case, containers share the PID namespace >> (--pid=container:test-container-main) and share volumes mounted as >> "/tmp" inside the container (-v >> /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/) >> > Right, sorry. Perhaps this code should get a comment that sharing /tmp > between sidecar and host container is needed for jvmstat - used > internally by the attach mechanism - to work. See > HotSpotAttachProvider.testAttachable(): > > + String[] command = new String[] { > + DockerTestUtils.DOCKER_COMMAND, "run", > + "--tty=true", "--rm", > + "--cap-add=SYS_PTRACE", "--sig-proxy=true", > + "--pid=container:" + MAIN_CONTAINER_NAME, > + "--pid=container:" + MAIN_CONTAINER_NAME, > + "--ipc=container:" + MAIN_CONTAINER_NAME, > + "-v", WORK_DIR + ":/tmp/", > > I believe -XX:+UsePerfData would be in order too as I don't think > things would work if that default changed. I have added the comments and added -XX:+UsePerfData for the "main" JVM process. > >>> What's more, this seems to be a case of AttachListener::is_init_trigger[1] and >>> VirtualMachineImpl.createAttachFile[2] disagreeing. The former looks in >>> $(pwd)/.attach_pid or /tmp/.attach_pid and the latter creates >>> it in /proc//root/tmp/.attach_pid. > This seems to be the cause for why testCase03 doesn't work. Perhaps > this deserves a bug and I can help fix it. > > While looking at that, I discovered what I said below, which is a > different case I know. Once these tests are integrated I will file a bug, and can reference the test from that bug. > >>> There seems to be more issues involved. As attaching to a JVM inside a >>> container doesn't seem to work from outside which is supposed to be >>> fixed with JDK-8179498. That alone seems to warrant a bug. >> You are describing a slightly different use case / pattern, but I agree >> it does not seem to work. 
I am happy to hear confirmation of that. > I was pointing out that JDK-8179498 seems to have regressed. It's > unrelated but should be taken into account when fixing the above issue. > >> The pattern addressed in this test is a side car, where both the >> observer and observee run in containers; the containers are 'friendly' >> by sharing certain apsects of namespaces. > Yes. > >> The use case you are describing is somewhat different, if I understand >> correctly: the observer runs on a host machine, and obsrvee runs in a >> container. Observer tries to use jcmd to list the java processes running >> in container(s), and issue commands, but that fails. I can create a bug >> for that, and a simple test case. > There should be a bug and a test so that it cannot again regress. > > JDK-8193710 is also related, but the fix for that bug didn't have a > test either :( That's this one which needs fixing: > https://bugs.openjdk.java.net/browse/JDK-8195809 JDK-8195809: [TESTBUG] Create tests for JDK-8193710 jps and jcmd -l support for Docker containers I have assigned it to myself, and will be working on it soon. Thank you, Misha > >>> private static DockerThread startMainContainer() throws Exception { >>> // start "main" container (the observee) >>> DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); >>> opts.addDockerOpts("--cap-add=SYS_PTRACE", "--ipc=shareable") >>> >>> Is '--ipc=shareable' really needed? It's not a supported option for my >>> docker here :-( >> I have removed the '--ipc=shareable' and the test still works. I think >> this is extra stuff that is not necessary for this test case, so I will >> remove it. > Excellent! > >> I will incorporate changes from your and Bob's review, run some testing, >> and post an updated webrev.
> Thanks, > Severin > >>> [1] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/hotspot/os/linux/attachListener_linux.cpp#l500 >>> [2] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#l295 >>> From richard.reingruber at sap.com Tue Jul 16 17:23:15 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 16 Jul 2019 17:23:15 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for better performance when debugging Message-ID: // repost including hotspot-compiler-dev at openjdk.java.net Hi, Please review this implementation for the RFE to enable escape analysis when debugging. Bug: https://bugs.openjdk.java.net/browse/JDK-8227745 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.0/ In short the JVMTI implementation is changed to revert EA based optimizations just before objects escape through JVMTI. At runtime there is no escape information for each object in scope. Instead each scope is annotated, if non-escaping objects exist and if some are passed as parameters. If a JVMTI agent accesses a reference on stack, then the owning compiled frame C is deoptimized, if any non-escaping object is in scope. Scalar replaced objects are reallocated on the heap and objects with eliminated locking are relocked. This is called "deoptimizing objects" for short. If the agent accesses a reference in a callee frame of C and C is passing any non-escaping object as argument then C and its objects are deoptimized as well. Deoptimized objects are kept as deferred updates (preexisting JavaThread::_deferred_locals_updates). Either all objects of a compiled frame are deoptimized or none. It is annotated at the corresponding deferred updates if it happened already in order to avoid doing it twice. There is preexisting code to deoptimize objects when deoptimizing a compiled frame. 
The code was extended to be able to deoptimize objects of a frame that is not the top frame and to let another thread than the owning thread do it. Thanks, Richard. From martin.doerr at sap.com Tue Jul 16 20:15:26 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 16 Jul 2019 20:15:26 +0000 Subject: RFR(S): 8227692: Remove develop feature PrintMallocStatistics In-Reply-To: <84fbdef1-3735-7344-2d97-ebcc38213140@oracle.com> References: <84fbdef1-3735-7344-2d97-ebcc38213140@oracle.com> Message-ID: Hi Dan, I've added the proposal to use "-XX:NativeMemoryTracking=summary -XX:+PrintNMTStatistics" instead. Thanks, Martin > -----Original Message----- > From: Daniel D. Daugherty > Sent: Tuesday, 16 July 2019 18:04 > To: Doerr, Martin ; hotspot-runtime- > dev at openjdk.java.net > Cc: Baesken, Matthias > Subject: Re: RFR(S): 8227692: Remove develop feature PrintMallocStatistics > > For anyone that happens to be searching JBS for what happened to the > '-XX:+PrintMallocStatistics' option, you might want to include some > guidance on how they get the equivalent information from NMT... > > A short note in JDK-8227692 should suffice... > > Dan > > > On 7/15/19 4:16 PM, Doerr, Martin wrote: > > Hi, > > > > as announced on hotspot-dev, I'd like to remove the debug build feature > for allocation statistics "AllocStats" (controlled by develop flag - > XX:+PrintMallocStatistics). > > I've closed JDK-8227597 which was a proposal to reduce the performance impact of it, but > several people have suggested to remove this feature which is even better > IMHO. > > > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8227692 > > > > Webrev: > > > http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/ > webrev.00/ > > > > I've also taken over the reworked inc_stat_counter from JDK- > 8227597 > (allocation.inline.hpp).
> > > > Best regards, > > Martin > > From daniel.daugherty at oracle.com Tue Jul 16 22:20:14 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 16 Jul 2019 18:20:14 -0400 Subject: RFR(S): 8227692: Remove develop feature PrintMallocStatistics In-Reply-To: References: <84fbdef1-3735-7344-2d97-ebcc38213140@oracle.com> Message-ID: Thanks. Sounds good. Dan On 7/16/19 4:15 PM, Doerr, Martin wrote: > Hi Dan, > > I've added the proposal to use "-XX:NativeMemoryTracking=summary -XX:+PrintNMTStatistics" instead. > > Thanks, > Martin > > >> -----Original Message----- >> From: Daniel D. Daugherty >> Sent: Dienstag, 16. Juli 2019 18:04 >> To: Doerr, Martin ; hotspot-runtime- >> dev at openjdk.java.net >> Cc: Baesken, Matthias >> Subject: Re: RFR(S): 8227692: Remove develop feature PrintMallocStatistics >> >> For anyone that happens to be searching JBS for what happened to the >> '-XX:+PrintMallocStatistics' option, you might want to include some >> guidance on how they get the equivalent information from NMT... >> >> A short note in JDK-8227692 should suffice... >> >> Dan >> >> >> On 7/15/19 4:16 PM, Doerr, Martin wrote: >>> Hi, >>> >>> as announced on hotspot-dev, I'd like to remove the debug build feature >> for allocation statistics "AllocStats" (controlled by develop flag - >> XX:+PrintMallocStatistics). >>> I've closed JDK-8227597> 8227597> which was a proposal to reduce the performance impact of it, but >> several people have suggested to remove this feature which is even better >> IMHO. >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8227692 >>> >>> Webrev: >>> >> http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/ >> webrev.00/ >>> I've also taken over the reworked inc_stat_counter from JDK- >> 8227597 >> (allocation.inline.hpp). >>> Please review. 
>>> >>> Best regards, >>> Martin >>> From calvin.cheung at oracle.com Wed Jul 17 03:09:57 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Tue, 16 Jul 2019 20:09:57 -0700 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> <6fd396d2-9368-2b8d-c150-39cc2cfd8217@oracle.com> Message-ID: <584b6d93-094f-80c0-f0a7-1da5b3383b76@oracle.com> Removing the fc.force(true) calls works and the test performs better on macosx. It takes about 20s for the test to finish without the call vs. up to 9 min. with the call. updated webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.01/ Ran tier1 and 2 tests successfully and repeated tier2 test 10 times on a Mac host from which the timeout was observed. thanks, Calvin On 7/16/19 9:09 AM, Calvin Cheung wrote: > Hi Ioi, > > I found 2 fc.force(true) calls in the test. I've removed both and > testing it without increasing the timeout value. > > I test it by running the hotspot_tier2_runtime test group 10 times on > mac hosts. Each iteration takes about 30 min. Will let you know about > the results. > > thanks, > > Calvin > > On 7/16/19 8:25 AM, Ioi Lam wrote: >> HI Calvin, >> >> Since the test is stuck at here at the timeout: >> >> at sun.nio.ch.FileDispatcherImpl.force0(java.base at 13-ea/Native Method) >> at >> sun.nio.ch.FileDispatcherImpl.force(java.base at 13-ea/FileDispatcherImpl.java:82) >> at >> sun.nio.ch.FileChannelImpl.force(java.base at 13-ea/FileChannelImpl.java:461) >> at SharedArchiveConsistency.writeData(SharedArchiveConsistency.java:166) >> >> Maybe we should remove the calls to FileChannel.force()? According to >> the javadoc, this call is for "ensuring that critical information is >> not lost in the event of a system crash", which I think is not >> necessary in our test.
>> >> src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c: >> >> JNIEXPORT jint JNICALL >> Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this, >> jobject fdo, jboolean md) >> { >> jint fd = fdval(env, fdo); >> int result = 0; >> >> #ifdef MACOSX >> result = fcntl(fd, F_FULLFSYNC); >> if (result == -1 && errno == ENOTSUP) { >> /* Try fsync() in case F_FULLSYUNC is not implemented on the >> file system. */ >> result = fsync(fd); >> } >> >> https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html >> >> >> F_FULLFSYNC Does the same thing as fsync(2) then asks the >> drive to >> flush all buffered data to the permanent storage >> device (arg is ignored). This is currently >> implemented >> on HFS, MS-DOS (FAT), and Universal Disk Format >> (UDF) file systems. The operation may take >> quite a >> while to complete. >> >> Thanks >> - Ioi >> >> >> On 7/16/19 7:59 AM, Calvin Cheung wrote: >>> Dan, >>> >>> Thanks for your review! >>> >>> On 7/16/19 5:56 AM, Daniel D. Daugherty wrote: >>>> On 7/16/19 12:31 AM, Calvin Cheung wrote: >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 >>>>> >>>>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ >>>> >>>> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> Does the test intentionally crash in one or more of the test >>>> cases? >>>> If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. >>>> I don't think '-XX:-CreateCoredumpOnCrash' will prevent the >>>> timeout >>>> handling mechanism from trying to capture a core file in the case >>>> of a timeout. >>> No, the test does not crash intentionally. Thanks for clarifying the >>> -XX:-CreateCoredumpOnCrash. I will revert the change.
>>>> >>>> The test currently timed out with a default total timeout value of >>>> 480 seconds; that 480 comes from the default timeout value of 120 >>>> seconds and the default timeout factor of 4 (480 == 120 * 4). >>>> >>>> The 'timeout=1000' value will get you a total timeout value of >>>> 4000. >>>> I suspect that is not what you want. >>>> >>>> If you specify 'timeout=240', you'll get a total timeout value of >>>> 960 seconds (240 * 4). >>> >>> I've seen the total elapsed time for the test got very close to >>> 960s. So to be on the safe side, I would set the timeout=300 as >>> follows: >>> >>> diff --git >>> a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>> b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>> --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>> +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>> @@ -35,7 +35,7 @@ >>> * @build sun.hotspot.WhiteBox >>> * @compile test-classes/Hello.java >>> * @run driver ClassFileInstaller sun.hotspot.WhiteBox >>> - * @run main/othervm -Xbootclasspath/a:. >>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >>> SharedArchiveConsistency >>> + * @run main/othervm/timeout=300 -Xbootclasspath/a:. >>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >>> SharedArchiveConsistency >>> */ >>> import jdk.test.lib.process.OutputAnalyzer; >>> import jdk.test.lib.Utils; >>> >>> I will do more testing with the above timeout before pushing the >>> change. >>> >>> Let me know if you'd like to see another webrev. >>> >>> thanks, >>> >>> Calvin >>> >>>> >>>> Dan >>>> >>>> >>>>> >>>>> Increase the timeout to 1000s and add the >>>>> -XX:-CreateCoredumpOnCrash option to disable coredump. >>>>> >>>>> Testing: on 2 macosx hosts on which the timeout was observed.
>>>>> >>>>> >>>>> thanks, >>>>> >>>>> Calvin >>>>> >>>> >> From ioi.lam at oracle.com Wed Jul 17 03:26:54 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 16 Jul 2019 20:26:54 -0700 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: <584b6d93-094f-80c0-f0a7-1da5b3383b76@oracle.com> References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> <6fd396d2-9368-2b8d-c150-39cc2cfd8217@oracle.com> <584b6d93-094f-80c0-f0a7-1da5b3383b76@oracle.com> Message-ID: <8a5e2678-b378-a6d4-0781-f03763a91101@oracle.com> Looks good. Thanks! - Ioi On 7/16/19 8:09 PM, Calvin Cheung wrote: > Removing the fc.force(true) calls works and the test performs better > on macosx. It takes about 20s for the test to finish without the call > vs. up to 9 min. with the call. > > updated webrev: > > http://cr.openjdk.java.net/~ccheung/8227646/webrev.01/ > > Ran tier1 and 2 tests successfully and repeated tier2 test 10 times on > a Mac host from which the timeout was observed. > > thanks, > > Calvin > > On 7/16/19 9:09 AM, Calvin Cheung wrote: >> Hi Ioi, >> >> I found 2 fc.force(true) calls in the test. I've removed both and >> testing it without increasing the timeout value. >> >> I test it by running the hotspot_tier2_runtime test group 10 times on >> mac hosts. Each iteration takes about 30 min. Will let you know about >> the results. >> >> thanks, >> >> Calvin >> >> On 7/16/19 8:25 AM, Ioi Lam wrote: >>> Hi Calvin, >>> >>> Since the test is stuck at here at the timeout: >>> >>> at sun.nio.ch.FileDispatcherImpl.force0(java.base at 13-ea/Native Method) >>> at >>> sun.nio.ch.FileDispatcherImpl.force(java.base at 13-ea/FileDispatcherImpl.java:82) >>> at >>> sun.nio.ch.FileChannelImpl.force(java.base at 13-ea/FileChannelImpl.java:461) >>> at >>> SharedArchiveConsistency.writeData(SharedArchiveConsistency.java:166) >>> >>> Maybe we should remove the calls to FileChannel.force()?
According >>> to the javadoc, this call is for "ensuring that critical information >>> is not lost in the event of a system crash", which I think is not >>> necessary in our test. >>> >>> src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c: >>> >>> JNIEXPORT jint JNICALL >>> Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this, >>> jobject fdo, jboolean md) >>> { >>> jint fd = fdval(env, fdo); >>> int result = 0; >>> >>> #ifdef MACOSX >>> result = fcntl(fd, F_FULLFSYNC); >>> if (result == -1 && errno == ENOTSUP) { >>> /* Try fsync() in case F_FULLSYUNC is not implemented on the >>> file system. */ >>> result = fsync(fd); >>> } >>> >>> https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html >>> >>> >>> F_FULLFSYNC Does the same thing as fsync(2) then asks >>> the drive to >>> flush all buffered data to the permanent >>> storage >>> device (arg is ignored). This is currently >>> implemented >>> on HFS, MS-DOS (FAT), and Universal Disk Format >>> (UDF) file systems. The operation may take >>> quite a >>> while to complete. >>> >>> Thanks >>> - Ioi >>> >>> >>> On 7/16/19 7:59 AM, Calvin Cheung wrote: >>>> Dan, >>>> >>>> Thanks for your review! >>>> >>>> On 7/16/19 5:56 AM, Daniel D. Daugherty wrote: >>>>> On 7/16/19 12:31 AM, Calvin Cheung wrote: >>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 >>>>>> >>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ >>>>> >>>>> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>>> Does the test intentionally crash in one or more of the test >>>>> cases? >>>>> If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. >>>>>
I don't think '-XX:-CreateCoredumpOnCrash' will prevent the >>>>> timeout >>>>> handling mechanism from trying to capture a core file in the case >>>>> of a timeout. >>>> No, the test does not crash intentionally. Thanks for clarifying >>>> the -XX:-CreateCoredumpOnCrash. I will revert the change. >>>>> >>>>> The test currently timed out with a default total timeout >>>>> value of >>>>> 480 seconds; that 480 comes from the default timeout value of 120 >>>>> seconds and the default timeout factor of 4 (480 == 120 * 4). >>>>> >>>>> The 'timeout=1000' value will get you a total timeout value of >>>>> 4000. >>>>> I suspect that is not what you want. >>>>> >>>>> If you specify 'timeout=240', you'll get a total timeout value of >>>>> 960 seconds (240 * 4). >>>> >>>> I've seen the total elapsed time for the test got very close to >>>> 960s. So to be on the safe side, I would set the timeout=300 as >>>> follows: >>>> >>>> diff --git >>>> a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> @@ -35,7 +35,7 @@ >>>> * @build sun.hotspot.WhiteBox >>>> * @compile test-classes/Hello.java >>>> * @run driver ClassFileInstaller sun.hotspot.WhiteBox >>>> - * @run main/othervm -Xbootclasspath/a:. >>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >>>> SharedArchiveConsistency >>>> + * @run main/othervm/timeout=300 -Xbootclasspath/a:. >>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >>>> SharedArchiveConsistency >>>> */ >>>> import jdk.test.lib.process.OutputAnalyzer; >>>> import jdk.test.lib.Utils; >>>> >>>> I will do more testing with the above timeout before pushing the >>>> change. >>>> >>>> Let me know if you'd like to see another webrev.
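The timeout arithmetic in Dan's review (effective limit = per-test timeout multiplied by the timeout factor) can be checked with a small sketch. This is not jtreg source code; it only assumes, as stated in the thread, a default per-test timeout of 120 seconds and a timeout factor of 4 (the factor is normally supplied to jtreg through its -timeoutFactor option and may differ in other setups).

```java
// Sketch only, not jtreg source: reproduces the timeout arithmetic from
// this thread under the stated assumptions.
public class JtregTimeoutMath {

    // Default per-test timeout in seconds, as stated in the thread.
    static final int DEFAULT_TIMEOUT_SECONDS = 120;

    // Timeout factor in the configuration discussed here (assumed; other
    // setups may pass a different value to jtreg).
    static final int TIMEOUT_FACTOR = 4;

    // Effective limit = per-test timeout (the default, or N from
    // @run main/othervm/timeout=N) multiplied by the timeout factor.
    static int effectiveTimeoutSeconds(int perTestTimeoutSeconds, int timeoutFactor) {
        return perTestTimeoutSeconds * timeoutFactor;
    }

    public static void main(String[] args) {
        int[] perTest = {DEFAULT_TIMEOUT_SECONDS, 240, 300, 1000};
        for (int t : perTest) {
            System.out.println("timeout=" + t + " -> "
                    + effectiveTimeoutSeconds(t, TIMEOUT_FACTOR) + " s");
        }
        // Prints 480, 960, 1200 and 4000 seconds respectively.
    }
}
```

Under these assumptions, timeout=300 yields a 1200-second limit, leaving headroom over the roughly 960 seconds Calvin observed, whereas timeout=1000 would allow 4000 seconds.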
>>>> >>>> thanks, >>>> >>>> Calvin >>>> >>>>> >>>>> Dan >>>>> >>>>> >>>>>> >>>>>> Increase the timeout to 1000s and add the >>>>>> -XX:-CreateCoredumpOnCrash option to disable coredump. >>>>>> >>>>>> Testing: on 2 macosx hosts on which the timeout was observed. >>>>>> >>>>>> >>>>>> thanks, >>>>>> >>>>>> Calvin >>>>>> >>>>> >>> From sgehwolf at redhat.com Wed Jul 17 13:23:29 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Wed, 17 Jul 2019 15:23:29 +0200 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: <79e78a82-4352-59b7-0a72-16bb18549868@oracle.com> References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> <47b20d2a-02b9-33b8-b2e4-6022f809e139@oracle.com> <79e78a82-4352-59b7-0a72-16bb18549868@oracle.com> Message-ID: Hi Misha, On Tue, 2019-07-16 at 09:48 -0700, mikhailo.seledtsov at oracle.com wrote: > Hi Severin, Bob, > > Here is an updated webrev that should address all of your feedback: > > http://cr.openjdk.java.net/~mseledtsov/8227122.01/ > > To summarize the changes since webrev 00: > > - using 'docker ps' to wait until the "main" container starts > > - removed use of --ipc=shareable (not needed) > > - added comments regarding sharing of /tmp > > - using docker volumes ("--volumes-from") to share /tmp between > the containers instead of mapping to host directories (avoids potential > access/permission problems) > > - few other minor changes and cleanup Looks good. A few nits: + static class DockerThread extends Thread { + DockerRunOptions runOpts; + OutputAnalyzer out; 'out' instance variable is never used and could get removed. I don't need to see another webrev for this. + // The "sidecar" container shares "/tmp" directory with the "main" container for the + // JVM attach mechanism to work. JCMD relies on the attach mechanism (com.sun.tools.attach), + // which in turn relies on JVMSTAT mechanism, which uses Unix socket file hsperf_ residing in + // the /tmp directory. 
I'd rephrase the last sentence to: """ JCMD relies on the attach mechanism (com.sun.tools.attach), which in turn relies on JVMSTAT mechanism, which puts its mapped buffers in /tmp directory (hsperfdata_). Thus, we mount /tmp via --volumes-from from the main container. """ Thanks, Severin > Testing: > > ran this test on OL 7.6 and on variety of Linux nodes in the lab, > a number of times - all PASS > > > See more of comments inline below > > On 7/12/19 2:12 AM, Severin Gehwolf wrote: > > Hi Misha, > > > > On Thu, 2019-07-11 at 17:58 -0700, mikhailo.seledtsov at oracle.com wrote: > > > Hi Severin, > > > > > > Thank you for taking a look at this change. > > > > > > On 7/10/19 10:40 AM, Severin Gehwolf wrote: > > > > Hi Misha, > > > > > > > > On Tue, 2019-07-02 at 15:24 -0700, mikhailo.seledtsov at oracle.com wrote: > > > > > Please review this new test that uses a Docker sidecar pattern to > > > > > manage/monitor JVM running in the main payload container. > > > > > > > > > > Sidecar is a common pattern used in the cloud environments for > > > > > monitoring among other uses. In side car pattern the main > > > > > application/service container that runs the payload is paired with a > > > > > sidecar container. It is achieved by sharing certain namespace > > > > > aspects > > > > > between the two containers such as PID namespace, specific > > > > > sub-directories, IPC and more. > > > > > > > > > > This test implements the following cases: > > > > > - "jcmd -l" to list java processes running in "main" container > > > > > from > > > > > the "sidecar" container > > > > > - "jhsdb jinfo" in the sidecar configuration > > > > > - jcmd > > > > > > > > > > This change also builds a basis for more test cases in the future.
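The sidecar arrangement described above can be illustrated by assembling the docker run command a sidecar container might use. This is a sketch, not the test's actual code: the container name, image name, and tool path are placeholders, and it assumes the main container declares /tmp as a volume, so that --volumes-from (the approach adopted in the updated webrev) actually exposes the jvmstat files to the sidecar.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: builds a "sidecar" docker run command of the kind discussed
// in this thread. Container name, image, and tool path are placeholders.
public class SidecarCommand {

    static List<String> sidecarCommand(String docker, String mainContainer,
                                       String image, String... toolCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add(docker);
        cmd.add("run");
        cmd.add("--tty=true");
        cmd.add("--rm");
        // Grants ptrace permission, which tools such as jhsdb rely on.
        cmd.add("--cap-add=SYS_PTRACE");
        // Join the main container's PID namespace so its JVM is visible
        // (in the thread's log it shows up as PID 1).
        cmd.add("--pid=container:" + mainContainer);
        // Mount the main container's volumes; this shares /tmp (and the
        // jvmstat hsperfdata_* files in it) only if the main container
        // declares /tmp as a volume.
        cmd.add("--volumes-from");
        cmd.add(mainContainer);
        cmd.add(image);
        Collections.addAll(cmd, toolCommand);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = sidecarCommand("docker", "test-container-main",
                "example-jdk-image", "/jdk/bin/jcmd", "-l");
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the command as a List<String> makes it straightforward to hand to a process builder or to a test utility that accepts an argument list.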
> > > > > > > > > > Minor changes were done to DockerTestUtils: > > > > > - changing access to DOCKER_COMMAND constant to public > > > > > - minor spelling and terminology corrections > > > > > > > > > > > > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 > > > > > Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ > > > > > Testing: > > > > > 1. ran Docker tests on Linux-x64 - PASS > > > > > 2. Running Docker tests in test cluster - in progress > > > > > > > > > // JCMD does not work in sidecar configuration, except for "jcmd -l". > > > > // Including this test case to assist in reproduction of the problem. > > > > // t.assertIsAlive(); > > > > // testCase03(mainProcPid); > > > > > > > > FWIW, "jcmd -l" doesn't work in this case either. It only sees itself > > > > as far as I can tell. > > > In my experiment it does work. Here are parts of the test log, first the > > > command that runs jcmd in a sidecar container, then the output of that > > > container: > > > > > > """ > > > > > > [COMMAND] > > > > > > /usr/local/bin/docker run --tty=true --rm --cap-add=SYS_PTRACE > > > --sig-proxy=true --pid=container:test-container-main > > > --ipc=container:test-container-main -v > > > /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/ > > > jdk-internal:test-jfr-jcmd /jdk/bin/jcmd -l > > > > > > [2019-07-12T00:26:29.083764Z] Gathering output for process 8703 > > > [ELAPSED: 5 ms] > > > [STDERR] > > > > > > [STDOUT] > > > 1 EventGeneratorLoop 15 > > > 23 jdk.jcmd/sun.tools.jcmd.JCmd -l > > > > > > """ > > > > > > The output shows 2 processes, one is EventGeneratorLoop with PID of 1 > > > (as expected). This is possible because the containers share certain > > > namespaces and mounted volumes in a 'sidecar' configuration. 
In this > > > case, containers share the PID namespace > > > (--pid=container:test-container-main) and share volumes mounted as > > > "/tmp" inside the container (-v > > > /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/) > > > > > Right, sorry. Perhaps this code should get a comment that sharing /tmp > > between sidecar and host container is needed for jvmstat - used > > internally by the attach mechanism - to work. See > > HotSpotAttachProvider.testAttachable(): > > > > + String[] command = new String[] { > > + DockerTestUtils.DOCKER_COMMAND, "run", > > + "--tty=true", "--rm", > > + "--cap-add=SYS_PTRACE", "--sig-proxy=true", > > + "--pid=container:" + MAIN_CONTAINER_NAME, > > + "--pid=container:" + MAIN_CONTAINER_NAME, > > + "--ipc=container:" + MAIN_CONTAINER_NAME, > > + "-v", WORK_DIR + ":/tmp/", > > > > I believe -XX:+UsePerfData would be in order too as I don't think > > things would work if that default changed. > I have added the comments and added -XX:+UsePerfData for the "main" JVM > process. > > > > What's more, this seems to be a case of AttachListener::is_init_trigger[1] and > > > > VirtualMachineImpl.createAttachFile[2] disagreeing. The former looks in > > > > $(pwd)/.attach_pid or /tmp/.attach_pid and the latter creates > > > > it in /proc//root/tmp/.attach_pid. > > This seems to be the cause for why testCase03 doesn't work. Perhaps > > this deserves a bug and I can help fix it. > > > > While looking at that, I discovered what I said below, which is a > > different case I know. > Once these tests are integrated I will file a bug, and can reference the > test from that bug. > > > > There seems to be more issues involved. As attaching to a JVM inside a > > > > container doesn't seem to work from outside which is supposed to be > > > > fixed with JDK-8179498. That alone seems to warrant a bug. > > > You are describing a slightly different use case / pattern, but I agree > > > it does not seem to work. I am happy to hear confirmation of that. 
> > I was pointing out that JDK-8179498 seems to have regressed. It's > > unrelated but should be taken into account when fixing the above issue. > > > > > The pattern addressed in this test is a side car, where both the > > > observer and observee run in containers; the containers are 'friendly' > > > by sharing certain aspects of namespaces. > > Yes. > > > > > The use case you are describing is somewhat different, if I understand > > > correctly: the observer runs on a host machine, and observee runs in a > > > container. Observer tries to use jcmd to list the java processes running > > > in container(s), and issue commands, but that fails. I can create a bug > > > for that, and a simple test case. > > There should be a bug and a test so that it cannot again regress. > > > > JDK-8193710 is also related, but the fix for that bug didn't have a > > test either :( That's this one which needs fixing: > > https://bugs.openjdk.java.net/browse/JDK-8195809 > > JDK-8195809: [TESTBUG] Create tests for JDK-8193710 jps and jcmd -l > support for Docker containers > > I have assigned it to myself, and will be working on it soon. > > > Thank you, > > Misha > > > > > private static DockerThread startMainContainer() throws Exception { > > > > // start "main" container (the observee) > > > > DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); > > > > opts.addDockerOpts("--cap-add=SYS_PTRACE", "--ipc=shareable") > > > > > > > > Is '--ipc=shareable' really needed? It's not a supported option for my > > > > docker here :-( > > > I have removed the '--ipc=shareable' and the test still works. I think > > > this is extra stuff that is not necessary for this test case, so I will > > > remove it. > > Excellent! > > > > > I will incorporate changes from your and Bob's review, run some testing, > > > and post an updated webrev.
> > Thanks, > > Severin > > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/hotspot/os/linux/attachListener_linux.cpp#l500 > > > > [2] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#l295 > > > > From daniel.daugherty at oracle.com Wed Jul 17 13:41:59 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 17 Jul 2019 09:41:59 -0400 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: <584b6d93-094f-80c0-f0a7-1da5b3383b76@oracle.com> References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> <6fd396d2-9368-2b8d-c150-39cc2cfd8217@oracle.com> <584b6d93-094f-80c0-f0a7-1da5b3383b76@oracle.com> Message-ID: <3d03428f-897a-c6f5-ac97-0b6705fa498d@oracle.com> On 7/16/19 11:09 PM, Calvin Cheung wrote: > Removing the fc.force(true) calls works and the test performs better > on macosx. It takes about 20s for the test to finish without the call > vs. up to 9 min. with the call. > > updated webrev: > > http://cr.openjdk.java.net/~ccheung/8227646/webrev.01/ test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java I'm assuming that the fc.force(true) calls were there out of an abundance of caution/paranoia and that there is nothing in the test that is relying on the data being properly flushed to disk. Ioi states: "which I think is not necessary in our test" and I don't see anything obvious in the test is relying on complete data flushing. Thumbs up. Dan > > Ran tier1 and 2 tests successfully and repeated tier2 test 10 times on > a Mac host from which the timeout was observed. > > thanks, > > Calvin > > On 7/16/19 9:09 AM, Calvin Cheung wrote: >> Hi Ioi, >> >> I found 2 fc.force(true) calls in the test. I've removed both and >> testing it without increasing the timeout value.
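Dropping fc.force(true) is reasonable here because bytes written through a FileChannel are visible to later reads of the same file on the same machine; force() only adds durability against a system crash, and on macOS the native force0 shown above goes through fcntl(F_FULLFSYNC), which can be slow. The sketch below performs a positional overwrite without force(). writeData here is a hypothetical helper written for illustration, not the test's actual method.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class OverwriteWithoutForce {

    // Hypothetical helper: overwrite data.length bytes of an existing file,
    // starting at offset, without calling fc.force(true).
    static void writeData(Path file, long offset, byte[] data) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            long pos = offset;
            // A positional write may write fewer bytes than requested,
            // so keep writing until the buffer is drained.
            while (buf.hasRemaining()) {
                pos += fc.write(buf, pos);
            }
            // No fc.force(true): the bytes are already visible to later reads
            // of this file; force() would only make them durable across a
            // system crash, at the cost of F_FULLFSYNC on macOS.
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("archive", ".bin");
        try {
            Files.write(tmp, new byte[8]);            // eight zero bytes
            writeData(tmp, 2, new byte[] {1, 2, 3});  // overwrite bytes 2..4
            System.out.println(Arrays.toString(Files.readAllBytes(tmp)));
            // prints [0, 0, 1, 2, 3, 0, 0, 0]
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A test that only reads the modified file back within the same run gets the same result with or without force(); the flush matters only if the machine might crash before the data reaches the disk.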
>> >> I test it by running the hotspot_tier2_runtime test group 10 times on >> mac hosts. Each iteration takes about 30 min. Will let you know about >> the results. >> >> thanks, >> >> Calvin >> >> On 7/16/19 8:25 AM, Ioi Lam wrote: >>> Hi Calvin, >>> >>> Since the test is stuck at here at the timeout: >>> >>> at sun.nio.ch.FileDispatcherImpl.force0(java.base at 13-ea/Native Method) >>> at >>> sun.nio.ch.FileDispatcherImpl.force(java.base at 13-ea/FileDispatcherImpl.java:82) >>> at >>> sun.nio.ch.FileChannelImpl.force(java.base at 13-ea/FileChannelImpl.java:461) >>> at >>> SharedArchiveConsistency.writeData(SharedArchiveConsistency.java:166) >>> >>> Maybe we should remove the calls to FileChannel.force()? According >>> to the javadoc, this call is for "ensuring that critical information >>> is not lost in the event of a system crash", which I think is not >>> necessary in our test. >>> >>> src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c: >>> >>> JNIEXPORT jint JNICALL >>> Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this, >>> jobject fdo, jboolean md) >>> { >>> jint fd = fdval(env, fdo); >>> int result = 0; >>> >>> #ifdef MACOSX >>> result = fcntl(fd, F_FULLFSYNC); >>> if (result == -1 && errno == ENOTSUP) { >>> /* Try fsync() in case F_FULLSYUNC is not implemented on the >>> file system. */ >>> result = fsync(fd); >>> } >>> >>> https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html >>> >>> >>> F_FULLFSYNC Does the same thing as fsync(2) then asks >>> the drive to >>> flush all buffered data to the permanent >>> storage >>> device (arg is ignored). This is currently >>> implemented >>> on HFS, MS-DOS (FAT), and Universal Disk Format >>> (UDF) file systems.
The operation may take >>> quite a >>> while to complete. >>> >>> Thanks >>> - Ioi >>> >>> >>> On 7/16/19 7:59 AM, Calvin Cheung wrote: >>>> Dan, >>>> >>>> Thanks for your review! >>>> >>>> On 7/16/19 5:56 AM, Daniel D. Daugherty wrote: >>>>> On 7/16/19 12:31 AM, Calvin Cheung wrote: >>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 >>>>>> >>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ >>>>> >>>>> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>>> Does the test intentionally crash in one or more of the test >>>>> cases? >>>>> If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. >>>>> I don't think '-XX:-CreateCoredumpOnCrash' will prevent the >>>>> timeout >>>>> handling mechanism from trying to capture a core file in the case >>>>> of a timeout. >>>> No, the test does not crash intentionally. Thanks for clarifying >>>> the -XX:-CreateCoredumpOnCrash. I will revert the change. >>>>> >>>>> The test currently timed out with a default total timeout >>>>> value of >>>>> 480 seconds; that 480 comes from the default timeout value of 120 >>>>> seconds and the default timeout factor of 4 (480 == 120 * 4). >>>>> >>>>> The 'timeout=1000' value will get you a total timeout value of >>>>> 4000. >>>>> I suspect that is not what you want. >>>>> >>>>> If you specify 'timeout=240', you'll get a total timeout value of >>>>> 960 seconds (240 * 4). >>>> >>>> I've seen the total elapsed time for the test got very close to >>>> 960s. So to be on the safe side, I would set the timeout=300 as >>>> follows: >>>> >>>> diff --git >>>> a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>> @@ -35,7 +35,7 @@ >>>>
* @build sun.hotspot.WhiteBox >>>> * @compile test-classes/Hello.java >>>> * @run driver ClassFileInstaller sun.hotspot.WhiteBox >>>> - * @run main/othervm -Xbootclasspath/a:. >>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >>>> SharedArchiveConsistency >>>> + * @run main/othervm/timeout=300 -Xbootclasspath/a:. >>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >>>> SharedArchiveConsistency >>>> */ >>>> import jdk.test.lib.process.OutputAnalyzer; >>>> import jdk.test.lib.Utils; >>>> >>>> I will do more testing with the above timeout before pushing the >>>> change. >>>> >>>> Let me know if you'd like to see another webrev. >>>> >>>> thanks, >>>> >>>> Calvin >>>> >>>>> >>>>> Dan >>>>> >>>>> >>>>>> >>>>>> Increase the timeout to 1000s and add the >>>>>> -XX:-CreateCoredumpOnCrash option to disable coredump. >>>>>> >>>>>> Testing: on 2 macosx hosts on which the timeout was observed. >>>>>> >>>>>> >>>>>> thanks, >>>>>> >>>>>> Calvin >>>>>> >>>>> >>> From coleen.phillimore at oracle.com Wed Jul 17 14:49:16 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 17 Jul 2019 10:49:16 -0400 Subject: RFR (S) 8227766: CheckUnhandledOops is broken in MemAllocator Message-ID: Summary: Save oop created in handle more eagerly, so CheckUnhandledOops doesn't bash it. Also added a test for the case that was failing. There were a couple of changes needed to compile CHECK_UNHANDLED_OOPS with slowdebug, but I reverted the makefile changes for doing that because I don't think we want it for slowdebug. Lastly, made some changes to make CheckUnhandledOops easier to debug, and removed an unnecessary clearing, since it's also done in the transition. Tested with hs-tier1-3 with -XX:+CheckUnhandledOops and without.
open webrev at http://cr.openjdk.java.net/~coleenp/2019/8227766.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8227766 Thanks, Coleen From calvin.cheung at oracle.com Wed Jul 17 15:37:22 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 17 Jul 2019 08:37:22 -0700 Subject: [13] RFR(xs) 8227646: [TESTBUG] appcds/SharedArchiveConsistency timed out In-Reply-To: <3d03428f-897a-c6f5-ac97-0b6705fa498d@oracle.com> References: <6bfa61cf-eb23-47cb-ef55-47d670a04c0b@oracle.com> <248dbad8-bf0a-71d6-69fd-176b4c55df97@oracle.com> <6fd396d2-9368-2b8d-c150-39cc2cfd8217@oracle.com> <584b6d93-094f-80c0-f0a7-1da5b3383b76@oracle.com> <3d03428f-897a-c6f5-ac97-0b6705fa498d@oracle.com> Message-ID: Dan, Ioi, Thanks for taking another look. I've pushed the changeset. Calvin On 7/17/19 6:41 AM, Daniel D. Daugherty wrote: > On 7/16/19 11:09 PM, Calvin Cheung wrote: >> Removing the fc.force(true) calls works and the test performs better >> on macosx. It takes about 20s for the test to finish without the call >> vs. up to 9 min. with the call. >> >> updated webrev: >> >> http://cr.openjdk.java.net/~ccheung/8227646/webrev.01/ > > test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java > I'm assuming that the fc.force(true) calls were there out of an > abundance of caution/paranoia and that there is nothing in the > test that is relying on the data being properly flushed to disk. > > Ioi states: "which I think is not necessary in our test" > > and I don't see anything obvious in the test is relying on > complete data flushing. > > Thumbs up. > > Dan > > >> >> Ran tier1 and 2 tests successfully and repeated tier2 test 10 times >> on a Mac host from which the timeout was observed. >> >> thanks, >> >> Calvin >> >> On 7/16/19 9:09 AM, Calvin Cheung wrote: >>> Hi Ioi, >>> >>> I found 2 fc.force(true) calls in the test. I've removed both and >>> testing it without increasing the timeout value.
>>> >>> I test it by running the hotspot_tier2_runtime test group 10 times >>> on mac hosts. Each iteration takes about 30 min. Will let you know >>> about the results. >>> >>> thanks, >>> >>> Calvin >>> >>> On 7/16/19 8:25 AM, Ioi Lam wrote: >>>> Hi Calvin, >>>> >>>> Since the test is stuck at here at the timeout: >>>> >>>> at sun.nio.ch.FileDispatcherImpl.force0(java.base at 13-ea/Native Method) >>>> at >>>> sun.nio.ch.FileDispatcherImpl.force(java.base at 13-ea/FileDispatcherImpl.java:82) >>>> at >>>> sun.nio.ch.FileChannelImpl.force(java.base at 13-ea/FileChannelImpl.java:461) >>>> at >>>> SharedArchiveConsistency.writeData(SharedArchiveConsistency.java:166) >>>> >>>> Maybe we should remove the calls to FileChannel.force()? According >>>> to the javadoc, this call is for "ensuring that critical >>>> information is not lost in the event of a system crash", which I >>>> think is not necessary in our test. >>>> >>>> src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c: >>>> >>>> JNIEXPORT jint JNICALL >>>> Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this, >>>> jobject fdo, jboolean md) >>>> { >>>> jint fd = fdval(env, fdo); >>>> int result = 0; >>>> >>>> #ifdef MACOSX >>>> result = fcntl(fd, F_FULLFSYNC); >>>> if (result == -1 && errno == ENOTSUP) { >>>> /* Try fsync() in case F_FULLSYUNC is not implemented on >>>> the file system. */ >>>> result = fsync(fd); >>>> } >>>> >>>> https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html >>>> >>>> >>>> F_FULLFSYNC Does the same thing as fsync(2) then asks >>>> the drive to >>>> flush all buffered data to the permanent >>>> storage >>>> device (arg is ignored). This is currently >>>> implemented >>>> on HFS, MS-DOS (FAT), and Universal Disk >>>> Format >>>>
(UDF) file systems. The operation may take >>>> quite a >>>> while to complete. >>>> >>>> Thanks >>>> - Ioi >>>> >>>> >>>> On 7/16/19 7:59 AM, Calvin Cheung wrote: >>>>> Dan, >>>>> >>>>> Thanks for your review! >>>>> >>>>> On 7/16/19 5:56 AM, Daniel D. Daugherty wrote: >>>>>> On 7/16/19 12:31 AM, Calvin Cheung wrote: >>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8227646 >>>>>>> >>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8227646/webrev.00/ >>>>>> >>>>>> test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>>>> Does the test intentionally crash in one or more of the test >>>>>> cases? >>>>>> If not, then '-XX:-CreateCoredumpOnCrash' is not really needed. >>>>>> I don't think '-XX:-CreateCoredumpOnCrash' will prevent the >>>>>> timeout >>>>>> handling mechanism from trying to capture a core file in the >>>>>> case >>>>>> of a timeout. >>>>> No, the test does not crash intentionally. Thanks for clarifying >>>>> the -XX:-CreateCoredumpOnCrash. I will revert the change. >>>>>> >>>>>> The test currently timed out with a default total timeout >>>>>> value of >>>>>> 480 seconds; that 480 comes from the default timeout value of >>>>>> 120 >>>>>> seconds and the default timeout factor of 4 (480 == 120 * 4). >>>>>> >>>>>> The 'timeout=1000' value will get you a total timeout value >>>>>> of 4000. >>>>>> I suspect that is not what you want. >>>>>> >>>>>> If you specify 'timeout=240', you'll get a total timeout >>>>>> value of >>>>>> 960 seconds (240 * 4). >>>>> >>>>> I've seen the total elapsed time for the test got very close to >>>>> 960s.
So to be on the safe side, I would set the timeout=300 as >>>>> follows: >>>>> >>>>> diff --git >>>>> a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>>> b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>>> --- a/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>>> +++ b/test/hotspot/jtreg/runtime/appcds/SharedArchiveConsistency.java >>>>> @@ -35,7 +35,7 @@ >>>>> * @build sun.hotspot.WhiteBox >>>>> * @compile test-classes/Hello.java >>>>> * @run driver ClassFileInstaller sun.hotspot.WhiteBox >>>>> - * @run main/othervm -Xbootclasspath/a:. >>>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >>>>> SharedArchiveConsistency >>>>> + * @run main/othervm/timeout=300 -Xbootclasspath/a:. >>>>> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >>>>> SharedArchiveConsistency >>>>> */ >>>>> import jdk.test.lib.process.OutputAnalyzer; >>>>> import jdk.test.lib.Utils; >>>>> >>>>> I will do more testing with the above timeout before pushing the >>>>> change. >>>>> >>>>> Let me know if you'd like to see another webrev. >>>>> >>>>> thanks, >>>>> >>>>> Calvin >>>>> >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> >>>>>>> Increase the timeout to 1000s and add the >>>>>>> -XX:-CreateCoredumpOnCrash option to disable coredump. >>>>>>> >>>>>>> Testing: on 2 macosx hosts on which the timeout was observed.
>>>>>>> >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Calvin >>>>>>> >>>>>> >>>> > From coleen.phillimore at oracle.com Wed Jul 17 16:22:31 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 17 Jul 2019 12:22:31 -0400 Subject: RFR (S) 8227766: CheckUnhandledOops is broken in MemAllocator In-Reply-To: References: Message-ID: Due to some concerns about MemAllocator performance, and to only create handles when needed, I have retested smaller change: open webrev at http://cr.openjdk.java.net/~coleenp/2019/8227766.02/webrev Coleen On 7/17/19 10:49 AM, coleen.phillimore at oracle.com wrote: > Summary: Save oop created in handle more eagerly, so > CheckUnhandledOops doesn't bash it. > > Also added a test for the case that was failing. There were a couple > of changes needed to compile CHECK_UNHANDLED_OOPS with slowdebug, but > I reverted the makefile changes for doing that because I don't think > we want it for slowdebug. > Lastly, made some changes to make CheckUnhandledOops easier to debug, > and removed an unnecessary clearing, since it's also done in the > transition. > > Tested with hs-tier1-3 with -XX:+CheckUnhandledOops and without. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8227766.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8227766 > > Thanks, > Coleen From erik.osterlund at oracle.com Wed Jul 17 16:31:53 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Wed, 17 Jul 2019 18:31:53 +0200 Subject: RFR (S) 8227766: CheckUnhandledOops is broken in MemAllocator In-Reply-To: References: Message-ID: Hi Coleen, Looks good!
/Erik On 2019-07-17 18:22, coleen.phillimore at oracle.com wrote: > > Due to some concerns about MemAllocator performance, and to only create > handles when needed, I have retested smaller change: > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8227766.02/webrev > > Coleen > > On 7/17/19 10:49 AM, coleen.phillimore at oracle.com wrote: >> Summary: Save oop created in handle more eagerly, so >> CheckUnhandledOops doesn't bash it. >> >> Also added a test for the case that was failing. There were a couple >> of changes needed to compile CHECK_UNHANDLED_OOPS with slowdebug, but >> I reverted the makefile changes for doing that because I don't think >> we want it for slowdebug. >> Lastly, made some changes to make CheckUnhandledOops easier to debug, >> and removed an unnecessary clearing, since it's also done in the >> transition. >> >> Tested with hs-tier1-3 with -XX:+CheckUnhandledOops and without. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8227766.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8227766 >> >> Thanks, >> Coleen > From lois.foltan at oracle.com Wed Jul 17 17:01:47 2019 From: lois.foltan at oracle.com (Lois Foltan) Date: Wed, 17 Jul 2019 13:01:47 -0400 Subject: RFR (S) 8227766: CheckUnhandledOops is broken in MemAllocator In-Reply-To: References: Message-ID: On 7/17/2019 12:22 PM, coleen.phillimore at oracle.com wrote: > > Due to some concerns about MemAllocator performance, and to only > create handles when needed, I have retested smaller change: > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8227766.02/webrev Looks good. gc/shared/memAllocator.cpp - line #373-375: Since obj is already initialized to NULL to you need the else clause? Thanks, Lois > > Coleen > > On 7/17/19 10:49 AM, coleen.phillimore at oracle.com wrote: >> Summary: Save oop created in handle more eagerly, so >> CheckUnhandledOops doesn't bash it. >> >> Also added a test for the case that was failing. There were a couple
There were a couple >> of changes needed to compile CHECK_UNHANDLED_OOPS with slowdebug, but >> I reverted the makefile changes for doing that because I don't think >> we want it for slowdebug. >> Lastly, made some changes to make CheckUnhandledOops easier to debug, >> and removed an unnecessary clearing, since it's also done in the >> transition. >> >> Tested with hs-tier1-3 with -XX:+CheckUnhandledOops and without. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8227766.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8227766 >> >> Thanks, >> Coleen > From coleen.phillimore at oracle.com Wed Jul 17 17:48:35 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 17 Jul 2019 13:48:35 -0400 Subject: RFR (S) 8227766: CheckUnhandledOops is broken in MemAllocator In-Reply-To: References: Message-ID: On 7/17/19 1:01 PM, Lois Foltan wrote: > On 7/17/2019 12:22 PM, coleen.phillimore at oracle.com wrote: >> >> Due to some concerns about MemAllocator performance, and to only >> create handles when needed, I have retested smaller change: >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8227766.02/webrev > > Looks good. > > gc/shared/memAllocator.cpp > - line #373-375: Since obj is already initialized to NULL to you need > the else clause? Yes, because CheckUnhandledOops poisons oops that are local variables, so we have to set it back to zero.? I can add that to the comment. Thanks! Coleen > > Thanks, > Lois > >> >> Coleen >> >> On 7/17/19 10:49 AM, coleen.phillimore at oracle.com wrote: >>> Summary: Save oop created in handle more eagerly, so >>> CheckUnhandledOops doesn't bash it. >>> >>> Also added a test for the case that was failing.? There were a >>> couple of changes needed to compile CHECK_UNHANDLED_OOPS with >>> slowdebug, but I reverted the makefile changes for doing that >>> because I don't think we want it for slowdebug. 
>>> Lastly, made some changes to make CheckUnhandledOops easier to >>> debug, and removed an unnecessary clearing, since it's also done in >>> the transition. >>> >>> Tested with hs-tier1-3 with -XX:+CheckUnhandledOops and without. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2019/8227766.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8227766 >>> >>> Thanks, >>> Coleen >> > From coleen.phillimore at oracle.com Wed Jul 17 17:49:53 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 17 Jul 2019 13:49:53 -0400 Subject: RFR (S) 8227766: CheckUnhandledOops is broken in MemAllocator In-Reply-To: References: Message-ID: <11b6dfc0-3077-d27d-8793-80844fb6e309@oracle.com> Thanks Erik! Coleen On 7/17/19 12:31 PM, Erik Österlund wrote: > Hi Coleen, > > Looks good! > > /Erik > > On 2019-07-17 18:22, coleen.phillimore at oracle.com wrote: >> >> Due to some concerns about MemAllocator performance, and to only >> create handles when needed, I have retested smaller change: >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8227766.02/webrev >> >> Coleen >> >> On 7/17/19 10:49 AM, coleen.phillimore at oracle.com wrote: >>> Summary: Save oop created in handle more eagerly, so >>> CheckUnhandledOops doesn't bash it. >>> >>> Also added a test for the case that was failing. There were a >>> couple of changes needed to compile CHECK_UNHANDLED_OOPS with >>> slowdebug, but I reverted the makefile changes for doing that >>> because I don't think we want it for slowdebug. >>> Lastly, made some changes to make CheckUnhandledOops easier to >>> debug, and removed an unnecessary clearing, since it's also done in >>> the transition. >>> >>> Tested with hs-tier1-3 with -XX:+CheckUnhandledOops and without.
>>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2019/8227766.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8227766 >>> >>> Thanks, >>> Coleen >> From mikhailo.seledtsov at oracle.com Wed Jul 17 20:45:58 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Wed, 17 Jul 2019 13:45:58 -0700 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> <47b20d2a-02b9-33b8-b2e4-6022f809e139@oracle.com> <79e78a82-4352-59b7-0a72-16bb18549868@oracle.com> Message-ID: Hi Severin, Thank you for the review. On 7/17/19 6:23 AM, Severin Gehwolf wrote: > Hi Misha, > > On Tue, 2019-07-16 at 09:48 -0700, mikhailo.seledtsov at oracle.com wrote: >> Hi Severin, Bob, >> >> Here is an updated webrev that should address all of your feedback: >> >> http://cr.openjdk.java.net/~mseledtsov/8227122.01/ >> >> To summarize the changes since webrev 00: >> >> - using 'docker ps' to wait until the "main" container starts >> >> - removed use of --ipc=shareable (not needed) >> >> - added comments regarding sharing of /tmp >> >> - using docker volumes ("--volumes-from") to share /tmp between >> the containers instead of mapping to host directories (avoids potential >> access/permission problems) >> >> - few other minor changes and cleanup > Looks good. A few nits: > > + static class DockerThread extends Thread { > + DockerRunOptions runOpts; > + OutputAnalyzer out; > > 'out' instance variable is never used and could get removed. I don't > need to see another webrev for this. Removed. > + // The "sidecar" container shares "/tmp" directory with the "main" container for the > + // JVM attach mechanism to work. JCMD relies on the attach mechanism (com.sun.tools.attach), > + // which in turn relies on JVMSTAT mechanism, which uses Unix socket file hsperf_ residing in > + // the /tmp directory.
> > I'd rephrase the last sentense to: > > """ > JCMD relies on the attach mechanism (com.sun.tools.attach), > which in turn relies on JVMSTAT mechanism, which puts its mapped > buffers in /tmp directory (hsperfdata_). Thus, we mount /tmp via > --volumes-from from the main container. > """ I have updated the comments. Thank you, Misha > Thanks, > Severin > >> Testing: >> >> ran this test on OL 7.6 and on variety of Linux nodes in the lab, >> a number of times - all PASS >> >> >> See more of comments inline below >> >> On 7/12/19 2:12 AM, Severin Gehwolf wrote: >>> Hi Misha, >>> >>> On Thu, 2019-07-11 at 17:58 -0700, mikhailo.seledtsov at oracle.com wrote: >>>> Hi Severin, >>>> >>>> Thank you for taking a look at this change. >>>> >>>> On 7/10/19 10:40 AM, Severin Gehwolf wrote: >>>>> Hi Misha, >>>>> >>>>> On Tue, 2019-07-02 at 15:24 -0700, mikhailo.seledtsov at oracle.com wrote: >>>>>> Please review this new test that uses a Docker sidecar pattern to >>>>>> manage/monitor JVM running in the main payload container. >>>>>> >>>>>> Sidecar is a common pattern used in the cloud environments for >>>>>> monitoring among other uses. In side car pattern the main >>>>>> application/service container that runs the payload is paired with a >>>>>> sidecar container. It is achieved by sharing certain namespace >>>>>> aspects >>>>>> between the two containers such as PID namespace, specific >>>>>> sub-directories, IPC and more. >>>>>> >>>>>> This test implements the following cases: >>>>>> - "jcmd -l" to list java processes running in "main" container >>>>>> from >>>>>> the "sidecar" container >>>>>> - "jhsdb jinfo" in the sidecar configuration >>>>>> - jcmd >>>>>> >>>>>> This change also builds a basis for more test cases in the future. 
>>>>>> >>>>>> Minor changes were done to DockerTestUtils: >>>>>> - changing access to DOCKER_COMMAND constant to public >>>>>> - minor spelling and terminology corrections >>>>>> >>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 >>>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ >>>>>> Testing: >>>>>> 1. ran Docker tests on Linux-x64 - PASS >>>>>> 2. Running Docker tests in test cluster - in progress >>>>>> >>>>> // JCMD does not work in sidecar configuration, except for "jcmd -l". >>>>> // Including this test case to assist in reproduction of the problem. >>>>> // t.assertIsAlive(); >>>>> // testCase03(mainProcPid); >>>>> >>>>> FWIW, "jcmd -l" doesn't work in this case either. It only sees itself >>>>> as far as I can tell. >>>> In my experiment it does work. Here are parts of the test log, first the >>>> command that runs jcmd in a sidecar container, then the output of that >>>> container: >>>> >>>> """ >>>> >>>> [COMMAND] >>>> >>>> /usr/local/bin/docker run --tty=true --rm --cap-add=SYS_PTRACE >>>> --sig-proxy=true --pid=container:test-container-main >>>> --ipc=container:test-container-main -v >>>> /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/ >>>> jdk-internal:test-jfr-jcmd /jdk/bin/jcmd -l >>>> >>>> [2019-07-12T00:26:29.083764Z] Gathering output for process 8703 >>>> [ELAPSED: 5 ms] >>>> [STDERR] >>>> >>>> [STDOUT] >>>> 1 EventGeneratorLoop 15 >>>> 23 jdk.jcmd/sun.tools.jcmd.JCmd -l >>>> >>>> """ >>>> >>>> The output shows 2 processes, one is EventGeneratorLoop with PID of 1 >>>> (as expected). This is possible because the containers share certain >>>> namespaces and mounted volumes in a 'sidecar' configuration. In this >>>> case, containers share the PID namespace >>>> (--pid=container:test-container-main) and share volumes mounted as >>>> "/tmp" inside the container (-v >>>> /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/) >>>> >>> Right, sorry. 
Perhaps this code should get a comment that sharing /tmp >>> between sidecar and host container is needed for jvmstat - used >>> internally by the attach mechanism - to work. See >>> HotSpotAttachProvider.testAttachable(): >>> >>> + String[] command = new String[] { >>> + DockerTestUtils.DOCKER_COMMAND, "run", >>> + "--tty=true", "--rm", >>> + "--cap-add=SYS_PTRACE", "--sig-proxy=true", >>> + "--pid=container:" + MAIN_CONTAINER_NAME, >>> + "--pid=container:" + MAIN_CONTAINER_NAME, >>> + "--ipc=container:" + MAIN_CONTAINER_NAME, >>> + "-v", WORK_DIR + ":/tmp/", >>> >>> I believe -XX:+UsePerfData would be in order too as I don't think >>> things would work if that default changed. >> I have added the comments and added -XX:+UsePerfData for the "main" JVM >> process. >>>>> What's more, this seems to be a case of AttachListener::is_init_trigger[1] and >>>>> VirtualMachineImpl.createAttachFile[2] disagreeing. The former looks in >>>>> $(pwd)/.attach_pid or /tmp/.attach_pid and the latter creates >>>>> it in /proc//root/tmp/.attach_pid. >>> This seems to be the cause for why testCase03 doesn't work. Perhaps >>> this deserves a bug and I can help fix it. >>> >>> While looking at that, I discovered what I said below, which is a >>> different case I know. >> Once these tests are integrated I will file a bug, and can reference the >> test from that bug. >>>>> There seems to be more issues involved. As attaching to a JVM inside a >>>>> container doesn't seem to work from outside which is supposed to be >>>>> fixed with JDK-8179498. That alone seems to warrant a bug. >>>> You are describing a slightly different use case / pattern, but I agree >>>> it does not seem to work. I am happy to hear confirmation of that. >>> I was pointing out that JDK-8179498 seems to have regressed. It's >>> unrelated but should be taken into account when fixing the above issue. 
>>> >>>> The pattern addressed in this test is a side car, where both the >>>> observer and observee run in containers; the containers are 'friendly' >>>> by sharing certain apsects of namespaces. >>> Yes. >>> >>>> The use case you are describing is somewhat different, if I understand >>>> correctly: the observer runs on a host machine, and obsrvee runs in a >>>> container. Observer tries to use jcmd to list the java processes running >>>> in container(s), and issue commands, but that fails. I can create a bug >>>> for that, and a simple test case. >>> There should be a bug and a test so that it cannot again regress. >>> >>> JDK-8193710 is also related, but the fix for that bug didn't have a >>> test either :( That's this one which needs fixing: >>> https://bugs.openjdk.java.net/browse/JDK-8195809 >> JDK-8195809: [TESTBUG] Create tests for JDK-8193710 jps and jcmd -l >> support for Docker containers >> >> I have assigned it to myself, and will be working on it soon. >> >> >> Thank you, >> >> Misha >> >>>>> private static DockerThread startMainContainer() throws Exception { >>>>> // start "main" container (the observee) >>>>> DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); >>>>> opts.addDockerOpts("--cap-add=SYS_PTRACE", "--ipc=shareable") >>>>> >>>>> Is '--ipc=shareable' really needed? It's not a supported option for my >>>>> docker here :-( >>>> I have removed the '--ipc=shareable' and the test still works. I think >>>> this is extra stuff that is not necessary for this test case, so I will >>>> remove it. >>> Excellent! >>> >>>> I will incorporate changes from your and Bob's review, run some testing, >>>> and post an updated webrev. 
>>> Thanks, >>> Severin >>> >>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/hotspot/os/linux/attachListener_linux.cpp#l500 >>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#l295 >>>>> From lois.foltan at oracle.com Wed Jul 17 21:43:16 2019 From: lois.foltan at oracle.com (Lois Foltan) Date: Wed, 17 Jul 2019 17:43:16 -0400 Subject: RFR (S) 8227766: CheckUnhandledOops is broken in MemAllocator In-Reply-To: References: Message-ID: On 7/17/2019 1:48 PM, coleen.phillimore at oracle.com wrote: > > > On 7/17/19 1:01 PM, Lois Foltan wrote: >> On 7/17/2019 12:22 PM, coleen.phillimore at oracle.com wrote: >>> >>> Due to some concerns about MemAllocator performance, and to only >>> create handles when needed, I have retested smaller change: >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2019/8227766.02/webrev >> >> Looks good. >> >> gc/shared/memAllocator.cpp >> - line #373-375: Since obj is already initialized to NULL to you need >> the else clause? > > Yes, because CheckUnhandledOops poisons oops that are local variables, > so we have to set it back to zero. I can add that to the comment. Ahh, I missed on line #369 that obj is passed by reference. Looks good, you have my review. Lois > > Thanks! > Coleen >> >> Thanks, >> Lois >> >>> >>> Coleen >>> >>> On 7/17/19 10:49 AM, coleen.phillimore at oracle.com wrote: >>>> Summary: Save oop created in handle more eagerly, so >>>> CheckUnhandledOops doesn't bash it. >>>> >>>> Also added a test for the case that was failing. There were a >>>> couple of changes needed to compile CHECK_UNHANDLED_OOPS with >>>> slowdebug, but I reverted the makefile changes for doing that >>>> because I don't think we want it for slowdebug. >>>> Lastly, made some changes to make CheckUnhandledOops easier to >>>> debug, and removed an unnecessary clearing, since it's also done in >>>> the transition.
>>>> >>>> Tested with hs-tier1-3 with -XX:+CheckUnhandledOops and without. >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2019/8227766.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8227766 >>>> >>>> Thanks, >>>> Coleen >>> >> > From david.holmes at oracle.com Thu Jul 18 04:38:52 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jul 2019 14:38:52 +1000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: Message-ID: Hi Martin, I need to think about this some more. A critical property of the fast field accessors are that they are trivial and completely safe. They are complicated by the need to check if a GC may have happened while we directly read the field. If you try to use fast field accessors when you have to post the field access event then how can you safely go off into a JVM TI event callback ?? Thanks, David On 16/07/2019 11:31 pm, Doerr, Martin wrote: > Hi, > > the current implementation of FastJNIAccessors ignores the flag -XX:+UseFastJNIAccessors when the JVMTI capability "can_post_field_access" is enabled. > This is an unnecessary restriction which makes field accesses (GetField) from native code slower when a JVMTI agent is attached which enables this capability. > A better implementation would check at runtime if an agent actually wants to receive field access events. > > Note that the bytecode interpreter already uses this better implementation by checking if field access watch events were requested (JvmtiExport::_field_access_count != 0). > > I have implemented such a runtime check on all platforms which currently support FastJNIAccessors. 
> > My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a micro benchmark: > test-support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/FastGetField/FastGetField.jtr > shows the duration of 10000 iterations with and without UseFastJNIAccessors (JVMTI agent gets attached in both runs). > My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with FastJNIAccessors and 11.2ms without it. > > Webrev: > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > > We have run the test on 64 bit x86 platforms, SPARC and aarch64. > (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute them later.) > My webrev contains 32 bit implementations for x86 and arm, but completely untested. It'd be great if somebody could volunteer to review and test these platforms. > > Please review. > > Best regards, > Martin > From david.holmes at oracle.com Thu Jul 18 05:26:05 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jul 2019 15:26:05 +1000 Subject: RFR(S) 8227435: Perf::attach() should not throw a java.lang.Exception In-Reply-To: References: Message-ID: <0faa1129-aa75-4174-ee24-57dab4941e94@oracle.com> On 16/07/2019 9:42 pm, Langer, Christoph wrote: > Hi Ralf, > > looks good. Prior to pushing you?ll have to take care of the copyright years in the files you touched ?? > > cc-ing hotspot-runtime, because I think it affects this area, too. tl;dr - looks fine. :) I was in two minds about this. To me this is not an I/O error so the current use of Exception rather than IOException seemed more appropriate. But the Java code needs to match the VM code and tweaking the Java code to allow for the Exception would have a flow-on affect to the whole call chain. So overall treating this case as an IOException seems far simpler and not terribly wrong. Cheers, David ----- > Thanks > Christoph > > > From: serviceability-dev On Behalf Of Schmelter, Ralf > Sent: Montag, 15. 
Juli 2019 10:10 > To: OpenJDK Serviceability > Subject: [CAUTION] RFR(S) 8227435: Perf::attach() should not throw a java.lang.Exception > > Please review this small change. It changes the exception which will be thrown when the perf file has not yet the correct size. Instead of throwing the (not declared) java.lang.Exception, we will now throw java.io.IOException, which is expected by the calling code. > > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8227435/webrev.0/ > bugreport: https://bugs.openjdk.java.net/browse/JDK-8227435 > > Best regards, > Ralf > From erik.osterlund at oracle.com Thu Jul 18 06:43:08 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 18 Jul 2019 08:43:08 +0200 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: Message-ID: <812BD8C7-B755-47AB-9781-9360D8089B3F@oracle.com> Hi Martin, Since the JNI calls go through function pointers in the JNI env that go either to the fast or slow version, could one option be to go through the JNI envs and change the function pointers to the slow one when this JVMTI feature is enabled? Advantages: 1) No need to change the platform specific code that seems to surprisingly work right now. 2) No need for the fast path to check that condition. Just an idea. Thanks, /Erik > On 16 Jul 2019, at 15:31, Doerr, Martin wrote: > > Hi, > > the current implementation of FastJNIAccessors ignores the flag -XX:+UseFastJNIAccessors when the JVMTI capability "can_post_field_access" is enabled. > This is an unnecessary restriction which makes field accesses (GetField) from native code slower when a JVMTI agent is attached which enables this capability. > A better implementation would check at runtime if an agent actually wants to receive field access events. > > Note that the bytecode interpreter already uses this better implementation by checking if field access watch events were requested (JvmtiExport::_field_access_count != 0). 
> > I have implemented such a runtime check on all platforms which currently support FastJNIAccessors. > > My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a micro benchmark: > test-support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/FastGetField/FastGetField.jtr > shows the duration of 10000 iterations with and without UseFastJNIAccessors (JVMTI agent gets attached in both runs). > My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with FastJNIAccessors and 11.2ms without it. > > Webrev: > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > > We have run the test on 64 bit x86 platforms, SPARC and aarch64. > (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute them later.) > My webrev contains 32 bit implementations for x86 and arm, but completely untested. It'd be great if somebody could volunteer to review and test these platforms. > > Please review. > > Best regards, > Martin > From martin.doerr at sap.com Thu Jul 18 09:09:31 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 18 Jul 2019 09:09:31 +0000 Subject: RFR(S): 8227692: Remove develop feature PrintMallocStatistics In-Reply-To: References: <84fbdef1-3735-7344-2d97-ebcc38213140@oracle.com> Message-ID: Hi Dan, can I count this as review? Best regards, Martin > -----Original Message----- > From: Daniel D. Daugherty > Sent: Mittwoch, 17. Juli 2019 00:20 > To: Doerr, Martin ; hotspot-runtime- > dev at openjdk.java.net > Cc: Baesken, Matthias > Subject: Re: RFR(S): 8227692: Remove develop feature PrintMallocStatistics > > Thanks. Sounds good. > > Dan > > > On 7/16/19 4:15 PM, Doerr, Martin wrote: > > Hi Dan, > > > > I've added the proposal to use "-XX:NativeMemoryTracking=summary - > XX:+PrintNMTStatistics" instead. > > > > Thanks, > > Martin > > > > > >> -----Original Message----- > >> From: Daniel D. Daugherty > >> Sent: Dienstag, 16. 
Juli 2019 18:04 > >> To: Doerr, Martin ; hotspot-runtime- > >> dev at openjdk.java.net > >> Cc: Baesken, Matthias > >> Subject: Re: RFR(S): 8227692: Remove develop feature > PrintMallocStatistics > >> > >> For anyone that happens to be searching JBS for what happened to the > >> '-XX:+PrintMallocStatistics' option, you might want to include some > >> guidance on how they get the equivalent information from NMT... > >> > >> A short note in JDK-8227692 should suffice... > >> > >> Dan > >> > >> > >> On 7/15/19 4:16 PM, Doerr, Martin wrote: > >>> Hi, > >>> > >>> as announced on hotspot-dev, I'd like to remove the debug build > feature > >> for allocation statistics "AllocStats" (controlled by develop flag - > >> XX:+PrintMallocStatistics). > >>> I've closed JDK-8227597 >> 8227597> which was a proposal to reduce the performance impact of it, > but > >> several people have suggested to remove this feature which is even > better > >> IMHO. > >>> Bug: > >>> https://bugs.openjdk.java.net/browse/JDK-8227692 > >>> > >>> Webrev: > >>> > >> > http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/ > >> webrev.00/ > >>> I've also taken over the reworked inc_stat_counter from JDK- > >> 8227597 > >> (allocation.inline.hpp). > >>> Please review. > >>> > >>> Best regards, > >>> Martin > >>> From martin.doerr at sap.com Thu Jul 18 10:01:36 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 18 Jul 2019 10:01:36 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: Message-ID: Hi David and Erik, thank you for looking at my proposal. > If you try to use fast field accessors when you have to post the field > access event then how can you safely go off into a JVM TI event callback ?? We speculatively load the field and check afterwards if we can use this loaded value. It is safe to use it if there was no safepoint and no JVMTI event was requested. 
Otherwise, we simply discard the (possibly) loaded value and load it again in the slow path where we do all the synchronization and event posting. @Erik: Thanks for your proposal to change the function pointers. I'll look into that. Best regards, Martin > -----Original Message----- > From: David Holmes > Sent: Donnerstag, 18. Juli 2019 06:39 > To: Doerr, Martin ; hotspot-runtime- > dev at openjdk.java.net; serviceability-dev at openjdk.java.net > Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access > event requests at runtime > > Hi Martin, > > I need to think about this some more. A critical property of the fast > field accessors are that they are trivial and completely safe. They are > complicated by the need to check if a GC may have happened while we > directly read the field. > > If you try to use fast field accessors when you have to post the field > access event then how can you safely go off into a JVM TI event callback ?? > > Thanks, > David > > On 16/07/2019 11:31 pm, Doerr, Martin wrote: > > Hi, > > > > the current implementation of FastJNIAccessors ignores the flag - > XX:+UseFastJNIAccessors when the JVMTI capability > "can_post_field_access" is enabled. > > This is an unnecessary restriction which makes field accesses > (GetField) from native code slower when a JVMTI agent is attached > which enables this capability. > > A better implementation would check at runtime if an agent actually wants > to receive field access events. > > > > Note that the bytecode interpreter already uses this better > implementation by checking if field access watch events were requested > (JvmtiExport::_field_access_count != 0). > > > > I have implemented such a runtime check on all platforms which currently > support FastJNIAccessors. 
> > > > My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a > micro benchmark: > > test- > support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa > stGetField/FastGetField.jtr > > shows the duration of 10000 iterations with and without > UseFastJNIAccessors (JVMTI agent gets attached in both runs). > > My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with > FastJNIAccessors and 11.2ms without it. > > > > Webrev: > > > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > > > > We have run the test on 64 bit x86 platforms, SPARC and aarch64. > > (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute > them later.) > > My webrev contains 32 bit implementations for x86 and arm, but > completely untested. It'd be great if somebody could volunteer to review > and test these platforms. > > > > Please review. > > > > Best regards, > > Martin > > From coleen.phillimore at oracle.com Thu Jul 18 11:41:40 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 18 Jul 2019 07:41:40 -0400 Subject: RFR (S) 8227766: CheckUnhandledOops is broken in MemAllocator In-Reply-To: References: Message-ID: On 7/17/19 5:43 PM, Lois Foltan wrote: > > > On 7/17/2019 1:48 PM, coleen.phillimore at oracle.com wrote: >> >> >> On 7/17/19 1:01 PM, Lois Foltan wrote: >>> On 7/17/2019 12:22 PM, coleen.phillimore at oracle.com wrote: >>>> >>>> Due to some concerns about MemAllocator performance, and to only >>>> create handles when needed, I have retested smaller change: >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2019/8227766.02/webrev >>> >>> Looks good. >>> >>> gc/shared/memAllocator.cpp >>> - line #373-375: Since obj is already initialized to NULL to you >>> need the else clause? >> >> Yes, because CheckUnhandledOops poisons oops that are local >> variables, so we have to set it back to zero. I can add that to the >> comment.
> > Ahh, I missed on line #369 that obj is passed by reference. Looks > good, you have my review. Yes, thank you! I don't like that obj is passed by reference but Erik convinced me to keep it. Coleen > Lois > >> >> Thanks! >> Coleen >>> >>> Thanks, >>> Lois >>> >>>> >>>> Coleen >>>> >>>> On 7/17/19 10:49 AM, coleen.phillimore at oracle.com wrote: >>>>> Summary: Save oop created in handle more eagerly, so >>>>> CheckUnhandledOops doesn't bash it. >>>>> >>>>> Also added a test for the case that was failing. There were a >>>>> couple of changes needed to compile CHECK_UNHANDLED_OOPS with >>>>> slowdebug, but I reverted the makefile changes for doing that >>>>> because I don't think we want it for slowdebug. >>>>> Lastly, made some changes to make CheckUnhandledOops easier to >>>>> debug, and removed an unnecessary clearing, since it's also done >>>>> in the transition. >>>>> >>>>> Tested with hs-tier1-3 with -XX:+CheckUnhandledOops and without. >>>>> >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/2019/8227766.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8227766 >>>>> >>>>> Thanks, >>>>> Coleen >>>> >>> >> > From martin.doerr at sap.com Thu Jul 18 12:51:45 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 18 Jul 2019 12:51:45 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: <812BD8C7-B755-47AB-9781-9360D8089B3F@oracle.com> References: <812BD8C7-B755-47AB-9781-9360D8089B3F@oracle.com> Message-ID: Hi Erik, I like the idea, but it seems to be difficult. JNI function table can get copied and redirected at runtime (e.g. SetJNIFunctionTable). We'd have to synchronize with that to avoid messing it up. Also, I think the function pointers should better be made volatile if we change them concurrently. I have to think more about all of that, but I guess this approach will be more complicated than my initial proposal.
Best regards, Martin > -----Original Message----- > From: Erik Osterlund > Sent: Donnerstag, 18. Juli 2019 08:43 > To: Doerr, Martin > Cc: hotspot-runtime-dev at openjdk.java.net; serviceability- > dev at openjdk.java.net > Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access > event requests at runtime > > Hi Martin, > > Since the JNI calls go through function pointers in the JNI env that go either > to the fast or slow version, could one option be to go through the JNI envs > and change the function pointers to the slow one when this JVMTI feature is > enabled? > > Advantages: > 1) No need to change the platform specific code that seems to surprisingly > work right now. > 2) No need for the fast path to check that condition. > > Just an idea. > > Thanks, > /Erik > > > > On 16 Jul 2019, at 15:31, Doerr, Martin wrote: > > > > Hi, > > > > the current implementation of FastJNIAccessors ignores the flag - > XX:+UseFastJNIAccessors when the JVMTI capability > "can_post_field_access" is enabled. > > This is an unnecessary restriction which makes field accesses > (GetField) from native code slower when a JVMTI agent is attached > which enables this capability. > > A better implementation would check at runtime if an agent actually wants > to receive field access events. > > > > Note that the bytecode interpreter already uses this better > implementation by checking if field access watch events were requested > (JvmtiExport::_field_access_count != 0). > > > > I have implemented such a runtime check on all platforms which currently > support FastJNIAccessors. > > > > My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a > micro benchmark: > > test- > support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa > stGetField/FastGetField.jtr > > shows the duration of 10000 iterations with and without > UseFastJNIAccessors (JVMTI agent gets attached in both runs). 
> > My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with > FastJNIAccessors and 11.2ms without it. > > > > Webrev: > > > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > > > > We have run the test on 64 bit x86 platforms, SPARC and aarch64. > > (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute > them later.) > > My webrev contains 32 bit implementations for x86 and arm, but > completely untested. It'd be great if somebody could volunteer to review > and test these platforms. > > > > Please review. > > > > Best regards, > > Martin > > From erik.osterlund at oracle.com Thu Jul 18 13:52:56 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 18 Jul 2019 15:52:56 +0200 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <812BD8C7-B755-47AB-9781-9360D8089B3F@oracle.com> Message-ID: Hi Martin, Okay, looks good in that case. Thanks, /Erik > On 18 Jul 2019, at 14:51, Doerr, Martin wrote: > > Hi Erik, > > I like the idea, but it seems to be difficult. > > JNI function table can get copied and redirected at runtime (e.g. SetJNIFunctionTable). > We'd have to synchronize with that to avoid messing it up. > > Also, I think the function pointers should better be made volatile if we change them concurrently. > > I have to think more about all of that, but I guess this approach will be more complicated than my initial proposal. > > Best regards, > Martin > > >> -----Original Message----- >> From: Erik Osterlund >> Sent: Donnerstag, 18. 
Juli 2019 08:43 >> To: Doerr, Martin >> Cc: hotspot-runtime-dev at openjdk.java.net; serviceability- >> dev at openjdk.java.net >> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access >> event requests at runtime >> >> Hi Martin, >> >> Since the JNI calls go through function pointers in the JNI env that go either >> to the fast or slow version, could one option be to go through the JNI envs >> and change the function pointers to the slow one when this JVMTI feature is >> enabled? >> >> Advantages: >> 1) No need to change the platform specific code that seems to surprisingly >> work right now. >> 2) No need for the fast path to check that condition. >> >> Just an idea. >> >> Thanks, >> /Erik >> >> >>> On 16 Jul 2019, at 15:31, Doerr, Martin wrote: >>> >>> Hi, >>> >>> the current implementation of FastJNIAccessors ignores the flag - >> XX:+UseFastJNIAccessors when the JVMTI capability >> "can_post_field_access" is enabled. >>> This is an unnecessary restriction which makes field accesses >> (GetField) from native code slower when a JVMTI agent is attached >> which enables this capability. >>> A better implementation would check at runtime if an agent actually wants >> to receive field access events. >>> >>> Note that the bytecode interpreter already uses this better >> implementation by checking if field access watch events were requested >> (JvmtiExport::_field_access_count != 0). >>> >>> I have implemented such a runtime check on all platforms which currently >> support FastJNIAccessors. >>> >>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a >> micro benchmark: >>> test- >> support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa >> stGetField/FastGetField.jtr >>> shows the duration of 10000 iterations with and without >> UseFastJNIAccessors (JVMTI agent gets attached in both runs). >>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with >> FastJNIAccessors and 11.2ms without it. 
>>> >>> Webrev: >>> >> http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ >>> >>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. >>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute >> them later.) >>> My webrev contains 32 bit implementations for x86 and arm, but >> completely untested. It'd be great if somebody could volunteer to review >> and test these platforms. >>> >>> Please review. >>> >>> Best regards, >>> Martin >>> > From daniel.daugherty at oracle.com Thu Jul 18 14:40:25 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 18 Jul 2019 10:40:25 -0400 Subject: RFR(S): 8227692: Remove develop feature PrintMallocStatistics In-Reply-To: References: <84fbdef1-3735-7344-2d97-ebcc38213140@oracle.com> Message-ID: > http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/webrev.00/ src/hotspot/share/memory/allocation.cpp No comments. src/hotspot/share/memory/allocation.hpp No comments. src/hotspot/share/memory/allocation.inline.hpp No comments. src/hotspot/share/memory/arena.cpp No comments. src/hotspot/share/memory/arena.hpp No comments. src/hotspot/share/runtime/globals.hpp No comments. src/hotspot/share/runtime/java.cpp No comments. Thumbs up. On 7/18/19 5:09 AM, Doerr, Martin wrote: > Hi Dan, > > can I count this as review? Yes, now you can... :-) Dan > > Best regards, > Martin > > >> -----Original Message----- >> From: Daniel D. Daugherty >> Sent: Mittwoch, 17. Juli 2019 00:20 >> To: Doerr, Martin ; hotspot-runtime- >> dev at openjdk.java.net >> Cc: Baesken, Matthias >> Subject: Re: RFR(S): 8227692: Remove develop feature PrintMallocStatistics >> >> Thanks. Sounds good. >> >> Dan >> >> >> On 7/16/19 4:15 PM, Doerr, Martin wrote: >>> Hi Dan, >>> >>> I've added the proposal to use "-XX:NativeMemoryTracking=summary - >> XX:+PrintNMTStatistics" instead.
>>> Thanks, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: Daniel D. Daugherty >>>> Sent: Dienstag, 16. Juli 2019 18:04 >>>> To: Doerr, Martin ; hotspot-runtime- >>>> dev at openjdk.java.net >>>> Cc: Baesken, Matthias >>>> Subject: Re: RFR(S): 8227692: Remove develop feature >> PrintMallocStatistics >>>> For anyone that happens to be searching JBS for what happened to the >>>> '-XX:+PrintMallocStatistics' option, you might want to include some >>>> guidance on how they get the equivalent information from NMT... >>>> >>>> A short note in JDK-8227692 should suffice... >>>> >>>> Dan >>>> >>>> >>>> On 7/15/19 4:16 PM, Doerr, Martin wrote: >>>>> Hi, >>>>> >>>>> as announced on hotspot-dev, I'd like to remove the debug build >> feature >>>> for allocation statistics "AllocStats" (controlled by develop flag - >>>> XX:+PrintMallocStatistics). >>>>> I've closed JDK-8227597 which was a proposal to reduce the performance impact of it, >> but >>>> several people have suggested to remove this feature which is even >> better >>>> IMHO. >>>>> Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8227692 >>>>> >>>>> Webrev: >>>>> >> http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/ >>>> webrev.00/ >>>>> I've also taken over the reworked inc_stat_counter from JDK- >>>> 8227597 >>>> (allocation.inline.hpp). >>>>> Please review. >>>>> >>>>> Best regards, >>>>> Martin >>>>> From david.holmes at oracle.com Fri Jul 19 00:30:03 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 19 Jul 2019 10:30:03 +1000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: Message-ID: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Hi Martin, On 18/07/2019 8:01 pm, Doerr, Martin wrote: > Hi David and Erik, > > thank you for looking at my proposal.
> >> If you try to use fast field accessors when you have to post the field >> access event then how can you safely go off into a JVM TI event callback ?? > > We speculatively load the field and check afterwards if we can use this loaded value. > It is safe to use it if there was no safepoint and no JVMTI event was requested. > Otherwise, we simply discard the (possibly) loaded value and load it again in the slow path where we do all the synchronization and event posting. Thanks for clarifying for me. That is all fine then. The dynamics of this still concern me, but those concerns are also present in the existing code. Currently we don't use the quick accessors if JvmtiExport::can_post_field_access() is true during VM startup - this is a one-of initialization check that sets the use of fast accessors for the lifetime of the JVM. But that is set between the early-start and start VM events, before the live-phase. But AFAICS the capability for can_post_field_access can be set or cleared dynamically during the live phase, thus invalidating the original decision on whether to use fast accessors or not. With your changes the state of can_post_field_access is still captured during VM initialization so again the decision to check for a field access watch is hard-wired for the lifetime of the VM. But once installed that check allows for use of the fast-path if no actual watches are set - which is the whole point of this enhancement. So the issue with both old and new code is that if the capability is not present at VM startup the VM will be configured to always use the fast path, even if the capability (and field access watches) are added later. Thanks, David > @Erik: > Thanks for your proposal to change the function pointers. I'll look into that. > > Best regards, > Martin > > >> -----Original Message----- >> From: David Holmes >> Sent: Donnerstag, 18. 
Juli 2019 06:39 >> To: Doerr, Martin ; hotspot-runtime- >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access >> event requests at runtime >> >> Hi Martin, >> >> I need to think about this some more. A critical property of the fast >> field accessors are that they are trivial and completely safe. They are >> complicated by the need to check if a GC may have happened while we >> directly read the field. >> >> If you try to use fast field accessors when you have to post the field >> access event then how can you safely go off into a JVM TI event callback ?? >> >> Thanks, >> David >> >> On 16/07/2019 11:31 pm, Doerr, Martin wrote: >>> Hi, >>> >>> the current implementation of FastJNIAccessors ignores the flag - >> XX:+UseFastJNIAccessors when the JVMTI capability >> "can_post_field_access" is enabled. >>> This is an unnecessary restriction which makes field accesses >> (GetField) from native code slower when a JVMTI agent is attached >> which enables this capability. >>> A better implementation would check at runtime if an agent actually wants >> to receive field access events. >>> >>> Note that the bytecode interpreter already uses this better >> implementation by checking if field access watch events were requested >> (JvmtiExport::_field_access_count != 0). >>> >>> I have implemented such a runtime check on all platforms which currently >> support FastJNIAccessors. >>> >>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a >> micro benchmark: >>> test- >> support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa >> stGetField/FastGetField.jtr >>> shows the duration of 10000 iterations with and without >> UseFastJNIAccessors (JVMTI agent gets attached in both runs). >>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with >> FastJNIAccessors and 11.2ms without it. 
>>> >>> Webrev: >>> >> http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ >>> >>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. >>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute >> them later.) >>> My webrev contains 32 bit implementations for x86 and arm, but >> completely untested. It'd be great if somebody could volunteer to review >> and test these platforms. >>> >>> Please review. >>> >>> Best regards, >>> Martin >>> From martin.doerr at sap.com Fri Jul 19 08:22:57 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 19 Jul 2019 08:22:57 +0000 Subject: RFR(S): 8227692: Remove develop feature PrintMallocStatistics In-Reply-To: References: <84fbdef1-3735-7344-2d97-ebcc38213140@oracle.com> Message-ID: Hi Coleen and Dan, thank you for reviewing. I've pushed it. Best regards, Martin > -----Original Message----- > From: Daniel D. Daugherty > Sent: Donnerstag, 18. Juli 2019 16:40 > To: Doerr, Martin ; hotspot-runtime- > dev at openjdk.java.net > Subject: Re: RFR(S): 8227692: Remove develop feature PrintMallocStatistics > > > > http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/ > webrev.00/ > > src/hotspot/share/memory/allocation.cpp > No comments. > > src/hotspot/share/memory/allocation.hpp > No comments. > > src/hotspot/share/memory/allocation.inline.hpp > No comments. > > src/hotspot/share/memory/arena.cpp > No comments. > > src/hotspot/share/memory/arena.hpp > No comments. > > src/hotspot/share/runtime/globals.hpp > No comments. > > src/hotspot/share/runtime/java.cpp > No comments. > > Thumbs up. > > > On 7/18/19 5:09 AM, Doerr, Martin wrote: > > Hi Dan, > > > > can I count this as review? > > Yes, now you can... :-) > > Dan > > > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: Daniel D. Daugherty > >> Sent: Mittwoch, 17.
Juli 2019 00:20 > >> To: Doerr, Martin ; hotspot-runtime- > >> dev at openjdk.java.net > >> Cc: Baesken, Matthias > >> Subject: Re: RFR(S): 8227692: Remove develop feature > PrintMallocStatistics > >> > >> Thanks. Sounds good. > >> > >> Dan > >> > >> > >> On 7/16/19 4:15 PM, Doerr, Martin wrote: > >>> Hi Dan, > >>> > >>> I've added the proposal to use "-XX:NativeMemoryTracking=summary - > >> XX:+PrintNMTStatistics" instead. > >>> Thanks, > >>> Martin > >>> > >>> > >>>> -----Original Message----- > >>>> From: Daniel D. Daugherty > >>>> Sent: Dienstag, 16. Juli 2019 18:04 > >>>> To: Doerr, Martin ; hotspot-runtime- > >>>> dev at openjdk.java.net > >>>> Cc: Baesken, Matthias > >>>> Subject: Re: RFR(S): 8227692: Remove develop feature > >> PrintMallocStatistics > >>>> For anyone that happens to be searching JBS for what happened to the > >>>> '-XX:+PrintMallocStatistics' option, you might want to include some > >>>> guidance on how they get the equivalent information from NMT... > >>>> > >>>> A short note in JDK-8227692 should suffice... > >>>> > >>>> Dan > >>>> > >>>> > >>>> On 7/15/19 4:16 PM, Doerr, Martin wrote: > >>>>> Hi, > >>>>> > >>>>> as announced on hotspot-dev, I'd like to remove the debug build > >> feature > >>>> for allocation statistics "AllocStats" (controlled by develop flag - > >>>> XX:+PrintMallocStatistics). > >>>>> I've closed JDK-8227597 which was a proposal to reduce the performance impact of it, > >> but > >>>> several people have suggested to remove this feature which is even > >> better > >>>> IMHO. > >>>>> Bug: > >>>>> https://bugs.openjdk.java.net/browse/JDK-8227692 > >>>>> > >>>>> Webrev: > >>>>> > >> > http://cr.openjdk.java.net/~mdoerr/8227692_remove_PrintMallocStatistics/ > >>>> webrev.00/ > >>>>> I've also taken over the reworked inc_stat_counter from JDK- > >>>> 8227597 > >>>> (allocation.inline.hpp). > >>>>> Please review.
> >>>>> > >>>>> Best regards, > >>>>> Martin > >>>>> From martin.doerr at sap.com Fri Jul 19 11:11:06 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 19 Jul 2019 11:11:06 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: Hi David, thanks for elaborating on the capability enablement. With respect to "AddCapabilities", I've only found "Typically this function is used in the OnLoad function. Some virtual machines may allow a limited set of capabilities to be added in the live phase." in the spec [1]. I don't know which ones are supposed to be part of this "limited set of capabilities". As you already explained, adding the capability for field access events in the live phase does obviously not work for hotspot. The interpreter has the same issue. Best regards, Martin [1] https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#AddCapabilities > -----Original Message----- > From: David Holmes > Sent: Freitag, 19. Juli 2019 02:30 > To: Doerr, Martin ; hotspot-runtime- > dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik Osterlund > > Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access > event requests at runtime > > Hi Martin, > > On 18/07/2019 8:01 pm, Doerr, Martin wrote: > > Hi David and Erik, > > > > thank you for looking at my proposal. > > > >> If you try to use fast field accessors when you have to post the field > >> access event then how can you safely go off into a JVM TI event callback > ?? > > > > We speculatively load the field and check afterwards if we can use this > loaded value. > > It is safe to use it if there was no safepoint and no JVMTI event was > requested. 
> > Otherwise, we simply discard the (possibly) loaded value and load it again > in the slow path where we do all the synchronization and event posting. > > Thanks for clarifying for me. That is all fine then. > > The dynamics of this still concern me, but those concerns are also > present in the existing code. Currently we don't use the quick accessors > if JvmtiExport::can_post_field_access() is true during VM startup - this > is a one-of initialization check that sets the use of fast accessors for > the lifetime of the JVM. But that is set between the early-start and > start VM events, before the live-phase. But AFAICS the capability for > can_post_field_access can be set or cleared dynamically during the live > phase, thus invalidating the original decision on whether to use fast > accessors or not. With your changes the state of can_post_field_access > is still captured during VM initialization so again the decision to > check for a field access watch is hard-wired for the lifetime of the VM. > But once installed that check allows for use of the fast-path if no > actual watches are set - which is the whole point of this enhancement. > So the issue with both old and new code is that if the capability is not > present at VM startup the VM will be configured to always use the fast > path, even if the capability (and field access watches) are added later. > > Thanks, > David > > > @Erik: > > Thanks for your proposal to change the function pointers. I'll look into that. > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Donnerstag, 18. Juli 2019 06:39 > >> To: Doerr, Martin ; hotspot-runtime- > >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net > >> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field > access > >> event requests at runtime > >> > >> Hi Martin, > >> > >> I need to think about this some more. 
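The speculative scheme Martin describes (load first, validate afterwards, fall back to the slow path on any doubt) can be sketched as a single-threaded toy in portable C++. It assumes two hypothetical globals: `safepoint_counter`, odd while a safepoint is in progress in the spirit of the VM's safepoint counter, and `field_access_watch_count`, standing in for `JvmtiExport::_field_access_count`. The real accessors are generated assembly with platform-specific ordering between the loads; sequentially consistent atomics are used here only to keep the sketch simple.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for VM state (names are illustrative only).
std::atomic<std::uint64_t> safepoint_counter{0};  // odd => safepoint in progress
std::atomic<int> field_access_watch_count{0};     // > 0 => an agent watches field accesses

int slow_path_calls = 0;

// Slow path: would synchronize with the VM and post the JVMTI event.
int slow_get_int_field(const int* field_addr) {
  ++slow_path_calls;
  return *field_addr;
}

int fast_get_int_field(const int* field_addr) {
  const std::uint64_t before = safepoint_counter.load();
  if (before & 1) {
    return slow_get_int_field(field_addr);  // safepoint in progress: don't even try
  }
  const int value = *field_addr;  // speculative load
  // Validate afterwards: the value may only be used if no safepoint happened
  // and no field access event was requested in the meantime.
  const bool watched = field_access_watch_count.load() != 0;
  const bool safepoint_happened = safepoint_counter.load() != before;
  if (watched || safepoint_happened) {
    return slow_get_int_field(field_addr);  // discard the speculative value
  }
  return value;
}
```

With no watches set, the only added cost on the fast path is one extra load and compare, which is the point of the enhancement.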
A critical property of the fast > >> field accessors are that they are trivial and completely safe. They are > >> complicated by the need to check if a GC may have happened while we > >> directly read the field. > >> > >> If you try to use fast field accessors when you have to post the field > >> access event then how can you safely go off into a JVM TI event callback > ?? > >> > >> Thanks, > >> David > >> > >> On 16/07/2019 11:31 pm, Doerr, Martin wrote: > >>> Hi, > >>> > >>> the current implementation of FastJNIAccessors ignores the flag - > >> XX:+UseFastJNIAccessors when the JVMTI capability > >> "can_post_field_access" is enabled. > >>> This is an unnecessary restriction which makes field accesses > >> (GetField) from native code slower when a JVMTI agent is > attached > >> which enables this capability. > >>> A better implementation would check at runtime if an agent actually > wants > >> to receive field access events. > >>> > >>> Note that the bytecode interpreter already uses this better > >> implementation by checking if field access watch events were requested > >> (JvmtiExport::_field_access_count != 0). > >>> > >>> I have implemented such a runtime check on all platforms which > currently > >> support FastJNIAccessors. > >>> > >>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a > >> micro benchmark: > >>> test- > >> > support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa > >> stGetField/FastGetField.jtr > >>> shows the duration of 10000 iterations with and without > >> UseFastJNIAccessors (JVMTI agent gets attached in both runs). > >>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with > >> FastJNIAccessors and 11.2ms without it. > >>> > >>> Webrev: > >>> > >> > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > >>> > >>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. > >>> (FastJNIAccessors are not yet available on PPC64 and s390. 
I'll contribute > >> them later.) > >>> My webrev contains 32 bit implementations for x86 and arm, but > >> completely untested. It'd be great if somebody could volunteer to review > >> and test these platforms. > >>> > >>> Please review. > >>> > >>> Best regards, > >>> Martin > >>> From fweimer at redhat.com Fri Jul 19 12:09:21 2019 From: fweimer at redhat.com (Florian Weimer) Date: Fri, 19 Jul 2019 14:09:21 +0200 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: (Jiangli Zhou's message of "Mon, 8 Jul 2019 07:27:11 -0700") References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> <875zocyiyo.fsf@oldenburg2.str.redhat.com> Message-ID: <87d0i6kyz2.fsf@oldenburg2.str.redhat.com> * Jiangli Zhou: > Here is the full webrev: > http://cr.openjdk.java.net/~jiangli/8225035/webrev.05/, including the > additional comments above get_static_tls_area_size. > > Best regards, > Jiangli > > On Mon, Jul 8, 2019 at 2:27 AM Florian Weimer wrote: >> >> * Jiangli Zhou: >> >> > As you, Florian, Thomas all made great contributions to this >> > workaround, I should list all of you as both contributors and >> > reviewers in the changeset. If there is any objection, please let me >> > know. >> >> Can you share a link with the final patch? I would like to have another >> look. Thanks, looks reasonable. Note that a funny consequence is that the flag may now actually lower stack sizes on recent glibcs because when the flag is enabled, the guard size accounting is again what it used be. Without the flag and new-enough glibc, the guard size is added twice to the stack size (once in OpenJDK and once in glibc). 
Thanks, Florian From jianglizhou at google.com Mon Jul 22 01:17:06 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Sun, 21 Jul 2019 18:17:06 -0700 Subject: RFR: 8225035: Thread stack size issue caused by large TLS size In-Reply-To: <87d0i6kyz2.fsf@oldenburg2.str.redhat.com> References: <874l4bzekk.fsf@oldenburg2.str.redhat.com> <87k1d7xxy7.fsf@oldenburg2.str.redhat.com> <87ftnvt604.fsf@oldenburg2.str.redhat.com> <87r27fro9e.fsf@oldenburg2.str.redhat.com> <0e500f95-c5dd-4c18-0b85-ce5b2c6a4a11@oracle.com> <347fd6e2-e925-7126-3a42-25343dea2d65@oracle.com> <614f1c9e-1828-f8ba-0d23-886ac26a5af9@oracle.com> <0dacfb70-8901-9e9f-9a1f-6a5dc1bc6a91@oracle.com> <875zocyiyo.fsf@oldenburg2.str.redhat.com> <87d0i6kyz2.fsf@oldenburg2.str.redhat.com> Message-ID: Hi Florian, On Fri, Jul 19, 2019 at 5:09 AM Florian Weimer wrote: > > * Jiangli Zhou: > > > Here is the full webrev: > > http://cr.openjdk.java.net/~jiangli/8225035/webrev.05/, including the > > additional comments above get_static_tls_area_size. > > > > Best regards, > > Jiangli > > > > On Mon, Jul 8, 2019 at 2:27 AM Florian Weimer wrote: > >> > >> * Jiangli Zhou: > >> > >> > As you, Florian, Thomas all made great contributions to this > >> > workaround, I should list all of you as both contributors and > >> > reviewers in the changeset. If there is any objection, please let me > >> > know. > >> > >> Can you share a link with the final patch? I would like to have another > >> look. > > Thanks, looks reasonable. > > Note that a funny consequence is that the flag may now actually lower > stack sizes on recent glibcs because when the flag is enabled, the guard > size accounting is again what it used be. Without the flag and > new-enough glibc, the guard size is added twice to the stack size (once > in OpenJDK and once in glibc). Agree with your comment about the double-counted guard size overhead in the default case (without enabling of the stack size adjustment) with newer glibc. 
It might be a good idea to open an RFE/bug for that. It would need careful testing and some discussions on how to properly address that. Best regards, Jiangli > > Thanks, > Florian From david.holmes at oracle.com Mon Jul 22 06:49:46 2019 From: david.holmes at oracle.com (David Holmes) Date: Mon, 22 Jul 2019 16:49:46 +1000 Subject: RFC: JWarmup precompile java hot methods at application startup In-Reply-To: <40bd126f-ca71-4a71-8fda-552cf8f289ad.kuaiwei.kw@alibaba-inc.com> References: <8cfbaa83-c50f-61c4-5336-5f30b3885d45@oracle.com> <26f88253-deea-64d5-714c-28bb73989c62@oracle.com> <40bd126f-ca71-4a71-8fda-552cf8f289ad.kuaiwei.kw@alibaba-inc.com> Message-ID: <6c74ebb1-1292-43ae-86d2-9c8be14af0e9@oracle.com> Hi Kuai, On 21/06/2019 5:18 pm, Kuai Wei wrote: > Hi David, > > Sorry for the late reply. Sorry for my even later one. I was traveling and then had vacation, and have had other things to look at. > We plan to create a wiki page on OpenJDK website and put the design documents there. How do you think about it? That sounds like a good idea. > Here are the answers to some questions in your last message: I haven't had time to context switch in everything sorry. So just a couple of responses below. > - "source file" in JWarmUp record: > Application will load same class from multiple places. For example, logging jar will be packaged by different web apps. So we record this property. > > - super class resolution > It's used for diagnostic. Same class loaded by different loaders will cause a warning message in PreloadClassChain::record_loaded_class(). We use the super class resolve mark to reduce warning messages when resolving super class. We are thinking to refine it. If "refine" means remove then I encourage your thinking :) > > - dummy method > It's hard to know whether JWarmUp compilations are completed or not. The dummy method is used as the last method compiled due to JWarmUp. We are able to check its entry to see whether all compilations are finished. 
> > - native entry in jvm.cpp > JWarmUp defined some jvm entries which are invoked by java. We assume all jvm entries are put into jvm.cpp. Would you give us some reference we can follow? jvm.cpp contains the definitions of the JVM entry point methods but it doesn't (as you can see in existing file) contain code for registering those methods: #define CC (char*) static JNINativeMethod jdk_jwarmup_JWarmUp_methods[] = { { CC "notifyApplicationStartUpIsDone0", CC "()V", (void *)&JVM_NotifyApplicationStartUpIsDone}, { CC "checkIfCompilationIsComplete0", CC "()Z", (void *)&JVM_CheckJWarmUpCompilationIsComplete}, { CC "notifyJVMDeoptWarmUpMethods0", CC "()V", (void *)&JVM_NotifyJVMDeoptWarmUpMethods} }; JVM_ENTRY(void, JVM_RegisterJWarmUpMethods(JNIEnv *env, jclass jwarmupclass)) JVMWrapper("JVM_RegisterJWarmUpMethods"); ThreadToNativeFromVM ttnfv(thread); // can't be in VM when we call JNI int ok = env->RegisterNatives(jwarmupclass, jdk_jwarmup_JWarmUp_methods, sizeof(jdk_jwarmup_JWarmUp_methods)/sizeof(JNINativeMethod)); guarantee(ok == 0, "register jdk.jwarmup.JWarmUp natives"); JVM_END #undef CC That is typically done by the C code in the JDK. See for example src/java.base/share/native/libjava/System.c > - logging flags > JWarmUp was initially developed for JDK8. A flag was used to print out trace. When we ported the patch to JDK tip, we changed code to use the new log utility but with the legacy flag kept. Please remove legacy flag. > - VM flags > We will check and remove unnecessary flags. Thank you. > - init.cpp and mutex initialization > We will modify that. > > - Deoptimization change > I'm not clear about that. Would you like to provide more details? We will check the impact on our patch. I forget the exact context now sorry. If you've rebased to latest code and everything builds and runs then that should suffice. I have a lot of general concerns about the impact of this work on various areas of the JVM. 
It really needs to be as unobtrusive as possible and ideally causing no changes to executed code unless enabled. Potentially/possibly it might even need to be selectable at build-time, as to whether this feature is included. And I apologise in advance because I don't have a lot of time to deep dive into all the details of this proposed feature. Thanks, David ----- > Thanks, > Kuai Wei > > > > > ------------------------------------------------------------------ > From:David Holmes > Send Time:2019-06-10 (Monday) 15:18 > To:yumin qi ; hotspot-runtim. > Cc:hotspot-dev > Subject:Re: RFC: JWarmup precompile java hot methods at application startup > > Hi Yumin, > > On 8/06/2019 3:25 am, yumin qi wrote: >> Hi, David and all >> >> Can I have one more comment from runtime expert for the JEP? >> David, can you comment for the changes? Really appreciate your last >> comment. It is best if you follow the comment. >> Looking forward to having your comment. > > I still have a lot of trouble understanding the overall design here. The > JEP is very high-level; the webrev is very low-level; and there's > nothing in between to explain the details of the design - the kind of > document you would produce for a design review/walkthrough. For example > I can't see why you need to record the "source file"; I can't see why > you need to make changes to the superclass resolution. I can't tell when > changes outside of Jwarmup may need to make changes to the Jwarmup code > - the dependencies are unclear. I'm unclear on the role of the "dummy > method" - is it just a sentinel? Why do we need it versus using some > state in the JitWarmup instance? > > Some further code comments, but not a file by file review by any means ... > > The code split between the JDK and JVM doesn't seem quite right to me. > registerNatives is something usually done by the JDK .c files > corresponding to the classes defining the native method; it's not > something done in jvm.cpp. Or if this is meant to be a special case like
Or if this is meant to be a special case like > JVM_RegisterMethodHandleMethods then probably it should be in the > jwarmup.cpp file. Also if you pass the necessary objects through the API > you won't need to jump back to native to call a JNI function. > > AliasedLoggingFlags are for converting legacy flags to unified logging. > You should just be using UL and not introducing the > PrintCompilationWarmUpDetail pseudo-flag. > > This work introduces thirteen new VM flags! That's very excessive. > Perhaps you should look at defining something more like -Xlog that > encodes all the options? (And this will need a very lengthy CSR request!). > > The init.cpp code should be factored out into jwarmup_init functions in > jwarmup.cpp. > > Mutex initialization should be conditional on jwarmup being enabled. > > Deoptimization has been changed lately to avoid use of safepoints so you > may need to re-examine that aspect. > > You have a number of uses of patterns like this (but not everywhere): > > + JitWarmUp* jwp = JitWarmUp::instance(); > + assert(jwp != NULL, "sanity check"); > + jwp->preloader()->jvm_booted_is_done(); > > The assertion should be inside instance() so that these collapse to a > single line: > > JitWarmUp::instance()->preloader()->whatever(); > > Your Java files have @version 1.8. > > --- > > Cheers, > David > ----- > > > >> Thanks >> Yumin >> >> On Sun, May 19, 2019 at 10:28 AM yumin qi > > wrote: >> >> Hi, Tobias and all >> Have done changes based on Tobias' comments. New webrev based on >> most recent base is updated at: >> http://cr.openjdk.java.net/~minqi/8220692/webrev-03/ >> >> Tested locally for jwarmup and compiler. >> >> Thanks >> Yumin >> >> On Tue, May 14, 2019 at 11:26 AM yumin qi > > wrote: >> >> Hi, Tobias >> >> Thanks very much for the comments.
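David's suggestion to move the sanity check into instance() can be illustrated with a minimal, self-contained C++ sketch. JitWarmUp, Preloader, create_instance() and jvm_booted_is_done() below are simplified stand-ins modeled on the names quoted in the review, not the patch's actual classes, and the standard <cassert> assert is used in place of HotSpot's assert(cond, msg) macro.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the patch's preloader object.
class Preloader {
 public:
  void jvm_booted_is_done() { _booted = true; }
  bool booted() const { return _booted; }

 private:
  bool _booted = false;
};

// Hypothetical, simplified stand-in for the patch's JitWarmUp singleton.
class JitWarmUp {
 public:
  // Called once during (hypothetical) jwarmup initialization.
  static void create_instance() {
    assert(_instance == nullptr && "JitWarmUp already created");
    _instance = new JitWarmUp();
  }

  // The sanity check lives here, so call sites need no local variable or assert.
  static JitWarmUp* instance() {
    assert(_instance != nullptr && "JitWarmUp not initialized");
    return _instance;
  }

  Preloader* preloader() { return &_preloader; }

 private:
  JitWarmUp() = default;

  static JitWarmUp* _instance;
  Preloader _preloader;
};

JitWarmUp* JitWarmUp::_instance = nullptr;
```

With the check centralized, each call site reduces to a single expression such as JitWarmUp::instance()->preloader()->jvm_booted_is_done(); and a missing initialization is still caught in debug builds.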
>> >> On Mon, May 13, 2019 at 2:58 AM Tobias Hartmann >> > >> wrote: >> >> Hi Yumin, >> >> > In this version, the profiled method data is not used at >> > precomilation, it will be addressed in followed bug fix. >> After the >> > first version integrated, will file bug for it. >> >> Why is that? I think it would be good to have everything in >> one JEP. >> >> >> We have done some tests on adding profiling data and found the >> result is not as expected, and the current version is working >> well for internal online applications. There is no other reason >> not adding to this patch now, we will like to study further to >> see if we can improve that for a better performance. >> >> I've looked at the compiler related changes. Here are some >> comments/questions. >> >> ciMethod.cpp >> - So CompilationWarmUp is not using any profile information? >> Not even the profile obtained in the >> current execution? >> >> >> Yes. This is also related to previous question. >> >> compile.cpp >> - line 748: Why is that required? Couldn't it happen that a >> method is never compiled because the >> code that would resolve a field is never executed? >> >> >> Here a very aggressive decision --- to avoid compilation failure >> requires that all fields have already been resolved. >> >> graphKit.cpp >> - line 2832: please add a comment >> - line 2917: checks should be merged into one if and please >> add a comment >> >> >> Will fix it. >> >> jvm.cpp >> - Could you explain why it's guaranteed that warmup >> compilation is completed once the dummy method >> is compiled? And why is it hardcoded to print >> "com.alibaba.jwarmup.JWarmUp"? >> >> >> This is from practical testing of real applications. Due to the >> parallelism of compilation works, it should check if >> compilation queue contains any of those methods --- completed if >> no any of them on the queue and it is not economic. 
By using of >> a dummy method as a simplified version for that, in real case, >> it is not observed that dummy method is not the last compilation >> for warmup. Do you have suggestion of a way to do that? The >> dummy way is not strictly a guaranteed one theoretically. >> Forgot to change the print to new name after renaming package, >> thanks for the catching. >> >> - What is test_symbol_matcher() used for? >> >> >> This is a leftover(used to test matching patterns), will remove >> it from the file. >> >> jitWarmUp.cpp: >> - line 146: So what about methods that are only ever >> compiled at C1 level? Wouldn't it make sense to >> keep track of the comp_level during CompilationWarmUpRecording? >> >> >> Will consider your suggestion in future work on it. >> >> I also found several typos while reading through the code >> (listed in random order): >> >> globals.hpp >> - "flushing profling" -> "flushing profiling" >> >> method.hpp >> - "when this method first been invoked" >> >> templateInterpreterGenerator_x86.cpp >> - initializition -> initialization >> >> dict.cpp >> - initializated -> initialized >> >> jitWarmUp.cpp >> - uninitilaized -> uninitialized >> - inited -> should be initialized, right? >> >> jitWarmUp.hpp >> . nofityApplicationStartUpIsDone -> >> notifyApplicationStartUpIsDone >> >> constantPool.cpp >> - recusive -> recursive >> >> JWarmUp.java >> - appliacation -> application >> >> TestThrowInitializaitonException.java -> >> TestThrowInitializationException.java >> >> These tests should be renamed (it's not clear what issue the >> number refers to): >> - issue9780156.sh >> - Issue11272598.java >> >> >> Will fix all above suggestions. >> >> Thanks! 
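Yumin's reply above mentions the more precise alternative to the dummy-method sentinel: treat warm-up as complete only when none of the recorded methods is still waiting to be compiled. A minimal C++ sketch of that check follows. It assumes a simplified model in which each compile task carries only a method name and the caller can see both queued and in-flight tasks. CompileTask and warmup_compilation_complete are hypothetical names, not HotSpot's CompileBroker API.

```cpp
#include <deque>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of a compile task: only the target method's name is kept.
struct CompileTask {
  std::string method;  // e.g. "java.lang.String.hashCode()I"
};

namespace {
bool is_warmup_task(const CompileTask& task,
                    const std::unordered_set<std::string>& warmup_methods) {
  return warmup_methods.count(task.method) != 0;
}
}  // namespace

// Returns true when no recorded warm-up method is still outstanding, i.e. it is
// neither waiting in the queue nor currently being compiled. Unlike a sentinel
// method, this does not rely on the order in which compiler threads finish,
// but it scans every outstanding task on each call.
bool warmup_compilation_complete(const std::deque<CompileTask>& pending,
                                 const std::vector<CompileTask>& in_progress,
                                 const std::unordered_set<std::string>& warmup_methods) {
  for (const CompileTask& task : pending) {
    if (is_warmup_task(task, warmup_methods)) return false;
  }
  for (const CompileTask& task : in_progress) {
    if (is_warmup_task(task, warmup_methods)) return false;
  }
  return true;
}
```

The in-progress list matters: a task that a compiler thread has already dequeued is no longer in the queue, yet its method is not compiled yet.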
>> >> Yumin >> > From david.holmes at oracle.com Mon Jul 22 07:21:57 2019 From: david.holmes at oracle.com (David Holmes) Date: Mon, 22 Jul 2019 17:21:57 +1000 Subject: RFR (trivial): 8225782: Remove expired flags in JDK 14 Message-ID: <2e8bd4b1-e20b-f916-4d82-dbd20589d02f@oracle.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8225782 webrev: http://cr.openjdk.java.net/~dholmes/8225782/webrev/ Expired flags are deleted from the table. No documentation updates needed. Thanks, David From harold.seigel at oracle.com Mon Jul 22 12:50:50 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Mon, 22 Jul 2019 08:50:50 -0400 Subject: RFR (trivial): 8225782: Remove expired flags in JDK 14 In-Reply-To: <2e8bd4b1-e20b-f916-4d82-dbd20589d02f@oracle.com> References: <2e8bd4b1-e20b-f916-4d82-dbd20589d02f@oracle.com> Message-ID: Hi David, This looks good and trivial. Harold On 7/22/2019 3:21 AM, David Holmes wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8225782 > webrev: http://cr.openjdk.java.net/~dholmes/8225782/webrev/ > > Expired flags are deleted from the table. No documentation updates > needed. > > Thanks, > David From ralf.schmelter at sap.com Mon Jul 22 13:35:34 2019 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Mon, 22 Jul 2019 13:35:34 +0000 Subject: [PING] RE: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows Message-ID: Since I will leave for somewhat extended holidays next week, It would be great if this item could be reviewed until then. Best regards, Ralf From coleen.phillimore at oracle.com Mon Jul 22 15:37:31 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 22 Jul 2019 11:37:31 -0400 Subject: RFR (S) 8228484: Remove NoAllocVerifier because nothing uses it Message-ID: <7ccb65ed-ac5b-bd99-7660-cc51da930162@oracle.com> Tested with mach5 tier1-3. 
open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228484.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8228484 Thanks, Coleen From martin.doerr at sap.com Mon Jul 22 15:39:12 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 22 Jul 2019 15:39:12 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: Hi David and Erik, I've tried to add the capability "can_generate_field_access_events" during live phase and got "AddCapabilities failed with error 98" which is "JVMTI_ERROR_NOT_AVAILABLE". So hotspot does not support switching it on during live phase. Hotspot initializes "can_generate_field_modification_events" during "init_onload_solo_capabilities". As the name tells, it is implemented as an "onload" capability. So the VM works as expected with and without my change. Can I add you as reviewers? If yes, which parts did you review (x86, SPARC, shared code)? Thanks and best regards, Martin > -----Original Message----- > From: Doerr, Martin > Sent: Freitag, 19. Juli 2019 13:11 > To: David Holmes ; hotspot-runtime- > dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik Osterlund > > Subject: RE: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access > event requests at runtime > > Hi David, > > thanks for elaborating on the capability enablement. > With respect to "AddCapabilities", I've only found "Typically this function is > used in the OnLoad function. Some virtual machines may allow a limited set > of capabilities to be added in the live phase." in the spec [1]. > I don't know which ones are supposed to be part of this "limited set of > capabilities". > As you already explained, adding the capability for field access events in the > live phase does obviously not work for hotspot. > The interpreter has the same issue. 
> > Best regards, > Martin > > > [1] > https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#AddCapa > bilities > > > > -----Original Message----- > > From: David Holmes > > Sent: Freitag, 19. Juli 2019 02:30 > > To: Doerr, Martin ; hotspot-runtime- > > dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik > Osterlund > > > > Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field > access > > event requests at runtime > > > > Hi Martin, > > > > On 18/07/2019 8:01 pm, Doerr, Martin wrote: > > > Hi David and Erik, > > > > > > thank you for looking at my proposal. > > > > > >> If you try to use fast field accessors when you have to post the field > > >> access event then how can you safely go off into a JVM TI event callback > > ?? > > > > > > We speculatively load the field and check afterwards if we can use this > > loaded value. > > > It is safe to use it if there was no safepoint and no JVMTI event was > > requested. > > > Otherwise, we simply discard the (possibly) loaded value and load it again > > in the slow path where we do all the synchronization and event posting. > > > > Thanks for clarifying for me. That is all fine then. > > > > The dynamics of this still concern me, but those concerns are also > > present in the existing code. Currently we don't use the quick accessors > > if JvmtiExport::can_post_field_access() is true during VM startup - this > > is a one-of initialization check that sets the use of fast accessors for > > the lifetime of the JVM. But that is set between the early-start and > > start VM events, before the live-phase. But AFAICS the capability for > > can_post_field_access can be set or cleared dynamically during the live > > phase, thus invalidating the original decision on whether to use fast > > accessors or not. 
With your changes the state of can_post_field_access > > is still captured during VM initialization so again the decision to > > check for a field access watch is hard-wired for the lifetime of the VM. > > But once installed that check allows for use of the fast-path if no > > actual watches are set - which is the whole point of this enhancement. > > So the issue with both old and new code is that if the capability is not > > present at VM startup the VM will be configured to always use the fast > > path, even if the capability (and field access watches) are added later. > > > > Thanks, > > David > > > > > @Erik: > > > Thanks for your proposal to change the function pointers. I'll look into > that. > > > > > > Best regards, > > > Martin > > > > > > > > >> -----Original Message----- > > >> From: David Holmes > > >> Sent: Donnerstag, 18. Juli 2019 06:39 > > >> To: Doerr, Martin ; hotspot-runtime- > > >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net > > >> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field > > access > > >> event requests at runtime > > >> > > >> Hi Martin, > > >> > > >> I need to think about this some more. A critical property of the fast > > >> field accessors are that they are trivial and completely safe. They are > > >> complicated by the need to check if a GC may have happened while we > > >> directly read the field. > > >> > > >> If you try to use fast field accessors when you have to post the field > > >> access event then how can you safely go off into a JVM TI event callback > > ?? > > >> > > >> Thanks, > > >> David > > >> > > >> On 16/07/2019 11:31 pm, Doerr, Martin wrote: > > >>> Hi, > > >>> > > >>> the current implementation of FastJNIAccessors ignores the flag - > > >> XX:+UseFastJNIAccessors when the JVMTI capability > > >> "can_post_field_access" is enabled. 
> > >>> This is an unnecessary restriction which makes field accesses > > >> (GetField) from native code slower when a JVMTI agent is > > attached > > >> which enables this capability. > > >>> A better implementation would check at runtime if an agent actually > > wants > > >> to receive field access events. > > >>> > > >>> Note that the bytecode interpreter already uses this better > > >> implementation by checking if field access watch events were > requested > > >> (JvmtiExport::_field_access_count != 0). > > >>> > > >>> I have implemented such a runtime check on all platforms which > > currently > > >> support FastJNIAccessors. > > >>> > > >>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains > a > > >> micro benchmark: > > >>> test- > > >> > > > support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa > > >> stGetField/FastGetField.jtr > > >>> shows the duration of 10000 iterations with and without > > >> UseFastJNIAccessors (JVMTI agent gets attached in both runs). > > >>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with > > >> FastJNIAccessors and 11.2ms without it. > > >>> > > >>> Webrev: > > >>> > > >> > > > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > > >>> > > >>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. > > >>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll > contribute > > >> them later.) > > >>> My webrev contains 32 bit implementations for x86 and arm, but > > >> completely untested. It'd be great if somebody could volunteer to > review > > >> and test these platforms. > > >>> > > >>> Please review. 
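The runtime check Martin describes (load speculatively, then validate before using the value) follows a seqlock-style reader pattern. The sketch below models it in portable C++ under stated assumptions: g_safepoint_counter stands in for the VM's safepoint counter (odd while a safepoint is in progress, as the fast accessors assume), g_field_access_count stands in for JvmtiExport::_field_access_count, and slow_get_int_field is a hypothetical placeholder for the slow path that would post the JVMTI event. The real accessors are generated assembly, not C++.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the safepoint counter: incremented when a
// safepoint begins and again when it ends, so an odd value means "in progress".
std::atomic<std::uint64_t> g_safepoint_counter{0};

// Hypothetical stand-in for JvmtiExport::_field_access_count: number of
// field access watches currently requested by JVMTI agents.
std::atomic<int> g_field_access_count{0};

struct GetFieldResult {
  int value;
  bool fast_path;  // true if the speculatively loaded value was used
};

// Placeholder slow path: the VM would transition state, post the JVMTI field
// access event if requested, and then read the field.
int slow_get_int_field(const std::atomic<int>& field) {
  return field.load(std::memory_order_seq_cst);
}

GetFieldResult get_int_field(const std::atomic<int>& field) {
  const std::uint64_t before = g_safepoint_counter.load(std::memory_order_acquire);
  if ((before & 1) == 0) {
    // Speculative load, done before we know whether the value may be used.
    const int value = field.load(std::memory_order_relaxed);
    // Seqlock-style reader fence: order the field load before the re-checks.
    std::atomic_thread_fence(std::memory_order_acquire);
    // Validate afterwards: no safepoint started and no field watch requested.
    if (g_safepoint_counter.load(std::memory_order_relaxed) == before &&
        g_field_access_count.load(std::memory_order_relaxed) == 0) {
      return {value, true};
    }
  }
  // Discard any speculative value and redo the access on the slow path.
  return {slow_get_int_field(field), false};
}
```

The counter is modeled as an unsigned 64-bit value so that the equality check stays well defined over a long-running process; a 32-bit signed counter could eventually overflow.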
> > >>> > > >>> Best regards, > > >>> Martin > > >>> From harold.seigel at oracle.com Mon Jul 22 17:26:05 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Mon, 22 Jul 2019 13:26:05 -0400 Subject: RFR (S) 8228484: Remove NoAllocVerifier because nothing uses it In-Reply-To: <7ccb65ed-ac5b-bd99-7660-cc51da930162@oracle.com> References: <7ccb65ed-ac5b-bd99-7660-cc51da930162@oracle.com> Message-ID: Hi Coleen, The change looks good! Thanks, Harold On 7/22/2019 11:37 AM, coleen.phillimore at oracle.com wrote: > Tested with mach5 tier1-3. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228484.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8228484 > > Thanks, > Coleen From coleen.phillimore at oracle.com Mon Jul 22 17:45:02 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 22 Jul 2019 13:45:02 -0400 Subject: RFR (S) 8228484: Remove NoAllocVerifier because nothing uses it In-Reply-To: References: <7ccb65ed-ac5b-bd99-7660-cc51da930162@oracle.com> Message-ID: <5416d344-b088-384b-8eb7-73ec2fcb001c@oracle.com> Thanks, Harold! Coleen On 7/22/19 1:26 PM, Harold Seigel wrote: > Hi Coleen, > > The change looks good! > > Thanks, Harold > > On 7/22/2019 11:37 AM, coleen.phillimore at oracle.com wrote: >> Tested with mach5 tier1-3. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8228484.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8228484 >> >> Thanks, >> Coleen From kim.barrett at oracle.com Mon Jul 22 19:59:03 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 22 Jul 2019 15:59:03 -0400 Subject: RFR (S) 8228484: Remove NoAllocVerifier because nothing uses it In-Reply-To: <7ccb65ed-ac5b-bd99-7660-cc51da930162@oracle.com> References: <7ccb65ed-ac5b-bd99-7660-cc51da930162@oracle.com> Message-ID: > On Jul 22, 2019, at 11:37 AM, coleen.phillimore at oracle.com wrote: > > Tested with mach5 tier1-3. 
> > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228484.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8228484 > > Thanks, > Coleen Looks good. One comment fix, for which I don't need a new webrev: src/hotspot/share/runtime/thread.hpp 379 // The class NoSafepointVerifier is used to set these counters. s/these counters/this counter/ -- there's only one counter now. From coleen.phillimore at oracle.com Mon Jul 22 20:51:28 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 22 Jul 2019 16:51:28 -0400 Subject: RFR (S) 8228484: Remove NoAllocVerifier because nothing uses it In-Reply-To: References: <7ccb65ed-ac5b-bd99-7660-cc51da930162@oracle.com> Message-ID: On 7/22/19 3:59 PM, Kim Barrett wrote: >> On Jul 22, 2019, at 11:37 AM, coleen.phillimore at oracle.com wrote: >> >> Tested with mach5 tier1-3. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228484.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8228484 >> >> Thanks, >> Coleen > Looks good. > > One comment fix, for which I don't need a new webrev: > > src/hotspot/share/runtime/thread.hpp > 379 // The class NoSafepointVerifier is used to set these counters. > > s/these counters/this counter/ -- there's only one counter now. > Oh, yes. Good find. I'll fix it. Thanks!! Coleen From calvin.cheung at oracle.com Mon Jul 22 20:56:42 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 22 Jul 2019 13:56:42 -0700 Subject: [13] RFR(xs) 8228407: JVM crashes with shared archive file mismatch Message-ID: bug: https://bugs.openjdk.java.net/browse/JDK-8228407 webrev: http://cr.openjdk.java.net/~ccheung/8228407/13-webrev.00/ This bug is a regression caused by the fix for JDK-8226406. Please refer to the bug report for reproducing steps and evaluation. Tested locally on linux-x64 and windows-x64. Will run tier1 - 3 tests.
thanks, Calvin From david.holmes at oracle.com Tue Jul 23 03:41:40 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 23 Jul 2019 13:41:40 +1000 Subject: [13] RFR(xs) 8228407: JVM crashes with shared archive file mismatch In-Reply-To: References: Message-ID: Hi Calvin, That fix seems fine. Thanks, David On 23/07/2019 6:56 am, Calvin Cheung wrote: > bug: https://bugs.openjdk.java.net/browse/JDK-8228407 > > webrev: http://cr.openjdk.java.net/~ccheung/8228407/13-webrev.00/ > > This bug is a regression caused by the fix for JDK-8226406. Please refer > to the bug report for reproducing steps and evaluation. > > Tested locally on linux-x64 and windows-x64. Will run tier1 - 3 tests. > > thanks, > > Calvin > From calvin.cheung at oracle.com Tue Jul 23 03:44:59 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 22 Jul 2019 20:44:59 -0700 Subject: [13] RFR(xs) 8228407: JVM crashes with shared archive file mismatch In-Reply-To: References: Message-ID: <1e850896-77b6-b3bf-ddc8-e0a7331d3ef0@oracle.com> Thanks David! I'll add fix request to the bug report. Calvin On 7/22/19 8:41 PM, David Holmes wrote: > Hi Calvin, > > That fix seems fine. > > Thanks, > David > > On 23/07/2019 6:56 am, Calvin Cheung wrote: >> bug: https://bugs.openjdk.java.net/browse/JDK-8228407 >> >> webrev: http://cr.openjdk.java.net/~ccheung/8228407/13-webrev.00/ >> >> This bug is a regression caused by the fix for JDK-8226406. Please >> refer to the bug report for reproducing steps and evaluation. >> >> Tested locally on linux-x64 and windows-x64. Will run tier1 - 3 tests. 
>> >> thanks, >> >> Calvin >> From david.holmes at oracle.com Tue Jul 23 03:59:29 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 23 Jul 2019 13:59:29 +1000 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows In-Reply-To: References: Message-ID: Hi Ralf, I think this issue is struggling to find hotspot developers with indepth familiarity with Windows UNC paths - sorry. It's not something I know much about. We try to use equivalent JDK code for guidance here as those folk have more familiarity with this. So does this problem exist in the JDK as well or is it unique to hotspot? Thanks, David On 4/07/2019 8:56 pm, Schmelter, Ralf wrote: > Hi, > > can you please review this patch to fix various long path related problems in the hotspot os code on Windows. > > As described in the bug the current code cannot handle relative paths in these cases: > > 1. If the relative path is < 260 chars, but the absolute path is > 260 chars. In this case if the I/O method uses the *A variant of the system call as an optimization, it will fail. > 2. If the relative path is > 260 chars or the I/O method always uses the *W variant. In this case the create_unc_path() method is called, which just prepends \\?\ to the relative path. But this is not a valid path to use and the system call will fail. > > Additionally there are problems with some other kinds of paths: > > 1. An absolute path which contains '.' or '..' parts and is > 260 chars or the I/O method always uses the *W variant. When given to the create_unc_path() method, it will just prepend \\?\. But this is not a valid path to use and the system call will fail. > 2. An UNC path which is > 260 or the I/O method always uses the *W variant. The create_unc_path erroneously converts \\host\path to \\?\UNC\\host\path (notice the double backslash before the host name). This again is not a valid path. Additionally '.' or '..' parts would not be handled correctly too. 
> > To fix this I've introduced a new function, which converts a path to a wide character unc path, calling _wfullpath() to make the path absolute if needed and to remove the '.' and '..' path parts. I've adjusted all methods which used create_unc_path() to use the new method. And I removed all fallback code using the ANSI variants, since benchmarking showed that on my machine the additional overhead of converting to a wchar and potentially calling _wfullpath() was less than 5% of the actual I/O routine called. And for this reason, why I haven't tried to optimize avoiding calls to _wfullpath() (e.g. checking for '.' and '..' and only calling it if we find this in the path). > > bugreport: https://bugs.openjdk.java.net/browse/JDK-8191521 > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8191521/webrev.0/ > > Best regards, > Ralf > From david.holmes at oracle.com Tue Jul 23 04:28:19 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 23 Jul 2019 14:28:19 +1000 Subject: RFR (trivial): 8225782: Remove expired flags in JDK 14 In-Reply-To: References: <2e8bd4b1-e20b-f916-4d82-dbd20589d02f@oracle.com> Message-ID: <39bf7083-8890-ca1a-82ae-cfb09b8a6d1a@oracle.com> Thanks Harold! David On 22/07/2019 10:50 pm, Harold Seigel wrote: > Hi David, > > This looks good and trivial. > > Harold > > On 7/22/2019 3:21 AM, David Holmes wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8225782 >> webrev: http://cr.openjdk.java.net/~dholmes/8225782/webrev/ >> >> Expired flags are deleted from the table. No documentation updates >> needed. 
>> >> Thanks, >> David From david.holmes at oracle.com Tue Jul 23 05:05:09 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 23 Jul 2019 15:05:09 +1000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: Hi Martin, On 23/07/2019 1:39 am, Doerr, Martin wrote: > Hi David and Erik, > > I've tried to add the capability "can_generate_field_access_events" during live phase and got "AddCapabilities failed with error 98" which is "JVMTI_ERROR_NOT_AVAILABLE". So hotspot does not support switching it on during live phase. > > Hotspot initializes "can_generate_field_modification_events" during "init_onload_solo_capabilities". As the name tells, it is implemented as an "onload" capability. > > So the VM works as expected with and without my change. Okay - thanks for verifying that. > Can I add you as reviewers? > If yes, which parts did you review (x86, SPARC, shared code)? I reviewed x86, sparc and shared code. Thanks, David > Thanks and best regards, > Martin > > >> -----Original Message----- >> From: Doerr, Martin >> Sent: Freitag, 19. Juli 2019 13:11 >> To: David Holmes ; hotspot-runtime- >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik Osterlund >> >> Subject: RE: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access >> event requests at runtime >> >> Hi David, >> >> thanks for elaborating on the capability enablement. >> With respect to "AddCapabilities", I've only found "Typically this function is >> used in the OnLoad function. Some virtual machines may allow a limited set >> of capabilities to be added in the live phase." in the spec [1]. >> I don't know which ones are supposed to be part of this "limited set of >> capabilities". >> As you already explained, adding the capability for field access events in the >> live phase does obviously not work for hotspot. 
>> The interpreter has the same issue. >> >> Best regards, >> Martin >> >> >> [1] >> https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#AddCapa >> bilities >> >> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Freitag, 19. Juli 2019 02:30 >>> To: Doerr, Martin ; hotspot-runtime- >>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik >> Osterlund >>> >>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >> access >>> event requests at runtime >>> >>> Hi Martin, >>> >>> On 18/07/2019 8:01 pm, Doerr, Martin wrote: >>>> Hi David and Erik, >>>> >>>> thank you for looking at my proposal. >>>> >>>>> If you try to use fast field accessors when you have to post the field >>>>> access event then how can you safely go off into a JVM TI event callback >>> ?? >>>> >>>> We speculatively load the field and check afterwards if we can use this >>> loaded value. >>>> It is safe to use it if there was no safepoint and no JVMTI event was >>> requested. >>>> Otherwise, we simply discard the (possibly) loaded value and load it again >>> in the slow path where we do all the synchronization and event posting. >>> >>> Thanks for clarifying for me. That is all fine then. >>> >>> The dynamics of this still concern me, but those concerns are also >>> present in the existing code. Currently we don't use the quick accessors >>> if JvmtiExport::can_post_field_access() is true during VM startup - this >>> is a one-of initialization check that sets the use of fast accessors for >>> the lifetime of the JVM. But that is set between the early-start and >>> start VM events, before the live-phase. But AFAICS the capability for >>> can_post_field_access can be set or cleared dynamically during the live >>> phase, thus invalidating the original decision on whether to use fast >>> accessors or not. 
With your changes the state of can_post_field_access >>> is still captured during VM initialization so again the decision to >>> check for a field access watch is hard-wired for the lifetime of the VM. >>> But once installed that check allows for use of the fast-path if no >>> actual watches are set - which is the whole point of this enhancement. >>> So the issue with both old and new code is that if the capability is not >>> present at VM startup the VM will be configured to always use the fast >>> path, even if the capability (and field access watches) are added later. >>> >>> Thanks, >>> David >>> >>>> @Erik: >>>> Thanks for your proposal to change the function pointers. I'll look into >> that. >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Donnerstag, 18. Juli 2019 06:39 >>>>> To: Doerr, Martin ; hotspot-runtime- >>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >>> access >>>>> event requests at runtime >>>>> >>>>> Hi Martin, >>>>> >>>>> I need to think about this some more. A critical property of the fast >>>>> field accessors are that they are trivial and completely safe. They are >>>>> complicated by the need to check if a GC may have happened while we >>>>> directly read the field. >>>>> >>>>> If you try to use fast field accessors when you have to post the field >>>>> access event then how can you safely go off into a JVM TI event callback >>> ?? >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 16/07/2019 11:31 pm, Doerr, Martin wrote: >>>>>> Hi, >>>>>> >>>>>> the current implementation of FastJNIAccessors ignores the flag - >>>>> XX:+UseFastJNIAccessors when the JVMTI capability >>>>> "can_post_field_access" is enabled. 
>>>>>> This is an unnecessary restriction which makes field accesses >>>>> (GetField) from native code slower when a JVMTI agent is >>> attached >>>>> which enables this capability. >>>>>> A better implementation would check at runtime if an agent actually >>> wants >>>>> to receive field access events. >>>>>> >>>>>> Note that the bytecode interpreter already uses this better >>>>> implementation by checking if field access watch events were >> requested >>>>> (JvmtiExport::_field_access_count != 0). >>>>>> >>>>>> I have implemented such a runtime check on all platforms which >>> currently >>>>> support FastJNIAccessors. >>>>>> >>>>>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains >> a >>>>> micro benchmark: >>>>>> test- >>>>> >>> >> support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa >>>>> stGetField/FastGetField.jtr >>>>>> shows the duration of 10000 iterations with and without >>>>> UseFastJNIAccessors (JVMTI agent gets attached in both runs). >>>>>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with >>>>> FastJNIAccessors and 11.2ms without it. >>>>>> >>>>>> Webrev: >>>>>> >>>>> >>> >> http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ >>>>>> >>>>>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. >>>>>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll >> contribute >>>>> them later.) >>>>>> My webrev contains 32 bit implementations for x86 and arm, but >>>>> completely untested. It'd be great if somebody could volunteer to >> review >>>>> and test these platforms. >>>>>> >>>>>> Please review. 
>>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> From erik.osterlund at oracle.com Tue Jul 23 07:10:11 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 23 Jul 2019 09:10:11 +0200 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: Hi Martin, 1) In the x86_64 assembly, you can combine the movl; testl; into a single test instruction with one memory operand to the counter, and one immediate zero. 2) If libjvm.so maps in far away, then the movl taking an ExternalAddress, will actually scratch rscratch1, which is r10. That will clobber the rcounter, and will at best cause all loads to take the slow path. However, in the worst case, the subsequent verification might say that the load was okay, even though it was not. I was secretly hoping to never have to touch fast JNI getfield again, because it is so shady, and the odd cases are very hard to test, making it so easy to mess up. The ForceUnreachable JVM flag might be useful in checking if a solution works also when rscratch1 gets clobbered when referencing JVM symbols that are now "far away". The subtle issue of referencing JVM symbols that can be far away, suddenly clobbering r10, has bitten us many times. Perhaps it should be made more explicit somehow. But that's a separate issue. Also, I noticed that the counter that we are checking if it has changed, is a 32 bit signed integer. They can actually wrap around, which is undefined behaviour at best, and will make these tests fail in the worst case. When we don't want counters to overflow, we use 64 bit integers. Thanks, /Erik On 2019-07-22 17:39, Doerr, Martin wrote: > Hi David and Erik, > > I've tried to add the capability "can_generate_field_access_events" during live phase and got "AddCapabilities failed with error 98" which is "JVMTI_ERROR_NOT_AVAILABLE".
So hotspot does not support switching it on during live phase. > > Hotspot initializes "can_generate_field_modification_events" during "init_onload_solo_capabilities". As the name tells, it is implemented as an "onload" capability. > > So the VM works as expected with and without my change. > > Can I add you as reviewers? > If yes, which parts did you review (x86, SPARC, shared code)? > > Thanks and best regards, > Martin > > >> -----Original Message----- >> From: Doerr, Martin >> Sent: Freitag, 19. Juli 2019 13:11 >> To: David Holmes ; hotspot-runtime- >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik Osterlund >> >> Subject: RE: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access >> event requests at runtime >> >> Hi David, >> >> thanks for elaborating on the capability enablement. >> With respect to "AddCapabilities", I've only found "Typically this function is >> used in the OnLoad function. Some virtual machines may allow a limited set >> of capabilities to be added in the live phase." in the spec [1]. >> I don't know which ones are supposed to be part of this "limited set of >> capabilities". >> As you already explained, adding the capability for field access events in the >> live phase does obviously not work for hotspot. >> The interpreter has the same issue. >> >> Best regards, >> Martin >> >> >> [1] >> https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#AddCapa >> bilities >> >> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Freitag, 19. Juli 2019 02:30 >>> To: Doerr, Martin ; hotspot-runtime- >>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik >> Osterlund >>> >>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >> access >>> event requests at runtime >>> >>> Hi Martin, >>> >>> On 18/07/2019 8:01 pm, Doerr, Martin wrote: >>>> Hi David and Erik, >>>> >>>> thank you for looking at my proposal. 
>>>> >>>>> If you try to use fast field accessors when you have to post the field >>>>> access event then how can you safely go off into a JVM TI event callback >>> ?? >>>> >>>> We speculatively load the field and check afterwards if we can use this >>> loaded value. >>>> It is safe to use it if there was no safepoint and no JVMTI event was >>> requested. >>>> Otherwise, we simply discard the (possibly) loaded value and load it again >>> in the slow path where we do all the synchronization and event posting. >>> >>> Thanks for clarifying for me. That is all fine then. >>> >>> The dynamics of this still concern me, but those concerns are also >>> present in the existing code. Currently we don't use the quick accessors >>> if JvmtiExport::can_post_field_access() is true during VM startup - this >>> is a one-of initialization check that sets the use of fast accessors for >>> the lifetime of the JVM. But that is set between the early-start and >>> start VM events, before the live-phase. But AFAICS the capability for >>> can_post_field_access can be set or cleared dynamically during the live >>> phase, thus invalidating the original decision on whether to use fast >>> accessors or not. With your changes the state of can_post_field_access >>> is still captured during VM initialization so again the decision to >>> check for a field access watch is hard-wired for the lifetime of the VM. >>> But once installed that check allows for use of the fast-path if no >>> actual watches are set - which is the whole point of this enhancement. >>> So the issue with both old and new code is that if the capability is not >>> present at VM startup the VM will be configured to always use the fast >>> path, even if the capability (and field access watches) are added later. >>> >>> Thanks, >>> David >>> >>>> @Erik: >>>> Thanks for your proposal to change the function pointers. I'll look into >> that. 
>>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Donnerstag, 18. Juli 2019 06:39 >>>>> To: Doerr, Martin ; hotspot-runtime- >>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >>> access >>>>> event requests at runtime >>>>> >>>>> Hi Martin, >>>>> >>>>> I need to think about this some more. A critical property of the fast >>>>> field accessors are that they are trivial and completely safe. They are >>>>> complicated by the need to check if a GC may have happened while we >>>>> directly read the field. >>>>> >>>>> If you try to use fast field accessors when you have to post the field >>>>> access event then how can you safely go off into a JVM TI event callback >>> ?? >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 16/07/2019 11:31 pm, Doerr, Martin wrote: >>>>>> Hi, >>>>>> >>>>>> the current implementation of FastJNIAccessors ignores the flag - >>>>> XX:+UseFastJNIAccessors when the JVMTI capability >>>>> "can_post_field_access" is enabled. >>>>>> This is an unnecessary restriction which makes field accesses >>>>> (GetField) from native code slower when a JVMTI agent is >>> attached >>>>> which enables this capability. >>>>>> A better implementation would check at runtime if an agent actually >>> wants >>>>> to receive field access events. >>>>>> >>>>>> Note that the bytecode interpreter already uses this better >>>>> implementation by checking if field access watch events were >> requested >>>>> (JvmtiExport::_field_access_count != 0). >>>>>> >>>>>> I have implemented such a runtime check on all platforms which >>> currently >>>>> support FastJNIAccessors. 
>>>>>>> >>>>>>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains >> a >>>>>> micro benchmark: >>>>>>> test- >>>>>> >>>> >> support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa >>>>>> stGetField/FastGetField.jtr >>>>>>> shows the duration of 10000 iterations with and without >>>>>> UseFastJNIAccessors (JVMTI agent gets attached in both runs). >>>>>>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with >>>>>> FastJNIAccessors and 11.2ms without it. >>>>>>> >>>>>>> Webrev: >>>>>>> >>>>>> >>>> >> http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ >>>>>>> >>>>>>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. >>>>>>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll >> contribute >>>>>> them later.) >>>>>>> My webrev contains 32 bit implementations for x86 and arm, but >>>>>> completely untested. It'd be great if somebody could volunteer to >> review >>>>>> and test these platforms. >>>>>>> >>>>>>> Please review. >>>>>>> >>>>>>> Best regards, >>>>>>> Martin >>>>>>> From erik.osterlund at oracle.com Tue Jul 23 07:23:23 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 23 Jul 2019 09:23:23 +0200 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: ..small clarification on point #1, test against immediate -1 of course, not 0. /Erik > On 23 Jul 2019, at 09:10, Erik Österlund wrote: > > Hi Martin, > > 1) In the x86_64 assembly, you can combine the movl; testl; into a single test instruction with one memory operand to the counter, and one immediate zero. > > 2) If libjvm.so maps in far away, then the movl taking an ExternalAddress, will actually scratch rscratch1, which is r10. That will clobber the rcounter, and will at best cause all loads to take the slow path.
However, in the worst case, the subsequent verification might say that the load was okay, even though it was not. > > I was secretly hoping to never have to touch fast JNI getfield again, because it is so shady, and the odd cases are very hard to test, making it so easy to mess up. The ForceUnreachable JVM flag might be useful in checking if a solution works also when rscratch1 gets clobbered when referencing JVM symbols that are now "far away". > > The subtle issue of referencing JVM symbols that can be far away, suddenly clobbering r10, has bitten us many times. Perhaps it should be made more explicit somehow. But that's a separate issue. > > Also, I noticed that the counter that we are checking if it has changed, is a 32 bit signed integer. They can actually wrap around, which is undefined behaviour at best, and will make these tests fail in the worst case. When we don't want counters to overflow, we use 64 bit integers. > > Thanks, > /Erik > >> On 2019-07-22 17:39, Doerr, Martin wrote: >> Hi David and Erik, >> I've tried to add the capability "can_generate_field_access_events" during live phase and got "AddCapabilities failed with error 98" which is "JVMTI_ERROR_NOT_AVAILABLE". So hotspot does not support switching it on during live phase. >> Hotspot initializes "can_generate_field_modification_events" during "init_onload_solo_capabilities". As the name tells, it is implemented as an "onload" capability. >> So the VM works as expected with and without my change. >> Can I add you as reviewers? >> If yes, which parts did you review (x86, SPARC, shared code)? >> Thanks and best regards, >> Martin >>> -----Original Message----- >>> From: Doerr, Martin >>> Sent: Freitag, 19. 
Juli 2019 13:11 >>> To: David Holmes ; hotspot-runtime- >>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik Osterlund >>> >>> Subject: RE: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access >>> event requests at runtime >>> >>> Hi David, >>> >>> thanks for elaborating on the capability enablement. >>> With respect to "AddCapabilities", I've only found "Typically this function is >>> used in the OnLoad function. Some virtual machines may allow a limited set >>> of capabilities to be added in the live phase." in the spec [1]. >>> I don't know which ones are supposed to be part of this "limited set of >>> capabilities". >>> As you already explained, adding the capability for field access events in the >>> live phase does obviously not work for hotspot. >>> The interpreter has the same issue. >>> >>> Best regards, >>> Martin >>> >>> >>> [1] >>> https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#AddCapa >>> bilities >>> >>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Freitag, 19. Juli 2019 02:30 >>>> To: Doerr, Martin ; hotspot-runtime- >>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik >>> Osterlund >>>> >>>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >>> access >>>> event requests at runtime >>>> >>>> Hi Martin, >>>> >>>>> On 18/07/2019 8:01 pm, Doerr, Martin wrote: >>>>> Hi David and Erik, >>>>> >>>>> thank you for looking at my proposal. >>>>> >>>>>> If you try to use fast field accessors when you have to post the field >>>>>> access event then how can you safely go off into a JVM TI event callback >>>> ?? >>>>> >>>>> We speculatively load the field and check afterwards if we can use this >>>> loaded value. >>>>> It is safe to use it if there was no safepoint and no JVMTI event was >>>> requested. 
>>>>> Otherwise, we simply discard the (possibly) loaded value and load it again >>>> in the slow path where we do all the synchronization and event posting. >>>> >>>> Thanks for clarifying for me. That is all fine then. >>>> >>>> The dynamics of this still concern me, but those concerns are also >>>> present in the existing code. Currently we don't use the quick accessors >>>> if JvmtiExport::can_post_field_access() is true during VM startup - this >>>> is a one-of initialization check that sets the use of fast accessors for >>>> the lifetime of the JVM. But that is set between the early-start and >>>> start VM events, before the live-phase. But AFAICS the capability for >>>> can_post_field_access can be set or cleared dynamically during the live >>>> phase, thus invalidating the original decision on whether to use fast >>>> accessors or not. With your changes the state of can_post_field_access >>>> is still captured during VM initialization so again the decision to >>>> check for a field access watch is hard-wired for the lifetime of the VM. >>>> But once installed that check allows for use of the fast-path if no >>>> actual watches are set - which is the whole point of this enhancement. >>>> So the issue with both old and new code is that if the capability is not >>>> present at VM startup the VM will be configured to always use the fast >>>> path, even if the capability (and field access watches) are added later. >>>> >>>> Thanks, >>>> David >>>> >>>>> @Erik: >>>>> Thanks for your proposal to change the function pointers. I'll look into >>> that. >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Donnerstag, 18. 
Juli 2019 06:39 >>>>>> To: Doerr, Martin ; hotspot-runtime- >>>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>>>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >>>> access >>>>>> event requests at runtime >>>>>> >>>>>> Hi Martin, >>>>>> >>>>>> I need to think about this some more. A critical property of the fast >>>>>> field accessors are that they are trivial and completely safe. They are >>>>>> complicated by the need to check if a GC may have happened while we >>>>>> directly read the field. >>>>>> >>>>>> If you try to use fast field accessors when you have to post the field >>>>>> access event then how can you safely go off into a JVM TI event callback >>>> ?? >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> On 16/07/2019 11:31 pm, Doerr, Martin wrote: >>>>>>> Hi, >>>>>>> >>>>>>> the current implementation of FastJNIAccessors ignores the flag - >>>>>> XX:+UseFastJNIAccessors when the JVMTI capability >>>>>> "can_post_field_access" is enabled. >>>>>>> This is an unnecessary restriction which makes field accesses >>>>>> (GetField) from native code slower when a JVMTI agent is >>>> attached >>>>>> which enables this capability. >>>>>>> A better implementation would check at runtime if an agent actually >>>> wants >>>>>> to receive field access events. >>>>>>> >>>>>>> Note that the bytecode interpreter already uses this better >>>>>> implementation by checking if field access watch events were >>> requested >>>>>> (JvmtiExport::_field_access_count != 0). >>>>>>> >>>>>>> I have implemented such a runtime check on all platforms which >>>> currently >>>>>> support FastJNIAccessors. 
>>>>>>> >>>>>>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains >>> a >>>>>> micro benchmark: >>>>>>> test- >>>>>> >>>> >>> support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa >>>>>> stGetField/FastGetField.jtr >>>>>>> shows the duration of 10000 iterations with and without >>>>>> UseFastJNIAccessors (JVMTI agent gets attached in both runs). >>>>>>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with >>>>>> FastJNIAccessors and 11.2ms without it. >>>>>>> >>>>>>> Webrev: >>>>>>> >>>>>> >>>> >>> http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ >>>>>>> >>>>>>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. >>>>>>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll >>> contribute >>>>>> them later.) >>>>>>> My webrev contains 32 bit implementations for x86 and arm, but >>>>>> completely untested. It'd be great if somebody could volunteer to >>> review >>>>>> and test these platforms. >>>>>>> >>>>>>> Please review. >>>>>>> >>>>>>> Best regards, >>>>>>> Martin >>>>>>> From david.holmes at oracle.com Tue Jul 23 07:34:49 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 23 Jul 2019 17:34:49 +1000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: <01de0dff-8a6f-8374-b373-fb31c935c7ae@oracle.com> Hi Erik, On 23/07/2019 5:10 pm, Erik Österlund wrote: > Hi Martin, > > 1) In the x86_64 assembly, you can combine the movl; testl; into a > single test instruction with one memory operand to the counter, and one > immediate zero. > > 2) If libjvm.so maps in far away, then the movl taking an > ExternalAddress, will actually scratch rscratch1, which is r10. That > will clobber the rcounter, and will at best cause all loads to take the > slow path.
However, in the worst case, the subsequent verification might > say that the load was okay, even though it was not. > > I was secretly hoping to never have to touch fast JNI getfield again, > because it is so shady, and the odd cases are very hard to test, making > it so easy to mess up. The ForceUnreachable JVM flag might be useful in > checking if a solution works also when rscratch1 gets clobbered when > referencing JVM symbols that are now "far away". > > The subtle issue of referencing JVM symbols that can be far away, > suddenly clobbering r10, has bitten us many times. Perhaps it should be > made more explicit somehow. But that's a separate issue. Too subtle for me. Is this issue written up anywhere? How do we know what sequences are susceptible to this problem? How do we know when the problem actually occurs? What is the fix? > > Also, I noticed that the counter that we are checking if it has changed, > is a 32 bit signed integer. They can actually wrap around, which is > undefined behaviour at best, and will make these tests fail in the worst > case. When we don't want counters to overflow, we use 64 bit integers. Are you referring to the field-access counter? Changing that from a 32-bit to 64-bit value seems somewhat out-of-scope for the current change, and may also have issues on 32-bit systems. Thanks, David ----- > Thanks, > /Erik > > On 2019-07-22 17:39, Doerr, Martin wrote: >> Hi David and Erik, >> >> I've tried to add the capability "can_generate_field_access_events" >> during live phase and got "AddCapabilities failed with error 98" which >> is "JVMTI_ERROR_NOT_AVAILABLE". So hotspot does not support switching >> it on during live phase. >> >> Hotspot initializes "can_generate_field_modification_events" during >> "init_onload_solo_capabilities". As the name tells, it is implemented >> as an "onload" capability. >> >> So the VM works as expected with and without my change. >> >> Can I add you as reviewers? 
>> If yes, which parts did you review (x86, SPARC, shared code)? >> >> Thanks and best regards, >> Martin >> >> >>> -----Original Message----- >>> From: Doerr, Martin >>> Sent: Freitag, 19. Juli 2019 13:11 >>> To: David Holmes ; hotspot-runtime- >>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik >>> Osterlund >>> >>> Subject: RE: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >>> access >>> event requests at runtime >>> >>> Hi David, >>> >>> thanks for elaborating on the capability enablement. >>> With respect to "AddCapabilities", I've only found "Typically this >>> function is >>> used in the OnLoad function. Some virtual machines may allow a >>> limited set >>> of capabilities to be added in the live phase." in the spec [1]. >>> I don't know which ones are supposed to be part of this "limited set of >>> capabilities". >>> As you already explained, adding the capability for field access >>> events in the >>> live phase does obviously not work for hotspot. >>> The interpreter has the same issue. >>> >>> Best regards, >>> Martin >>> >>> >>> [1] >>> https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#AddCapa >>> bilities >>> >>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Freitag, 19. Juli 2019 02:30 >>>> To: Doerr, Martin ; hotspot-runtime- >>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik >>> Osterlund >>>> >>>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >>> access >>>> event requests at runtime >>>> >>>> Hi Martin, >>>> >>>> On 18/07/2019 8:01 pm, Doerr, Martin wrote: >>>>> Hi David and Erik, >>>>> >>>>> thank you for looking at my proposal. >>>>> >>>>>> If you try to use fast field accessors when you have to post the >>>>>> field >>>>>> access event then how can you safely go off into a JVM TI event >>>>>> callback >>>> ?? >>>>> >>>>> We speculatively load the field and check afterwards if we can use >>>>> this >>>> loaded value. 
>>>>> It is safe to use it if there was no safepoint and no JVMTI event was >>>> requested. >>>>> Otherwise, we simply discard the (possibly) loaded value and load >>>>> it again >>>> in the slow path where we do all the synchronization and event posting. >>>> >>>> Thanks for clarifying for me. That is all fine then. >>>> >>>> The dynamics of this still concern me, but those concerns are also >>>> present in the existing code. Currently we don't use the quick >>>> accessors >>>> if JvmtiExport::can_post_field_access() is true during VM startup - >>>> this >>>> is a one-of initialization check that sets the use of fast accessors >>>> for >>>> the lifetime of the JVM. But that is set between the early-start and >>>> start VM events, before the live-phase. But AFAICS the capability for >>>> can_post_field_access can be set or cleared dynamically during the live >>>> phase, thus invalidating the original decision on whether to use fast >>>> accessors or not. With your changes the state of can_post_field_access >>>> is still captured during VM initialization so again the decision to >>>> check for a field access watch is hard-wired for the lifetime of the >>>> VM. >>>> But once installed that check allows for use of the fast-path if no >>>> actual watches are set - which is the whole point of this enhancement. >>>> So the issue with both old and new code is that if the capability is >>>> not >>>> present at VM startup the VM will be configured to always use the fast >>>> path, even if the capability (and field access watches) are added >>>> later. >>>> >>>> Thanks, >>>> David >>>> >>>>> @Erik: >>>>> Thanks for your proposal to change the function pointers. I'll look >>>>> into >>> that. >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Donnerstag, 18. 
Juli 2019 06:39 >>>>>> To: Doerr, Martin ; hotspot-runtime- >>>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>>>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >>>> access >>>>>> event requests at runtime >>>>>> >>>>>> Hi Martin, >>>>>> >>>>>> I need to think about this some more. A critical property of the fast >>>>>> field accessors are that they are trivial and completely safe. >>>>>> They are >>>>>> complicated by the need to check if a GC may have happened while we >>>>>> directly read the field. >>>>>> >>>>>> If you try to use fast field accessors when you have to post the >>>>>> field >>>>>> access event then how can you safely go off into a JVM TI event >>>>>> callback >>>> ?? >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 16/07/2019 11:31 pm, Doerr, Martin wrote: >>>>>>> Hi, >>>>>>> >>>>>>> the current implementation of FastJNIAccessors ignores the flag - >>>>>> XX:+UseFastJNIAccessors when the JVMTI capability >>>>>> "can_post_field_access" is enabled. >>>>>>> This is an unnecessary restriction which makes field accesses >>>>>> (GetField) from native code slower when a JVMTI agent is >>>> attached >>>>>> which enables this capability. >>>>>>> A better implementation would check at runtime if an agent actually >>>> wants >>>>>> to receive field access events. >>>>>>> >>>>>>> Note that the bytecode interpreter already uses this better >>>>>> implementation by checking if field access watch events were >>> requested >>>>>> (JvmtiExport::_field_access_count != 0). >>>>>>> >>>>>>> I have implemented such a runtime check on all platforms which >>>> currently >>>>>> support FastJNIAccessors. 
>>>>>>> >>>>>>> My new jtreg test runtime/jni/FastGetField/FastGetField.java >>>>>>> contains >>> a >>>>>> micro benchmark: >>>>>>> test- >>>>>> >>>> >>> support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa >>>>>> stGetField/FastGetField.jtr >>>>>>> shows the duration of 10000 iterations with and without >>>>>> UseFastJNIAccessors (JVMTI agent gets attached in both runs). >>>>>>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with >>>>>> FastJNIAccessors and 11.2ms without it. >>>>>>> >>>>>>> Webrev: >>>>>>> >>>>>> >>>> >>> http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ >>>>>>> >>>>>>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. >>>>>>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll >>> contribute >>>>>> them later.) >>>>>>> My webrev contains 32 bit implementations for x86 and arm, but >>>>>> completely untested. It'd be great if somebody could volunteer to >>> review >>>>>> and test these platforms. >>>>>>> >>>>>>> Please review. >>>>>>> >>>>>>> Best regards, >>>>>>> Martin >>>>>>> From erik.osterlund at oracle.com Tue Jul 23 08:13:04 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 23 Jul 2019 10:13:04 +0200 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: <01de0dff-8a6f-8374-b373-fb31c935c7ae@oracle.com> References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> <01de0dff-8a6f-8374-b373-fb31c935c7ae@oracle.com> Message-ID: Hi David, On 2019-07-23 09:34, David Holmes wrote: > Hi Erik, > > On 23/07/2019 5:10 pm, Erik Österlund wrote: >> Hi Martin, >> >> 1) In the x86_64 assembly, you can combine the movl; testl; into a >> single test instruction with one memory operand to the counter, and >> one immediate zero. >> >> 2) If libjvm.so maps in far away, then the movl taking an >> ExternalAddress, will actually scratch rscratch1, which is r10.
That >> will clobber the rcounter, and will at best cause all loads to take >> the slow path. However, in the worst case, the subsequent >> verification might say that the load was okay, even though it was not. >> >> I was secretly hoping to never have to touch fast JNI getfield again, >> because it is so shady, and the odd cases are very hard to test, >> making it so easy to mess up. The ForceUnreachable JVM flag might be >> useful in checking if a solution works also when rscratch1 gets >> clobbered when referencing JVM symbols that are now "far away". >> >> The subtle issue of referencing JVM symbols that can be far away, >> suddenly clobbering r10, has bitten us many times. Perhaps it should >> be made more explicit somehow. But that's a separate issue. > > Too subtle for me. Is this issue written up anywhere? How do we know > what sequences are susceptible to this problem? How do we know when > the problem actually occurs? What is the fix?

1) Is this documented: Nope, it is buried deep in the code, where you least expect to find it. A whole bunch of code calls Assembler::reachable(AddressLiteral adr) and, depending on the answer, performs either a non-clobbering or a clobbering variant of the logical instruction. This is precisely what happens with mov32:

void MacroAssembler::mov32(AddressLiteral dst, Register src) {
  if (reachable(dst)) {
    movl(as_Address(dst), src);
  } else {
    lea(rscratch1, dst);  // <-- this is what will make things awkward
    movl(Address(rscratch1, 0), src);
  }
}

2) How do we know we are in trouble: any macro assembler call that takes an ExternalAddress parameter might have to clobber r10. It happens only rarely, when the stars align, as if to make sure testing won't catch it.
3) When does it actually occur? When libjvm.so is mapped farther from the code cache than a signed 32-bit displacement can reach, i.e. roughly 2 GB apart in virtual address space. It seems to happen more often on Windows.
4) The fix that has sadly been applied all over the VM is to deal with r10 being clobbered across such macro assembler instructions, either by moving it to a place where it may safely be clobbered, or by stashing r10 away across the macro assembler call and restoring it afterwards. And every now and then we forget this implicit side effect and things blow up instead. > >> >> Also, I noticed that the counter that we are checking if it has >> changed, is a 32 bit signed integer. They can actually wrap around, >> which is undefined behaviour at best, and will make these tests fail >> in the worst case. When we don't want counters to overflow, we use 64 >> bit integers. > > Are you referring to the field-access counter? Changing that from a > 32-bit to 64-bit value seems somewhat out-of-scope for the current > change, and may also have issues on 32-bit systems. True. That might be okay the way it is. Thanks, /Erik > Thanks, > David > ----- > >> Thanks, >> /Erik >> >> On 2019-07-22 17:39, Doerr, Martin wrote: >>> Hi David and Erik, >>> >>> I've tried to add the capability "can_generate_field_access_events" >>> during live phase and got "AddCapabilities failed with error 98" >>> which is "JVMTI_ERROR_NOT_AVAILABLE". So hotspot does not support >>> switching it on during live phase. >>> >>> Hotspot initializes "can_generate_field_modification_events" during >>> "init_onload_solo_capabilities". As the name tells, it is >>> implemented as an "onload" capability. >>> >>> So the VM works as expected with and without my change. >>> >>> Can I add you as reviewers? >>> If yes, which parts did you review (x86, SPARC, shared code)? >>> >>> Thanks and best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: Doerr, Martin >>>> Sent: Freitag, 19.
Juli 2019 13:11 >>>> To: David Holmes ; hotspot-runtime- >>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik >>>> Osterlund >>>> >>>> Subject: RE: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI >>>> field access >>>> event requests at runtime >>>> >>>> Hi David, >>>> >>>> thanks for elaborating on the capability enablement. >>>> With respect to "AddCapabilities", I've only found "Typically this >>>> function is >>>> used in the OnLoad function. Some virtual machines may allow a >>>> limited set >>>> of capabilities to be added in the live phase." in the spec [1]. >>>> I don't know which ones are supposed to be part of this "limited >>>> set of >>>> capabilities". >>>> As you already explained, adding the capability for field access >>>> events in the >>>> live phase does obviously not work for hotspot. >>>> The interpreter has the same issue. >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> [1] >>>> https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#AddCapa >>>> >>>> bilities >>>> >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Freitag, 19. Juli 2019 02:30 >>>>> To: Doerr, Martin ; hotspot-runtime- >>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Erik >>>> Osterlund >>>>> >>>>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field >>>> access >>>>> event requests at runtime >>>>> >>>>> Hi Martin, >>>>> >>>>> On 18/07/2019 8:01 pm, Doerr, Martin wrote: >>>>>> Hi David and Erik, >>>>>> >>>>>> thank you for looking at my proposal. >>>>>> >>>>>>> If you try to use fast field accessors when you have to post the >>>>>>> field >>>>>>> access event then how can you safely go off into a JVM TI event >>>>>>> callback >>>>> ?? >>>>>> >>>>>> We speculatively load the field and check afterwards if we can >>>>>> use this >>>>> loaded value. >>>>>> It is safe to use it if there was no safepoint and no JVMTI event >>>>>> was >>>>> requested. 
>>>>>> Otherwise, we simply discard the (possibly) loaded value and load >>>>>> it again >>>>> in the slow path where we do all the synchronization and event >>>>> posting. >>>>> >>>>> Thanks for clarifying for me. That is all fine then. >>>>> >>>>> The dynamics of this still concern me, but those concerns are also >>>>> present in the existing code. Currently we don't use the quick >>>>> accessors >>>>> if JvmtiExport::can_post_field_access() is true during VM startup >>>>> - this >>>>> is a one-of initialization check that sets the use of fast >>>>> accessors for >>>>> the lifetime of the JVM. But that is set between the early-start and >>>>> start VM events, before the live-phase. But AFAICS the capability for >>>>> can_post_field_access can be set or cleared dynamically during the >>>>> live >>>>> phase, thus invalidating the original decision on whether to use fast >>>>> accessors or not. With your changes the state of >>>>> can_post_field_access >>>>> is still captured during VM initialization so again the decision to >>>>> check for a field access watch is hard-wired for the lifetime of >>>>> the VM. >>>>> But once installed that check allows for use of the fast-path if no >>>>> actual watches are set - which is the whole point of this >>>>> enhancement. >>>>> So the issue with both old and new code is that if the capability >>>>> is not >>>>> present at VM startup the VM will be configured to always use the >>>>> fast >>>>> path, even if the capability (and field access watches) are added >>>>> later. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> @Erik: >>>>>> Thanks for your proposal to change the function pointers. I'll >>>>>> look into >>>> that. >>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes >>>>>>> Sent: Donnerstag, 18. 
July 2019 06:39 >>>>>>> To: Doerr, Martin ; hotspot-runtime- >>>>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>>>>> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI >>>>>>> field >>>>> access >>>>>>> event requests at runtime >>>>>>> >>>>>>> Hi Martin, >>>>>>> >>>>>>> I need to think about this some more. A critical property of the >>>>>>> fast >>>>>>> field accessors is that they are trivial and completely safe. >>>>>>> They are >>>>>>> complicated by the need to check if a GC may have happened while we >>>>>>> directly read the field. >>>>>>> >>>>>>> If you try to use fast field accessors when you have to post the >>>>>>> field >>>>>>> access event then how can you safely go off into a JVM TI event >>>>>>> callback >>>>> ?? >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 16/07/2019 11:31 pm, Doerr, Martin wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> the current implementation of FastJNIAccessors ignores the flag >>>>>>> -XX:+UseFastJNIAccessors when the JVMTI capability >>>>>>> "can_post_field_access" is enabled. >>>>>>>> This is an unnecessary restriction which makes field accesses >>>>>>> (GetField) from native code slower when a JVMTI agent is >>>>> attached >>>>>>> which enables this capability. >>>>>>>> A better implementation would check at runtime if an agent >>>>>>>> actually >>>>> wants >>>>>>> to receive field access events. >>>>>>>> >>>>>>>> Note that the bytecode interpreter already uses this better >>>>>>> implementation by checking if field access watch events were >>>> requested >>>>>>> (JvmtiExport::_field_access_count != 0). >>>>>>>> >>>>>>>> I have implemented such a runtime check on all platforms which >>>>> currently >>>>>>> support FastJNIAccessors.
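The speculative scheme Martin describes (load the field first, then validate that no safepoint happened and no field access watch is active, otherwise discard the value and take the slow path) can be modeled as a seqlock-style reader. The sketch below is a hedged C++ illustration using std::atomic, not HotSpot's per-platform assembly; the names `safepoint_counter`, `field_access_count`, `fast_get_int`, `slow_get_int` and `get_int_field` are hypothetical stand-ins, loosely modeled on the safepoint counter and `JvmtiExport::_field_access_count` mentioned in the thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for VM state (illustrative names, not HotSpot's):
// a safepoint counter that is odd while a safepoint is in progress, and the
// number of active JVMTI field access watches.
std::atomic<unsigned> safepoint_counter{0};
std::atomic<int> field_access_count{0};

// Slow-path stand-in. In the VM this is where the thread transitions state,
// posts any requested field access event, and then reads the field.
int slow_get_int(const std::atomic<int>& field) {
  return field.load(std::memory_order_seq_cst);
}

// Fast path: load the field speculatively, then validate. Returns false when
// the loaded value must be discarded and the slow path taken instead.
bool fast_get_int(const std::atomic<int>& field, int* out) {
  unsigned before = safepoint_counter.load(std::memory_order_acquire);
  if (before & 1u) return false;                      // safepoint in progress
  int value = field.load(std::memory_order_relaxed);  // speculative load
  // Keep the validation loads below from moving above the speculative load.
  std::atomic_thread_fence(std::memory_order_acquire);
  if (field_access_count.load(std::memory_order_relaxed) != 0) {
    return false;  // an agent wants field access events
  }
  if (safepoint_counter.load(std::memory_order_relaxed) != before) {
    return false;  // a safepoint (and possibly a GC) happened meanwhile
  }
  *out = value;
  return true;
}

// Entry point: try the fast path, fall back to the slow path otherwise.
int get_int_field(const std::atomic<int>& field) {
  int value;
  if (fast_get_int(field, &value)) return value;
  return slow_get_int(field);
}
```

The design point is that the common case (no watches, no safepoint) costs only a few extra loads, while any doubt about the speculative value simply routes the call to the existing slow path.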
>>>>>>>> >>>>>>>> My new jtreg test runtime/jni/FastGetField/FastGetField.java >>>>>>>> contains >>>> a >>>>>>> micro benchmark: >>>>>>>> test-support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/FastGetField/FastGetField.jtr >>>>>>>> shows the duration of 10000 iterations with and without >>>>>>> UseFastJNIAccessors (JVMTI agent gets attached in both runs). >>>>>>>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with >>>>>>> FastJNIAccessors and 11.2ms without it. >>>>>>>> >>>>>>>> Webrev: >>>>>>>> >>>>>>> >>>>> >>>> http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ >>>>>>>> >>>>>>>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. >>>>>>>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll >>>> contribute >>>>>>> them later.) >>>>>>>> My webrev contains 32 bit implementations for x86 and arm, but >>>>>>> completely untested. It'd be great if somebody could volunteer to >>>> review >>>>>>> and test these platforms. >>>>>>>> >>>>>>>> Please review. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Martin >>>>>>>> From ralf.schmelter at sap.com Tue Jul 23 10:11:23 2019 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Tue, 23 Jul 2019 10:11:23 +0000 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows In-Reply-To: References: Message-ID: Hi David, the problems fixed with this change only exist in the hotspot code, not the JDK code, which is much different. The new code is much closer to the JDK code (see pathToNTPath() in io_util_md.cpp). It always uses Unicode and _wfullpath() to normalize the path, while the old code never used _wfullpath(). The difference of the new code to the JDK code is mostly the missing optimization for small paths which would not need to be prefixed with \\?\. We always call (except for paths already prefixed with \\?\)
_wfullpath(), which will make a path absolute (if not already) and resolve all .. and . parts, so the result is safe to be prefixed with \\?\. Best regards, Ralf From martin.doerr at sap.com Tue Jul 23 10:29:11 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 23 Jul 2019 10:29:11 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: Hi David and Erik, thank you for reviewing and for your very valuable feedback. > 1) In the x86_64 assembly, you can combine the movl; testl; into a single > test instruction with one memory operand to the counter, and one > immediate zero. Thanks for the hint. I'm using cmp32 in my new webrev. > 2) If libjvm.so maps in far away, then the movl taking an ExternalAddress, > will actually scratch rscratch1, which is r10. Good catch! I've exchanged registers and added assert_different_registers. > I was secretly hoping to never have to touch fast JNI getfield again, > because it is so shady, and the odd cases are very hard to test, making it so > easy to mess up. The ForceUnreachable JVM flag might be useful in checking > if a solution works also when rscratch1 gets clobbered when referencing JVM > symbols that are now "far away". I've also changed the test to run with -XX:+ForceUnreachable and -XX:+SafepointALot to hit more corner cases. But as you explained, the test would normally not notice the destroyed counter and just execute the slow path. > The subtle issue of referencing JVM symbols that can be far away, > suddenly clobbering r10, has bitten us many times. Perhaps it should be > made more explicit somehow. It would be possible to explicitly kill r10 in all such assembler instructions in the dbg build, but that'd come with an overhead. > But that's a separate issue. Agreed. > Also, I noticed that the counter that we are checking if it has changed, is a > 32 bit signed integer.
They can actually wrap around, which is undefined > behaviour at best, and will make these tests fail in the worst case. When we > don't want counters to overflow, we use 64 bit integers. We could also make it unsigned to get defined behavior, but that's out of scope here. New webrev: http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.01/ Best regards, Martin From kuaiwei.kw at alibaba-inc.com Tue Jul 23 01:46:43 2019 From: kuaiwei.kw at alibaba-inc.com (Kuai Wei) Date: Tue, 23 Jul 2019 09:46:43 +0800 Subject: Re: RFC: JWarmup precompile java hot methods at application startup In-Reply-To: <6c74ebb1-1292-43ae-86d2-9c8be14af0e9@oracle.com> References: <8cfbaa83-c50f-61c4-5336-5f30b3885d45@oracle.com> <26f88253-deea-64d5-714c-28bb73989c62@oracle.com> <40bd126f-ca71-4a71-8fda-552cf8f289ad.kuaiwei.kw@alibaba-inc.com>, <6c74ebb1-1292-43ae-86d2-9c8be14af0e9@oracle.com> Message-ID: Hi David, Thanks for the clarification. We will update wiki and patch for next round review. Kuai Wei ------------------------------------------------------------------ From:David Holmes Send Time:2019-07-22 (Mon) 14:49 To:??(??) ; yumin qi ; hotspot-runtime-dev at openjdk.java.net Cc:hotspot-dev Subject:Re: RFC: JWarmup precompile java hot methods at application startup Hi Kuai, On 21/06/2019 5:18 pm, Kuai Wei wrote: > Hi David, > > Sorry for the late reply. Sorry for my even later one. I was traveling and then had vacation, and have had other things to look at. > We plan to create a wiki page on OpenJDK website and put the design documents there. What do you think about it? That sounds like a good idea. > Here are the answers to some questions in your last message: I haven't had time to context switch in everything sorry. So just a couple of responses below. > - "source file" in JWarmUp record: > Application will load same class from multiple places. For example, logging jar will be packaged by different web apps.
So we record this property. > > - super class resolution > It's used for diagnostic. Same class loaded by different loaders will cause a warning message in PreloadClassChain::record_loaded_class(). We use the super class resolve mark to reduce warning messages when resolving super class. We are thinking to refine it. If "refine" means remove then I encourage your thinking :) > > - dummy method > It's hard to know whether JWarmUp compilations are completed or not. The dummy method is used as the last method compiled due to JWarmUp. We are able to check its entry to see whether all compilations are finished. > > - native entry in jvm.cpp > JWarmUp defined some jvm entries which are invoked by java. We assume all jvm entries are put into jvm.cpp. Would you give us some reference we can follow? jvm.cpp contains the definitions of the JVM entry point methods but it doesn't (as you can see in existing file) contain code for registering those methods: #define CC (char*) static JNINativeMethod jdk_jwarmup_JWarmUp_methods[] = { { CC "notifyApplicationStartUpIsDone0", CC "()V", (void *)&JVM_NotifyApplicationStartUpIsDone}, { CC "checkIfCompilationIsComplete0", CC "()Z", (void *)&JVM_CheckJWarmUpCompilationIsComplete}, { CC "notifyJVMDeoptWarmUpMethods0", CC "()V", (void *)&JVM_NotifyJVMDeoptWarmUpMethods} }; JVM_ENTRY(void, JVM_RegisterJWarmUpMethods(JNIEnv *env, jclass jwarmupclass)) JVMWrapper("JVM_RegisterJWarmUpMethods"); ThreadToNativeFromVM ttnfv(thread); // can't be in VM when we call JNI int ok = env->RegisterNatives(jwarmupclass, jdk_jwarmup_JWarmUp_methods, sizeof(jdk_jwarmup_JWarmUp_methods)/sizeof(JNINativeMethod)); guarantee(ok == 0, "register jdk.jwarmup.JWarmUp natives"); JVM_END #undef CC That is typically done by the C code in the JDK. See for example src/java.base/share/native/libjava/System.c > - logging flags > JWarmUp was initially developed for JDK8. A flag was used to print out trace. 
When we ported the patch to JDK tip, we changed code to use the new log utility but with the legacy flag kept. Please remove legacy flag. > - VM flags > We will check and remove unnecessary flags. Thank you. > - init.cpp and mutex initialization > We will modify that. > > - Deoptimization change > I'm not clear about that. Would you like to provide more details? We will check the impact on our patch. I forget the exact context now sorry. If you've rebased to latest code and everything builds and runs then that should suffice. I have a lot of general concerns about the impact of this work on various areas of the JVM. It really needs to be as unobtrusive as possible and ideally causing no changes to executed code unless enabled. Potentially/possibly it might even need to be selectable at build-time, as to whether this feature is included. And I apologise in advance because I don't have a lot of time to deep dive into all the details of this proposed feature. Thanks, David ----- > Thanks, > Kuai Wei > > > > > ------------------------------------------------------------------ > From:David Holmes > Send Time:2019-06-10 (Mon) 15:18 > To:yumin qi ; hotspot-runtim. > Cc:hotspot-dev > Subject:Re: RFC: JWarmup precompile java hot methods at application startup > > Hi Yumin, > > On 8/06/2019 3:25 am, yumin qi wrote: >> Hi, David and all >> >> Can I have one more comment from runtime expert for the JEP? >> David, can you comment for the changes? Really appreciate your last >> comment. It is best if you follow the comment. >> Looking forward to having your comment. > > I still have a lot of trouble understanding the overall design here. The > JEP is very high-level; the webrev is very low-level; and there's > nothing in between to explain the details of the design - the kind of > document you would produce for a design review/walkthrough. For example > I can't see why you need to record the "source file"; I can't see why > you need to make changes to the superclass resolution.
I can't tell when > changes outside of Jwarmup may need to make changes to the Jwarmup code > - the dependencies are unclear. I'm unclear on the role of the "dummy > method" - is it just a sentinel? Why do we need it versus using some > state in the JitWarmup instance? > > Some further code comments, but not a file by file review by any means ... > > The code split between the JDK and JVM doesn't seem quite right to me. > registerNatives is something usually done by the JDK .c files > corresponding to the classes defining the native method; it's not > something done in jvm.cpp. Or if this is meant to be a special case like > JVM_RegisterMethodHandleMethods then probably it should be in the > jwarmup.cpp file. Also if you pass the necessary objects through the API > you won't need to jump back to native to call a JNI function. > > AliasedLoggingFlags are for converting legacy flags to unified logging. > You should just be using UL and not introducing the > PrintCompilationWarmUpDetail pseudo-flag. > > This work introduces thirteen new VM flags! That's very excessive. > Perhaps you should look at defining something more like -Xlog that > encodes all the options? (And this will need a very lengthy CSR request!). > > The init.cpp code should be factored out into jwarmup_init functions in > jwarmup.cpp. > > Mutex initialization should be conditional on jwarmup being enabled. > > Deoptimization has been changed lately to avoid use of safepoints so you > may need to re-examine that aspect. > > You have a number of uses of patterns like this (but not everywhere): > > + JitWarmUp* jwp = JitWarmUp::instance(); > + assert(jwp != NULL, "sanity check"); > + jwp->preloader()->jvm_booted_is_done(); > > The assertion should be inside instance() so that these collapse to a > single line: > > JitWarmup::instance()->preloader->whatever(); > > Your Java files have @version 1.8.
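David's suggestion to move the sanity check into `instance()` can be sketched as follows. This is a hedged, standalone C++ illustration: only the names `JitWarmUp`, `instance()`, `preloader()` and `jvm_booted_is_done()` come from the review; the class bodies, `create_instance()` and `booted()` are hypothetical, and standard `assert` with `nullptr` stands in for HotSpot's `assert(cond, msg)` macro and `NULL`.

```cpp
#include <cassert>

// Hypothetical minimal stand-in for the preloader object.
class Preloader {
 public:
  void jvm_booted_is_done() { _booted = true; }
  bool booted() const { return _booted; }

 private:
  bool _booted = false;
};

class JitWarmUp {
 public:
  // Hypothetical: called once during VM initialization when the feature is on.
  static void create_instance() {
    assert(_instance == nullptr && "already created");
    _instance = new JitWarmUp();
  }
  // The sanity check lives here, so call sites need no local variable or assert.
  static JitWarmUp* instance() {
    assert(_instance != nullptr && "JitWarmUp not initialized");
    return _instance;
  }
  Preloader* preloader() { return &_preloader; }

 private:
  JitWarmUp() = default;
  static JitWarmUp* _instance;
  Preloader _preloader;
};

JitWarmUp* JitWarmUp::_instance = nullptr;
```

With the check centralized, each former three-line call site becomes `JitWarmUp::instance()->preloader()->jvm_booted_is_done();`, and a missing initialization is still caught in debug builds.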
> > --- > > Cheers, > David > ----- > > > >> Thanks >> Yumin >> >> On Sun, May 19, 2019 at 10:28 AM yumin qi > > wrote: >> >> Hi, Tobias and all >> Have done changes based on Tobias' comments. New webrev based on >> most recent base is updated at: >> http://cr.openjdk.java.net/~minqi/8220692/webrev-03/ >> >> Tested local for jwarmup and compiler. >> >> Thanks >> Yumin >> >> On Tue, May 14, 2019 at 11:26 AM yumin qi > > wrote: >> >> HI, Tobias >> >> Thanks very much for the comments. >> >> On Mon, May 13, 2019 at 2:58 AM Tobias Hartmann >> > >> wrote: >> >> Hi Yumin, >> >> > In this version, the profiled method data is not used at >> > precomilation, it will be addressed in followed bug fix. >> After the >> > first version integrated, will file bug for it. >> >> Why is that? I think it would be good to have everything in >> one JEP. >> >> >> We have done some tests on adding profiling data and found the >> result is not as expected, and the current version is working >> well for internal online applications. There is no other reason >> not adding to this patch now, we will like to study further to >> see if we can improve that for a better performance. >> >> I've looked at the compiler related changes. Here are some >> comments/questions. >> >> ciMethod.cpp >> - So CompilationWarmUp is not using any profile information? >> Not even the profile obtained in the >> current execution? >> >> >> Yes. This is also related to previous question. >> >> compile.cpp >> - line 748: Why is that required? Couldn't it happen that a >> method is never compiled because the >> code that would resolve a field is never executed? >> >> >> Here a very aggressive decision --- to avoid compilation failure >> requires that all fields have already been resolved. >> >> graphKit.cpp >> - line 2832: please add a comment >> - line 2917: checks should be merged into one if and please >> add a comment >> >> >> Will fix it. 
>> >> jvm.cpp >> - Could you explain why it's guaranteed that warmup >> compilation is completed once the dummy method >> is compiled? And why is it hardcoded to print >> "com.alibaba.jwarmup.JWarmUp"? >> >> >> This is from practical testing of real applications. Due to the >> parallelism of compilation works, it should check if >> compilation queue contains any of those methods --- completed if >> no any of them on the queue and it is not economic. By using of >> a dummy method as a simplified version for that, in real case, >> it is not observed that dummy method is not the last compilation >> for warmup. Do you have suggestion of a way to do that? The >> dummy way is not strictly a guaranteed one theoretically. >> Forgot to change the print to new name after renaming package, >> thanks for the catching. >> >> - What is test_symbol_matcher() used for? >> >> >> This is a leftover(used to test matching patterns), will remove >> it from the file. >> >> jitWarmUp.cpp: >> - line 146: So what about methods that are only ever >> compiled at C1 level? Wouldn't it make sense to >> keep track of the comp_level during CompilationWarmUpRecording? >> >> >> Will consider your suggestion in future work on it. >> >> I also found several typos while reading through the code >> (listed in random order): >> >> globals.hpp >> - "flushing profling" -> "flushing profiling" >> >> method.hpp >> - "when this method first been invoked" >> >> templateInterpreterGenerator_x86.cpp >> - initializition -> initialization >> >> dict.cpp >> - initializated -> initialized >> >> jitWarmUp.cpp >> - uninitilaized -> uninitialized >> - inited -> should be initialized, right? >> >> jitWarmUp.hpp >> . 
nofityApplicationStartUpIsDone -> >> notifyApplicationStartUpIsDone >> >> constantPool.cpp >> - recusive -> recursive >> >> JWarmUp.java >> - appliacation -> application >> >> TestThrowInitializaitonException.java -> >> TestThrowInitializationException.java >> >> These tests should be renamed (it's not clear what issue the >> number refers to): >> - issue9780156.sh >> - Issue11272598.java >> >> >> Will fix all above suggestions. >> >> Thanks! >> >> Yumin >> > From coleen.phillimore at oracle.com Tue Jul 23 11:59:24 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 23 Jul 2019 07:59:24 -0400 Subject: RFR (S) 8228485: JVM crashes when bootstrap method for condy triggers loading of class whose static initializer throws exception Message-ID: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> Summary: Add case for JVM_CONSTANT_Dynamic in error_message function. Tested locally with hotspot and java/lang/invoke condy tests and new test which exercises the code. This might be needed for jdk 13. open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228485.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8228485 Thanks, Coleen From erik.osterlund at oracle.com Tue Jul 23 12:49:41 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 23 Jul 2019 14:49:41 +0200 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: Hi Martin, The new webrev looks good. Note the following, though... it looks like the AArch64 code doesn't do appropriate fencing if the field is volatile. The normal JNI accessor goes through thread transitions causing the following semantics: fence() load fence() Which is more than enough for a volatile field load. However, with JNI fast get field... it is insufficient.
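Erik's point can be modeled in portable C++: the slow path's thread transitions give roughly fence(); load; fence(), so a fast accessor that reads a volatile field directly must supply comparable ordering itself. The sketch below is a hedged model only; `load_field` and `is_volatile` are hypothetical, and a real fix would be written in the AArch64 macro assembler, where, as Erik notes, the full fence-load-fence sequence is more than a volatile load strictly needs.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: read a field the way a fast accessor might, taking
// the field's volatility into account.
int load_field(const std::atomic<int>& field, bool is_volatile) {
  if (!is_volatile) {
    // Plain field: no ordering beyond the load itself is required.
    return field.load(std::memory_order_relaxed);
  }
  // Volatile field: bracket the load with full fences, mirroring the
  // fence(); load; fence() sequence the slow path's thread transitions give.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  int value = field.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return value;
}
```

Only volatile fields pay for the fences here, which keeps the common non-volatile fast path as cheap as before.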
Thanks, /Erik On 2019-07-23 12:29, Doerr, Martin wrote: > Hi David and Erik, > > thank you for reviewing and for your very valuable feedback. > >> 1) In the x86_64 assembly, you can combine the movl; testl; into a single >> test instruction with one memory operand to the counter, and one >> immediate zero. > Thanks for the hint. I'm using cmp32 in my new webrev. > >> 2) If libjvm.so maps in far away, then the movl taking an ExternalAddress, >> will actually scratch rscratch1, which is r10. > Good catch! I've exchanged registers and added assert_different_registers. > >> I was secretly hoping to never have to touch fast JNI getfield again, >> because it is so shady, and the odd cases are very hard to test, making it so >> easy to mess up. The ForceUnreachable JVM flag might be useful in checking >> if a solution works also when rscratch1 gets clobbered when referencing JVM >> symbols that are now "far away". > I've also changed the test to run with -XX:+ForceUnreachable and -XX:+SafepointALot to hit more corner cases. > But as you explained, the test would normally not notice the destroyed counter and just execute the slow path. > >> The subtle issue of referencing JVM symbols that can be far away, >> suddenly clobbering r10, has bitten us many times. Perhaps it should be >> made more explicit somehow. > It would be possible to explicitly kill r10 in all such assembler instructions in the dbg build, but that'd come with an overhead. > >> But that's a separate issue. > Agreed. > >> Also, I noticed that the counter that we are checking if it has changed, is a >> 32 bit signed integer. They can actually wrap around, which is undefined >> behaviour at best, and will make these tests fail in the worst case. When we >> don't want counters to overflow, we use 64 bit integers. > We could also make it unsigned to get defined behavior, but that's out of scope here. 
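The two options discussed for the change counter, making it unsigned or widening it to 64 bits, can be illustrated with a small C++ sketch. `advance` is a hypothetical helper; an 8-bit counter is used only so that a full wrap-around takes 256 steps instead of 2^32. Unsigned arithmetic wraps modulo 2^N, which is well defined (unlike signed overflow), but a narrow counter can still wrap back to its old value and hide a change from an equality check; a 64-bit counter makes that practically impossible.

```cpp
#include <cassert>
#include <cstdint>

// Advance an unsigned counter by n events. Unsigned arithmetic is defined to
// wrap modulo 2^bits, unlike signed overflow, which is undefined behaviour.
template <typename Counter>
Counter advance(Counter c, std::uint64_t n) {
  return static_cast<Counter>(c + n);
}
```

For example, `advance<std::uint8_t>(7, 256)` yields 7 again, so a "has the counter changed?" check would wrongly report no change; a 32-bit counter has the same problem after 2^32 increments.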
> > New webrev: > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.01/ > > Best regards, > Martin > From david.holmes at oracle.com Tue Jul 23 13:15:12 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 23 Jul 2019 23:15:12 +1000 Subject: RFR (S) 8228485: JVM crashes when bootstrap method for condy triggers loading of class whose static initializer throws exception In-Reply-To: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> References: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> Message-ID: Hi Coleen, On 23/07/2019 9:59 pm, coleen.phillimore at oracle.com wrote: > Summary: Add case for JVM_CONSTANT_Dynamic in error_message function. Fix seems reasonable. Begs the question as to whether there are other missing cases for this code? Can this manifest with regular Java sources? I'm unclear exactly how this is being manifested in the jasm file. Thanks, David > Tested locally with hotspot and java/lang/invoke condy tests and new > test which exercises the code. This might be needed for jdk 13. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228485.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8228485 > > Thanks, > Coleen From bob.vandette at oracle.com Tue Jul 23 13:49:44 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 23 Jul 2019 09:49:44 -0400 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: <79e78a82-4352-59b7-0a72-16bb18549868@oracle.com> References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> <47b20d2a-02b9-33b8-b2e4-6022f809e139@oracle.com> <79e78a82-4352-59b7-0a72-16bb18549868@oracle.com> Message-ID: <4FD09288-8F51-49E9-A553-AE3BED5F0941@oracle.com> The updated changes look fine to me. Just a nit that you can do with as you wish. Did you really mean to have this System.out in production or is this debug code? System.out.println("The main container has not started yet, count = " + i); Bob.
> On Jul 16, 2019, at 12:48 PM, mikhailo.seledtsov at oracle.com wrote: > > Hi Severin, Bob, > > Here is an updated webrev that should address all of your feedback: > > http://cr.openjdk.java.net/~mseledtsov/8227122.01/ > > To summarize the changes since webrev 00: > > - using 'docker ps' to wait until the "main" container starts > > - removed use of --ipc=shareable (not needed) > > - added comments regarding sharing of /tmp > > - using docker volumes ("--volumes-from") to share /tmp between the containers instead of mapping to host directories (avoids potential access/permission problems) > > - few other minor changes and cleanup > > Testing: > > ran this test on OL 7.6 and on a variety of Linux nodes in the lab, a number of times - all PASS > > > See more comments inline below > > On 7/12/19 2:12 AM, Severin Gehwolf wrote: >> Hi Misha, >> >> On Thu, 2019-07-11 at 17:58 -0700, mikhailo.seledtsov at oracle.com wrote: >>> Hi Severin, >>> >>> Thank you for taking a look at this change. >>> >>> On 7/10/19 10:40 AM, Severin Gehwolf wrote: >>>> Hi Misha, >>>> >>>> On Tue, 2019-07-02 at 15:24 -0700, mikhailo.seledtsov at oracle.com wrote: >>>>> Please review this new test that uses a Docker sidecar pattern to >>>>> manage/monitor JVM running in the main payload container. >>>>> >>>>> Sidecar is a common pattern used in the cloud environments for >>>>> monitoring among other uses. In the sidecar pattern the main >>>>> application/service container that runs the payload is paired with a >>>>> sidecar container. It is achieved by sharing certain namespace >>>>> aspects >>>>> between the two containers such as PID namespace, specific >>>>> sub-directories, IPC and more. >>>>> >>>>> This test implements the following cases: >>>>> - "jcmd -l" to list java processes running in "main" container >>>>> from >>>>> the "sidecar" container >>>>> - "jhsdb jinfo" in the sidecar configuration >>>>> - jcmd >>>>> >>>>> This change also builds a basis for more test cases in the future.
>>>>> >>>>> Minor changes were done to DockerTestUtils: >>>>> - changing access to DOCKER_COMMAND constant to public >>>>> - minor spelling and terminology corrections >>>>> >>>>> >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 >>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ >>>>> Testing: >>>>> 1. ran Docker tests on Linux-x64 - PASS >>>>> 2. Running Docker tests in test cluster - in progress >>>>> >>>> // JCMD does not work in sidecar configuration, except for "jcmd -l". >>>> // Including this test case to assist in reproduction of the problem. >>>> // t.assertIsAlive(); >>>> // testCase03(mainProcPid); >>>> >>>> FWIW, "jcmd -l" doesn't work in this case either. It only sees itself >>>> as far as I can tell. >>> In my experiment it does work. Here are parts of the test log, first the >>> command that runs jcmd in a sidecar container, then the output of that >>> container: >>> >>> """ >>> >>> [COMMAND] >>> >>> /usr/local/bin/docker run --tty=true --rm --cap-add=SYS_PTRACE >>> --sig-proxy=true --pid=container:test-container-main >>> --ipc=container:test-container-main -v >>> /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/ >>> jdk-internal:test-jfr-jcmd /jdk/bin/jcmd -l >>> >>> [2019-07-12T00:26:29.083764Z] Gathering output for process 8703 >>> [ELAPSED: 5 ms] >>> [STDERR] >>> >>> [STDOUT] >>> 1 EventGeneratorLoop 15 >>> 23 jdk.jcmd/sun.tools.jcmd.JCmd -l >>> >>> """ >>> >>> The output shows 2 processes, one is EventGeneratorLoop with PID of 1 >>> (as expected). This is possible because the containers share certain >>> namespaces and mounted volumes in a 'sidecar' configuration. In this >>> case, containers share the PID namespace >>> (--pid=container:test-container-main) and share volumes mounted as >>> "/tmp" inside the container (-v >>> /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/) >>> >> Right, sorry. 
Perhaps this code should get a comment that sharing /tmp >> between sidecar and host container is needed for jvmstat - used >> internally by the attach mechanism - to work. See >> HotSpotAttachProvider.testAttachable(): >> >> + String[] command = new String[] { >> + DockerTestUtils.DOCKER_COMMAND, "run", >> + "--tty=true", "--rm", >> + "--cap-add=SYS_PTRACE", "--sig-proxy=true", >> + "--pid=container:" + MAIN_CONTAINER_NAME, >> + "--pid=container:" + MAIN_CONTAINER_NAME, >> + "--ipc=container:" + MAIN_CONTAINER_NAME, >> + "-v", WORK_DIR + ":/tmp/", >> >> I believe -XX:+UsePerfData would be in order too as I don't think >> things would work if that default changed. > I have added the comments and added -XX:+UsePerfData for the "main" JVM process. >> >>>> What's more, this seems to be a case of AttachListener::is_init_trigger[1] and >>>> VirtualMachineImpl.createAttachFile[2] disagreeing. The former looks in >>>> $(pwd)/.attach_pid or /tmp/.attach_pid and the latter creates >>>> it in /proc//root/tmp/.attach_pid. >> This seems to be the cause for why testCase03 doesn't work. Perhaps >> this deserves a bug and I can help fix it. >> >> While looking at that, I discovered what I said below, which is a >> different case I know. > Once these tests are integrated I will file a bug, and can reference the test from that bug. >> >>>> There seems to be more issues involved. As attaching to a JVM inside a >>>> container doesn't seem to work from outside which is supposed to be >>>> fixed with JDK-8179498. That alone seems to warrant a bug. >>> You are describing a slightly different use case / pattern, but I agree >>> it does not seem to work. I am happy to hear confirmation of that. >> I was pointing out that JDK-8179498 seems to have regressed. It's >> unrelated but should be taken into account when fixing the above issue. 
>> >>> The pattern addressed in this test is a sidecar, where both the >>> observer and observee run in containers; the containers are 'friendly' >>> by sharing certain aspects of namespaces. >> Yes. >> >>> The use case you are describing is somewhat different, if I understand >>> correctly: the observer runs on a host machine, and observee runs in a >>> container. Observer tries to use jcmd to list the java processes running >>> in container(s), and issue commands, but that fails. I can create a bug >>> for that, and a simple test case. >> There should be a bug and a test so that it cannot again regress. >> >> JDK-8193710 is also related, but the fix for that bug didn't have a >> test either :( That's this one which needs fixing: >> https://bugs.openjdk.java.net/browse/JDK-8195809 > > JDK-8195809: [TESTBUG] Create tests for JDK-8193710 jps and jcmd -l support for Docker containers > > I have assigned it to myself, and will be working on it soon. > > > Thank you, > > Misha > >> >>>> private static DockerThread startMainContainer() throws Exception { >>>> // start "main" container (the observee) >>>> DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop"); >>>> opts.addDockerOpts("--cap-add=SYS_PTRACE", "--ipc=shareable") >>>> >>>> Is '--ipc=shareable' really needed? It's not a supported option for my >>>> docker here :-( >>> I have removed the '--ipc=shareable' and the test still works. I think >>> this is extra stuff that is not necessary for this test case, so I will >>> remove it. >> Excellent! >> >>> I will incorporate changes from your and Bob's review, run some testing, >>> and post an updated webrev.
>> Thanks, >> Severin >> >>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/hotspot/os/linux/attachListener_linux.cpp#l500 >>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#l295 >>>> From coleen.phillimore at oracle.com Tue Jul 23 15:14:40 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 23 Jul 2019 11:14:40 -0400 Subject: RFR (S) 8228485: JVM crashes when bootstrap method for condy triggers loading of class whose static initializer throws exception In-Reply-To: References: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> Message-ID: <7e1ba972-16d0-7095-3f4b-c11b17098a60@oracle.com> On 7/23/19 9:15 AM, David Holmes wrote: > Hi Coleen, > > On 23/07/2019 9:59 pm, coleen.phillimore at oracle.com wrote: >> Summary: Add case for JVM_CONSTANT_Dynamic in error_message function. > > Fix seems reasonable. Begs the question as to whether there are other > missing cases for this code? No, there are not currently. Only these constant pool tags use save_and_throw_exception, in order to throw the same exception for each time the cp entry is resolved with an error: jbyte constantTag::error_value() const { switch (_tag) { case JVM_CONSTANT_UnresolvedClass: return JVM_CONSTANT_UnresolvedClassInError; case JVM_CONSTANT_MethodHandle: return JVM_CONSTANT_MethodHandleInError; case JVM_CONSTANT_MethodType: return JVM_CONSTANT_MethodTypeInError; case JVM_CONSTANT_Dynamic: return JVM_CONSTANT_DynamicInError; default: ShouldNotReachHere(); return JVM_CONSTANT_Invalid; } } > > Can this manifest with regular Java sources? I'm unclear exactly how > this is being manifested in the jasm file. The jasm file is to force initialization of the static class inside the Condy expression. Otherwise it may be initialized outside and not hit the error.
thanks, Coleen > > Thanks, > David > >> Tested locally with hotspot and java/lang/invoke condy tests and new >> test which exercises the code. This might be needed for jdk 13. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8228485.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8228485 >> >> Thanks, >> Coleen From mikhailo.seledtsov at oracle.com Tue Jul 23 16:13:28 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Tue, 23 Jul 2019 09:13:28 -0700 Subject: RFR(S): 8227122: [TESTBUG] Create Docker sidecar test cases In-Reply-To: <4FD09288-8F51-49E9-A553-AE3BED5F0941@oracle.com> References: <11c03bdc-832b-3479-075b-1a8b5dc95e87@oracle.com> <47b20d2a-02b9-33b8-b2e4-6022f809e139@oracle.com> <79e78a82-4352-59b7-0a72-16bb18549868@oracle.com> <4FD09288-8F51-49E9-A553-AE3BED5F0941@oracle.com> Message-ID: <3b27ceca-cc28-f898-2d96-44a0b9f5a113@oracle.com> Thank you Bob. On 7/23/19 6:49 AM, Bob Vandette wrote: > The updated changes look fine to me. > > Just a nit that you can do with as you wish. > > Did you really mean to have this System.out in production or is this > debug code? > System.out.println("The main container has not started yet, count = " + i); I will remove this before push. Thank you, Misha > > Bob. > >> On Jul 16, 2019, at 12:48 PM, mikhailo.seledtsov at oracle.com >> wrote: >> >> Hi Severin, Bob, >> >> Here is an updated webrev that should address all of your feedback: >> >> http://cr.openjdk.java.net/~mseledtsov/8227122.01/ >> >> To summarize the changes since webrev 00: >> >> - using 'docker ps' to wait until the "main" container starts >> >> - removed use of --ipc=shareable (not needed) >> >> - added comments regarding sharing of /tmp >> >> - using docker volumes ("--volumes-from") to share /tmp between >> the containers instead of mapping to host directories (avoids >> potential access/permission problems) >> >>
- few other minor changes and cleanup >> >> Testing: >> >> ran this test on OL 7.6 and on variety of Linux nodes in the >> lab, a number of times - all PASS >> >> >> See more of comments inline below >> >> On 7/12/19 2:12 AM, Severin Gehwolf wrote: >>> Hi Misha, >>> >>> On Thu, 2019-07-11 at 17:58 -0700, mikhailo.seledtsov at oracle.com >>> wrote: >>>> Hi Severin, >>>> >>>> Thank you for taking a look at this change. >>>> >>>> On 7/10/19 10:40 AM, Severin Gehwolf wrote: >>>>> Hi Misha, >>>>> >>>>> On Tue, 2019-07-02 at 15:24 -0700, mikhailo.seledtsov at oracle.com >>>>> wrote: >>>>>> Please review this new test that uses a Docker sidecar pattern to >>>>>> manage/monitor JVM running in the main payload container. >>>>>> >>>>>> Sidecar is a common pattern used in the cloud environments for >>>>>> monitoring among other uses. In side car pattern the main >>>>>> application/service container that runs the payload is paired with a >>>>>> sidecar container. It is achieved by sharing certain namespace >>>>>> aspects >>>>>> between the two containers such as PID namespace, specific >>>>>> sub-directories, IPC and more. >>>>>> >>>>>> This test implements the following cases: >>>>>> - "jcmd -l" to list java processes running in "main" container >>>>>> from >>>>>> the "sidecar" container >>>>>> - "jhsdb jinfo" in the sidecar configuration >>>>>> - jcmd >>>>>> >>>>>> This change also builds a basis for more test cases in the future. >>>>>> >>>>>> Minor changes were done to DockerTestUtils: >>>>>> - changing access to DOCKER_COMMAND constant to public >>>>>> - minor spelling and terminology corrections >>>>>> >>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8227122 >>>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8227122.00/ >>>>>> Testing: >>>>>> 1. ran Docker tests on Linux-x64 - PASS >>>>>> 2.
Running Docker tests in test cluster - in progress >>>>>> >>>>> // JCMD does not work in sidecar configuration, except for "jcmd -l". >>>>> // Including this test case to assist in reproduction of the problem. >>>>> // t.assertIsAlive(); >>>>> // testCase03(mainProcPid); >>>>> >>>>> FWIW, "jcmd -l" doesn't work in this case either. It only sees itself >>>>> as far as I can tell. >>>> In my experiment it does work. Here are parts of the test log, >>>> first the >>>> command that runs jcmd in a sidecar container, then the output of that >>>> container: >>>> >>>> """ >>>> >>>> [COMMAND] >>>> >>>> /usr/local/bin/docker run --tty=true --rm --cap-add=SYS_PTRACE >>>> --sig-proxy=true --pid=container:test-container-main >>>> --ipc=container:test-container-main -v >>>> /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/ >>>> jdk-internal:test-jfr-jcmd /jdk/bin/jcmd -l >>>> >>>> [2019-07-12T00:26:29.083764Z] Gathering output for process 8703 >>>> [ELAPSED: 5 ms] >>>> [STDERR] >>>> >>>> [STDOUT] >>>> 1 EventGeneratorLoop 15 >>>> 23 jdk.jcmd/sun.tools.jcmd.JCmd -l >>>> >>>> """ >>>> >>>> The output shows 2 processes, one is EventGeneratorLoop with PID of 1 >>>> (as expected). This is possible because the containers share certain >>>> namespaces and mounted volumes in a 'sidecar' configuration. In this >>>> case, containers share the PID namespace >>>> (--pid=container:test-container-main) and share volumes mounted as >>>> "/tmp" inside the container (-v >>>> /ws/playArea/sidecar-jcmd-8227122/JTwork/scratch/.:/tmp/) >>>> >>> Right, sorry. Perhaps this code should get a comment that sharing /tmp >>> between sidecar and host container is needed for jvmstat - used >>> internally by the attach mechanism - to work. 
See >>> HotSpotAttachProvider.testAttachable(): >>> >>> + String[] command = new String[] { >>> + DockerTestUtils.DOCKER_COMMAND, "run", >>> + "--tty=true", "--rm", >>> + "--cap-add=SYS_PTRACE", "--sig-proxy=true", >>> + "--pid=container:" + MAIN_CONTAINER_NAME, >>> + "--pid=container:" + MAIN_CONTAINER_NAME, >>> + "--ipc=container:" + MAIN_CONTAINER_NAME, >>> + "-v", WORK_DIR + ":/tmp/", >>> >>> I believe -XX:+UsePerfData would be in order too as I don't think >>> things would work if that default changed. >> I have added the comments and added -XX:+UsePerfData for the "main" >> JVM process. >>> >>>>> What's more, this seems to be a case of >>>>> AttachListener::is_init_trigger[1] and >>>>> VirtualMachineImpl.createAttachFile[2] disagreeing. The former >>>>> looks in >>>>> $(pwd)/.attach_pid or /tmp/.attach_pid and the latter >>>>> creates >>>>> it in /proc//root/tmp/.attach_pid. >>> This seems to be the cause for why testCase03 doesn't work. Perhaps >>> this deserves a bug and I can help fix it. >>> >>> While looking at that, I discovered what I said below, which is a >>> different case I know. >> Once these tests are integrated I will file a bug, and can reference >> the test from that bug. >>> >>>>> There seems to be more issues involved. As attaching to a JVM inside a >>>>> container doesn't seem to work from outside which is supposed to be >>>>> fixed with JDK-8179498. That alone seems to warrant a bug. >>>> You are describing a slightly different use case / pattern, but I agree >>>> it does not seem to work. I am happy to hear confirmation of that. >>> I was pointing out that JDK-8179498 seems to have regressed. It's >>> unrelated but should be taken into account when fixing the above issue.
>>> >>> >>>> The pattern addressed in this test is a side car, where both the >>>> observer and observee run in containers; the containers are 'friendly' >>>> by sharing certain aspects of namespaces. >>> Yes. >>> >>>> The use case you are describing is somewhat different, if I understand >>>> correctly: the observer runs on a host machine, and observee runs in a >>>> container. Observer tries to use jcmd to list the java processes >>>> running >>>> in container(s), and issue commands, but that fails. I can create a bug >>>> for that, and a simple test case. >>> There should be a bug and a test so that it cannot again regress. >>> >>> JDK-8193710 is also related, but the fix for that bug didn't have a >>> test either :( That's this one which needs fixing: >>> https://bugs.openjdk.java.net/browse/JDK-8195809 >> >> JDK-8195809: [TESTBUG] Create tests for JDK-8193710 jps and jcmd >> -l support for Docker containers >> >> I have assigned it to myself, and will be working on it soon. >> >> >> Thank you, >> >> Misha >> >>> >>>>> private static DockerThread startMainContainer() throws >>>>> Exception { >>>>> // start "main" container (the observee) >>>>> DockerRunOptions opts = >>>>> commonDockerOpts("EventGeneratorLoop"); >>>>> opts.addDockerOpts("--cap-add=SYS_PTRACE", "--ipc=shareable") >>>>> >>>>> Is '--ipc=shareable' really needed? It's not a supported option for my >>>>> docker here :-( >>>> I have removed the '--ipc=shareable' and the test still works. I think >>>> this is extra stuff that is not necessary for this test case, so I will >>>> remove it. >>> Excellent! >>> >>>> I will incorporate changes from your and Bob's review, run some >>>> testing, >>>> and post an updated webrev.
>>> Thanks, >>> Severin >>> >>>>> [1] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/hotspot/os/linux/attachListener_linux.cpp#l500 >>>>> [2] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/ba72dac556c3/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#l295 >>>>> > From jianglizhou at google.com Tue Jul 23 16:33:57 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Tue, 23 Jul 2019 09:33:57 -0700 Subject: [13] RFR(xs) 8228407: JVM crashes with shared archive file mismatch In-Reply-To: References: Message-ID: Looks fine. Best, Jiangli On Mon, Jul 22, 2019 at 1:57 PM Calvin Cheung wrote: > > bug: https://bugs.openjdk.java.net/browse/JDK-8228407 > > webrev: http://cr.openjdk.java.net/~ccheung/8228407/13-webrev.00/ > > This bug is a regression caused by the fix for JDK-8226406. Please refer > to the bug report for reproducing steps and evaluation. > > Tested locally on linux-x64 and windows-x64. Will run tier1 - 3 tests. > > thanks, > > Calvin > From calvin.cheung at oracle.com Tue Jul 23 17:31:05 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Tue, 23 Jul 2019 10:31:05 -0700 Subject: [13] RFR(xs) 8228407: JVM crashes with shared archive file mismatch In-Reply-To: References: Message-ID: Thanks Jiangli! Calvin On 7/23/19 9:33 AM, Jiangli Zhou wrote: > Looks fine. > > Best, > Jiangli > > On Mon, Jul 22, 2019 at 1:57 PM Calvin Cheung wrote: >> bug: https://bugs.openjdk.java.net/browse/JDK-8228407 >> >> webrev: http://cr.openjdk.java.net/~ccheung/8228407/13-webrev.00/ >> >> This bug is a regression caused by the fix for JDK-8226406. Please refer >> to the bug report for reproducing steps and evaluation. >> >> Tested locally on linux-x64 and windows-x64. Will run tier1 - 3 tests. 
>> >> thanks, >> >> Calvin >> From martin.doerr at sap.com Tue Jul 23 17:50:15 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 23 Jul 2019 17:50:15 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: Hi Erik, adding Andrew and Aleksey. > The new webrev looks good. Thanks. > Note though the following though... it looks like the AArch64 code > doesn't do appropriate fencing if the field is volatile. I agree. I was not aware of JDK-8179954 (https://bugs.openjdk.java.net/browse/JDK-8179954). My new aarch64 proposal: http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.02/ Note: All platforms were tested except arm (32 bit). I could also return (address)-1 if JvmtiExport::can_post_field_access() in case nobody wants this for arm. Best regards, Martin From david.holmes at oracle.com Tue Jul 23 23:49:02 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jul 2019 09:49:02 +1000 Subject: RFR (S): 8221205: Obsolete AllowJNIEnvProxy Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8221205 webrev: http://cr.openjdk.java.net/~dholmes/8221205/webrev/ The ancient AllowJNIEnvProxy flag was deprecated in 13 and is now being obsoleted in 14. The code that contained the flag check always operates on a thread instance that is the current thread. 
Thanks, David From david.holmes at oracle.com Wed Jul 24 05:49:19 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jul 2019 15:49:19 +1000 Subject: RFR (S) 8228485: JVM crashes when bootstrap method for condy triggers loading of class whose static initializer throws exception In-Reply-To: <7e1ba972-16d0-7095-3f4b-c11b17098a60@oracle.com> References: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> <7e1ba972-16d0-7095-3f4b-c11b17098a60@oracle.com> Message-ID: <8a7f1ee6-33bb-a021-073a-97ae2f4ac398@oracle.com> On 24/07/2019 1:14 am, coleen.phillimore at oracle.com wrote: > On 7/23/19 9:15 AM, David Holmes wrote: >> Hi Coleen, >> >> On 23/07/2019 9:59 pm, coleen.phillimore at oracle.com wrote: >>> Summary: Add case for JVM_CONSTANT_Dynamic in error_message function. >> >> Fix seems reasonable. Begs the question as to whether there are other >> missing cases for this code? > > No, there are not currently. Only these constant pool tags use > save_and_throw_exception, in order to throw the same exception for each > time the cp entry is resolved with an error: > > jbyte constantTag::error_value() const { > switch (_tag) { > case JVM_CONSTANT_UnresolvedClass: > return JVM_CONSTANT_UnresolvedClassInError; > case JVM_CONSTANT_MethodHandle: > return JVM_CONSTANT_MethodHandleInError; > case JVM_CONSTANT_MethodType: > return JVM_CONSTANT_MethodTypeInError; > case JVM_CONSTANT_Dynamic: > return JVM_CONSTANT_DynamicInError; > default: > ShouldNotReachHere(); > return JVM_CONSTANT_Invalid; > } > } Ok. >> >> Can this manifest with regular Java sources? I'm unclear exactly how >> this is being manifested in the jasm file. > > The jasm file is to force initialization of the static class inside the > Condy expression. Otherwise it may be initialized outside and not hit > the error. Okay. I don't know what source code would lead to use of a condy such that I could suggest how to avoid using jasm. So jasm is fine.
IIUC this is not a regression but a day one problem with condy? Thanks, David ----- > thanks, > Coleen >> >> Thanks, >> David >> >>> Tested locally with hotspot and java/lang/invoke condy tests and new >>> test which exercises the code. This might be needed for jdk 13. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2019/8228485.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8228485 >>> >>> Thanks, >>> Coleen > From david.holmes at oracle.com Wed Jul 24 06:34:22 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jul 2019 16:34:22 +1000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: <396f569b-10ef-cb65-f818-ef9604e2544a@oracle.com> Hi Martin, No further comments from me. I'm obviously not knowledgeable enough to review any of the assembler changes. Thanks, David On 23/07/2019 8:29 pm, Doerr, Martin wrote: > Hi David and Erik, > > thank you for reviewing and for your very valuable feedback. > >> 1) In the x86_64 assembly, you can combine the movl; testl; into a single >> test instruction with one memory operand to the counter, and one >> immediate zero. > Thanks for the hint. I'm using cmp32 in my new webrev. > >> 2) If libjvm.so maps in far away, then the movl taking an ExternalAddress, >> will actually scratch rscratch1, which is r10. > Good catch! I've exchanged registers and added assert_different_registers. > >> I was secretly hoping to never have to touch fast JNI getfield again, >> because it is so shady, and the odd cases are very hard to test, making it so >> easy to mess up. The ForceUnreachable JVM flag might be useful in checking >> if a solution works also when rscratch1 gets clobbered when referencing JVM >> symbols that are now "far away". > I've also changed the test to run with -XX:+ForceUnreachable and -XX:+SafepointALot to hit more corner cases.
> But as you explained, the test would normally not notice the destroyed counter and just execute the slow path. > >> The subtle issue of referencing JVM symbols that can be far away, >> suddenly clobbering r10, has bitten us many times. Perhaps it should be >> made more explicit somehow. > It would be possible to explicitly kill r10 in all such assembler instructions in the dbg build, but that'd come with an overhead. > >> But that's a separate issue. > Agreed. > >> Also, I noticed that the counter that we are checking if it has changed, is a >> 32 bit signed integer. They can actually wrap around, which is undefined >> behaviour at best, and will make these tests fail in the worst case. When we >> don't want counters to overflow, we use 64 bit integers. > We could also make it unsigned to get defined behavior, but that's out of scope here. > > New webrev: > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.01/ > > Best regards, > Martin > From erik.osterlund at oracle.com Wed Jul 24 08:57:41 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 24 Jul 2019 10:57:41 +0200 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <4dd7dca3-77ea-a682-ad00-e2d6bd75f7ae@oracle.com> Message-ID: <958E32C9-BCF4-4A07-9B0D-DED3F89F559B@oracle.com> Hi Martin, Looks good for me. Thanks for cleaning up this code! /Erik > On 23 Jul 2019, at 19:50, Doerr, Martin wrote: > > Hi Erik, > > adding Andrew and Aleksey. > >> The new webrev looks good. > Thanks. > >> Note though the following though... it looks like the AArch64 code >> doesn't do appropriate fencing if the field is volatile. > I agree. I was not aware of JDK-8179954 (https://bugs.openjdk.java.net/browse/JDK-8179954). > > My new aarch64 proposal: > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.02/ > > Note: All platforms were tested except arm (32 bit). 
I could also return (address)-1 if JvmtiExport::can_post_field_access() in case nobody wants this for arm. > > Best regards, > Martin > From claes.redestad at oracle.com Wed Jul 24 09:42:27 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 24 Jul 2019 11:42:27 +0200 Subject: RFR: 8228507: Archive FDBigInteger Message-ID: <56dfa19a-76c7-3318-bb6f-7e1a04c9cb44@oracle.com> Hi, any double<->String conversion will trigger load of jdk.internal.math.FDBigInteger, which has a static initializer pre-calculating a relatively large number of values. Archiving these pre-calculated values reduces the time spent in FDBigInteger. from a couple of milliseconds down to "nothing". Bug: https://bugs.openjdk.java.net/browse/JDK-8228507 Webrev: http://cr.openjdk.java.net/~redestad/8228507/open.00/ Testing: tier1-3 Thanks! /Claes From shade at redhat.com Wed Jul 24 09:45:00 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 24 Jul 2019 11:45:00 +0200 Subject: RFR (S) 8228485: JVM crashes when bootstrap method for condy triggers loading of class whose static initializer throws exception In-Reply-To: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> References: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> Message-ID: <8cbe5e35-753c-0778-92e5-3ae4626a7645@redhat.com> On 7/23/19 1:59 PM, coleen.phillimore at oracle.com wrote: > Summary: Add case for JVM_CONSTANT_Dynamic in error_message function. > > Tested locally with hotspot and java/lang/invoke condy tests and new test which exercises the code.? > This might be needed for jdk 13. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228485.01/webrev Fix is good (I imagined it would go this way). Marked the issue for 11u and 13u backports, we would handle it once it soaks in jdk/jdk for a while. 
-- Thanks, -Aleksey From harold.seigel at oracle.com Wed Jul 24 11:55:10 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Wed, 24 Jul 2019 07:55:10 -0400 Subject: RFR (S): 8221205: Obsolete AllowJNIEnvProxy In-Reply-To: References: Message-ID: Hi David, This looks good. Would it be worthwhile to add an assert that thread == JavaThread::current() in JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread *thread), or get rid of the thread parameter? Thanks, Harold On 7/23/2019 7:49 PM, David Holmes wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8221205 > webrev: http://cr.openjdk.java.net/~dholmes/8221205/webrev/ > > The ancient AllowJNIEnvProxy flag was deprecated in 13 and is now > being obsoleted in 14. > > The code that contained the flag check always operates on a thread > instance that is the current thread. > > Thanks, > David From coleen.phillimore at oracle.com Wed Jul 24 12:48:31 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 24 Jul 2019 08:48:31 -0400 Subject: RFR (S) 8228485: JVM crashes when bootstrap method for condy triggers loading of class whose static initializer throws exception In-Reply-To: <8a7f1ee6-33bb-a021-073a-97ae2f4ac398@oracle.com> References: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> <7e1ba972-16d0-7095-3f4b-c11b17098a60@oracle.com> <8a7f1ee6-33bb-a021-073a-97ae2f4ac398@oracle.com> Message-ID: On 7/24/19 1:49 AM, David Holmes wrote: > On 24/07/2019 1:14 am, coleen.phillimore at oracle.com wrote: >> On 7/23/19 9:15 AM, David Holmes wrote: >>> Hi Coleen, >>> >>> On 23/07/2019 9:59 pm, coleen.phillimore at oracle.com wrote: >>>> Summary: Add case for JVM_CONSTANT_Dynamic in error_message function. >>> >>> Fix seems reasonable. Begs the question as to whether there are >>> other missing cases for this code? >> >> No, there are not currently.
Only these constant pool tags use >> save_and_throw_exception, in order to throw the same exception for >> each time the cp entry is resolved with an error: >> >> jbyte constantTag::error_value() const { >> switch (_tag) { >> case JVM_CONSTANT_UnresolvedClass: >> return JVM_CONSTANT_UnresolvedClassInError; >> case JVM_CONSTANT_MethodHandle: >> return JVM_CONSTANT_MethodHandleInError; >> case JVM_CONSTANT_MethodType: >> return JVM_CONSTANT_MethodTypeInError; >> case JVM_CONSTANT_Dynamic: >> return JVM_CONSTANT_DynamicInError; >> default: >> ShouldNotReachHere(); >> return JVM_CONSTANT_Invalid; >> } >> } > > Ok. > >>> >>> Can this manifest with regular Java sources? I'm unclear exactly how >>> this is being manifested in the jasm file. >> >> The jasm file is to force initialization of the static class inside >> the Condy expression. Otherwise it may be initialized outside and >> not hit the error. > > Okay. I don't know what source code would lead to use of a condy such > that I could suggest how to avoid using jasm. So jasm is fine. > > IIUC this is not a regression but a day one problem with condy? It's a day one omission, not a regression. Thanks, Coleen > > Thanks, > David > ----- > >> thanks, >> Coleen >>> >>> Thanks, >>> David >>> >>>> Tested locally with hotspot and java/lang/invoke condy tests and >>>> new test which exercises the code. This might be needed for jdk 13. >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2019/8228485.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8228485 >>>> >>>> Thanks, >>>> Coleen >> From david.holmes at oracle.com Wed Jul 24 12:47:48 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jul 2019 22:47:48 +1000 Subject: RFR (S): 8221205: Obsolete AllowJNIEnvProxy In-Reply-To: References: Message-ID: Hi Harold, Thanks for taking a look at this. On 24/07/2019 9:55 pm, Harold Seigel wrote: > Hi David, > > This looks good.
Would it be worthwhile to add an assert that thread == > JavaThread::current() in > JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread > *thread), or get rid of the thread parameter? That code is part of a much larger code sequence for thread-state transitions that always operates on the current thread. So no assert is needed just for this part. The thread parameter is needed in this case because it is defined as a static method. It would be possible to change it to an instance method, but that would require touching a lot more code. Thanks, David > Thanks, Harold > > On 7/23/2019 7:49 PM, David Holmes wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8221205 >> webrev: http://cr.openjdk.java.net/~dholmes/8221205/webrev/ >> >> The ancient AllowJNIEnvProxy flag was deprecated in 13 and is now >> being obsoleted in 14. >> >> The code that contained the flag check always operates on a thread >> instance that is the current thread. >> >> Thanks, >> David From coleen.phillimore at oracle.com Wed Jul 24 12:55:36 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 24 Jul 2019 08:55:36 -0400 Subject: RFR (S) 8228485: JVM crashes when bootstrap method for condy triggers loading of class whose static initializer throws exception In-Reply-To: <8cbe5e35-753c-0778-92e5-3ae4626a7645@redhat.com> References: <3059f5a9-d111-f41b-259b-f9e9975385dd@oracle.com> <8cbe5e35-753c-0778-92e5-3ae4626a7645@redhat.com> Message-ID: On 7/24/19 5:45 AM, Aleksey Shipilev wrote: > On 7/23/19 1:59 PM, coleen.phillimore at oracle.com wrote: >> Summary: Add case for JVM_CONSTANT_Dynamic in error_message function. >> >> Tested locally with hotspot and java/lang/invoke condy tests and new test which exercises the code. >> This might be needed for jdk 13. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228485.01/webrev > Fix is good (I imagined it would go this way). 
> > Marked the issue for 11u and 13u backports, we would handle it once it soaks in jdk/jdk for a while. Yes, the fix should be backported. It's low risk imo. Thank you for the code review! Coleen > From harold.seigel at oracle.com Wed Jul 24 12:49:07 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Wed, 24 Jul 2019 08:49:07 -0400 Subject: RFR (S): 8221205: Obsolete AllowJNIEnvProxy In-Reply-To: References: Message-ID: Thanks for the explanation. Harold On 7/24/2019 8:47 AM, David Holmes wrote: > Hi Harold, > > Thanks for taking a look at this. > > On 24/07/2019 9:55 pm, Harold Seigel wrote: >> Hi David, >> >> This looks good. Would it be worthwhile to add an assert that thread >> == JavaThread::current() in >> JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread >> *thread), or get rid of the thread parameter? > > That code is part of a much larger code sequence for thread-state > transitions that always operates on the current thread. So no assert > is needed just for this part. The thread parameter is needed in this > case because it is defined as a static method. It would be possible to > change it to an instance method, but that would require touching a lot > more code. > > Thanks, > David > >> Thanks, Harold >> >> On 7/23/2019 7:49 PM, David Holmes wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8221205 >>> webrev: http://cr.openjdk.java.net/~dholmes/8221205/webrev/ >>> >>> The ancient AllowJNIEnvProxy flag was deprecated in 13 and is now >>> being obsoleted in 14. >>> >>> The code that contained the flag check always operates on a thread >>> instance that is the current thread.
>>> >>> Thanks, >>> David From claes.redestad at oracle.com Wed Jul 24 13:28:11 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 24 Jul 2019 15:28:11 +0200 Subject: RFR: 8228581: Archive BigInteger constants Message-ID: <861003b5-4f12-cf12-9cfe-395177a1ba2f@oracle.com> Hi, BigInteger has a number of pre-calculated constants that are profitable to put up for archiving. This reduces initialization time of BigInteger by 0.3-0.5ms, and archives ~12Kb worth of objects. Bug: https://bugs.openjdk.java.net/browse/JDK-8228581 Webrev: http://cr.openjdk.java.net/~redestad/8228581/open.00/ Webrev is applied on top of patch for https://bugs.openjdk.java.net/browse/JDK-8228507 - which I've tested alongside this. Testing: tier1-2 Thanks! /Claes From daniel.daugherty at oracle.com Wed Jul 24 13:56:25 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 24 Jul 2019 09:56:25 -0400 Subject: RFR (S): 8221205: Obsolete AllowJNIEnvProxy In-Reply-To: References: Message-ID: <37037685-9d20-92c3-3968-0e3656d67ada@oracle.com> On 7/23/19 7:49 PM, David Holmes wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8221205 > webrev: http://cr.openjdk.java.net/~dholmes/8221205/webrev/ src/hotspot/share/runtime/globals.hpp ??? No comments. src/hotspot/share/runtime/thread.cpp ??? No comments. Thumbs up. > It would be possible to change it to an instance method, but that would > require touching a lot more code. I agree that getting rid of the 'thread' parameter and changing check_safepoint_and_suspend_for_native_trans() to an instance method does not have to be done by this fix. Do you want to log it as an RFE for the future or just move on to bigger fish? Dan > > The ancient AllowJNIEnvProxy flag was deprecated in 13 and is now > being obsoleted in 14. > > The code that contained the flag check always operates on a thread > instance that is the current thread. 
> > Thanks, > David From daniil.x.titov at oracle.com Wed Jul 24 17:21:55 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 24 Jul 2019 10:21:55 -0700 Subject: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) In-Reply-To: References: <4C4212D0-BFFF-4C85-ACC6-05200F220C3F@oracle.com> <2d6dede1-aa79-99ce-a823-773fa2e19827@oracle.com> <6E7B043A-4647-4931-977C-1854CA7EBEC1@oracle.com> Message-ID: <76BCC96D-DB5D-409A-95D5-3A64B893832D@oracle.com> Hi David, Daniel, and Serguei, Please review the new version of the fix that makes the thread table initialization on demand and moves it inside ThreadsList::find_JavaThread_from_java_tid(). At creation time the thread table is initialized with the threads from the current thread list. We don't want to hold Threads_lock inside find_JavaThread_from_java_tid(), thus new threads still could be created while the thread table is being initialized. Such threads will be found by the linear search and added to the thread table later, in ThreadsList::find_JavaThread_from_java_tid(). The change also includes additional optimization for some callers of find_JavaThread_from_java_tid() as Daniel suggested. It is correct that ResolvedMethodTable was used as a blueprint for the thread table, however, I tried to strip it of all functionality that is not required in the thread table case. We need to have the thread table resizable and allow it to grow as the number of threads increases to avoid reserving excessive memory a priori or deteriorating lookup times. The ServiceThread is responsible for growing the thread table when required. There is no ConcurrentHashTable available in Java 8 and for backporting this fix to Java 8 another implementation of the hash table, probably originally suggested in the patch attached to the JBS issue, should be used.
It will make the backporting more complicated, however, adding a new implementation of the hash table in Java 14 while it already has ConcurrentHashTable doesn't seem reasonable to me. Webrev: http://cr.openjdk.java.net/~dtitov/8185005/webrev.03 Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 Thanks! --Daniil On 7/8/19, 3:24 PM, "Daniel D. Daugherty" wrote: On 6/29/19 12:06 PM, Daniil Titov wrote: > Hi Serguei and David, > > Serguei is right, ThreadTable::find_thread(java_tid) cannot return a JavaThread with an unmatched java_tid. > > Please find a new version of the fix that includes the changes Serguei suggested. > > Regarding the concern about maintaining the thread table when it may never even be queried, one of > the options could be to add ThreadTable ::isEnabled flag, set it to "false" by default, and wrap the calls to the thread table > in ThreadsSMRSupport add_thread() and remove_thread() methods to check this flag. > > When ThreadsList::find_JavaThread_from_java_tid() is called for the first time it could check if ThreadTable ::isEnabled > Is on and if not then set it on and populate the thread table with all existing threads from the thread list. I have the same concerns as David H. about this new ThreadTable. ThreadsList::find_JavaThread_from_java_tid() is only called from code in src/hotspot/share/services/management.cpp so I think that table needs to be enabled and populated only if it is going to be used. I've taken a look at the webrev below and I see that David has followed up with additional comments. Before I do a crawl through code review for this, I would like to see the ThreadTable stuff made optional and David's other comments addressed. Another possible optimization is for callers of find_JavaThread_from_java_tid() to save the calling thread's tid value before they loop and if the current tid == saved_tid then use the current JavaThread* instead of calling find_JavaThread_from_java_tid() to get the JavaThread*.
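The lookup scheme Daniil describes (populate the table lazily from the current thread list on first use, then fall back to a linear scan for threads created in the meantime and cache them) together with Dan's caller-side shortcut can be sketched roughly as below. This is a single-threaded model under stated assumptions: ThreadModel, ThreadTableModel and find_for_caller() are hypothetical names, std::unordered_map stands in for HotSpot's ConcurrentHashTable, and there is no locking, ThreadsListHandle, or ServiceThread-driven resizing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a JavaThread; real HotSpot threads carry much more state.
struct ThreadModel {
  int64_t tid;
  bool exiting;
};

// Single-threaded sketch of an on-demand tid -> thread table. HotSpot uses a
// ConcurrentHashTable; std::unordered_map here is NOT safe for concurrent use.
class ThreadTableModel {
 public:
  // Look up 'tid'. The table is built lazily from 'list' on first use; threads
  // created after that are found by a linear scan and then cached.
  ThreadModel* find(int64_t tid, const std::vector<ThreadModel*>& list) {
    if (!initialized_) {
      for (ThreadModel* t : list) table_[t->tid] = t;
      initialized_ = true;
    }
    auto it = table_.find(tid);
    if (it != table_.end()) {
      return it->second->exiting ? nullptr : it->second;
    }
    for (ThreadModel* t : list) {  // slow path: thread missed by initialization
      if (t->tid == tid && !t->exiting) {
        table_[tid] = t;  // cache it so the next lookup is a hash hit
        return t;
      }
    }
    return nullptr;
  }

  std::size_t size() const { return table_.size(); }

 private:
  bool initialized_ = false;
  std::unordered_map<int64_t, ThreadModel*> table_;
};

// Caller-side shortcut in the spirit of Dan's suggestion: skip the lookup
// entirely when the requested tid belongs to the calling thread.
ThreadModel* find_for_caller(ThreadTableModel& table, ThreadModel* self,
                             int64_t tid, const std::vector<ThreadModel*>& list) {
  if (tid == self->tid) return self;
  return table.find(tid, list);
}
```

In this model nothing is allocated until the first lookup, so a VM that never queries thread info by id pays nothing for the table; once built, each lookup is a hash probe, with the linear scan only for threads the table has not seen yet.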
Dan > > Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.02/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > Thanks! > --Daniil > > From: > Organization: Oracle Corporation > Date: Friday, June 28, 2019 at 7:56 PM > To: Daniil Titov , OpenJDK Serviceability , "hotspot-runtime-dev at openjdk.java.net" , "jmx-dev at openjdk.java.net" > Subject: Re: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) > > Hi Daniil, > > I have several quick comments. > > The indent in the hotspot c/c++ files has to be 2, not 4. > > https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/src/hotspot/share/runtime/threadSMR.cpp.frames.html > 614 JavaThread* ThreadsList::find_JavaThread_from_java_tid(jlong java_tid) const { > 615 JavaThread* java_thread = ThreadTable::find_thread(java_tid); > 616 if (java_thread == NULL && java_tid == PMIMORDIAL_JAVA_TID) { > 617 // ThreadsSMRSupport::add_thread() is not called for the primordial > 618 // thread. Thus, we find this thread with a linear search and add it > 619 // to the thread table. > 620 for (uint i = 0; i < length(); i++) { > 621 JavaThread* thread = thread_at(i); > 622 if (is_valid_java_thread(java_tid,thread)) { > 623 ThreadTable::add_thread(java_tid, thread); > 624 return thread; > 625 } > 626 } > 627 } else if (java_thread != NULL && is_valid_java_thread(java_tid, java_thread)) { > 628 return java_thread; > 629 } > 630 return NULL; > 631 } > 632 bool ThreadsList::is_valid_java_thread(jlong java_tid, JavaThread* java_thread) { > 633 oop tobj = java_thread->threadObj(); > 634 // Ignore the thread if it hasn't run yet, has exited > 635 // or is starting to exit. > 636 return (tobj != NULL && !java_thread->is_exiting() && > 637 java_tid == java_lang_Thread::thread_id(tobj)); > 638 } > > 615 JavaThread* java_thread = ThreadTable::find_thread(java_tid); > > I'd suggest to rename find_thread() to find_thread_by_tid(). 
> > A space is missed after the comma: > 622 if (is_valid_java_thread(java_tid,thread)) { > > An empty line is needed before L632. > > The name 'is_valid_java_thread' looks wrong (or confusing) to me. > Something like 'is_alive_java_thread_with_tid()' would be better. > It'd better to list parameters in the opposite order. > > The call to is_valid_java_thread() is confusing: > 627 } else if (java_thread != NULL && is_valid_java_thread(java_tid, java_thread)) { > > Why would the call ThreadTable::find_thread(java_tid) return a JavaThread with an unmatched java_tid? > > > Thanks, > Serguei > > On 6/28/19, 9:40 PM, "David Holmes" wrote: > > Hi Daniil, > > The definition and use of this hashtable (yet another hashtable > implementation!) will need careful examination. We have to be concerned > about the cost of maintaining it when it may never even be queried. You > would need to look at footprint cost and performance impact. > > Unfortunately I'm just about to board a plane and will be out for the > next few days. I will try to look at this asap next week, but we will > need a lot more data on it. > > Thanks, > David > > On 6/28/19 3:31 PM, Daniil Titov wrote: > Please review the change that improves performance of ThreadMXBean MXBean methods returning the > information for specific threads. The change introduces the thread table that uses ConcurrentHashTable > to store one-to-one the mapping between the thread ids and JavaThread objects and replaces the linear > search over the thread list in ThreadsList::find_JavaThread_from_java_tid(jlong tid) method with the lookup > in the thread table. > > Testing: Mach5 tier1,tier2 and tier3 tests successfully passed. > > Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > Thanks! 
> > Best regards, > Daniil > > > > > > > From daniil.x.titov at oracle.com Wed Jul 24 17:34:22 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 24 Jul 2019 10:34:22 -0700 Subject: RFR: 8170299: Debugger does not stop inside the low memory notifications code In-Reply-To: <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com> References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com> <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com> Message-ID: Hi David, Hope you had a great vacation! Please find below the latest version of the change. The only difference from version 01 is the corrected ordering of include statements, as Serguei suggested. Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 Thanks! --Daniil On 7/3/19, 11:47 PM, "David Holmes" wrote: Hi Daniil, On 4/07/2019 1:04 pm, Daniil Titov wrote: > Please review the change the fixes the problem with the debugger not stopping in the low memory notification code. > > The problem here is that the ServiceThread that calls these MXBean listeners is hidden from the external view that prevents the debugger from stopping in it. > > The fix introduces new NotificationThread that is visible to the external view and offloads the ServiceThread from sending low memory and other notifications that could result in Java calls ( GC and diagnostic commands notifications) by moving these activities in this new NotificationThread. There is a long and unfortunate history with this bug. The original incarnation of this fix was introducing a new thread at the Java library level, and I had some concerns about that: http://mail.openjdk.java.net/pipermail/serviceability-dev/2017-December/022612.html That effort was resurrected at: http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-July/024466.html and http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-August/024849.html but was left somewhat in limbo.
There was a lot of doubt about the right way to fix this bug and whether introducing a new thread was too disruptive. But introducing a new thread in the VM also has the same set of concerns! This needs consideration by the runtime team before going ahead. Introducing a new thread like this needs to be examined in detail, particularly the synchronization interactions with other threads. It also introduces another monitor designated safepoint-never at a time when we are in the process of cleaning up monitors so that JavaThreads will only use safepoint-check-always monitors. Unfortunately I'm about to head out for two weeks' vacation, and a number of other key runtime folk are also on vacation, but I'd ask that you hold off on this until we can look at it in more detail. Thanks, David ----- > Testing: Mach5 tier1,tier2 and tier3 tests succeeded. > > Webrev: https://cr.openjdk.java.net/~dtitov/8170299/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 > > Thanks! > --Daniil > > From coleen.phillimore at oracle.com Wed Jul 24 17:38:26 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 24 Jul 2019 13:38:26 -0400 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds Message-ID: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> Summary: moved appcds and SharedArchive files to test/hotspot/runtime/cds This is 99% tedious. I moved the files with hg move and fixed the directory references in the tests to 'cds'. Tested with mach5 hs-tier1,2,3.
open webrev at http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8202339 Thanks, Coleen From mikhailo.seledtsov at oracle.com Wed Jul 24 17:47:42 2019 From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov) Date: Wed, 24 Jul 2019 10:47:42 -0700 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> Message-ID: <5D3899BE.6050909@oracle.com> Looks good, Misha On 7/24/19, 10:38 AM, coleen.phillimore at oracle.com wrote: > Summary: moved appcds and SharedArchive files to test/hotspot/runtime/cds > > This is 99% tedious. I moved the files with hg move and fixed the > directory references in the tests to 'cds'. Tested with mach5 > hs-tier1,2,3. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8202339 > > Thanks, > Coleen From jianglizhou at google.com Wed Jul 24 18:00:13 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 24 Jul 2019 11:00:13 -0700 Subject: RFR: 8228507: Archive FDBigInteger In-Reply-To: <56dfa19a-76c7-3318-bb6f-7e1a04c9cb44@oracle.com> References: <56dfa19a-76c7-3318-bb6f-7e1a04c9cb44@oracle.com> Message-ID: Hi Claes, This looks good to me. I wonder if FDBigInteger.archivedCaches could be placed in closed_archive_subgraph_entry_fields for the 'closed' archive region? The makeImmutable() is called for the cached FDBigInteger objects. Best regards, Jiangli On Wed, Jul 24, 2019 at 2:41 AM Claes Redestad wrote: > > Hi, > > any double<->String conversion will trigger load of > jdk.internal.math.FDBigInteger, which has a static > initializer pre-calculating a relatively large number > of values. > > Archiving these pre-calculated values reduces the time > spent in FDBigInteger. from a couple of > milliseconds down to "nothing". 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8228507 > Webrev: http://cr.openjdk.java.net/~redestad/8228507/open.00/ > > Testing: tier1-3 > > Thanks! > > /Claes From coleen.phillimore at oracle.com Wed Jul 24 19:33:43 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 24 Jul 2019 15:33:43 -0400 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: <5D3899BE.6050909@oracle.com> References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> <5D3899BE.6050909@oracle.com> Message-ID: <835096de-e23f-2792-9bf9-ba60252dfeeb@oracle.com> Thanks Misha! Coleen On 7/24/19 1:47 PM, Mikhailo Seledtsov wrote: > Looks good, > Misha > > On 7/24/19, 10:38 AM, coleen.phillimore at oracle.com wrote: >> Summary: moved appcds and SharedArchive files to >> test/hotspot/runtime/cds >> >> This is 99% tedious. I moved the files with hg move and fixed the >> directory references in the tests to 'cds'. Tested with mach5 >> hs-tier1,2,3. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8202339 >> >> Thanks, >> Coleen From joe.darcy at oracle.com Wed Jul 24 19:38:17 2019 From: joe.darcy at oracle.com (Joe Darcy) Date: Wed, 24 Jul 2019 12:38:17 -0700 Subject: RFR: 8228581: Archive BigInteger constants In-Reply-To: <861003b5-4f12-cf12-9cfe-395177a1ba2f@oracle.com> References: <861003b5-4f12-cf12-9cfe-395177a1ba2f@oracle.com> Message-ID: <05507628-4b49-ff2f-6c05-71abec9c5085@oracle.com> Hi Claes, For those of us unfamiliar with the archive mechanism, can you describe its semantics or send a pointer to such a description? Thanks, -Joe On 7/24/2019 6:28 AM, Claes Redestad wrote: > Hi, > > BigInteger has a number of pre-calculated constants that are profitable > to put up for archiving. This reduces initialization time of BigInteger > by 0.3-0.5ms, and archives ~12Kb worth of objects. > > Bug:
https://bugs.openjdk.java.net/browse/JDK-8228581 > Webrev: http://cr.openjdk.java.net/~redestad/8228581/open.00/ > > Webrev is applied on top of patch for > https://bugs.openjdk.java.net/browse/JDK-8228507 - which I've > tested alongside this. > > Testing: tier1-2 > > Thanks! > > /Claes From jianglizhou at google.com Wed Jul 24 19:54:33 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 24 Jul 2019 12:54:33 -0700 Subject: RFR: 8228581: Archive BigInteger constants In-Reply-To: <05507628-4b49-ff2f-6c05-71abec9c5085@oracle.com> References: <861003b5-4f12-cf12-9cfe-395177a1ba2f@oracle.com> <05507628-4b49-ff2f-6c05-71abec9c5085@oracle.com> Message-ID: On Wed, Jul 24, 2019 at 12:41 PM Joe Darcy wrote: > > Hi Claes, > > For those of us unfamiliar with the archive mechanism, can you describe > its semantics or send a pointer to such a description? That's a good point. There are some design docs describing how Java objects and object graphs archiving work. It might be good to put those in OpenJDK wiki. Best regards, Jiangli > > Thanks, > > -Joe > > On 7/24/2019 6:28 AM, Claes Redestad wrote: > > Hi, > > > > BigInteger has a number of pre-calculated constants that are profitable > > to put up for archiving. This reduces initialization time of BigInteger > > by 0.3-0.5ms, and archives ~12Kb worth of objects. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8228581 > > Webrev: http://cr.openjdk.java.net/~redestad/8228581/open.00/ > > > > Webrev is applied on top of patch for > > https://bugs.openjdk.java.net/browse/JDK-8228507 - which I've > > tested alongside this. > > > > Testing: tier1-2 > > > > Thanks! 
> > > > /Claes From claes.redestad at oracle.com Wed Jul 24 20:07:57 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 24 Jul 2019 22:07:57 +0200 Subject: RFR: 8228581: Archive BigInteger constants In-Reply-To: References: <861003b5-4f12-cf12-9cfe-395177a1ba2f@oracle.com> <05507628-4b49-ff2f-6c05-71abec9c5085@oracle.com> Message-ID: <1cb8c5c5-effd-a378-a178-99bb08af0f85@oracle.com> Hi Joe, Jiangli, On 2019-07-24 21:54, Jiangli Zhou wrote: > On Wed, Jul 24, 2019 at 12:41 PM Joe Darcy wrote: >> >> Hi Claes, >> >> For those of us unfamiliar with the archive mechanism, can you describe >> its semantics or send a pointer to such a description? > > That's a good point. There are some design docs describing how Java > objects and object graphs archiving work. It might be good to put > those in OpenJDK wiki. > > Best regards, > Jiangli Getting the design up on the wiki would be great! Do you have a pointer to the docs and/or time to work on this? Quick outline of my understanding: effectively, at -Xshare:dump time, the contents of the static fields listed in heapShared.cpp will be serialized to the base CDS archive, i.e., lib/classes.jsa, and on VM.initializeFromArchive the serialized heap state of the object graph will be mapped in, if possible. There are closed and open archive regions, and the rules differ somewhat between them. Objects archived in closed archive regions must be effectively immutable. Some mutable operations like synchronizing on objects are allowed, but GCs are allowed to ignore these regions, so if you wrote to a reference field pointing to something on the regular heap, you'd not be strongly referencing that object, and this might be disastrous. If you try to put things into the closed archive that are not fully immutable (not all fields are final, etc.), you'll get a lot of warnings to this effect.
Open regions additionally allow writing to reference fields and referencing objects on the "regular" heap, so objects in such regions must be scanned by GCs as if they were a "normal" heap region. The powerCache field in this example, which is volatile, should be acceptable. This patch could probably be improved by moving everything but the initialPowerCache to the closed archive region. /Claes From calvin.cheung at oracle.com Wed Jul 24 20:19:28 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 24 Jul 2019 13:19:28 -0700 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows In-Reply-To: References: Message-ID: <2021b295-463c-6ea1-97b5-3512cacc28c1@oracle.com> Hi Ralf, Thanks for working on this RFE. The changes look good in general, and other functions such as os::stat() and os::open() look cleaner now. os_windows.cpp 4315 } else { 4316 prefix = L"\\\\?\\UNC"; 4317 prefix_off = 1; // Overwrite the first char with the prefix. 4318 } Do you mean if the original path is something like "\\\\x\\y", it would become the following? "\\\\?\\UNC\\x\\y" Maybe add an example in the comment? 4368 if (err != ERROR_SUCCESS) { 4369 os::free(result); I think you need a NULL check on result before calling os::free() because result could be NULL if the os::malloc() call at line 4328 has failed. Some minor nits in the comment of the wide_abs_unc_path() function: line 4282: please add a blank space before "The"; line 4283: "er" should be "err". Could you also mention that the function is based on pathToNTPath() in io_util_md.cpp? test_os_windows.cpp I haven't reviewed this file in detail, but I have tried your patch and saw failures in tier1 testing.
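The prefix rule Calvin asks about (whether "\\\\x\\y" becomes "\\\\?\\UNC\\x\\y") can be illustrated with a small standalone sketch. The function name to_extended_length_path is hypothetical and this is not the patch's wide_abs_unc_path(); it assumes the input has already been made absolute and normalized (the patch uses _wfullpath() for that step) and only shows the prefixing: a drive path gets the plain extended-length prefix, while for a UNC path the prefix ending in UNC replaces the first of the two leading backslashes, matching prefix_off = 1 in the quoted code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: input must already be absolute and normalized.
std::wstring to_extended_length_path(const std::wstring& abs_path) {
  assert(!abs_path.empty());
  const std::wstring prefix = L"\\\\?\\";
  if (abs_path.compare(0, prefix.size(), prefix) == 0) {
    return abs_path;  // already in extended-length form
  }
  if (abs_path.compare(0, 2, L"\\\\") == 0) {
    // UNC path: \\host\share becomes \\?\UNC\host\share, so the prefix
    // takes the place of the first of the two leading backslashes.
    return L"\\\\?\\UNC" + abs_path.substr(1);
  }
  // Drive path: C:\dir becomes \\?\C:\dir
  return prefix + abs_path;
}
```

Keeping the already-prefixed case separate avoids double-prefixing, since such paths also start with two backslashes.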
Excerpt from the log file (I can send you the entire log off list if you like): t:/workspace/open/test/hotspot/gtest/runtime/test_os_windows.cpp(276): error: Expected: (fd) != (-1), actual: -1 vs -1 os::open failed for "\\\\localhost\\T$\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_tier1_common\\scratch\\1\\os_windows_long_paths_dir_LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL\\not_empty_directory_with_long_path\\file" with errno 67 t:/workspace/open/test/hotspot/gtest/runtime/test_os_windows.cpp(273): error: Expected equality of these values: os::stat(buf, &st) Which is: -1 0 os::stat failed for "\\\\localhost\\T$\\testoutput\\test-support\\jtreg_open_test_hotspot_jtreg_tier1_common\\scratch\\1\\os_windows_long_paths_dir_LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL\\not_empty_directory_with_long_path\\file" LongBCP.java This one looks good. thanks, Calvin On 7/4/19 3:56 AM, Schmelter, Ralf wrote: > Hi, > > can you please review this patch to fix various long path related problems in the hotspot os code on Windows. > > As described in the bug the current code cannot handle relative paths in these cases: > > 1. If the relative path is < 260 chars, but the absolute path is > 260 chars. In this case if the I/O method uses the *A variant of the system call as an optimization, it will fail. > 2. If the relative path is > 260 chars or the I/O method always uses the *W variant. In this case the create_unc_path() method is called, which just prepends \\?\ to the relative path. But this is not a valid path to use and the system call will fail. > > Additionally there are problems with some other kinds of paths: > > 1. An absolute path which contains '.' or '..' parts and is > 260 chars or the I/O method always uses the *W variant.
When given to the create_unc_path() method, it will just prepend \\?\. But this is not a valid path to use and the system call will fail. > 2. An UNC path which is > 260 or the I/O method always uses the *W variant. The create_unc_path erroneously converts \\host\path to \\?\UNC\\host\path (notice the double backslash before the host name). This again is not a valid path. Additionally '.' or '..' parts would not be handled correctly too. > > To fix this I've introduced a new function, which converts a path to a wide character unc path, calling _wfullpath() to make the path absolute if needed and to remove the '.' and '..' path parts. I've adjusted all methods which used create_unc_path() to use the new method. And I removed all fallback code using the ANSI variants, since benchmarking showed that on my machine the additional overhead of converting to a wchar and potentially calling _wfullpath() was less than 5% of the actual I/O routine called. And for this reason, why I haven't tried to optimize avoiding calls to _wfullpath() (e.g. checking for '.' and '..' and only calling it if we find this in the path). 
> > bugreport: https://bugs.openjdk.java.net/browse/JDK-8191521 > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8191521/webrev.0/ > > Best regards, > Ralf > From jianglizhou at google.com Wed Jul 24 20:38:01 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 24 Jul 2019 13:38:01 -0700 Subject: RFR: 8228581: Archive BigInteger constants In-Reply-To: <1cb8c5c5-effd-a378-a178-99bb08af0f85@oracle.com> References: <861003b5-4f12-cf12-9cfe-395177a1ba2f@oracle.com> <05507628-4b49-ff2f-6c05-71abec9c5085@oracle.com> <1cb8c5c5-effd-a378-a178-99bb08af0f85@oracle.com> Message-ID: Hi Claes, On Wed, Jul 24, 2019 at 1:06 PM Claes Redestad wrote: > > Hi Joe, Jiangli, > > On 2019-07-24 21:54, Jiangli Zhou wrote: > > On Wed, Jul 24, 2019 at 12:41 PM Joe Darcy wrote: > >> > >> Hi Claes, > >> > >> For those of us unfamiliar with the archive mechanism, can you describe > >> its semantics or send a pointer to such a description? > > > > That's a good point. There are some design docs describing how Java > > objects and object graphs archiving work. It might be good to put > > those in OpenJDK wiki. > > > > Best regards, > > Jiangli > > getting the design up on the wiki would be great! Do you have a pointer > to the docs and/or time to work on this? There are three existing design docs related to this area. They contain information: - GC (G1) Archive heap regions, closed archive region and open archive region - Archive object state transition - Targeted static field pre-initialization and caching using object graph archiving - Archived mirror (j.l.Objects) objects, Strings, etc. I don't have the internal links to those docs. I can help update the docs once they are moved to the OpenJDK wiki. 
Best regards, Jiangli > > Quick outline of my understanding: > > effectively at time of -Xshare:dump, the contents of the static fields > listed in heapShared.cpp will be serialized to the base CDS archive, > .e., lib/classes.jsa, and on VM.initializeFromArchive the serialized > heap state of the object graph will be mapped in, if possible. > > There are closed and open archive regions, and the rules differ somewhat > between them. > > Objects archived in closed archive regions must be effectively > immutable. Some mutable operations like synchronizing on objects are > allowed, but GCs are allowed to ignore these regions, so if you wrote to > a reference field pointing to something on the regular heap, you'd > not be strongly referencing that object, and this might be disastrous. > If you try to put things into the closed archive that are not > fully immutable (not all fields are final etc), you'll get a lot of > warnings to this effect. > > Open regions additionally allow writing to reference fields and > referencing objects on the "regular" heap, so objects in such regions > must be scanned by GCs as if they were a "normal" heap region. The > powerCache field in this example, which is volatile, should be > acceptable. > > This patch could probably be improved by moving everything but the > initialPowerCache to the closed archive region.. > > /Claes From claes.redestad at oracle.com Wed Jul 24 22:12:32 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 25 Jul 2019 00:12:32 +0200 Subject: RFR: 8228507: Archive FDBigInteger In-Reply-To: References: <56dfa19a-76c7-3318-bb6f-7e1a04c9cb44@oracle.com> Message-ID: <69f7689a-ab6f-5517-60a4-7036176930c5@oracle.com> Hi Jiangli, thank you for reviewing! The objects we're dealing with here are all effectively immutable, however, at least one reference field in FDBigInteger is non-final, so putting it in a closed archive region is causing a lot of warnings at dump time. 
Perhaps we should add a way to suppress such warnings on a case-by-case basis? Thanks! /Claes On 2019-07-24 20:00, Jiangli Zhou wrote: > Hi Claes, > > This looks good to me. I wonder if FDBigInteger.archivedCaches could > be placed in closed_archive_subgraph_entry_fields for the 'closed' > archive region? The makeImmutable() is called for the cached > FDBigInteger objects. > > Best regards, > Jiangli > > On Wed, Jul 24, 2019 at 2:41 AM Claes Redestad > wrote: >> >> Hi, >> >> any double<->String conversion will trigger load of >> jdk.internal.math.FDBigInteger, which has a static >> initializer pre-calculating a relatively large number >> of values. >> >> Archiving these pre-calculated values reduces the time >> spent in FDBigInteger. from a couple of >> milliseconds down to "nothing". >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8228507 >> Webrev: http://cr.openjdk.java.net/~redestad/8228507/open.00/ >> >> Testing: tier1-3 >> >> Thanks! >> >> /Claes From david.holmes at oracle.com Wed Jul 24 22:47:12 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 25 Jul 2019 08:47:12 +1000 Subject: RFR (S): 8221205: Obsolete AllowJNIEnvProxy In-Reply-To: <37037685-9d20-92c3-3968-0e3656d67ada@oracle.com> References: <37037685-9d20-92c3-3968-0e3656d67ada@oracle.com> Message-ID: <0a1c0e16-bb5b-31cc-a377-f8e4c5e25405@oracle.com> On 24/07/2019 11:56 pm, Daniel D. Daugherty wrote: > On 7/23/19 7:49 PM, David Holmes wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8221205 >> webrev: http://cr.openjdk.java.net/~dholmes/8221205/webrev/ > > src/hotspot/share/runtime/globals.hpp > No comments. > > src/hotspot/share/runtime/thread.cpp > No comments. > > Thumbs up. Thanks for looking at this Dan! > > It would be possible to change it to an instance method, but that would > > require touching a lot more code.
> > I agree that getting rid of the 'thread' parameter and changing > check_safepoint_and_suspend_for_native_trans() to an instance method > does not have to be done by this fix. Do you want to log it as an RFE > for the future or just move on to bigger fish? Move on to bigger fish. This could be folded into a larger cleanup/change in this area - I suspect we still have a few in the pipeline. Thanks, David > Dan > > > > >> >> The ancient AllowJNIEnvProxy flag was deprecated in 13 and is now >> being obsoleted in 14. >> >> The code that contained the flag check always operates on a thread >> instance that is the current thread. >> >> Thanks, >> David > From jianglizhou at google.com Wed Jul 24 22:50:57 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 24 Jul 2019 15:50:57 -0700 Subject: RFR: 8228507: Archive FDBigInteger In-Reply-To: <69f7689a-ab6f-5517-60a4-7036176930c5@oracle.com> References: <56dfa19a-76c7-3318-bb6f-7e1a04c9cb44@oracle.com> <69f7689a-ab6f-5517-60a4-7036176930c5@oracle.com> Message-ID: On Wed, Jul 24, 2019 at 3:11 PM Claes Redestad wrote: > > Hi Jiangli, > > thank you for reviewing! The objects we're dealing with here are all > effectively immutable, however, at least one reference field in > FDBigInteger is non-final, so putting it in a closed archive region is > causing a lot of warnings at dump time. Perhaps we should add a way > to suppress such warnings on a case-by-case basis? Agreed! Best regards, Jiangli > > Thanks! > > /Claes > > On 2019-07-24 20:00, Jiangli Zhou wrote: > > Hi Claes, > > > > This looks good to me. I wonder if FDBigInteger.archivedCaches could > > be placed in closed_archive_subgraph_entry_fields for the 'closed' > > archive region? The makeImmutable() is called for the cached > > FDBigInteger objects. 
> > > > Best regards, > > Jiangli > > > > On Wed, Jul 24, 2019 at 2:41 AM Claes Redestad > > wrote: > >> > >> Hi, > >> > >> any double<->String conversion will trigger load of > >> jdk.internal.math.FDBigInteger, which has a static > >> initializer pre-calculating a relatively large number > >> of values. > >> > >> Archiving these pre-calculated values reduces the time > >> spent in FDBigInteger. from a couple of > >> milliseconds down to "nothing". > >> > >> Bug: https://bugs.openjdk.java.net/browse/JDK-8228507 > >> Webrev: http://cr.openjdk.java.net/~redestad/8228507/open.00/ > >> > >> Testing: tier1-3 > >> > >> Thanks! > >> > >> /Claes From calvin.cheung at oracle.com Wed Jul 24 23:23:37 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 24 Jul 2019 16:23:37 -0700 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> Message-ID: <8e6f25ef-0456-d1a0-e465-95b74533eb4d@oracle.com> Hi Coleen, Thanks for doing this test consolidation. In TEST.groups, for the hotspot_appcds_dynamic test group (starting from line 313), I think we need to exclude the tests which used to reside under the SharedArchiveFile dir as well. You can use mach5 to test it by supplying the following args: --test hotspot_appcds_dynamic --jvm-args "Dtest.dynamic.cds.archive=true" or if you'd like to run it locally using jtreg, add the following args: -vmoptions:-Dtest.dynamic.cds.archive=true /open/test/hotspot/jtreg:hotspot_appcds_dynamic (where is the full path to the top dir of your repo) thanks, Calvin On 7/24/19 10:38 AM, coleen.phillimore at oracle.com wrote: > Summary: moved appcds and SharedArchive files to test/hotspot/runtime/cds > > This is 99% tedious. I moved the files with hg move and fixed the > directory references in the tests to 'cds'. Tested with mach5 > hs-tier1,2,3.
> > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8202339 > > Thanks, > Coleen From coleen.phillimore at oracle.com Thu Jul 25 02:43:03 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 24 Jul 2019 22:43:03 -0400 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: <8e6f25ef-0456-d1a0-e465-95b74533eb4d@oracle.com> References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> <8e6f25ef-0456-d1a0-e465-95b74533eb4d@oracle.com> Message-ID: <5c2c9617-08be-fa65-ce04-10c3a5c92952@oracle.com> I'm going to withdraw this change. It seems that the SharedArchiveFile tests use different utility classes than the appcds tests, so they fail with dynamic archiving. It looks like consolidating these is a lot more work! Coleen On 7/24/19 7:23 PM, Calvin Cheung wrote: > Hi Coleen, > > Thanks for doing this tests consolidation. > > In TEST.groups, for the hotspot_appcds_dynamic test group (starting > from line 313), I think we need to exclude the tests which used to > reside under the SharedArchiveFile dir as well. > > You can use mach5 to test it by supplying the following args: > > --test hotspot_appcds_dynamic --jvm-args "Dtest.dynamic.cds.archive=true" > > or if you'd like to run it locally using jtreg, add the following args: > > -vmoptions:-Dtest.dynamic.cds.archive=true > /open/test/hotspot/jtreg:hotspot_appcds_dynamic > > (where is the full path to the top dir of your repo) > > thanks, > > Calvin > > On 7/24/19 10:38 AM, coleen.phillimore at oracle.com wrote: >> Summary: moved appcds and SharedArchive files to >> test/hotspot/runtime/cds >> >> This is 99% tedious. I moved the files with hg move and fixed the >> directory references in the tests to 'cds'. Tested with mach5 >> hs-tier1,2,3.
>> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8202339 >> >> Thanks, >> Coleen From ralf.schmelter at sap.com Thu Jul 25 15:08:38 2019 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Thu, 25 Jul 2019 15:08:38 +0000 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows In-Reply-To: <2021b295-463c-6ea1-97b5-3512cacc28c1@oracle.com> References: <2021b295-463c-6ea1-97b5-3512cacc28c1@oracle.com> Message-ID: Hi Calvin, thanks for reviewing this change. > 4315 } else { > 4316 prefix = L"\\\\?\\UNC"; > 4317 prefix_off = 1; // Overwrite the first char with the prefix. > 4318 } > Do you mean if the original path is something like "\\\\x\\y", it would > become the following? > > ""\\\\?\\UNC\\x\\y" Exactly. Usually you can just add a prefix, but in case of the an UNC path, the double backslash must be changed to a single one. > Maybe add an example in the comment? Good idea. > 4368 if (err != ERROR_SUCCESS) { > 4369 os::free(result); > I think you need a NULL check on result before calling os::free() > because result could be NULL if the os::malloc() call at line 4328 has > failed. os::free() is like ::free() and delete. You can call them with NULL, which is just treated as a no-op. > line 4282 please add a blank space before "The" > line 4283 "er" should be "err" Good catch. > Could you also mention that the function is based on pathToNTPath() in > io_util_md.cpp? OK. > I haven't reviewed this file in details but I have tried your patch and > saw failures in tier1 testing. That's interesting. Do you only see the errors with UNC paths? If yes, could you open the 'Computer Management' application and look at 'System Tools' -> 'Shared Folders' -> 'Shares'. Usually if you have a harddrive at let's say D:, windows will create a share called D$. I've used this to check UNC paths, but the share might be missing. 
Especially if T: is not a harddrive but a mapped drive. I should probably first test, if a can access the \\localhost\$ share at all and only then run the UNC path tests. Best regards, Ralf From calvin.cheung at oracle.com Thu Jul 25 23:15:22 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Thu, 25 Jul 2019 16:15:22 -0700 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows In-Reply-To: References: <2021b295-463c-6ea1-97b5-3512cacc28c1@oracle.com> Message-ID: <48005d2b-61b8-9027-db56-e3713c5a90bf@oracle.com> On 7/25/19 8:08 AM, Schmelter, Ralf wrote: > Hi Calvin, > > thanks for reviewing this change. > >> 4315 } else { >> 4316 prefix = L"\\\\?\\UNC"; >> 4317 prefix_off = 1; // Overwrite the first char with the prefix. >> 4318 } >> Do you mean if the original path is something like "\\\\x\\y", it would >> become the following? >> >> ""\\\\?\\UNC\\x\\y" > Exactly. Usually you can just add a prefix, but in case of the an UNC path, > the double backslash must be changed to a single one. > >> Maybe add an example in the comment? > Good idea. > >> 4368 if (err != ERROR_SUCCESS) { >> 4369 os::free(result); >> I think you need a NULL check on result before calling os::free() >> because result could be NULL if the os::malloc() call at line 4328 has >> failed. > os::free() is like ::free() and delete. You can call them with NULL, > which is just treated as a no-op. > >> line 4282 please add a blank space before "The" >> line 4283 "er" should be "err" > Good catch. > >> Could you also mention that the function is based on pathToNTPath() in >> io_util_md.cpp? > OK. > >> I haven't reviewed this file in details but I have tried your patch and >> saw failures in tier1 testing. > That's interesting. Do you only see the errors with UNC paths? I'm not sure. Most errors with path begins with "\\\\localhost\\T$\\..." A few errors with path begins with "//\\/\\localhost\\T$\\..." or "\\\\/\\\\localhost\\\\/\\T$..." 
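The path rewriting Ralf describes in this thread can be sketched in C++. An ordinary absolute path gets a `\\?\` prefix, while a UNC path `\\server\share\...` becomes `\\?\UNC\server\share\...` by dropping one of its two leading backslashes. This is a minimal, hypothetical helper (the name `to_extended_length_path` is invented here), not the code from the webrev. It assumes the input is already absolute and uses backslashes, and it omits the relative-path resolution, buffer management, and error handling the actual patch has to perform.

```cpp
#include <string>

// Hypothetical helper (not from the webrev): convert an absolute Windows
// path that already uses backslashes into its extended-length form.
//   C:\dir\file      ->  \\?\C:\dir\file
//   \\server\share\f ->  \\?\UNC\server\share\f
std::wstring to_extended_length_path(const std::wstring& path) {
  // Already in extended-length form: leave it unchanged.
  if (path.rfind(L"\\\\?\\", 0) == 0) {
    return path;
  }
  // UNC path: drop the first of the two leading backslashes, then
  // prepend "\\?\UNC", so "\\server" becomes "\\?\UNC\server".
  if (path.rfind(L"\\\\", 0) == 0) {
    return L"\\\\?\\UNC" + path.substr(1);
  }
  // Drive-letter path: simply prepend "\\?\".
  return L"\\\\?\\" + path;
}
```

The UNC check has to come before the generic case. Simply prepending `\\?\` to `\\server\share` would not produce the documented `\\?\UNC\server\share` form.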
> If yes, > could you open the 'Computer Management' application and look at > 'System Tools' -> 'Shared Folders' -> 'Shares'. Usually if you have > a harddrive at let's say D:, windows will create a share called D$. > I've used this to check UNC paths, but the share might be missing. > Especially if T: is not a harddrive but a mapped drive. I don't have direct access to the windows machine but I was told that T: is mapped to some folder in the C: drive using 'subst' such as 'subst T: C:\somedir'. > > I should probably first test, if a can access the \\localhost\$ > share at all and only then run the UNC path tests. Sounds good. thanks, Calvin > > Best regards, > Ralf From Pengfei.Li at arm.com Fri Jul 26 03:45:54 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 26 Jul 2019 03:45:54 +0000 Subject: RFR(S): 8228601: AArch64: Fix interpreter code at JVMCI deoptimization entry Message-ID: Hi, Please help review this AArch64 bug fix. JBS: https://bugs.openjdk.java.net/browse/JDK-8228601 Webrev: http://cr.openjdk.java.net/~pli/rfr/8228601/webrev.00/ AArch64 HotSpot crashes when a Graal-compiled Java synchronized method is deoptimized and then re-executed in the interpreter. This issue can be reproduced by below Java program with VM options "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal". 
public class Test {
    public static synchronized int hash(Object o) {
        return o.hashCode();
    }

    public static void main(String[] args) throws Exception {
        int sum = 0;
        for (int i = 0; i < 30000; i++) {
            sum += hash(i);
            Thread.sleep(1);
        }
        sum += hash("Shanghai");
        System.out.println(sum);
    }
}

When a JVMCI-compiled Java synchronized method gets deoptimized for some reason, HotSpot sets the thread-local _pending_monitorenter flag in the deoptimization routine.[1] Then, in interpreter mode, if this flag is set, the method is locked at the deoptimization entry before being re-executed.[2] But in current AArch64 HotSpot, the generated interpreter code that checks the _pending_monitorenter flag is wrong, so the synchronized method is not correctly locked. When the method returns in the interpreter, HotSpot crashes because it tries to unlock an invalid lock.

This patch fixes the condition for locking a Java method in the interpreter. The following JTreg test failures with AArch64 Graal are also fixed by this patch:

* jdk/java/util/Map/InPlaceOpsCollisions.java
* jdk/sun/security/tools/keytool/KeyToolTest.java
* hotspot/jtreg/serviceability/sa/TestHeapDumpForInvokeDynamic.java

[1] http://hg.openjdk.java.net/jdk/jdk/file/6073b2290c0a/src/hotspot/share/runtime/deoptimization.cpp#l1683
[2] http://hg.openjdk.java.net/jdk/jdk/file/6073b2290c0a/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp#l522

--
Thanks,
Pengfei

From ralf.schmelter at sap.com Fri Jul 26 08:06:29 2019 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Fri, 26 Jul 2019 08:06:29 +0000 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows In-Reply-To: <48005d2b-61b8-9027-db56-e3713c5a90bf@oracle.com> References: <2021b295-463c-6ea1-97b5-3512cacc28c1@oracle.com> <48005d2b-61b8-9027-db56-e3713c5a90bf@oracle.com> Message-ID: Hi Calvin, > I'm not sure. Most errors with path begins with "\\\\localhost\\T$\\..."
> A few errors with path begins with "//\\/\\localhost\\T$\\..." > or "\\\\/\\\\localhost\\\\/\\T$..." OK, so it is only for UNC path. I will add a check to see if the share is present and otherwise skip the tests. Best regards, Ralf From boris.ulasevich at bell-sw.com Fri Jul 26 10:49:56 2019 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Fri, 26 Jul 2019 13:49:56 +0300 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: Message-ID: <6cdc76d5-dca1-d033-856b-d264a71f8f23@bell-sw.com> Hi Martin, Your change works Ok on arm32 with the minor correction. See the patch attached. thanks, Boris On 16.07.2019 16:31, Doerr, Martin wrote: > Hi, > > the current implementation of FastJNIAccessors ignores the flag -XX:+UseFastJNIAccessors when the JVMTI capability "can_post_field_access" is enabled. > This is an unnecessary restriction which makes field accesses (GetField) from native code slower when a JVMTI agent is attached which enables this capability. > A better implementation would check at runtime if an agent actually wants to receive field access events. > > Note that the bytecode interpreter already uses this better implementation by checking if field access watch events were requested (JvmtiExport::_field_access_count != 0). > > I have implemented such a runtime check on all platforms which currently support FastJNIAccessors. > > My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a micro benchmark: > test-support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/FastGetField/FastGetField.jtr > shows the duration of 10000 iterations with and without UseFastJNIAccessors (JVMTI agent gets attached in both runs). > My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with FastJNIAccessors and 11.2ms without it. 
> > Webrev: > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > > We have run the test on 64 bit x86 platforms, SPARC and aarch64. > (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute them later.) > My webrev contains 32 bit implementations for x86 and arm, but completely untested. It'd be great if somebody could volunteer to review and test these platforms. > > Please review. > > Best regards, > Martin > -------------- next part -------------- A non-text attachment was scrubbed... Name: jniFastGetField_arm32.patch Type: text/x-patch Size: 1109 bytes Desc: not available URL: From martin.doerr at sap.com Fri Jul 26 11:02:05 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 26 Jul 2019 11:02:05 +0000 Subject: RFR(S): 8228649: [PPC64] SA reads wrong slots from interpreter frames Message-ID: Hi, the jtreg test "serviceability/sa/sadebugd/DebugdConnectTest.java" fails with "AssertionFailure: result must >= than stack pointer" on PPC64. The Java code doesn't read the right slots for the interpreter frame's monitors. I've removed the extra "reserved" slot which existed only in debug builds. It was used as additional frame check, but I think we can live without it. My new proposal also initializes all relevant interpreter frame slots for better tool support: http://cr.openjdk.java.net/~mdoerr/8228649_PPC64_sa/webrev.00/ Please review. Best regards, Martin From harold.seigel at oracle.com Fri Jul 26 12:04:47 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Fri, 26 Jul 2019 08:04:47 -0400 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed Message-ID: Hi, Please review this small JDK-14 fix for an issue with constant pool merging when redefining a class whose constant pool contains a constant dynamic entry. The fix makes sure that the has_dynamic_constant flag gets copied properly to the merged constant pool.
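A toy model may help show why a summary flag like this has to be carried over explicitly when a merged pool is built. The sketch below is hypothetical and heavily simplified: the `Pool` struct and `merge_pools` function are invented for illustration and are not HotSpot's `ConstantPool` API. It assumes that the merged pool retains the old class's entries. Under that assumption, a dynamic constant in the old pool must still be reflected in the merged pool's flag even when the new class version no longer contains one. The final `if` mirrors the fix in jvmtiRedefineClasses.cpp, which, as quoted in the review of this change, calls `set_has_dynamic_constant()` on both `merge_cp` and `scratch_cp` when `old_cp->has_dynamic_constant()` is true.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of a constant pool: entry tags plus a
// summary flag recording whether any entry is a dynamic constant (condy).
struct Pool {
  std::vector<std::string> tags;
  bool has_dynamic_constant = false;

  void add(const std::string& tag) {
    tags.push_back(tag);
    if (tag == "Dynamic") {
      has_dynamic_constant = true;
    }
  }
};

// Hypothetical merge: old entries first (so their indices stay valid),
// then the new version's entries. De-duplication and index remapping
// are omitted to keep the sketch short.
Pool merge_pools(const Pool& old_cp, Pool& scratch_cp) {
  Pool merge_cp;
  // Bulk copies bypass add(), so the summary flag is not recomputed.
  merge_cp.tags = old_cp.tags;
  merge_cp.tags.insert(merge_cp.tags.end(),
                       scratch_cp.tags.begin(), scratch_cp.tags.end());
  merge_cp.has_dynamic_constant = scratch_cp.has_dynamic_constant;

  // Analogue of the fix: if the old pool had a dynamic constant, mark
  // both the merged and the scratch pool, even if the new class version
  // no longer uses condy.
  if (old_cp.has_dynamic_constant) {
    merge_cp.has_dynamic_constant = true;
    scratch_cp.has_dynamic_constant = true;
  }
  return merge_cp;
}
```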
Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 The fix was regression tested by running Mach5 tiers 1 and 2 tests and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64. Thanks, Harold From coleen.phillimore at oracle.com Fri Jul 26 12:18:11 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 26 Jul 2019 08:18:11 -0400 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: References: Message-ID: <4926ff8e-ee5c-2ff3-abfc-105e6309c6b2@oracle.com> Looks great! Coleen On 7/26/19 8:04 AM, Harold Seigel wrote: > Hi, > > Please review this small JDK-14 fix for an issue with constant pool > merging when redefining a class whose constant pool contains a > constant dynamic entry. The fix makes sure that the > has_dynamic_constant flag gets copied properly to the merged constant > pool. > > Open Webrev: > http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 > > The fix was regression tested by running Mach5 tiers 1 and 2 tests and > builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 > tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64. > > Thanks, Harold > From ralf.schmelter at sap.com Fri Jul 26 12:20:32 2019 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Fri, 26 Jul 2019 12:20:32 +0000 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows In-Reply-To: <48005d2b-61b8-9027-db56-e3713c5a90bf@oracle.com> References: <2021b295-463c-6ea1-97b5-3512cacc28c1@oracle.com> <48005d2b-61b8-9027-db56-e3713c5a90bf@oracle.com> Message-ID: Hi Calvin, I've updated the webrev with your suggestions.
The tests now disable the UNC path portion, if you don't have share $ for : So all tests should now run on your machine. http://cr.openjdk.java.net/~rschmelter/webrevs/8191521/webrev.1/ Best regards, Ralf From harold.seigel at oracle.com Fri Jul 26 12:24:49 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Fri, 26 Jul 2019 08:24:49 -0400 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: <4926ff8e-ee5c-2ff3-abfc-105e6309c6b2@oracle.com> References: <4926ff8e-ee5c-2ff3-abfc-105e6309c6b2@oracle.com> Message-ID: Thanks Coleen! Harold On 7/26/2019 8:18 AM, coleen.phillimore at oracle.com wrote: > Looks great! > Coleen > > On 7/26/19 8:04 AM, Harold Seigel wrote: >> Hi, >> >> Please review this small JDK-14 fix for an issue with constant pool >> merging when redefining a class whose constant pool contains a >> constant dynamic entry. The fix makes sure that the >> has_dynamic_constant flag gets copied properly to the merged constant >> pool. >> >> Open Webrev: >> http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 >> >> The fix was regression tested by running Mach5 tiers 1 and 2 tests >> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running >> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on >> Linux-x64. >> >> Thanks, Harold >> > From martin.doerr at sap.com Fri Jul 26 13:59:37 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 26 Jul 2019 13:59:37 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: <6cdc76d5-dca1-d033-856b-d264a71f8f23@bell-sw.com> References: <6cdc76d5-dca1-d033-856b-d264a71f8f23@bell-sw.com> Message-ID: Hi Boris, thank you very much for testing.
Unfortunately, arm 32 was also affected by the issue Erik has found for aarch64: We need a little stronger memory barriers to support accessing volatile fields with correct ordering semantics. I've updated that in the current webrev already: http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.02/ I'm using membar(MacroAssembler::Membar_mask_bits(MacroAssembler::LoadLoad | MacroAssembler::LoadStore), Rtmp2), now. I've already used a cross build to check that it compiles, but I haven't run it. I believe this membar doesn't have a significant performance impact. Would be great if you could take a look and test that, too. Thanks and best regards, Martin > -----Original Message----- > From: Boris Ulasevich > Sent: Freitag, 26. Juli 2019 12:50 > To: Doerr, Martin > Cc: hotspot-runtime-dev at openjdk.java.net; serviceability- > dev at openjdk.java.net > Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access > event requests at runtime > > Hi Martin, > > Your change works Ok on arm32 with the minor correction. See the patch > attached. > > thanks, > Boris > > On 16.07.2019 16:31, Doerr, Martin wrote: > > Hi, > > > > the current implementation of FastJNIAccessors ignores the flag - > XX:+UseFastJNIAccessors when the JVMTI capability > "can_post_field_access" is enabled. > > This is an unnecessary restriction which makes field accesses > (GetField) from native code slower when a JVMTI agent is attached > which enables this capability. > > A better implementation would check at runtime if an agent actually wants > to receive field access events. > > > > Note that the bytecode interpreter already uses this better > implementation by checking if field access watch events were requested > (JvmtiExport::_field_access_count != 0). > > > > I have implemented such a runtime check on all platforms which currently > support FastJNIAccessors. 
> > > > My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a > micro benchmark: > > test- > support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa > stGetField/FastGetField.jtr > > shows the duration of 10000 iterations with and without > UseFastJNIAccessors (JVMTI agent gets attached in both runs). > > My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with > FastJNIAccessors and 11.2ms without it. > > > > Webrev: > > > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > > > > We have run the test on 64 bit x86 platforms, SPARC and aarch64. > > (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute > them later.) > > My webrev contains 32 bit implementations for x86 and arm, but > completely untested. It'd be great if somebody could volunteer to review > and test these platforms. > > > > Please review. > > > > Best regards, > > Martin > > From daniel.daugherty at oracle.com Fri Jul 26 14:03:40 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 26 Jul 2019 10:03:40 -0400 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: References: Message-ID: On 7/26/19 8:04 AM, Harold Seigel wrote: > Hi, > > Please review this small JDK-14 fix for an issue with constant pool > merging when redefining a class whose constant pool contains a > constant dynamic entry. The fix makes sure that the > has_dynamic_constant flag gets copied properly to the merged constant > pool. > > Open Webrev: > http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html

src/hotspot/share/prims/jvmtiRedefineClasses.cpp
    L1626:  if (old_cp->has_dynamic_constant()) {
    L1627:    merge_cp->set_has_dynamic_constant();
    L1628:    scratch_cp->set_has_dynamic_constant();
    L1629: }
        L1626-8 need be indented one more space.
        L1629 needs to be indented two more spaces.
test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineCondy.jasm
    No comments.

test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/TestRedefineCondy.java
    No comments.

At what Mach5 Tier does the new test execute?

Thumbs up. No need to see a new webrev if you fix the indents above.

Dan
> > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 > > The fix was regression tested by running Mach5 tiers 1 and 2 tests and > builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 > tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64. > > Thanks, Harold > From daniel.daugherty at oracle.com Fri Jul 26 14:08:26 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 26 Jul 2019 10:08:26 -0400 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: References: Message-ID: I forgot to mention that you should ping the Serviceability team for this since both the code and new test are in JVM/TI. I believe that Serguei is handling JVM/TI these days so I've added him on this email thread... Dan On 7/26/19 10:03 AM, Daniel D.
Daugherty wrote: > On 7/26/19 8:04 AM, Harold Seigel wrote: >> Hi, >> >> Please review this small JDK-14 fix for an issue with constant pool >> merging when redefining a class whose constant pool contains a >> constant dynamic entry. The fix makes sure that the >> has_dynamic_constant flag gets copied properly to the merged constant >> pool. >> >> Open Webrev: >> http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html > > src/hotspot/share/prims/jvmtiRedefineClasses.cpp > L1626:  if (old_cp->has_dynamic_constant()) { > L1627:    merge_cp->set_has_dynamic_constant(); > L1628:    scratch_cp->set_has_dynamic_constant(); > L1629: } > L1626-8 need be indented one more space. > L1629 needs to be indented two more spaces. > > test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineCondy.jasm > > No comments. > > test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/TestRedefineCondy.java > > No comments. > > At what Mach5 Tier does the new test execute? > > Thumbs up. No need to see a new webrev if you fix the indents above. > > Dan > > > >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 >> >> The fix was regression tested by running Mach5 tiers 1 and 2 tests >> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running >> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on >> Linux-x64. >> >> Thanks, Harold >> > > From boris.ulasevich at bell-sw.com Fri Jul 26 16:17:57 2019 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Fri, 26 Jul 2019 19:17:57 +0300 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: References: <6cdc76d5-dca1-d033-856b-d264a71f8f23@bell-sw.com> Message-ID: <17af6e76-ecff-9eac-4efd-dec14c956c74@bell-sw.com> Hi Martin, The webrev.02 change works good if we increase BUFFER_SIZE. Current change gives "BUFFER_SIZE too small" assertion.
I propose to change BUFFER_SIZE value to 120, it works Ok then. glad to help you, regards, Boris On 26.07.2019 16:59, Doerr, Martin wrote: > Hi Boris, > > thank you very much for testing. > > Unfortunately, arm 32 was also affected by the issue Erik has found for aarch64: > We need a little stronger memory barriers to support accessing volatile fields with correct ordering semantics. > > I've updated that in the current webrev already: > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.02/ > > I'm using membar(MacroAssembler::Membar_mask_bits(MacroAssembler::LoadLoad | MacroAssembler::LoadStore), Rtmp2), now. > I've already used a cross build to check that it compiles, but I haven't run it. > I believe this membar doesn't have a significant performance impact. > > Would be great if you could take a look and test that, too. > > Thanks and best regards, > Martin > > >> -----Original Message----- >> From: Boris Ulasevich >> Sent: Freitag, 26. Juli 2019 12:50 >> To: Doerr, Martin >> Cc: hotspot-runtime-dev at openjdk.java.net; serviceability- >> dev at openjdk.java.net >> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access >> event requests at runtime >> >> Hi Martin, >> >> Your change works Ok on arm32 with the minor correction. See the patch >> attached. >> >> thanks, >> Boris >> >> On 16.07.2019 16:31, Doerr, Martin wrote: >>> Hi, >>> >>> the current implementation of FastJNIAccessors ignores the flag - >> XX:+UseFastJNIAccessors when the JVMTI capability >> "can_post_field_access" is enabled. >>> This is an unnecessary restriction which makes field accesses >> (GetField) from native code slower when a JVMTI agent is attached >> which enables this capability. >>> A better implementation would check at runtime if an agent actually wants >> to receive field access events. 
>>> >>> Note that the bytecode interpreter already uses this better >> implementation by checking if field access watch events were requested >> (JvmtiExport::_field_access_count != 0). >>> >>> I have implemented such a runtime check on all platforms which currently >> support FastJNIAccessors. >>> >>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a >> micro benchmark: >>> test- >> support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa >> stGetField/FastGetField.jtr >>> shows the duration of 10000 iterations with and without >> UseFastJNIAccessors (JVMTI agent gets attached in both runs). >>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with >> FastJNIAccessors and 11.2ms without it. >>> >>> Webrev: >>> >> http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ >>> >>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. >>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute >> them later.) >>> My webrev contains 32 bit implementations for x86 and arm, but >> completely untested. It'd be great if somebody could volunteer to review >> and test these platforms. >>> >>> Please review. >>> >>> Best regards, >>> Martin >>> From aph at redhat.com Fri Jul 26 16:35:34 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 26 Jul 2019 17:35:34 +0100 Subject: [aarch64-port-dev ] RFR(S): 8228601: AArch64: Fix interpreter code at JVMCI deoptimization entry In-Reply-To: References: Message-ID: <13ad2be7-169b-0635-1edc-8936d9c4c103@redhat.com> On 7/26/19 4:45 AM, Pengfei Li (Arm Technology China) wrote: > Please help review this AArch64 bug fix. > JBS: https://bugs.openjdk.java.net/browse/JDK-8228601 > Webrev: http://cr.openjdk.java.net/~pli/rfr/8228601/webrev.00/ Great catch, well done! Patch looks OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From harold.seigel at oracle.com Fri Jul 26 17:30:30 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Fri, 26 Jul 2019 13:30:30 -0400 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: References: Message-ID: <82353232-9f7f-f495-3bc5-008ca74e2267@oracle.com> Hi Dan, Thanks for reviewing this and for cc-ing Serguei. Please see comments inline. On 7/26/2019 10:08 AM, Daniel D. Daugherty wrote: > I forgot to mention that you should ping the Serviceability team for > this since both the code and new test are in JVM/TI. I believe that > Serguei is handling JVM/TI these days so I've added him on this > email thread... > > Dan > > > On 7/26/19 10:03 AM, Daniel D. Daugherty wrote: >> On 7/26/19 8:04 AM, Harold Seigel wrote: >>> Hi, >>> >>> Please review this small JDK-14 fix for an issue with constant pool >>> merging when redefining a class whose constant pool contains a >>> constant dynamic entry. The fix makes sure that the >>> has_dynamic_constant flag gets copied properly to the merged >>> constant pool. >>> >>> Open Webrev: >>> http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html >> >> src/hotspot/share/prims/jvmtiRedefineClasses.cpp >> L1626:  if (old_cp->has_dynamic_constant()) { >> L1627:    merge_cp->set_has_dynamic_constant(); >> L1628:    scratch_cp->set_has_dynamic_constant(); >> L1629: } >> L1626-8 need be indented one more space. >> L1629 needs to be indented two more spaces. Done. >> >> test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineCondy.jasm >> >> No comments. >> >> test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/TestRedefineCondy.java >> >> No comments. >> >> At what Mach5 Tier does the new test execute? The new test executes in both tier1 and tier3. Thanks, Harold >> >> Thumbs up. No need to see a new webrev if you fix the indents above.
>> >> Dan >> >> >> >>> >>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 >>> >>> The fix was regression tested by running Mach5 tiers 1 and 2 tests >>> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running >>> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on >>> Linux-x64. >>> >>> Thanks, Harold >>> >> >> > From coleen.phillimore at oracle.com Fri Jul 26 17:46:13 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 26 Jul 2019 13:46:13 -0400 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: <5c2c9617-08be-fa65-ce04-10c3a5c92952@oracle.com> References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> <8e6f25ef-0456-d1a0-e465-95b74533eb4d@oracle.com> <5c2c9617-08be-fa65-ce04-10c3a5c92952@oracle.com> Message-ID: Okay, trying again. I moved SharedArchiveFile to cds and moved appcds under cds. Now the appcds tests can be run with dynamic archiving, and all tests are run with jtreg:hotspot_cds, and they can share files from SharedArchiveFile. open webrev at http://cr.openjdk.java.net/~coleenp/2019/8202339.03/webrev Retested with tier2. Thanks, Coleen On 7/24/19 10:43 PM, coleen.phillimore at oracle.com wrote: > > I'm going to withdraw this change. It seems that the > SharedArchiveFile tests use different utility classes than the appcds > tests, so they fail with dynamic archiving. It looks like > consolidating these is a lot more work! > > Coleen > > On 7/24/19 7:23 PM, Calvin Cheung wrote: >> Hi Coleen, >> >> Thanks for doing this tests consolidation. >> >> In TEST.groups, for the hotspot_appcds_dynamic test group (starting >> from line 313), I think we need to exclude the tests which used to >> reside under the SharedArchiveFile dir as well.
>> >> You can use mach5 to test it by supplying the following args: >> >> --test hotspot_appcds_dynamic --jvm-args >> "Dtest.dynamic.cds.archive=true" >> >> or if you'd like to run it locally using jtreg, add the following args: >> >> -vmoptions:-Dtest.dynamic.cds.archive=true >> /open/test/hotspot/jtreg:hotspot_appcds_dynamic >> >> (where is the full path to the top dir of your repo) >> >> thanks, >> >> Calvin >> >> On 7/24/19 10:38 AM, coleen.phillimore at oracle.com wrote: >>> Summary: moved appcds and SharedArchive files to >>> test/hotspot/runtime/cds >>> >>> This is 99% tedious. I moved the files with hg move and fixed the >>> directory references in the tests to 'cds'. Tested with mach5 >>> hs-tier1,2,3. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8202339 >>> >>> Thanks, >>> Coleen > From mikhailo.seledtsov at oracle.com Fri Jul 26 18:01:59 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Fri, 26 Jul 2019 11:01:59 -0700 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> <8e6f25ef-0456-d1a0-e465-95b74533eb4d@oracle.com> <5c2c9617-08be-fa65-ce04-10c3a5c92952@oracle.com> Message-ID: <281a21fd-5157-ce1c-57a0-e6355fcf22cf@oracle.com> Looks good to me, Misha On 7/26/19 10:46 AM, coleen.phillimore at oracle.com wrote: > > Okay, trying again. I moved SharedArchiveFile to cds and moved appcds > under cds. Now the appcds tests can be run with dynamic archiving, > and all tests are run with jtreg:hotspot_cds, and they can share files > from SharedArchiveFile. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8202339.03/webrev > > Retested with tier2. > > Thanks, > Coleen > > On 7/24/19 10:43 PM, coleen.phillimore at oracle.com wrote: >> >> I'm going to withdraw this change.
It seems that the >> SharedArchiveFile tests use different utility classes than the appcds >> tests, so they fail with dynamic archiving. It looks like >> consolidating these is a lot more work! >> >> Coleen >> >> On 7/24/19 7:23 PM, Calvin Cheung wrote: >>> Hi Coleen, >>> >>> Thanks for doing this tests consolidation. >>> >>> In TEST.groups, for the hotspot_appcds_dynamic test group (starting >>> from line 313), I think we need to exclude the tests which used to >>> reside under the SharedArchiveFile dir as well. >>> >>> You can use mach5 to test it by supplying the following args: >>> >>> --test hotspot_appcds_dynamic --jvm-args >>> "Dtest.dynamic.cds.archive=true" >>> >>> or if you'd like to run it locally using jtreg, add the following args: >>> >>> -vmoptions:-Dtest.dynamic.cds.archive=true >>> /open/test/hotspot/jtreg:hotspot_appcds_dynamic >>> >>> (where is the full path to the top dir of your repo) >>> >>> thanks, >>> >>> Calvin >>> >>> On 7/24/19 10:38 AM, coleen.phillimore at oracle.com wrote: >>>> Summary: moved appcds and SharedArchive files to >>>> test/hotspot/runtime/cds >>>> >>>> This is 99% tedious. I moved the files with hg move and fixed the >>>> directory references in the tests to 'cds'. Tested with mach5 >>>> hs-tier1,2,3.
>>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8202339 >>>> >>>> Thanks, >>>> Coleen >> > From coleen.phillimore at oracle.com Fri Jul 26 18:04:34 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 26 Jul 2019 14:04:34 -0400 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: <281a21fd-5157-ce1c-57a0-e6355fcf22cf@oracle.com> References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> <8e6f25ef-0456-d1a0-e465-95b74533eb4d@oracle.com> <5c2c9617-08be-fa65-ce04-10c3a5c92952@oracle.com> <281a21fd-5157-ce1c-57a0-e6355fcf22cf@oracle.com> Message-ID: <15ace5c0-be90-1451-d098-d48d4f91d84f@oracle.com> Thanks Misha for the help with this change! Coleen On 7/26/19 2:01 PM, mikhailo.seledtsov at oracle.com wrote: > Looks good to me, > > Misha > > On 7/26/19 10:46 AM, coleen.phillimore at oracle.com wrote: >> >> Okay, trying again. I moved SharedArchiveFile to cds and moved >> appcds under cds. Now the appcds tests can be run with dynamic >> archiving, and all tests are run with jtreg:hotspot_cds, and they can >> share files from SharedArchiveFile. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8202339.03/webrev >> >> Retested with tier2. >> >> Thanks, >> Coleen >> >> On 7/24/19 10:43 PM, coleen.phillimore at oracle.com wrote: >>> >>> I'm going to withdraw this change. It seems that the >>> SharedArchiveFile tests use different utility classes than the >>> appcds tests, so they fail with dynamic archiving. It looks like >>> consolidating these is a lot more work! >>> >>> Coleen >>> >>> On 7/24/19 7:23 PM, Calvin Cheung wrote: >>>> Hi Coleen, >>>> >>>> Thanks for doing this tests consolidation.
>>>> >>>> In TEST.groups, for the hotspot_appcds_dynamic test group (starting >>>> from line 313), I think we need to exclude the tests which used to >>>> reside under the SharedArchiveFile dir as well. >>>> >>>> You can use mach5 to test it by supplying the following args: >>>> >>>> --test hotspot_appcds_dynamic --jvm-args >>>> "Dtest.dynamic.cds.archive=true" >>>> >>>> or if you'd like to run it locally using jtreg, add the following >>>> args: >>>> >>>> -vmoptions:-Dtest.dynamic.cds.archive=true >>>> /open/test/hotspot/jtreg:hotspot_appcds_dynamic >>>> >>>> (where is the full path to the top dir of your repo) >>>> >>>> thanks, >>>> >>>> Calvin >>>> >>>> On 7/24/19 10:38 AM, coleen.phillimore at oracle.com wrote: >>>>> Summary: moved appcds and SharedArchive files to >>>>> test/hotspot/runtime/cds >>>>> >>>>> This is 99% tedious. I moved the files with hg move and fixed the >>>>> directory references in the tests to 'cds'. Tested with mach5 >>>>> hs-tier1,2,3. >>>>> >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8202339 >>>>> >>>>> Thanks, >>>>> Coleen >>> >> From calvin.cheung at oracle.com Fri Jul 26 18:24:22 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Fri, 26 Jul 2019 11:24:22 -0700 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> <8e6f25ef-0456-d1a0-e465-95b74533eb4d@oracle.com> <5c2c9617-08be-fa65-ce04-10c3a5c92952@oracle.com> Message-ID: <432573bd-f4ec-daf6-fee5-a12b1580a7f3@oracle.com> Hi Coleen, This looks good to me. thanks, Calvin On 7/26/19 10:46 AM, coleen.phillimore at oracle.com wrote: > > Okay, trying again. I moved SharedArchiveFile to cds and moved appcds > under cds.
Now the appcds tests can be run with dynamic archiving, > and all tests are run with jtreg:hotspot_cds, and they can share files > from SharedArchiveFile. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8202339.03/webrev > > Retested with tier2. > > Thanks, > Coleen > > On 7/24/19 10:43 PM, coleen.phillimore at oracle.com wrote: >> >> I'm going to withdraw this change. It seems that the >> SharedArchiveFile tests use different utility classes than the appcds >> tests, so they fail with dynamic archiving. It looks like >> consolidating these is a lot more work! >> >> Coleen >> >> On 7/24/19 7:23 PM, Calvin Cheung wrote: >>> Hi Coleen, >>> >>> Thanks for doing this tests consolidation. >>> >>> In TEST.groups, for the hotspot_appcds_dynamic test group (starting >>> from line 313), I think we need to exclude the tests which used to >>> reside under the SharedArchiveFile dir as well. >>> >>> You can use mach5 to test it by supplying the following args: >>> >>> --test hotspot_appcds_dynamic --jvm-args >>> "Dtest.dynamic.cds.archive=true" >>> >>> or if you'd like to run it locally using jtreg, add the following args: >>> >>> -vmoptions:-Dtest.dynamic.cds.archive=true >>> /open/test/hotspot/jtreg:hotspot_appcds_dynamic >>> >>> (where is the full path to the top dir of your repo) >>> >>> thanks, >>> >>> Calvin >>> >>> On 7/24/19 10:38 AM, coleen.phillimore at oracle.com wrote: >>>> Summary: moved appcds and SharedArchive files to >>>> test/hotspot/runtime/cds >>>> >>>> This is 99% tedious. I moved the files with hg move and fixed the >>>> directory references in the tests to 'cds'. Tested with mach5 >>>> hs-tier1,2,3.
>>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8202339 >>>> >>>> Thanks, >>>> Coleen >> > From coleen.phillimore at oracle.com Fri Jul 26 18:33:23 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 26 Jul 2019 14:33:23 -0400 Subject: RFR (S) 8202339: [TESTBUG] Consolidate the tests in runtime/SharedArchiveFile and runtime/appcds In-Reply-To: <432573bd-f4ec-daf6-fee5-a12b1580a7f3@oracle.com> References: <0aea5937-718b-44b0-1f87-968574be2ce3@oracle.com> <8e6f25ef-0456-d1a0-e465-95b74533eb4d@oracle.com> <5c2c9617-08be-fa65-ce04-10c3a5c92952@oracle.com> <432573bd-f4ec-daf6-fee5-a12b1580a7f3@oracle.com> Message-ID: Thanks Calvin and also thanks for your help! Coleen On 7/26/19 2:24 PM, Calvin Cheung wrote: > Hi Coleen, > > This looks good to me. > > thanks, > > Calvin > > On 7/26/19 10:46 AM, coleen.phillimore at oracle.com wrote: >> >> Okay, trying again. I moved SharedArchiveFile to cds and moved >> appcds under cds. Now the appcds tests can be run with dynamic >> archiving, and all tests are run with jtreg:hotspot_cds, and they can >> share files from SharedArchiveFile. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8202339.03/webrev >> >> Retested with tier2. >> >> Thanks, >> Coleen >> >> On 7/24/19 10:43 PM, coleen.phillimore at oracle.com wrote: >>> >>> I'm going to withdraw this change. It seems that the >>> SharedArchiveFile tests use different utility classes than the >>> appcds tests, so they fail with dynamic archiving. It looks like >>> consolidating these is a lot more work! >>> >>> Coleen >>> >>> On 7/24/19 7:23 PM, Calvin Cheung wrote: >>>> Hi Coleen, >>>> >>>> Thanks for doing this tests consolidation.
>>>> >>>> In TEST.groups, for the hotspot_appcds_dynamic test group (starting >>>> from line 313), I think we need to exclude the tests which used to >>>> reside under the SharedArchiveFile dir as well. >>>> >>>> You can use mach5 to test it by supplying the following args: >>>> >>>> --test hotspot_appcds_dynamic --jvm-args >>>> "Dtest.dynamic.cds.archive=true" >>>> >>>> or if you'd like to run it locally using jtreg, add the following >>>> args: >>>> >>>> -vmoptions:-Dtest.dynamic.cds.archive=true >>>> /open/test/hotspot/jtreg:hotspot_appcds_dynamic >>>> >>>> (where is the full path to the top dir of your repo) >>>> >>>> thanks, >>>> >>>> Calvin >>>> >>>> On 7/24/19 10:38 AM, coleen.phillimore at oracle.com wrote: >>>>> Summary: moved appcds and SharedArchive files to >>>>> test/hotspot/runtime/cds >>>>> >>>>> This is 99% tedious. I moved the files with hg move and fixed the >>>>> directory references in the tests to 'cds'. Tested with mach5 >>>>> hs-tier1,2,3. >>>>> >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/2019/8202339.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8202339 >>>>> >>>>> Thanks, >>>>> Coleen >>> >> From serguei.spitsyn at oracle.com Fri Jul 26 18:44:33 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 26 Jul 2019 11:44:33 -0700 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: References: Message-ID: <30b26fc3-096c-7480-8dc0-209d8ec70767@oracle.com> Hi Harold, This looks good to me. Added the serviceability-dev mailing list. Thanks, Serguei On 7/26/19 05:04, Harold Seigel wrote: > Hi, > > Please review this small JDK-14 fix for an issue with constant pool > merging when redefining a class whose constant pool contains a > constant dynamic entry. The fix makes sure that the > has_dynamic_constant flag gets copied properly to the merged constant > pool.
> > Open Webrev: > http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 > > The fix was regression tested by running Mach5 tiers 1 and 2 tests and > builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 > tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64. > > Thanks, Harold > From patricio.chilano.mateo at oracle.com Fri Jul 26 18:46:40 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Fri, 26 Jul 2019 14:46:40 -0400 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" Message-ID: Hi all, Could you review this small fix for test TestAbortVMOnSafepointTimeout.java? The test has been failing intermittently since 8191890. As explained in the bug comments, it turns out that a bias revocation handshake could happen in between the start of the "for" loop without safepoint polls and the safepoint where we want to timeout. That allows for the long loop to actually finish and prevents the desired timeout in the later safepoint. The simple solution is to just avoid using biased locking in this test (and therefore prevent the revocation handshake), since we just want to test the correct behavior of flag AbortVMOnSafepointTimeout. Webrev: http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 Thanks! 
Patricio From calvin.cheung at oracle.com Fri Jul 26 18:59:29 2019 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Fri, 26 Jul 2019 11:59:29 -0700 Subject: RFR (M) 8191521: handle long relative path specified in -Xbootclasspath/a on windows In-Reply-To: References: <2021b295-463c-6ea1-97b5-3512cacc28c1@oracle.com> <48005d2b-61b8-9027-db56-e3713c5a90bf@oracle.com> Message-ID: <72c67301-b8a9-cab7-4426-40837ca5dae8@oracle.com> Hi Ralf, On 7/26/19 5:20 AM, Schmelter, Ralf wrote: > Hi Calvin, > > I've updated the webrev with your suggestions. The tests now disable the > UNC path portion, if you don't have share $ for : > So all tests should now run on your machine. The update looks good and the gtest passed. I saw the following in the test log: [----------] 2 tests from os_windows [ RUN      ] os_windows.reserve_memory_special_test_vm [       OK ] os_windows.reserve_memory_special_test_vm (0 ms) [ RUN      ] os_windows.handle_long_paths_test_vm Disabled UNC path test, since T: is not mapped as share T$. [       OK ] os_windows.handle_long_paths_test_vm (13835 ms) [----------] 2 tests from os_windows (13835 ms total) thanks, Calvin > > http://cr.openjdk.java.net/~rschmelter/webrevs/8191521/webrev.1/ > > Best regards, > Ralf From harold.seigel at oracle.com Fri Jul 26 19:00:21 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Fri, 26 Jul 2019 15:00:21 -0400 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: <30b26fc3-096c-7480-8dc0-209d8ec70767@oracle.com> References: <30b26fc3-096c-7480-8dc0-209d8ec70767@oracle.com> Message-ID: <24a257d7-043c-6236-0c78-03c4249caa55@oracle.com> Thanks Serguei! Harold On 7/26/2019 2:44 PM, serguei.spitsyn at oracle.com wrote: > Hi Harold, > > This looks good to me. > Added the serviceability-dev mailing list.
> > Thanks, > Serguei > > > On 7/26/19 05:04, Harold Seigel wrote: >> Hi, >> >> Please review this small JDK-14 fix for an issue with constant pool >> merging when redefining a class whose constant pool contains a >> constant dynamic entry. The fix makes sure that the >> has_dynamic_constant flag gets copied properly to the merged constant >> pool. >> >> Open Webrev: >> http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 >> >> The fix was regression tested by running Mach5 tiers 1 and 2 tests >> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running >> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on >> Linux-x64. >> >> Thanks, Harold >> > From daniel.daugherty at oracle.com Fri Jul 26 19:19:04 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 26 Jul 2019 15:19:04 -0400 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: References: Message-ID: On 7/26/19 2:46 PM, Patricio Chilano wrote: > Hi all, > > Could you review this small fix for test > TestAbortVMOnSafepointTimeout.java? > > The test has been failing intermittently since 8191890. As explained > in the bug comments, it turns out that a bias revocation handshake > could happen in between the start of the "for" loop without safepoint > polls and the safepoint where we want to timeout. That allows for the > long loop to actually finish and prevents the desired timeout in the > later safepoint. The simple solution is to just avoid using biased > locking in this test (and therefore prevent the revocation handshake), > since we just want to test the correct behavior of flag > AbortVMOnSafepointTimeout. > > Webrev: http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev The change itself is trivial. However, the reasons behind the change aren't.
This part of the description caught my eye: the start of the "for" loop without safepoint polls and my brain did a "Say what?!?!" Of course, that was without looking at the test which has a huge number of options, including these: L70: "-XX:-UseCountedLoopSafepoints", L71: "-XX:LoopStripMiningIter=0", L72: "-XX:LoopUnrollLimit=0", Okay, now the world makes much more sense. We are intentionally telling the compiler to not emit safepoint polls in the counted loop and we're turning off other loop optimizations. Basically, we're telling the compiler we want to stall in that loop until we exceed the safepoint timeout limit. Got it... So the new biased locking handshake messes with the timeout that this test is trying to achieve. Disabling biased locking makes the test more robust by allowing the safepoint sync timeout to happen. A couple of minor suggestions: test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java L30: * @bug 8219584 You should add an @bug for this bug (8227528). I don't know if you can put more than one bug ID on an @bug line or if you need a separate @bug line. L61: ProcessBuilder pb = ProcessTools.createJavaProcessBuilder( Please add a comment above this line: // -XX:-UseBiasedLocking - is used to prevent biased locking // handshakes from changing the timing of this test. Thumbs up. I don't need to see another webrev if you choose to make the above changes. Dan > Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 > > Thanks!
> Patricio From patricio.chilano.mateo at oracle.com Fri Jul 26 19:46:37 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Fri, 26 Jul 2019 15:46:37 -0400 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: References: Message-ID: <7c8fabc2-6c9c-ead8-518d-c894a702b4ab@oracle.com> Hi Dan, On 7/26/19 3:19 PM, Daniel D. Daugherty wrote: > On 7/26/19 2:46 PM, Patricio Chilano wrote: >> Hi all, >> >> Could you review this small fix for test >> TestAbortVMOnSafepointTimeout.java? >> >> The test has been failing intermittently since 8191890. As explained >> in the bug comments, it turns out that a bias revocation handshake >> could happen in between the start of the "for" loop without safepoint >> polls and the safepoint where we want to timeout. That allows for the >> long loop to actually finish and prevents the desired timeout in the >> later safepoint. The simple solution is to just avoid using biased >> locking in this test (and therefore prevent the revocation >> handshake), since we just want to test the correct behavior of flag >> AbortVMOnSafepointTimeout. >> >> Webrev: http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev > > The change itself is trivial. However, the reasons behind the change > aren't. > > This part of the description caught my eye: > > the start of the "for" loop without safepoint polls > > and my brain did a "Say what?!?!" Of course, that was without looking at > the test which has a huge number of options, including these: > > L70: "-XX:-UseCountedLoopSafepoints", > L71: "-XX:LoopStripMiningIter=0", > L72: "-XX:LoopUnrollLimit=0", > :-D > Okay, now the world makes much more sense. We are intentionally telling > the compiler to not emit safepoint polls in the counted loop and we're > turning off other loop optimizations.
Basically, we're telling the > compiler we want to stall in that loop until we exceed the safepoint > timeout limit. Got it... > > So the new biased locking handshake messes with the timeout that this > test is trying to achieve. Disabling biased locking makes the test more > robust by allowing the safepoint sync timeout to happen. > > A couple of minor suggestions: > > test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java > L30: * @bug 8219584 > > You should add an @bug for this bug (8227528). I don't know if > you can put more than one bug ID on an @bug line or if you need > a separate @bug line. Done! I added the bug number in the same line since that's the style I see in other tests. > L61: ProcessBuilder pb = ProcessTools.createJavaProcessBuilder( > Please add a comment above this line: > > // -XX:-UseBiasedLocking - is used to prevent biased locking > // handshakes from changing the timing of this test. Done! > Thumbs up. I don't need to see another webrev if you choose to make > the above changes. Great! Thanks for reviewing this Dan! : ) Patricio > Dan > > >> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 >> >> Thanks! >> Patricio > From dean.long at oracle.com Fri Jul 26 21:44:40 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 26 Jul 2019 14:44:40 -0700 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: References: Message-ID: <6b96b8db-3f2d-1000-f685-16b746905eb0@oracle.com> I see a fix for a specific problem, but I don't see anything preventing similar problems (a change in ConstantPool that isn't reflected in VM_RedefineClasses::merge_cp_and_rewrite) from happening again. I understand that merge_cp_and_rewrite needs to have intimate knowledge of CP internals, but maybe some refactoring could reduce the future change "risk surface".
dl On 7/26/19 5:04 AM, Harold Seigel wrote: > Hi, > > Please review this small JDK-14 fix for an issue with constant pool > merging when redefining a class whose constant pool contains a > constant dynamic entry. The fix makes sure that the > has_dynamic_constant flag gets copied properly to the merged constant > pool. > > Open Webrev: > http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 > > The fix was regression tested by running Mach5 tiers 1 and 2 tests and > builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 > tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64. > > Thanks, Harold > From mikhailo.seledtsov at oracle.com Fri Jul 26 22:08:32 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Fri, 26 Jul 2019 15:08:32 -0700 Subject: RFR(S): 8226779: [TESTBUG] Test JFR API from Java agent Message-ID: Please review this new test. It is testing interaction between JFR and Java Agent, creating recordings from within Java agent, both premain() and agentmain(). This is also a bit of a stress test since it tests large number of recordings from multiple threads, from the agent, hence I placed it under jfr/stress/javaagent. JBS: https://bugs.openjdk.java.net/browse/JDK-8226779 Webrev: http://cr.openjdk.java.net/~mseledtsov/8226779.00/ Testing: 1. Running the test itself multiple times on multiple platforms - in progress (Passed on Mac so far) Thank you, Misha From david.holmes at oracle.com Fri Jul 26 22:27:51 2019 From: david.holmes at oracle.com (David Holmes) Date: Sat, 27 Jul 2019 08:27:51 +1000 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: References: Message-ID: On 27/07/2019 5:19 am, Daniel D.
Daugherty wrote: > On 7/26/19 2:46 PM, Patricio Chilano wrote: >> Hi all, >> >> Could you review this small fix for test >> TestAbortVMOnSafepointTimeout.java? >> >> The test has been failing intermittently since 8191890. As explained >> in the bug comments, it turns out that a bias revocation handshake >> could happen in between the start of the "for" loop without safepoint >> polls and the safepoint where we want to timeout. That allows for the >> long loop to actually finish and prevents the desired timeout in the >> later safepoint. The simple solution is to just avoid using biased >> locking in this test (and therefore prevent the revocation handshake), >> since we just want to test the correct behavior of flag >> AbortVMOnSafepointTimeout. >> >> Webrev: http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev > > The change itself is trivial. However, the reasons behind the change > aren't. > > This part of the description caught my eye: > > the start of the "for" loop without safepoint polls > > and my brain did a "Say what?!?!" Of course, that was without looking at > the test which has a huge number of options, including these: > > L70: "-XX:-UseCountedLoopSafepoints", > L71: "-XX:LoopStripMiningIter=0", > L72: "-XX:LoopUnrollLimit=0", > > Okay, now the world makes much more sense. We are intentionally telling > the compiler to not emit safepoint polls in the counted loop and we're > turning off other loop optimizations. Basically, we're telling the > compiler we want to stall in that loop until we exceed the safepoint > timeout limit. Got it... > > So the new biased locking handshake messes with the timeout that this > test is trying to achieve. Disabling biased locking makes the test more > robust by allowing the safepoint sync timeout to happen. > > A couple of minor suggestions: > > test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java > L30: * @bug 8219584 > >
You should add an @bug for this bug (8227528). I don't know if > you can put more than one bug ID on an @bug line or if you need > a separate @bug line. > > L61: ProcessBuilder pb = > ProcessTools.createJavaProcessBuilder( > Please add a comment above this line: > > // -XX:-UseBiasedLocking - is used to prevent biased locking > // handshakes from changing the timing of this test. > > Thumbs up. I don't need to see another webrev if you choose to make > the above changes. I think some additional commentary on the other exotic options to ensure the loop contains no safepoints and is not unrolled etc would also be worthwhile. Change itself makes sense. Thanks, David > > Dan > > >> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 >> >> Thanks! >> Patricio > From mark.reinhold at oracle.com Fri Jul 26 22:58:57 2019 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Fri, 26 Jul 2019 15:58:57 -0700 (PDT) Subject: New candidate JEP: 358: Helpful NullPointerExceptions Message-ID: <20190726225857.1D16E2B8265@eggemoggin.niobe.net> https://openjdk.java.net/jeps/358 - Mark From Pengfei.Li at arm.com Mon Jul 29 01:39:57 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Mon, 29 Jul 2019 01:39:57 +0000 Subject: [aarch64-port-dev ] RFR(S): 8228601: AArch64: Fix interpreter code at JVMCI deoptimization entry In-Reply-To: <13ad2be7-169b-0635-1edc-8936d9c4c103@redhat.com> References: <13ad2be7-169b-0635-1edc-8936d9c4c103@redhat.com> Message-ID: Hi, > > Please help review this AArch64 bug fix. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8228601 > > Webrev: http://cr.openjdk.java.net/~pli/rfr/8228601/webrev.00/ > > Great catch, well done! Patch looks OK. Thanks for review. I've a question that: should we backport this fix to jdk13 and/or jdk11u? I see this mistake was introduced in aarch64 code long time ago.
-- Thanks, Pengfei From ningsheng.jian at arm.com Mon Jul 29 01:48:10 2019 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 29 Jul 2019 09:48:10 +0800 Subject: [aarch64-port-dev ] RFR(S): 8228601: AArch64: Fix interpreter code at JVMCI deoptimization entry In-Reply-To: References: <13ad2be7-169b-0635-1edc-8936d9c4c103@redhat.com> Message-ID: <91155406-5850-1c68-2e7f-1fbc859e18a3@arm.com> On 7/29/19 9:39 AM, Pengfei Li (Arm Technology China) wrote: > Hi, > >>> Please help review this AArch64 bug fix. >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8228601 >>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8228601/webrev.00/ >> >> Great catch, well done! Patch looks OK. > > Thanks for review. I've a question that: should we backport this fix to jdk13 and/or jdk11u? I see this mistake was introduced in aarch64 code long time ago. > I think you need a jdk13-fix-request label [1], if this is OK to jdk13. [1] http://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process Thanks, Ningsheng From ningsheng.jian at arm.com Mon Jul 29 02:19:22 2019 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 29 Jul 2019 10:19:22 +0800 Subject: [aarch64-port-dev ] RFR(S): 8228601: AArch64: Fix interpreter code at JVMCI deoptimization entry In-Reply-To: <91155406-5850-1c68-2e7f-1fbc859e18a3@arm.com> References: <13ad2be7-169b-0635-1edc-8936d9c4c103@redhat.com> <91155406-5850-1c68-2e7f-1fbc859e18a3@arm.com> Message-ID: On 7/29/19 9:48 AM, Ningsheng Jian wrote: > On 7/29/19 9:39 AM, Pengfei Li (Arm Technology China) wrote: >> Hi, >> >>>> Please help review this AArch64 bug fix. >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8228601 >>>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8228601/webrev.00/ >>> >>> Great catch, well done! Patch looks OK. >> >> Thanks for review. I've a question that: should we backport this fix to jdk13 and/or jdk11u? I see this mistake was introduced in aarch64 code long time ago. 
>> > > I think you need a jdk13-fix-request label [1], if this is OK to jdk13. > > [1] http://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process The correct link: http://openjdk.java.net/jeps/3#Fix-Request-Process Thanks, Ningsheng From david.holmes at oracle.com Mon Jul 29 07:53:09 2019 From: david.holmes at oracle.com (David Holmes) Date: Mon, 29 Jul 2019 17:53:09 +1000 Subject: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) In-Reply-To: <76BCC96D-DB5D-409A-95D5-3A64B893832D@oracle.com> References: <4C4212D0-BFFF-4C85-ACC6-05200F220C3F@oracle.com> <2d6dede1-aa79-99ce-a823-773fa2e19827@oracle.com> <6E7B043A-4647-4931-977C-1854CA7EBEC1@oracle.com> <76BCC96D-DB5D-409A-95D5-3A64B893832D@oracle.com> Message-ID: <7e0ba39e-e5b7-f56b-66ea-820a0a35ec2c@oracle.com> Hi Daniil, Overall I think this is a reasonable approach but I would still like to see some performance and footprint numbers, both to verify it fixes the problem reported, and that we are not getting penalized elsewhere. On 25/07/2019 3:21 am, Daniil Titov wrote: > Hi David, Daniel, and Serguei, > > Please review the new version of the fix, that makes the thread table initialization on demand and > moves it inside ThreadsList::find_JavaThread_from_java_tid(). At the creation time the thread table > is initialized with the threads from the current thread list. We don't want to hold Threads_lock > inside find_JavaThread_from_java_tid(), thus new threads still could be created while the thread > table is being initialized . Such threads will be found by the linear search and added to the thread table > later, in ThreadsList::find_JavaThread_from_java_tid(). The initialization allows the created but unpopulated, or partially populated, table to be seen by other threads - is that your intention? 
It seems it should be okay as the other threads will then race with the initializing thread to add specific entries, and this is a concurrent map so that should be functionally correct. But if so then I think you can also reduce the scope of the ThreadTableCreate_lock so that it covers creation of the table only, not the initial population of the table. I like the approach of only initializing the table when needed and using that to control when the add/remove-thread code needs to update the table. But I would still want to see what impact this has on thread startup cost, both with and without the table being initialized. > The change also includes additional optimization for some callers of find_JavaThread_from_java_tid() > as Daniel suggested. Not sure it's best to combine these, but if they are limited to the changes in management.cpp only then that may be okay. It helps to be able to focus on the table related changes without being distracted by other optimizations. > That is correct that ResolvedMethodTable was used as a blueprint for the thread table, however, I tried > to strip it of the all functionality that is not required in the thread table case. The revised version seems better in that regard. But I still have a concern, see below. > We need to have the thread table resizable and allow it to grow as the number of threads increases to avoid > reserving excessive memory a-priori or deteriorating lookup times. The ServiceThread is responsible for > growing the thread table when required. Yes but why? Why can't this table be grown on demand by the thread that is doing the addition? For other tables we may have to delegate to the service thread because the current thread cannot perform the action, or it doesn't want to perform it at the time the need for the resize is detected (e.g. its detected at a safepoint and you want the resize to happen later outside the safepoint). It's not apparent to me that such restrictions apply here. 
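To make the alternative raised above concrete, here is a minimal sketch in which the thread that adds an entry also performs the resize once the load factor crosses a threshold, instead of deferring it to the ServiceThread. It is not the HotSpot ConcurrentHashTable: `TidTable`, `FakeThread`, the threshold of 2.0 and the single `std::mutex` are hypothetical choices made only to keep the example self-contained, and the global lock gives up the concurrent reads the real table provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a JavaThread*; not a HotSpot type.
struct FakeThread {
  int64_t tid;
};

// Hypothetical tid -> thread map. The thread that calls add() grows the
// table itself when the load factor passes kMaxLoadFactor, instead of
// handing the resize to a background (service) thread.
class TidTable {
 public:
  explicit TidTable(size_t initial_buckets = 8)
      : _buckets(initial_buckets == 0 ? 1 : initial_buckets) {}

  void add(int64_t tid, FakeThread* thread) {
    std::lock_guard<std::mutex> guard(_lock);
    bucket_for(_buckets, tid).push_back(Entry{tid, thread});
    ++_items;
    if (load_factor() > kMaxLoadFactor) {
      grow();  // the adding thread pays for the resize, on demand
    }
  }

  FakeThread* find(int64_t tid) {
    std::lock_guard<std::mutex> guard(_lock);
    for (const Entry& e : bucket_for(_buckets, tid)) {
      if (e.tid == tid) return e.thread;
    }
    return nullptr;
  }

  size_t bucket_count() {
    std::lock_guard<std::mutex> guard(_lock);
    return _buckets.size();
  }

 private:
  struct Entry {
    int64_t tid;
    FakeThread* thread;
  };
  using Buckets = std::vector<std::vector<Entry>>;
  static constexpr double kMaxLoadFactor = 2.0;

  static std::vector<Entry>& bucket_for(Buckets& buckets, int64_t tid) {
    return buckets[static_cast<uint64_t>(tid) % buckets.size()];
  }

  // Both operands are converted before dividing, so the ratio is a double.
  double load_factor() const {
    return static_cast<double>(_items) / static_cast<double>(_buckets.size());
  }

  // Double the bucket count and rehash every entry into the new array.
  void grow() {
    Buckets bigger(_buckets.size() * 2);
    for (const auto& bucket : _buckets) {
      for (const Entry& e : bucket) {
        bucket_for(bigger, e.tid).push_back(e);
      }
    }
    _buckets.swap(bigger);
  }

  std::mutex _lock;
  Buckets _buckets;
  size_t _items = 0;
};
```

Whether this trade-off is acceptable in HotSpot depends on whether the adding thread may block for the duration of a resize, which is exactly the question the review leaves open.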
> There is no ConcurrentHashTable available in Java 8 and for backporting this fix to Java 8 another implementation > of the hash table, probably originally suggested in the patch attached to the JBS issue, should be used. It will make > the backporting more complicated, however, adding a new Implementation of the hash table in Java 14 while it > already has ConcurrentHashTable doesn't seem reasonable for me. Ok. > Webrev: http://cr.openjdk.java.net/~dtitov/8185005/webrev.03 Some specific code comments: src/hotspot/share/runtime/mutexLocker.cpp + def(ThreadTableCreate_lock , PaddedMutex , special, false, Monitor::_safepoint_check_never); I think this needs to be a _safepoint_check_always lock. The table will be created by regular JavaThreads and they should (nearly) always be checking for safepoints if they are going to block acquiring the lock. And it isn't at all obvious that the thread doing the creation can't go to a safepoint whilst this lock is held. --- src/hotspot/share/runtime/threadSMR.cpp Nit: 618 JavaThread* thread = thread_at(i); you could reuse the new java_thread local you introduced at line 613 and just rename that "new" variable to "thread" so you don't have to change all other uses. 628 } else if (java_thread != NULL && ... You don't need to check != NULL here as you only get here when java_thread is not NULL. 755 jlong tid = SharedRuntime::get_java_tid(thread); 926 jlong tid = SharedRuntime::get_java_tid(thread); I think it is cleaner/better to just use jlong tid = java_lang_Thread::thread_id(thread->threadObj()); as we know thread is not NULL, it is a JavaThread and it has to have a non-null threadObj. --- src/hotspot/share/services/management.cpp 1323 if (THREAD->is_Java_thread()) { 1324 JavaThread* current_thread = (JavaThread*)THREAD; These calls can only be made on a JavaThread so this can be simplified to remove the is_Java_thread() call. Similarly in other places.
--- src/hotspot/share/services/threadTable.cpp 55 class ThreadTableEntry : public CHeapObj { 56 private: 57 jlong _tid; I believe hotspot style is to not indent the access modifiers in C++ class declarations, so the above would just be: 55 class ThreadTableEntry : public CHeapObj { 56 private: 57 jlong _tid; etc. 60 ThreadTableEntry(jlong tid, JavaThread* java_thread) : 61 _tid(tid),_java_thread(java_thread) {} line 61 should be indented as it continues line 60. 67 class ThreadTableConfig : public AllStatic { ... 71 static uintx get_hash(Value const& value, bool* is_dead) { The is_dead parameter still bothers me here. I can't make enough sense out of the template code in ConcurrentHashtable to see why we have to have it, but I'm concerned that its very existence means we perhaps should not be trying to extend CHT in this context. ?? 115 size_t start_size_log = size_log > DefaultThreadTableSizeLog 116 ? size_log : DefaultThreadTableSizeLog; line 116 should be indented, though in this case I think a better layout would be: 115 size_t start_size_log = 116 size_log > DefaultThreadTableSizeLog ? size_log : DefaultThreadTableSizeLog; 131 double ThreadTable::get_load_factor() { 132 return (double)_items_count/_current_size; 133 } Not sure that is doing what you want/expect. It will perform integer division and then cast that whole integer to a double. If you want double arithmetic you need: return ((double)_items_count)/_current_size; 180 jlong _tid; 181 uintx _hash; Nit: no need for all those spaces before the variable name. 183 ThreadTableLookup(jlong tid) 184 : _tid(tid), _hash(primitive_hash(tid)) {} line 184 should be indented. 201 ThreadGet():_return(NULL) {} Nit: need space after : 211 assert(_is_initialized, "Thread table is not initialized"); 212 _has_work = false; line 211 is indented one space too far. 
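For reference, a standalone check of how C++ groups the cast in the `get_load_factor()` expression quoted above. The counter names and values are made up for illustration. Because a C-style cast has higher precedence than `/`, `(double)_items_count/_current_size` converts `_items_count` before dividing, so it already performs floating-point division; truncation happens only when the parenthesized integer quotient is cast.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter values standing in for _items_count and _current_size.
const size_t items_count = 3;
const size_t current_size = 4;

// The expression as written in the webrev: the cast binds more tightly
// than '/', so items_count becomes 3.0 before the division happens.
double as_written() { return (double)items_count / current_size; }

// Explicitly parenthesized version; it groups the same way as as_written().
double parenthesized() { return ((double)items_count) / current_size; }

// Casting the quotient instead: 3 / 4 is evaluated in integer arithmetic
// (giving 0) and only that result is converted to double.
double cast_after_divide() { return (double)(items_count / current_size); }
```

The explicitly parenthesized form is arguably still easier to read, since it does not rely on the reader recalling the precedence rule.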
229 ThreadTableEntry* entry = new ThreadTableEntry(tid,java_thread); Nit: need space after , 252 return _local_table->remove(thread,lookup); Nit: need space after , Thanks, David ------ > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > Thanks! > --Daniil > > > On 7/8/19, 3:24 PM, "Daniel D. Daugherty" wrote: > > On 6/29/19 12:06 PM, Daniil Titov wrote: > > Hi Serguei and David, > > > > Serguei is right, ThreadTable::find_thread(java_tid) cannot return a JavaThread with an unmatched java_tid. > > > > Please find a new version of the fix that includes the changes Serguei suggested. > > > > Regarding the concern about maintaining the thread table when it may never even be queried, one of > > the options could be to add a ThreadTable::isEnabled flag, set it to "false" by default, and wrap the calls to the thread table > > in ThreadsSMRSupport add_thread() and remove_thread() methods to check this flag. > > > > When ThreadsList::find_JavaThread_from_java_tid() is called for the first time it could check if ThreadTable::isEnabled > > is on and if not then set it on and populate the thread table with all existing threads from the thread list. > > I have the same concerns as David H. about this new ThreadTable. > ThreadsList::find_JavaThread_from_java_tid() is only called from code > in src/hotspot/share/services/management.cpp so I think that table > needs to be enabled and populated only if it is going to be used. > > I've taken a look at the webrev below and I see that David has > followed up with additional comments. Before I do a crawl through > code review for this, I would like to see the ThreadTable stuff > made optional and David's other comments addressed. > > Another possible optimization is for callers of > find_JavaThread_from_java_tid() to save the calling thread's > tid value before they loop and if the current tid == saved_tid > then use the current JavaThread* instead of calling > find_JavaThread_from_java_tid() to get the JavaThread*.
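Daniil's isEnabled proposal and Dan's request that the table be enabled and populated only if it is going to be used can be sketched as follows. The fallback scan mirrors the webrev.03 description later in this thread, where threads created during initialization are found by linear search and added afterwards. This is a simplified model under stated assumptions: `LazyThreadTable` and the snapshot vector are hypothetical, and a `std::mutex` plus `std::unordered_map` stand in for HotSpot's lock-free ConcurrentHashTable.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a JavaThread: only the Java-level id matters.
struct JavaThread {
  int64_t tid;
};

// On-demand tid -> JavaThread* table. Until the first lookup,
// add_thread()/remove_thread() do nothing, so programs that never query
// the table pay no maintenance cost.
class LazyThreadTable {
 public:
  void add_thread(JavaThread* t) {
    std::lock_guard<std::mutex> guard(_lock);
    if (_enabled) _table[t->tid] = t;
  }

  void remove_thread(JavaThread* t) {
    std::lock_guard<std::mutex> guard(_lock);
    if (_enabled) _table.erase(t->tid);
  }

  // 'threads' models the caller's ThreadsList snapshot.
  JavaThread* find(const std::vector<JavaThread*>& threads, int64_t tid) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_enabled) {
      // First query: enable the table and populate it from the snapshot.
      for (JavaThread* t : threads) _table[t->tid] = t;
      _enabled = true;
    }
    auto it = _table.find(tid);
    if (it != _table.end()) return it->second;
    // Miss: fall back to a linear scan and cache the hit. This catches
    // threads whose add_thread() ran before the table was enabled but
    // that were not in the snapshot used to populate it.
    for (JavaThread* t : threads) {
      if (t->tid == tid) {
        _table[tid] = t;
        return t;
      }
    }
    return nullptr;
  }

  bool enabled() {
    std::lock_guard<std::mutex> guard(_lock);
    return _enabled;
  }

 private:
  std::mutex _lock;
  bool _enabled = false;
  std::unordered_map<int64_t, JavaThread*> _table;
};
```

The trade-off is that the first query pays for populating the table, while every later query is a hash lookup.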
> > Dan > > > > > Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.02/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > > > Thanks! > > --Daniil > > > > From: > > Organization: Oracle Corporation > > Date: Friday, June 28, 2019 at 7:56 PM > > To: Daniil Titov , OpenJDK Serviceability , "hotspot-runtime-dev at openjdk.java.net" , "jmx-dev at openjdk.java.net" > > Subject: Re: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) > > > > Hi Daniil, > > > > I have several quick comments. > > > > The indent in the hotspot c/c++ files has to be 2, not 4. > > > > https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/src/hotspot/share/runtime/threadSMR.cpp.frames.html > > 614 JavaThread* ThreadsList::find_JavaThread_from_java_tid(jlong java_tid) const { > > 615 JavaThread* java_thread = ThreadTable::find_thread(java_tid); > > 616 if (java_thread == NULL && java_tid == PMIMORDIAL_JAVA_TID) { > > 617 // ThreadsSMRSupport::add_thread() is not called for the primordial > > 618 // thread. Thus, we find this thread with a linear search and add it > > 619 // to the thread table. > > 620 for (uint i = 0; i < length(); i++) { > > 621 JavaThread* thread = thread_at(i); > > 622 if (is_valid_java_thread(java_tid,thread)) { > > 623 ThreadTable::add_thread(java_tid, thread); > > 624 return thread; > > 625 } > > 626 } > > 627 } else if (java_thread != NULL && is_valid_java_thread(java_tid, java_thread)) { > > 628 return java_thread; > > 629 } > > 630 return NULL; > > 631 } > > 632 bool ThreadsList::is_valid_java_thread(jlong java_tid, JavaThread* java_thread) { > > 633 oop tobj = java_thread->threadObj(); > > 634 // Ignore the thread if it hasn't run yet, has exited > > 635 // or is starting to exit. 
> > 636 return (tobj != NULL && !java_thread->is_exiting() && > > 637 java_tid == java_lang_Thread::thread_id(tobj)); > > 638 } > > > > 615 JavaThread* java_thread = ThreadTable::find_thread(java_tid); > > > > I'd suggest renaming find_thread() to find_thread_by_tid(). > > > > A space is missing after the comma: > > 622 if (is_valid_java_thread(java_tid,thread)) { > > > > An empty line is needed before L632. > > > > The name 'is_valid_java_thread' looks wrong (or confusing) to me. > > Something like 'is_alive_java_thread_with_tid()' would be better. > > It'd be better to list parameters in the opposite order. > > > > The call to is_valid_java_thread() is confusing: > > 627 } else if (java_thread != NULL && is_valid_java_thread(java_tid, java_thread)) { > > > > Why would the call ThreadTable::find_thread(java_tid) return a JavaThread with an unmatched java_tid? > > > > > > Thanks, > > Serguei > > > > On 6/28/19, 9:40 PM, "David Holmes" wrote: > > > > Hi Daniil, > > > > The definition and use of this hashtable (yet another hashtable > > implementation!) will need careful examination. We have to be concerned > > about the cost of maintaining it when it may never even be queried. You > > would need to look at footprint cost and performance impact. > > > > Unfortunately I'm just about to board a plane and will be out for the > > next few days. I will try to look at this asap next week, but we will > > need a lot more data on it. > > > > Thanks, > > David > > > > On 6/28/19 3:31 PM, Daniil Titov wrote: > > Please review the change that improves performance of ThreadMXBean MXBean methods returning the > > information for specific threads. The change introduces the thread table that uses ConcurrentHashTable > > to store the one-to-one mapping between the thread ids and JavaThread objects and replaces the linear > > search over the thread list in ThreadsList::find_JavaThread_from_java_tid(jlong tid) method with the lookup > > in the thread table.
> > > > Testing: Mach5 tier1,tier2 and tier3 tests successfully passed. > > > > Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > > > Thanks! > > > > Best regards, > > Daniil > > > > > > > > > > > > > > > > > > From adinn at redhat.com Mon Jul 29 07:55:51 2019 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 29 Jul 2019 08:55:51 +0100 Subject: [aarch64-port-dev ] RFR(S): 8228601: AArch64: Fix interpreter code at JVMCI deoptimization entry In-Reply-To: References: <13ad2be7-169b-0635-1edc-8936d9c4c103@redhat.com> Message-ID: On 29/07/2019 02:39, Pengfei Li (Arm Technology China) wrote: > Thanks for review. I've a question that: should we backport this fix > to jdk13 and/or jdk11u? I see this mistake was introduced in aarch64 > code long time ago. Yes, I agree this needs to be backported to jdk13 and jdk11u. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From martin.doerr at sap.com Mon Jul 29 09:45:41 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 29 Jul 2019 09:45:41 +0000 Subject: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access event requests at runtime In-Reply-To: <17af6e76-ecff-9eac-4efd-dec14c956c74@bell-sw.com> References: <6cdc76d5-dca1-d033-856b-d264a71f8f23@bell-sw.com> <17af6e76-ecff-9eac-4efd-dec14c956c74@bell-sw.com> Message-ID: Hi Boris, thanks, I've updated the BUFFER_SIZE in place. Seems like all platform implementations have been reviewed. So I'll push this version if there are no objections. Thanks everyone for reviewing! Best regards, Martin > -----Original Message----- > From: Boris Ulasevich > Sent: Freitag, 26. 
Juli 2019 18:18 > To: Doerr, Martin > Cc: hotspot-runtime-dev at openjdk.java.net; serviceability- > dev at openjdk.java.net > Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field access > event requests at runtime > > Hi Martin, > > The webrev.02 change works well if we increase BUFFER_SIZE. The current > change gives a "BUFFER_SIZE too small" assertion. I propose to change > the BUFFER_SIZE value to 120; it works OK then. > > glad to help you, > regards, > Boris > > On 26.07.2019 16:59, Doerr, Martin wrote: > > Hi Boris, > > > > thank you very much for testing. > > > > Unfortunately, arm 32 was also affected by the issue Erik has found for > aarch64: > > We need slightly stronger memory barriers to support accessing volatile > fields with correct ordering semantics. > > > > I've updated that in the current webrev already: > > > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.02/ > > > > I'm using > membar(MacroAssembler::Membar_mask_bits(MacroAssembler::LoadLoad > | MacroAssembler::LoadStore), Rtmp2), now. > > I've already used a cross build to check that it compiles, but I haven't run it. > > I believe this membar doesn't have a significant performance impact. > > > > It would be great if you could take a look and test that, too. > > > > Thanks and best regards, > > Martin > > > > > >> -----Original Message----- > >> From: Boris Ulasevich > >> Sent: Freitag, 26. Juli 2019 12:50 > >> To: Doerr, Martin > >> Cc: hotspot-runtime-dev at openjdk.java.net; serviceability- > >> dev at openjdk.java.net > >> Subject: Re: RFR(M): 8227680: FastJNIAccessors: Check for JVMTI field > access > >> event requests at runtime > >> > >> Hi Martin, > >> > >> Your change works OK on arm32 with the minor correction. See the patch > >> attached.
> >> > >> thanks, > >> Boris > >> > >> On 16.07.2019 16:31, Doerr, Martin wrote: > >>> Hi, > >>> > >>> the current implementation of FastJNIAccessors ignores the flag - > >> XX:+UseFastJNIAccessors when the JVMTI capability > >> "can_post_field_access" is enabled. > >>> This is an unnecessary restriction which makes field accesses > >> (GetField) from native code slower when a JVMTI agent is > attached > >> which enables this capability. > >>> A better implementation would check at runtime if an agent actually > wants > >> to receive field access events. > >>> > >>> Note that the bytecode interpreter already uses this better > >> implementation by checking if field access watch events were requested > >> (JvmtiExport::_field_access_count != 0). > >>> > >>> I have implemented such a runtime check on all platforms which > currently > >> support FastJNIAccessors. > >>> > >>> My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a > >> micro benchmark: > >>> test- > >> > support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/Fa > >> stGetField/FastGetField.jtr > >>> shows the duration of 10000 iterations with and without > >> UseFastJNIAccessors (JVMTI agent gets attached in both runs). > >>> My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with > >> FastJNIAccessors and 11.2ms without it. > >>> > >>> Webrev: > >>> > >> > http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/ > >>> > >>> We have run the test on 64 bit x86 platforms, SPARC and aarch64. > >>> (FastJNIAccessors are not yet available on PPC64 and s390. I'll contribute > >> them later.) > >>> My webrev contains 32 bit implementations for x86 and arm, but > >> completely untested. It'd be great if somebody could volunteer to review > >> and test these platforms. > >>> > >>> Please review. 
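A C++ model of the runtime check proposed above. Rather than turning off UseFastJNIAccessors whenever can_post_field_access is enabled, the fast accessor consults a counter and falls back to the slow path only when a field-access watch is actually set, much as the interpreter tests JvmtiExport::_field_access_count. The names `field_access_count`, `fast_get_int_field` and `slow_get_int_field` are hypothetical. The real accessors are per-platform generated machine-code stubs that also have to deal with safepoints and memory ordering (the membar discussion above), none of which this sketch models.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical count of active JVMTI field-access watches.
std::atomic<int> field_access_count{0};

struct Object {
  int field;
};

// Stand-in for the slow JNI path; a real VM would post the JVMTI
// FieldAccess event here. This sketch only counts how often it runs.
int slow_path_calls = 0;

int slow_get_int_field(const Object* obj) {
  ++slow_path_calls;
  return obj->field;
}

// Fast accessor that stays enabled even when the capability is present
// and decides at call time whether an agent actually wants events.
int fast_get_int_field(const Object* obj) {
  if (field_access_count.load() != 0) {
    return slow_get_int_field(obj);
  }
  return obj->field;
}
```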
> >>> > >>> Best regards, > >>> Martin > >>> From harold.seigel at oracle.com Mon Jul 29 13:11:49 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Mon, 29 Jul 2019 09:11:49 -0400 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: <6b96b8db-3f2d-1000-f685-16b746905eb0@oracle.com> References: <6b96b8db-3f2d-1000-f685-16b746905eb0@oracle.com> Message-ID: Hi Dean, Thanks for pointing this out. There is an existing bug, JDK-8155673, to remove constant pool merging. This will prevent similar problems to this one because constant pools will no longer need to be merged. Thanks, Harold On 7/26/2019 5:44 PM, dean.long at oracle.com wrote: > I see a fix for a specific problem, but I don't see anything > preventing similar problems (a change in ConstantPool that isn't > reflected in VM_RedefineClasses::merge_cp_and_rewrite) from happening > again. I understand that merge_cp_and_rewrite needs to have intimate > knowledge of CP internals, but maybe some refactoring could reduce the > future change "risk surface". > > dl > > On 7/26/19 5:04 AM, Harold Seigel wrote: >> Hi, >> >> Please review this small JDK-14 fix for an issue with constant pool >> merging when redefining a class whose constant pool contains a >> constant dynamic entry. The fix makes sure that the >> has_dynamic_constant flag gets copied properly to the merged constant >> pool. >> >> Open Webrev: >> http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 >> >> The fix was regression tested by running Mach5 tiers 1 and 2 tests >> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running >> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on >> Linux-x64.
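Dean's concern is about per-pool state that the merge code must know about. The toy model below is not HotSpot's `ConstantPool` or `VM_RedefineClasses::merge_cp_and_rewrite`; it assumes, as a simplification, that the merged pool keeps the old pool's entries and appends the new ones. Under that assumption the merged pool can contain a dynamic constant even when the new class version has none, so the summary flag has to be derived from both inputs. `ConstantPoolModel` and `merge_pools` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy constant pool: a list of entry kinds plus a summary flag that must
// stay consistent with the entries.
struct ConstantPoolModel {
  std::vector<std::string> entries;  // e.g. "Utf8", "Dynamic"
  bool has_dynamic_constant = false;
};

// Toy merge: keep the old entries, append the new ones. A merge that
// copied only the entries would leave has_dynamic_constant false even
// though a "Dynamic" entry is present.
ConstantPoolModel merge_pools(const ConstantPoolModel& old_cp,
                              const ConstantPoolModel& new_cp) {
  ConstantPoolModel merged;
  merged.entries = old_cp.entries;
  merged.entries.insert(merged.entries.end(),
                        new_cp.entries.begin(), new_cp.entries.end());
  merged.has_dynamic_constant =
      old_cp.has_dynamic_constant || new_cp.has_dynamic_constant;
  return merged;
}
```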
>> >> Thanks, Harold >> > From martin.doerr at sap.com Mon Jul 29 14:13:44 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 29 Jul 2019 14:13:44 +0000 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: References: Message-ID: Hi Patricio, I have also already noticed this issue. Thank you for analyzing the root cause. Fix looks good to me. I don't need to see another webrev for comment improvements, either. I've linked the bug to JDK-8191890 and JDK-8219584. Best regards, Martin > -----Original Message----- > From: hotspot-runtime-dev bounces at openjdk.java.net> On Behalf Of David Holmes > Sent: Samstag, 27. Juli 2019 00:28 > To: daniel.daugherty at oracle.com; Patricio Chilano > ; hotspot-runtime- > dev at openjdk.java.net runtime > Subject: Re: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due > to "RuntimeException: 'Safepoint sync time longer than' missing from > stdout/stderr" > > On 27/07/2019 5:19 am, Daniel D. Daugherty wrote: > > On 7/26/19 2:46 PM, Patricio Chilano wrote: > >> Hi all, > >> > >> Could you review this small fix for test > >> TestAbortVMOnSafepointTimeout.java? > >> > >> The test has been failing intermittently since 8191890. As explained > >> in the bug comments, it turns out that a bias revocation handshake > >> could happen in between the start of the "for" loop without safepoint > >> polls and the safepoint where we want to timeout. That allows for the > >> long loop to actually finish and prevents the desired timeout in the > >> later safepoint. The simple solution is to just avoid using biased > >> locking in this test (and therefore prevent the revocation handshake), > >> since we just want to test the correct behavior of flag > >> AbortVMOnSafepointTimeout. > >> > >> Webrev: http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev > > > > The change itself is trivial. 
However, the reasons behind the change > > aren't. > > > > This part of the description caught my eye: > > > > ??? the start of the "for" loop without safepoint polls > > > > and my brain did a "Say what?!?!" Of course, that was without looking at > > the test which has a huge number of options, including these: > > > > ??? L70: ??????????????? "-XX:-UseCountedLoopSafepoints", > > ??? L71: ??????????????? "-XX:LoopStripMiningIter=0", > > ??? L72: ??????????????? "-XX:LoopUnrollLimit=0", > > > > Okay, now the world makes much more sense. We are intentionally telling > > the compiler to not emit safepoint polls in the counted loop and we're > > turning off other loop optimizations. Basically, we're telling the > > compiler we want to stall in that loop until we exceed the safepoint > > timeout limit. Got it... > > > > So the new biased locking handshake messes with the timeout that this > > test is trying to achieve. Disabling biased locking makes the test more > > robust by allowing the safepoint sync timeout to happen. > > > > A couple of minor suggestions: > > > > > test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.ja > va > > ??? L30:? * @bug 8219584 > > > > ? ????? You should add an @bug for this bug (8227528). I don't know if > > ??????? you can put more than one bug ID on an @bug line or if you need > > ??????? a separate @bug line. > > > > ??? L61: ??????? ProcessBuilder pb = > > ProcessTools.createJavaProcessBuilder( > > ??????? Please add a comment above this line: > > > > ??????????? // -XX:-UseBiasedLocking - is used to prevent biased locking > > ??????????? // handshakes from changing the timing of this test. > > > > Thumbs up. I don't need to see another webrev if you choose to make > > the above changes. > > I think some additional commentary on the other exotic options to ensure > the loop contains no safepoints and is not unrolled etc would also be > worthwhile. > > Change itself makes sense. 
> > Thanks, > David > > > > > Dan > > > > > >> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 > >> > >> Thanks! > >> Patricio > > From daniil.x.titov at oracle.com Mon Jul 29 15:37:25 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 29 Jul 2019 08:37:25 -0700 Subject: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) In-Reply-To: <65e52a11-21b3-58d2-a707-c7ecdc8bcff7@oracle.com> References: <4C4212D0-BFFF-4C85-ACC6-05200F220C3F@oracle.com> <2d6dede1-aa79-99ce-a823-773fa2e19827@oracle.com> <6E7B043A-4647-4931-977C-1854CA7EBEC1@oracle.com> <76BCC96D-DB5D-409A-95D5-3A64B893832D@oracle.com> <65e52a11-21b3-58d2-a707-c7ecdc8bcff7@oracle.com> Message-ID: Hi Serguei, Thank you for catching this! Regarding the testing, at this stage, I run all tier1-tier3 tests. I will check why this specific case was not included in the tests and will add it if required. Thanks, Daniil From: "serguei.spitsyn at oracle.com" Date: Monday, July 29, 2019 at 2:12 AM To: Daniil Titov , , OpenJDK Serviceability , "hotspot-runtime-dev at openjdk.java.net" , "jmx-dev at openjdk.java.net" , David Holmes Subject: Re: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) Hi Daniil, Probably, it'd make sense to re-iterate on this after you resolve David's comments. Now, just one comment though as it looks dangerous. http://cr.openjdk.java.net/~dtitov/8185005/webrev.03/src/hotspot/share/services/management.cpp.udiff.html } else { // reset contention statistics for a given thread + JavaThread* java_thread = NULL; + if (THREAD->is_Java_thread()) { + JavaThread* current_thread = (JavaThread*)THREAD; + if (tid == java_lang_Thread::thread_id(current_thread->threadObj())) { + java_thread = current_thread; + } + } + if (java_thread == NULL) {
JavaThread* java_thread = jtiwh.list()->find_JavaThread_from_java_tid(tid); + } if (java_thread == NULL) { return false; } Now, the definition of java_thread below: + if (java_thread == NULL) { JavaThread* java_thread = jtiwh.list()->find_JavaThread_from_java_tid(tid); + } becomes completely useless because the block in which it is defined ends right away. Here, shadowing the local variable 'java_thread' does not look like a good idea. Then how was it really tested? Thanks, Serguei On 7/24/19 10:21, Daniil Titov wrote: Hi David, Daniel, and Serguei, Please review the new version of the fix that makes the thread table initialization on demand and moves it inside ThreadsList::find_JavaThread_from_java_tid(). At creation time the thread table is initialized with the threads from the current thread list. We don't want to hold Threads_lock inside find_JavaThread_from_java_tid(), so new threads could still be created while the thread table is being initialized. Such threads will be found by the linear search and added to the thread table later, in ThreadsList::find_JavaThread_from_java_tid(). The change also includes additional optimization for some callers of find_JavaThread_from_java_tid() as Daniel suggested. It is correct that ResolvedMethodTable was used as a blueprint for the thread table; however, I tried to strip it of all the functionality that is not required in the thread table case. We need to have the thread table resizable and allow it to grow as the number of threads increases to avoid reserving excessive memory a priori or deteriorating lookup times. The ServiceThread is responsible for growing the thread table when required. There is no ConcurrentHashTable available in Java 8 and for backporting this fix to Java 8 another implementation of the hash table, probably originally suggested in the patch attached to the JBS issue, should be used.
It will make the backporting more complicated; however, adding a new implementation of the hash table in Java 14 while it already has ConcurrentHashTable doesn't seem reasonable to me. Webrev: http://cr.openjdk.java.net/~dtitov/8185005/webrev.03 Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 Thanks! --Daniil On 7/8/19, 3:24 PM, "Daniel D. Daugherty" wrote: On 6/29/19 12:06 PM, Daniil Titov wrote: > Hi Serguei and David, > > Serguei is right, ThreadTable::find_thread(java_tid) cannot return a JavaThread with an unmatched java_tid. > > Please find a new version of the fix that includes the changes Serguei suggested. > > Regarding the concern about maintaining the thread table when it may never even be queried, one of > the options could be to add a ThreadTable::isEnabled flag, set it to "false" by default, and wrap the calls to the thread table > in ThreadsSMRSupport add_thread() and remove_thread() methods to check this flag. > > When ThreadsList::find_JavaThread_from_java_tid() is called for the first time it could check if ThreadTable::isEnabled > is on and if not then set it on and populate the thread table with all existing threads from the thread list. I have the same concerns as David H. about this new ThreadTable. ThreadsList::find_JavaThread_from_java_tid() is only called from code in src/hotspot/share/services/management.cpp so I think that table needs to be enabled and populated only if it is going to be used. I've taken a look at the webrev below and I see that David has followed up with additional comments. Before I do a crawl through code review for this, I would like to see the ThreadTable stuff made optional and David's other comments addressed. Another possible optimization is for callers of find_JavaThread_from_java_tid() to save the calling thread's tid value before they loop and if the current tid == saved_tid
then use the current JavaThread* instead of calling find_JavaThread_from_java_tid() to get the JavaThread*. Dan > > Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.02/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > Thanks! > --Daniil > > From: > Organization: Oracle Corporation > Date: Friday, June 28, 2019 at 7:56 PM > To: Daniil Titov , OpenJDK Serviceability , "hotspot-runtime-dev at openjdk.java.net" , "jmx-dev at openjdk.java.net" > Subject: Re: RFR: 8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth) > > Hi Daniil, > > I have several quick comments. > > The indent in the hotspot c/c++ files has to be 2, not 4. > > https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/src/hotspot/share/runtime/threadSMR.cpp.frames.html > 614 JavaThread* ThreadsList::find_JavaThread_from_java_tid(jlong java_tid) const { > 615 JavaThread* java_thread = ThreadTable::find_thread(java_tid); > 616 if (java_thread == NULL && java_tid == PMIMORDIAL_JAVA_TID) { > 617 // ThreadsSMRSupport::add_thread() is not called for the primordial > 618 // thread. Thus, we find this thread with a linear search and add it > 619 // to the thread table. > 620 for (uint i = 0; i < length(); i++) { > 621 JavaThread* thread = thread_at(i); > 622 if (is_valid_java_thread(java_tid,thread)) { > 623 ThreadTable::add_thread(java_tid, thread); > 624 return thread; > 625 } > 626 } > 627 } else if (java_thread != NULL && is_valid_java_thread(java_tid, java_thread)) { > 628 return java_thread; > 629 } > 630 return NULL; > 631 } > 632 bool ThreadsList::is_valid_java_thread(jlong java_tid, JavaThread* java_thread) {
> 633 oop tobj = java_thread->threadObj(); > 634 // Ignore the thread if it hasn't run yet, has exited > 635 // or is starting to exit. > 636 return (tobj != NULL && !java_thread->is_exiting() && > 637 java_tid == java_lang_Thread::thread_id(tobj)); > 638 } > > 615 JavaThread* java_thread = ThreadTable::find_thread(java_tid); > > I'd suggest renaming find_thread() to find_thread_by_tid(). > > A space is missing after the comma: > 622 if (is_valid_java_thread(java_tid,thread)) { > > An empty line is needed before L632. > > The name 'is_valid_java_thread' looks wrong (or confusing) to me. > Something like 'is_alive_java_thread_with_tid()' would be better. > It'd be better to list parameters in the opposite order. > > The call to is_valid_java_thread() is confusing: > 627 } else if (java_thread != NULL && is_valid_java_thread(java_tid, java_thread)) { > > Why would the call ThreadTable::find_thread(java_tid) return a JavaThread with an unmatched java_tid? > > Thanks, > Serguei > > On 6/28/19, 9:40 PM, "David Holmes" wrote: > > Hi Daniil, > > The definition and use of this hashtable (yet another hashtable > implementation!) will need careful examination. We have to be concerned > about the cost of maintaining it when it may never even be queried. You > would need to look at footprint cost and performance impact. > > Unfortunately I'm just about to board a plane and will be out for the > next few days. I will try to look at this asap next week, but we will > need a lot more data on it. > > Thanks, > David > > On 6/28/19 3:31 PM, Daniil Titov wrote: > Please review the change that improves performance of ThreadMXBean MXBean methods returning the
> information for specific threads. The change introduces the thread table that uses ConcurrentHashTable > to store the one-to-one mapping between the thread ids and JavaThread objects and replaces the linear > search over the thread list in ThreadsList::find_JavaThread_from_java_tid(jlong tid) method with the lookup > in the thread table. > > Testing: Mach5 tier1,tier2 and tier3 tests successfully passed. > > Webrev: https://cr.openjdk.java.net/~dtitov/8185005/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8185005 > > Thanks! > > Best regards, > Daniil From patricio.chilano.mateo at oracle.com Mon Jul 29 15:45:31 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Mon, 29 Jul 2019 11:45:31 -0400 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: References: Message-ID: <4ea238f5-57c7-9763-1af6-0489cfd11a4d@oracle.com> Hi Martin, On 7/29/19 10:13 AM, Doerr, Martin wrote: > Hi Patricio, I have also already noticed this issue. Thank you for > analyzing the root cause. Fix looks good to me. I don't need to see > another webrev for comment improvements, either. I've linked the bug > to JDK-8191890 and JDK-8219584. Great. I'll just add more comments for the other flags as David requested. Thanks for reviewing this Martin! Patricio > Best regards, Martin >> -----Original Message----- >> From: hotspot-runtime-dev > bounces at openjdk.java.net> On Behalf Of David Holmes >> Sent: Samstag, 27. Juli 2019 00:28 >> To: daniel.daugherty at oracle.com; Patricio Chilano >> ; hotspot-runtime- >> dev at openjdk.java.net runtime >> Subject: Re: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due >> to "RuntimeException: 'Safepoint sync time longer than' missing from >> stdout/stderr" >> >> On 27/07/2019 5:19 am, Daniel D.
Daugherty wrote: >>> On 7/26/19 2:46 PM, Patricio Chilano wrote: >>>> Hi all, >>>> >>>> Could you review this small fix for test >>>> TestAbortVMOnSafepointTimeout.java? >>>> >>>> The test has been failing intermittently since 8191890. As explained >>>> in the bug comments, it turns out that a bias revocation handshake >>>> could happen in between the start of the "for" loop without safepoint >>>> polls and the safepoint where we want to timeout. That allows for the >>>> long loop to actually finish and prevents the desired timeout in the >>>> later safepoint. The simple solution is to just avoid using biased >>>> locking in this test (and therefore prevent the revocation handshake), >>>> since we just want to test the correct behavior of flag >>>> AbortVMOnSafepointTimeout. >>>> >>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev >>> The change itself is trivial. However, the reasons behind the change >>> aren't. >>> >>> This part of the description caught my eye: >>> >>> the start of the "for" loop without safepoint polls >>> >>> and my brain did a "Say what?!?!" Of course, that was without looking at >>> the test which has a huge number of options, including these: >>> >>> L70: "-XX:-UseCountedLoopSafepoints", >>> L71: "-XX:LoopStripMiningIter=0", >>> L72: "-XX:LoopUnrollLimit=0", >>> >>> Okay, now the world makes much more sense. We are intentionally telling >>> the compiler to not emit safepoint polls in the counted loop and we're >>> turning off other loop optimizations. Basically, we're telling the >>> compiler we want to stall in that loop until we exceed the safepoint >>> timeout limit. Got it... >>> >>> So the new biased locking handshake messes with the timeout that this >>> test is trying to achieve. Disabling biased locking makes the test more >>> robust by allowing the safepoint sync timeout to happen.
>>> >>> A couple of minor suggestions: >>> >>> >> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java >>> L30: * @bug 8219584 >>> >>> You should add an @bug for this bug (8227528). I don't know if >>> you can put more than one bug ID on an @bug line or if you need >>> a separate @bug line. >>> >>> L61: ProcessBuilder pb = ProcessTools.createJavaProcessBuilder( >>> Please add a comment above this line: >>> >>> // -XX:-UseBiasedLocking - is used to prevent biased locking >>> // handshakes from changing the timing of this test. >>> >>> Thumbs up. I don't need to see another webrev if you choose to make >>> the above changes. >> I think some additional commentary on the other exotic options to ensure >> the loop contains no safepoints and is not unrolled etc would also be >> worthwhile. >> >> Change itself makes sense. >> >> Thanks, >> David >> >>> Dan >>> >>> >>>> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 >>>> >>>> Thanks! >>>> Patricio From martin.doerr at sap.com Mon Jul 29 15:46:20 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 29 Jul 2019 15:46:20 +0000 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: References: Message-ID: Hi everybody, just an additional remark: The test failure indicates that guaranteed safepoints are kind of broken. Using VM Operations resets the timer even if they don't use a real safepoint. Should we document that somehow? Best regards, Martin > -----Original Message----- > From: hotspot-runtime-dev bounces at openjdk.java.net> On Behalf Of Doerr, Martin > Sent: Montag, 29.
Juli 2019 16:14 > To: David Holmes ; > daniel.daugherty at oracle.com; Patricio Chilano > ; hotspot-runtime- > dev at openjdk.java.net runtime > Subject: [CAUTION] RE: RFR 8227528: TestAbortVMOnSafepointTimeout.java > failed due to "RuntimeException: 'Safepoint sync time longer than' missing > from stdout/stderr" > > Hi Patricio, > > I have also already noticed this issue. Thank you for analyzing the root cause. > Fix looks good to me. I don't need to see another webrev for comment > improvements, either. > I've linked the bug to JDK-8191890 and JDK-8219584. > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-runtime-dev > bounces at openjdk.java.net> On Behalf Of David Holmes > > Sent: Samstag, 27. Juli 2019 00:28 > > To: daniel.daugherty at oracle.com; Patricio Chilano > > ; hotspot-runtime- > > dev at openjdk.java.net runtime dev at openjdk.java.net> > > Subject: Re: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed > due > > to "RuntimeException: 'Safepoint sync time longer than' missing from > > stdout/stderr" > > > > On 27/07/2019 5:19 am, Daniel D. Daugherty wrote: > > > On 7/26/19 2:46 PM, Patricio Chilano wrote: > > >> Hi all, > > >> > > >> Could you review this small fix for test > > >> TestAbortVMOnSafepointTimeout.java? > > >> > > >> The test has been failing intermittently since 8191890. As explained > > >> in the bug comments, it turns out that a bias revocation handshake > > >> could happen in between the start of the "for" loop without safepoint > > >> polls and the safepoint where we want to timeout. That allows for the > > >> long loop to actually finish and prevents the desired timeout in the > > >> later safepoint. The simple solution is to just avoid using biased > > >> locking in this test (and therefore prevent the revocation handshake), > > >> since we just want to test the correct behavior of flag > > >> AbortVMOnSafepointTimeout. 
> > >> > > >> Webrev: > http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev > > > > > > The change itself is trivial. However, the reasons behind the change > > > aren't. > > > > > > This part of the description caught my eye: > > > > > >     the start of the "for" loop without safepoint polls > > > > > > and my brain did a "Say what?!?!" Of course, that was without looking at > > > the test which has a huge number of options, including these: > > > > > >     L70: "-XX:-UseCountedLoopSafepoints", > > >     L71: "-XX:LoopStripMiningIter=0", > > >     L72: "-XX:LoopUnrollLimit=0", > > > > > > Okay, now the world makes much more sense. We are intentionally > telling > > > the compiler to not emit safepoint polls in the counted loop and we're > > > turning off other loop optimizations. Basically, we're telling the > > > compiler we want to stall in that loop until we exceed the safepoint > > > timeout limit. Got it... > > > > > > So the new biased locking handshake messes with the timeout that this > > > test is trying to achieve. Disabling biased locking makes the test more > > > robust by allowing the safepoint sync timeout to happen. > > > > > > A couple of minor suggestions: > > > > > > > > > test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java > > >     L30: * @bug 8219584 > > > > > >     You should add an @bug for this bug (8227528). I don't know if > > >     you can put more than one bug ID on an @bug line or if you need > > >     a separate @bug line. > > > > > >     L61: ProcessBuilder pb = > > > ProcessTools.createJavaProcessBuilder( > > >     Please add a comment above this line: > > > > > >         // -XX:-UseBiasedLocking - is used to prevent biased locking > > >         // handshakes from changing the timing of this test. > > > > > > Thumbs up. I don't need to see another webrev if you choose to make > > > the above changes.
> > > > I think some additional commentary on the other exotic options to ensure > > the loop contains no safepoints and is not unrolled etc would also be > > worthwhile. > > > > Change itself makes sense. > > > > Thanks, > > David > > > > > > > > Dan > > > > > > > > >> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 > > >> > > >> Thanks! > > >> Patricio > > > From martin.doerr at sap.com Mon Jul 29 17:43:14 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 29 Jul 2019 17:43:14 +0000 Subject: RFR(M): 8228743: [PPC64, s390] Implement FastJNIAccessors Message-ID: Hi, I'd like to contribute fast JNI Get*Field platform implementations for PPC64 and s390. Please review: http://cr.openjdk.java.net/~mdoerr/8228743_PPC64_s390_FastJNIAccessors/webrev.00/ Best regards, Martin From patricio.chilano.mateo at oracle.com Mon Jul 29 20:26:08 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Mon, 29 Jul 2019 16:26:08 -0400 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: References: Message-ID: <25f3dc03-53d8-40b6-e109-19944fe6f91f@oracle.com> On 7/29/19 11:46 AM, Doerr, Martin wrote: > Hi everybody, just an additional remark: The test failure indicates > that guaranteed safepoints are kind of broken. Using VM Operations > resets the timer even if they don't use a real safepoint. Should we > document that somehow? Just to add to that, the AbortVMOnSafepointTimeout flag now only aborts if, once the VMThread starts a safepoint, the time to actually reach it exceeds the SafepointTimeoutDelay limit. If the GuaranteedSafepointInterval flag is supposed to guarantee safepoints every X amount of time, which the name seems to suggest, then yes, it's broken.
As of now GuaranteedSafepointInterval is used as a parameter to know when to check for pending safepoints (needed for cleanup), but the actual safepoint could happen way after GuaranteedSafepointInterval time since the last one. So, possibly the flag name needs to be changed. Maybe somebody could give more context on the GuaranteedSafepointInterval flag. Patricio > Best regards, > Martin > > >> -----Original Message----- >> From: hotspot-runtime-dev > bounces at openjdk.java.net> On Behalf Of Doerr, Martin >> Sent: Montag, 29. Juli 2019 16:14 >> To: David Holmes ; >> daniel.daugherty at oracle.com; Patricio Chilano >> ; hotspot-runtime- >> dev at openjdk.java.net runtime >> Subject: [CAUTION] RE: RFR 8227528: TestAbortVMOnSafepointTimeout.java >> failed due to "RuntimeException: 'Safepoint sync time longer than' missing >> from stdout/stderr" >> >> Hi Patricio, >> >> I have also already noticed this issue. Thank you for analyzing the root cause. >> Fix looks good to me. I don't need to see another webrev for comment >> improvements, either. >> I've linked the bug to JDK-8191890 and JDK-8219584. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-runtime-dev >> bounces at openjdk.java.net> On Behalf Of David Holmes >>> Sent: Samstag, 27. Juli 2019 00:28 >>> To: daniel.daugherty at oracle.com; Patricio Chilano >>> ; hotspot-runtime- >>> dev at openjdk.java.net runtime > dev at openjdk.java.net> >>> Subject: Re: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed >> due >>> to "RuntimeException: 'Safepoint sync time longer than' missing from >>> stdout/stderr" >>> >>> On 27/07/2019 5:19 am, Daniel D. Daugherty wrote: >>>> On 7/26/19 2:46 PM, Patricio Chilano wrote: >>>>> Hi all, >>>>> >>>>> Could you review this small fix for test >>>>> TestAbortVMOnSafepointTimeout.java? >>>>> >>>>> The test has been failing intermittently since 8191890. 
As explained >>>>> in the bug comments, it turns out that a bias revocation handshake >>>>> could happen in between the start of the "for" loop without safepoint >>>>> polls and the safepoint where we want to timeout. That allows for the >>>>> long loop to actually finish and prevents the desired timeout in the >>>>> later safepoint. The simple solution is to just avoid using biased >>>>> locking in this test (and therefore prevent the revocation handshake), >>>>> since we just want to test the correct behavior of flag >>>>> AbortVMOnSafepointTimeout. >>>>> >>>>> Webrev: >> http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev >>>> The change itself is trivial. However, the reasons behind the change >>>> aren't. >>>> >>>> This part of the description caught my eye: >>>> >>>>     the start of the "for" loop without safepoint polls >>>> >>>> and my brain did a "Say what?!?!" Of course, that was without looking at >>>> the test which has a huge number of options, including these: >>>> >>>>     L70: "-XX:-UseCountedLoopSafepoints", >>>>     L71: "-XX:LoopStripMiningIter=0", >>>>     L72: "-XX:LoopUnrollLimit=0", >>>> >>>> Okay, now the world makes much more sense. We are intentionally >> telling >>>> the compiler to not emit safepoint polls in the counted loop and we're >>>> turning off other loop optimizations. Basically, we're telling the >>>> compiler we want to stall in that loop until we exceed the safepoint >>>> timeout limit. Got it... >>>> >>>> So the new biased locking handshake messes with the timeout that this >>>> test is trying to achieve. Disabling biased locking makes the test more >>>> robust by allowing the safepoint sync timeout to happen. >>>> >>>> A couple of minor suggestions: >>>> >>>> >> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java >>>>     L30: * @bug 8219584 >>>> >>>>     You should add an @bug for this bug (8227528). I don't know if >>>>
you can put more than one bug ID on an @bug line or if you need >>>>     a separate @bug line. >>>> >>>>     L61: ProcessBuilder pb = >>>> ProcessTools.createJavaProcessBuilder( >>>>     Please add a comment above this line: >>>> >>>>         // -XX:-UseBiasedLocking - is used to prevent biased locking >>>>         // handshakes from changing the timing of this test. >>>> >>>> Thumbs up. I don't need to see another webrev if you choose to make >>>> the above changes. >>> I think some additional commentary on the other exotic options to ensure >>> the loop contains no safepoints and is not unrolled etc would also be >>> worthwhile. >>> >>> Change itself makes sense. >>> >>> Thanks, >>> David >>> >>>> Dan >>>> >>>> >>>>> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 >>>>> >>>>> Thanks! >>>>> Patricio From dean.long at oracle.com Mon Jul 29 21:23:26 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 29 Jul 2019 14:23:26 -0700 Subject: RFR 8228596: Class redefinition fails when condy instructions are removed In-Reply-To: References: <6b96b8db-3f2d-1000-f685-16b746905eb0@oracle.com> Message-ID: On 7/29/19 6:11 AM, Harold Seigel wrote: > Hi Dean, > > Thanks for pointing this out. There is an existing bug, JDK-8155673 > , to remove constant > pool merging. This will prevent similar problems to this one because > constant pools will no longer need to be merged. > OK, sounds good. dl > Thanks, Harold > > On 7/26/2019 5:44 PM, dean.long at oracle.com wrote: >> I see a fix for a specific problem, but I don't see anything >> preventing similar problems (a change in ConstantPool that isn't >> reflected in VM_RedefineClasses::merge_cp_and_rewrite) from happening >> again. I understand that merge_cp_and_rewrite needs to have intimate >> knowledge of CP internals, but maybe some refactoring could reduce >> the future change "risk surface".
>> >> dl >> >> On 7/26/19 5:04 AM, Harold Seigel wrote: >>> Hi, >>> >>> Please review this small JDK-14 fix for an issue with constant pool >>> merging when redefining a class whose constant pool contains a >>> constant dynamic entry. The fix makes sure that the >>> has_dynamic_constant flag gets copied properly to the merged >>> constant pool. >>> >>> Open Webrev: >>> http://cr.openjdk.java.net/~hseigel/bug_8228596/webrev/index.html >>> >>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8228596 >>> >>> The fix was regression tested by running Mach5 tiers 1 and 2 tests >>> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running >>> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on >>> Linux-x64. >>> >>> Thanks, Harold >>> >> From rainer.jung at kippdata.de Mon Jul 29 21:34:29 2019 From: rainer.jung at kippdata.de (Rainer Jung) Date: Mon, 29 Jul 2019 23:34:29 +0200 Subject: New library dependencies due to 8222720 (fb5b3981eac) Message-ID: While doing Tomcat tests I noticed that, at least on SLES 12, JDK 13 and 14 EA have a lot of new runtime library dependencies. Change fb5b3981eac with log 8222720: Provide extended VMWare/vSphere virtualization related info in the hs_error file on linux/windows x86_64 loads /usr/lib64/libguestlib.so.0 already during JVM startup. That library depends on /usr/lib64/libvmtools.so.0, which in turn depends on a lot of other libraries: NEEDED libdnet.so.1 NEEDED libglib-2.0.so.0 NEEDED libicui18n.so.52.1 NEEDED libicuuc.so.52.1 NEEDED libpthread.so.0 NEEDED libdl.so.2 NEEDED libssl.so.1.0.0 NEEDED libcrypto.so.1.0.0 NEEDED libc.so.6 NEEDED ld-linux-x86-64.so.2 NEEDED libgcc_s.so.1 Some are not so problematic, but for instance Tomcat is able to use custom-built OpenSSL libraries to replace the JSSE crypto engine with an OpenSSL-based one using JNI. Unfortunately the JDK is now loading libssl and libcrypto early.
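To confirm on a given host which shared objects a JVM actually has mapped after startup, one option is to read the process's own /proc/self/maps. The sketch below assumes Linux procfs, where the sixth whitespace-separated field of each maps line, when present, is the mapped file's path. The class name, the watched-prefix list and the "!" marker are illustrative choices, not part of any JDK tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical diagnostic: lists shared objects mapped into the current JVM (Linux only). */
public class MappedLibraries {

    // Library name prefixes taken from the report above; the selection is illustrative.
    static final List<String> WATCHED = List.of(
            "libguestlib", "libvmtools", "libssl", "libcrypto", "libglib", "libicu", "libdnet");

    /** Extracts shared-object paths from lines in /proc/PID/maps format. */
    static Set<String> sharedObjects(List<String> mapsLines) {
        Set<String> libs = new TreeSet<>();
        for (String line : mapsLines) {
            // Fields: address perms offset dev inode [pathname]; the pathname,
            // if present, is everything after the fifth field.
            String[] fields = line.trim().split("\\s+", 6);
            if (fields.length == 6 && fields[5].contains(".so")) {
                libs.add(fields[5]);
            }
        }
        return libs;
    }

    /** True if the file name of the path starts with one of the watched prefixes. */
    static boolean isWatched(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return WATCHED.stream().anyMatch(name::startsWith);
    }

    public static void main(String[] args) throws IOException {
        Path maps = Path.of("/proc/self/maps");
        if (!Files.isReadable(maps)) {
            System.out.println("/proc/self/maps not readable (not Linux?)");
            return;
        }
        for (String lib : sharedObjects(Files.readAllLines(maps))) {
            System.out.println((isWatched(lib) ? "! " : "  ") + lib);
        }
    }
}
```

Launched as java MappedLibraries.java (single-file source launch, JDK 11 or later), it prints every mapped .so. On an affected VMware guest one would expect libguestlib, libvmtools and dependencies such as libssl to be flagged, though that is an expectation rather than an observed result.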
If our TC OpenSSL also uses SO version 1.0.0, it will not get loaded; if it is another version, we can run into a mix of symbols resolved in the platform OpenSSL libs now loaded early and the ones provided with TC loaded later. This is an example of why it would be good not to introduce too many native library dependencies for the JVM, or to make them optional in the sense of configurable at runtime. Of the above list, the icu libs, libglib and libdnet are other libs one would probably try to avoid. Don't know whether this list is appropriate for discussing it. If not, any pointers to a better list are appreciated. Thanks and regards, Rainer From patricio.chilano.mateo at oracle.com Mon Jul 29 23:45:44 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Mon, 29 Jul 2019 19:45:44 -0400 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: References: Message-ID: <64d07f0c-c920-cc76-6406-ff24168270b6@oracle.com> Hi David, On 7/26/19 6:27 PM, David Holmes wrote: > On 27/07/2019 5:19 am, Daniel D. Daugherty wrote: >> On 7/26/19 2:46 PM, Patricio Chilano wrote: >>> Hi all, >>> >>> Could you review this small fix for test >>> TestAbortVMOnSafepointTimeout.java? >>> >>> The test has been failing intermittently since 8191890. As explained >>> in the bug comments, it turns out that a bias revocation handshake >>> could happen in between the start of the "for" loop without >>> safepoint polls and the safepoint where we want to timeout. That >>> allows for the long loop to actually finish and prevents the desired >>> timeout in the later safepoint. The simple solution is to just avoid >>> using biased locking in this test (and therefore prevent the >>> revocation handshake), since we just want to test the correct >>> behavior of flag AbortVMOnSafepointTimeout.
>>> >>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev >> >> The change itself is trivial. However, the reasons behind the change >> aren't. >> >> This part of the description caught my eye: >> >>     the start of the "for" loop without safepoint polls >> >> and my brain did a "Say what?!?!" Of course, that was without looking at >> the test which has a huge number of options, including these: >> >>     L70: "-XX:-UseCountedLoopSafepoints", >>     L71: "-XX:LoopStripMiningIter=0", >>     L72: "-XX:LoopUnrollLimit=0", >> >> Okay, now the world makes much more sense. We are intentionally telling >> the compiler to not emit safepoint polls in the counted loop and we're >> turning off other loop optimizations. Basically, we're telling the >> compiler we want to stall in that loop until we exceed the safepoint >> timeout limit. Got it... >> >> So the new biased locking handshake messes with the timeout that this >> test is trying to achieve. Disabling biased locking makes the test more >> robust by allowing the safepoint sync timeout to happen. >> >> A couple of minor suggestions: >> >> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java >>     L30: * @bug 8219584 >> >>     You should add an @bug for this bug (8227528). I don't know if >>     you can put more than one bug ID on an @bug line or if you need >>     a separate @bug line. >> >>     L61: ProcessBuilder pb = >> ProcessTools.createJavaProcessBuilder( >>     Please add a comment above this line: >> >>         // -XX:-UseBiasedLocking - is used to prevent biased >> locking >>         // handshakes from changing the timing of this test. >> >> Thumbs up. I don't need to see another webrev if you choose to make >> the above changes.
> > I think some additional commentary on the other exotic options to > ensure the loop contains no safepoints and is not unrolled etc would > also be worthwhile. I added comments for flags UseCountedLoopSafepoints, LoopStripMiningIter and LoopUnrollLimit. Here are the links to v02: Full: http://cr.openjdk.java.net/~pchilanomate/8227528/v02/webrev/ Inc: http://cr.openjdk.java.net/~pchilanomate/8227528/v02/inc/webrev/ Thanks for looking at this David! Patricio > Change itself makes sense. > > Thanks, > David > >> >> Dan >> >> >>> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 >>> >>> Thanks! >>> Patricio >> From david.holmes at oracle.com Mon Jul 29 23:56:56 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jul 2019 09:56:56 +1000 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: References: Message-ID: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> Hi Rainer, On 30/07/2019 7:34 am, Rainer Jung wrote: > While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 and > 14 EA have a lot of new runtime library dependencies. > > Change fb5b3981eac with log > > 8222720: Provide extended VMWare/vSphere virtualization related info in > the hs_error file on linux/windows x86_64 > > loads /usr/lib64/libguestlib.so.0 already during JVM startup. That > library depends on /usr/lib64/libvmtools.so.0, which in turn depends on > a lot of other libraries: > > NEEDED libdnet.so.1 > NEEDED libglib-2.0.so.0 > NEEDED libicui18n.so.52.1 > NEEDED libicuuc.so.52.1 > NEEDED libpthread.so.0 > NEEDED libdl.so.2 > NEEDED libssl.so.1.0.0 > NEEDED libcrypto.so.1.0.0 > NEEDED libc.so.6 > NEEDED ld-linux-x86-64.so.2 > NEEDED
libgcc_s.so.1 > > Some are not so problematic, but for instance Tomcat is able to use > custom build OpenSSL libraries to replace the JSSE crypto engine with an > OpenSSL based one using JNI. Unfortunately the JDK is now loading libssl > and libcrypto early. In case our TC OpenSSL also uses SO version 1.0.0 > it will not get loaded, in case it is another version we can run into a > mix of symbols resolved in the platform OpenSSL libs now loaded early > and the ones provided with TC loaded later. > > This is an example, why it would be good to not introduce too many > native library dependencies for the JVM or make it optional in the sense > of configurable during runtime. Of the above list, the icu libs, libglib > and libdnet are other libs one would probably try to avoid. > > Don't know whether this list is appropriate for discussing it. If not > any pointers to a better list are appreciated. This is the correct list to discuss this. When 8222720 was put in I had no idea it would result in eager loading of libraries beyond the explicit load of libguestlib. To be clear you are running under VMWare? This should only happen to enable reporting for the VMWare virtualization info in case of a crash. This may need to be revisited. Thanks for the report. David ----- > Thanks and regards, > > Rainer > > From daniel.daugherty at oracle.com Tue Jul 30 00:05:52 2019 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Mon, 29 Jul 2019 20:05:52 -0400 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: <25f3dc03-53d8-40b6-e109-19944fe6f91f@oracle.com> References: <25f3dc03-53d8-40b6-e109-19944fe6f91f@oracle.com> Message-ID: <2e148454-5cc2-20cb-f8bd-cf30dcc09ce5@oracle.com> On 7/29/19 4:26 PM, Patricio Chilano wrote: > > On 7/29/19 11:46 AM, Doerr, Martin wrote: >> Hi everybody, just an additional remark: The test failure indicates >> that guaranteed safepoints are kind of broken. Using VM Operations >> resets the timer even if they don't use a real safepoint. Should we >> document that somehow? > Just to add to that, the AbortVMOnSafepointTimeout flag now only > aborts if once the VMThread starts a safepoint, the time to actually > reach it exceeds the SafepointTimeoutDelay limit. If the > GuaranteedSafepointInterval flag is supposed to guarantee safepoints > every X amount of time, which the name seems to suggest, then yes it's > broken. As of now GuaranteedSafepointInterval is used as a parameter > to know when to check for pending safepoints (needed for cleanup), but > the actual safepoint could happen way after > GuaranteedSafepointInterval time since the last one. So, possibly the > flag name needs to be changed. Maybe somebody could give more context > on the GuaranteedSafepointInterval flag. src/hotspot/share/runtime/globals.hpp:
  /* notice: the max range value here is max_jint, not max_intx */      \
  /* because of overflow issue */                                       \
  diagnostic(intx, GuaranteedSafepointInterval, 1000,                   \
          "Guarantee a safepoint (at least) every so many milliseconds " \
          "(0 means none)")                                             \
          range(0, max_jint)                                            \
There are a few other options that depend on/interact with GuaranteedSafepointInterval.
If it is indeed broken, then we need a new bug and a fix to get that working again. Dan > > Patricio >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-runtime-dev >> bounces at openjdk.java.net> On Behalf Of Doerr, Martin >>> Sent: Montag, 29. Juli 2019 16:14 >>> To: David Holmes ; >>> daniel.daugherty at oracle.com; Patricio Chilano >>> ; hotspot-runtime- >>> dev at openjdk.java.net runtime >>> Subject: [CAUTION] RE: RFR 8227528: TestAbortVMOnSafepointTimeout.java >>> failed due to "RuntimeException: 'Safepoint sync time longer than' >>> missing >>> from stdout/stderr" >>> >>> Hi Patricio, >>> >>> I have also already noticed this issue. Thank you for analyzing the >>> root cause. >>> Fix looks good to me. I don't need to see another webrev for comment >>> improvements, either. >>> I've linked the bug to JDK-8191890 and JDK-8219584. >>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: hotspot-runtime-dev >>> bounces at openjdk.java.net> On Behalf Of David Holmes >>>> Sent: Samstag, 27. Juli 2019 00:28 >>>> To: daniel.daugherty at oracle.com; Patricio Chilano >>>> ; hotspot-runtime- >>>> dev at openjdk.java.net runtime >> dev at openjdk.java.net> >>>> Subject: Re: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed >>> due >>>> to "RuntimeException: 'Safepoint sync time longer than' missing from >>>> stdout/stderr" >>>> >>>> On 27/07/2019 5:19 am, Daniel D. Daugherty wrote: >>>>> On 7/26/19 2:46 PM, Patricio Chilano wrote: >>>>>> Hi all, >>>>>> >>>>>> Could you review this small fix for test >>>>>> TestAbortVMOnSafepointTimeout.java? >>>>>> >>>>>> The test has been failing intermittently since 8191890. As explained >>>>>> in the bug comments, it turns out that a bias revocation handshake >>>>>> could happen in between the start of the "for" loop without >>>>>> safepoint >>>>>> polls and the safepoint where we want to timeout. 
That allows for >>>>>> the >>>>>> long loop to actually finish and prevents the desired timeout in the >>>>>> later safepoint. The simple solution is to just avoid using biased >>>>>> locking in this test (and therefore prevent the revocation >>>>>> handshake), >>>>>> since we just want to test the correct behavior of flag >>>>>> AbortVMOnSafepointTimeout. >>>>>> >>>>>> Webrev: >>> http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev >>>>> The change itself is trivial. However, the reasons behind the change >>>>> aren't. >>>>> >>>>> This part of the description caught my eye: >>>>> >>>>> ? ??? the start of the "for" loop without safepoint polls >>>>> >>>>> and my brain did a "Say what?!?!" Of course, that was without >>>>> looking at >>>>> the test which has a huge number of options, including these: >>>>> >>>>> ? ??? L70: "-XX:-UseCountedLoopSafepoints", >>>>> ? ??? L71: ??????????????? "-XX:LoopStripMiningIter=0", >>>>> ? ??? L72: ??????????????? "-XX:LoopUnrollLimit=0", >>>>> >>>>> Okay, now the world makes much more sense. We are intentionally >>> telling >>>>> the compiler to not emit safepoint polls in the counted loop and >>>>> we're >>>>> turning off other loop optimizations. Basically, we're telling the >>>>> compiler we want to stall in that loop until we exceed the safepoint >>>>> timeout limit. Got it... >>>>> >>>>> So the new biased locking handshake messes with the timeout that this >>>>> test is trying to achieve. Disabling biased locking makes the test >>>>> more >>>>> robust by allowing the safepoint sync timeout to happen. >>>>> >>>>> A couple of minor suggestions: >>>>> >>>>> >>> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.ja >>>> va >>>>> ? ??? L30:? * @bug 8219584 >>>>> >>>>> ? ? ????? You should add an @bug for this bug (8227528). I don't >>>>> know if >>>>> ? ??????? you can put more than one bug ID on an @bug line or if >>>>> you need >>>>> ? ??????? a separate @bug line. >>>>> >>>>> ? ??? L61: ??????? 
ProcessBuilder pb = >>>>> ProcessTools.createJavaProcessBuilder( >>>>>     Please add a comment above this line: >>>>> >>>>>         // -XX:-UseBiasedLocking - is used to prevent biased >>>>> locking >>>>>         // handshakes from changing the timing of this test. >>>>> >>>>> Thumbs up. I don't need to see another webrev if you choose to make >>>>> the above changes. >>>> I think some additional commentary on the other exotic options to >>>> ensure >>>> the loop contains no safepoints and is not unrolled etc would also be >>>> worthwhile. >>>> >>>> Change itself makes sense. >>>> >>>> Thanks, >>>> David >>>> >>>>> Dan >>>>> >>>>> >>>>>> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 >>>>>> >>>>>> Thanks! >>>>>> Patricio > From david.holmes at oracle.com Tue Jul 30 02:18:42 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jul 2019 12:18:42 +1000 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: <25f3dc03-53d8-40b6-e109-19944fe6f91f@oracle.com> References: <25f3dc03-53d8-40b6-e109-19944fe6f91f@oracle.com> Message-ID: On 30/07/2019 6:26 am, Patricio Chilano wrote: > > On 7/29/19 11:46 AM, Doerr, Martin wrote: >> Hi everybody, just an additional remark: The test failure indicates >> that guaranteed safepoints are kind of broken. Using VM Operations >> resets the timer even if they don't use a real safepoint. Should we >> document that somehow? > Just to add to that, the AbortVMOnSafepointTimeout flag now only aborts > if once the VMThread starts a safepoint, the time to actually reach it > exceeds the SafepointTimeoutDelay limit. If the > GuaranteedSafepointInterval flag is supposed to guarantee safepoints > every X amount of time, which the name seems to suggest, then yes it's > broken. It's supposed to guarantee a minimum interval to initiate a safepoint.
It can't guarantee that a safepoint will actually be reached - as happens in this test. > As of now GuaranteedSafepointInterval is used as a parameter to > know when to check for pending safepoints (needed for cleanup), but the > actual safepoint could happen way after GuaranteedSafepointInterval time > since the last one. So, possibly the flag name needs to be changed. > Maybe somebody could give more context on the > GuaranteedSafepointInterval flag. With Handshakes now added to the mix, yes, the actual safepoint could happen even later than previously. The intent is that the VMThread will wake up at least every GuaranteedSafepointInterval and initiate a safepoint using a "no_op" VM operation (see VMThread::no_op_safepoint()). The main use case is to run periodic cleanup operations (many of which no longer occur at safepoints anyway). If the VMThread wakes earlier because of a requested VM operation, and that VM operation is not a safepoint-VMOp, then we could go much longer between safepoints, depending on the length of time that VM-op takes. But once we've executed a VM-op, we re-examine VMThread::no_op_safepoint() and force a safepoint if needed. So the interval is more "best effort" than "guaranteed". David ----- > > Patricio >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-runtime-dev >> bounces at openjdk.java.net> On Behalf Of Doerr, Martin >>> Sent: Montag, 29. Juli 2019 16:14 >>> To: David Holmes ; >>> daniel.daugherty at oracle.com; Patricio Chilano >>> ; hotspot-runtime- >>> dev at openjdk.java.net runtime >>> Subject: [CAUTION] RE: RFR 8227528: TestAbortVMOnSafepointTimeout.java >>> failed due to "RuntimeException: 'Safepoint sync time longer than' >>> missing >>> from stdout/stderr" >>> >>> Hi Patricio, >>> >>> I have also already noticed this issue. Thank you for analyzing the >>> root cause. >>> Fix looks good to me. I don't need to see another webrev for comment >>> improvements, either.
>>> I've linked the bug to JDK-8191890 and JDK-8219584. >>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: hotspot-runtime-dev >>> bounces at openjdk.java.net> On Behalf Of David Holmes >>>> Sent: Samstag, 27. Juli 2019 00:28 >>>> To: daniel.daugherty at oracle.com; Patricio Chilano >>>> ; hotspot-runtime- >>>> dev at openjdk.java.net runtime >> dev at openjdk.java.net> >>>> Subject: Re: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed >>> due >>>> to "RuntimeException: 'Safepoint sync time longer than' missing from >>>> stdout/stderr" >>>> >>>> On 27/07/2019 5:19 am, Daniel D. Daugherty wrote: >>>>> On 7/26/19 2:46 PM, Patricio Chilano wrote: >>>>>> Hi all, >>>>>> >>>>>> Could you review this small fix for test >>>>>> TestAbortVMOnSafepointTimeout.java? >>>>>> >>>>>> The test has been failing intermittently since 8191890. As explained >>>>>> in the bug comments, it turns out that a bias revocation handshake >>>>>> could happen in between the start of the "for" loop without safepoint >>>>>> polls and the safepoint where we want to timeout. That allows for the >>>>>> long loop to actually finish and prevents the desired timeout in the >>>>>> later safepoint. The simple solution is to just avoid using biased >>>>>> locking in this test (and therefore prevent the revocation >>>>>> handshake), >>>>>> since we just want to test the correct behavior of flag >>>>>> AbortVMOnSafepointTimeout. >>>>>> >>>>>> Webrev: >>> http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev >>>>> The change itself is trivial. However, the reasons behind the change >>>>> aren't. >>>>> >>>>> This part of the description caught my eye: >>>>> >>>>>     the start of the "for" loop without safepoint polls >>>>> >>>>> and my brain did a "Say what?!?!" Of course, that was without >>>>> looking at >>>>> the test which has a huge number of options, including these: >>>>> >>>>>     L70: "-XX:-UseCountedLoopSafepoints", >>>>>
L71: "-XX:LoopStripMiningIter=0", >>>>> L72: "-XX:LoopUnrollLimit=0", >>>>> >>>>> Okay, now the world makes much more sense. We are intentionally >>> telling >>>>> the compiler to not emit safepoint polls in the counted loop and we're >>>>> turning off other loop optimizations. Basically, we're telling the >>>>> compiler we want to stall in that loop until we exceed the safepoint >>>>> timeout limit. Got it... >>>>> >>>>> So the new biased locking handshake messes with the timeout that this >>>>> test is trying to achieve. Disabling biased locking makes the test >>>>> more >>>>> robust by allowing the safepoint sync timeout to happen. >>>>> >>>>> A couple of minor suggestions: >>>>> >>>>> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java >>>>> L30: * @bug 8219584 >>>>> >>>>> You should add an @bug for this bug (8227528). I don't >>>>> know if >>>>> you can put more than one bug ID on an @bug line or if >>>>> you need >>>>> a separate @bug line. >>>>> >>>>> L61: ProcessBuilder pb = >>>>> ProcessTools.createJavaProcessBuilder( >>>>> Please add a comment above this line: >>>>> >>>>> // -XX:-UseBiasedLocking - is used to prevent biased >>>>> locking >>>>> // handshakes from changing the timing of this test. >>>>> >>>>> Thumbs up. I don't need to see another webrev if you choose to make >>>>> the above changes. >>>> I think some additional commentary on the other exotic options to >>>> ensure >>>> the loop contains no safepoints and is not unrolled etc would also be >>>> worthwhile. >>>> >>>> Change itself makes sense. >>>> >>>> Thanks, >>>> David >>>> >>>>> Dan >>>>> >>>>> >>>>>> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 >>>>>> >>>>>> Thanks!
>>>>>> Patricio > From mikhailo.seledtsov at oracle.com Tue Jul 30 03:46:41 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Mon, 29 Jul 2019 20:46:41 -0700 Subject: RFR(S): 8195809: [TESTBUG] jps and jcmd -l support for Docker containers is not tested Message-ID: Please review this change that: - adds test case for "jcmd -l" and "jcmd help" where jcmd is executed on a host/node outside the container, while a targeted JVM is running inside a container - factors out some common functionality to DockerTestUtils and docker.Common Please note: - the "jcmd -l" works in this configuration, however the JCMD's and Target's username and UID have to match (per design) - the "jcmd help", "jcmd JFR.start" or any other JCMD command besides "jcmd -l" does not work in this configuration (Filed "JDK-8228343: JCMD and attach fail to work across Linux Container boundary") The test case is commented out, however can be used for reproducing the issue, and will be enabled once the bug is fixed. JBS: https://bugs.openjdk.java.net/browse/JDK-8195809 Webrev: http://cr.openjdk.java.net/~mseledtsov/8195809.00/ Testing: - ran the new test multiple times on Linux-x64 - ran TestJCMDWithSideCar multiple times on Linux-x64 - ran all Docker/Container tests (HotSpot and JDK) All PASS Thank you, Misha From david.holmes at oracle.com Tue Jul 30 05:09:36 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jul 2019 15:09:36 +1000 Subject: RFR (XXXS) 8227250: UserHandler contains ancient LinuxThreads code Message-ID: <4c677b95-edcd-584b-6250-630422d20c71@oracle.com> bug: https://bugs.openjdk.java.net/browse/JDK-8227250 webrev: http://cr.openjdk.java.net/~dholmes/8227250/webrev/ Removed some ancient Linux code that pertained to the LinuxThreads implementation, and which was erroneously copied into the BSD and AIX ports.
Thanks, David From rainer.jung at kippdata.de Tue Jul 30 07:34:36 2019 From: rainer.jung at kippdata.de (Rainer Jung) Date: Tue, 30 Jul 2019 09:34:36 +0200 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> References: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> Message-ID: <60ef3524-ca99-4ea8-c3f2-aca642160620@kippdata.de> Hi David, On 30.07.2019 at 01:56, David Holmes wrote: > Hi Rainer, > > On 30/07/2019 7:34 am, Rainer Jung wrote: >> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 >> and 14 EA have a lot of new runtime library dependencies. >> >> Change fb5b3981eac with log >> >> 8222720: Provide extended VMWare/vSphere virtualization related info >> in the hs_error file on linux/windows x86_64 >> >> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That >> library depends on /usr/lib64/libvmtools.so.0, which in turn depends >> on a lot of other libraries: >> >> NEEDED libdnet.so.1 >> NEEDED libglib-2.0.so.0 >> NEEDED libicui18n.so.52.1 >> NEEDED libicuuc.so.52.1 >> NEEDED libpthread.so.0 >> NEEDED libdl.so.2 >> NEEDED libssl.so.1.0.0 >> NEEDED libcrypto.so.1.0.0 >> NEEDED libc.so.6 >> NEEDED ld-linux-x86-64.so.2 >> NEEDED libgcc_s.so.1 >> >> Some are not so problematic, but for instance Tomcat is able to use >> custom build OpenSSL libraries to replace the JSSE crypto engine with >> an OpenSSL based one using JNI. Unfortunately the JDK is now loading >> libssl and libcrypto early. In case our TC OpenSSL also uses SO >> version 1.0.0 it will not get loaded, in case it is another version we >> can run into a mix of symbols resolved in the platform OpenSSL libs >> now loaded early and the ones provided with TC loaded later.
>> >> This is an example, why it would be good to not introduce too many >> native library dependencies for the JVM or make it optional in the >> sense of configurable during runtime. Of the above list, the icu libs, >> libglib and libdnet are other libs one would probably try to avoid. >> >> Don't know whether this list is appropriate for discussing it. If not >> any pointers to a better list are appreciated. > > This is the correct list to discuss this. > > When 8222720 was put in I had no idea it would result in eager loading > of libraries beyond the explicit load of libguestlib. > > To be clear you are running under VMWare? This should only happen to > enable reporting for the VMWare virtualization info in case of a crash. Yes, I am running under VMWare. The library /usr/lib64/libguestlib.so.0 and its dependency /usr/lib64/libvmtools.so.0 both belong to the package libvmtools0. Its sources seem to be available at https://github.com/vmware/open-vm-tools. > This may need to be revisited. > > Thanks for the report. Thanks for looking at this! Rainer From david.holmes at oracle.com Tue Jul 30 07:50:47 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jul 2019 17:50:47 +1000 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: <60ef3524-ca99-4ea8-c3f2-aca642160620@kippdata.de> References: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> <60ef3524-ca99-4ea8-c3f2-aca642160620@kippdata.de> Message-ID: <73b6c6d9-292e-e889-db08-f4001743d18b@oracle.com> Hi Rainer, I have filed: https://bugs.openjdk.java.net/browse/JDK-8228764 Matthias: I think we may have to backout JDK-8222720 from JDK 13, re-examine this and re-do for 14. 
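To check whether a process really has libssl/libcrypto (or other transitive NEEDED entries) mapped after startup, one option on glibc is `dl_iterate_phdr` from `<link.h>`. The following is a minimal C++ sketch; the helper names `loaded_shared_objects` and `is_loaded` are hypothetical. From outside the process, `/proc/<pid>/maps` gives similar information, and `readelf -d` on a library shows its direct NEEDED entries.

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (glibc; GNU extension)

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: records the path of each loaded object.
static int collect_object(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
  auto* names = static_cast<std::vector<std::string>*>(data);
  // The main executable is reported with an empty name; skip it.
  if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
    names->emplace_back(info->dlpi_name);
  }
  return 0;  // a non-zero return would stop the iteration early
}

// Lists the shared objects currently mapped into this process.
std::vector<std::string> loaded_shared_objects() {
  std::vector<std::string> names;
  dl_iterate_phdr(collect_object, &names);
  return names;
}

// True if any loaded object's path contains `fragment`, e.g. "libssl.so".
bool is_loaded(const std::string& fragment) {
  assert(!fragment.empty());  // an empty fragment would match every object
  for (const std::string& name : loaded_shared_objects()) {
    if (name.find(fragment) != std::string::npos) return true;
  }
  return false;
}
```

For example, native code running inside the JVM (such as a JNI library) could call `is_loaded("libssl.so")` to see whether a platform OpenSSL was already mapped before it tries to load its own copy.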
Thanks, David ----- On 30/07/2019 5:34 pm, Rainer Jung wrote: > Hi David, > > On 30.07.2019 at 01:56, David Holmes wrote: >> Hi Rainer, >> >> On 30/07/2019 7:34 am, Rainer Jung wrote: >>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 >>> and 14 EA have a lot of new runtime library dependencies. >>> >>> Change fb5b3981eac with log >>> >>> 8222720: Provide extended VMWare/vSphere virtualization related info >>> in the hs_error file on linux/windows x86_64 >>> >>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That >>> library depends on /usr/lib64/libvmtools.so.0, which in turn depends >>> on a lot of other libraries: >>> >>> NEEDED libdnet.so.1 >>> NEEDED libglib-2.0.so.0 >>> NEEDED libicui18n.so.52.1 >>> NEEDED libicuuc.so.52.1 >>> NEEDED libpthread.so.0 >>> NEEDED libdl.so.2 >>> NEEDED libssl.so.1.0.0 >>> NEEDED libcrypto.so.1.0.0 >>> NEEDED libc.so.6 >>> NEEDED ld-linux-x86-64.so.2 >>> NEEDED libgcc_s.so.1 >>> >>> Some are not so problematic, but for instance Tomcat is able to use >>> custom build OpenSSL libraries to replace the JSSE crypto engine with >>> an OpenSSL based one using JNI. Unfortunately the JDK is now loading >>> libssl and libcrypto early. In case our TC OpenSSL also uses SO >>> version 1.0.0 it will not get loaded, in case it is another version >>> we can run into a mix of symbols resolved in the platform OpenSSL >>> libs now loaded early and the ones provided with TC loaded later. >>> >>> This is an example, why it would be good to not introduce too many >>> native library dependencies for the JVM or make it optional in the >>> sense of configurable during runtime. Of the above list, the icu >>> libs, libglib and libdnet are other libs one would probably try to >>> avoid.
>>> >>> Don't know whether this list is appropriate for discussing it. If not >>> any pointers to a better list are appreciated. >> >> This is the correct list to discuss this. >> >> When 8222720 was put in I had no idea it would result in eager loading >> of libraries beyond the explicit load of libguestlib. >> >> To be clear you are running under VMWare? This should only happen to >> enable reporting for the VMWare virtualization info in case of a crash. > > Yes, I am running under VMWare. The library /usr/lib64/libguestlib.so.0 > and its dependency /usr/lib64/libvmtools.so.0 both belong to the package > libvmtools0. Its sources seem to be available at > https://github.com/vmware/open-vm-tools. > >> This may need to be revisited. >> >> Thanks for the report. > > Thanks for looking at this! > > Rainer > From rainer.jung at kippdata.de Tue Jul 30 07:53:36 2019 From: rainer.jung at kippdata.de (Rainer Jung) Date: Tue, 30 Jul 2019 09:53:36 +0200 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: <73b6c6d9-292e-e889-db08-f4001743d18b@oracle.com> References: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> <60ef3524-ca99-4ea8-c3f2-aca642160620@kippdata.de> <73b6c6d9-292e-e889-db08-f4001743d18b@oracle.com> Message-ID: <7298301a-7da8-3bc0-971c-da42a1173f6e@kippdata.de> On 30.07.2019 at 09:50, David Holmes wrote: > Hi Rainer, > > I have filed: > > https://bugs.openjdk.java.net/browse/JDK-8228764 Thanks you. > Matthias: I think we may have to backout JDK-8222720 from JDK 13, > re-examine this and re-do for 14.
It looks like it is also already in JDK 11 head for the forthcoming 11.0.5: https://bugs.openjdk.java.net/browse/JDK-8226873 Regards, Rainer > Thanks, > David > ----- > > On 30/07/2019 5:34 pm, Rainer Jung wrote: >> Hi David, >> >> On 30.07.2019 at 01:56, David Holmes wrote: >>> Hi Rainer, >>> >>> On 30/07/2019 7:34 am, Rainer Jung wrote: >>>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 >>>> and 14 EA have a lot of new runtime library dependencies. >>>> >>>> Change fb5b3981eac with log >>>> >>>> 8222720: Provide extended VMWare/vSphere virtualization related info >>>> in the hs_error file on linux/windows x86_64 >>>> >>>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That >>>> library depends on /usr/lib64/libvmtools.so.0, which in turn depends >>>> on a lot of other libraries: >>>> >>>> NEEDED libdnet.so.1 >>>> NEEDED libglib-2.0.so.0 >>>> NEEDED libicui18n.so.52.1 >>>> NEEDED libicuuc.so.52.1 >>>> NEEDED libpthread.so.0 >>>> NEEDED libdl.so.2 >>>> NEEDED libssl.so.1.0.0 >>>> NEEDED libcrypto.so.1.0.0 >>>> NEEDED libc.so.6 >>>> NEEDED ld-linux-x86-64.so.2 >>>> NEEDED libgcc_s.so.1 >>>> >>>> Some are not so problematic, but for instance Tomcat is able to use >>>> custom build OpenSSL libraries to replace the JSSE crypto engine >>>> with an OpenSSL based one using JNI. Unfortunately the JDK is now >>>> loading libssl and libcrypto early. In case our TC OpenSSL also uses >>>> SO version 1.0.0 it will not get loaded, in case it is another >>>> version we can run into a mix of symbols resolved in the platform >>>> OpenSSL libs now loaded early and the ones provided with TC loaded >>>> later.
>>>> >>>> This is an example, why it would be good to not introduce too many >>>> native library dependencies for the JVM or make it optional in the >>>> sense of configurable during runtime. Of the above list, the icu >>>> libs, libglib and libdnet are other libs one would probably try to >>>> avoid. >>>> >>>> Don't know whether this list is appropriate for discussing it. If >>>> not any pointers to a better list are appreciated. >>> >>> This is the correct list to discuss this. >>> >>> When 8222720 was put in I had no idea it would result in eager >>> loading of libraries beyond the explicit load of libguestlib. >>> >>> To be clear you are running under VMWare? This should only happen to >>> enable reporting for the VMWare virtualization info in case of a crash. >> >> Yes, I am running under VMWare. The library >> /usr/lib64/libguestlib.so.0 and its dependency >> /usr/lib64/libvmtools.so.0 both belong to the package libvmtools0. Its >> sources seem to be available at https://github.com/vmware/open-vm-tools. >> >>> This may need to be revisited. >>> >>> Thanks for the report. >> >> Thanks for looking at this! >> >> Rainer From fweimer at redhat.com Tue Jul 30 08:03:20 2019 From: fweimer at redhat.com (Florian Weimer) Date: Tue, 30 Jul 2019 10:03:20 +0200 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: (Rainer Jung's message of "Mon, 29 Jul 2019 23:34:29 +0200") References: Message-ID: <87ftmox83b.fsf@oldenburg2.str.redhat.com> * Rainer Jung: > loads /usr/lib64/libguestlib.so.0 already during JVM startup. 
That > library depends on /usr/lib64/libvmtools.so.0, which in turn depends > on a lot of other libraries: > > NEEDED libdnet.so.1 > NEEDED libglib-2.0.so.0 > NEEDED libicui18n.so.52.1 > NEEDED libicuuc.so.52.1 > NEEDED libpthread.so.0 > NEEDED libdl.so.2 > NEEDED libssl.so.1.0.0 > NEEDED libcrypto.so.1.0.0 > NEEDED libc.so.6 > NEEDED ld-linux-x86-64.so.2 > NEEDED libgcc_s.so.1 Fedora installs the library by default (even on systems that are not virtualized), and it links against many more libraries. So the issue you raise is not restricted to Vmware deployments, and its scope is actually a bit larger. Thanks, Florian From matthias.baesken at sap.com Tue Jul 30 08:18:32 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 30 Jul 2019 08:18:32 +0000 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: <73b6c6d9-292e-e889-db08-f4001743d18b@oracle.com> References: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> <60ef3524-ca99-4ea8-c3f2-aca642160620@kippdata.de> <73b6c6d9-292e-e889-db08-f4001743d18b@oracle.com> Message-ID: Hi David, in our proprietary JVM we have an XX flag to enable/disable the usage of the guestlib for people who don't want it . Should I go for this ? Best regards, Matthias > -----Original Message----- > From: David Holmes > Sent: Tuesday, 30 July 2019 09:51 > To: Rainer Jung ; hotspot-runtime- > dev at openjdk.java.net; Baesken, Matthias > Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > > Hi Rainer, > > I have filed: > > https://bugs.openjdk.java.net/browse/JDK-8228764 > > Matthias: I think we may have to backout JDK-8222720 from JDK 13, > re-examine this and re-do for 14.
> > Thanks, > David > ----- > > On 30/07/2019 5:34 pm, Rainer Jung wrote: > > Hi David, > > > > On 30.07.2019 at 01:56, David Holmes wrote: > >> Hi Rainer, > >> > >> On 30/07/2019 7:34 am, Rainer Jung wrote: > >>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 > >>> and 14 EA have a lot of new runtime library dependencies. > >>> > >>> Change fb5b3981eac with log > >>> > >>> 8222720: Provide extended VMWare/vSphere virtualization related info > >>> in the hs_error file on linux/windows x86_64 > >>> > >>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That > >>> library depends on /usr/lib64/libvmtools.so.0, which in turn depends > >>> on a lot of other libraries: > >>> > >>> NEEDED libdnet.so.1 > >>> NEEDED libglib-2.0.so.0 > >>> NEEDED libicui18n.so.52.1 > >>> NEEDED libicuuc.so.52.1 > >>> NEEDED libpthread.so.0 > >>> NEEDED libdl.so.2 > >>> NEEDED libssl.so.1.0.0 > >>> NEEDED libcrypto.so.1.0.0 > >>> NEEDED libc.so.6 > >>> NEEDED ld-linux-x86-64.so.2 > >>> NEEDED libgcc_s.so.1 > >>> > >>> Some are not so problematic, but for instance Tomcat is able to use > >>> custom build OpenSSL libraries to replace the JSSE crypto engine with > >>> an OpenSSL based one using JNI. Unfortunately the JDK is now loading > >>> libssl and libcrypto early. In case our TC OpenSSL also uses SO > >>> version 1.0.0 it will not get loaded, in case it is another version > >>> we can run into a mix of symbols resolved in the platform OpenSSL > >>> libs now loaded early and the ones provided with TC loaded later. > >>> > >>> This is an example, why it would be good to not introduce too many > >>> native library dependencies for the JVM or make it optional in the > >>> sense of configurable during runtime.
Of the above list, the icu > >>> libs, libglib and libdnet are other libs one would probably try to > >>> avoid. > >>> > >>> Don't know whether this list is appropriate for discussing it. If not > >>> any pointers to a better list are appreciated. > >> > >> This is the correct list to discuss this. > >> > >> When 8222720 was put in I had no idea it would result in eager loading > >> of libraries beyond the explicit load of libguestlib. > >> > >> To be clear you are running under VMWare? This should only happen to > >> enable reporting for the VMWare virtualization info in case of a crash. > > > > Yes, I am running under VMWare. The library /usr/lib64/libguestlib.so.0 > > and its dependency /usr/lib64/libvmtools.so.0 both belong to the package > > libvmtools0. Its sources seem to be available at > > https://github.com/vmware/open-vm-tools. > > > >> This may need to be revisited. > >> > >> Thanks for the report. > > > > Thanks for looking at this! > > > > Rainer > > From david.holmes at oracle.com Tue Jul 30 08:37:23 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jul 2019 18:37:23 +1000 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: <87ftmox83b.fsf@oldenburg2.str.redhat.com> References: <87ftmox83b.fsf@oldenburg2.str.redhat.com> Message-ID: <916c6be9-c836-aae7-e76a-8393fcb02e01@oracle.com> On 30/07/2019 6:03 pm, Florian Weimer wrote: > * Rainer Jung: > >> loads /usr/lib64/libguestlib.so.0 already during JVM startup. 
That >> library depends on /usr/lib64/libvmtools.so.0, which in turn depends >> on a lot of other libraries: >> >> NEEDED libdnet.so.1 >> NEEDED libglib-2.0.so.0 >> NEEDED libicui18n.so.52.1 >> NEEDED libicuuc.so.52.1 >> NEEDED libpthread.so.0 >> NEEDED libdl.so.2 >> NEEDED libssl.so.1.0.0 >> NEEDED libcrypto.so.1.0.0 >> NEEDED libc.so.6 >> NEEDED ld-linux-x86-64.so.2 >> NEEDED libgcc_s.so.1 > > Fedora installs the library by default (even on systems that are not > virtualized), and it links against many more libraries. So the issue > you raise is not restricted to Vmware deployments, and its scope is > actually a bit larger. It is restricted to Vmware deployments because we only load the library if we detect we are running under VMware. David ----- > Thanks, > Florian > From david.holmes at oracle.com Tue Jul 30 08:38:52 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jul 2019 18:38:52 +1000 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: References: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> <60ef3524-ca99-4ea8-c3f2-aca642160620@kippdata.de> <73b6c6d9-292e-e889-db08-f4001743d18b@oracle.com> Message-ID: <7f16ceb9-db4c-09cc-bb78-d43163d0e648@oracle.com> Hi Matthias, On 30/07/2019 6:18 pm, Baesken, Matthias wrote: > Hi David, in our proprietary JVM we have an XX flag to enable/disable the usage of the guestlib for people who don't want it . > Should I go for this ? We can look at that for 14 but for 13 (and 11.0.5) I think we just need to back this out. Thanks, David > > Best regards, Matthias > > > >> -----Original Message----- >> From: David Holmes >> Sent: Tuesday, 30
July 2019 09:51 >> To: Rainer Jung ; hotspot-runtime- >> dev at openjdk.java.net; Baesken, Matthias >> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) >> >> Hi Rainer, >> >> I have filed: >> >> https://bugs.openjdk.java.net/browse/JDK-8228764 >> >> Matthias: I think we may have to backout JDK-8222720 from JDK 13, >> re-examine this and re-do for 14. >> >> Thanks, >> David >> ----- >> >> On 30/07/2019 5:34 pm, Rainer Jung wrote: >>> Hi David, >>> >>> On 30.07.2019 at 01:56, David Holmes wrote: >>>> Hi Rainer, >>>> >>>> On 30/07/2019 7:34 am, Rainer Jung wrote: >>>>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 >>>>> and 14 EA have a lot of new runtime library dependencies. >>>>> >>>>> Change fb5b3981eac with log >>>>> >>>>> 8222720: Provide extended VMWare/vSphere virtualization related info >>>>> in the hs_error file on linux/windows x86_64 >>>>> >>>>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That >>>>> library depends on /usr/lib64/libvmtools.so.0, which in turn depends >>>>> on a lot of other libraries: >>>>> >>>>> NEEDED libdnet.so.1 >>>>> NEEDED libglib-2.0.so.0 >>>>> NEEDED libicui18n.so.52.1 >>>>> NEEDED libicuuc.so.52.1 >>>>> NEEDED libpthread.so.0 >>>>> NEEDED libdl.so.2 >>>>> NEEDED libssl.so.1.0.0 >>>>> NEEDED libcrypto.so.1.0.0 >>>>> NEEDED libc.so.6 >>>>> NEEDED ld-linux-x86-64.so.2 >>>>> NEEDED libgcc_s.so.1 >>>>> >>>>> Some are not so problematic, but for instance Tomcat is able to use >>>>> custom build OpenSSL libraries to replace the JSSE crypto engine with >>>>> an OpenSSL based one using JNI. Unfortunately the JDK is now loading >>>>> libssl and libcrypto early.
In case our TC OpenSSL also uses SO >>>>> version 1.0.0 it will not get loaded, in case it is another version >>>>> we can run into a mix of symbols resolved in the platform OpenSSL >>>>> libs now loaded early and the ones provided with TC loaded later. >>>>> >>>>> This is an example, why it would be good to not introduce too many >>>>> native library dependencies for the JVM or make it optional in the >>>>> sense of configurable during runtime. Of the above list, the icu >>>>> libs, libglib and libdnet are other libs one would probably try to >>>>> avoid. >>>>> >>>>> Don't know whether this list is appropriate for discussing it. If not >>>>> any pointers to a better list are appreciated. >>>> >>>> This is the correct list to discuss this. >>>> >>>> When 8222720 was put in I had no idea it would result in eager loading >>>> of libraries beyond the explicit load of libguestlib. >>>> >>>> To be clear you are running under VMWare? This should only happen to >>>> enable reporting for the VMWare virtualization info in case of a crash. >>> >>> Yes, I am running under VMWare. The library /usr/lib64/libguestlib.so.0 >>> and its dependency /usr/lib64/libvmtools.so.0 both belong to the package >>> libvmtools0. Its sources seem to be available at >>> https://github.com/vmware/open-vm-tools. >>> >>>> This may need to be revisited. >>>> >>>> Thanks for the report. >>> >>> Thanks for looking at this! >>> >>> Rainer >>> From sgehwolf at redhat.com Tue Jul 30 09:05:45 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Tue, 30 Jul 2019 11:05:45 +0200 Subject: RFR(S): 8195809: [TESTBUG] jps and jcmd -l support for Docker containers is not tested In-Reply-To: References: Message-ID: <74efd3b779cdb0552069d71f39925b14ce0a416b.camel@redhat.com> Hi Misha, One question: Is it expected for a root user (outside a container) to see *all* Java processes inside a container? 
As far as I understand it, these tests assert that the same user (outside) and inside a container can see each other's Java processes. Review is below... On Mon, 2019-07-29 at 20:46 -0700, mikhailo.seledtsov at oracle.com wrote: > Please review this change that: > - adds test case for "jcmd -l" and "jcmd help" where jcmd is > executed on a host/node outside the container, > while a targeted JVM is running inside a container > - factors out some common functionality to DockerTestUtils and > docker.Common > > Please note: > - the "jcmd -l" works in this configuration, however the JCMD's and > Target's username and UID have to match > (per design) > - the "jcmd help", "jcmd JFR.start" or any other JCMD command besides > "jcmd -l" does not work in this configuration > (Filed "JDK-8228343: JCMD and attach fail to work across Linux > Container boundary") > The test case is commented out, however can be used for reproducing > the issue, and will be enabled > once the bug is fixed. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8195809 > Webrev: http://cr.openjdk.java.net/~mseledtsov/8195809.00/ + // find process ID from JCMD output + public static long findPidFromJcmdOutput(OutputAnalyzer out, String name) throws Exception { + List<String> l = out.asLines() + .stream() + .filter(s -> s.contains(name)) + .collect(Collectors.toList()); + if (l.isEmpty()) { + throw new RuntimeException("Could not find matching process"); + } + String psInfo = l.get(0); This seems to assume there is exactly one matching line. In that case, I'd suggest to amend the "l.isEmpty()" condition to: if (l.isEmpty() || l.size() > 1) ...
public static void - buildDockerImage(String imageName, Path dockerfile, Path buildDir) throws Exception { + buildDockerImage(String imageName, Path dockerfile, + Path buildDir, String additionalDockerfileContent) throws Exception { generateDockerFile(buildDir.resolve("Dockerfile"), DockerfileConfig.getBaseImageName(), - DockerfileConfig.getBaseImageVersion()); + DockerfileConfig.getBaseImageVersion(), + additionalDockerfileContent); try { // Build the docker execute(Container.ENGINE_COMMAND, "build", "--no-cache", "--tag", imageName, buildDir.toString()) - .shouldHaveExitValue(0); + .shouldHaveExitValue(0) + .shouldContain("Successfully built"); If you assert "Successfully built" in the output it'll break podman support. See https://hg.openjdk.java.net/jdk/jdk/rev/709913d8ace9#l4.38 TestJcmd: + // In order for jcmd to work, the USER name and UID of the observer + // need to match the USERNAME/UID of the observed JVM process + String additionalDockerFileContent = + String.format("RUN useradd %s --uid %d \n", getCurrentUserName(), getCurrentUserId()) + + String.format("USER %s \n", getCurrentUserName()); Perhaps the test should have a more telling class name. Perhaps TestJcmdMatchingUsernames. Failing that, add this info to the @summary tag of the test? TestJcmdWithSideCar: 80 // JCMD does not work in sidecar configuration, except for "jcmd -l". 81 // Including this test case to assist in reproduction of the problem. 82 // t.assertIsAlive(); 83 // testCase03(mainProcPid); Shouldn't this refer to JDK-8228343 as well? I believe a fix for JDK-8228343 should cover both test cases.
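Severin's exactly-one-match suggestion is sketched below in C++, purely for illustration. The real helper is the Java `findPidFromJcmdOutput` in the webrev; `find_pid_from_jcmd_output` is a hypothetical name. The sketch assumes each `jcmd -l` line starts with the PID, followed by the main class or JAR.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the PID from the single `jcmd -l` output line that mentions `name`.
// Zero matches or more than one match is treated as an error, so the caller
// cannot silently pick the wrong process.
long find_pid_from_jcmd_output(const std::string& output, const std::string& name) {
  assert(!name.empty());  // an empty name would match every line
  std::istringstream lines(output);
  std::string line;
  std::vector<std::string> matches;
  while (std::getline(lines, line)) {
    if (line.find(name) != std::string::npos) matches.push_back(line);
  }
  if (matches.size() != 1) {
    throw std::runtime_error("expected exactly one matching process, found " +
                             std::to_string(matches.size()));
  }
  // Assumed line format: "<pid> <main class or jar> [args...]"
  std::istringstream fields(matches.front());
  long pid = 0;
  if (!(fields >> pid)) {
    throw std::runtime_error("no PID at start of line: " + matches.front());
  }
  return pid;
}
```

Failing loudly on multiple matches matters here because `jcmd -l` also lists the `jcmd` process itself, so a loose `name` could match more than the intended target.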
Thanks, Severin > Testing: > - ran the new test multiple times on Linux-x64 > - ran TestJCMDWithSideCar multiple times on Linux-x64 > - ran all Docker/Container tests (HotSpot and JDK) > All PASS > > Thank you, > Misha > From goetz.lindenmaier at sap.com Tue Jul 30 11:06:12 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 30 Jul 2019 11:06:12 +0000 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: <7f16ceb9-db4c-09cc-bb78-d43163d0e648@oracle.com> References: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> <60ef3524-ca99-4ea8-c3f2-aca642160620@kippdata.de> <73b6c6d9-292e-e889-db08-f4001743d18b@oracle.com> <7f16ceb9-db4c-09cc-bb78-d43163d0e648@oracle.com> Message-ID: Hi, there is already -XX:ExtensiveErrorReports with default 'false'. It's supposed to guard additional infos in the hs_err file. As it's already available, no CSR should be needed. Can't we just use this? Below tiny fix should do the job. diff -r 144585063bc8 src/hotspot/share/utilities/virtualizationSupport.cpp --- a/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 11:14:16 2019 +0800 +++ b/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 13:04:58 2019 +0200 @@ -40,6 +40,9 @@ static char extended_resource_info_at_startup[600]; void VirtualizationSupport::initialize() { + + if (!ExtensiveErrorReports) return; + // open vmguestlib and bind SDK functions char ebuf[1024]; dlHandle = os::dll_load("vmGuestLib", ebuf, sizeof ebuf); Best regards, Goetz. > -----Original Message----- > From: hotspot-runtime-dev > On Behalf Of David Holmes > Sent: Tuesday, 30 July 2019 10:39 > To: Baesken, Matthias ; Rainer Jung > ; hotspot-runtime-dev at openjdk.java.net > Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > > Hi Matthias, > > On 30/07/2019 6:18 pm, Baesken, Matthias wrote: > > Hi David, in our proprietary JVM we have an XX flag to enable/disable > the usage of the guestlib for people who don't want it .
> > Should I go for this ? > > We can look at that for 14 but for 13 (and 11.0.5) I think we just need > to back this out. > > Thanks, > David > > > > > Best regards, Matthias > > > > > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Tuesday, 30 July 2019 09:51 > >> To: Rainer Jung ; hotspot-runtime- > >> dev at openjdk.java.net; Baesken, Matthias > >> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > >> > >> Hi Rainer, > >> > >> I have filed: > >> > >> https://bugs.openjdk.java.net/browse/JDK-8228764 > >> > >> Matthias: I think we may have to backout JDK-8222720 from JDK 13, > >> re-examine this and re-do for 14. > >> > >> Thanks, > >> David > >> ----- > >> > >> On 30/07/2019 5:34 pm, Rainer Jung wrote: > >>> Hi David, > >>> > >>> On 30.07.2019 at 01:56, David Holmes wrote: > >>>> Hi Rainer, > >>>> > >>>> On 30/07/2019 7:34 am, Rainer Jung wrote: > >>>>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 > >>>>> and 14 EA have a lot of new runtime library dependencies. > >>>>> > >>>>> Change fb5b3981eac with log > >>>>> > >>>>> 8222720: Provide extended VMWare/vSphere virtualization related info > >>>>> in the hs_error file on linux/windows x86_64 > >>>>> > >>>>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That > >>>>> library depends on /usr/lib64/libvmtools.so.0, which in turn depends > >>>>> on a lot of other libraries: > >>>>> > >>>>> NEEDED libdnet.so.1 > >>>>> NEEDED libglib-2.0.so.0 > >>>>> NEEDED libicui18n.so.52.1 > >>>>> NEEDED libicuuc.so.52.1 > >>>>> NEEDED libpthread.so.0 > >>>>> NEEDED libdl.so.2 > >>>>> NEEDED libssl.so.1.0.0 > >>>>> NEEDED libcrypto.so.1.0.0 > >>>>> NEEDED libc.so.6 > >>>>> NEEDED ld-linux-x86-64.so.2 > >>>>> NEEDED
libgcc_s.so.1 > >>>>> > >>>>> Some are not so problematic, but for instance Tomcat is able to use > >>>>> custom build OpenSSL libraries to replace the JSSE crypto engine with > >>>>> an OpenSSL based one using JNI. Unfortunately the JDK is now loading > >>>>> libssl and libcrypto early. In case our TC OpenSSL also uses SO > >>>>> version 1.0.0 it will not get loaded, in case it is another version > >>>>> we can run into a mix of symbols resolved in the platform OpenSSL > >>>>> libs now loaded early and the ones provided with TC loaded later. > >>>>> > >>>>> This is an example, why it would be good to not introduce too many > >>>>> native library dependencies for the JVM or make it optional in the > >>>>> sense of configurable during runtime. Of the above list, the icu > >>>>> libs, libglib and libdnet are other libs one would probably try to > >>>>> avoid. > >>>>> > >>>>> Don't know whether this list is appropriate for discussing it. If not > >>>>> any pointers to a better list are appreciated. > >>>> > >>>> This is the correct list to discuss this. > >>>> > >>>> When 8222720 was put in I had no idea it would result in eager loading > >>>> of libraries beyond the explicit load of libguestlib. > >>>> > >>>> To be clear you are running under VMWare? This should only happen to > >>>> enable reporting for the VMWare virtualization info in case of a crash. > >>> > >>> Yes, I am running under VMWare. The library /usr/lib64/libguestlib.so.0 > >>> and its dependency /usr/lib64/libvmtools.so.0 both belong to the package > >>> libvmtools0. Its sources seem to be available at > >>> https://github.com/vmware/open-vm-tools. > >>> > >>>> This may need to be revisited. > >>>> > >>>> Thanks for the report. > >>> > >>> Thanks for looking at this! 
> >>> > >>> Rainer > >>> From david.holmes at oracle.com Tue Jul 30 11:51:09 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jul 2019 21:51:09 +1000 Subject: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: References: <86843c72-04ac-28ac-4b92-b9227d7ee649@oracle.com> <60ef3524-ca99-4ea8-c3f2-aca642160620@kippdata.de> <73b6c6d9-292e-e889-db08-f4001743d18b@oracle.com> <7f16ceb9-db4c-09cc-bb78-d43163d0e648@oracle.com> Message-ID: <9d637309-c569-b35c-8cd9-62d48f521e7b@oracle.com> Hi Goetz, On 30/07/2019 9:06 pm, Lindenmaier, Goetz wrote: > Hi, > > there is already -XX:ExtensiveErrorReports with default 'false'. > It's supposed to guard additional infos in the hs_err file. > As it's already available, no CSR should be needed. > > Can't we just use this? Below tiny fix should do the job. > > diff -r 144585063bc8 src/hotspot/share/utilities/virtualizationSupport.cpp > --- a/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 11:14:16 2019 +0800 > +++ b/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 13:04:58 2019 +0200 > @@ -40,6 +40,9 @@ > static char extended_resource_info_at_startup[600]; > > void VirtualizationSupport::initialize() { > + > + if (!ExtensiveErrorReports) return; > + > // open vmguestlib and bind SDK functions > char ebuf[1024]; > dlHandle = os::dll_load("vmGuestLib", ebuf, sizeof ebuf); That seems quite reasonable to me - this is extended error information. Great suggestion! Thanks, David > Best regards, > Goetz. > >> -----Original Message----- >> From: hotspot-runtime-dev >> On Behalf Of David Holmes >> Sent: Dienstag, 30. 
Juli 2019 10:39 >> To: Baesken, Matthias ; Rainer Jung >> ; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) >> >> Hi Matthias, >> >> On 30/07/2019 6:18 pm, Baesken, Matthias wrote: >>> Hi David, in our proprietary JVM we have an XX flag to enable/disable >> the usage of the guestlib for people who don't want it . >>> Should I go for this ? >> >> We can look at that for 14 but for 13 (and 11.0.5) I think we just need >> to back this out. >> >> Thanks, >> David >> >>> >>> Best regards, Matthias >>> >>> >>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Tuesday, 30 July 2019 09:51 >>>> To: Rainer Jung ; hotspot-runtime- >>>> dev at openjdk.java.net; Baesken, Matthias >>>> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) >>>> >>>> Hi Rainer, >>>> >>>> I have filed: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8228764 >>>> >>>> Matthias: I think we may have to backout JDK-8222720 from JDK 13, >>>> re-examine this and re-do for 14. >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>> On 30/07/2019 5:34 pm, Rainer Jung wrote: >>>>> Hi David, >>>>> >>>>> On 30.07.2019 at 01:56, David Holmes wrote: >>>>>> Hi Rainer, >>>>>> >>>>>> On 30/07/2019 7:34 am, Rainer Jung wrote: >>>>>>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 >>>>>>> and 14 EA have a lot of new runtime library dependencies. >>>>>>> >>>>>>> Change fb5b3981eac with log >>>>>>> >>>>>>> 8222720: Provide extended VMWare/vSphere virtualization related info >>>>>>> in the hs_error file on linux/windows x86_64 >>>>>>> >>>>>>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That >>>>>>> library depends on /usr/lib64/libvmtools.so.0, which in turn depends >>>>>>> on a lot of other libraries: >>>>>>> >>>>>>> NEEDED libdnet.so.1 >>>>>>> NEEDED libglib-2.0.so.0 >>>>>>> NEEDED libicui18n.so.52.1 >>>>>>> NEEDED
libicuuc.so.52.1 >>>>>>> NEEDED libpthread.so.0 >>>>>>> NEEDED libdl.so.2 >>>>>>> NEEDED libssl.so.1.0.0 >>>>>>> NEEDED libcrypto.so.1.0.0 >>>>>>> NEEDED libc.so.6 >>>>>>> NEEDED ld-linux-x86-64.so.2 >>>>>>> NEEDED libgcc_s.so.1 >>>>>>> >>>>>>> Some are not so problematic, but for instance Tomcat is able to use >>>>>>> custom build OpenSSL libraries to replace the JSSE crypto engine with >>>>>>> an OpenSSL based one using JNI. Unfortunately the JDK is now loading >>>>>>> libssl and libcrypto early. In case our TC OpenSSL also uses SO >>>>>>> version 1.0.0 it will not get loaded, in case it is another version >>>>>>> we can run into a mix of symbols resolved in the platform OpenSSL >>>>>>> libs now loaded early and the ones provided with TC loaded later. >>>>>>> >>>>>>> This is an example, why it would be good to not introduce too many >>>>>>> native library dependencies for the JVM or make it optional in the >>>>>>> sense of configurable during runtime. Of the above list, the icu >>>>>>> libs, libglib and libdnet are other libs one would probably try to >>>>>>> avoid. >>>>>>> >>>>>>> Don't know whether this list is appropriate for discussing it. If not >>>>>>> any pointers to a better list are appreciated. >>>>>> >>>>>> This is the correct list to discuss this. >>>>>> >>>>>> When 8222720 was put in I had no idea it would result in eager loading >>>>>> of libraries beyond the explicit load of libguestlib. >>>>>> >>>>>> To be clear you are running under VMWare? This should only happen to >>>>>> enable reporting for the VMWare virtualization info in case of a crash. >>>>> >>>>> Yes, I am running under VMWare. The library /usr/lib64/libguestlib.so.0 >>>>> and its dependency /usr/lib64/libvmtools.so.0 both belong to the package >>>>> libvmtools0. Its sources seem to be available at >>>>> https://github.com/vmware/open-vm-tools.
>>>>> >>>>>> This may need to be revisited. >>>>>> >>>>>> Thanks for the report. >>>>> >>>>> Thanks for looking at this! >>>>> >>>>> Rainer >>>>> From goetz.lindenmaier at sap.com Tue Jul 30 12:29:08 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 30 Jul 2019 12:29:08 +0000 Subject: RFR(M): 8228743: [PPC64, s390] Implement FastJNIAccessors In-Reply-To: References: Message-ID: Hi Martin, overall, the change looks good to me. It's a bit confusing that the method with the implementation has _int_ as infix: generate_fast_get_int_field0 while it is used for all data types, but this is similar on other platforms. // order preceding load You might want to capitalize this, like the other comments. No webrev needed. Best regards, Goetz. > -----Original Message----- > From: Doerr, Martin > Sent: Montag, 29. Juli 2019 19:43 > To: hotspot-runtime-dev at openjdk.java.net; Lindenmaier, Goetz > ; Schmidt, Lutz ; > Gustavo Romero > Subject: RFR(M): 8228743: [PPC64, s390] Implement FastJNIAccessors > > Hi, > > > > I'd like to contribute fast JNI Get*Field platform implementations for PPC64 > and s390. > > > > Please review: > > http://cr.openjdk.java.net/~mdoerr/8228743_PPC64_s390_FastJNIAccessors/ > webrev.00/ > /webrev.00/> > > > > Best regards, > > Martin > > From harold.seigel at oracle.com Tue Jul 30 12:40:44 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Tue, 30 Jul 2019 08:40:44 -0400 Subject: RFR (XXXS) 8227250: UserHandler contains ancient LinuxThreads code In-Reply-To: <4c677b95-edcd-584b-6250-630422d20c71@oracle.com> References: <4c677b95-edcd-584b-6250-630422d20c71@oracle.com> Message-ID: <4e431cbe-53ba-ebf9-bf57-0598e0db41ff@oracle.com> This looks good! 
Thanks, Harold On 7/30/2019 1:09 AM, David Holmes wrote: > bug: https://bugs.openjdk.java.net/browse/JDK-8227250 > webrev: http://cr.openjdk.java.net/~dholmes/8227250/webrev/ > > Removed some ancient Linux code that pertained to the LinuxThreads > implementation, and which was erroneously copied into the BSD and AIX > ports. > > Thanks, > David From goetz.lindenmaier at sap.com Tue Jul 30 12:52:14 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 30 Jul 2019 12:52:14 +0000 Subject: RFR(S): 8228649: [PPC64] SA reads wrong slots from interpreter frames In-Reply-To: References: Message-ID: Hi Martin, I had a look at your change. Thanks for explaining that you had to remove the extra slot in the debug build so that the stack looks the same in both builds offline. It probably could be distinguished in the build of SA, too, but this way it's probably more stable. Also, I don't remember we ever ran into this stack guard. You might want to describe in the bug how you have fixed this. Did you check that all the StackOverflow tests still work in the debug build? Like those that depend that a certain frame causes the overflow? With smaller frames, this might affect the tests. In the ProblemList, please remove the bugids of 8211767. @Gustavo, I don't think closing 8211767 is a good idea. It is still used for a row of excluded SA tests in the ProblemList, see also the comment further down in the bug. Best regards, Goetz. > -----Original Message----- > From: Doerr, Martin > Sent: Freitag, 26. Juli 2019 13:02 > To: serviceability-dev at openjdk.java.net; hotspot-runtime- > dev at openjdk.java.net; Lindenmaier, Goetz ; > Gustavo Romero > Subject: RFR(S): 8228649: [PPC64] SA reads wrong slots from interpreter > frames > > Hi, > > > > the jtreg test "serviceability/sa/sadebugd/DebugdConnectTest.java" fails with > "AssertionFailure: result must >= than stack pointer" on PPC64. > > The Java code doesn't read the right slots for the interpreter frame's monitors. 
> > > > I've removed the extra "reserved" slot which existed only in debug builds. It > was used as additional frame check, but I think we can live without it. > > My new proposal also initializes all relevant interpreter frame slots for better > tool support: > > http://cr.openjdk.java.net/~mdoerr/8228649_PPC64_sa/webrev.00/ > > > > Please review. > > > > Best regards, > > Martin > > From matthias.baesken at sap.com Tue Jul 30 14:21:09 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 30 Jul 2019 14:21:09 +0000 Subject: RFR: 8228764: New library dependencies due to JDK-8222720 - was: New library dependencies due to 8222720 (fb5b3981eac) Message-ID: Hello , I prepared a webrev following the idea proposed by Goetz ; please review ! http://cr.openjdk.java.net/~mbaesken/webrevs/8228764.0/ bug opened by David : https://bugs.openjdk.java.net/browse/JDK-8228764 Best regards, Matthias > -----Original Message----- > From: David Holmes > Sent: Dienstag, 30. Juli 2019 13:51 > To: Lindenmaier, Goetz ; Baesken, Matthias > ; Rainer Jung ; > hotspot-runtime-dev at openjdk.java.net > Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > > Hi Goetz, > > On 30/07/2019 9:06 pm, Lindenmaier, Goetz wrote: > > Hi, > > > > there is already -XX:ExtensiveErrorReports with default 'false'. > > It's supposed to guard additional infos in the hs_err file. > > As it's already available, no CSR should be needed. > > > > Can't we just use this? Below tiny fix should do the job. 
> > > > diff -r 144585063bc8 src/hotspot/share/utilities/virtualizationSupport.cpp > > --- a/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 > 11:14:16 2019 +0800 > > +++ b/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 > 13:04:58 2019 +0200 > > @@ -40,6 +40,9 @@ > > static char extended_resource_info_at_startup[600]; > > > > void VirtualizationSupport::initialize() { > > + > > + if (!ExtensiveErrorReports) return; > > + > > // open vmguestlib and bind SDK functions > > char ebuf[1024]; > > dlHandle = os::dll_load("vmGuestLib", ebuf, sizeof ebuf); > > That seems quite reasonable to me - this is extended error information. > > Great suggestion! > > Thanks, > David > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: hotspot-runtime-dev bounces at openjdk.java.net> > >> On Behalf Of David Holmes > >> Sent: Dienstag, 30. Juli 2019 10:39 > >> To: Baesken, Matthias ; Rainer Jung > >> ; hotspot-runtime-dev at openjdk.java.net > >> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > >> > >> Hi Matthias, > >> > >> On 30/07/2019 6:18 pm, Baesken, Matthias wrote: > >>> Hi David, in our proprietary JVM we have an XX flag to > enable/disable > >> the usage of the guestlib for people who don't want it . > >>> Should I go for this ? > >> > >> We can look at that for 14 but for 13 (and 11.0.5) I think we just need > >> to back this out. > >> > >> Thanks, > >> David > >> > >>> > >>> Best regards, Matthias > >>> > >>> > >>> > >>>> -----Original Message----- > >>>> From: David Holmes > >>>> Sent: Dienstag, 30. 
Juli 2019 09:51 > >>>> To: Rainer Jung ; hotspot-runtime- > >>>> dev at openjdk.java.net; Baesken, Matthias > > >>>> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > >>>> > >>>> Hi Rainer, > >>>> > >>>> I have filed: > >>>> > >>>> https://bugs.openjdk.java.net/browse/JDK-8228764 > >>>> > >>>> Matthias: I think we may have to backout JDK-8222720 from JDK 13, > >>>> re-examine this and re-do for 14. > >>>> > >>>> Thanks, > >>>> David > >>>> ----- > >>>> > >>>> On 30/07/2019 5:34 pm, Rainer Jung wrote: > >>>>> Hi David, > >>>>> > >>>>> On 30.07.2019 at 01:56, David Holmes wrote: > >>>>>> Hi Rainer, > >>>>>> > >>>>>> On 30/07/2019 7:34 am, Rainer Jung wrote: > >>>>>>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 > >>>>>>> and 14 EA have a lot of new runtime library dependencies. > >>>>>>> > >>>>>>> Change fb5b3981eac with log > >>>>>>> > >>>>>>> 8222720: Provide extended VMWare/vSphere virtualization related > info > >>>>>>> in the hs_error file on linux/windows x86_64 > >>>>>>> > >>>>>>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That > >>>>>>> library depends on /usr/lib64/libvmtools.so.0, which in turn > depends > >>>>>>> on a lot of other libraries: > >>>>>>> > >>>>>>> NEEDED libdnet.so.1 > >>>>>>> NEEDED libglib-2.0.so.0 > >>>>>>> NEEDED libicui18n.so.52.1 > >>>>>>> NEEDED libicuuc.so.52.1 > >>>>>>> NEEDED libpthread.so.0 > >>>>>>> NEEDED libdl.so.2 > >>>>>>> NEEDED libssl.so.1.0.0 > >>>>>>> NEEDED libcrypto.so.1.0.0 > >>>>>>> NEEDED libc.so.6 > >>>>>>> NEEDED ld-linux-x86-64.so.2 > >>>>>>> NEEDED libgcc_s.so.1 > >>>>>>> > >>>>>>> Some are not so problematic, but for instance Tomcat is able to use > >>>>>>> custom build OpenSSL libraries to replace the JSSE crypto engine > with > >>>>>>> an OpenSSL based one using JNI.
Unfortunately the JDK is now > loading > >>>>>>> libssl and libcrypto early. In case our TC OpenSSL also uses SO > >>>>>>> version 1.0.0 it will not get loaded, in case it is another version > >>>>>>> we can run into a mix of symbols resolved in the platform OpenSSL > >>>>>>> libs now loaded early and the ones provided with TC loaded later. > >>>>>>> > >>>>>>> This is an example, why it would be good to not introduce too many > >>>>>>> native library dependencies for the JVM or make it optional in the > >>>>>>> sense of configurable during runtime. Of the above list, the icu > >>>>>>> libs, libglib and libdnet are other libs one would probably try to > >>>>>>> avoid. > >>>>>>> > >>>>>>> Don't know whether this list is appropriate for discussing it. If not > >>>>>>> any pointers to a better list are appreciated. > >>>>>> > >>>>>> This is the correct list to discuss this. > >>>>>> > >>>>>> When 8222720 was put in I had no idea it would result in eager > loading > >>>>>> of libraries beyond the explicit load of libguestlib. > >>>>>> > >>>>>> To be clear you are running under VMWare? This should only > happen to > >>>>>> enable reporting for the VMWare virtualization info in case of a > crash. > >>>>> > >>>>> Yes, I am running under VMWare. The library > /usr/lib64/libguestlib.so.0 > >>>>> and its dependency /usr/lib64/libvmtools.so.0 both belong to the > package > >>>>> libvmtools0. Its sources seem to be available at > >>>>> https://github.com/vmware/open-vm-tools. > >>>>> > >>>>>> This may need to be revisited. > >>>>>> > >>>>>> Thanks for the report. > >>>>> > >>>>> Thanks for looking at this! 
> >>>>> > >>>>> Rainer > >>>>> From goetz.lindenmaier at sap.com Tue Jul 30 14:26:05 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 30 Jul 2019 14:26:05 +0000 Subject: 8228764: New library dependencies due to JDK-8222720 - was: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: References: Message-ID: Hi Matthias, Thanks for picking up my proposal! This looks good to me. Best regards, Goetz. > -----Original Message----- > From: Baesken, Matthias > Sent: Dienstag, 30. Juli 2019 16:21 > To: David Holmes ; Lindenmaier, Goetz > ; Rainer Jung ; > hotspot-runtime-dev at openjdk.java.net > Cc: Doerr, Martin > Subject: RFR: 8228764: New library dependencies due to JDK-8222720 - was: > New library dependencies due to 8222720 (fb5b3981eac) > > Hello , I prepared a webrev following the idea proposed by Goetz ; please > review ! > > http://cr.openjdk.java.net/~mbaesken/webrevs/8228764.0/ > > bug opened by David : > > https://bugs.openjdk.java.net/browse/JDK-8228764 > > > Best regards, Matthias > > > > -----Original Message----- > > From: David Holmes > > Sent: Dienstag, 30. Juli 2019 13:51 > > To: Lindenmaier, Goetz ; Baesken, Matthias > > ; Rainer Jung ; > > hotspot-runtime-dev at openjdk.java.net > > Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > > > > Hi Goetz, > > > > On 30/07/2019 9:06 pm, Lindenmaier, Goetz wrote: > > > Hi, > > > > > > there is already -XX:ExtensiveErrorReports with default 'false'. > > > It's supposed to guard additional infos in the hs_err file. > > > As it's already available, no CSR should be needed. > > > > > > Can't we just use this? Below tiny fix should do the job. 
> > > > > > diff -r 144585063bc8 src/hotspot/share/utilities/virtualizationSupport.cpp > > > --- a/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 > > 11:14:16 2019 +0800 > > > +++ b/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 > > 13:04:58 2019 +0200 > > > @@ -40,6 +40,9 @@ > > > static char extended_resource_info_at_startup[600]; > > > > > > void VirtualizationSupport::initialize() { > > > + > > > + if (!ExtensiveErrorReports) return; > > > + > > > // open vmguestlib and bind SDK functions > > > char ebuf[1024]; > > > dlHandle = os::dll_load("vmGuestLib", ebuf, sizeof ebuf); > > > > That seems quite reasonable to me - this is extended error information. > > > > Great suggestion! > > > > Thanks, > > David > > > > > Best regards, > > > Goetz. > > > > > >> -----Original Message----- > > >> From: hotspot-runtime-dev > bounces at openjdk.java.net> > > >> On Behalf Of David Holmes > > >> Sent: Dienstag, 30. Juli 2019 10:39 > > >> To: Baesken, Matthias ; Rainer Jung > > >> ; hotspot-runtime-dev at openjdk.java.net > > >> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > > >> > > >> Hi Matthias, > > >> > > >> On 30/07/2019 6:18 pm, Baesken, Matthias wrote: > > >>> Hi David, in our proprietary JVM we have an XX flag to > > enable/disable > > >> the usage of the guestlib for people who don't want it . > > >>> Should I go for this ? > > >> > > >> We can look at that for 14 but for 13 (and 11.0.5) I think we just need > > >> to back this out. > > >> > > >> Thanks, > > >> David > > >> > > >>> > > >>> Best regards, Matthias > > >>> > > >>> > > >>> > > >>>> -----Original Message----- > > >>>> From: David Holmes > > >>>> Sent: Dienstag, 30. 
Juli 2019 09:51 > > >>>> To: Rainer Jung ; hotspot-runtime- > > >>>> dev at openjdk.java.net; Baesken, Matthias > > > > >>>> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > > >>>> > > >>>> Hi Rainer, > > >>>> > > >>>> I have filed: > > >>>> > > >>>> https://bugs.openjdk.java.net/browse/JDK-8228764 > > >>>> > > >>>> Matthias: I think we may have to backout JDK-8222720 from JDK 13, > > >>>> re-examine this and re-do for 14. > > >>>> > > >>>> Thanks, > > >>>> David > > >>>> ----- > > >>>> > > >>>> On 30/07/2019 5:34 pm, Rainer Jung wrote: > > >>>>> Hi David, > > >>>>> > > >>>>> On 30.07.2019 at 01:56, David Holmes wrote: > > >>>>>> Hi Rainer, > > >>>>>> > > >>>>>> On 30/07/2019 7:34 am, Rainer Jung wrote: > > >>>>>>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK 13 > > >>>>>>> and 14 EA have a lot of new runtime library dependencies. > > >>>>>>> > > >>>>>>> Change fb5b3981eac with log > > >>>>>>> > > >>>>>>> 8222720: Provide extended VMWare/vSphere virtualization related > > info > > >>>>>>> in the hs_error file on linux/windows x86_64 > > >>>>>>> > > >>>>>>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That > > >>>>>>> library depends on /usr/lib64/libvmtools.so.0, which in turn > > depends > > >>>>>>> on a lot of other libraries: > > >>>>>>> > > >>>>>>> NEEDED libdnet.so.1 > > >>>>>>> NEEDED libglib-2.0.so.0 > > >>>>>>> NEEDED libicui18n.so.52.1 > > >>>>>>> NEEDED libicuuc.so.52.1 > > >>>>>>> NEEDED libpthread.so.0 > > >>>>>>> NEEDED libdl.so.2 > > >>>>>>> NEEDED libssl.so.1.0.0 > > >>>>>>> NEEDED libcrypto.so.1.0.0 > > >>>>>>> NEEDED libc.so.6 > > >>>>>>> NEEDED ld-linux-x86-64.so.2 > > >>>>>>> NEEDED
libgcc_s.so.1 > > >>>>>>> > > >>>>>>> Some are not so problematic, but for instance Tomcat is able to use > > >>>>>>> custom build OpenSSL libraries to replace the JSSE crypto engine > > with > > >>>>>>> an OpenSSL based one using JNI. Unfortunately the JDK is now > > loading > > >>>>>>> libssl and libcrypto early. In case our TC OpenSSL also uses SO > > >>>>>>> version 1.0.0 it will not get loaded, in case it is another version > > >>>>>>> we can run into a mix of symbols resolved in the platform OpenSSL > > >>>>>>> libs now loaded early and the ones provided with TC loaded later. > > >>>>>>> > > >>>>>>> This is an example, why it would be good to not introduce too many > > >>>>>>> native library dependencies for the JVM or make it optional in the > > >>>>>>> sense of configurable during runtime. Of the above list, the icu > > >>>>>>> libs, libglib and libdnet are other libs one would probably try to > > >>>>>>> avoid. > > >>>>>>> > > >>>>>>> Don't know whether this list is appropriate for discussing it. If not > > >>>>>>> any pointers to a better list are appreciated. > > >>>>>> > > >>>>>> This is the correct list to discuss this. > > >>>>>> > > >>>>>> When 8222720 was put in I had no idea it would result in eager > > loading > > >>>>>> of libraries beyond the explicit load of libguestlib. > > >>>>>> > > >>>>>> To be clear you are running under VMWare? This should only > > happen to > > >>>>>> enable reporting for the VMWare virtualization info in case of a > > crash. > > >>>>> > > >>>>> Yes, I am running under VMWare. The library > > /usr/lib64/libguestlib.so.0 > > >>>>> and its dependency /usr/lib64/libvmtools.so.0 both belong to the > > package > > >>>>> libvmtools0. Its sources seem to be available at > > >>>>> https://github.com/vmware/open-vm-tools. > > >>>>> > > >>>>>> This may need to be revisited. > > >>>>>> > > >>>>>> Thanks for the report. > > >>>>> > > >>>>> Thanks for looking at this! 
> > >>>>> > > >>>>> Rainer > > >>>>> From martin.doerr at sap.com Tue Jul 30 14:36:09 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 30 Jul 2019 14:36:09 +0000 Subject: RFR(S): 8228649: [PPC64] SA reads wrong slots from interpreter frames In-Reply-To: References: Message-ID: Hi Götz, thanks for the review. > You might want to describe in the bug how you have fixed this. Done. > Did you check that all the StackOverflow tests still work in the > debug build? Yes, tests have run several times and no new issues have shown up. > In the ProblemList, please remove the bugids of 8211767. Done. Best regards, Martin From bob.vandette at oracle.com Tue Jul 30 15:12:26 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 30 Jul 2019 11:12:26 -0400 Subject: RFR(S): 8195809: [TESTBUG] jps and jcmd -l support for Docker containers is not tested In-Reply-To: References: Message-ID: <9D3D61EC-9A10-4A18-9102-6B95B7CF2038@oracle.com> http://cr.openjdk.java.net/~mseledtsov/8195809.00/test/lib/jdk/test/lib/containers/docker/Common.java.sdiff.html In findPidFromJcmdOutput, what if there are multiple processes with the same name running? Isn't it possible that multiple different test runs are occurring on the same host? This might cause intermittent failures. A safer alternative might be to get the container id and then run docker inspect. % docker inspect -f '{{.State.Pid}}' cfc6fea3d152 4999 You could also create a random number and pass that as an argument to SimpleLoop and then scan the matching processes for that argument. I assume you'll be cleaning or commenting out the debugging output in all files. Bob.
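Bob's second suggestion, passing a random token to SimpleLoop and then scanning the `jcmd -l` listing for it, can be sketched roughly as follows. This is a C++ illustration only: the real helper (findPidFromJcmdOutput in the test library's Common.java) is Java and is not reproduced here, the names find_pid_by_marker and make_marker are hypothetical, and the listing is assumed to have one `<pid> <main class> [args...]` entry per line.

```cpp
#include <cassert>
#include <random>
#include <sstream>
#include <string>

// Hypothetical: build a per-run token to pass to the target JVM as an extra
// program argument, so concurrent test runs on one host do not collide.
std::string make_marker() {
  std::random_device rd;
  return "marker-" + std::to_string(rd());
}

// Hypothetical: scan output assumed to look like "jcmd -l" (one
// "<pid> <main class> [args...]" entry per line) and return the pid of the
// only process whose arguments contain `marker`. Returns -1 if no line
// matches or if several lines match, so a caller never picks the wrong JVM.
long find_pid_by_marker(const std::string& jcmd_output, const std::string& marker) {
  assert(!marker.empty() && "marker must be a non-empty unique token");
  std::istringstream lines(jcmd_output);
  std::string line;
  long found = -1;
  int matches = 0;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    long pid = 0;
    if (!(fields >> pid)) {
      continue;  // not a "<pid> ..." line
    }
    std::string token;
    while (fields >> token) {
      if (token == marker) {  // exact token match: "marker-4" != "marker-42"
        found = pid;
        ++matches;
        break;
      }
    }
  }
  return matches == 1 ? found : -1;
}
```

Treating more than one match as "not found" is a deliberate choice in this sketch. A token collision then causes an explicit failure rather than an intermittent wrong pid, which is the failure mode Bob is worried about.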
> On Jul 29, 2019, at 11:46 PM, mikhailo.seledtsov at oracle.com wrote: > > Please review this change that: > - adds test case for "jcmd -l" and "jcmd help" where jcmd is executed on a host/node outside the container, > while a targeted JVM is running inside a container > - factors out some common functionality to DockerTestUtils and docker.Common > > Please note: > - the "jcmd -l" works in this configuration, however the JCMD's and Target's username and UID have to match > (per design) > - the "jcmd help", "jcmd JFR.start" or any other JCMD command besides "jcmd -l" does not work in this configuration > (Filed "JDK-8228343: JCMD and attach fail to work across Linux Container boundary") > The test case is commented out, however can be used for reproducing the issue, and will be enabled > once the bug is fixed. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8195809 > Webrev: http://cr.openjdk.java.net/~mseledtsov/8195809.00/ > Testing: > - ran the new test multiple times on Linux-x64 > - ran TestJCMDWithSideCar multiple times on Linux-x64 > - ran all Docker/Container tests (HotSpot and JDK) > All PASS > > Thank you, > Misha > From lutz.schmidt at sap.com Tue Jul 30 15:45:58 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 30 Jul 2019 15:45:58 +0000 Subject: RFR(M): 8228743: [PPC64, s390] Implement FastJNIAccessors In-Reply-To: References: Message-ID: <6EDF694A-01B1-465E-9353-ED08B6544278@sap.com> Hi Martin, your change looks good to me. Please note that I'm NOT a Reviewer. Maybe a comment would be helpful at the call sites that JNI_FastGetField::find_slowcase_pc(pc) not only finds the slowcase_pc, but also decides if the signal at hand is related to a FastGetField access. Thanks, Lutz On 30.07.19, 14:29, "Lindenmaier, Goetz" wrote: Hi Martin, overall, the change looks good to me.
It's a bit confusing that the method with the implementation has _int_ as infix: generate_fast_get_int_field0 while it is used for all data types, but this is similar on other platforms. // order preceding load You might want to capitalize this, like the other comments. No webrev needed. Best regards, Goetz. > -----Original Message----- > From: Doerr, Martin > Sent: Monday, 29 July 2019 19:43 > To: hotspot-runtime-dev at openjdk.java.net; Lindenmaier, Goetz > ; Schmidt, Lutz ; > Gustavo Romero > Subject: RFR(M): 8228743: [PPC64, s390] Implement FastJNIAccessors > > Hi, > > > > I'd like to contribute fast JNI Get*Field platform implementations for PPC64 > and s390. > > > > Please review: > > http://cr.openjdk.java.net/~mdoerr/8228743_PPC64_s390_FastJNIAccessors/ > webrev.00/ > /webrev.00/> > > > > Best regards, > > Martin > > From martin.doerr at sap.com Tue Jul 30 15:51:08 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 30 Jul 2019 15:51:08 +0000 Subject: RFR(S): 8228649: [PPC64] SA reads wrong slots from interpreter frames In-Reply-To: References: Message-ID: Hi Gustavo, thanks for reviewing and for running the tests again. I have removed ClhsdbCDSJstackPrintAll.java and ClhsdbFindPC.java from the ProblemList. Tests have passed. Webrev is updated in place. I'll push it after some more testing time. Best regards, Martin From: Gustavo Bueno Romero Sent: Tuesday, 30 July 2019 15:09 To: hotspot-runtime-dev at openjdk.java.net; Doerr, Martin Cc: Lindenmaier, Goetz ; gustavo.romero at protonmail.com Subject: Re: RFR(S): 8228649: [PPC64] SA reads wrong slots from interpreter frames Hi Martin, Change looks good! In fact I see it fixed way more than just 'DebugdConnectTest.java'. On fastdebug, only two tests have issues now: 1. FAILED: serviceability/sa/TestInstanceKlassSize.java 2. Error: serviceability/sa/TestJmapCoreMetaspace.java On release, only one: 1. FAILED: serviceability/sa/TestInstanceKlassSize.java 1.
succeeds on exploded image files, though, and 2. looks to fail only in fastdebug due to added asserts related to barriers debugging. So I think we could remove the following two tests too from the problem list (besides the ones you already removed) $ egrep "ClhsdbCDSJstackPrintAll.java|ClhsdbFindPC.java" test/hotspot/jtreg/ProblemList.txt serviceability/sa/ClhsdbCDSJstackPrintAll.java 8193639,8211767 solaris-all,linux-ppc64le,linux-ppc64 serviceability/sa/ClhsdbFindPC.java 8193639,8211767 solaris-all,linux-ppc64le,linux-ppc64 since both pass on fastdebug and release builds. Would you mind removing them and testing the change again against SAP CI to double-check that it looks OK there too? (no new webrev needed) So, I wasn't smart enough to figure out: src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ppc64/PPC64Frame.java: - private static final int INTERPRETER_FRAME_INITIAL_SP_OFFSET = INTERPRETER_FRAME_BCX_OFFSET - 1; // FIXME: probably wrong, but unused anyway - private static final int INTERPRETER_FRAME_MONITOR_BLOCK_TOP_OFFSET = INTERPRETER_FRAME_INITIAL_SP_OFFSET; - private static final int INTERPRETER_FRAME_MONITOR_BLOCK_BOTTOM_OFFSET = INTERPRETER_FRAME_INITIAL_SP_OFFSET; + private static final int INTERPRETER_FRAME_MONITOR_BLOCK_BOTTOM_OFFSET = INTERPRETER_FRAME_METHOD_OFFSET - 1; - Address result = addressOfStackSlot(INTERPRETER_FRAME_MONITOR_BLOCK_TOP_OFFSET).getAddressAt(0); + Address result = addressOfStackSlot(INTERPRETER_FRAME_MONITORS_OFFSET).getAddressAt(0); when investigating JDK-8228649 [0]. Hence I closed [0] as a duplicate. The removal of the reserved slot intended for debugging looks good too. Thumbs up from my side. Thank you.
Best regards, Gustavo [0] https://bugs.openjdk.java.net/browse/JDK-8211767 -- For the records, the current SA status I see on PPC64 LE when this change is applied: ** fastdebug ** $ sudo JT_JAVA=/usr/lib/jvm/java-1.8.0-openjdk-ppc64el /home/gromero/jtreg/bin/jtreg -nativepath:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-server-fastdebug/support/test/hotspot/jtreg/native/lib -v1 -conc:1 -jdk:./build/linux-ppc64le-server-fastdebug/images/jdk/ ./test/hotspot/jtreg/serviceability/sa Passed: serviceability/sa/jmap-hprof/JMapHProfLargeHeapTest.java Passed: serviceability/sa/sadebugd/DebugdConnectTest.java Passed: serviceability/sa/sadebugd/SADebugDTest.java Passed: serviceability/sa/CDSJMapClstats.java Passed: serviceability/sa/ClhsdbAttach.java Passed: serviceability/sa/ClhsdbCDSCore.java Passed: serviceability/sa/ClhsdbCDSJstackPrintAll.java Passed: serviceability/sa/ClhsdbField.java Passed: serviceability/sa/ClhsdbFindPC.java Passed: serviceability/sa/ClhsdbFlags.java Passed: serviceability/sa/ClhsdbInspect.java Passed: serviceability/sa/ClhsdbJdis.java Passed: serviceability/sa/ClhsdbJhisto.java Passed: serviceability/sa/ClhsdbJstack.java Passed: serviceability/sa/ClhsdbLongConstant.java Passed: serviceability/sa/ClhsdbPmap.java Passed: serviceability/sa/ClhsdbPrintAll.java Passed: serviceability/sa/ClhsdbPrintAs.java Passed: serviceability/sa/ClhsdbPrintStatics.java Passed: serviceability/sa/ClhsdbPstack.java Passed: serviceability/sa/ClhsdbRegionDetailsScanOopsForG1.java Passed: serviceability/sa/ClhsdbScanOops.java Passed: serviceability/sa/ClhsdbSource.java Passed: serviceability/sa/ClhsdbThread.java Passed: serviceability/sa/ClhsdbVmStructsDump.java Passed: serviceability/sa/ClhsdbWhere.java Passed: serviceability/sa/DeadlockDetectionTest.java Passed: serviceability/sa/JhsdbThreadInfoTest.java Passed: serviceability/sa/TestClassDump.java Passed: serviceability/sa/TestClhsdbJstackLock.java Passed: serviceability/sa/TestCpoolForInvokeDynamic.java Passed: 
serviceability/sa/TestDefaultMethods.java Passed: serviceability/sa/TestG1HeapRegion.java Passed: serviceability/sa/TestHeapDumpForInvokeDynamic.java Passed: serviceability/sa/TestHeapDumpForLargeArray.java FAILED: serviceability/sa/TestInstanceKlassSize.java Passed: serviceability/sa/TestInstanceKlassSizeForInterface.java Passed: serviceability/sa/TestIntConstant.java Passed: serviceability/sa/TestJhsdbJstackLock.java Passed: serviceability/sa/TestJhsdbJstackMixed.java Passed: serviceability/sa/TestJmapCore.java Error: serviceability/sa/TestJmapCoreMetaspace.java Passed: serviceability/sa/TestPrintMdo.java Passed: serviceability/sa/TestRevPtrsForInvokeDynamic.java Passed: serviceability/sa/TestType.java Passed: serviceability/sa/TestUniverse.java Test results: passed: 44; failed: 1; error: 1 ** release ** $ sudo JT_JAVA=/usr/lib/jvm/java-1.8.0-openjdk-ppc64el /home/gromero/jtreg/bin/jtreg -nativepath:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-server-release/support/test/hotspot/jtreg/native/lib -v1 -conc:1 -jdk:./build/linux-ppc64le-server-release/images/jdk/ ./test/hotspot/jtreg/serviceability/sa Passed: serviceability/sa/jmap-hprof/JMapHProfLargeHeapTest.java Passed: serviceability/sa/sadebugd/DebugdConnectTest.java Passed: serviceability/sa/sadebugd/SADebugDTest.java Passed: serviceability/sa/CDSJMapClstats.java Passed: serviceability/sa/ClhsdbAttach.java Passed: serviceability/sa/ClhsdbCDSCore.java Passed: serviceability/sa/ClhsdbCDSJstackPrintAll.java Passed: serviceability/sa/ClhsdbField.java Passed: serviceability/sa/ClhsdbFindPC.java Passed: serviceability/sa/ClhsdbFlags.java Passed: serviceability/sa/ClhsdbInspect.java Passed: serviceability/sa/ClhsdbJdis.java Passed: serviceability/sa/ClhsdbJhisto.java Passed: serviceability/sa/ClhsdbJstack.java Passed: serviceability/sa/ClhsdbLongConstant.java Passed: serviceability/sa/ClhsdbPmap.java Passed: serviceability/sa/ClhsdbPrintAll.java Passed: serviceability/sa/ClhsdbPrintAs.java Passed: 
serviceability/sa/ClhsdbPrintStatics.java Passed: serviceability/sa/ClhsdbPstack.java Passed: serviceability/sa/ClhsdbRegionDetailsScanOopsForG1.java Passed: serviceability/sa/ClhsdbScanOops.java Passed: serviceability/sa/ClhsdbSource.java Passed: serviceability/sa/ClhsdbThread.java Passed: serviceability/sa/ClhsdbVmStructsDump.java Passed: serviceability/sa/ClhsdbWhere.java Passed: serviceability/sa/DeadlockDetectionTest.java Passed: serviceability/sa/JhsdbThreadInfoTest.java Passed: serviceability/sa/TestClassDump.java Passed: serviceability/sa/TestClhsdbJstackLock.java Passed: serviceability/sa/TestCpoolForInvokeDynamic.java Passed: serviceability/sa/TestDefaultMethods.java Passed: serviceability/sa/TestG1HeapRegion.java Passed: serviceability/sa/TestHeapDumpForInvokeDynamic.java Passed: serviceability/sa/TestHeapDumpForLargeArray.java FAILED: serviceability/sa/TestInstanceKlassSize.java Passed: serviceability/sa/TestInstanceKlassSizeForInterface.java Passed: serviceability/sa/TestIntConstant.java Passed: serviceability/sa/TestJhsdbJstackLock.java Passed: serviceability/sa/TestJhsdbJstackMixed.java Passed: serviceability/sa/TestJmapCore.java Passed: serviceability/sa/TestJmapCoreMetaspace.java Passed: serviceability/sa/TestPrintMdo.java Passed: serviceability/sa/TestRevPtrsForInvokeDynamic.java Passed: serviceability/sa/TestType.java Passed: serviceability/sa/TestUniverse.java Test results: passed: 45; failed: 1 From martin.doerr at sap.com Tue Jul 30 16:03:48 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 30 Jul 2019 16:03:48 +0000 Subject: RFR(M): 8228743: [PPC64, s390] Implement FastJNIAccessors In-Reply-To: <6EDF694A-01B1-465E-9353-ED08B6544278@sap.com> References: <6EDF694A-01B1-465E-9353-ED08B6544278@sap.com> Message-ID: Hi Götz and Lutz, thank you for the reviews.
> Maybe a comment would be helpful at the call sites that > JNI_FastGetField::find_slowcase_pc(pc) not only finds the slowcase_pc, but > also decides if the signal at hand is related to a FastGetField access. I think it'd be a little helpful, but not too hard to find out. I prefer to keep the comment and implementation in the signal handler an exact copy from the other platforms. Best regards, Martin > -----Original Message----- > From: Schmidt, Lutz > Sent: Tuesday, 30 July 2019 17:46 > To: Lindenmaier, Goetz ; Doerr, Martin > ; hotspot-runtime-dev at openjdk.java.net; > Gustavo Romero > Subject: Re: RFR(M): 8228743: [PPC64, s390] Implement FastJNIAccessors > > Hi Martin, > > your change looks good to me. Please note that I'm NOT a Reviewer. > > Maybe a comment would be helpful at the call sites that > JNI_FastGetField::find_slowcase_pc(pc) not only finds the slowcase_pc, but > also decides if the signal at hand is related to a FastGetField access. > > Thanks, > Lutz > > On 30.07.19, 14:29, "Lindenmaier, Goetz" > wrote: > > Hi Martin, > > overall, the change looks good to me. > > It's a bit confusing that the method with the > implementation has _int_ as infix: > generate_fast_get_int_field0 > while it is used for all data types, > but this is similar on other platforms. > > // order preceding load > You might want to capitalize this, like the other comments. > No webrev needed. > > Best regards, > Goetz. > > > > > -----Original Message----- > > From: Doerr, Martin > > Sent: Monday, 29 July 2019 19:43 > > To: hotspot-runtime-dev at openjdk.java.net; Lindenmaier, Goetz > > ; Schmidt, Lutz ; > > Gustavo Romero > > Subject: RFR(M): 8228743: [PPC64, s390] Implement FastJNIAccessors > > > > Hi, > > > > > > > > I'd like to contribute fast JNI Get*Field platform implementations for > PPC64 > > and s390.
> > Please review: > > http://cr.openjdk.java.net/~mdoerr/8228743_PPC64_s390_FastJNIAccessors/webrev.00/ > > Best regards, > > Martin From coleen.phillimore at oracle.com Tue Jul 30 20:43:47 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 30 Jul 2019 16:43:47 -0400 Subject: RFR (T) 8228673: Remove develop flag StrictSafepointChecks Message-ID: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> Remove option to turn off checking. See bug for more details. open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228673.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8228673 Tested with hs-tier1 on Oracle platforms. Thanks, Coleen From coleen.phillimore at oracle.com Tue Jul 30 20:45:21 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 30 Jul 2019 16:45:21 -0400 Subject: RFR (S) 8228630: Remove always true parameter to NoSafepointVerifier Message-ID: <7c75242b-fd26-16ae-bfee-0ac599e9b431@oracle.com> Summary: Also remove NoGCVerifier since NoSafepointVerifier covers GC checking when not already at a safepoint and is a stronger check. See bug for more details also. Tested with all jtreg runtime,compiler,serviceability and gc tests. Also hs-tier1-3 on linux-x64-debug.
open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228630.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8228630 Thanks, Coleen From david.holmes at oracle.com Tue Jul 30 20:55:53 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Jul 2019 06:55:53 +1000 Subject: RFR: 8228764: New library dependencies due to JDK-8222720 - was: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: References: Message-ID: <6eab818a-dcdf-221e-62ff-672448f3df0c@oracle.com> Hi Matthias, On 31/07/2019 12:21 am, Baesken, Matthias wrote: > Hello, I prepared a webrev following the idea proposed by Goetz; please review! > > http://cr.openjdk.java.net/~mbaesken/webrevs/8228764.0/ Looks good. Thanks for fixing. Please proceed with the RDP2 approval process to get this into 13 (it will then automatically propagate to 14). Thanks, David ----- > bug opened by David: > > https://bugs.openjdk.java.net/browse/JDK-8228764 > > > Best regards, Matthias > > >> -----Original Message----- >> From: David Holmes >> Sent: Tuesday, 30 July 2019 13:51 >> To: Lindenmaier, Goetz ; Baesken, Matthias >> ; Rainer Jung ; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) >> >> Hi Goetz, >> >> On 30/07/2019 9:06 pm, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> there is already -XX:ExtensiveErrorReports with default 'false'. >>> It's supposed to guard additional infos in the hs_err file. >>> As it's already available, no CSR should be needed. >>> >>> Can't we just use this? Below tiny fix should do the job.
>>> >>> diff -r 144585063bc8 src/hotspot/share/utilities/virtualizationSupport.cpp >>> --- a/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 >> 11:14:16 2019 +0800 >>> +++ b/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul 30 >> 13:04:58 2019 +0200 >>> @@ -40,6 +40,9 @@ >>> static char extended_resource_info_at_startup[600]; >>> >>> void VirtualizationSupport::initialize() { >>> + >>> + if (!ExtensiveErrorReports) return; >>> + >>> // open vmguestlib and bind SDK functions >>> char ebuf[1024]; >>> dlHandle = os::dll_load("vmGuestLib", ebuf, sizeof ebuf); >> >> That seems quite reasonable to me - this is extended error information. >> >> Great suggestion! >> >> Thanks, >> David >> >>> Best regards, >>> Goetz. >>> >>>> -----Original Message----- >>>> From: hotspot-runtime-dev > bounces at openjdk.java.net> >>>> On Behalf Of David Holmes >>>> Sent: Tuesday, 30 July 2019 10:39 >>>> To: Baesken, Matthias ; Rainer Jung >>>> ; hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) >>>> >>>> Hi Matthias, >>>> >>>> On 30/07/2019 6:18 pm, Baesken, Matthias wrote: >>>>> Hi David, in our proprietary JVM we have an XX flag to >> enable/disable >>>> the usage of the guestlib for people who don't want it. >>>>> Should I go for this? >>>> >>>> We can look at that for 14 but for 13 (and 11.0.5) I think we just need >>>> to back this out. >>>> >>>> Thanks, >>>> David >>>> >>>>> >>>>> Best regards, Matthias >>>>> >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Tuesday, 30
July 2019 09:51 >>>>>> To: Rainer Jung ; hotspot-runtime- >>>>>> dev at openjdk.java.net; Baesken, Matthias >> >>>>>> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) >>>>>> >>>>>> Hi Rainer, >>>>>> >>>>>> I have filed: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8228764 >>>>>> >>>>>> Matthias: I think we may have to back out JDK-8222720 from JDK 13, >>>>>> re-examine this and re-do for 14. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>> On 30/07/2019 5:34 pm, Rainer Jung wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> On 30.07.2019 at 01:56, David Holmes wrote: >>>>>>>> Hi Rainer, >>>>>>>> >>>>>>>> On 30/07/2019 7:34 am, Rainer Jung wrote: >>>>>>>>> While doing Tomcat tests I noticed that at least on SLES 12 JDK 13 >>>>>>>>> and 14 EA have a lot of new runtime library dependencies. >>>>>>>>> >>>>>>>>> Change fb5b3981eac with log >>>>>>>>> >>>>>>>>> 8222720: Provide extended VMWare/vSphere virtualization related >> info >>>>>>>>> in the hs_error file on linux/windows x86_64 >>>>>>>>> >>>>>>>>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. That >>>>>>>>> library depends on /usr/lib64/libvmtools.so.0, which in turn >> depends >>>>>>>>> on a lot of other libraries: >>>>>>>>> >>>>>>>>> NEEDED libdnet.so.1 >>>>>>>>> NEEDED libglib-2.0.so.0 >>>>>>>>> NEEDED libicui18n.so.52.1 >>>>>>>>> NEEDED libicuuc.so.52.1 >>>>>>>>> NEEDED libpthread.so.0 >>>>>>>>> NEEDED libdl.so.2 >>>>>>>>> NEEDED libssl.so.1.0.0 >>>>>>>>> NEEDED libcrypto.so.1.0.0 >>>>>>>>> NEEDED libc.so.6 >>>>>>>>> NEEDED ld-linux-x86-64.so.2 >>>>>>>>> NEEDED
libgcc_s.so.1 >>>>>>>>> >>>>>>>>> Some are not so problematic, but for instance Tomcat is able to use >>>>>>>>> custom-built OpenSSL libraries to replace the JSSE crypto engine >> with >>>>>>>>> an OpenSSL-based one using JNI. Unfortunately the JDK is now >> loading >>>>>>>>> libssl and libcrypto early. If our TC OpenSSL also uses SO >>>>>>>>> version 1.0.0, it will not get loaded; if it is another version, >>>>>>>>> we can run into a mix of symbols resolved in the platform OpenSSL >>>>>>>>> libs now loaded early and the ones provided with TC loaded later. >>>>>>>>> >>>>>>>>> This is an example of why it would be good to not introduce too many >>>>>>>>> native library dependencies for the JVM or make it optional in the >>>>>>>>> sense of configurable during runtime. Of the above list, the icu >>>>>>>>> libs, libglib and libdnet are other libs one would probably try to >>>>>>>>> avoid. >>>>>>>>> >>>>>>>>> I don't know whether this list is appropriate for discussing it. If not, >>>>>>>>> any pointers to a better list are appreciated. >>>>>>>> >>>>>>>> This is the correct list to discuss this. >>>>>>>> >>>>>>>> When 8222720 was put in I had no idea it would result in eager >> loading >>>>>>>> of libraries beyond the explicit load of libguestlib. >>>>>>>> >>>>>>>> To be clear, you are running under VMWare? This should only >> happen to >>>>>>>> enable reporting for the VMWare virtualization info in case of a >> crash. >>>>>>> >>>>>>> Yes, I am running under VMWare. The library >> /usr/lib64/libguestlib.so.0 >>>>>>> and its dependency /usr/lib64/libvmtools.so.0 both belong to the >> package >>>>>>> libvmtools0. Its sources seem to be available at >>>>>>> https://github.com/vmware/open-vm-tools. >>>>>>> >>>>>>>> This may need to be revisited. >>>>>>>> >>>>>>>> Thanks for the report. >>>>>>> >>>>>>> Thanks for looking at this!
> >>>>>>> >>>>>>> Rainer >>>>>>> From markus.gronlund at oracle.com Tue Jul 30 21:04:20 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Tue, 30 Jul 2019 14:04:20 -0700 (PDT) Subject: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" Message-ID: <41feed64-0678-414e-bb52-e9951f05c1b2@default> Greetings, Kindly asking for reviews for the following changeset: Bug: https://bugs.openjdk.java.net/browse/JDK-8227605 Webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev01/ Summary: Clearing a bit that was set in a previous epoch should be done using CAS so as not to lose information in the current (this) epoch. This was also the case up until the changes made in relation to the Memory Leak Profiler, where the bit tagging scheme and implementation changed quite substantially. Unfortunately, part of the modifications made there changed set_traceid_mask() to not use CAS. This is the reason for the assertion, as information about the current (this) epoch was lost. We need to restore set_traceid_mask() to use CAS the way it was done originally. Thanks to Erik Gahlin for debugging. Markus From shade at redhat.com Tue Jul 30 21:12:52 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 30 Jul 2019 23:12:52 +0200 Subject: RFR (T) 8228673: Remove develop flag StrictSafepointChecks In-Reply-To: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> References: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> Message-ID: <4d3181d4-3d5e-f1bc-d858-3c2e6eed875f@redhat.com> On 7/30/19 10:43 PM, coleen.phillimore at oracle.com wrote: > Remove option to turn off checking. See bug for more details. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228673.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8228673 Makes sense. Looks good.
It is curious that there is #ifdef ASSERT block in Thread::check_for_valid_safepoint_state body, which is probably redundant too, if we discount the idiosyncrasy between (not_)debug and (not_)product. -- Thanks, -Aleksey From david.holmes at oracle.com Tue Jul 30 21:26:15 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Jul 2019 07:26:15 +1000 Subject: RFR 8227528: TestAbortVMOnSafepointTimeout.java failed due to "RuntimeException: 'Safepoint sync time longer than' missing from stdout/stderr" In-Reply-To: <64d07f0c-c920-cc76-6406-ff24168270b6@oracle.com> References: <64d07f0c-c920-cc76-6406-ff24168270b6@oracle.com> Message-ID: <60f5127f-926b-1446-3510-4ff890637db6@oracle.com> Looks good! Thanks, David On 30/07/2019 9:45 am, Patricio Chilano wrote: > Hi David, > > On 7/26/19 6:27 PM, David Holmes wrote: >> On 27/07/2019 5:19 am, Daniel D. Daugherty wrote: >>> On 7/26/19 2:46 PM, Patricio Chilano wrote: >>>> Hi all, >>>> >>>> Could you review this small fix for test >>>> TestAbortVMOnSafepointTimeout.java? >>>> >>>> The test has been failing intermittently since 8191890. As explained >>>> in the bug comments, it turns out that a bias revocation handshake >>>> could happen in between the start of the "for" loop without >>>> safepoint polls and the safepoint where we want to timeout. That >>>> allows for the long loop to actually finish and prevents the desired >>>> timeout in the later safepoint. The simple solution is to just avoid >>>> using biased locking in this test (and therefore prevent the >>>> revocation handshake), since we just want to test the correct >>>> behavior of flag AbortVMOnSafepointTimeout. >>>> >>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8227528/v01/webrev >>> >>> The change itself is trivial. However, the reasons behind the change >>> aren't. >>> >>> This part of the description caught my eye: >>> >>> the start of the "for" loop without safepoint polls >>> >>> and my brain did a "Say what?!?!"
Of course, that was without looking at >>> the test which has a huge number of options, including these: >>> >>> L70: "-XX:-UseCountedLoopSafepoints", >>> L71: "-XX:LoopStripMiningIter=0", >>> L72: "-XX:LoopUnrollLimit=0", >>> >>> Okay, now the world makes much more sense. We are intentionally telling >>> the compiler to not emit safepoint polls in the counted loop and we're >>> turning off other loop optimizations. Basically, we're telling the >>> compiler we want to stall in that loop until we exceed the safepoint >>> timeout limit. Got it... >>> >>> So the new biased locking handshake messes with the timeout that this >>> test is trying to achieve. Disabling biased locking makes the test more >>> robust by allowing the safepoint sync timeout to happen. >>> >>> A couple of minor suggestions: >>> >>> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java >>> L30: * @bug 8219584 >>> >>> You should add an @bug for this bug (8227528). I don't know if >>> you can put more than one bug ID on an @bug line or if you need >>> a separate @bug line. >>> >>> L61: ProcessBuilder pb = >>> ProcessTools.createJavaProcessBuilder( >>> Please add a comment above this line: >>> >>> // -XX:-UseBiasedLocking - is used to prevent biased >>> locking >>> // handshakes from changing the timing of this test. >>> >>> Thumbs up. I don't need to see another webrev if you choose to make >>> the above changes. >> >> I think some additional commentary on the other exotic options to >> ensure the loop contains no safepoints and is not unrolled etc would >> also be worthwhile. > I added comments for flags UseCountedLoopSafepoints, LoopStripMiningIter > and LoopUnrollLimit.
Here are the links to v02: > > Full: http://cr.openjdk.java.net/~pchilanomate/8227528/v02/webrev/ > Inc: http://cr.openjdk.java.net/~pchilanomate/8227528/v02/inc/webrev/ > > > Thanks for looking at this David! > > > Patricio >> Change itself makes sense. >> >> Thanks, >> David >> >>> >>> Dan >>> >>> >>>> Bugid: https://bugs.openjdk.java.net/browse/JDK-8227528 >>>> >>>> Thanks! >>>> Patricio >>> > From coleen.phillimore at oracle.com Tue Jul 30 21:27:10 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 30 Jul 2019 17:27:10 -0400 Subject: RFR (T) 8228673: Remove develop flag StrictSafepointChecks In-Reply-To: <4d3181d4-3d5e-f1bc-d858-3c2e6eed875f@redhat.com> References: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> <4d3181d4-3d5e-f1bc-d858-3c2e6eed875f@redhat.com> Message-ID: On 7/30/19 5:12 PM, Aleksey Shipilev wrote: > On 7/30/19 10:43 PM, coleen.phillimore at oracle.com wrote: >> Remove option to turn off checking. See bug for more details. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228673.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8228673 > Makes sense. Looks good. > > It is curious that there is #ifdef ASSERT block in Thread::check_for_valid_safepoint_state body, > which is probably redundant too, if we discount the idiosyncrasy between (not_)debug and (not_)product. > I think it needs it for rank(), which is not compiled in product. #ifndef PRODUCT vs #ifdef ASSERT is a mess. DEBUG_ONLY(if (rank() != Mutex::special) \ thread->check_for_valid_safepoint_state(false);) Thanks for the code review!
Coleen From coleen.phillimore at oracle.com Tue Jul 30 21:31:16 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 30 Jul 2019 17:31:16 -0400 Subject: RFR (T) 8228673: Remove develop flag StrictSafepointChecks In-Reply-To: References: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> <4d3181d4-3d5e-f1bc-d858-3c2e6eed875f@redhat.com> Message-ID: On 7/30/19 5:27 PM, coleen.phillimore at oracle.com wrote: > > > On 7/30/19 5:12 PM, Aleksey Shipilev wrote: >> On 7/30/19 10:43 PM, coleen.phillimore at oracle.com wrote: >>> Remove option to turn off checking. See bug for more details. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2019/8228673.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8228673 >> Makes sense. Looks good. >> >> It is curious that there is #ifdef ASSERT block in >> Thread::check_for_valid_safepoint_state body, >> which is probably redundant too, if we discount the idiosyncrasy >> between (not_)debug and (not_)product. >> > I think it needs it for rank(), which is not compiled in product. > #ifndef PRODUCT vs #ifdef ASSERT is a mess. > > DEBUG_ONLY(if (rank() != Mutex::special) \ > thread->check_for_valid_safepoint_state(false);) > > Thanks for the code review! > Coleen Oh, you mean this one in check_for_valid_safepoint_state (I answered too quickly on the first inconsistency). #ifdef ASSERT if (potential_vm_operation && is_Java_thread() && !Universe::is_bootstrapping()) { Maybe it's not redundant because of the optimized build.
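To illustrate why `#ifdef ASSERT` and `#ifndef PRODUCT` are not interchangeable, here is a minimal C++ sketch. It assumes, as we understand HotSpot's build variants, that debug builds define `ASSERT`, product builds define `PRODUCT`, and an "optimized" build defines neither. The macros `SKETCH_DEBUG_ONLY` and `SKETCH_NOT_PRODUCT` and the helper functions are hypothetical stand-ins modelled on HotSpot's `DEBUG_ONLY`/`NOT_PRODUCT`, not copies of them.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for HotSpot-style build macros.
// Assumed variants: debug defines ASSERT, product defines PRODUCT,
// and "optimized" defines neither.
#ifdef ASSERT
#define SKETCH_DEBUG_ONLY(code) code
#else
#define SKETCH_DEBUG_ONLY(code)
#endif

#ifdef PRODUCT
#define SKETCH_NOT_PRODUCT(code)
#else
#define SKETCH_NOT_PRODUCT(code) code
#endif

// True if code wrapped in SKETCH_DEBUG_ONLY is compiled in (debug builds only).
inline bool debug_only_code_compiled() {
  bool compiled = false;
  SKETCH_DEBUG_ONLY(compiled = true;)
  return compiled;
}

// True if code wrapped in SKETCH_NOT_PRODUCT is compiled in (debug and optimized builds).
inline bool not_product_code_compiled() {
  bool compiled = false;
  SKETCH_NOT_PRODUCT(compiled = true;)
  return compiled;
}

// Names the build variant implied by the two macros.
inline std::string build_variant() {
  if (debug_only_code_compiled()) return "debug";
  if (not_product_code_compiled()) return "optimized";  // neither ASSERT nor PRODUCT
  return "product";
}
```

Compiled with no extra defines (the assumed "optimized" configuration), the `NOT_PRODUCT`-style code is present while the `DEBUG_ONLY`-style code is not. So code guarded by one macro cannot safely rely on a helper that exists only under the other.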
Coleen From david.holmes at oracle.com Tue Jul 30 21:32:27 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Jul 2019 07:32:27 +1000 Subject: RFR (XXXS) 8227250: UserHandler contains ancient LinuxThreads code In-Reply-To: <4e431cbe-53ba-ebf9-bf57-0598e0db41ff@oracle.com> References: <4c677b95-edcd-584b-6250-630422d20c71@oracle.com> <4e431cbe-53ba-ebf9-bf57-0598e0db41ff@oracle.com> Message-ID: <6ed02c78-f3ff-55f2-4e8b-c9b7aa6208a5@oracle.com> Thanks Harold! David On 30/07/2019 10:40 pm, Harold Seigel wrote: > This looks good! > > Thanks, Harold > > On 7/30/2019 1:09 AM, David Holmes wrote: >> bug: https://bugs.openjdk.java.net/browse/JDK-8227250 >> webrev: http://cr.openjdk.java.net/~dholmes/8227250/webrev/ >> >> Removed some ancient Linux code that pertained to the LinuxThreads >> implementation, and which was erroneously copied into the BSD and AIX >> ports. >> >> Thanks, >> David From coleen.phillimore at oracle.com Tue Jul 30 21:33:49 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 30 Jul 2019 17:33:49 -0400 Subject: RFR (S) 8228630: Remove always true parameter to NoSafepointVerifier In-Reply-To: <7c75242b-fd26-16ae-bfee-0ac599e9b431@oracle.com> References: <7c75242b-fd26-16ae-bfee-0ac599e9b431@oracle.com> Message-ID: <3f113d42-1c53-c4cc-9b20-a30ae5a96279@oracle.com> I fixed the comment in thread.hpp above the _allow_safepoint_count and renamed the field to _no_safepoint_count, which makes a lot more sense. Retesting. open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228630.02/webrev Thanks, Coleen On 7/30/19 4:45 PM, coleen.phillimore at oracle.com wrote: > Summary: Also remove NoGCVerifier since NoSafepointVerifier covers GC > checking when not already at a safepoint and is a stronger check. > > See bug for more details also. Tested with all jtreg > runtime,compiler,serviceability and gc tests. Also hs-tier1-3 on > linux-x64-debug.
> > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228630.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8228630 > > Thanks, > Coleen From markus.gronlund at oracle.com Tue Jul 30 21:53:07 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Tue, 30 Jul 2019 14:53:07 -0700 (PDT) Subject: [13] RFR(S): 8228834: Regression caused by JDK-8214542 not installing complete checkpoint data to candidates Message-ID: Greetings, Looking for reviews for the following changeset: Bug: https://bugs.openjdk.java.net/browse/JDK-8228834 Webrev: http://cr.openjdk.java.net/~mgronlun/8228834/webrev01/ Summary: I introduced a regression with https://bugs.openjdk.java.net/browse/JDK-8214542 in that candidates were prematurely marked as resolved. A consequence is that candidates are not being updated with complete checkpoint data. Thanks Markus From gromero at linux.vnet.ibm.com Tue Jul 30 22:18:11 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 30 Jul 2019 19:18:11 -0300 Subject: RFR(S): 8228649: [PPC64] SA reads wrong slots from interpreter frames In-Reply-To: <20fa3a13-cc7b-51c2-1312-d1cf4b66e072@linux.vnet.ibm.com> References: Message-ID: <20fa3a13-cc7b-51c2-1312-d1cf4b66e072@linux.vnet.ibm.com> Hi Martin, Goetz, On 07/30/2019 12:51 PM, Doerr, Martin wrote: > I have removed ClhsdbCDSJstackPrintAll.java and ClhsdbFindPC.java from the ProblemList. > Tests have passed. Webrev is updated in place. > I'll push it after some more testing time. So, just for completeness, I see that ./test/jdk/sun/tools/jhsdb/BasicLauncherTest.java, listed in JDK-8211767, is also fixed on fastdebug and release builds. AFAICS it was also failing due to the monitor offset being wrong, like the other SA tests. @Goetz, are you ok with keeping JDK-8211767 [0] closed, then? Thanks.
Best regards, Gustavo [0] https://bugs.openjdk.java.net/browse/JDK-8211767 From kim.barrett at oracle.com Tue Jul 30 22:31:57 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 30 Jul 2019 18:31:57 -0400 Subject: RFR (S) 8228630: Remove always true parameter to NoSafepointVerifier In-Reply-To: <3f113d42-1c53-c4cc-9b20-a30ae5a96279@oracle.com> References: <7c75242b-fd26-16ae-bfee-0ac599e9b431@oracle.com> <3f113d42-1c53-c4cc-9b20-a30ae5a96279@oracle.com> Message-ID: <3CCDE2FA-61B6-41A6-877C-5CD8F089A55E@oracle.com> > On Jul 30, 2019, at 5:33 PM, coleen.phillimore at oracle.com wrote: > > > I fixed the comment in thread.hpp above the _allow_safepoint_count and renamed the field to _no_safepoint_count, which makes a lot more sense. Retesting. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228630.02/webrev Thanks for renaming the member and updating the comment as we discussed. Looks good. > > Thanks, > Coleen > > On 7/30/19 4:45 PM, coleen.phillimore at oracle.com wrote: >> Summary: Also remove NoGCVerifier since NoSafepointVerifier covers GC checking when not already at a safepoint and is a stronger check. >> >> See bug for more details also. Tested with all jtreg runtime,compiler,serviceability and gc tests. Also hs-tier1-3 on linux-x64-debug. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228630.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8228630 >> >> Thanks, >> Coleen From daniel.daugherty at oracle.com Tue Jul 30 23:34:04 2019 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Tue, 30 Jul 2019 19:34:04 -0400 Subject: RFR (XXXS) 8227250: UserHandler contains ancient LinuxThreads code In-Reply-To: <4c677b95-edcd-584b-6250-630422d20c71@oracle.com> References: <4c677b95-edcd-584b-6250-630422d20c71@oracle.com> Message-ID: On 7/30/19 1:09 AM, David Holmes wrote: > bug: https://bugs.openjdk.java.net/browse/JDK-8227250 > webrev: http://cr.openjdk.java.net/~dholmes/8227250/webrev/ src/hotspot/os/aix/os_aix.cpp No comments. src/hotspot/os/bsd/os_bsd.cpp No comments. src/hotspot/os/linux/os_linux.cpp No comments. Thumbs up! Thanks for cleaning this up. Dan > > Removed some ancient Linux code that pertained to the LinuxThreads > implementation, and which was erroneously copied into the BSD and AIX > ports. > > Thanks, > David From david.holmes at oracle.com Wed Jul 31 02:49:01 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Jul 2019 12:49:01 +1000 Subject: RFR (XXXS) 8227250: UserHandler contains ancient LinuxThreads code In-Reply-To: References: <4c677b95-edcd-584b-6250-630422d20c71@oracle.com> Message-ID: <39f3ba10-f56c-ea01-1ebd-306ec42e5b25@oracle.com> Thanks Dan! David On 31/07/2019 9:34 am, Daniel D. Daugherty wrote: > On 7/30/19 1:09 AM, David Holmes wrote: >> bug: https://bugs.openjdk.java.net/browse/JDK-8227250 >> webrev: http://cr.openjdk.java.net/~dholmes/8227250/webrev/ > > src/hotspot/os/aix/os_aix.cpp > No comments. > > src/hotspot/os/bsd/os_bsd.cpp > No comments. > > src/hotspot/os/linux/os_linux.cpp > No comments. > > Thumbs up! Thanks for cleaning this up. > > Dan > > > > >> >> Removed some ancient Linux code that pertained to the LinuxThreads >> implementation, and which was erroneously copied into the BSD and AIX >> ports.
>> >> Thanks, >> David > From david.holmes at oracle.com Wed Jul 31 04:19:59 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Jul 2019 14:19:59 +1000 Subject: RFR (T) 8228673: Remove develop flag StrictSafepointChecks In-Reply-To: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> References: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> Message-ID: <14833d39-3654-cdfb-1deb-8a645779a422@oracle.com> Looks good! Another flag bites the dust :) Thanks, David On 31/07/2019 6:43 am, coleen.phillimore at oracle.com wrote: > Remove option to turn off checking. See bug for more details. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228673.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8228673 > > Tested with hs-tier1 on Oracle platforms. > Thanks, > Coleen From david.holmes at oracle.com Wed Jul 31 05:01:12 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Jul 2019 15:01:12 +1000 Subject: RFR (S) 8228630: Remove always true parameter to NoSafepointVerifier In-Reply-To: <7c75242b-fd26-16ae-bfee-0ac599e9b431@oracle.com> References: <7c75242b-fd26-16ae-bfee-0ac599e9b431@oracle.com> Message-ID: <912584ac-b79c-0697-6b0a-9273bd88a72c@oracle.com> Hi Coleen, On 31/07/2019 6:45 am, coleen.phillimore at oracle.com wrote: > Summary: Also remove NoGCVerifier since NoSafepointVerifier covers GC > checking when not already at a safepoint and is a stronger check. It wasn't at all clear to me that we may not want a NoGCVerifier that is independent of safepoints e.g. for use in a non-JavaThread. But AFAICS we don't use NoGCVerifier directly, but only via NoSafepointVerifier - in which case the "no safepoint" check subsumes the "no gc" check and the whole thing collapses to what you have (which results in a nice amount of code deletion!). It took me a while to follow through all the changes but it seems good.
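The argument above (a "no safepoint" check subsumes a "no GC" check, assuming, as the thread does, that the GC work at issue needs the thread to reach a safepoint) can be modelled with a single per-thread counter and an RAII guard. This is a minimal sketch; `ThreadSketch`, `NoSafepointVerifierSketch` and `may_safepoint` are hypothetical names, not HotSpot's actual `Thread` or `NoSafepointVerifier` code.

```cpp
#include <cassert>

// Hypothetical per-thread state; only the counter matters for this sketch.
struct ThreadSketch {
  int no_safepoint_count = 0;  // > 0 while inside a no-safepoint region
};

// RAII guard: forbids safepoints for the enclosing scope. Guards may nest.
class NoSafepointVerifierSketch {
 public:
  explicit NoSafepointVerifierSketch(ThreadSketch& thread) : _thread(thread) {
    _thread.no_safepoint_count++;
  }
  ~NoSafepointVerifierSketch() {
    assert(_thread.no_safepoint_count > 0);
    _thread.no_safepoint_count--;
  }
  NoSafepointVerifierSketch(const NoSafepointVerifierSketch&) = delete;
  NoSafepointVerifierSketch& operator=(const NoSafepointVerifierSketch&) = delete;

 private:
  ThreadSketch& _thread;
};

// Checked wherever the thread might block for a safepoint. No separate
// "no GC" counter is kept: under the stated assumption, stopping the thread
// for GC would itself require reaching a safepoint.
inline bool may_safepoint(const ThreadSketch& thread) {
  return thread.no_safepoint_count == 0;
}
```

The design choice mirrors the rename discussed above: a counter named for what it forbids (`no_safepoint_count`) reads naturally as "safepoints allowed iff the count is zero".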
I spotted this reference to NoGCVerifier: ./share/opto/runtime.cpp:// Thus, it cannot be a leaf since it contains the NoGCVerifier. I'm not at all sure that the rest of the comment related to this is accurate any more - the reference to the NoGCVerifier didn't make sense to me. Thanks, David ----- > See bug for more details also. Tested with all jtreg > runtime,compiler,serviceability and gc tests. Also hs-tier1-3 on > linux-x64-debug. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228630.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8228630 > Thanks, > Coleen From david.holmes at oracle.com Wed Jul 31 05:35:55 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Jul 2019 15:35:55 +1000 Subject: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" In-Reply-To: <41feed64-0678-414e-bb52-e9951f05c1b2@default> References: <41feed64-0678-414e-bb52-e9951f05c1b2@default> Message-ID: <1f233b8e-650e-2306-63e3-c729bfcd350d@oracle.com> Hi Markus, On 31/07/2019 7:04 am, Markus Gronlund wrote: > Greetings, > > Kindly asking for reviews for the following changeset: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8227605 > Webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev01/ > > Summary: > Clearing a bit that was set in a previous epoch should be done using CAS so as not to lose information in the current (this) epoch. This was also the case up until the changes made in relation to the Memory Leak Profiler, where the bit tagging scheme and implementation changed quite substantially. Unfortunately, part of the modifications made there changed set_traceid_mask() to not use CAS. This is the reason for the assertion, as information about the current (this) epoch was lost. > > We need to restore set_traceid_mask() to use CAS the way it was done originally. AFAICS you have: 1.
Refactored the CAS code using a template function to avoid code duplication That seems okay. 2. Changed SET_LEAKP_USED_PREV_EPOCH to use SET_LEAKP_TAG_CAS Okay that ensures CAS is used to update the epoch as per your summary. 3. Modified set_mask to use CAS Okay - as per summary 4. Removed use of load_acquire in the non-CAS form of set_bits This raises some queries about the use of OrderAccess in this code. On the one hand I might expect the load-acquire to be necessary to ensure correct ordering with respect to the implicit release_store of the CAS form. But if we expect correct interaction between the CAS and non-CAS forms then set_bits should itself be using a release-store. Further, given set_bits is not using a release-store, the initial load-acquire in the CAS form is not necessary. So it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store. Cheers, David ----- > Thanks to Erik Gahlin for debugging. > > Markus > From matthias.baesken at sap.com Wed Jul 31 07:11:21 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Wed, 31 Jul 2019 07:11:21 +0000 Subject: RFR: 8228764: New library dependencies due to JDK-8222720 - was: New library dependencies due to 8222720 (fb5b3981eac) In-Reply-To: <6eab818a-dcdf-221e-62ff-672448f3df0c@oracle.com> References: <6eab818a-dcdf-221e-62ff-672448f3df0c@oracle.com> Message-ID: Hi David, thanks for the review ! I requested The approval in JBS. Best regards, Matthias > > Hi Matthias, > > On 31/07/2019 12:21 am, Baesken, Matthias wrote: > > Hello , I prepared a webrev following the idea proposed by Goetz ; please > review ! > > > > http://cr.openjdk.java.net/~mbaesken/webrevs/8228764.0/ > > Looks good. Thanks for fixing. > > Please proceed with the RDP2 approval process to get this into 13 (it > will then automatically propagate to 14). 
> > Thanks, > David > ----- > > > bug opened by David : > > > > https://bugs.openjdk.java.net/browse/JDK-8228764 > > > > > > Best regards, Matthias > > > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Dienstag, 30. Juli 2019 13:51 > >> To: Lindenmaier, Goetz ; Baesken, > Matthias > >> ; Rainer Jung ; > >> hotspot-runtime-dev at openjdk.java.net > >> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > >> > >> Hi Goetz, > >> > >> On 30/07/2019 9:06 pm, Lindenmaier, Goetz wrote: > >>> Hi, > >>> > >>> there is already -XX:ExtensiveErrorReports with default 'false'. > >>> It's supposed to guard additional infos in the hs_err file. > >>> As it's already available, no CSR should be needed. > >>> > >>> Can't we just use this? Below tiny fix should do the job. > >>> > >>> diff -r 144585063bc8 > src/hotspot/share/utilities/virtualizationSupport.cpp > >>> --- a/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul > 30 > >> 11:14:16 2019 +0800 > >>> +++ b/src/hotspot/share/utilities/virtualizationSupport.cpp Tue Jul > 30 > >> 13:04:58 2019 +0200 > >>> @@ -40,6 +40,9 @@ > >>> static char extended_resource_info_at_startup[600]; > >>> > >>> void VirtualizationSupport::initialize() { > >>> + > >>> + if (!ExtensiveErrorReports) return; > >>> + > >>> // open vmguestlib and bind SDK functions > >>> char ebuf[1024]; > >>> dlHandle = os::dll_load("vmGuestLib", ebuf, sizeof ebuf); > >> > >> That seems quite reasonable to me - this is extended error information. > >> > >> Great suggestion! > >> > >> Thanks, > >> David > >> > >>> Best regards, > >>> Goetz. > >>> > >>>> -----Original Message----- > >>>> From: hotspot-runtime-dev >> bounces at openjdk.java.net> > >>>> On Behalf Of David Holmes > >>>> Sent: Dienstag, 30. 
Juli 2019 10:39 > >>>> To: Baesken, Matthias ; Rainer Jung > >>>> ; hotspot-runtime-dev at openjdk.java.net > >>>> Subject: Re: New library dependencies due to 8222720 (fb5b3981eac) > >>>> > >>>> Hi Matthias, > >>>> > >>>> On 30/07/2019 6:18 pm, Baesken, Matthias wrote: > >>>>> Hi David, in our proprietary JVM we have an XX flag to > >> enable/disable > >>>> the usage of the guestlib for people who don't want it . > >>>>> Should I go for this ? > >>>> > >>>> We can look at that for 14 but for 13 (and 11.0.5) I think we just need > >>>> to back this out. > >>>> > >>>> Thanks, > >>>> David > >>>> > >>>>> > >>>>> Best regards, Matthias > >>>>> > >>>>> > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: David Holmes > >>>>>> Sent: Dienstag, 30. Juli 2019 09:51 > >>>>>> To: Rainer Jung ; hotspot-runtime- > >>>>>> dev at openjdk.java.net; Baesken, Matthias > >> > >>>>>> Subject: Re: New library dependencies due to 8222720 > (fb5b3981eac) > >>>>>> > >>>>>> Hi Rainer, > >>>>>> > >>>>>> I have filed: > >>>>>> > >>>>>> https://bugs.openjdk.java.net/browse/JDK-8228764 > >>>>>> > >>>>>> Matthias: I think we may have to backout JDK-8222720 from JDK 13, > >>>>>> re-examine this and re-do for 14. > >>>>>> > >>>>>> Thanks, > >>>>>> David > >>>>>> ----- > >>>>>> > >>>>>> On 30/07/2019 5:34 pm, Rainer Jung wrote: > >>>>>>> Hi David, > >>>>>>> > >>>>>>> Am 30.07.2019 um 01:56 schrieb David Holmes: > >>>>>>>> Hi Rainer, > >>>>>>>> > >>>>>>>> On 30/07/2019 7:34 am, Rainer Jung wrote: > >>>>>>>>> While doing Tomcat tests I noticed, that at least on SLES 12 JDK > 13 > >>>>>>>>> and 14 EA have a lot of new runtime library dependencies. > >>>>>>>>> > >>>>>>>>> Change fb5b3981eac with log > >>>>>>>>> > >>>>>>>>> 8222720: Provide extended VMWare/vSphere virtualization > related > >> info > >>>>>>>>> in the hs_error file on linux/windows x86_64 > >>>>>>>>> > >>>>>>>>> loads /usr/lib64/libguestlib.so.0 already during JVM startup. 
That > >>>>>>>>> library depends on /usr/lib64/libvmtools.so.0, which in turn > >> depends > >>>>>>>>> on a lot of other libraries: > >>>>>>>>> > >>>>>>>>>    NEEDED               libdnet.so.1 > >>>>>>>>>    NEEDED               libglib-2.0.so.0 > >>>>>>>>>    NEEDED               libicui18n.so.52.1 > >>>>>>>>>    NEEDED               libicuuc.so.52.1 > >>>>>>>>>    NEEDED               libpthread.so.0 > >>>>>>>>>    NEEDED               libdl.so.2 > >>>>>>>>>    NEEDED               libssl.so.1.0.0 > >>>>>>>>>    NEEDED               libcrypto.so.1.0.0 > >>>>>>>>>    NEEDED               libc.so.6 > >>>>>>>>>    NEEDED               ld-linux-x86-64.so.2 > >>>>>>>>>    NEEDED               libgcc_s.so.1 > >>>>>>>>> > >>>>>>>>> Some are not so problematic, but for instance Tomcat is able to > use > >>>>>>>>> custom build OpenSSL libraries to replace the JSSE crypto engine > >> with > >>>>>>>>> an OpenSSL based one using JNI. Unfortunately the JDK is now > >> loading > >>>>>>>>> libssl and libcrypto early. In case our TC OpenSSL also uses SO > >>>>>>>>> version 1.0.0 it will not get loaded, in case it is another version > >>>>>>>>> we can run into a mix of symbols resolved in the platform > OpenSSL > >>>>>>>>> libs now loaded early and the ones provided with TC loaded > later. > >>>>>>>>> > >>>>>>>>> This is an example, why it would be good to not introduce too > many > >>>>>>>>> native library dependencies for the JVM or make it optional in > the > >>>>>>>>> sense of configurable during runtime. Of the above list, the icu > >>>>>>>>> libs, libglib and libdnet are other libs one would probably try to > >>>>>>>>> avoid. > >>>>>>>>> > >>>>>>>>> Don't know whether this list is appropriate for discussing it. If not > >>>>>>>>> any pointers to a better list are appreciated. > >>>>>>>> > >>>>>>>> This is the correct list to discuss this.
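The opt-in behaviour Rainer asks for, which Goetz's patch quoted above implements by returning early from `VirtualizationSupport::initialize()` unless `ExtensiveErrorReports` is set, can be modelled as follows. This is a hypothetical C++ sketch, not the HotSpot source: the `Loader` callback stands in for `os::dll_load()` so the example runs without VMware's library, and the struct and field names are made up. When the flag is off, nothing is loaded, so none of the library's transitive NEEDED dependencies (libssl, libcrypto and so on) are mapped into the process at startup.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for os::dll_load(): returns true on a successful load.
using Loader = std::function<bool(const std::string&)>;

// Hypothetical model (not HotSpot code) of opt-in loading of an optional
// native library that is only used for extended error reports.
struct ToyVirtualizationSupport {
  bool extensive_error_reports = false;  // stand-in for -XX:+ExtensiveErrorReports
  bool guest_lib_loaded = false;

  void initialize(const Loader& load) {
    // Guard first: with the flag off the loader is never called, so the
    // library and its dependencies never enter the process.
    if (!extensive_error_reports) {
      return;
    }
    guest_lib_loaded = load("vmGuestLib");
  }
};
```

Checking the flag before any loading, rather than after, is what keeps the default startup free of the extra libraries.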
> >>>>>>>> > >>>>>>>> When 8222720 was put in I had no idea it would result in eager > >> loading > >>>>>>>> of libraries beyond the explicit load of libguestlib. > >>>>>>>> > >>>>>>>> To be clear you are running under VMWare? This should only > >> happen to > >>>>>>>> enable reporting for the VMWare virtualization info in case of a > >> crash. > >>>>>>> > >>>>>>> Yes, I am running under VMWare. The library > >> /usr/lib64/libguestlib.so.0 > >>>>>>> and its dependency /usr/lib64/libvmtools.so.0 both belong to the > >> package > >>>>>>> libvmtools0. Its sources seem to be available at > >>>>>>> https://github.com/vmware/open-vm-tools. > >>>>>>> > >>>>>>>> This may need to be revisited. > >>>>>>>> > >>>>>>>> Thanks for the report. > >>>>>>> > >>>>>>> Thanks for looking at this! > >>>>>>> > >>>>>>> Rainer > >>>>>>> From shade at redhat.com Wed Jul 31 08:15:01 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 31 Jul 2019 10:15:01 +0200 Subject: RFR (T) 8228673: Remove develop flag StrictSafepointChecks In-Reply-To: References: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> <4d3181d4-3d5e-f1bc-d858-3c2e6eed875f@redhat.com> Message-ID: <6d4d2930-d93b-f95a-204b-0f13473c66d5@redhat.com> On 7/30/19 11:31 PM, coleen.phillimore at oracle.com wrote: > Oh, you mean this one in check_for_valid_safepoint_state (I answered too quickly on the first > inconsistency). > > #ifdef ASSERT >   if (potential_vm_operation && is_Java_thread() >       && !Universe::is_bootstrapping()) { > > Maybe it's not redundant because of the optimized build. Yeah. I don't think we have to clean that up at the moment.
-- Thanks, -Aleksey From markus.gronlund at oracle.com Wed Jul 31 09:48:17 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Wed, 31 Jul 2019 02:48:17 -0700 (PDT) Subject: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" In-Reply-To: <1f233b8e-650e-2306-63e3-c729bfcd350d@oracle.com> References: <41feed64-0678-414e-bb52-e9951f05c1b2@default> <1f233b8e-650e-2306-63e3-c729bfcd350d@oracle.com> Message-ID: Hi David, Thank you for taking a look (yet again). About 4: "... so it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store." Yes and thank you for spotting, I had missed taking out the load-acquire in the set_bits_cas_form. Here is an updated webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev02/ Cheers Markus -----Original Message----- From: David Holmes Sent: den 31 juli 2019 07:36 To: Markus Gronlund ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" Hi Markus, On 31/07/2019 7:04 am, Markus Gronlund wrote: > Greetings, > > Kindly asking for reviews for the following changeset: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8227605 > Webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev01/ > > Summary: > Clearing a bit that was set in a previous epoch should be done using CAS not to lose information in the current (this) epoch. This has also been the case up to the changes done in relation to Memory Leak Profiler, where the bit tagging scheme and implementation changed quite substantially. Part of the modifications done there had set_traceid_mask() to not use CAS unfortunately. This is the reason for the assertion, as information about the current (this) epoch was lost. 
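The lost-update problem described here, and the load-acquire/release-store pairing raised in David's review, can be shown with standard C++ atomics. This is a minimal sketch, not JFR's code: `std::atomic` stands in for HotSpot's `Atomic`/`OrderAccess` primitives, and the function names and bit layout are hypothetical. A plain load followed by a store can overwrite a bit that another thread set in between; a compare-and-swap loop installs its result only if the word is unchanged since it was read, and otherwise retries against the fresh value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit tag word (not JFR's actual layout).
using tagword = std::uint64_t;

// Racy read-modify-write: a bit set by another thread between the load
// and the store is silently overwritten (the "lost update" problem).
void set_bits_plain(std::atomic<tagword>& dest, tagword bits) {
  tagword old = dest.load(std::memory_order_relaxed);
  dest.store(old | bits, std::memory_order_relaxed);
}

// CAS loop: the new value is installed only if 'dest' still holds 'old';
// on failure compare_exchange_weak reloads 'old' and the loop retries, so
// concurrent updates to other bits are preserved. Release on success orders
// this thread's earlier writes before the update; a reader that needs to see
// those writes would pair it with an acquire load of the word.
void set_bits_cas(std::atomic<tagword>& dest, tagword bits) {
  tagword old = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old, old | bits,
                                     std::memory_order_release,
                                     std::memory_order_relaxed)) {
    // 'old' now holds the current value; the next call recomputes old | bits.
  }
}

// Clearing bits (e.g. those of a previous epoch) needs the same treatment,
// otherwise a current-epoch bit set concurrently could be lost.
void clear_bits_cas(std::atomic<tagword>& dest, tagword mask) {
  tagword old = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old, old & ~mask,
                                     std::memory_order_release,
                                     std::memory_order_relaxed)) {
  }
}
```

For a single OR or AND-NOT, `std::atomic::fetch_or` and `fetch_and` do the same job without an explicit loop; the loop form is shown because it generalises to arbitrary updates of the word.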
> > We need to restore set_traceid_mask() to use CAS the way it was done originally. AFAICS you have: 1. Refactored the CAS code using a template function to avoid code duplication That seems okay. 2. Changed SET_LEAKP_USED_PREV_EPOCH to use SET_LEAKP_TAG_CAS Okay that ensures CAS is used to update the epoch as per your summary. 3. Modified set_mask to use CAS Okay - as per summary 4. Removed use of load_acquire in the non-CAS form of set_bits This raises some queries about the use of OrderAccess in this code. On the one hand I might expect the load-acquire to be necessary to ensure correct ordering with respect to the implicit release_store of the CAS form. But if we expect correct interaction between the CAS and non-CAS forms then set_bits should itself be using a release-store. Further, given set_bits is not using a release-store, the initial load-acquire in the CAS form is not necessary. So it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store. Cheers, David ----- > Thanks to Erik Gahlin for debugging. > > Markus > From david.holmes at oracle.com Wed Jul 31 10:18:23 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Jul 2019 20:18:23 +1000 Subject: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" In-Reply-To: References: <41feed64-0678-414e-bb52-e9951f05c1b2@default> <1f233b8e-650e-2306-63e3-c729bfcd350d@oracle.com> Message-ID: <3899b0d4-47d8-9d2a-aa0b-073dc66f3160@oracle.com> On 31/07/2019 7:48 pm, Markus Gronlund wrote: > Hi David, > > Thank you for taking a look (yet again). > > About 4: > "... so it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store." > > Yes and thank you for spotting, I had missed taking out the load-acquire in the set_bits_cas_form. 
> > Here is an updated webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev02/ Looks good. Thanks, David ----- > Cheers > Markus > > -----Original Message----- > From: David Holmes > Sent: den 31 juli 2019 07:36 > To: Markus Gronlund ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" > > Hi Markus, > > On 31/07/2019 7:04 am, Markus Gronlund wrote: >> Greetings, >> >> Kindly asking for reviews for the following changeset: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8227605 >> Webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev01/ >> >> Summary: >> Clearing a bit that was set in a previous epoch should be done using CAS not to lose information in the current (this) epoch. This has also been the case up to the changes done in relation to Memory Leak Profiler, where the bit tagging scheme and implementation changed quite substantially. Part of the modifications done there had set_traceid_mask() to not use CAS unfortunately. This is the reason for the assertion, as information about the current (this) epoch was lost. >> >> We need to restore set_traceid_mask() to use CAS the way it was done originally. > > AFAICS you have: > > 1. Refactored the CAS code using a template function to avoid code duplication > > That seems okay. > > 2. Changed SET_LEAKP_USED_PREV_EPOCH to use SET_LEAKP_TAG_CAS > > Okay that ensures CAS is used to update the epoch as per your summary. > > 3. Modified set_mask to use CAS > > Okay - as per summary > > 4. Removed use of load_acquire in the non-CAS form of set_bits > > This raises some queries about the use of OrderAccess in this code. On the one hand I might expect the load-acquire to be necessary to ensure correct ordering with respect to the implicit release_store of the CAS form. 
But if we expect correct interaction between the CAS and non-CAS forms then set_bits should itself be using a release-store. Further, given set_bits is not using a release-store, the initial load-acquire in the CAS form is not necessary. So it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store. > > Cheers, > David > ----- > > >> Thanks to Erik Gahlin for debugging. >> >> Markus >> From coleen.phillimore at oracle.com Wed Jul 31 11:19:33 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 31 Jul 2019 07:19:33 -0400 Subject: RFR (T) 8228673: Remove develop flag StrictSafepointChecks In-Reply-To: <14833d39-3654-cdfb-1deb-8a645779a422@oracle.com> References: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> <14833d39-3654-cdfb-1deb-8a645779a422@oracle.com> Message-ID: <20a11abe-d0e0-1506-841d-905fc987e890@oracle.com> Thanks David! Coleen On 7/31/19 12:19 AM, David Holmes wrote: > Looks good! > > Another flag bites the dust :) > > Thanks, > David > > On 31/07/2019 6:43 am, coleen.phillimore at oracle.com wrote: >> Remove option to turn off checking. See bug for more details. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8228673.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8228673 >> >> Tested with hs-tier1 on Oracle platforms.
>> Thanks, >> Coleen From coleen.phillimore at oracle.com Wed Jul 31 11:23:32 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 31 Jul 2019 07:23:32 -0400 Subject: RFR (T) 8228673: Remove develop flag StrictSafepointChecks In-Reply-To: <6d4d2930-d93b-f95a-204b-0f13473c66d5@redhat.com> References: <23c654c8-7042-a615-25b4-2eab659d0e48@oracle.com> <4d3181d4-3d5e-f1bc-d858-3c2e6eed875f@redhat.com> <6d4d2930-d93b-f95a-204b-0f13473c66d5@redhat.com> Message-ID: <8d97e267-7073-fe75-58f6-31ed69ace135@oracle.com> On 7/31/19 4:15 AM, Aleksey Shipilev wrote: > On 7/30/19 11:31 PM, coleen.phillimore at oracle.com wrote: >> Oh, you mean this one in check_for_valid_safepoint_state (I answered too quickly on the first >> inconsistency). >> >> #ifdef ASSERT >>   if (potential_vm_operation && is_Java_thread() >>       && !Universe::is_bootstrapping()) { >> >> Maybe it's not redundant because of the optimized build. > Yeah. I don't think we have to clean that up at the moment. Not with this change. We've talked about removing the optimized build, but there may still be users of it. I'm surprised it built for me yesterday. I have a change to this function later in my patch queue, so I'll clean this out with that change. I'm trying to expand where we check NSV.
Thanks, Coleen > From coleen.phillimore at oracle.com Wed Jul 31 11:29:53 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 31 Jul 2019 07:29:53 -0400 Subject: RFR (S) 8228630: Remove always true parameter to NoSafepointVerifier In-Reply-To: <3CCDE2FA-61B6-41A6-877C-5CD8F089A55E@oracle.com> References: <7c75242b-fd26-16ae-bfee-0ac599e9b431@oracle.com> <3f113d42-1c53-c4cc-9b20-a30ae5a96279@oracle.com> <3CCDE2FA-61B6-41A6-877C-5CD8F089A55E@oracle.com> Message-ID: <43085a41-0c6d-0861-c4c5-7e7eed503f93@oracle.com> On 7/30/19 6:31 PM, Kim Barrett wrote: >> On Jul 30, 2019, at 5:33 PM, coleen.phillimore at oracle.com wrote: >> >> >> I fixed the comment in thread.hpp above the _allow_safepoint_count and renamed the field to _no_safepoint_count, which makes a lot more sense. Retesting. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228630.02/webrev > Thanks for renaming the member and updating the comment as we discussed. Thanks Kim! Coleen > > Looks good. > >> Thanks, >> Coleen >> >> On 7/30/19 4:45 PM, coleen.phillimore at oracle.com wrote: >>> Summary: Also remove NoGCVerifier since NoSafepointVerifier covers GC checking when not already at a safepoint and is a stronger check. >>> >>> See bug for more details also. Tested with all jtreg runtime,compiler,serviceability and gc tests. Also hs-tier1-3 on linux-x64-debug. 
>>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8228630.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8228630 >>> >>> Thanks, >>> Coleen > From coleen.phillimore at oracle.com Wed Jul 31 11:39:12 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 31 Jul 2019 07:39:12 -0400 Subject: RFR (S) 8228630: Remove always true parameter to NoSafepointVerifier In-Reply-To: <912584ac-b79c-0697-6b0a-9273bd88a72c@oracle.com> References: <7c75242b-fd26-16ae-bfee-0ac599e9b431@oracle.com> <912584ac-b79c-0697-6b0a-9273bd88a72c@oracle.com> Message-ID: <264d73f2-dff0-c480-7d54-fb312a2bbcc1@oracle.com> On 7/31/19 1:01 AM, David Holmes wrote: > Hi Coleen, > > On 31/07/2019 6:45 am, coleen.phillimore at oracle.com wrote: >> Summary: Also remove NoGCVerifier since NoSafepointVerifier covers GC >> checking when not already at a safepoint and is a stronger check. > > It wasn't at all clear to me that we may not want a NoGCVerifier that > is independent of safepoints e.g. for use in a non-JavaThread. But > AFAICS we don't use NoGCVerifier directly, but only via > NoSafepointVerifier - in which case the "no safepoint" check subsumes > the "no gc" check and the whole thing collapses to what you have > (which results in a nice amount of code deletion!). Yes, it took me a lot longer to come to this conclusion and justification despite my bias.? Erik and Kim couldn't think of any reason for a gc thread, for instance, to have a NoGCVerifier outside of a NoSafepointVerifier either. > > It took me a while to follow through all the changes but it seems good. > > I spotted this reference to NoGCVerifier: > > ./share/opto/runtime.cpp:// Thus, it cannot be a leaf since it > contains the NoGCVerifier. > > I'm not at all sure that the rest of the comment related to this is > accurate any more - the reference to the NoGCVerifier didn't make > sense to me. > Thanks, I didn't find that usage after I removed NoGCVerifier.? 
What a terrifying comment. I'm not going to change it other than: // Thus, it cannot be a leaf since it contains the NoSafepointVerifier. JRT_LEAF contains a NoSafepointVerifier. I think the code doesn't want to change state to _thread_in_vm but it may safepoint. I don't see where though. I'll leave it to someone who knows the compiler better if they want to clean this up. Thanks, Coleen > Thanks, > David > ----- > >> See bug for more details also. Tested with all jtreg >> runtime,compiler,serviceability and gc tests. Also hs-tier1-3 on >> linux-x64-debug. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8228630.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8228630 >> Thanks, >> Coleen From daniel.daugherty at oracle.com Wed Jul 31 12:54:45 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 31 Jul 2019 08:54:45 -0400 Subject: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" In-Reply-To: References: <41feed64-0678-414e-bb52-e9951f05c1b2@default> <1f233b8e-650e-2306-63e3-c729bfcd350d@oracle.com> Message-ID: <2c70df53-0ee5-ff2b-8508-194747c93f84@oracle.com> On 7/31/19 5:48 AM, Markus Gronlund wrote: > Hi David, > > Thank you for taking a look (yet again). > > About 4: > "... so it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store." > > Yes and thank you for spotting, I had missed taking out the load-acquire in the set_bits_cas_form. > > Here is an updated webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev02/ src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdBits.inline.hpp     Nice work here! src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdMacros.hpp     No comments. Thumbs up! Dan P.S.
I'll have to take another look at my own use of load_acquire() in the face of cmpxchg() in my lock free monitor list changeset. :-) > > Cheers > Markus > > -----Original Message----- > From: David Holmes > Sent: den 31 juli 2019 07:36 > To: Markus Gronlund ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" > > Hi Markus, > > On 31/07/2019 7:04 am, Markus Gronlund wrote: >> Greetings, >> >> Kindly asking for reviews for the following changeset: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8227605 >> Webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev01/ >> >> Summary: >> Clearing a bit that was set in a previous epoch should be done using CAS not to lose information in the current (this) epoch. This has also been the case up to the changes done in relation to Memory Leak Profiler, where the bit tagging scheme and implementation changed quite substantially. Part of the modifications done there had set_traceid_mask() to not use CAS unfortunately. This is the reason for the assertion, as information about the current (this) epoch was lost. >> >> We need to restore set_traceid_mask() to use CAS the way it was done originally. > AFAICS you have: > > 1. Refactored the CAS code using a template function to avoid code duplication > > That seems okay. > > 2. Changed SET_LEAKP_USED_PREV_EPOCH to use SET_LEAKP_TAG_CAS > > Okay that ensures CAS is used to update the epoch as per your summary. > > 3. Modified set_mask to use CAS > > Okay - as per summary > > 4. Removed use of load_acquire in the non-CAS form of set_bits > > This raises some queries about the use of OrderAccess in this code. On the one hand I might expect the load-acquire to be necessary to ensure correct ordering with respect to the implicit release_store of the CAS form.
But if we expect correct interaction between the CAS and non-CAS forms then set_bits should itself be using a release-store. Further, given set_bits is not using a release-store, the initial load-acquire in the CAS form is not necessary. So it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store. > > Cheers, > David > ----- > > >> Thanks to Erik Gahlin for debugging. >> >> Markus >> From erik.gahlin at oracle.com Wed Jul 31 13:50:49 2019 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 31 Jul 2019 15:50:49 +0200 Subject: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" In-Reply-To: References: <41feed64-0678-414e-bb52-e9951f05c1b2@default> <1f233b8e-650e-2306-63e3-c729bfcd350d@oracle.com> Message-ID: <5D419CB9.8030809@oracle.com> On 2019-07-31 11:48, Markus Gronlund wrote: > Hi David, > > Thank you for taking a look (yet again). > > About 4: > "... so it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store." > > Yes and thank you for spotting, I had missed taking out the load-acquire in the set_bits_cas_form. > > Here is an updated webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev02/ Looks good. 
Erik > Cheers > Markus > > -----Original Message----- > From: David Holmes > Sent: den 31 juli 2019 07:36 > To: Markus Gronlund ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" > > Hi Markus, > > On 31/07/2019 7:04 am, Markus Gronlund wrote: >> Greetings, >> >> Kindly asking for reviews for the following changeset: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8227605 >> Webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev01/ >> >> Summary: >> Clearing a bit that was set in a previous epoch should be done using CAS not to lose information in the current (this) epoch. This has also been the case up to the changes done in relation to Memory Leak Profiler, where the bit tagging scheme and implementation changed quite substantially. Part of the modifications done there had set_traceid_mask() to not use CAS unfortunately. This is the reason for the assertion, as information about the current (this) epoch was lost. >> >> We need to restore set_traceid_mask() to use CAS the way it was done originally. > AFAICS you have: > > 1. Refactored the CAS code using a template function to avoid code duplication > > That seems okay. > > 2. Changed SET_LEAKP_USED_PREV_EPOCH to use SET_LEAKP_TAG_CAS > > Okay that ensures CAS is used to update the epoch as per your summary. > > 3. Modified set_mask to use CAS > > Okay - as per summary > > 4. Removed use of load_acquire in the non-CAS form of set_bits > > This raises some queries about the use of OrderAccess in this code. On the one hand I might expect the load-acquire to be necessary to ensure correct ordering with respect to the implicit release_store of the CAS form. But if we expect correct interaction between the CAS and non-CAS forms then set_bits should itself be using a release-store. 
Further, given set_bits is not using a release-store, the initial load-acquire in the CAS form is not necessary. So it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store. > > Cheers, > David > ----- > > >> Thanks to Erik Gahlin for debugging. >> >> Markus >> From markus.gronlund at oracle.com Wed Jul 31 13:53:19 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Wed, 31 Jul 2019 06:53:19 -0700 (PDT) Subject: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" In-Reply-To: <2c70df53-0ee5-ff2b-8508-194747c93f84@oracle.com> References: <41feed64-0678-414e-bb52-e9951f05c1b2@default> <1f233b8e-650e-2306-63e3-c729bfcd350d@oracle.com> <2c70df53-0ee5-ff2b-8508-194747c93f84@oracle.com> Message-ID: Thank you David, Dan and Erik for the reviews! Markus -----Original Message----- From: Daniel D. Daugherty Sent: den 31 juli 2019 14:55 To: Markus Gronlund ; David Holmes ; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" On 7/31/19 5:48 AM, Markus Gronlund wrote: > Hi David, > > Thank you for taking a look (yet again). > > About 4: > "... so it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store." > > Yes and thank you for spotting, I had missed taking out the load-acquire in the set_bits_cas_form. > > Here is an updated webrev: > http://cr.openjdk.java.net/~mgronlun/8227605/webrev02/ src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdBits.inline.hpp     Nice work here! src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdMacros.hpp     No comments. Thumbs up! Dan P.S.
I'll have to take another look at my own use of load_acquire() in the face of cmpxchg() in my lock free monitor list changeset. :-) > > Cheers > Markus > > -----Original Message----- > From: David Holmes > Sent: den 31 juli 2019 07:36 > To: Markus Gronlund ; > hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [13] RFR(XS): 8227605: Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant" > > Hi Markus, > > On 31/07/2019 7:04 am, Markus Gronlund wrote: >> Greetings, >> >> Kindly asking for reviews for the following changeset: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8227605 >> Webrev: http://cr.openjdk.java.net/~mgronlun/8227605/webrev01/ >> >> Summary: >> Clearing a bit that was set in a previous epoch should be done using CAS not to lose information in the current (this) epoch. This has also been the case up to the changes done in relation to Memory Leak Profiler, where the bit tagging scheme and implementation changed quite substantially. Part of the modifications done there had set_traceid_mask() to not use CAS unfortunately. This is the reason for the assertion, as information about the current (this) epoch was lost. >> >> We need to restore set_traceid_mask() to use CAS the way it was done originally. > AFAICS you have: > > 1. Refactored the CAS code using a template function to avoid code > duplication > > That seems okay. > > 2. Changed SET_LEAKP_USED_PREV_EPOCH to use SET_LEAKP_TAG_CAS > > Okay that ensures CAS is used to update the epoch as per your summary. > > 3. Modified set_mask to use CAS > > Okay - as per summary > > 4. Removed use of load_acquire in the non-CAS form of set_bits > > This raises some queries about the use of OrderAccess in this code. On the one hand I might expect the load-acquire to be necessary to ensure correct ordering with respect to the implicit release_store of the CAS form.
But if we expect correct interaction between the CAS and non-CAS forms, then set_bits should itself be using a release-store. Further, given set_bits is not using a release-store, the initial load-acquire in the CAS form is not necessary. So it seems to me either all the load-acquires should go, or else the load-acquires all stay and we add the missing release-store. > > Cheers, > David > ----- > > >> Thanks to Erik Gahlin for debugging. >> >> Markus >> From leonid.mesnik at oracle.com Wed Jul 31 22:10:34 2019 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 31 Jul 2019 15:10:34 -0700 Subject: RFR(S): 8195809: [TESTBUG] jps and jcmd -l support for Docker containers is not tested In-Reply-To: References: Message-ID: <52422f93-0926-e534-75f4-1a2f7e2940c9@oracle.com> Hi, Here are some general comments about the test design: As I understand it, your Docker process has to run for at least 20 sec * TIME_FACTOR just to wait until SimpleLoop completes. That is a waste of time. It is fine for 1-2 tests, but later it might make sense to improve the driver/test process communication. Also, the common way to identify and find a Java process is to pass a unique key as a parameter. You might want to improve the driver/test communication later. See my comments: http://cr.openjdk.java.net/~mseledtsov/8195809.00/test/hotspot/jtreg/containers/docker/TestJcmd.java.html 54 return; I think you need to throw an exception to signal that the test is skipped. http://cr.openjdk.java.net/~mseledtsov/8195809.00/test/hotspot/jtreg/containers/docker/SimpleLoop.java.html 34 for (int i=0; i < howLong; i++) { The indentation is wrong. http://cr.openjdk.java.net/~mseledtsov/8195809.00/test/lib/jdk/test/lib/containers/docker/DockerTestUtils.java.udiff.html 340 * @return True if container is running Should be true, not True. 367 for(int i=0; i < count; i++) { The indentation is wrong. Otherwise the fix looks good. However, please get a review from someone who is an expert in Docker.
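A minimal C++ sketch of the two points in the 8227605 review above: clearing a previous-epoch bit with CAS so that a concurrently set current-epoch bit is not lost, and pairing a release store in the non-CAS form with the load-acquires in the CAS forms (one of the two consistent options David describes in point 4). It assumes `std::atomic` as a stand-in for HotSpot's `OrderAccess`/`Atomic` primitives; the names `set_bits`, `set_bits_cas`, `set_mask_cas`, `count_lost_updates` and the bit layout are hypothetical and are not the actual `jfrTraceIdBits.inline.hpp` code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical tag word and bit layout, for illustration only; not the
// actual layout used by JFR trace ids.
using tag_t = std::uint8_t;
constexpr tag_t THIS_EPOCH_BIT = 0x1;
constexpr tag_t PREV_EPOCH_BIT = 0x2;

// Non-CAS form: a plain read-modify-write, only safe when no other thread
// updates the word concurrently. The release store pairs with the
// acquire loads in the CAS forms below.
inline void set_bits(tag_t bits, std::atomic<tag_t>& dest) {
  const tag_t current = dest.load(std::memory_order_relaxed);
  dest.store(static_cast<tag_t>(current | bits), std::memory_order_release);
}

// CAS form: OR in `bits`, retrying if another thread changed the word
// between the load and the compare-exchange.
inline void set_bits_cas(tag_t bits, std::atomic<tag_t>& dest) {
  tag_t current = dest.load(std::memory_order_acquire);
  while ((current & bits) != bits) {
    // On failure, compare_exchange_weak writes the observed value to `current`.
    if (dest.compare_exchange_weak(current, static_cast<tag_t>(current | bits),
                                   std::memory_order_acq_rel,
                                   std::memory_order_acquire)) {
      return;
    }
  }
}

// CAS form of "apply a mask": clears bits (e.g. the previous-epoch flag)
// without overwriting bits another thread sets concurrently.
inline void set_mask_cas(tag_t mask, std::atomic<tag_t>& dest) {
  tag_t current = dest.load(std::memory_order_acquire);
  while ((current & mask) != current) {
    if (dest.compare_exchange_weak(current, static_cast<tag_t>(current & mask),
                                   std::memory_order_acq_rel,
                                   std::memory_order_acquire)) {
      return;
    }
  }
}

// Races a tagger (sets the this-epoch bit) against a clearer (drops the
// previous-epoch bit) and counts rounds whose final value is not
// exactly THIS_EPOCH_BIT, i.e. rounds where an update was lost.
inline int count_lost_updates(int rounds) {
  int lost = 0;
  for (int i = 0; i < rounds; ++i) {
    std::atomic<tag_t> tag{0};
    set_bits(PREV_EPOCH_BIT, tag);  // single-threaded setup
    assert(tag.load() == PREV_EPOCH_BIT);
    std::thread tagger([&tag] { set_bits_cas(THIS_EPOCH_BIT, tag); });
    std::thread clearer([&tag] {
      set_mask_cas(static_cast<tag_t>(~PREV_EPOCH_BIT), tag);
    });
    tagger.join();
    clearer.join();
    if (tag.load() != THIS_EPOCH_BIT) ++lost;
  }
  return lost;
}
```

With both racing operations in CAS form, every interleaving ends with exactly the this-epoch bit set, so `count_lost_updates` returns 0. If the clearer instead did a plain load followed by a store of `current & mask`, it could overwrite the tagger's update and leave 0x0, which is the kind of lost current-epoch information the 8227605 summary describes. A race test like this can only show the absence of lost updates probabilistically; the guarantee comes from the CAS loop itself.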
Leonid On 7/29/19 8:46 PM, mikhailo.seledtsov at oracle.com wrote: > Please review this change that: > - adds a test case for "jcmd -l" and "jcmd help" where jcmd is > executed on a host/node outside the container, > while a targeted JVM is running inside a container > - factors out some common functionality to DockerTestUtils and > docker.Common > > Please note: > - the "jcmd -l" works in this configuration; however, the JCMD's and > Target's username and UID have to match > (per design) > - the "jcmd help", "jcmd JFR.start" or any other JCMD command > besides "jcmd -l" does not work in this configuration > (Filed "JDK-8228343: JCMD and attach fail to work across Linux > Container boundary") > The test case is commented out; however, it can be used for > reproducing the issue, and will be enabled > once the bug is fixed. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8195809 > Webrev: http://cr.openjdk.java.net/~mseledtsov/8195809.00/ > Testing: > - ran the new test multiple times on Linux-x64 > - ran TestJCMDWithSideCar multiple times on Linux-x64 > - ran all Docker/Container tests (HotSpot and JDK) > All PASS > > Thank you, > Misha > From mikhailo.seledtsov at oracle.com Wed Jul 31 23:34:03 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Wed, 31 Jul 2019 16:34:03 -0700 Subject: RFR(T): 8228904: Problemlist docker/TestJcmdWithSideCar.java until JDK-8228850 and JDK-8228960 are fixed Message-ID: Please review this Trivial change problem listing the docker/TestJcmdWithSideCar.java test. The test found two bugs, one seems to be a product bug and another seems to be a test bug. Placing the test on the problemlist until the bugs are addressed.
JBS: https://bugs.openjdk.java.net/browse/JDK-8228904 =========== change diff --git a/test/hotspot/jtreg/ProblemList.txt b/test/hotspot/jtreg/ProblemList.txt --- a/test/hotspot/jtreg/ProblemList.txt +++ b/test/hotspot/jtreg/ProblemList.txt @@ -135,6 +135,16 @@ ############################################################################# + +############################################################################# + +# :hotspot_containers + +containers/docker/TestJcmdWithSideCar.java 8228850,8228960 generic-all + +############################################################################# + + ############################################################################# # :vmTestbase_* ============ Thank you, Misha From daniel.daugherty at oracle.com Wed Jul 31 23:45:09 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 31 Jul 2019 19:45:09 -0400 Subject: RFR(T): 8228904: Problemlist docker/TestJcmdWithSideCar.java until JDK-8228850 and JDK-8228960 are fixed In-Reply-To: References: Message-ID: Thumbs up. And I agree that this is trivial. Dan On 7/31/19 7:34 PM, mikhailo.seledtsov at oracle.com wrote: > Please review this Trivial change problem listing the > docker/TestJcmdWithSideCar.java test. The test found two bugs, one > seems to be a product bug and another seems to be a test bug. > > Placing the test on the problemlist until the bugs are addressed.
> > > JBS: https://bugs.openjdk.java.net/browse/JDK-8228904 > > =========== change > > diff --git a/test/hotspot/jtreg/ProblemList.txt > b/test/hotspot/jtreg/ProblemList.txt > --- a/test/hotspot/jtreg/ProblemList.txt > +++ b/test/hotspot/jtreg/ProblemList.txt > @@ -135,6 +135,16 @@ > > ############################################################################# > > > + > +############################################################################# > > + > +# :hotspot_containers > + > +containers/docker/TestJcmdWithSideCar.java 8228850,8228960 generic-all > + > +############################################################################# > > + > + > ############################################################################# > > > # :vmTestbase_* > > > ============ > > > Thank you, > > Misha > From mikhailo.seledtsov at oracle.com Wed Jul 31 23:46:59 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Wed, 31 Jul 2019 16:46:59 -0700 Subject: RFR(T): 8228904: Problemlist docker/TestJcmdWithSideCar.java until JDK-8228850 and JDK-8228960 are fixed In-Reply-To: References: Message-ID: Thank you, Misha On 7/31/19 4:45 PM, Daniel D. Daugherty wrote: > Thumbs up. And I agree that this is trivial. > > Dan > > > On 7/31/19 7:34 PM, mikhailo.seledtsov at oracle.com wrote: >> Please review this Trivial change problem listing the >> docker/TestJcmdWithSideCar.java test. The test found two bugs, one >> seems to be a product bug and another seems to be a test bug. >> >> Placing the test on the problemlist until the bugs are addressed.
>> >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8228904 >> >> =========== change >> >> diff --git a/test/hotspot/jtreg/ProblemList.txt >> b/test/hotspot/jtreg/ProblemList.txt >> --- a/test/hotspot/jtreg/ProblemList.txt >> +++ b/test/hotspot/jtreg/ProblemList.txt >> @@ -135,6 +135,16 @@ >> >> ############################################################################# >> >> >> + >> +############################################################################# >> >> + >> +# :hotspot_containers >> + >> +containers/docker/TestJcmdWithSideCar.java 8228850,8228960 generic-all >> + >> +############################################################################# >> >> + >> + >> ############################################################################# >> >> >> # :vmTestbase_* >> >> >> ============ >> >> >> Thank you, >> >> Misha >> >